Fastai 9 - collaborative filtering as a classifier
An experiment in cross entropy loss on recsys.
- Experiment intro
- Downloading and preparing the data
- Option 1 - adjusting the loss and accuracy functions
- Option 2 - adjusting the dataloaders
- Summary
Chapter 8 of the fastai book closes with an interesting "further research" suggestion: convert the collaborative filtering model into a classification model. This post demonstrates two different ways to do that using cross-entropy loss.
from fastai.collab import *
from fastai.tabular.all import *
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item', delimiter='|', encoding='latin-1',
usecols=(0,1), names=('movie','title'), header=None)
ratings = ratings.merge(movies)  # add movie titles instead of ID numbers
dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()
embs = get_emb_sz(dls)
embs
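For reference, get_emb_sz picks each embedding width with fastai's emb_sz_rule heuristic, which is roughly the following (copied here for illustration; check the fastai source for the authoritative version):
def emb_sz_rule(n_cat):
    # fastai's rule of thumb: grows slowly with cardinality, capped at 600
    return min(600, round(1.6 * n_cat**0.56))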
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))  # 5 output logits, one per rating class
    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x
model = CollabNN(*embs)
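An optional sanity check that the model now produces five logits per row, one per rating class, rather than a single regression value (the batch is moved to the CPU since the freshly built model lives there):
xb, yb = dls.one_batch()
model(xb.cpu()).shape  # expected: torch.Size([64, 5])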
loss = CrossEntropyLossFlat()
def myLoss(prediction, target):
    # shift the 1-5 ratings down to the 0-4 classes cross entropy expects
    return loss(prediction, target-1)
def myAccuracy(prediction, target):
    return accuracy(prediction, target-1)
learn1 = Learner(dls, model, loss_func=myLoss, metrics=myAccuracy)
def lrfinder(learner):
    # run fastai's LR finder and print all four suggested learning rates
    lr_min, lr_steep, lr_valley, lr_slide = learner.lr_find(suggest_funcs=(minimum, steep, valley, slide))
    print(f"Minimum/10:\t{lr_min:.2e}\nSteepest point:\t{lr_steep:.2e}\nLongest valley:\t{lr_valley:.2e}\nSlide interval:\t{lr_slide:.2e}")
lrfinder(learn1)
learn1.fit_one_cycle(5, 1.00e-02, wd=0.01)
learn1.recorder.plot_loss()
Why adjust the loss and accuracy?
The first training attempt fails with the following error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
This usually points to an index mismatch, e.g. training a network with 10 output nodes on a dataset with 15 labels. In that case it's best to restart the notebook and get a more accurate traceback by moving to the CPU.
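For example, a minimal repro on the CPU (a hypothetical sketch using a fresh, untrained model) surfaces the real exception:
xb, yb = dls.one_batch()
preds = CollabNN(*embs)(xb.cpu())        # fresh model, batch moved to the CPU
CrossEntropyLossFlat()(preds, yb.cpu())  # raises the IndexError below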
On the CPU we indeed get an index-mismatch error:
IndexError: Target 5 is out of bounds.
Changing the last layer to output 6 classes also solves this issue: https://forums.fast.ai/t/getting-runtime-error-for-cross-entropy-what-should-be-changed-and-why-it-is-coming/87726/12
My preferred way, though, is to adjust the loss and accuracy functions as shown above: the targets range from 1 to 5 and are not categorized (dls.vocab raises an error), while cross entropy expects class indices from 0 to n-1, so a target of 5 is out of bounds. Shifting the targets down by one inside myLoss and myAccuracy fixes this without touching the data.
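A minimal illustration of the off-by-one fix, with made-up tensors:
logits = torch.randn(3, 5)                   # three fake predictions, five classes
ratings = tensor([1, 3, 5])                  # raw targets as stored in the dataset
CrossEntropyLossFlat()(logits, ratings - 1)  # fine: shifted to classes 0, 2 and 4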
The second option is to rebuild the dataloaders with TabularCollab, declaring the rating as a categorical target via y_block=CategoryBlock:
cat_names = ['user', 'title']
cont_names = []
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(ratings))
to1 = TabularCollab(ratings, procs, cat_names, cont_names, y_names="rating", splits=splits, y_block=CategoryBlock)
dls1 = to1.dataloaders()
dls1.show_batch()
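It's worth verifying that the CategoryBlock target is now encoded as class indices 0-4 rather than the raw 1-5 ratings, which is why the unmodified loss and metric will work below (a quick check; the unpacking is written so it doesn't depend on the exact batch layout):
*xs, ys = dls1.one_batch()
ys.min().item(), ys.max().item()  # expected: (0, 4)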
embs = get_emb_sz(dls1)  # embedding sizes from the new tabular dataloaders
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))
    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x
model2 = CollabNN(*embs)
learn2 = Learner(dls1, model2, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
lrfinder(learn2)
learn2.fit_one_cycle(5, 1.74e-03, wd=0.01)
learn2.recorder.plot_loss()
interp = ClassificationInterpretation.from_learner(learn2)
interp.plot_confusion_matrix()
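Beyond the confusion matrix, fastai's ClassificationInterpretation can print per-class precision and recall, which helps show whether the model mostly predicts the frequent middle ratings:
interp.print_classification_report()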
In this case, both approaches reach roughly the same results, and they suggest that using a cross-entropy loss (or, more generally, turning this into a classification problem) is not a great idea: around 45% accuracy is clearly better than random choice (20% over five classes), but the model is no better than the MSE-trained regression version.
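To make the comparison with the book's regression model more direct, one could convert the class probabilities into an expected rating and measure MSE against the true ratings. A sketch, assuming learn2 from above and that get_preds returns softmax probabilities for this loss:
preds, targs = learn2.get_preds()
exp_rating = (preds * torch.arange(1, 6).float()).sum(dim=1)  # E[rating] per row
((exp_rating - (targs.squeeze().float() + 1))**2).mean()      # MSE on the 1-5 scale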