Experiment intro:

Chapter 8 of the fastai book suggests, as a further research idea, converting the collaborative filtering model into a classification model. This post demonstrates two different ways to do that conversion using a cross-entropy loss.
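Conceptually, only two things change compared with the book's regression version: the model's head and the loss function. A minimal sketch of the contrast (the variable names and the hidden size n_act = 30 are just illustrative, matching what is used later in the post):

from torch import nn

n_act = 30                                  # hidden size of the collaborative filtering net
regression_head     = nn.Linear(n_act, 1)   # one predicted rating, trained with an MSE loss (the book's version)
classification_head = nn.Linear(n_act, 5)   # five scores, one per rating class, trained with cross-entropy (this post)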

Downloading and preparing the data

from fastai.collab import *
from fastai.tabular.all import *
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)
ratings = ratings.merge(movies) #adding movie names instead of ID number

Option 1 - adjusting the loss and accuracy functions:

dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()
user title rating
0 542 My Left Foot (1989) 4
1 422 Event Horizon (1997) 3
2 311 African Queen, The (1951) 4
3 595 Face/Off (1997) 4
4 617 Evil Dead II (1987) 1
5 158 Jurassic Park (1993) 5
6 836 Chasing Amy (1997) 3
7 474 Emma (1996) 3
8 466 Jackie Chan's First Strike (1996) 3
9 554 Scream (1996) 3
embs = get_emb_sz(dls)
embs
[(944, 74), (1665, 102)]
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)   # user embeddings
        self.item_factors = Embedding(*item_sz)   # movie (title) embeddings
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))                  # 5 outputs, one score per rating class

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x
model = CollabNN(*embs)

loss = CrossEntropyLossFlat()
def myLoss(prediction, target):
  return loss(prediction, target-1)      # shift the ratings 1-5 to class indices 0-4

def myAccuracy(prediction, target):
  return accuracy(prediction, target-1)  # apply the same shift for the metric

learn1 = Learner(dls, model, loss_func=myLoss, metrics=myAccuracy)
def lrfinder(learner):
  lr_min, lr_steep, lr_valley, lr_slide = learner.lr_find(suggest_funcs=(minimum, steep, valley, slide))
  print(f"Minimum/10:\t{lr_min:.2e}\nSteepest point:\t{lr_steep:.2e}\nLongest valley:\t{lr_valley:.2e}\nSlide interval:\t{lr_slide:.2e}")
  return 
lrfinder(learn1)
Minimum/10:	1.00e-02
Steepest point:	2.75e-02
Longest valley:	6.31e-03
Slide interval:	7.59e-03
learn1.fit_one_cycle(5, 1.00e-02, wd=0.01)
epoch train_loss valid_loss myAccuracy time
0 1.304140 1.300469 0.413550 00:11
1 1.262539 1.272743 0.431100 00:11
2 1.222846 1.250300 0.443000 00:11
3 1.160470 1.243430 0.453700 00:11
4 1.115513 1.253342 0.449450 00:11
learn1.recorder.plot_loss()

Why adjust the loss and accuracy?

The first attempt, using CrossEntropyLossFlat and accuracy directly, fails with the following error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

This is usually caused by an indexing mismatch, for example training a network with 10 output nodes on a dataset with 15 labels. In this case it's better to restart the notebook and get a more accurate traceback by moving everything to the CPU.
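One way to do that without retraining anything (a sketch, assuming the dls, embs and CollabNN defined above) is to push a single batch through a fresh model on the CPU with the unadjusted loss:

cpu_model = CollabNN(*embs)         # a freshly built model stays on the CPU
xb, yb = to_cpu(dls.one_batch())    # one batch of (user, title) indices and raw ratings
preds = cpu_model(xb)               # shape: (bs, 5)
CrossEntropyLossFlat()(preds, yb)   # raises the readable IndexError below whenever the batch contains a 5-star rating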

On the CPU we indeed get an indexing error:

IndexError: Target 5 is out of bounds.

Changing the last layer to 6 outputs solves this issue, as discussed in this forum thread: https://forums.fast.ai/t/getting-runtime-error-for-cross-entropy-what-should-be-changed-and-why-it-is-coming/87726/12
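In a sketch with made-up tensors (not the real batch), that workaround looks like this: the head gets 6 output nodes, so a raw target of 5 becomes a valid class index, at the price of a class 0 that no rating ever uses.

preds6 = torch.randn(4, 6)                                   # 6 output nodes instead of 5
CrossEntropyLossFlat()(preds6, torch.tensor([1, 3, 5, 4]))   # no error: a raw target of 5 is now in bounds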

My preferred way, though, is to adjust the loss and accuracy functions instead. The ratings in the dataset range from 1 to 5 and are not encoded as categories by these dataloaders (dls.vocab raises an error), while the cross-entropy loss expects class indices from 0 to n-1, so a target of 5 is out of bounds.
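The off-by-one is easy to see in isolation, again with made-up tensors rather than the real batch:

preds  = torch.randn(4, 5)                   # 4 samples, 5 output nodes
rating = torch.tensor([1, 3, 5, 4])          # raw ratings as stored in the dataset
# CrossEntropyLossFlat()(preds, rating)      # IndexError: Target 5 is out of bounds
CrossEntropyLossFlat()(preds, rating - 1)    # fine: targets shifted to the range 0-4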

Option 2 - adjusting the dataloaders:

cat_names = ['user', 'title']
cont_names = []

procs = [Categorify, FillMissing, Normalize]

splits = RandomSplitter()(range_of(ratings))

to1 = TabularCollab(ratings, procs, cat_names, cont_names, y_names="rating", splits=splits, y_block=CategoryBlock)

dls1 = to1.dataloaders()
dls1.show_batch()
user title rating
0 109 Interview with the Vampire (1994) 3
1 640 Braveheart (1995) 4
2 523 Strawberry and Chocolate (Fresa y chocolate) (1993) 5
3 907 Clerks (1994) 4
4 933 Die Hard: With a Vengeance (1995) 1
5 496 Swingers (1996) 2
6 606 Mr. Holland's Opus (1995) 5
7 184 Amadeus (1984) 4
8 632 Empire Strikes Back, The (1980) 5
9 70 Bram Stoker's Dracula (1992) 4
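Because y_block=CategoryBlock treats the rating as a categorical target, the dataloaders encode the ratings 1-5 as class indices 0-4 behind the scenes, so the stock CrossEntropyLossFlat and accuracy can be used as they are. A quick sanity check (a sketch, assuming the dls1 defined above):

xb, yb = dls1.one_batch()   # xb: (user, title) category codes, yb: encoded ratings
yb.min(), yb.max()          # should show 0 and 4: ratings 1-5 mapped to classes 0-4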
embs = get_emb_sz(dls1)  # embedding sizes from the new dataloaders

# Same network as in Option 1: the head has 5 outputs, one score per rating class.
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x

model2 = CollabNN(*embs)

learn2 = Learner(dls1, model2, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
lrfinder(learn2)
Minimum/10:	4.79e-03
Steepest point:	2.29e-02
Longest valley:	1.74e-03
Slide interval:	6.31e-03
learn2.fit_one_cycle(5, 1.74e-03, wd=0.01)
epoch train_loss valid_loss accuracy time
0 1.272689 1.281745 0.427700 00:14
1 1.246751 1.252351 0.447700 00:11
2 1.208962 1.239627 0.447700 00:11
3 1.169290 1.240180 0.446000 00:11
4 1.145836 1.241863 0.446750 00:11
learn2.recorder.plot_loss()
interp = ClassificationInterpretation.from_learner(learn2)
interp.plot_confusion_matrix()

Summary

In this case both approaches reach essentially the same results, around 45% accuracy. That is better than random guessing among five rating classes, but it does not beat the original MSE-based model, so converting this into a classification problem with a cross-entropy loss does not look like a particularly good idea.
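For a more direct comparison with the regression baseline, one option (a sketch, not part of the notebook above) is to turn the five predicted probabilities into an expected rating and measure its squared error on the validation set:

preds, targs = learn2.get_preds()                          # preds: (n, 5) probabilities, targs: class ids 0-4
expected_rating = (preds * torch.arange(1, 6)).sum(dim=1)  # probability-weighted average of the ratings 1-5
true_rating = targs.flatten().float() + 1                  # back to the original 1-5 scale
mse = ((expected_rating - true_rating) ** 2).mean()
print(f"validation MSE of the expected rating: {mse:.3f}")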