Experiment intro:

Chapter 8 of the fastai book suggests, as a further research idea, converting the collaborative filtering model into a classification model. This post demonstrates two different ways to do that conversion using a cross-entropy loss.
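Conceptually, only two things change compared with the book's regression version: the model's head and the loss function. A minimal sketch of the contrast (the variable names and the hidden size n_act = 30 are just illustrative, matching what is used later in the post):

from torch import nn

n_act = 30                                  # hidden size of the collaborative filtering net
regression_head     = nn.Linear(n_act, 1)   # one predicted rating, trained with an MSE loss (the book's version)
classification_head = nn.Linear(n_act, 5)   # five scores, one per rating class, trained with cross-entropy (this post)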

Downloading and preparing the data

from fastai.collab import *
from fastai.tabular.all import *
path = untar_data(URLs.ML_100k)
ratings = pd.read_csv(path/'u.data', delimiter='\t', header=None,
                      names=['user','movie','rating','timestamp'])
movies = pd.read_csv(path/'u.item',  delimiter='|', encoding='latin-1',
                     usecols=(0,1), names=('movie','title'), header=None)
ratings = ratings.merge(movies) #adding movie names instead of ID number

Option 1 - adjusting the loss and accuracy functions:

dls = CollabDataLoaders.from_df(ratings, item_name='title', bs=64)
dls.show_batch()
user title rating
0 542 My Left Foot (1989) 4
1 422 Event Horizon (1997) 3
2 311 African Queen, The (1951) 4
3 595 Face/Off (1997) 4
4 617 Evil Dead II (1987) 1
5 158 Jurassic Park (1993) 5
6 836 Chasing Amy (1997) 3
7 474 Emma (1996) 3
8 466 Jackie Chan's First Strike (1996) 3
9 554 Scream (1996) 3
embs = get_emb_sz(dls)
embs
[(944, 74), (1665, 102)]
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)   # user embeddings
        self.item_factors = Embedding(*item_sz)   # movie (title) embeddings
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))                  # 5 outputs, one score per rating class

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x
model = CollabNN(*embs)

loss = CrossEntropyLossFlat()
def myLoss(prediction, target):
  return loss(prediction, target-1)      # shift the ratings 1-5 to class indices 0-4

def myAccuracy(prediction, target):
  return accuracy(prediction, target-1)  # apply the same shift for the metric

learn1 = Learner(dls, model, loss_func=myLoss, metrics=myAccuracy)
def lrfinder(learner):
  lr_min, lr_steep, lr_valley, lr_slide = learner.lr_find(suggest_funcs=(minimum, steep, valley, slide))
  print(f"Minimum/10:\t{lr_min:.2e}\nSteepest point:\t{lr_steep:.2e}\nLongest valley:\t{lr_valley:.2e}\nSlide interval:\t{lr_slide:.2e}")
  return 
lrfinder(learn1)
Minimum/10:	1.00e-02
Steepest point:	2.75e-02
Longest valley:	6.31e-03
Slide interval:	7.59e-03
learn1.fit_one_cycle(5, 1.00e-02, wd=0.01)
epoch train_loss valid_loss myAccuracy time
0 1.304140 1.300469 0.413550 00:11
1 1.262539 1.272743 0.431100 00:11
2 1.222846 1.250300 0.443000 00:11
3 1.160470 1.243430 0.453700 00:11
4 1.115513 1.253342 0.449450 00:11
learn1.recorder.plot_loss()

Why adjust the loss and accuracy?

The first attempt, using CrossEntropyLossFlat and accuracy directly, fails with the following error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

This is usually caused by an indexing mismatch, for example training a network with 10 output nodes on a dataset with 15 labels. In this case it's better to restart the notebook and get a more accurate traceback by moving everything to the CPU.
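One way to do that without retraining anything (a sketch, assuming the dls, embs and CollabNN defined above) is to push a single batch through a fresh model on the CPU with the unadjusted loss:

cpu_model = CollabNN(*embs)         # a freshly built model stays on the CPU
xb, yb = to_cpu(dls.one_batch())    # one batch of (user, title) indices and raw ratings
preds = cpu_model(xb)               # shape: (bs, 5)
CrossEntropyLossFlat()(preds, yb)   # raises the readable IndexError below whenever the batch contains a 5-star rating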

On the CPU we indeed get an indexing error:

IndexError: Target 5 is out of bounds.

Changing the last layer to 6 outputs solves this issue, as discussed in this forum thread: https://forums.fast.ai/t/getting-runtime-error-for-cross-entropy-what-should-be-changed-and-why-it-is-coming/87726/12
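In a sketch with made-up tensors (not the real batch), that workaround looks like this: the head gets 6 output nodes, so a raw target of 5 becomes a valid class index, at the price of a class 0 that no rating ever uses.

preds6 = torch.randn(4, 6)                                   # 6 output nodes instead of 5
CrossEntropyLossFlat()(preds6, torch.tensor([1, 3, 5, 4]))   # no error: a raw target of 5 is now in bounds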

My preferred way, though, is to adjust the loss and accuracy functions instead. The ratings in the dataset range from 1 to 5 and are not encoded as categories by these dataloaders (dls.vocab raises an error), while the cross-entropy loss expects class indices from 0 to n-1, so a target of 5 is out of bounds.
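The off-by-one is easy to see in isolation, again with made-up tensors rather than the real batch:

preds  = torch.randn(4, 5)                   # 4 samples, 5 output nodes
rating = torch.tensor([1, 3, 5, 4])          # raw ratings as stored in the dataset
# CrossEntropyLossFlat()(preds, rating)      # IndexError: Target 5 is out of bounds
CrossEntropyLossFlat()(preds, rating - 1)    # fine: targets shifted to the range 0-4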

Option 2 - adjusting the dataloaders:

cat_names = ['user', 'title']
cont_names = []

procs = [Categorify, FillMissing, Normalize]

splits = RandomSplitter()(range_of(ratings))

to1 = TabularCollab(ratings, procs, cat_names, cont_names, y_names="rating", splits=splits, y_block=CategoryBlock)

dls1 = to1.dataloaders()
dls1.show_batch()
user title rating
0 109 Interview with the Vampire (1994) 3
1 640 Braveheart (1995) 4
2 523 Strawberry and Chocolate (Fresa y chocolate) (1993) 5
3 907 Clerks (1994) 4
4 933 Die Hard: With a Vengeance (1995) 1
5 496 Swingers (1996) 2
6 606 Mr. Holland's Opus (1995) 5
7 184 Amadeus (1984) 4
8 632 Empire Strikes Back, The (1980) 5
9 70 Bram Stoker's Dracula (1992) 4
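Because y_block=CategoryBlock treats the rating as a categorical target, the dataloaders encode the ratings 1-5 as class indices 0-4 behind the scenes, so the stock CrossEntropyLossFlat and accuracy can be used as they are. A quick sanity check (a sketch, assuming the dls1 defined above):

xb, yb = dls1.one_batch()   # xb: (user, title) category codes, yb: encoded ratings
yb.min(), yb.max()          # should show 0 and 4: ratings 1-5 mapped to classes 0-4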
embs = get_emb_sz(dls1)  # embedding sizes from the new dataloaders

# Same network as in Option 1: the head has 5 outputs, one score per rating class.
class CollabNN(Module):
    def __init__(self, user_sz, item_sz, n_act=30):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1]+item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 5))

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))
        return x

model2 = CollabNN(*embs)

learn2 = Learner(dls1, model2, loss_func=CrossEntropyLossFlat(), metrics=accuracy)
lrfinder(learn2)
Minimum/10:	4.79e-03
Steepest point:	2.29e-02
Longest valley:	1.74e-03
Slide interval:	6.31e-03
learn2.fit_one_cycle(5, 1.74e-03, wd=0.01)
epoch train_loss valid_loss accuracy time
0 1.272689 1.281745 0.427700 00:14
1 1.246751 1.252351 0.447700 00:11
2 1.208962 1.239627 0.447700 00:11
3 1.169290 1.240180 0.446000 00:11
4 1.145836 1.241863 0.446750 00:11
learn2.recorder.plot_loss()
interp = ClassificationInterpretation.from_learner(learn2)
interp.plot_confusion_matrix()

Summary

In this case both approaches reach essentially the same results, around 45% accuracy. That is better than random guessing among five rating classes, but it does not beat the original MSE-based model, so converting this into a classification problem with a cross-entropy loss does not look like a particularly good idea.
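For a more direct comparison with the regression baseline, one option (a sketch, not part of the notebook above) is to turn the five predicted probabilities into an expected rating and measure its squared error on the validation set:

preds, targs = learn2.get_preds()                          # preds: (n, 5) probabilities, targs: class ids 0-4
expected_rating = (preds * torch.arange(1, 6)).sum(dim=1)  # probability-weighted average of the ratings 1-5
true_rating = targs.flatten().float() + 1                  # back to the original 1-5 scale
mse = ((expected_rating - true_rating) ** 2).mean()
print(f"validation MSE of the expected rating: {mse:.3f}")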