### User and Items rating matrix

Suppose we have a matrix $r \in \mathbb{R}^{m \times n}$ in which each row corresponds to a single user (i.e. a person) and each column corresponds to a single item (i.e. a movie), so that $r_{ui}$ is the rating user $u$ gave item $i$. This matrix will generally be constructed from all the user and rating information available, and clearly not every user will have rated every item. In fact, when we use the MovieLens dataset we'll see that only about 6% of the entries of $r$ are valid - the rest are missing, and these are precisely the ratings we would like our recommender system to predict once it is built.
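As a toy illustration (the ratings below are made up and the real MovieLens matrix is far larger and far sparser), we can represent $r$ with missing entries as `NaN` and measure its density directly:

```python
import numpy as np

# Toy user-item rating matrix: rows are users, columns are items.
# np.nan marks a missing rating we would like to predict.
r = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 4.0, np.nan, np.nan],
    [1.0, np.nan, np.nan, 2.0],
])

observed = ~np.isnan(r)            # boolean mask of valid entries
density = observed.sum() / r.size  # fraction of entries that are rated
print(f"{density:.0%} of entries are observed")  # → 42% of entries are observed
```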

The sparsity of the matrix $r$ poses an issue for the SVD approach covered previously, as SVD aims to reconstruct a matrix as-is: it treats every entry of $r$ as valid in the reconstruction process. So we need to make a slight tweak to our approach to building the recommender system. We still want a latent variable representation, but we only want to use the roughly 6% of ratings that are actually populated.

### Latent space representation

In order to define the latent space representations, we make the following assumptions:

- Each user can be described by a $k$ dimensional latent vector $\textbf{x}_u = \left[ x_{u1} \dots x_{uk} \right] \in \mathbb{R}^k$
- Each item can be described by a $k$ dimensional latent vector $\textbf{y}_i = \left[ y_{i1} \dots y_{ik} \right]\in \mathbb{R}^k$
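Under these assumptions the predicted rating is simply the dot product of the two latent vectors. A minimal sketch, with $k = 3$ and arbitrary illustrative values:

```python
import numpy as np

k = 3                              # dimension of the latent space
x_u = np.array([0.5, -1.2, 0.8])   # latent vector for user u
y_i = np.array([1.0, 0.3, -0.4])   # latent vector for item i

# Predicted rating of item i by user u: x_u^T . y_i
predicted_rating = x_u @ y_i       # ≈ -0.18
```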

Hence we can define the loss function (with some regularisation) as the sum of squared differences between the predicted and actual ratings, over every rating a user gave an item:

$$ L = \sum_{(u,i) \in S}\left( r_{ui} - \textbf{x}_{u}^{T} \cdot \textbf{y}_{i}\right)^2 + \lambda_x \sum_{u} ||\textbf{x}_u||^2 + \lambda_y \sum_{i} ||\textbf{y}_i||^2$$

where $S$ is the set of valid ratings and $\lambda_x$, $\lambda_y$ are regularisation hyperparameters. We can rewrite this a little more succinctly using matrix notation:

$$ L = \left\| \textbf{r}_u - \textbf{x}^T_uY^T \right\|^2 + \lambda_x \sum_{u} ||\textbf{x}_u||^2 + \lambda_y \sum_{i} ||\textbf{y}_i||^2 $$

where

$$\textbf{r}_u = \left[ r_{u1} \dots r_{un} \right]$$

is the vector of user $u$'s ratings of the $n$ items and

$$Y = \left[ \begin{array}{ccc} - & \textbf{y}^{T}_{1} & - \\ - & \textbf{y}^{T}_{2} & - \\ & \vdots & \\ - & \textbf{y}^{T}_{n} & - \end{array} \right]$$

is the matrix whose rows are the $\textbf{y}_i$'s. Now this equation looks quite familiar (recall the definition of the OLS loss function...): if we fix one of $\textbf{x}_u$ or $\textbf{y}_{i}$, then we have effectively reduced the cost function to that of an OLS problem. The procedure is simple: we fix the $\textbf{y}_i$ and find the $\textbf{x}_u$ which minimise the loss function; this then feeds into the next iteration, where we fix the $\textbf{x}_u$ and find the $\textbf{y}_i$ which minimise the updated loss function, and we continue until convergence (or some other appropriate stopping criterion). For obvious reasons this process is known as *Alternating Least Squares*, and we'll derive the closed form solution in the next section.

### Alternating Least Squares

We can then minimise $L$ by differentiating with respect to the components of our user vector $\textbf{x}_u$ and setting the result to $0$. Using summation notation:

$$ \frac{\partial L}{\partial x_{ua}} = \sum_{i}^{n} -2 \left(r_{ui} - \sum_{j}^k x_{uj} Y_{ij} \right)Y_{ia} + 2 \lambda_x x_{ua}$$ Hence, for a minimum, in matrix form $$ 0 = -\left( \textbf{r}_u - \textbf{x}^T_uY^T \right) Y + \lambda_x \textbf{x}_{u}^T$$ which we can rearrange to get $$ \textbf{x}_{u}^T = \textbf{r}_u Y \left( \lambda_x I + Y^TY\right)^{-1}$$ An analogous calculation for $\textbf{y}_i$ yields $$ \textbf{y}_i^T = \textbf{r}_i X \left( \lambda_y I + X^TX \right)^{-1} $$ where $I$ is the $k \times k$ identity matrix and $\textbf{r}_i$ is the vector of ratings given to item $i$. Hence we have derived the updates for our parameters required at each iteration - the next section looks at the results.
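These update formulae translate almost directly into code. A minimal sketch of one ALS sweep (the function and variable names are my own, and the sums are restricted to the observed ratings $S$ via a boolean mask, which the derivation implicitly assumes):

```python
import numpy as np

def als_sweep(r, observed, X, Y, lam_x, lam_y):
    """One alternating least squares pass: update every x_u, then every y_i.

    r        : (m, n) rating matrix (missing entries may hold anything)
    observed : (m, n) boolean mask of valid ratings, the set S
    X        : (m, k) matrix whose rows are the user vectors x_u
    Y        : (n, k) matrix whose rows are the item vectors y_i
    """
    k = X.shape[1]
    I = np.eye(k)

    # x_u^T = r_u Y (lambda_x I + Y^T Y)^{-1}, restricted to items u rated
    for u in range(r.shape[0]):
        mask = observed[u]
        Yu = Y[mask]                    # rows y_i^T for items rated by u
        X[u] = np.linalg.solve(lam_x * I + Yu.T @ Yu, Yu.T @ r[u, mask])

    # y_i^T = r_i X (lambda_y I + X^T X)^{-1}, restricted to users who rated i
    for i in range(r.shape[1]):
        mask = observed[:, i]
        Xi = X[mask]                    # rows x_u^T for users who rated i
        Y[i] = np.linalg.solve(lam_y * I + Xi.T @ Xi, Xi.T @ r[mask, i])

    return X, Y
```

Each solve is only a $k \times k$ linear system, so a full sweep costs time linear in the number of observed ratings plus $O((m+n)k^3)$ for the solves; in practice we repeat sweeps until the loss stops decreasing.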

Next time we'll have a look at the Python implementation and results to see how well our recommendation system works!