Credits: This post records what I learned in UIUC CS446 (Spring 2024).

Chinese version

Linear Regression

Simple Case

Linear regression is a simple method for solving regression problems, and it has a closed-form solution. Let me first illustrate how to derive this closed-form solution. Suppose we have a dataset \(\{(x^1,y^1),\ldots,(x^n,y^n)\}\), where \(x^i\in\mathbb{R}^d\) and \(y^i\in\mathbb{R}\). We want to find a linear function \(f(x)=w^Tx+b\) to fit the data.

To keep things simple, we first discuss the case \(d=1\), i.e., \(x^i\) is a real number. In this case, we can write the linear function as \(f(x)=w_1x+w_2\). What we want to find is the optimal \(w_1\) and \(w_2\) solving \(\underset{w_1,w_2}{\operatorname{argmin}} \frac{1}{2}\sum_{i=1}^n(y^i-w_1x^i-w_2)^2\). Translating the problem into matrix form, we have

\[\underset{w_1,w_2}{\operatorname{argmin}} \frac{1}{2} \left\| \begin{bmatrix} y^{1} \\ \vdots \\ y^{n} \end{bmatrix} - \begin{bmatrix} x^{1} & 1 \\ \vdots & \vdots \\ x^{n} & 1 \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} \right\|^2_2\]

Then, we denote the three matrices above as \(X^T\), \(Y\), and \(w\) respectively, with \(X^T\in \mathbb{R}^{n\times 2}\), \(Y\in \mathbb{R}^{n}\), \(w\in \mathbb{R}^{2}\). The problem becomes \(\underset{w}{\operatorname{argmin}} \frac{1}{2}\|Y-X^Tw\|^2_2\). To solve it, we take the derivative of the objective with respect to \(w\) and set it to 0. With a bit of matrix calculus, we have \(L = \frac{1}{2}(Y-X^Tw)^T(Y-X^Tw)= \frac{1}{2}(Y^TY-Y^TX^Tw-w^TXY +w^TXX^Tw)\)

\[\frac{\partial L}{\partial w} = \frac{1}{2}(-XY-XY+2XX^Tw) = 0\] \[w = (XX^T)^{-1}XY\]
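As a sanity check, here is a minimal NumPy sketch of this closed-form solution for the \(d=1\) case. The data arrays `x` and `y` below are made up purely for illustration.

```python
import numpy as np

# Toy 1-D data (made up for illustration): y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2 * x + 1 + 0.1 * rng.standard_normal(50)

# Design matrix X^T in R^{n x 2}: each row is [x^i, 1]
XT = np.stack([x, np.ones_like(x)], axis=1)
X = XT.T

# Closed-form solution w = (X X^T)^{-1} X Y
w = np.linalg.solve(X @ XT, X @ y)
print(w)  # approximately [2.0, 1.0], i.e., [w_1, w_2]
```

Using `np.linalg.solve` on the normal equations is numerically preferable to forming the inverse explicitly, but it computes exactly the same \(w\) as the formula above.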

General Case

Let's go back to the general case. We can also handle higher-order polynomial regression, where the input \(x^i\) is mapped to a feature vector rather than used as a single real number. Suppose we have a dataset \(\{(x^1,y^1),\ldots,(x^n,y^n)\}\) with \(x^i, y^i\in\mathbb{R}\), and we want to fit a polynomial \(f(x)=w_0+w_1x+w_2x^2+\cdots+w_dx^d\). The problem becomes

\[\underset{w_0, w_1, \ldots, w_d}{\operatorname{argmin}} \frac{1}{2} \left\| \begin{bmatrix} y^{1} \\ \vdots \\ y^{n} \end{bmatrix} - \begin{bmatrix} (x^{1})^d & \cdots & x^{1} & 1 \\ \vdots & \ddots & \vdots & \vdots \\ (x^{n})^d & \cdots & x^{n} & 1 \end{bmatrix} \cdot \begin{bmatrix} w_d \\ \vdots \\ w_1 \\ w_0 \end{bmatrix} \right\|^2_2\]

Then, \(X^T \in \mathbb{R}^{n\times (d+1)}\), \(Y \in \mathbb{R}^{n}\), \(w \in \mathbb{R}^{d+1}\). The problem is again \(\underset{w}{\operatorname{argmin}} \frac{1}{2}\|Y-X^Tw\|^2_2\). Similarly, we take the derivative of the objective with respect to \(w\), set it to 0, and get the same closed-form solution as above: \(w = (XX^T)^{-1}XY\)
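A minimal sketch of the same recipe for degree-\(d\) polynomial regression, again with made-up data. The design matrix simply stacks the rows \([(x^i)^d, \ldots, x^i, 1]\).

```python
import numpy as np

def fit_polynomial(x, y, d):
    """Closed-form least-squares fit of a degree-d polynomial.

    Returns weights ordered as [w_d, ..., w_1, w_0], matching a
    design matrix whose rows are [(x^i)^d, ..., x^i, 1].
    """
    # X^T in R^{n x (d+1)}: column k holds (x^i)^(d-k)
    XT = np.stack([x ** p for p in range(d, -1, -1)], axis=1)
    X = XT.T
    # w = (X X^T)^{-1} X Y
    return np.linalg.solve(X @ XT, X @ y)

# Made-up example: recover the cubic y = x^3 - 2x + 3
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=100)
y = x ** 3 - 2 * x + 3 + 0.05 * rng.standard_normal(100)
print(fit_polynomial(x, y, d=3))  # roughly [1, 0, -2, 3]
```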

Regularization

In practice, we may encounter the case \(n<d+1\). Then the \((d+1)\times(d+1)\) matrix \(XX^T\) has rank at most \(n\) and is not invertible. To fix this, we can add a regularization term to the objective, which becomes \(\underset{w}{\operatorname{argmin}} \frac{1}{2}\|Y-X^Tw\|^2_2+\frac{\lambda}{2}\|w\|^2_2\). The closed-form solution becomes \(w = (XX^T+\lambda I)^{-1}XY\). Regularization also shrinks the parameters, which helps avoid overfitting.
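A minimal sketch of the regularized (ridge) solution, reusing the polynomial design matrix from the sketch above; the value of \(\lambda\) here is made up for illustration.

```python
import numpy as np

def fit_ridge(XT, y, lam):
    """Closed-form ridge regression: w = (X X^T + lam * I)^{-1} X Y.

    XT is the n x (d+1) design matrix (rows are feature vectors),
    so X X^T + lam * I is (d+1) x (d+1) and invertible for lam > 0.
    """
    X = XT.T
    return np.linalg.solve(X @ XT + lam * np.eye(XT.shape[1]), X @ y)

# Example with fewer samples (n = 5) than parameters (d + 1 = 10),
# where the unregularized normal equations would be singular.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=5)
XT = np.stack([x ** p for p in range(9, -1, -1)], axis=1)
y = 2 * x + 1 + 0.01 * rng.standard_normal(5)
print(fit_ridge(XT, y, lam=0.1))
```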