{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Orthogonal Projections\n",
"\n",
"We will write functions that will implement orthogonal projections."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning objectives\n",
"\n",
"1. Write code that projects data onto lower-dimensional subspaces.\n",
"2. Understand the real world applications of projections."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# PACKAGE: DO NOT EDIT\n",
"import matplotlib\n",
"matplotlib.use('Agg')\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_olivetti_faces, fetch_lfw_people\n",
"from ipywidgets import interact\n",
"%matplotlib inline\n",
"image_shape = (64, 64)\n",
"# Load faces data\n",
"dataset = fetch_olivetti_faces()\n",
"faces = dataset.data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Advice for testing numerical algorithms\n",
"\n",
"Testing machine learning algorithms (or numerical algorithms in general)\n",
"is sometimes really hard as it depends on the dataset\n",
"to produce an answer, and you will never be able to test your algorithm on all the datasets\n",
"we have in the world. Nevertheless, we have some tips for you to help you identify bugs in\n",
"your implementations.\n",
"\n",
"#### 1. Test on small dataset\n",
"Test your algorithms on small dataset: datasets of size 1 or 2 sometimes will suffice. This\n",
"is useful because you can (if necessary) compute the answers by hand and compare them with\n",
"the answers produced by the computer program you wrote. In fact, these small datasets can even have special numbers,\n",
"which will allow you to compute the answers by hand easily.\n",
"\n",
"#### 2. Find invariants\n",
"Invariants refer to properties of your algorithm and functions that are maintained regardless\n",
"of the input. We will highlight this point later in this notebook where you will see functions,\n",
"which will check invariants for some of the answers you produce.\n",
"\n",
"Invariants you may want to look for:\n",
"1. Does your algorithm always produce a positive/negative answer, or a positive definite matrix?\n",
"2. If the algorithm is iterative, do the intermediate results increase/decrease monotonically?\n",
"3. Does your solution relate with your input in some interesting way, e.g. orthogonality? \n",
"\n",
"Finding invariants is hard, and sometimes there simply isn't any invariant. However, DO take advantage of them if you can find them. They are the most powerful checks when you have them."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import numpy.testing as np_test\n",
"def test_property_projection_matrix(P):\n",
" \"\"\"Test if the projection matrix satisfies certain properties.\n",
" In particular, we should have P * P = P, and P = P^T\n",
" \"\"\"\n",
" np_test.assert_almost_equal(P, P @ P)\n",
" np_test.assert_almost_equal(P, P.T)\n",
"\n",
"def test_property_projection(x, p):\n",
" \"\"\"Test orthogonality of x and its projection p.\"\"\"\n",
" np_test.assert_almost_equal(p.T @ (p-x), 0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Orthogonal Projections"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Recall that for projection of a vector $\\boldsymbol x$ onto a 1-dimensional subspace $U$ with basis vector $\\boldsymbol b$ we have\n",
"\n",
"$${\\pi_U}(\\boldsymbol x) = \\frac{\\boldsymbol b\\boldsymbol b^T}{{\\lVert\\boldsymbol b \\rVert}^2}\\boldsymbol x $$\n",
"\n",
"And for the general projection onto an M-dimensional subspace $U$ with basis vectors $\\boldsymbol b_1,\\dotsc, \\boldsymbol b_M$ we have\n",
"\n",
"$${\\pi_U}(\\boldsymbol x) = \\boldsymbol B(\\boldsymbol B^T\\boldsymbol B)^{-1}\\boldsymbol B^T\\boldsymbol x $$\n",
"\n",
"where \n",
"\n",
"$$B = [\\boldsymbol b_1,...,\\boldsymbol b_M]$$\n",
"\n",
"\n",
"Your task is to implement orthogonal projections. We can split this into two steps\n",
"1. Find the projection matrix $\\boldsymbol P$ that projects any $\\boldsymbol x$ onto $U$.\n",
"2. The projected vector $\\pi_U(\\boldsymbol x)$ of $\\boldsymbol x$ can then be written as $\\pi_U(\\boldsymbol x) = \\boldsymbol P\\boldsymbol x$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Projection 1d\n",
"\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def projection_matrix_1d(b):\n",
" \"\"\"Compute the projection matrix onto the space spanned by `b`\n",
" Args:\n",
" b: ndarray of dimension (D, 1), the basis for the subspace\n",
" \n",
" Returns:\n",
" P: the projection matrix\n",
" \"\"\"\n",
" D, _ = b.shape\n",
" P = np.eye(D) # <-- EDIT THIS\n",
" return P\n",
"\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def project_1d(x, b):\n",
" \"\"\"Compute the projection matrix onto the space spanned by `b`\n",
" Args:\n",
" x: the vector to be projected\n",
" b: ndarray of dimension (D, 1), the basis for the subspace\n",
" \n",
" Returns:\n",
" y: projection of x in space spanned by b\n",
" \"\"\"\n",
" p = np.zeros(3) # <-- EDIT THIS\n",
" return p\n",
"\n",
"# Projection onto a general (higher-dimensional) subspace\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def projection_matrix_general(B):\n",
" \"\"\"Compute the projection matrix onto the space spanned by the columns of `B`\n",
" Args:\n",
" B: ndarray of dimension (D, E), the basis for the subspace\n",
" \n",
" Returns:\n",
" P: the projection matrix\n",
" \"\"\"\n",
" P = np.eye(B.shape[0]) # <-- EDIT THIS\n",
" return P\n",
"\n",
"# ===YOU SHOULD EDIT THIS FUNCTION===\n",
"def project_general(x, B):\n",
" \"\"\"Compute the projection matrix onto the space spanned by the columns of `B`\n",
" Args:\n",
" x: ndarray of dimension (D, 1), the vector to be projected\n",
" B: ndarray of dimension (D, E), the basis for the subspace\n",
" \n",
" Returns:\n",
" p: projection of x onto the subspac spanned by the columns of B; size (D, 1)\n",
" \"\"\"\n",
" p = np.zeros(x.shape) # <-- EDIT THIS\n",
" return p"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have included some unittest for you to test your implementation."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Orthogonal projection in 2d\n",
"# define basis vector for subspace\n",
"b = np.array([2,1]).reshape(-1,1)\n",
"# point to be projected later\n",
"x = np.array([1,2]).reshape(-1, 1)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'np_test' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;31m# Test 1D\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m np_test.assert_almost_equal(projection_matrix_1d(np.array([1, 2, 2]).reshape(-1,1)), \n\u001b[0m\u001b[1;32m 3\u001b[0m np.array([[1, 2, 2],\n\u001b[1;32m 4\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m [2, 4, 4]]) / 9)\n",
"\u001b[0;31mNameError\u001b[0m: name 'np_test' is not defined"
]
}
],
"source": [
"# Test 1D\n",
"np_test.assert_almost_equal(projection_matrix_1d(np.array([1, 2, 2]).reshape(-1,1)), \n",
" np.array([[1, 2, 2],\n",
" [2, 4, 4],\n",
" [2, 4, 4]]) / 9)\n",
"\n",
"np_test.assert_almost_equal(project_1d(np.ones(3),\n",
" np.array([1, 2, 2]).reshape(-1,1)),\n",
" np.array([5, 10, 10]) / 9)\n",
"\n",
"B = np.array([[1, 0],\n",
" [1, 1],\n",
" [1, 2]])\n",
"\n",
"# Test General\n",
"np_test.assert_almost_equal(projection_matrix_general(B), \n",
" np.array([[5, 2, -1],\n",
" [2, 2, 2],\n",
" [-1, 2, 5]]) / 6)\n",
"\n",
"np_test.assert_almost_equal(project_general(np.array([6, 0, 0]).reshape(-1,1), B), \n",
" np.array([5, 2, -1]).reshape(-1,1))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Write your own test cases here, use random inputs, utilize the invariants we have!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Eigenfaces (optional)\n",
"\n",
"Next, we will take a look at what happens if we project some dataset consisting of human faces onto some basis we call\n",
"the \"eigenfaces\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.datasets import fetch_olivetti_faces, fetch_lfw_people\n",
"from ipywidgets import interact\n",
"%matplotlib inline\n",
"image_shape = (64, 64)\n",
"# Load faces data\n",
"dataset = fetch_olivetti_faces()\n",
"faces = dataset.data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"mean = faces.mean(axis=0)\n",
"std = faces.std(axis=0)\n",
"faces_normalized = (faces - mean) / std"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The data for the basis has been saved in a file named `eigenfaces.py`, first we load it into the variable B."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"B = np.load('eigenfaces.npy')[:50] # we use the first 50 dimensions of the basis, you should play around with this.\n",
"print(\"the eigenfaces have shape {}\".format(B.shape))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Along the first dimension of $\\boldsymbol B$, each instance is a `64x64' image, an \"eigenface\", which we determined using an algorithm called Principal Component Analysis. Let's visualize \n",
"a few of those \"eigenfaces\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"plt.figure(figsize=(10,10))\n",
"plt.imshow(np.hstack(B[:5]), cmap='gray');"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a look at what happens if we project our faces onto the basis $\\boldsymbol B$ spanned by these 50 \"eigenfaces\". In order to do this, we need to reshape $\\boldsymbol B$ from above, which is of size (50, 64, 64), into the same shape as the matrix representing the basis as we have done earlier, which is of size (4096, 50). Then we can reuse the functions we implemented earlier to compute the projection matrix and the projection. Complete the code below to visualize the reconstructed faces that lie on the subspace spanned by the \"eigenfaces\"."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# EDIT THIS FUNCTION\n",
"@interact(i=(0, 10))\n",
"def show_face_face_reconstruction(i):\n",
" original_face = faces_normalized[i].reshape(64, 64)\n",
" B_basis = ??? # <-- EDIT THIS\n",
" face_reconstruction = project_general(faces_normalized[i], B_basis).reshape(64, 64)\n",
" plt.figure()\n",
" plt.imshow(np.hstack([original_face, face_reconstruction]), cmap='gray')\n",
" plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What would happen to the reconstruction as we increase the dimension of our basis? \n",
"\n",
"Modify the code above to visualize it."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Least squares for predicting Boston housing prices (optional)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider the case where we have a linear model for predicting housing prices. We are predicting the housing prices based on features in the \n",
"housing dataset. If we denote the features as $\\boldsymbol x_0, \\dotsc, \\boldsymbol x_n$ and collect them into a vector $\\boldsymbol {x}$, and the price of the houses as $y$. Assuming that we have \n",
"a prediction model in the way such that $\\hat{y}_i = f(\\boldsymbol {x}_i) = \\boldsymbol \\theta^T\\boldsymbol {x}_i$.\n",
"\n",
"If we collect the dataset into a data matrix $\\boldsymbol X$, we can write down our model like this:\n",
"\n",
"$$\n",
"\\begin{bmatrix} \n",
"\\boldsymbol{x}_1^T \\\\\n",
"\\vdots \\\\ \n",
"\\boldsymbol{x}_2^T \n",
"\\end{bmatrix} \\boldsymbol{\\theta} = \\begin{bmatrix} \n",
"y_1 \\\\\n",
"\\vdots \\\\ \n",
"y_2 \n",
"\\end{bmatrix}.\n",
"$$\n",
"\n",
"That is,\n",
"\n",
"$$\n",
"\\boldsymbol X\\boldsymbol{\\theta} = \\boldsymbol{y}.\n",
"$$\n",
"\n",
"Our goal is to find the best $\\boldsymbol\\theta$ such that we minimize the following objective (least square).\n",
"\n",
"$$\n",
"\\begin{eqnarray} \n",
"& \\sum^n_{i=1}{\\lVert \\bar{y_i} - y_i \\rVert^2} \\\\\n",
"&= \\sum^n_{i=1}{\\lVert \\boldsymbol \\theta^T\\boldsymbol{x}_i - y_i \\rVert^2} \\\\\n",
"&= (\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y)^T(\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y).\n",
"\\end{eqnarray}\n",
"$$\n",
"\n",
"If we set the gradient of the above objective to $\\boldsymbol 0$, we have\n",
"$$\n",
"\\begin{eqnarray} \n",
"\\nabla_\\theta(\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y)^T(\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y) &=& \\boldsymbol 0 \\\\\n",
"\\nabla_\\theta(\\boldsymbol {\\theta}^T\\boldsymbol X^T - \\boldsymbol y^T)(\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y) &=& \\boldsymbol 0 \\\\\n",
"\\nabla_\\theta(\\boldsymbol {\\theta}^T\\boldsymbol X^T\\boldsymbol X\\boldsymbol {\\theta} - \\boldsymbol y^T\\boldsymbol X\\boldsymbol \\theta - \\boldsymbol \\theta^T\\boldsymbol X^T\\boldsymbol y + \\boldsymbol y^T\\boldsymbol y ) &=& \\boldsymbol 0 \\\\\n",
"2\\boldsymbol X^T\\boldsymbol X\\theta - 2\\boldsymbol X^T\\boldsymbol y &=& \\boldsymbol 0 \\\\\n",
"\\boldsymbol X^T\\boldsymbol X\\boldsymbol \\theta &=& \\boldsymbol X^T\\boldsymbol y.\n",
"\\end{eqnarray}\n",
"$$\n",
"\n",
"The solution that gives zero gradient solves the following equation:\n",
"\n",
"$$\\boldsymbol X^T\\boldsymbol X\\boldsymbol \\theta = \\boldsymbol X^T\\boldsymbol y.$$\n",
"\n",
"If you recall from the lecture on projection onto an $n$-dimensional subspace, this is exactly the same as the normal equation we have for projection (take a look at the notes [here](https://www.coursera.org/teach/mathematics-machine-learning-pca/content/edit/supplement/fQq8T/content) if you don't remember them).\n",
"\n",
"This means that if we solve for $\\boldsymbol \\theta = (\\boldsymbol X^T\\boldsymbol X)^{-1}\\boldsymbol X^T\\boldsymbol y$ we would find the best $\\boldsymbol \\theta$, i.e. the $\\boldsymbol \\theta$, which minimizes our objective.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's put things into perspective. Consider that we want to predict the true coefficient $\\boldsymbol \\theta$ \n",
"of the line $\\boldsymbol y = \\boldsymbol \\theta^T \\boldsymbol x$ given only $\\boldsymbol X$ and $\\boldsymbol y$.\n",
"\n",
"Note: In this particular example, $\\boldsymbol \\theta$ is a number. Still, we can represent it as an $\\mathbb{R}^1$ vector."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"x = np.linspace(0, 10, num=50)\n",
"\n",
"random = np.random.RandomState(42) # we use the same random seed so we get deterministic output\n",
"theta = random.randn() # we use a random theta, our goal is to perform linear regression which finds theta_hat that minimizes the objective\n",
"y = theta * x + random.rand(len(x)) # our theta is corrupted by some noise, so that we do not get (x,y) on a line\n",
"\n",
"plt.scatter(x, y);\n",
"plt.xlabel('x');\n",
"plt.ylabel('y');"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"X = x.reshape(-1,1)\n",
"Y = y.reshape(-1,1)\n",
"\n",
"theta_hat = np.linalg.solve(X.T @ X, \n",
" X.T @ Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can show how our $\\hat{\\boldsymbol \\theta}$ fits the line."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"fig, ax = plt.subplots()\n",
"ax.scatter(x, y);\n",
"xx = [0, 10]\n",
"yy = [0, 10 * theta_hat[0,0]]\n",
"ax.plot(xx, yy, 'red', alpha=.5);\n",
"ax.set(xlabel='x', ylabel='y');\n",
"print(\"theta = %f\" % theta)\n",
"print(\"theta_hat = %f\" % theta_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What would happend to $\\lVert \\hat{\\boldsymbol \\theta} - \\boldsymbol \\theta \\rVert$ if we increase the number of datapoints?\n",
"\n",
"Make your hypothesis, and write a small program to confirm it!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"N = np.arange(10, 10000, step=10)\n",
"# Your code here which calculates \\hat{\\theta} for different sample sizes.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We see how we can find the best $\\boldsymbol \\theta$. In fact, we can extend our methodology to higher-dimensional dataset. Let's now try applying the same methodology to the Boston housing prices dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from sklearn.datasets import load_boston\n",
"boston = load_boston()\n",
"boston_X, boston_y = boston.data, boston.target\n",
"print(\"The housing dataset has size {}\".format(boston_X.shape))\n",
"print(\"The prices has size {}\".format(boston_X.shape))\n",
"boston_theta_hat = np.zeros(3) ## EDIT THIS to predict boston_theta_hat"
]
}
],
"metadata": {
"coursera": {
"course_slug": "mathematics-machine-learning-pca",
"graded_item_id": "5xKMs",
"launcher_item_id": "Wu0av"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}