Notes on Lagrange Interpolating Polynomials
Polynomial interpolation is a method of finding a polynomial function that fits a given set of data perfectly. More concretely, suppose we have a set of n+1 distinct points [1] :

(x_0, y_0), (x_1, y_1), \dots, (x_n, y_n)

And we want to find the polynomial coefficients \{a_0 \cdots a_n\} such that:

p(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n

Fits all our points; that is p(x_0)=y_0 , p(x_1)=y_1 etc. This post discusses a common approach to solving this problem, and also shows why such a polynomial exists and is unique.

When we assign all points (x_i, y_i) into the generic polynomial p(x) , we get:

\begin{aligned}
a_0 + a_1 x_0 + a_2 x_0^2 + \dots + a_n x_0^n &= y_0 \\
a_0 + a_1 x_1 + a_2 x_1^2 + \dots + a_n x_1^n &= y_1 \\
&\;\;\vdots \\
a_0 + a_1 x_n + a_2 x_n^2 + \dots + a_n x_n^n &= y_n
\end{aligned}

We want to solve for the coefficients a_i . This is a linear system of equations that can be represented by the following matrix equation:

\begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix}

The matrix on the left is called the Vandermonde matrix . This matrix is known to be invertible (see the Appendix for a proof); therefore, this system of equations has a single solution that can be calculated by inverting the matrix. In practice, however, the Vandermonde matrix is often numerically ill-conditioned, so inverting it isn't the best way to calculate exact polynomial coefficients. Several better methods exist.

Lagrange interpolation polynomials emerge from a simple, yet powerful idea. Let's define the Lagrange basis functions l_i(x) ( i \in [0, n] ) as follows, given our points (x_i, y_i) :

l_i(x_j) = \begin{cases} 1 & j = i \\ 0 & j \neq i \end{cases}

In words, l_i(x) is constrained to 1 at x_i and to 0 at all other x_j . We don't care about its value at any other point. The linear combination:

p(x) = \sum_{i=0}^{n} y_i l_i(x)

is then a valid interpolating polynomial for our set of n+1 points, because it's equal to y_i at each x_i (take a moment to convince yourself this is true).

How do we find l_i(x) ? The key insight comes from studying the following function:

l'_i(x) = \prod_{j \neq i} (x - x_j)

This function has terms (x-x_j) for all j\neq i . It should be easy to see that l'_i(x) is 0 at all x_j when j\neq i . What about its value at x_i , though? We can just assign x_i into l'_i(x) to get:

l'_i(x_i) = \prod_{j \neq i} (x_i - x_j)

And then normalize l'_i(x) , dividing it by this (constant) value. We get the Lagrange basis function l_i(x) :

l_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}

Let's use a concrete example to visualize this.
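The basis construction above translates almost directly into code. Here's a minimal Python sketch (the function names are my own, not part of this post):

```python
def lagrange_basis(xs, i, x):
    """Evaluate the i-th Lagrange basis function l_i(x) for the distinct
    abscissas xs: the product of (x - x_j)/(x_i - x_j) over all j != i."""
    result = 1.0
    for j, xj in enumerate(xs):
        if j != i:
            result *= (x - xj) / (xs[i] - xj)
    return result


def interpolate(xs, ys, x):
    """Evaluate the interpolating polynomial p(x) = sum_i y_i * l_i(x)."""
    return sum(yi * lagrange_basis(xs, i, x) for i, yi in enumerate(ys))
```

By construction, lagrange_basis(xs, i, x) is 1 when x is xs[i] and 0 at every other node, so interpolate passes through every sample point.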
Suppose we have the following set of points we want to interpolate: (1,4), (2,2), (3,3) . We can calculate l'_0(x) , l'_1(x) and l'_2(x) , and get the following:

\begin{aligned}
l'_0(x) &= (x-2)(x-3) \\
l'_1(x) &= (x-1)(x-3) \\
l'_2(x) &= (x-1)(x-2)
\end{aligned}

Note where each l'_i(x) intersects the x axis: these functions have the right values (zeros) at all x_{j\neq i} . If we normalize them to obtain l_i(x) , we get these functions:

\begin{aligned}
l_0(x) &= \frac{(x-2)(x-3)}{(1-2)(1-3)} = \frac{(x-2)(x-3)}{2} \\
l_1(x) &= \frac{(x-1)(x-3)}{(2-1)(2-3)} = -(x-1)(x-3) \\
l_2(x) &= \frac{(x-1)(x-2)}{(3-1)(3-2)} = \frac{(x-1)(x-2)}{2}
\end{aligned}

Note that each polynomial is 1 at the appropriate x_i and 0 at all the other x_{j\neq i} , as required. With these l_i(x) , we can now write down the interpolating polynomial p(x)=\sum_{i=0}^{n}y_i l_i(x) , which fits our set of input points:

p(x) = 4 l_0(x) + 2 l_1(x) + 3 l_2(x) = \frac{3x^2 - 13x + 18}{2}

We've just seen that the linear combination of Lagrange basis functions:

p(x) = \sum_{i=0}^{n} y_i l_i(x)

is a valid interpolating polynomial for a set of n+1 distinct points (x_i, y_i) . What is its degree? Since the degree of each l_i(x) is n , the degree of p(x) is at most n . We've just derived the first part of the Polynomial interpolation theorem :

Polynomial interpolation theorem : for any n+1 data points (x_0,y_0), (x_1, y_1)\cdots(x_n, y_n) \in \mathbb{R}^2 where no two x_j are the same, there exists a unique polynomial p(x) of degree at most n that interpolates these points.

We've demonstrated existence and degree, but not yet uniqueness . So let's turn to that. We know that p(x) interpolates all n+1 points, and its degree is \leq n . Suppose there's another such polynomial q(x) . Let's construct:

r(x) = p(x) - q(x)

What do we know about r(x) ? First of all, its value is 0 at all our x_i , so it has n+1 roots . Second, we also know that its degree is at most n (because it's the difference of two polynomials of such degree). These two facts are a contradiction: no non-zero polynomial of degree \leq n can have n+1 roots (a basic algebraic fact related to the Fundamental theorem of algebra ). So r(x) must be the zero polynomial; in other words, our p(x) is unique \blacksquare .

Note the implication of uniqueness here: given our set of n+1 distinct points, there's only one polynomial of degree \leq n that interpolates it.
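For the example points (1,4), (2,2), (3,3) , uniqueness can also be checked numerically: solving the Vandermonde system directly and evaluating the Lagrange construction must yield the same polynomial. A sketch using exact rational arithmetic from the standard library (all function names are mine):

```python
from fractions import Fraction


def solve_vandermonde(xs, ys):
    """Solve V a = y for the monomial coefficients [a_0, ..., a_n]
    by Gauss-Jordan elimination with partial pivoting (exact Fractions)."""
    n = len(xs)
    # Augmented matrix: each row is (1, x, x^2, ..., x^n | y).
    A = [[Fraction(x) ** k for k in range(n)] + [Fraction(y)]
         for x, y in zip(xs, ys)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [arv - f * acv for arv, acv in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def lagrange_eval(xs, ys, x):
    """Evaluate p(x) = sum_i y_i * prod_{j != i} (x - x_j)/(x_i - x_j)."""
    total = Fraction(0)
    for i, yi in enumerate(ys):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= Fraction(x - xj, xs[i] - xj)
        total += term
    return total
```

For the example points, solve_vandermonde([1, 2, 3], [4, 2, 3]) yields the coefficients 9, -13/2, 3/2 of p(x) = \frac{3x^2 - 13x + 18}{2} , and lagrange_eval agrees with this polynomial at every x — two different constructions, one polynomial, as the theorem promises.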
We can find its coefficients by inverting the Vandermonde matrix, by using Lagrange basis functions, or by any other method [2] .

The set P_n(\mathbb{R}) consists of all real polynomials of degree \leq n . This set - along with addition of polynomials and scalar multiplication - forms a vector space . We called l_i(x) the "Lagrange basis" previously, and they do - in fact - form an actual linear algebra basis for this vector space. To prove this claim, we need to show that the Lagrange polynomials are linearly independent and that they span the space.

Linear independence : we have to show that:

s(x) = \sum_{i=0}^{n} a_i l_i(x) = 0

implies a_i=0 \quad \forall i . Recall that l_i(x) is 1 at x_i , while all other l_j(x) are 0 at that point. Therefore, evaluating s(x) at x_0 , we get:

s(x_0) = a_0 l_0(x_0) = a_0 = 0

Similarly, we can show that a_i is 0, for all i \blacksquare .

Span : we've already demonstrated that the linear combination of l_i(x) :

p(x) = \sum_{i=0}^{n} y_i l_i(x)

is a valid interpolating polynomial for any set of n+1 distinct points. Using the polynomial interpolation theorem , this is the unique polynomial interpolating this set of points. In other words, for every q(x)\in P_n(\mathbb{R}) , we can identify any set of n+1 distinct points it passes through, and then use the technique described in this post to find the coefficients of q(x) in the Lagrange basis. Therefore, the set of l_i(x) spans the vector space \blacksquare .

Previously we've seen how to use the \{1, x, x^2, \dots x^n\} basis to write down a system of linear equations that helps us find the interpolating polynomial; this results in the Vandermonde matrix . Using the Lagrange basis, we can get a much nicer matrix representation of the interpolation equations. Recall that our general polynomial using the Lagrange basis is:

p(x) = \sum_{i=0}^{n} a_i l_i(x)

Let's build a system of equations for each of the n+1 points (x_i,y_i) . For p(x_0) :

p(x_0) = a_0 l_0(x_0) + a_1 l_1(x_0) + \dots + a_n l_n(x_0)

By definition of the Lagrange basis functions, all l_i(x_0) where i\neq 0 are 0, while l_0(x_0) is 1. So this simplifies to:

p(x_0) = a_0

But the value at node x_0 is y_0 , so we've just found that a_0=y_0 .
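The span argument has a concrete computational reading: the coordinates of any q(x)\in P_n(\mathbb{R}) in the Lagrange basis over nodes x_0 \dots x_n are simply its samples q(x_0) \dots q(x_n) . A small Python sketch of this idea (names are mine; q is assumed to be given as a callable):

```python
def lagrange_coords(q, xs):
    """Coordinates of the polynomial function q in the Lagrange basis
    over nodes xs: just the samples q(x_i)."""
    return [q(x) for x in xs]


def eval_in_lagrange_basis(coords, xs, x):
    """Evaluate sum_i coords[i] * l_i(x) at an arbitrary point x."""
    total = 0.0
    for i, ci in enumerate(coords):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xs[i] - xj)
        total += ci * li
    return total
```

For instance, with q(x) = x^2 - 1 and nodes 0, 1, 2, the coordinates are [-1, 0, 3], and by uniqueness, evaluating that linear combination reproduces q everywhere, not just at the nodes.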
We can produce similar equations for the other nodes as well: p(x_1)=a_1 , etc. In matrix form:

\begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}
=
\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{pmatrix}

We get the identity matrix; this is another way to trivially show that a_0=y_0 , a_1=y_1 and so on.

Appendix: the determinant of the Vandermonde matrix

Given some numbers \{x_0 \dots x_n\} , a matrix of this form:

V = \begin{pmatrix}
1 & x_0 & x_0^2 & \cdots & x_0^n \\
1 & x_1 & x_1^2 & \cdots & x_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^n
\end{pmatrix}

is called the Vandermonde matrix. What's special about a Vandermonde matrix is that we know it's invertible when the x_i are distinct. This is because its determinant is known to be non-zero . Moreover, its determinant is [3] :

\det(V) = \prod_{0 \leq i < j \leq n} (x_j - x_i)

Here's why. To get some intuition, let's consider some small-rank Vandermonde matrices. Starting with a 2-by-2:

\det \begin{pmatrix} 1 & x_0 \\ 1 & x_1 \end{pmatrix} = x_1 - x_0

Let's try 3-by-3 now:

\det \begin{pmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \end{pmatrix}

We can use the standard way of calculating determinants to expand from the first row:

= (x_1 x_2^2 - x_2 x_1^2) - x_0 (x_2^2 - x_1^2) + x_0^2 (x_2 - x_1)

Using some algebraic manipulation, it's easy to show this is equivalent to:

(x_1 - x_0)(x_2 - x_0)(x_2 - x_1)

For the full proof, let's look at the generalized n+1 -by- n+1 matrix again. Recall that subtracting a multiple of one column from another doesn't change a matrix's determinant. For each column k>1 , we'll subtract the value of column k-1 multiplied by x_0 from it (this is done on all columns simultaneously). The idea is to make the first row all zeros after the very first element:

V' = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & x_1 - x_0 & x_1^2 - x_0 x_1 & \cdots & x_1^n - x_0 x_1^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n - x_0 & x_n^2 - x_0 x_n & \cdots & x_n^n - x_0 x_n^{n-1}
\end{pmatrix}

Now we rewrite each entry to expose the common factor x_1-x_0 in the second row (after the first element), x_2-x_0 in the third row and so on, to get:

V' = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
1 & (x_1 - x_0) & (x_1 - x_0)\,x_1 & \cdots & (x_1 - x_0)\,x_1^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & (x_n - x_0) & (x_n - x_0)\,x_n & \cdots & (x_n - x_0)\,x_n^{n-1}
\end{pmatrix}

Imagine we erase the first row and first column of V' . We'll call the resulting matrix W . Because the first row of V' is all zeros except the first element, we have:

\det(V) = \det(V') = \det(W)

Note that the first row of W has a common factor of x_1-x_0 , so when calculating \det(W) , we can move this common factor out. Same for the common factor x_2-x_0 of the second row, and so on. Overall, we can write:

\det(W) = (x_1 - x_0)(x_2 - x_0) \cdots (x_n - x_0) \det \begin{pmatrix}
1 & x_1 & \cdots & x_1^{n-1} \\
1 & x_2 & \cdots & x_2^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \cdots & x_n^{n-1}
\end{pmatrix}

But the smaller matrix is just the Vandermonde matrix for \{x_1 \dots x_n\} . If we continue this process by induction, we'll get:

\det(V) = \prod_{0 \leq i < j \leq n} (x_j - x_i)

If you're interested, the Wikipedia page for the Vandermonde matrix has a couple of additional proofs.
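The product formula is easy to sanity-check numerically: compute \det(V) by brute force and compare it against \prod_{i<j}(x_j - x_i) . A short Python sketch with exact integer arithmetic (names are mine; cofactor expansion is fine for the tiny matrices used here):

```python
from itertools import combinations


def det(m):
    """Determinant via Laplace expansion along the first row
    (exponential time, but fine for small matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for col in range(n):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * m[0][col] * det(minor)
    return total


def vandermonde(xs):
    """Build the Vandermonde matrix: row i is (1, x_i, x_i^2, ..., x_i^n)."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]


def vandermonde_det_formula(xs):
    """The product formula: prod over i < j of (x_j - x_i)."""
    prod = 1
    for xi, xj in combinations(xs, 2):
        prod *= xj - xi
    return prod
```

With distinct nodes the product is non-zero, matching the invertibility claim; with a repeated node, one factor vanishes and both computations give 0.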