In our last few posts we tackled the question of finding a global minimum for a function f(x)=c^Tx+\frac{1}{2} x^TCx. The heart of the algorithm was finding an orthonormal basis for \mathbb R^n with respect to the inner product \langle x,y\rangle =x^TCy.
Now suppose we have a set of equality constraints Ax=b. This defines an affine subspace of \mathbb R^n, and if we can find an orthonormal basis of the corresponding subspace through the origin, Ax=0, then we can use the same algorithm as before.
There is a neat trick for doing this. First, assume that the rows a_1,\ldots,a_m of A are linearly independent. (We will remove that restriction later.) Now add extra rows to A so that its rows form a basis of \mathbb R^n, to get a matrix we call D.
Claim: Columns m+1,\ldots,n of D^{-1} form a basis for the subspace Ax=0.
Proof:
Suppose Ax=0. Then the first m coordinates of Dx vanish, so Dx=(0,\ldots,0,*,\ldots,*)^T. Applying D^{-1} to both sides, we have x=D^{-1}(0,\ldots,0,*,\ldots,*)^T, which lies in the span of the last n-m columns of D^{-1}. Conversely, those columns are linearly independent (being columns of an invertible matrix), and the subspace Ax=0 has dimension n-m, so they form a basis. \Box
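To make the claim concrete, here is a small numerical check (a NumPy sketch; the matrices A and the extra rows are randomly generated illustrations, not taken from any particular problem):

```python
# Numerical check of the claim: the last n-m columns of D^{-1} satisfy Ax = 0.
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 2

A = rng.standard_normal((m, n))          # m independent constraint rows
extra = rng.standard_normal((n - m, n))  # extra rows completing a basis
D = np.vstack([A, extra])                # invertible for generic random rows

Dinv = np.linalg.inv(D)
V = Dinv[:, m:]                          # last n - m columns of D^{-1}

# A @ Dinv equals the first m rows of D @ Dinv = I, i.e. [I_m | 0],
# so the last n - m columns of D^{-1} lie in the nullspace of A.
print(np.allclose(A @ V, 0))             # True
```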
We proceed by applying Gram-Schmidt orthogonalization, with respect to the inner product \langle x,y\rangle =x^TCy, to the last n-m columns. To begin, normalize the (m+1)st column and subtract off projections onto this column from all other columns, including the first m. These column operations on D^{-1} correspond to row operations on D that add multiples of the other rows to the (m+1)st row (see below for a simple explanation of this phenomenon). In particular the first m rows of D, namely the rows of A, are unchanged, so by the above claim the last n-m columns of the new D^{-1} still span the subspace Ax=0, and the last n-m-1 of them are now orthogonal to the (m+1)st column.
Repeating this process with the remaining columns as we move to the right, we get a collection of orthonormal vectors spanning our subspace.
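The orthogonalization pass just described can be sketched as follows (a minimal NumPy illustration; the function name c_gram_schmidt and the test matrices are my own, and this is plain modified Gram-Schmidt in the C inner product rather than an optimized implementation):

```python
# Gram-Schmidt on the last n-m columns of D^{-1} under <x,y> = x^T C y,
# subtracting projections from *all* other columns, including the first m.
import numpy as np

def c_gram_schmidt(Dinv, C, m):
    """Orthonormalize columns m..n-1 of Dinv with respect to <x,y> = x^T C y."""
    Q = Dinv.copy()
    n = Q.shape[1]
    for j in range(m, n):
        q = Q[:, j]
        q = q / np.sqrt(q @ C @ q)        # normalize in the C inner product
        Q[:, j] = q
        for k in range(n):                # project q out of every other column
            if k != j:
                Q[:, k] -= (q @ C @ Q[:, k]) * q
    return Q

rng = np.random.default_rng(1)
n, m = 5, 2
M = rng.standard_normal((n, n))
C = M @ M.T + n * np.eye(n)               # symmetric positive definite
A = rng.standard_normal((m, n))
D = np.vstack([A, rng.standard_normal((n - m, n))])

Q = c_gram_schmidt(np.linalg.inv(D), C, m)
V = Q[:, m:]
print(np.allclose(V.T @ C @ V, np.eye(n - m)))   # C-orthonormal: True
print(np.allclose(A @ V, 0))                     # still spans Ax = 0: True
```

Since the column operations only ever add nullspace vectors to other columns, the last n-m columns stay inside the subspace Ax=0 throughout.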
Thus we can use the algorithm developed in the unconstrained case with very little modification, though the initialization requires some extra work.
- First, you need to be able to identify a point x_0 satisfying the constraints: Ax_0=b.
- Second, you need to add rows to A to get an invertible matrix D, and start the algorithm off with GS=D^{-1} rather than just the identity.
- Finally, make sure to apply the matrix update rules to the first m columns at each step, not just to the columns to the right of the currently active one.
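The first two initialization steps might be sketched like this (the helper name initialize and the choice of least squares and SVD are my own illustrative choices; any method of finding a feasible point and completing the rows of A to a basis would do):

```python
# Sketch of the initialization: a feasible x0 with A x0 = b, and an
# invertible D whose first rows are the rows of A.
import numpy as np

def initialize(A, b):
    m, n = A.shape
    # For full-row-rank A, lstsq returns an exact (minimum-norm) solution.
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]
    # Complete the rows of A to a basis of R^n using the orthogonal
    # complement of the row space, read off from the full SVD of A.
    _, _, Vt = np.linalg.svd(A)
    D = np.vstack([A, Vt[m:]])
    return x0, D

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)
x0, D = initialize(A, b)
print(np.allclose(A @ x0, b))          # feasible: True
print(abs(np.linalg.det(D)) > 1e-8)    # D invertible: True
```

The algorithm would then start from x0 with GS = np.linalg.inv(D) in place of the identity.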
I hope to write this out as a detailed algorithm in my next post, but for now, let us return to the question of how row/column operations affect the inverse.
Lemma: An elementary row operation on a matrix M corresponds to an elementary column operation on M^{-1}.
Proof: An elementary row operation is realized by multiplication on the left by an elementary matrix: M\mapsto EM. Taking the inverse, we have (EM)^{-1}=M^{-1}E^{-1}. The inverse of an elementary matrix is again an elementary matrix, and right multiplication by an elementary matrix is an elementary column operation. \Box
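A quick numerical illustration of the lemma (the example matrix and the particular row operation are mine):

```python
# Adding 2*(row 0) to row 2 of M corresponds to subtracting
# 2*(column 2) from column 0 of M^{-1}.
import numpy as np

rng = np.random.default_rng(3)
n = 4
M = rng.standard_normal((n, n))
Minv = np.linalg.inv(M)

# Row operation on M: left-multiply by the elementary matrix E.
E = np.eye(n)
E[2, 0] = 2.0
M2 = E @ M

# The new inverse is M^{-1} E^{-1}, a column operation on M^{-1}.
Minv2 = Minv.copy()
Minv2[:, 0] -= 2.0 * Minv2[:, 2]
print(np.allclose(np.linalg.inv(M2), Minv2))   # True
```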