
Thursday, July 14, 2022

Quadratic Programming: The unconstrained case

In order to build our understanding of how quadratic programming works, we are first going to look at the unconstrained case:

  • Minimize f(x) = c^Tx+\frac{1}{2} x^TCx where C is a symmetric positive definite matrix.
Note this is a global optimization problem and can be solved using calculus. The minimum occurs where the gradient vanishes:
\nabla f = c+Cx = 0.
So one could simply solve this linear system in whatever way one wishes and be done with the problem, but following Best's textbook, we will present a basic algorithm that we can build off of in the constrained cases.
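To make the closed-form solution concrete, here is a minimal NumPy sketch; the matrix C and vector c are made-up example data, not taken from any particular problem. Setting the gradient c+Cx to zero means solving the linear system Cx=-c.

# A minimal sketch of the closed-form solution; C and c below are
# made-up example data, not from the post.
import numpy as np

C = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite
c = np.array([-1.0, -2.0])

# Setting the gradient c + Cx to zero gives the linear system Cx = -c.
x_star = np.linalg.solve(C, -c)

f = lambda x: c @ x + 0.5 * x @ C @ x
print("minimizer:", x_star)
print("gradient at minimizer:", c + C @ x_star)   # ~ [0, 0]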

The basic idea is as follows. Start with any initial approximation x_0, and suppose we have a search direction in the form of a vector s_0. We want to walk along this search direction as far as possible to minimize f(x).

Let g_0=\nabla f(x_0). Suppose g_0^Ts_0\neq 0. Replace s_0 by -s_0 if necessary so that g_0^Ts_0>0. Looking along the line x_0-t s_0, we write the Taylor Series as
f(x_0-ts_0) = f(x_0) -tg_0^Ts_0+\frac{1}{2}t^2s_0^TCs_0

Note that s_0^TCs_0>0 since C is a positive definite matrix. So this function has a minimum, which can be found by taking the derivative with respect to t:
\frac{d}{dt}f(x_0-ts_0)=-g_0^Ts_0+ts_0^TCs_0=0
t=\frac{g_0^Ts_0}{s_0^TCs_0}=\frac{\nabla f(x_0)^Ts_0}{s_0^TCs_0}.
This quantity is called the optimal step size and tells you how far to walk along the ray to minimize the objective function on that ray.
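Here is a short sketch of one exact line-search step, again with made-up example data (the specific x_0 and s_0 are arbitrary choices for illustration); it checks that the step decreases f and that the new gradient is orthogonal to s_0.

# One exact line-search step along the ray x0 - t*s0, with hypothetical data.
import numpy as np

C = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([-1.0, -2.0])
f = lambda x: c @ x + 0.5 * x @ C @ x
grad = lambda x: c + C @ x

x0 = np.array([2.0, 2.0])
s0 = np.array([1.0, 0.0])

g0 = grad(x0)
if g0 @ s0 < 0:          # flip s0 if necessary so that g0^T s0 > 0
    s0 = -s0

t = (g0 @ s0) / (s0 @ C @ s0)   # optimal step size
x1 = x0 - t * s0

print("f(x0) =", f(x0), "  f(x1) =", f(x1))   # f decreases
print("grad(x1)^T s0 =", grad(x1) @ s0)       # ~ 0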

One can iterate this by choosing a new search direction, so the question becomes how to choose these directions. A convenient choice of search directions is given by the concept of conjugate directions.

Definition: A set of vectors s_1,\ldots,s_k is said to be conjugate (with respect to C) if s_i^TCs_j=0 for i\neq j (and s_i^TCs_i>0). 

Theorem: Let s_0,\ldots,s_{n-1} be conjugate directions, and x_0 be arbitrary. Construct x_1,\ldots,x_n where x_i=x_{i-1}-t_{i-1}s_{i-1}, with t_{i-1}=\frac{\nabla f(x_{i-1})^Ts_{i-1}}{s_{i-1}^TCs_{i-1}}. Then
  • \nabla f(x_{j+1})^Ts_i=0 for 0\leq i\leq j\leq n-1
  • x_n is the optimal solution for the QP.
In other words, at each iteration the gradient is orthogonal to every search direction used so far. After the last iteration it is orthogonal to all of s_0,\ldots,s_{n-1}, and since these span \mathbb R^n (exercise), the gradient must be zero.

Proof:
First we want to show that \nabla f(x_{j+1})^T s_j = 0. To see this note that
\nabla f(x_{j+1})=\nabla f(x_j) -t_jCs_j by Taylor series, and taking the inner product with s_j we get
\nabla f(x_{j+1})^Ts_j=\nabla f(x_j)^Ts_j -t_js_j^TCs_j=0 by definition of t_j.

Now choose i<j\leq n-1. Again by Taylor series, we have
\nabla f(x_{j+1})=\nabla f(x_{i+1})+C(x_{j+1}-x_{i+1}).
Taking the inner product with s_i:
\nabla f(x_{j+1})^Ts_i=\nabla f(x_{i+1})^Ts_i+(x_{j+1}-x_{i+1})^TCs_i
\hspace{4em} = (x_{j+1}-x_{i+1})^TCs_i \qquad \text{(the first term is 0 by the first part of the proof)}
\hspace{4em} = -t_{i+1}s^T_{i+1}Cs_i-\cdots-t_js^T_jCs_i.
Each term is equal to 0 by definition of conjugate directions. \Box
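To see the theorem in action, here is a small numerical sketch. The data are made up, and purely for testing it takes the eigenvectors of C as a set of conjugate directions (for symmetric C one has s_i^TCs_j=\lambda_j s_i^Ts_j=0 when i\neq j); this is just a convenient test set, not the construction we will use in the next post.

# Conjugate-directions iteration from the theorem, on made-up data.
# For testing, the eigenvectors of C serve as conjugate directions.
import numpy as np

C = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
c = np.array([1.0, -2.0, 0.5])
grad = lambda x: c + C @ x

_, S = np.linalg.eigh(C)        # columns of S are C-conjugate directions
x = np.array([1.0, 1.0, 1.0])   # arbitrary starting point
dirs_so_far = []

for i in range(C.shape[0]):
    s = S[:, i]
    t = (grad(x) @ s) / (s @ C @ s)   # optimal step size
    x = x - t * s
    dirs_so_far.append(s)
    # first bullet: the new gradient is orthogonal to every direction used so far
    print([round(grad(x) @ d, 10) for d in dirs_so_far])

print("x_n          :", x)
print("solve(C, -c) :", np.linalg.solve(C, -c))   # second bullet: x_n is optimal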

The only missing ingredient is identifying a set of conjugate directions for a matrix C. We will do this in the next post.
