Thursday, July 14, 2022

Quadratic Programming: The unconstrained case

In order to build our understanding of how quadratic programming works, we are first going to look at the unconstrained case:

  • Minimize $f(x) = c^Tx+\frac{1}{2} x^TCx$ over $x\in\mathbb{R}^n$, where $C$ is an $n\times n$ positive definite matrix.
Note that since $C$ is positive definite, $f$ is strictly convex, so any local minimum is the global minimum and the problem can be solved using calculus. The minimum occurs where the gradient vanishes:
$$\nabla f(x) = c+Cx = 0.$$
So one could simply solve this linear system in whatever way one wishes and be done with the problem, but following Best's textbook, we will present a basic algorithm that we can build off of in the constrained cases.
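Before building up to that algorithm, here is a minimal NumPy sketch of the direct route (the matrix and vector below are made-up example data): form the linear system $Cx=-c$ and solve it.

    import numpy as np

    # Made-up example data: a small positive definite C and linear term c.
    C = np.array([[4.0, 1.0], [1.0, 3.0]])
    c = np.array([-1.0, 2.0])

    def f(x):
        return c @ x + 0.5 * x @ C @ x

    # The gradient c + Cx vanishes where Cx = -c, so solve that system directly.
    x_star = np.linalg.solve(C, -c)
    print(x_star, f(x_star))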

The basic idea is as follows. Start with any initial approximation $x_0$, and suppose we have a search direction in the form of a vector $s_0$. We want to walk along this search direction as far as possible to minimize $f(x)$.

Let $g_0=\nabla f(x_0)$, and suppose $g_0^Ts_0\neq 0$. Replacing $s_0$ by $-s_0$ if necessary, we may assume $g_0^Ts_0>0$. Looking along the ray $x_0-ts_0$, $t\geq 0$, we write the Taylor series (which is exact here, since $f$ is quadratic) as
$$f(x_0-ts_0) = f(x_0) -tg_0^Ts_0+\frac{1}{2}t^2s_0^TCs_0$$

Note that $s_0^TCs_0>0$ since $C$ is a positive definite matrix. So this function has a minimum, which can be found by taking the derivative with respect to $t$:
$$\frac{d}{dt}f(x_0-ts_0)=-g_0^Ts_0+ts_0^TCs_0=0$$
$$t=\frac{g_0^Ts_0}{s_0^TCs_0}=\frac{\nabla f(x_0)^Ts_0}{s_0^TCs_0}.$$
This quantity is called the optimal step size and tells you how far to walk along the ray to minimize the objective function on that ray.
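As a small worked example (not from Best's text): take $C=\begin{pmatrix}2&0\\0&4\end{pmatrix}$, $c=\begin{pmatrix}-2\\-4\end{pmatrix}$, $x_0=\begin{pmatrix}0\\0\end{pmatrix}$, and $s_0=\begin{pmatrix}-1\\-1\end{pmatrix}$. Then $g_0=c+Cx_0=\begin{pmatrix}-2\\-4\end{pmatrix}$, so $g_0^Ts_0=6>0$ and $s_0^TCs_0=6$, giving $t=1$ and
$$x_1=x_0-ts_0=\begin{pmatrix}1\\1\end{pmatrix},\qquad \nabla f(x_1)=c+Cx_1=\begin{pmatrix}0\\0\end{pmatrix}.$$
Here a single exact line search already reaches the minimizer, but only because $s_0$ happened to point from the minimizer toward $x_0$; in general one needs several search directions.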

One can iterate this by choosing a new search direction, so the question becomes how to choose these directions. A convenient choice of search directions is given by the concept of conjugate directions.

Definition: A set of vectors $s_1,\ldots,s_k$ is said to be conjugate (with respect to $C$) if $s_i^TCs_j=0$ for $i\neq j$ (and $s_i^TCs_i>0$). 
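For example, with $C=\begin{pmatrix}2&1\\1&2\end{pmatrix}$, the vectors $s_0=\begin{pmatrix}1\\0\end{pmatrix}$ and $s_1=\begin{pmatrix}1\\-2\end{pmatrix}$ are conjugate: $s_0^TCs_1=2-2=0$, while $s_0^TCs_0=2>0$ and $s_1^TCs_1=6>0$. More generally, any set of mutually orthogonal eigenvectors of $C$ is conjugate, since then $s_i^TCs_j=\lambda_j\,s_i^Ts_j=0$ for $i\neq j$; and when $C=I$, conjugacy is just ordinary orthogonality.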

Theorem: Let $s_0,\ldots,s_{n-1}$ be conjugate directions, and let $x_0$ be arbitrary. Construct $x_1,\ldots,x_n$ where $x_i=x_{i-1}-t_{i-1}s_{i-1}$, with $t_{i-1}=\frac{\nabla f(x_{i-1})^Ts_{i-1}}{s_{i-1}^TCs_{i-1}}$. Then
  • $\nabla f(x_{j+1})^Ts_i=0$ for $0\leq i\leq j\leq n-1$
  • $x_n$ is the optimal solution for the QP.
In other words, at each iteration the gradient becomes orthogonal to more search directions. In the end it is orthogonal to all of them, and since they span $\mathbb R^n$ (exercise) the gradient must be zero.

Proof:
First we want to show that $\nabla f(x_{j+1})^T s_j = 0$. To see this note that
$$\nabla f(x_{j+1})=\nabla f(x_j) -t_jCs_j$$ by Taylor series, and taking the inner product with $s_j$ we get
$$\nabla f(x_{j+1})^Ts_j=\nabla f(x_j)^Ts_j -t_js_j^TCs_j=0$$ by definition of $t_j$.

Now choose $i<j\leq n-1$. Again by Taylor series, we have
$$\nabla f(x_{j+1})=\nabla f(x_{i+1})+C(x_{j+1}-x_{i+1}).$$
Taking the inner product with $s_i$:
$$\begin{aligned}\nabla f(x_{j+1})^Ts_i &= \nabla f(x_{i+1})^Ts_i+(x_{j+1}-x_{i+1})^TCs_i\\ &= (x_{j+1}-x_{i+1})^TCs_i\\ &= -t_{i+1}s_{i+1}^TCs_i-\cdots-t_js_j^TCs_i.\end{aligned}$$
The second equality holds because $\nabla f(x_{i+1})^Ts_i=0$ by the first part of the proof, and the third because $x_{j+1}-x_{i+1}=-(t_{i+1}s_{i+1}+\cdots+t_js_j)$. Each remaining term is equal to $0$ by the definition of conjugate directions. $\Box$
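To see the theorem in action, here is a minimal NumPy sketch of the iteration on made-up data. For the conjugate directions it simply takes the eigenvectors of $C$ (which are conjugate by the observation above); this choice is only for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up problem data: a positive definite C and a linear term c.
    n = 5
    A = rng.standard_normal((n, n))
    C = A @ A.T + n * np.eye(n)
    c = rng.standard_normal(n)

    def grad(x):
        return c + C @ x

    # For illustration only: the eigenvectors of C serve as conjugate directions.
    _, V = np.linalg.eigh(C)
    S = V.T                                  # rows are s_0, ..., s_{n-1}

    x = rng.standard_normal(n)               # arbitrary starting point x_0
    for j, s in enumerate(S):
        t = grad(x) @ s / (s @ C @ s)        # optimal step size t_j
        x = x - t * s                        # x_{j+1} = x_j - t_j s_j
        # the gradient is now orthogonal to s_0, ..., s_j
        assert np.allclose(S[: j + 1] @ grad(x), 0)

    # After n steps the gradient vanishes, so x solves Cx = -c.
    assert np.allclose(x, np.linalg.solve(C, -c))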

The only missing ingredient is identifying a set of conjugate directions for a matrix $C$. We will do this in the next post.
