Thursday, July 14, 2022

Quadratic Programming

In my next series of posts, I would like to start exploring the quadratic programming (QP) problem. It is similar to the linear case, except that the objective function is allowed to be a convex quadratic function

$$f(x) = c\cdot x+\frac{1}{2} x^TCx$$

where $C$ is a positive semidefinite matrix. We are still considering the constraints to be linear, and in fact, following Best's textbook, we will assume they are inequality constraints of the form $Ax\leq b$. 
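As a quick concrete anchor, the objective and its gradient $\nabla f(x) = c + Cx$ (valid when $C$ is symmetric) are easy to code up. Here is a minimal numpy sketch; the two-variable instance is invented purely for illustration:

```python
import numpy as np

def qp_objective(x, c, C):
    """f(x) = c.x + (1/2) x^T C x."""
    return c @ x + 0.5 * x @ C @ x

def qp_gradient(x, c, C):
    """grad f(x) = c + C x, valid when C is symmetric."""
    return c + C @ x

# Toy instance: minimize x1^2 + x2^2 - 2*x1, i.e. c = (-2, 0), C = 2I.
c = np.array([-2.0, 0.0])
C = 2.0 * np.eye(2)
x = np.array([1.0, 0.0])
print(qp_objective(x, c, C))  # -1.0
print(qp_gradient(x, c, C))   # [0. 0.], so x is the unconstrained minimizer
```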

Definition: Let $x_0\in\mathbb R^n$. We say a constraint $a_i\cdot x\leq b_i$ is active at $x_0$ if $a_i\cdot x_0=b_i$.

In other words, $x_0$ lies on the facet of the feasible region defined by the $i$th constraint. 
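In code, computing the active set is just a comparison against $b$, up to a numerical tolerance. A small sketch (the function name and tolerance are my own choices):

```python
import numpy as np

def active_set(x0, A, b, tol=1e-9):
    """Indices i with a_i . x0 = b_i (to within tol), i.e. I(x0)."""
    return [i for i in range(len(b)) if abs(A[i] @ x0 - b[i]) <= tol]

# Box constraints x1 <= 1, x2 <= 1 (A = I, b = (1, 1)).
A = np.eye(2)
b = np.array([1.0, 1.0])
print(active_set(np.array([1.0, 0.5]), A, b))  # [0]: on the first facet only
print(active_set(np.array([1.0, 1.0]), A, b))  # [0, 1]: the corner
```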

Definition: Let $I(x_0)$ be the set of indices of active constraints for $x_0$. We say $x_0$ is quasi-stationary if $x_0$ is the optimal solution for the problem

  • Minimize $c\cdot x+\frac{1}{2} x^TCx$
  • Subject to $a_i \cdot x=b_i, i\in I(x_0)$

One thing to note: any corner (vertex) of the feasible region is quasi-stationary, since the feasible set of the corresponding equality-constrained problem is just that single point.

Lemma: An optimal solution for a QP must be quasi-stationary.

Proof: Let $I = I(x_0)$ be the set of indices of the active constraints at an optimal point $x_0$. Note that $I$ could be empty.

We need to show that $x_0$ is optimal on the entire facet defined by its active constraints. Suppose not: then there is a point $y$ satisfying $a_i\cdot y=b_i$ for all $i\in I$ with $f(y)<f(x_0)$. By convexity, $f$ decreases along the segment from $x_0$ toward $y$. The inactive constraints have slack at $x_0$, so points on that segment sufficiently close to $x_0$ are still feasible, giving a feasible point with a smaller objective value and contradicting the optimality of $x_0$. $\Box$

So we have a situation very similar to the linear programming case. Instead of looking at basic variables, we look at active constraints. Just as in the LP case, a crude brute-force algorithm is available: enumerate all subsets of the constraints, solve each subset as a system of equalities, find the minimum of the objective function on the affine subspace so defined, and among the candidates that are feasible for the original problem, keep the one with the smallest objective value.
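That brute-force procedure can be sketched directly. Each equality-constrained subproblem is solved through the linear system $Cx + A_S^T u = -c$, $A_S x = b_S$ (stationarity on the subspace plus the equalities); subsets with a singular system are simply skipped here, and all names are my own. The cost is exponential in the number of constraints, which is why this is only a conceptual baseline:

```python
import itertools
import numpy as np

def brute_force_qp(c, C, A, b, tol=1e-9):
    """Enumerate subsets of constraints, solve each equality-constrained
    subproblem via its KKT linear system, and keep the best feasible candidate."""
    n, m = len(c), len(b)
    best_x, best_val = None, np.inf
    for k in range(m + 1):
        for S in itertools.combinations(range(m), k):
            As, bs = A[list(S)], b[list(S)]
            # Stationarity + equality constraints as one linear system:
            #   C x + As^T u = -c,   As x = bs
            K = np.block([[C, As.T], [As, np.zeros((k, k))]])
            rhs = np.concatenate([-c, bs])
            try:
                z = np.linalg.solve(K, rhs)
            except np.linalg.LinAlgError:
                continue  # singular KKT system: skip this subset
            x = z[:n]
            if np.all(A @ x <= b + tol):          # feasible for the original QP?
                val = c @ x + 0.5 * x @ C @ x
                if val < best_val:
                    best_x, best_val = x, val
    return best_x, best_val

# Minimize x1^2 + x2^2 subject to x1 + x2 <= -1; optimum at (-1/2, -1/2).
c = np.zeros(2)
C = 2.0 * np.eye(2)
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])
x_star, val = brute_force_qp(c, C, A, b)
print(x_star, val)  # [-0.5 -0.5] 0.5
```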

We will conclude this first post on quadratic programming by deriving tests for a point to be quasi-stationary or optimal. 

Suppose we have a quasi-stationary point $x_0$. Recall that $\nabla f$ is the vector which gives the direction of fastest increase of $f$. Because $x_0$ is optimal for its set of active constraints, $\nabla f(x_0)$ must be orthogonal to the affine subspace cut out by those constraints; otherwise we could decrease $f$ by moving within the subspace. Hence $\nabla f(x_0)$ is in the linear span of the normals $\{a_i\,|\,i\in I(x_0)\}$:

$$-\nabla f = \sum_{i\in I(x_0)} u_i a_i,$$

where we introduce the negative sign for later convenience. If $x_0$ is actually optimal, then $-\nabla f$, the direction of fastest decrease, must point out of the feasible polytope (otherwise we could decrease $f$ while remaining feasible). Since the $a_i$ are the outward normals of the facets, this means $u_i\geq 0$ in the above.

A convenient way to encode this is as follows:

Theorem: A point $x_0$ is quasi-stationary for the QP 

$$\min\{f(x)\,|\, a_i\cdot x\leq b_i\}$$ if and only if

  • $x_0$ lies in the feasible region $R$: $\forall i\,\,a_i\cdot x_0\leq b_i $
  • $-\nabla f(x_0) = \sum_{i=1}^m u_ia_i$ for some scalars $u_i\in\mathbb R$
  • $\forall i\,\, u_i(a_i\cdot x_0-b_i)=0$

The last condition gives an alternative: either $u_i=0$ or $i\in I(x_0)$, so it is just a reformulation of the statement that the negative gradient is a linear combination of the normals of the active constraints.

Theorem: A point $x_0$ is optimal if and only if, in addition to the above three conditions, the multipliers $u_i$ can be chosen nonnegative (sufficiency here uses the convexity of $f$). Writing this in matrix notation:
  • $Ax_0\leq b$
  • $-\nabla f(x_0) = A^Tu$ for some $u\geq 0$
  • $u^T(Ax_0-b)=0$
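These three matrix conditions can be checked numerically. The sketch below recovers $u$ on the active rows by ordinary least squares, which suffices when the active normals are linearly independent (a degenerate active set would need a nonnegative least-squares solve instead); the helper names are my own:

```python
import numpy as np

def is_optimal(x0, c, C, A, b, tol=1e-9):
    """Check A x0 <= b, -grad f(x0) = A^T u with u >= 0, and u^T(A x0 - b) = 0.
    Complementary slackness is enforced by allowing u_i != 0 only on active rows."""
    slack = A @ x0 - b
    if np.any(slack > tol):
        return False                          # infeasible
    act = [i for i in range(len(b)) if abs(slack[i]) <= tol]
    g = -(c + C @ x0)                         # -grad f(x0)
    if not act:                               # interior point: need grad f = 0
        return bool(np.allclose(g, 0.0, atol=1e-7))
    u, *_ = np.linalg.lstsq(A[act].T, g, rcond=None)
    return bool(np.allclose(A[act].T @ u, g, atol=1e-7) and np.all(u >= -tol))

# Minimize x1^2 + x2^2 subject to x1 + x2 <= -1; the optimum is (-1/2, -1/2).
c = np.zeros(2)
C = 2.0 * np.eye(2)
A = np.array([[1.0, 1.0]])
b = np.array([-1.0])
print(is_optimal(np.array([-0.5, -0.5]), c, C, A, b))  # True
print(is_optimal(np.array([0.0, -2.0]), c, C, A, b))   # False: interior, grad != 0
```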

