Friday, July 15, 2022

Unconstrained quadratic optimization: the algorithm

We finally have all the tools in place to describe a step-by-step algorithm for minimizing a (convex) quadratic objective function without any constraints.

  • Minimize $f(x)= c^Tx+\frac{1}{2}x^TCx$, where $C$ is a symmetric positive semidefinite $n\times n$ matrix.
We start with any seed value $x_0\in \mathbb R^n$. Recall that the modified Gram-Schmidt process starts with a set of independent vectors and gradually replaces them with an orthonormal set. We will start with the standard basis $e_1,\ldots,e_n$, recorded as the columns of a matrix $GS_0=I$. We use a vector $J\in\{Y,N\}^n$ to keep track of which columns have been orthonormalized. So to begin we have $J=(N,\ldots,N)$, and each time we work on a column to create a conjugate direction, we set the corresponding entry of $J$ to $Y$. Set $g_0=\nabla f(x_0)=c+Cx_0$ and $j=0$ to initialize, where $j$ counts the iterations of the algorithm.
  • Step 1: Computation of the search direction $s_j$: Let $GS_j=[c_{1j},\ldots, c_{nj}]$ and $J_j=(\alpha_{1j},\ldots,\alpha_{nj})$. If every component of $J_j$ equals $Y$, then stop with optimal solution $x_j$. Otherwise, compute the smallest index $k$ such that $$|g_j^Tc_{kj}|= \max\{|g_j^Tc_{ij}| \,:\, \alpha_{ij}=N\}.$$ If $g^T_jc_{kj}=0$, stop with optimal solution $x_j$. Otherwise set $s_j=\operatorname{sgn}(g^T_jc_{kj})\,c_{kj}$. Go to step 2.
  • Step 2: Computation of the step size $t_j$: Compute $\langle s_j,s_j\rangle = s_j^TCs_j$, the inner product defined by $C$. If it is $0$, then stop: the objective function is unbounded from below. Otherwise set $$t_j=\frac{g_j^Ts_j}{\langle s_j,s_j\rangle}$$ and go to step 3.
  • Step 3: Updating data: Set
    • $x_{j+1}:=x_j-t_js_j$.
    • $g_{j+1} := \nabla f(x_{j+1}) = c+Cx_{j+1}$
    • $f(x_{j+1}) = c^Tx_{j+1} +\frac{1}{2}x_{j+1}^TCx_{j+1}$
    • $d_j = Cs_j/\sqrt{\langle s_j,s_j\rangle}$, so that $d_j^Tv=\langle s_j,v\rangle/\sqrt{\langle s_j,s_j\rangle}$ for any vector $v$
    • $GS_{j+1}$ is formed from $GS_j$ by 
      • replacing the $k$th column $c_k$ by $c_k/(d_j^Tc_k)$ 
      • replacing all columns $c_i$ where $\alpha_{ij}=N$ by $c_i-\frac{d_j^Tc_i}{d_j^Tc_k}c_k$
      • leaving columns $c_i$ where $\alpha_{ij}=Y$ unchanged.
    • $J_{j+1}$ is formed from $J_j$ by setting $\alpha_{k}$ to $Y$ and leaving everything else unchanged.
    • Increment $j:=j+1$ and go to step 1!

To summarize, the algorithm works as follows. Pick some point $x_0$. Choose a search direction $s_0$ parallel to one of the coordinate axes, namely the axis along which the slope of $f$ is steepest. Locate the minimum of $f$ on the line through $x_0$ parallel to $s_0$ to get $x_1$. Comb the remaining basis vectors of $\mathbb R^n$ to be orthogonal to $s_0$ with respect to the inner product defined by $C$. Next find a search direction $s_1$ among these combed vectors that again gives the steepest slope, locate the minimum of $f$ on the line through $x_1$ parallel to $s_1$ to get $x_2$, and so on. After at most $n$ steps the algorithm terminates.
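To make this concrete, here is a tiny example I worked through by hand (the numbers are my own, chosen purely for illustration). Take $$C=\begin{pmatrix}2&1\\1&2\end{pmatrix},\qquad c=\begin{pmatrix}-1\\-1\end{pmatrix},\qquad x_0=\begin{pmatrix}0\\0\end{pmatrix}.$$ Then $g_0=c+Cx_0=(-1,-1)^T$, both columns of $GS_0=I$ give $|g_0^Te_i|=1$, and the tie-break picks $k=1$, so $s_0=\operatorname{sgn}(-1)\,e_1=-e_1$. Since $\langle s_0,s_0\rangle=s_0^TCs_0=2$, the step size is $t_0=g_0^Ts_0/2=1/2$ and $x_1=x_0-t_0s_0=(1/2,0)^T$. The Gram-Schmidt update with $d_0=Cs_0/\sqrt{2}$ replaces the second column of $GS$ by $e_2-\tfrac{1}{2}e_1=(-1/2,1)^T$, which is conjugate to $s_0$. The second iteration then searches along $s_1=(1/2,-1)^T$ with step size $t_1=1/3$ and lands on $x_2=(1/3,1/3)^T$, which solves $Cx=-c$, so the algorithm terminates after the promised $n=2$ iterations.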

I went ahead and programmed this in C# using the MathNet.Numerics.LinearAlgebra library, and it seems to work quite well.

First we have a helper function that updates the matrix $GS$, followed by the main minimization routine.

using System;
using MathNet.Numerics.LinearAlgebra;

// Step 3 helper: one sweep of the modified Gram-Schmidt update on GS.
// Column k is rescaled by 1/(d^T c_k); every column i that has not been
// processed yet (J[i] == false) has the right multiple of column k subtracted
// so that it becomes orthogonal to d; already-processed columns are untouched.
public Matrix<double> GramSchmidt(Matrix<double> matrix, Vector<double> d, int k, bool[] J)
{
    Matrix<double> A = matrix.Clone();
    double lambda = d.DotProduct(matrix.Column(k));
    A.SetColumn(k, matrix.Column(k) / lambda);
    for (int i = 0; i < A.ColumnCount; i++)
    {
        if (!J[i] && i != k)
        {
            double mu = d.DotProduct(matrix.Column(i));
            A.SetColumn(i, matrix.Column(i) - matrix.Column(k) * (mu / lambda));
        }
    }
    return A;
}
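To convince myself the helper really produces conjugate directions, I find a quick check like the following useful (the numbers are mine, and the call is made from inside the same class as GramSchmidt): after one update, the still-unprocessed columns should be orthogonal to the search direction with respect to $C$.

// Illustrative check (my own numbers): after one Gram-Schmidt update the
// remaining columns should be C-conjugate to the search direction s.
var C = Matrix<double>.Build.DenseOfArray(new double[,] { { 2, 1 }, { 1, 2 } });
var GS = Matrix<double>.Build.DenseIdentity(2);
var s = Vector<double>.Build.DenseOfArray(new double[] { -1, 0 });   // s_0 = -e_1
var d = (C * s) / Math.Sqrt(s.DotProduct(C * s));                    // d_0 = C s_0 / sqrt(<s_0, s_0>)
var J = new bool[] { false, false };
var updated = GramSchmidt(GS, d, 0, J);
Console.WriteLine((C * s).DotProduct(updated.Column(1)));            // prints 0 up to rounding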

public Vector<double> Minimize(Vector<double> c, Matrix<double> C, Vector<double> x0, out bool unbounded)
{
    unbounded = false;
    Vector<double> x = x0;
    int n = c.Count;
    bool[] J = new bool[n];                   // all false: no direction processed yet
    Matrix<double> GS = Matrix<double>.Build.DenseIdentity(n);
    Vector<double> g = c + C * x;             // gradient at the current iterate

    // Each pass processes one column of GS, so at most n iterations are needed.
    for (int j = 0; j < n; j++)
    {
        // Step 1: pick the unprocessed column with the largest |g^T c_i|.
        bool done = true;
        foreach (bool index in J) { done = done && index; }
        if (done) { return x; }

        double max = 0;
        int k = -1;
        for (int i = 0; i < n; i++)
        {
            if (!J[i] && Math.Abs(g.DotProduct(GS.Column(i))) > max)
            {
                max = Math.Abs(g.DotProduct(GS.Column(i)));
                k = i;
            }
        }
        if (k == -1) { return x; }            // every remaining g^T c_i is zero: x is optimal

        Vector<double> s = GS.Column(k);
        if (g.DotProduct(s) < 0) { s *= -1; }

        // Step 2: step size; <s,s> = s^T C s. If it vanishes, f is unbounded below.
        double normSquaredOfs = s.DotProduct(C * s);
        if (Math.Abs(normSquaredOfs) < 1e-7) { unbounded = true; return x; }
        double t = g.DotProduct(s) / normSquaredOfs;

        // Step 3: update the iterate, the gradient, and the Gram-Schmidt matrix.
        x -= t * s;
        g = c + C * x;
        Vector<double> d = (C * s) / Math.Sqrt(normSquaredOfs);
        GS = GramSchmidt(GS, d, k, J);
        J[k] = true;
    }

    return x;
}
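Finally, here is how I would call the routine on the small example from earlier (again the snippet is mine, run from inside the same class as Minimize):

// Minimize f(x) = -x1 - x2 + x1^2 + x1*x2 + x2^2,
// i.e. c = (-1,-1) and C = [[2,1],[1,2]]; the exact minimizer is (1/3, 1/3).
var C = Matrix<double>.Build.DenseOfArray(new double[,] { { 2, 1 }, { 1, 2 } });
var c = Vector<double>.Build.DenseOfArray(new double[] { -1, -1 });
var x0 = Vector<double>.Build.Dense(2);                // start at the origin
var xStar = Minimize(c, C, x0, out bool unbounded);
Console.WriteLine(xStar);                              // approximately (0.3333, 0.3333)
Console.WriteLine(unbounded);                          // False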
