
Friday, July 15, 2022

Unconstrained quadratic optimization: the algorithm

 We finally have all the tools in place to describe a step-by-step algorithm for minimizing a (convex) quadratic objective function without any constraints.

  • Minimize f(x)= c\cdot x+\frac{1}{2}x^TCx where C is a positive semidefinite matrix.
We start with any seed value x_0\in \mathbb R^n. Recall that the modified Gram-Schmidt process starts with a set of independent vectors and gradually replaces them with an orthonormal set. We will start with the standard basis e_1,\ldots,e_n, recorded as the columns of a matrix GS_0=I. We use a vector J\in\{Y,N\}^n to keep track of which columns have been orthonormalized. So to begin we have J=(N,\ldots,N), and each time we use a column to create a conjugate direction, we set that entry of J to Y. Set g_0=\nabla f(x_0)=c+Cx_0 and j=0 to initialize, where j is the iteration counter of the algorithm.
  • Step 1: Computation of search direction s_j: Let GS_j=[c_{1j},\ldots, c_{nj}] and J_j=(\alpha_{1j},\ldots,\alpha_{nj}). If every component of J_j is equal to Y, then stop with optimal solution x_j. Otherwise, compute the smallest index k such that |g_j^Tc_{kj}|= \max\{|g_j^Tc_{ij}| : \alpha_{ij}=N\}. If g^T_jc_{kj}=0, stop with optimal solution x_j. Otherwise set s_j=\operatorname{sgn}(g^T_jc_{kj})\,c_{kj}. Go to step 2.
  • Step 2: Computation of step size t_j: Compute \langle s_j,s_j\rangle, where \langle u,v\rangle := u^TCv is the inner product defined by C. If it is 0, then stop: the objective function is unbounded from below. Otherwise set t_j=\frac{g_j^Ts_j}{\langle s_j,s_j\rangle} (see the derivation of this formula after the list) and go to step 3.
  • Step 3: Updating data: Set
    • x_{j+1}:=x_j-t_js_j.
    • g_{j+1} := \nabla f(x_{j+1}) = c+Cx_{j+1}
    • f(x_{j+1}) = c^Tx_{j+1} +\frac{1}{2}x_{j+1}^TCx_{j+1}
    • d_j = Cs_j/\sqrt{\langle s_j,s_j\rangle}
    • GS_{j+1} is formed from GS_j by 
      • replacing the kth column c_k by c_k/(d_j^Tc_k) 
      • replacing all columns c_i where \alpha_{ij}=N by c_i-\frac{d_j^Tc_i}{d_j^Tc_k}c_k
      • leaving columns c_i where \alpha_{ij}=Y unchanged.
    • J_{j+1} is formed from J_j by setting \alpha_{k} to Y and leaving everything else unchanged.
    • Increment j:=j+1 and go to step 1!
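
To see where the step size in step 2 comes from, restrict f to the line through x_j in the direction s_j: \varphi(t)=f(x_j-ts_j)=f(x_j)-t\,g_j^Ts_j+\tfrac{1}{2}t^2\langle s_j,s_j\rangle. If \langle s_j,s_j\rangle>0, setting \varphi'(t)=0 gives the minimizing step t_j=\frac{g_j^Ts_j}{\langle s_j,s_j\rangle}. If instead \langle s_j,s_j\rangle=0, then \varphi(t)=f(x_j)-t\,g_j^Ts_j with g_j^Ts_j>0 (by the sign choice in step 1), so \varphi decreases without bound as t\to\infty and f is unbounded from below.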

To summarize, the algorithm works as follows. Pick some point x_0. Choose a search direction s_0 parallel to one of the coordinate axes, namely the axis along which the gradient is steepest. Locate the minimum of f on the line through x_0 parallel to s_0 to get x_1. Comb the remaining basis vectors of \mathbb R^n to be orthogonal to s_0 with respect to the inner product defined by C. Next find a search direction s_1 among these combed vectors that maximizes the gradient slope, locate the minimum of f on the line through x_1 parallel to s_1 to get x_2, and so on. After at most n steps the algorithm terminates.
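
For a concrete illustration, take C=\begin{pmatrix}2&1\\1&2\end{pmatrix}, c=(-3,-3)^T and x_0=(0,0)^T, so that g_0=c. Both coordinate directions give |g_0^Te_i|=3, so k=1 and s_0=\operatorname{sgn}(-3)\,e_1=(-1,0)^T. Then \langle s_0,s_0\rangle=2, t_0=\frac{g_0^Ts_0}{2}=\frac{3}{2}, and x_1=x_0-t_0s_0=(\tfrac{3}{2},0)^T with g_1=(0,-\tfrac{3}{2})^T. Combing e_2 against s_0 replaces it by (-\tfrac{1}{2},1)^T, which is orthogonal to s_0 in the C-inner product. The second iteration then gives s_1=(\tfrac{1}{2},-1)^T, t_1=1, and x_2=(1,1)^T, where indeed \nabla f(x_2)=c+Cx_2=0.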

I went ahead and programmed this in C# using the MathNet.Numerics.LinearAlgebra library, and it seems to work quite well.

First we have a helper function that updates the matrix GS, followed by the main minimization routine.

public Matrix<double> GramSchmidt(Matrix<double> matrix, Vector<double> d, int k, bool[] J)
{
    Matrix<double> A = matrix.Clone();

    // Rescale the kth column so that d dot (new kth column) equals 1.
    double lambda = d.DotProduct(matrix.Column(k));
    A.SetColumn(k, matrix.Column(k) / lambda);

    // Comb every not-yet-orthonormalized column against the kth column.
    for (int i = 0; i < A.ColumnCount; i++)
    {
        if (!J[i] && i != k)
        {
            double mu = d.DotProduct(matrix.Column(i));
            A.SetColumn(i, matrix.Column(i) - matrix.Column(k) * (mu / lambda));
        }
    }
    return A;
}

public Vector<double> Minimize(Vector<double> c, Matrix<double> C, Vector<double> x0, out bool unbounded)
{
    unbounded = false;
    Vector<double> x = x0;
    int n = c.Count;
    bool[] J = new bool[n]; // builds an array of falses
    Matrix<double> GS = Matrix<double>.Build.DenseIdentity(n);
    Vector<double> g = c + C * x; // gradient at the current iterate

    for (int j = 0; j < n; j++)
    {
        // Step 1: pick the search direction.
        bool done = true;
        foreach (bool index in J) { done = done && index; }
        if (done) { return x; } // every column has been used: x is optimal

        // Among the unused columns, take the one with the steepest gradient slope.
        double max = 0;
        int k = -1;
        for (int i = 0; i < n; i++)
        {
            if (!J[i] && Math.Abs(g.DotProduct(GS.Column(i))) > max)
            {
                max = Math.Abs(g.DotProduct(GS.Column(i)));
                k = i;
            }
        }
        // If all remaining slopes are zero, x is already optimal.
        if (k == -1 || g.DotProduct(GS.Column(k)) == 0) { return x; }

        Vector<double> s = GS.Column(k);
        if (g.DotProduct(GS.Column(k)) < 0) { s *= -1; }

        // Step 2: compute the step size t = g^T s / (s^T C s).
        double normSquaredOfs = (s.ToRowMatrix() * C * s.ToColumnMatrix())[0, 0];
        // s^T C s == 0 while g^T s != 0: f decreases without bound along s.
        if (Math.Abs(normSquaredOfs) < 1e-7) { unbounded = true; return x; }
        double t = g.DotProduct(s) / normSquaredOfs;

        // Step 3: update the iterate, the gradient, and the combed basis.
        x -= t * s;
        g = c + C * x;
        Vector<double> d = (C * s) / Math.Sqrt(normSquaredOfs);
        GS = GramSchmidt(GS, d, k, J);
        J[k] = true;
    }

    return x;
}
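
As a quick sanity check, here is a minimal usage sketch (the enclosing class name QuadraticMinimizer is hypothetical; it is just assumed to contain the two methods above) that runs the routine on the two-variable example worked out earlier:

using System;
using MathNet.Numerics.LinearAlgebra;

// Hypothetical class holding GramSchmidt and Minimize as defined above.
var solver = new QuadraticMinimizer();

var C = Matrix<double>.Build.DenseOfArray(new double[,] { { 2, 1 }, { 1, 2 } });
var c = Vector<double>.Build.DenseOfArray(new double[] { -3, -3 });
var x0 = Vector<double>.Build.Dense(2); // seed at the origin

var xStar = solver.Minimize(c, C, x0, out bool unbounded);

Console.WriteLine(xStar);     // expect (1, 1), since C x = -c there
Console.WriteLine(unbounded); // expect False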
