Generalisations and improvements of New Q-Newton's method Backtracking
In this paper, we propose a general framework for the algorithm New Q-Newton's method Backtracking, developed in the author's previous work. For a symmetric, square real matrix $A$, we define $\mathrm{minsp}(A) := \min_{\|e\|=1} \|Ae\|$. Given a $C^2$ cost function $f : \mathbb{R}^m \rightarrow \mathbb{R}$, a real number $\tau > 0$, and $m+1$ fixed real numbers $\delta_0, \ldots, \delta_m$, we define, for each $x \in \mathbb{R}^m$ with $\nabla f(x) \neq 0$, the following quantities: $\kappa := \min_{i \neq j} |\delta_i - \delta_j|$; $A(x) := \nabla^2 f(x) + \delta \|\nabla f(x)\|^{\tau} \mathrm{Id}$, where $\delta$ is the first element of the sequence $\{\delta_0, \ldots, \delta_m\}$ for which $\mathrm{minsp}(A(x)) \geq \kappa \|\nabla f(x)\|^{\tau}$; $e_1(x), \ldots, e_m(x)$, an orthonormal basis of $\mathbb{R}^m$ chosen appropriately; the step direction $w(x) = \sum_{i=1}^m \frac{\langle \nabla f(x), e_i(x) \rangle}{\|A(x) e_i(x)\|} e_i(x)$ (which can also be normalised to $w(x)/\max\{1, \|w(x)\|\}$ when needed); and a learning rate $\gamma(x) > 0$ chosen by Backtracking line search so that Armijo's condition is satisfied: $f(x - \gamma(x) w(x)) - f(x) \leq -\frac{1}{3}\, \gamma(x) \langle \nabla f(x), w(x) \rangle$. The update rule of our algorithm is $x \mapsto H(x) = x - \gamma(x) w(x)$. In New Q-Newton's method Backtracking, the choices are $\tau = 1 + \alpha > 1$ and $e_1(x), \ldots, e_m(x)$ the eigenvectors of $\nabla^2 f(x)$. In this paper we allow more flexibility and generality: for example, $\tau$ can be chosen to be $< 1$, and $e_1(x), \ldots, e_m(x)$ need not be eigenvectors of $\nabla^2 f(x)$. New Q-Newton's method Backtracking (as well as Backtracking gradient descent) is a special case, and some versions have the flavour of quasi-Newton methods. Several versions allow good theoretical guarantees. An application to solving systems of polynomial equations is given.
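As an illustration, the following is a minimal Python/NumPy sketch of a single update step of this framework, assuming user-supplied callables f, grad, and hess for the cost function, its gradient, and its Hessian; the function name, default parameters, and backtracking factor are illustrative assumptions and not taken from the paper.

```python
# Hypothetical sketch of one step x -> H(x) = x - gamma(x) w(x) of the framework
# described above. Assumes NumPy and callables f(x), grad(x), hess(x).
import numpy as np

def generalised_nqn_backtracking_step(f, grad, hess, x, deltas, tau=2.0,
                                      beta=0.5, gamma0=1.0, normalise=True):
    g = grad(x)
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:                 # the quantities are only defined when grad f(x) != 0
        return x
    H = hess(x)
    m = x.size
    # kappa := min_{i != j} |delta_i - delta_j|
    kappa = min(abs(di - dj) for i, di in enumerate(deltas)
                for j, dj in enumerate(deltas) if i != j)
    # A(x) = Hess f(x) + delta * ||grad f(x)||^tau * Id, with delta the first
    # delta_j for which minsp(A(x)) >= kappa * ||grad f(x)||^tau, where
    # minsp(A) = min_{||e||=1} ||A e|| (for symmetric A, the smallest |eigenvalue|).
    A = None
    for delta in deltas:
        A = H + delta * gnorm**tau * np.eye(m)
        if np.min(np.abs(np.linalg.eigvalsh(A))) >= kappa * gnorm**tau:
            break
    # Orthonormal basis: here the eigenvectors of Hess f(x), one admissible choice.
    _, E = np.linalg.eigh(H)         # columns E[:, i] play the role of e_i(x)
    # Step direction w(x) = sum_i <grad f(x), e_i> / ||A(x) e_i|| * e_i
    w = np.zeros(m)
    for i in range(m):
        e_i = E[:, i]
        w += (g @ e_i) / np.linalg.norm(A @ e_i) * e_i
    if normalise:
        w = w / max(1.0, np.linalg.norm(w))
    # Backtracking line search for Armijo's condition:
    # f(x - gamma*w) - f(x) <= -(1/3) * gamma * <grad f(x), w>
    gamma, fx, gw = gamma0, f(x), g @ w
    while f(x - gamma * w) - fx > -(1.0 / 3.0) * gamma * gw:
        gamma *= beta
    return x - gamma * w
```

In this sketch the orthonormal basis is taken to be the eigenvectors of the Hessian, which corresponds to the choice made in New Q-Newton's method Backtracking; under the general framework other orthonormal bases (and values of tau, including tau < 1) could be substituted in the same places.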