A modification of quasi-Newton's methods helping to avoid saddle points
We recall that if A is an invertible symmetric real m×m matrix, then it is diagonalisable. Denote by ℰ^+(A)⊂ℝ^m (respectively, ℰ^-(A)⊂ℝ^m) the vector subspace spanned by the eigenvectors of A with positive (respectively, negative) eigenvalues. Since A is invertible, every eigenvalue is nonzero, and we have an orthogonal decomposition ℝ^m=ℰ^+(A)⊕ℰ^-(A). Hence every x∈ℝ^m can be written uniquely as x=pr_{A,+}(x)+pr_{A,-}(x), with pr_{A,+}(x)∈ℰ^+(A) and pr_{A,-}(x)∈ℰ^-(A).

We propose the following simple new modification of quasi-Newton's methods.

New Q-Newton's method. Let Δ={δ_0,δ_1,δ_2,…} be a countable set of real numbers with at least m+1 elements. Let f:ℝ^m→ℝ be a C^2 function, and let α>0. For each x∈ℝ^m with ∇f(x)≠0, let δ(x)=δ_j, where j is the smallest index for which ∇^2f(x)+δ_j ||∇f(x)||^{1+α} Id is invertible. (If ∇f(x)=0, we choose δ(x)=δ_0.) Let x_0∈ℝ^m be an initial point. We define a sequence of points x_n∈ℝ^m and invertible symmetric m×m matrices A_n as follows:

A_n = ∇^2f(x_n) + δ(x_n) ||∇f(x_n)||^{1+α} Id  and  x_{n+1} = x_n - w_n,

where w_n = pr_{A_n,+}(v_n) - pr_{A_n,-}(v_n) and v_n = A_n^{-1} ∇f(x_n).

The main result of this paper, roughly stated, is that if f is C^3 and a sequence {x_n} constructed by the New Q-Newton's method from a random initial point x_0 converges, then the limit point is not a saddle point, and the rate of convergence is the same as that of Newton's method.
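To make the update concrete, here is a minimal NumPy sketch of a single iteration of the method described above. The function name `new_q_newton_step`, the invertibility check via matrix rank, and the choice of test function in the usage note are illustrative assumptions, not part of the paper; the sign flip on the negative eigenspace implements w_n = pr_{A_n,+}(v_n) - pr_{A_n,-}(v_n).

```python
import numpy as np

def new_q_newton_step(grad, hess, x, deltas, alpha=1.0):
    """One step of New Q-Newton's method (illustrative sketch).

    grad, hess : callables returning ∇f(x) and ∇^2 f(x).
    deltas     : the set Δ = {δ_0, δ_1, ...}, at least m+1 numbers.
    alpha      : the exponent α > 0 in the regularisation term.
    """
    g = grad(x)
    H = hess(x)
    m = x.size
    # Pick the first δ_j making A = ∇^2 f(x) + δ_j ||∇f(x)||^{1+α} Id invertible.
    scale = np.linalg.norm(g) ** (1 + alpha)
    for d in deltas:
        A = H + d * scale * np.eye(m)
        if np.linalg.matrix_rank(A) == m:  # in practice, use a tolerance here
            break
    # v_n = A_n^{-1} ∇f(x_n)
    v = np.linalg.solve(A, g)
    # Decompose v along the eigenspaces of A and flip the part lying in
    # the negative eigenspace: w = pr_{A,+}(v) - pr_{A,-}(v).
    eigvals, eigvecs = np.linalg.eigh(A)
    coeffs = eigvecs.T @ v
    w = eigvecs @ (np.sign(eigvals) * coeffs)
    return x - w
```

For example, on the saddle function f(x) = x_1^2 - x_2^2 starting at (1, 1) with Δ = {0, 1, -1} and α = 1, the step moves the iterate away from the saddle at the origin along the unstable x_2 direction, whereas a plain Newton step would jump straight to the saddle.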