A modification of quasi-Newton's methods helping to avoid saddle points

06/02/2020
by Tuyen Trung Truong, et al.

We recall that if A is an invertible and symmetric real m×m matrix, then it is diagonalisable. Therefore, if we denote by ℰ^+(A) ⊂ ℝ^m (respectively ℰ^-(A) ⊂ ℝ^m) the vector subspace generated by the eigenvectors of A with positive eigenvalues (respectively the vector subspace generated by the eigenvectors of A with negative eigenvalues), then we have an orthogonal decomposition ℝ^m = ℰ^+(A) ⊕ ℰ^-(A). Hence, every x ∈ ℝ^m can be written uniquely as x = pr_{A,+}(x) + pr_{A,-}(x), with pr_{A,+}(x) ∈ ℰ^+(A) and pr_{A,-}(x) ∈ ℰ^-(A). We propose the following simple new modification of quasi-Newton's methods.

New Q-Newton's method. Let Δ = {δ_0, δ_1, δ_2, ...} be a countable set of real numbers with at least m+1 elements. Let f: ℝ^m → ℝ be a C^2 function and let α > 0. For each x ∈ ℝ^m with ∇f(x) ≠ 0, let δ(x) = δ_j, where j is the smallest index such that ∇^2 f(x) + δ_j ||∇f(x)||^{1+α} Id is invertible. (If ∇f(x) = 0, then we choose δ(x) = δ_0.) Let x_0 ∈ ℝ^m be an initial point. We define a sequence of points x_n ∈ ℝ^m and invertible symmetric m×m matrices A_n as follows: A_n = ∇^2 f(x_n) + δ(x_n) ||∇f(x_n)||^{1+α} Id and x_{n+1} = x_n − w_n, where w_n = pr_{A_n,+}(v_n) − pr_{A_n,-}(v_n) and v_n = A_n^{-1} ∇f(x_n).

The main result of this paper roughly says that if f is C^3 and a sequence {x_n} constructed by the New Q-Newton's method from a random initial point x_0 converges, then the limit point is not a saddle point, and the convergence rate is the same as that of Newton's method.
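The update rule above is easy to prototype. The following Python sketch implements a single iteration under stated assumptions: the callables grad and hess, the trial set deltas, and the choice alpha = 1.0 are illustrative placeholders, not values prescribed by the paper, which only requires a countable set Δ with at least m+1 elements and some α > 0.

```python
import numpy as np

def new_q_newton_step(grad, hess, x, deltas=(0.0, 1.0, -1.0, 2.0), alpha=1.0):
    """One iteration x -> x - w of the New Q-Newton's method described above.

    grad and hess are callables returning the gradient (shape (m,)) and the
    Hessian (shape (m, m)) of f at x. The trial set `deltas` and alpha=1.0 are
    illustrative choices only.
    """
    g = grad(x)
    if np.linalg.norm(g) == 0.0:
        return x  # critical point: no update is performed here
    H = hess(x)
    m = len(x)
    scale = np.linalg.norm(g) ** (1.0 + alpha)
    # delta(x): first delta_j making A = Hess f(x) + delta_j ||grad f(x)||^(1+alpha) Id invertible.
    for d in deltas:
        A = H + d * scale * np.eye(m)
        if np.linalg.matrix_rank(A) == m:
            break
    v = np.linalg.solve(A, g)             # v_n = A_n^{-1} grad f(x_n)
    eigvals, eigvecs = np.linalg.eigh(A)  # orthogonal eigendecomposition of the symmetric A
    coeffs = eigvecs.T @ v                # coordinates of v in the eigenbasis of A
    # w_n = pr_{A,+}(v) - pr_{A,-}(v): keep the components with positive eigenvalue,
    # flip the sign of the components with negative eigenvalue.
    w = eigvecs @ (np.sign(eigvals) * coeffs)
    return x - w
```

As a quick sanity check, on the quadratic f(x, y) = x^2 − y^2 (a saddle at the origin), one step from a point with y ≠ 0 uses δ_0 = 0, sends the x-coordinate to 0 exactly as Newton's method would, and doubles the y-coordinate, so the iterates escape the saddle instead of converging to it.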
