Some convergent results for Backtracking Gradient Descent method on Banach spaces
Our main result concerns the following condition.

Condition C. Let X be a Banach space. A C^1 function f: X → R satisfies Condition C if whenever {x_n} converges weakly to x and lim_{n→∞} ||∇f(x_n)|| = 0, then ∇f(x) = 0.

We assume that a canonical isomorphism between X and its dual X^* is given, for example when X is a Hilbert space.

Theorem. Let X be a reflexive Banach space and let f: X → R be a C^2 function which satisfies Condition C. Moreover, assume that sup_{x∈S} ||∇^2 f(x)|| < ∞ for every bounded set S ⊂ X. We choose a random point x_0 ∈ X and construct, by the Local Backtracking GD procedure (which depends on 3 hyper-parameters α, β, δ_0, see later for details), the sequence x_{n+1} = x_n − δ(x_n)∇f(x_n). Then we have:

1) Every cluster point of {x_n}, in the weak topology, is a critical point of f.

2) Either lim_{n→∞} f(x_n) = −∞ or lim_{n→∞} ||x_{n+1} − x_n|| = 0.

3) Here we work with the weak topology. Let C be the set of critical points of f, and let B be the set of cluster points of {x_n}. Assume that C has a bounded component A. If B ∩ A ≠ ∅, then B ⊂ A and B is connected.

4) Assume that f has at most countably many saddle points. Then, for generic choices of α, β, δ_0 and of the initial point x_0, if the sequence {x_n} converges in the weak topology, then the limit point cannot be a saddle point.
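To make the role of the hyper-parameters concrete, here is a minimal finite-dimensional sketch of the standard backtracking (Armijo) gradient descent update that the theorem's scheme is built on. This is an illustration in R^n, not the paper's Local Backtracking GD variant: the function name, the stopping tolerance, and the test function are assumptions added for this sketch, and the locally-defined step size δ(x_n) of the paper is replaced by the plain Armijo search starting from δ_0 each iteration.

```python
import math

def backtracking_gd(f, grad, x0, alpha=0.5, beta=0.7, delta0=1.0,
                    tol=1e-8, max_iter=10000):
    """Sketch of backtracking gradient descent in R^n.

    At each step, the learning rate delta is shrunk by the factor beta
    (starting from delta0) until the Armijo descent condition
        f(x - delta*g) - f(x) <= -alpha * delta * ||g||^2
    holds, then the update x <- x - delta * g is applied.
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        gnorm2 = sum(gi * gi for gi in g)
        if math.sqrt(gnorm2) < tol:  # approximate critical point reached
            break
        delta = delta0
        # Armijo backtracking line search
        while (f([xi - delta * gi for xi, gi in zip(x, g)]) - f(x)
               > -alpha * delta * gnorm2):
            delta *= beta
        x = [xi - delta * gi for xi, gi in zip(x, g)]
    return x

# Illustrative run on a smooth convex function (a stand-in test case):
# f(x, y) = x^2 + y^2, with gradient (2x, 2y).
result = backtracking_gd(lambda v: v[0]**2 + v[1]**2,
                         lambda v: [2 * v[0], 2 * v[1]],
                         [1.0, -1.0])
```

In this convex example the iterates approach the unique critical point at the origin; the theorem above concerns the far more delicate situation of a general C^2 function on an infinite-dimensional reflexive Banach space, where only weak cluster points are controlled.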