Implicit bias of any algorithm: bounding bias via margin

11/12/2020
by Elvis Dohmatob, et al.

Consider n points x_1, …, x_n in finite-dimensional Euclidean space, each having one of two colors. Suppose there exists a separating hyperplane (identified with its unit normal vector w) for the points, i.e., a hyperplane such that points of the same color lie on the same side of it. We measure the quality of such a hyperplane by its margin γ(w), defined as the minimum distance between any of the points x_i and the hyperplane. In this paper, we prove that the margin function γ satisfies a nonsmooth Kurdyka-Łojasiewicz inequality with exponent 1/2. This result has far-reaching consequences. For example, let γ^opt be the maximum possible margin for the problem and let w^opt be the parameter of the hyperplane which attains this value. Given any other separating hyperplane with parameter w, let d(w) := ‖w − w^opt‖ be the Euclidean distance between w and w^opt, also called the bias of w. From the above KL inequality, we deduce that (γ^opt − γ(w))/R ≤ d(w) ≤ 2√((γ^opt − γ(w))/γ^opt), where R := max_i ‖x_i‖ is the maximum distance of the points x_i from the origin. Consequently, for any optimization algorithm (gradient descent or not), the bias of the iterates converges at least as fast as the square root of the rate of convergence of the margin. Thus, our work provides a generic tool for analyzing the implicit bias of any algorithm in terms of its margin, in situations where a specialized analysis might not be available: it suffices to establish a good rate of convergence for the margin, a task which is usually much easier.
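To make the sandwich inequality concrete, the following is a minimal numerical sketch, not taken from the paper: the toy dataset, the brute-force estimation of w^opt, and names such as margin and w_opt are illustrative assumptions. It estimates the max-margin unit normal for a small 2D dataset by searching over angles and then checks (γ^opt − γ(w))/R ≤ ‖w − w^opt‖ ≤ 2√((γ^opt − γ(w))/γ^opt) for a sample of other separating unit normals.

import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels y_i in {+1, -1}; hyperplanes pass through the origin.
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.3, size=(20, 2))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(20), -np.ones(20)])
R = np.max(np.linalg.norm(X, axis=1))              # R := max_i ||x_i||

def margin(w):
    # gamma(w) = min_i y_i <w, x_i> for a unit normal w (negative if w does not separate the data).
    w = w / np.linalg.norm(w)
    return np.min(y * (X @ w))

# Estimate w_opt and gamma_opt by brute force over unit normals (feasible in 2D).
angles = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)
margins = np.array([margin(w) for w in candidates])
w_opt = candidates[np.argmax(margins)]
gamma_opt = margins.max()

# Numerically check the bound stated in the abstract for a sample of separating unit normals w:
# (gamma_opt - gamma(w))/R <= ||w - w_opt|| <= 2*sqrt((gamma_opt - gamma(w))/gamma_opt).
for w in candidates[margins > 0][::500]:
    gap = gamma_opt - margin(w)
    d = np.linalg.norm(w - w_opt)                  # bias of w
    lower, upper = gap / R, 2.0 * np.sqrt(gap / gamma_opt)
    assert lower <= d + 1e-6 and d <= upper + 1e-6, (lower, d, upper)

print("sandwich bound verified on the toy example; gamma_opt =", round(gamma_opt, 4))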
