Theory of Optimization: Frank-Wolfe Algorithm

2 minute read

Published:

In this post, we describe a new geometry dependent algorithm that relies on different set of assumptions. The algorithm is called conditional gradient descent, aka Frank-Wolfe.

Frank-Wolfe

Algorithm

Frank-Wolfe algorithm solves the following convex optimization problem

minxDf(x),

for f such that f(x) is `Lipschitz’ in a certain sense. The algorithm reduces the problem into a sequence of linear optimization problems.

Frank-Wolfe algorithm has the following procedure:

  1. Initial point x(0)Rn, step size h>0.
  2. For k=0,1, do
  3. Compute y(k)=argminyDy,f(x(k)).
  4. x(k+1)(1hk)x(k)+hky(k) with hk=2k+2.

Analysis

We have the following theorem for Frank-Wolfe algorithm.

Given a convex function f on a convex set D and a constant Cf such that

f((1h)x+hy)f(x)+hf(x),yx+12Cfh2

for any x,yD and h[0,1], we have

f(x(k))f(x)2Cfk+2.

Proof: By the definition of Cf, we have

f(x(k+1))f(x(k))+hkf(x(k)),y(k)x(k)+12Cfh2k.

Note that from the convexity of f and the definition of y(k), we have

f(x)f(x(k))+f(x(k)),xx(k)f(x(k))+f(x(k)),y(k)x(k).

Hence, we have

f(x(k+1))f(x(k))hk(f(x(k))f(x))+12Cfh2k.

Let ϵk=f(x(k))f(x), we have

ϵk+1(1hk)ϵk+12Cfh2k.

Note that ϵ0=f(0)f(x)12Cf, we can prove the theorem by induction.

Note that if f(x) is L-Lipschitz with respect to ||||, over the domain D, then CfLdiam||||(D)2.

Sparsity Analysis

For the domain D is a simplex, the step y(k) is always a vertex of the simplex, and hence, each step of Frank-Wolfe increases the sparsity by at most 1. We can think that the convergence result of Frank Wolfe proves that there is an approximate sparse solution of the optimization problem.

Given a polytope P=conv(vi) lies inside an unit ball. For any uP, there are k=O(1ϵ2) vertices v1,,vk of P such that

||ki=1λiviu||2ϵ,

for some iλi=1,λi0.

Proof: Run Frank-Wolfe algorithm with f(x)=||xu||22. Note that f is 1-Lipschitz with respect to ||x||2 and that the diameter of P is bounded by 1. Hence, we have Cf1.

Therefore, Frank-Wolfe algorithm shows that

f(x(k))f(x)2k+2.

Since f(u)=0, we have ||x(k)u||24k+2, and this gives the result.