Agenda today
Finish up gradient descent material from last time
Definition of convex sets/functions
Definition of a convex optimization problem
Reading:
Guarantees about existence and findability of global minima (for convex problems, any local minimum is a global minimum)
Solving convex problems is almost a technology
Local solutions to non-convex problems often obtained by methods that give global solutions to convex problems
Many of the statistical problems we care about are convex
Definition: \(C\) is a convex set if for any \(x_1, x_2 \in C\) and any \(\theta \in [0,1]\), \(\theta x_1 + (1 - \theta)x_2 \in C\).
An affine set, i.e., the solution set of a system of linear equations \(\{x : Ax = b\}\), is a convex set
Proof: check the definition: if \(x_1\) and \(x_2\) are both solutions to \(Ax = b\), then so is \(\theta x_1 + (1 - \theta)x_2\), since \(A(\theta x_1 + (1 - \theta)x_2) = \theta A x_1 + (1 - \theta) A x_2 = b\).
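A quick numerical check of this, with a made-up underdetermined system (the matrix \(A\), the solutions, and the mixing weight below are arbitrary illustrations, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up underdetermined system Ax = b (illustrative values only).
A = rng.normal(size=(2, 4))
x1 = rng.normal(size=4)
b = A @ x1

# Build a second solution by adding a vector from the null space of A.
_, _, Vt = np.linalg.svd(A)
x2 = x1 + Vt[-1]          # last right singular vector lies in null(A)

# Any convex combination of the two solutions also solves Ax = b.
theta = 0.3
x_mix = theta * x1 + (1 - theta) * x2
print(np.allclose(A @ x1, b), np.allclose(A @ x2, b), np.allclose(A @ x_mix, b))
```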
A hyperplane is a set of the form \(\{x : a^T x = b\}\), \(a \ne 0\)
A halfspace is a set of the form \(\{x : a^T x \le b\}\)
\(a\) is the normal vector
Hyperplanes and halfspaces are both convex
Interpretation as sets of linear equalities or inequalities
A norm is a function \(\| \cdot \|\) that satisfies
\(\|x\|\ge 0\) and \(\|x\| = 0\) iff \(x = 0\)
\(\|t x\| = |t| \|x\|\) for \(t \in \mathbb R\)
\(\|x + y\|\le \|x \| + \|y\|\)
A norm ball with center \(x_c\) and radius \(r\) is the set \(\{x : \|x - x_c\| \le r\}\).
Norm balls are convex.
We will see them more with regularized regression
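A small numerical illustration that norm balls are convex, with an arbitrary made-up center, radius, and pair of points (none of these values come from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Norm ball with a made-up center and radius.
x_c, r = np.array([1.0, -2.0]), 1.5

def in_ball(x):
    return np.linalg.norm(x - x_c) <= r + 1e-12

# Two points inside the ball, built by rescaling random directions.
u, v = rng.normal(size=2), rng.normal(size=2)
x1 = x_c + 0.8 * r * u / np.linalg.norm(u)
x2 = x_c + 0.5 * r * v / np.linalg.norm(v)

# A convex combination of points in the ball stays in the ball.
theta = 0.6
print(in_ball(x1), in_ball(x2), in_ball(theta * x1 + (1 - theta) * x2))
```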
\(\mathbb S^p\) is the set of symmetric \(p \times p\) matrices
Positive semidefinite cone: \(\mathbb S^p_+ = \{ X \in \mathbb S^p : X \succeq 0\}\).
\(X \in \mathbb S^p_+\) iff \(z^T X z \ge 0\) for all \(z\).
The set of covariance matrices!
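For instance, a sample covariance matrix computed from made-up data lands in \(\mathbb S^p_+\); a quick check via symmetry and eigenvalues (the data below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# A sample covariance matrix is symmetric and satisfies z^T S z >= 0 for all z,
# so it lies in the positive semidefinite cone.
X = rng.normal(size=(100, 5))          # 100 made-up observations in R^5
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)        # eigvalsh: eigenvalues of a symmetric matrix
print(np.allclose(S, S.T), eigvals.min() >= -1e-10)
```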
A polyhedron is the solution set to a finite number of linear inequalities and equalities: \[ \{x : Ax \preceq b, Cx = d\} \] where \(\preceq\) means component-wise inequality, \(A\) and \(C\) are matrices.
Can also think of as the intersection of a finite number of halfspaces and hyperplanes
Intersection (easy way to prove a polyhedron is convex)
Image/inverse image of set under affine functions (projection, scaling, translation)
Several others if you’re interested; see the reading
Definition: \(f: \mathbb R^n \to \mathbb R\) is convex if its domain is a convex set and \[ f(\theta x + (1-\theta) y) \le \theta f(x) + (1 - \theta) f(y) \] for all \(x, y \in \text{dom}(f)\) and \(\theta \in [0,1]\)
\(f\) is concave if \(-f\) is convex
\(f\) is strictly convex if its domain is a convex set and \[ f(\theta x + (1-\theta) y) < \theta f(x) + (1 - \theta) f(y) \] for all \(x, y \in \text{dom}(f)\) with \(x \ne y\) and all \(\theta \in (0,1)\)
Suppose \(f\) is differentiable, and let \(d f(x) = (\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n})\).
The 1st-order condition states that \(f\) is convex iff \(f\) has a convex domain and \[ f(y) \ge f(x) + d f(x)^T (y - x) \quad \forall x, y \in \text{dom}(f) \]
Interpretation: the first-order Taylor approximation of \(f\) is a global underestimator.
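A sketch of checking the first-order condition numerically, using the known convex function \(f(x) = \|x\|_2^2\) with gradient \(d f(x) = 2x\) as an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative convex function f(x) = ||x||^2 and its gradient df(x) = 2x.
f = lambda x: x @ x
df = lambda x: 2 * x

# The first-order Taylor approximation at x should underestimate f everywhere:
# f(y) >= f(x) + df(x)^T (y - x) for all x, y.
ok = all(
    f(y) >= f(x) + df(x) @ (y - x) - 1e-12
    for x, y in (rng.normal(size=(2, 4)) for _ in range(1000))
)
print(ok)  # True: the tangent plane is a global underestimator
```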
Suppose \(f\) is twice differentiable: the Hessian \(d^2 f(x)\), \(d^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\) exists for any \(x \in \text{dom}(f)\).
The 2nd-order condition states that if \(f\) is twice differentiable and has a convex domain, then \(f\) is convex iff its Hessian is positive semidefinite everywhere: \(d^2 f(x) \succeq 0\) for all \(x \in \text{dom}(f)\)
\(f : \mathbb R^n \to \mathbb R\) is convex iff the function \(g : \mathbb R \to \mathbb R\), \[ g(t) = f(x + tv), \quad \text{dom}(g) = \{t : x + tv \in \text{dom}(f)\} \] is a convex function of \(t\) for any \(x \in \text{dom}(f)\) and any \(v \in \mathbb R^n\)
This equivalence lets you check the convexity of \(f\) by checking convexity of a function of one variable.
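A sketch of using this in practice: sample the one-variable restriction \(g\) on a grid and check that its second differences are nonnegative. The choice of \(f\) (log-sum-exp), base point \(x\), and direction \(v\) below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# f(x) = log(sum(exp(x))) (log-sum-exp) is convex, so every 1-D slice
# g(t) = f(x + t v) should be convex in t.
f = lambda x: np.log(np.sum(np.exp(x)))

x = rng.normal(size=3)
v = rng.normal(size=3)
t = np.linspace(-2, 2, 401)
g = np.array([f(x + ti * v) for ti in t])

# Discrete second differences of a convex 1-D function are nonnegative.
second_diff = g[:-2] - 2 * g[1:-1] + g[2:]
print(second_diff.min() >= -1e-10)
```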
Positive weighted sum
Composition with an affine function
Pointwise maximum and supremum
Composition (under some extra conditions)
Non-negative scaling: \(\alpha f\) is convex if \(f\) is convex and \(\alpha \ge 0\)
Sum: \(f_1 + f_2\) is convex if \(f_1, f_2\) are convex
Composition with affine function: \(f(Ax + b)\) is convex if \(f\) is convex
Examples:
Log barrier function for linear inequalities: \[ f(x) = - \sum_{i=1}^m \log(b_i - a_i^T x), \quad \text{dom}(f) = \{x : a_i^T x < b_i, i = 1,\ldots, m\} \]
Norm of an affine function: \(f(x) = \|A x + b \|\)
If \(f_1, \ldots, f_m\) are convex, then \(f(x) = \text{max}\{f_1(x), \ldots, f_m(x)\}\) is convex
Example: a piecewise-linear function \(f(x) = \text{max}\{a_1^T x + b_1, \ldots, a_m^T x + b_m\}\) is convex, since each \(a_i^T x + b_i\) is affine (hence convex)
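The claim follows in one line from the definition, using convexity of each \(f_i\): \[ f(\theta x + (1-\theta) y) = \max_i f_i(\theta x + (1-\theta) y) \le \max_i \left\{ \theta f_i(x) + (1-\theta) f_i(y) \right\} \le \theta \max_i f_i(x) + (1-\theta) \max_i f_i(y) = \theta f(x) + (1-\theta) f(y). \]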
Suppose \(g : \mathbb R^n \to \mathbb R\) and \(h : \mathbb R \to \mathbb R\), and define \(f\) as \[ f(x) = h(g(x)) \]
\(f\) is convex if either:
\(g\) is convex, \(h\) is convex, \(h\) non-decreasing
\(g\) concave, \(h\) convex, \(h\) non-increasing
Proof for differentiable \(g\), \(h\) by checking the second-order conditions
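For \(n = 1\) and twice differentiable \(g\) and \(h\), the chain rule gives \[ f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x), \] and each term is nonnegative under either set of conditions above (e.g., in the first case \(h'' \ge 0\), \(h' \ge 0\), and \(g'' \ge 0\)).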
Examples
\(\text{exp}(g(x))\) is convex if \(g\) is convex.
\(1 / g(x)\) is convex if \(g\) is concave and positive.
In a convex optimization problem, we minimize a convex function over a convex set.
Standard form for an optimization problem is:
\[ \begin{align*} \text{minimize} \quad &f_0(x) \\ \text{subject to}\quad &f_i(x) \le 0, \quad i = 1,\ldots, m\\ &a_j^T x = b_j, \quad j = 1,\ldots, p \end{align*} \]
\(x \in \mathbb R^n\) is the optimization variable
\(f_0: \mathbb R^n \to \mathbb R\) is the objective function
\(f_i : \mathbb R^n \to \mathbb R\) are the inequality constraint functions
\(a_j^T x = b_j\), \(j = 1, \ldots, p\), are the equality constraints; the problem is convex when \(f_0, f_1, \ldots, f_m\) are all convex (the equality constraints are already affine)
Standard least squares problem is convex: \[ \text{minimize} \|y - X \beta\|_2^2 \]
Any “regularization” with a convex function \(P\) will still be convex:
\[ \text{minimize} \|y - X \beta\|_2^2 + P(\beta) \]
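A sketch of the regularized problem in code, using the cvxpy modeling library with made-up data and an \(\ell_1\) penalty standing in for the convex regularizer \(P\) (the data and the value of \(\lambda\) below are arbitrary illustrations):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)

# Made-up regression data (illustrative only).
n, p = 50, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

beta = cp.Variable(p)
lam = 0.5  # regularization strength (arbitrary choice)

# Least squares loss plus a convex penalty P(beta) = lam * ||beta||_1:
# a sum of convex functions, so the problem is convex.
objective = cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm(beta, 1))
problem = cp.Problem(objective)
problem.solve()

print(problem.status, np.round(beta.value, 3))
```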
Let \(S\) denote the sample covariance and \(\Theta\) be the inverse covariance matrix.
Up to constant factors, the log-likelihood of the data given a Gaussian distribution is
\[ \log \det \Theta - \text{tr}(S \Theta) \]
Covariance estimation proceeds by maximizing the log-likelihood, or equivalently minimizing the negative log-likelihood.
\(\log \det\) is concave, \(\text{tr}(S \Theta)\) is linear
Restriction of \(\Theta\) to be positive definite is a restriction to a convex set
Convex problem; remains convex if we add convex penalties to \(\Theta\)
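A sketch of the penalized problem in cvxpy, with a made-up sample covariance \(S\) and an entrywise \(\ell_1\) penalty on \(\Theta\) (a graphical-lasso-style formulation; the data and penalty strength are illustrative, not from the notes):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)

# Made-up data and its sample covariance (illustrative only).
X = rng.normal(size=(200, 5))
S = np.cov(X, rowvar=False)

Theta = cp.Variable((5, 5), PSD=True)   # restrict Theta to the PSD cone
lam = 0.1                               # penalty strength (arbitrary choice)

# Negative log-likelihood (up to constants) plus a convex entrywise l1 penalty.
objective = cp.Minimize(-cp.log_det(Theta) + cp.trace(S @ Theta)
                        + lam * cp.sum(cp.abs(Theta)))
problem = cp.Problem(objective)
problem.solve()

print(problem.status)
print(np.round(Theta.value, 2))
```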
If you can show an optimization problem is convex, you can very likely solve it efficiently
Many statistical estimation problems are naturally convex
You have a couple of options for checking convexity:
Check the definition
Check first-order conditions (not usually as useful)
Check second-order conditions (a good option if the function is twice differentiable)
Check restriction to a line
Check whether the function can be re-expressed as a combination of convex functions and convexity-preserving operations