Agenda today
Finish up gradient descent material from last time
Definition of convex sets/functions
Definition of a convex optimization problem
Reading:
Guarantees about existence and findability of global minima (for convex problems, any local minimum is a global minimum)
Solving convex problems is almost a technology
Local solutions to non-convex problems often obtained by methods that give global solutions to convex problems
Many of the statistical problems we care about are convex
Definition: \(C\) is a convex set if for any \(x_1, x_2 \in C\) and any \(\theta \in [0,1]\), \(\theta x_1 + (1 - \theta)x_2 \in C\).
An affine set, i.e., the solution set of a system of linear equations \(\{x : Ax = b\}\), is a convex set
Proof: check the definition: if \(x_1\) and \(x_2\) are both solutions to \(Ax = b\), then so is \(\theta x_1 + (1 - \theta)x_2\), since \(A(\theta x_1 + (1 - \theta)x_2) = \theta A x_1 + (1 - \theta) A x_2 = b\).
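A quick numerical check of this, with a made-up underdetermined system (the matrix \(A\), the solutions, and the mixing weight below are arbitrary illustrations, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up underdetermined system Ax = b (illustrative values only).
A = rng.normal(size=(2, 4))
x1 = rng.normal(size=4)
b = A @ x1

# Build a second solution by adding a vector from the null space of A.
_, _, Vt = np.linalg.svd(A)
x2 = x1 + Vt[-1]          # last right singular vector lies in null(A)

# Any convex combination of the two solutions also solves Ax = b.
theta = 0.3
x_mix = theta * x1 + (1 - theta) * x2
print(np.allclose(A @ x1, b), np.allclose(A @ x2, b), np.allclose(A @ x_mix, b))
```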
A hyperplane is a set of the form \(\{x : a^T x = b\}\), \(a \ne 0\)
A halfspace is a set of the form \(\{x : a^T x \le b\}\)
\(a\) is the normal vector
Hyperplanes and halfspaces are both convex
Interpretation as sets of linear equalities or inequalities
A norm is a function \(\| \cdot \|\) that satisfies
\(\|x\|\ge 0\) and \(\|x\| = 0\) iff \(x = 0\)
\(\|t x\| = |t| \|x\|\) for \(t \in \mathbb R\)
\(\|x + y\|\le \|x \| + \|y\|\)
A norm ball with center \(x_c\) and radius \(r\) is the set \(\{x : \|x - x_c\| \le r\}\).
Norm balls are convex.
We will see them more with regularized regression
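A small numerical illustration that norm balls are convex, with an arbitrary made-up center, radius, and pair of points (none of these values come from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Norm ball with a made-up center and radius.
x_c, r = np.array([1.0, -2.0]), 1.5

def in_ball(x):
    return np.linalg.norm(x - x_c) <= r + 1e-12

# Two points inside the ball, built by rescaling random directions.
u, v = rng.normal(size=2), rng.normal(size=2)
x1 = x_c + 0.8 * r * u / np.linalg.norm(u)
x2 = x_c + 0.5 * r * v / np.linalg.norm(v)

# A convex combination of points in the ball stays in the ball.
theta = 0.6
print(in_ball(x1), in_ball(x2), in_ball(theta * x1 + (1 - theta) * x2))
```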
\(\mathbb S^p\) is the set of symmetric \(p \times p\) matrices
Positive semidefinite cone: \(\mathbb S^p_+ = \{ X \in \mathbb S^p : X \succeq 0\}\).
\(X \in \mathbb S^p_+\) iff \(z^T X z \ge 0\) for all \(z\).
The set of covariance matrices!
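For instance, a sample covariance matrix computed from made-up data lands in \(\mathbb S^p_+\); a quick check via symmetry and eigenvalues (the data below is an arbitrary illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# A sample covariance matrix is symmetric and satisfies z^T S z >= 0 for all z,
# so it lies in the positive semidefinite cone.
X = rng.normal(size=(100, 5))          # 100 made-up observations in R^5
S = np.cov(X, rowvar=False)

eigvals = np.linalg.eigvalsh(S)        # eigvalsh: eigenvalues of a symmetric matrix
print(np.allclose(S, S.T), eigvals.min() >= -1e-10)
```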
A polyhedron is the solution set to a finite number of linear inequalities and equalities: \[ \{x : Ax \preceq b, Cx = d\} \] where \(\preceq\) means component-wise inequality, \(A\) and \(C\) are matrices.
Can also think of as the intersection of a finite number of halfspaces and hyperplanes
Intersection (easy way to prove a polyhedron is convex)
Image/inverse image of set under affine functions (projection, scaling, translation)
Several others if you’re interested; see the reading
Definition: \(f: \mathbb R^n \to \mathbb R\) is convex if its domain is a convex set and \[ f(\theta x + (1-\theta) y) \le \theta f(x) + (1 - \theta) f(y) \] for all \(x, y \in \text{dom}(f)\) and \(\theta \in [0,1]\)
\(f\) is concave if \(-f\) is convex
\(f\) is strictly convex if its domain is a convex set and \[ f(\theta x + (1-\theta) y) < \theta f(x) + (1 - \theta) f(y) \] for all \(x, y \in \text{dom}(f)\) with \(x \ne y\) and all \(\theta \in (0,1)\)
Suppose \(f\) is differentiable, and let \(d f(x) = (\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n})\).
The 1st-order condition states that \(f\) is convex iff \(f\) has a convex domain and \[ f(y) \ge f(x) + d f(x)^T (y - x) \quad \forall x, y \in \text{dom}(f) \]
Interpretation: the first-order Taylor approximation of \(f\) is a global underestimator.
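A sketch of checking the first-order condition numerically, using the known convex function \(f(x) = \|x\|_2^2\) with gradient \(d f(x) = 2x\) as an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative convex function f(x) = ||x||^2 and its gradient df(x) = 2x.
f = lambda x: x @ x
df = lambda x: 2 * x

# The first-order Taylor approximation at x should underestimate f everywhere:
# f(y) >= f(x) + df(x)^T (y - x) for all x, y.
ok = all(
    f(y) >= f(x) + df(x) @ (y - x) - 1e-12
    for x, y in (rng.normal(size=(2, 4)) for _ in range(1000))
)
print(ok)  # True: the tangent plane is a global underestimator
```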
Suppose \(f\) is twice differentiable: the Hessian \(d^2 f(x)\), \(d^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}\) exists for any \(x \in \text{dom}(f)\).
The 2nd-order condition states that if \(f\) is twice differentiable and has a convex domain, then \(f\) is convex iff its Hessian is positive semidefinite everywhere: \(d^2 f(x) \succeq 0\) for all \(x \in \text{dom}(f)\)
\(f : \mathbb R^n \to \mathbb R\) is convex iff the function \(g : \mathbb R \to \mathbb R\), \[ g(t) = f(x + tv), \quad \text{dom}(g) = \{t : x + tv \in \text{dom}(f)\} \] is a convex function of \(t\) for any \(x \in \text{dom}(f)\) and any \(v \in \mathbb R^n\)
This equivalence lets you check the convexity of \(f\) by checking convexity of a function of one variable.
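A sketch of using this in practice: sample the one-variable restriction \(g\) on a grid and check that its second differences are nonnegative. The choice of \(f\) (log-sum-exp), base point \(x\), and direction \(v\) below is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# f(x) = log(sum(exp(x))) (log-sum-exp) is convex, so every 1-D slice
# g(t) = f(x + t v) should be convex in t.
f = lambda x: np.log(np.sum(np.exp(x)))

x = rng.normal(size=3)
v = rng.normal(size=3)
t = np.linspace(-2, 2, 401)
g = np.array([f(x + ti * v) for ti in t])

# Discrete second differences of a convex 1-D function are nonnegative.
second_diff = g[:-2] - 2 * g[1:-1] + g[2:]
print(second_diff.min() >= -1e-10)
```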
Positive weighted sum
Composition with an affine function
Pointwise maximum and supremum
Composition (under some extra conditions)
Non-negative scaling: \(\alpha f\) is convex if \(f\) is convex and \(\alpha \ge 0\)
Sum: \(f_1 + f_2\) is convex if \(f_1, f_2\) are convex
Composition with affine function: \(f(Ax + b)\) is convex if \(f\) is convex
Examples:
Log barrier function for linear inequalities: \[ f(x) = - \sum_{i=1}^m \log(b_i - a_i^T x), \quad \text{dom}(f) = \{x : a_i^T x < b_i, i = 1,\ldots, m\} \]
Norm of an affine function: \(f(x) = \|A x + b \|\)
If \(f_1, \ldots, f_m\) are convex, then \(f(x) = \text{max}\{f_1(x), \ldots, f_m(x)\}\) is convex
Example: a piecewise-linear function \(f(x) = \text{max}\{a_1^T x + b_1, \ldots, a_m^T x + b_m\}\) is convex, since each \(a_i^T x + b_i\) is affine (hence convex)
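The claim follows in one line from the definition, using convexity of each \(f_i\): \[ f(\theta x + (1-\theta) y) = \max_i f_i(\theta x + (1-\theta) y) \le \max_i \left\{ \theta f_i(x) + (1-\theta) f_i(y) \right\} \le \theta \max_i f_i(x) + (1-\theta) \max_i f_i(y) = \theta f(x) + (1-\theta) f(y). \]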
Suppose \(g : \mathbb R^n \to \mathbb R\) and \(h : \mathbb R \to \mathbb R\), and define \(f\) as \[ f(x) = h(g(x)) \]
\(f\) is convex if either:
\(g\) is convex, \(h\) is convex, \(h\) non-decreasing
\(g\) concave, \(h\) convex, \(h\) non-increasing
Proof for differentiable \(g\), \(h\) by checking the second-order conditions
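For \(n = 1\) and twice differentiable \(g\) and \(h\), the chain rule gives \[ f''(x) = h''(g(x))\, g'(x)^2 + h'(g(x))\, g''(x), \] and each term is nonnegative under either set of conditions above (e.g., in the first case \(h'' \ge 0\), \(h' \ge 0\), and \(g'' \ge 0\)).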
Examples
\(\text{exp}(g(x))\) is convex if \(g\) is convex.
\(1 / g(x)\) is convex if \(g\) is concave and positive.
In a convex optimization problem, we minimize a convex function over a convex set.
Standard form for an optimization problem is:
\[ \begin{align*} \text{minimize} \quad &f_0(x) \\ \text{subject to}\quad &f_i(x) \le 0, \quad i = 1,\ldots, m\\ &a_j^T x = b_j, \quad j = 1,\ldots, p \end{align*} \]
\(x \in \mathbb R^n\) is the optimization variable
\(f_0: \mathbb R^n \to \mathbb R\) is the objective function
\(f_i : \mathbb R^n \to \mathbb R\) are the inequality constraint functions
\(a_j^T x = b_j\), \(j = 1, \ldots, p\), are the equality constraints; the problem is convex when \(f_0, f_1, \ldots, f_m\) are all convex (the equality constraints are already affine)
Standard least squares problem is convex: \[ \text{minimize} \|y - X \beta\|_2^2 \]
Any “regularization” with a convex function \(P\) will still be convex:
\[ \text{minimize} \|y - X \beta\|_2^2 + P(\beta) \]
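A sketch of the regularized problem in code, using the cvxpy modeling library with made-up data and an \(\ell_1\) penalty standing in for the convex regularizer \(P\) (the data and the value of \(\lambda\) below are arbitrary illustrations):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)

# Made-up regression data (illustrative only).
n, p = 50, 10
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

beta = cp.Variable(p)
lam = 0.5  # regularization strength (arbitrary choice)

# Least squares loss plus a convex penalty P(beta) = lam * ||beta||_1:
# a sum of convex functions, so the problem is convex.
objective = cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm(beta, 1))
problem = cp.Problem(objective)
problem.solve()

print(problem.status, np.round(beta.value, 3))
```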
Let \(S\) denote the sample covariance and \(\Theta\) be the inverse covariance matrix.
Up to constant factors, the log-likelihood of the data given a Gaussian distribution is
\[ \log \det \Theta - \text{tr}(S \Theta) \]
Covariance estimation proceeds by maximizing the log-likelihood, or equivalently minimizing the negative log-likelihood.
\(\log \det\) is concave, \(\text{tr}(S \Theta)\) is linear
Restriction of \(\Theta\) to be positive definite is a restriction to a convex set
Convex problem; remains convex if we add convex penalties to \(\Theta\)
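A sketch of the penalized problem in cvxpy, with a made-up sample covariance \(S\) and an entrywise \(\ell_1\) penalty on \(\Theta\) (a graphical-lasso-style formulation; the data and penalty strength are illustrative, not from the notes):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(6)

# Made-up data and its sample covariance (illustrative only).
X = rng.normal(size=(200, 5))
S = np.cov(X, rowvar=False)

Theta = cp.Variable((5, 5), PSD=True)   # restrict Theta to the PSD cone
lam = 0.1                               # penalty strength (arbitrary choice)

# Negative log-likelihood (up to constants) plus a convex entrywise l1 penalty.
objective = cp.Minimize(-cp.log_det(Theta) + cp.trace(S @ Theta)
                        + lam * cp.sum(cp.abs(Theta)))
problem = cp.Problem(objective)
problem.solve()

print(problem.status)
print(np.round(Theta.value, 2))
```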
If you can show an optimization problem is convex, you can very likely solve it efficiently
Many statistical estimation problems are naturally convex
You have a couple of options for checking convexity:
Check the definition
Check first-order conditions (not usually as useful)
Check second-order conditions (a good option if the function is twice differentiable)
Check restriction to a line
Check whether the function can be re-expressed as a combination of convex functions and convexity-preserving operations