21.4.5.1. Augmented Lagrange Multiplier Method
The ALM method is also called the Method of Multipliers (MOM) or the primal-dual method. Consider first the Lagrangian functional for equality constraints only.
\(L\left( \mathbf{x},\mathbf{\lambda } \right)=f\left( \mathbf{x} \right)+{{\mathbf{\lambda }}^{T}}\mathbf{h}\left( \mathbf{x} \right)\)
Now, for a Lagrange multiplier vector \({{\mathbf{\lambda }}^{*}}\), suppose that there is an optimum \({{\mathbf{x}}^{*}}\) for the following unconstrained optimization problem.
\(\underset{\mathbf{x}}{\mathop{\min }}\,L\left( \mathbf{x},{{\mathbf{\lambda }}^{*}} \right)\)
If \({{\mathbf{x}}^{*}}\) satisfies all the equality constraints \(\left( \mathbf{h}\left( {{\mathbf{x}}^{*}} \right)=\mathbf{0} \right)\) of the original design problem, then \({{\mathbf{x}}^{*}}\) is an optimum for the original optimization problem and \({{\mathbf{\lambda }}^{*}}\) is the optimal Lagrange multiplier vector. Consequently, the original optimization problem can be transformed into the following problem, which has the same optimum \({{\mathbf{x}}^{*}}\) and \({{\mathbf{\lambda }}^{*}}\).
\(\underset{\mathbf{x}}{\mathop{\min }}\,L\left( \mathbf{x},\mathbf{\lambda } \right)\)
subject to \({{h}_{i}}\left( \mathbf{x} \right)=0,\text{ }i=1,2,...,l.\)
To avoid unboundedness of the Lagrangian, a penalty term is introduced. The resulting functional is called the augmented Lagrangian.
\(A\left( \mathbf{x},\mathbf{\lambda },\mathbf{r} \right)=L\left( \mathbf{x},\mathbf{\lambda } \right)+\frac{1}{2}\sum\limits_{i=1}^{l}{{{r}_{i}}{{h}_{i}}{{\left( \mathbf{x} \right)}^{2}}}\)
where \({{r}_{i}}\) is the penalty parameter for the \(i\)th equality constraint. In the ALM method, an unconstrained optimization solver sequentially minimizes the augmented Lagrangian for given values of \({{r}_{i}}\) and \({{\lambda }_{i}}\). These two parameters are then updated so as to satisfy the optimality conditions.
The update rule for Lagrange multipliers can be determined from the following relation.
\(\underset{\mathbf{x}\to {{\mathbf{x}}^{*}}}{\mathop{\lim }}\,\nabla A\left( \mathbf{x},\mathbf{\lambda },\mathbf{r} \right)=\nabla L\left( {{\mathbf{x}}^{*}},{{\mathbf{\lambda }}^{*}} \right)\)
This implies
\(\underset{\mathbf{x}\to {{\mathbf{x}}^{*}}}{\mathop{\lim }}\,\left( {{\lambda }_{i}}+{{r}_{i}}{{h}_{i}}\left( \mathbf{x} \right) \right)=\lambda _{i}^{*},\text{ }i=1,2,...,l.\)
Hence, the update rule for Lagrange multipliers is
\(\lambda _{i}^{k+1}=\lambda _{i}^{k}+r_{i}^{k}{{h}_{i}}\left( {{\mathbf{x}}^{k}} \right),\text{ }i=1,2,...,l\)
where the superscript \(k\) denotes the iteration index of the ALM algorithm.
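The iteration above can be illustrated with a minimal sketch. The code below is a toy illustration of the ALM loop for equality constraints, not AutoDesign's implementation: the inner unconstrained minimization uses plain gradient descent instead of BFGS, and the problem (minimize \(x_1^2+x_2^2\) subject to \(x_1+x_2-1=0\), solution \(\mathbf{x}^*=(0.5,0.5)\), \(\lambda^*=-1\)) is chosen for simplicity.

```python
# Minimal ALM sketch for an equality-constrained toy problem (illustrative only;
# the inner solver is plain gradient descent, not the BFGS/CG used by AutoDesign).
# Problem: min f(x) = x1^2 + x2^2  subject to  h(x) = x1 + x2 - 1 = 0.

def h(x):                       # equality constraint h(x) = 0
    return x[0] + x[1] - 1.0

def grad_A(x, lam, r):          # gradient of A = f + lam*h + (r/2)*h^2
    c = lam + r * h(x)
    return [2.0 * x[0] + c, 2.0 * x[1] + c]

def alm_equality(r=10.0, outer=15, inner=500, step=0.05):
    x, lam = [0.0, 0.0], 0.0
    for _ in range(outer):
        for _ in range(inner):  # unconstrained minimization of A for fixed lam, r
            g = grad_A(x, lam, r)
            x = [xi - step * gi for xi, gi in zip(x, g)]
        lam = lam + r * h(x)    # multiplier update: lam^{k+1} = lam^k + r*h(x^k)
    return x, lam

x_opt, lam_opt = alm_equality()
```

Note that the multiplier update drives \(h(\mathbf{x}^k)\) to zero: here \(\lambda\) converges to \(-1\) and \(\mathbf{x}\) to \((0.5, 0.5)\).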
Inequality constraints are transformed into equality constraints by adding slack variables \(\left( {{\theta }_{j}}\ge 0 \right)\), so that \({{g}_{j}}\left( \mathbf{x} \right)+{{\theta }_{j}}=0\). Thus, the augmented Lagrangian becomes
\(A\left( \mathbf{x},\mathbf{\theta },\mathbf{\mu },\mathbf{r} \right)=f\left( \mathbf{x} \right)+\sum\limits_{j=1}^{m}{\left[ {{\mu }_{j}}\left( {{g}_{j}}\left( \mathbf{x} \right)+{{\theta }_{j}} \right)+\frac{1}{2}{{r}_{j}}{{\left( {{g}_{j}}\left( \mathbf{x} \right)+{{\theta }_{j}} \right)}^{2}} \right]}\).
Then the new primal variables are \(\mathbf{\bar{x}}=\left\{ \mathbf{x},\mathbf{\theta } \right\}\). The augmented Lagrangian must satisfy the optimality condition with respect to each slack variable \({{\theta }_{j}}\).
\(\frac{\partial A}{\partial {{\theta }_{j}}}={{\mu }_{j}}+{{r}_{j}}\left( {{g}_{j}}\left( \mathbf{x} \right)+{{\theta }_{j}} \right)=0,\text{ }j=1,2,...,m\)
Hence, accounting for the bound \({{\theta }_{j}}\ge 0\), the optimal slack variable is
\(\theta _{j}^{*}=\max \left\{ 0,-\frac{{{\mu }_{j}}}{{{r}_{j}}}-{{g}_{j}}\left( \mathbf{x} \right) \right\}\).
Now, these optimal values are substituted back into the transformed constraints.
\({{g}_{j}}\left( \mathbf{x} \right)+\theta _{j}^{*}={{g}_{j}}\left( \mathbf{x} \right)+\max \left\{ 0,-\frac{{{\mu }_{j}}}{{{r}_{j}}}-{{g}_{j}}\left( \mathbf{x} \right) \right\}=\max \left\{ {{g}_{j}}\left( \mathbf{x} \right),-\frac{{{\mu }_{j}}}{{{r}_{j}}} \right\}\)
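This max identity is easy to verify numerically; the quick sketch below checks it over a grid of illustrative values for \(g\), \(\mu\), and \(r\) (the values themselves are arbitrary).

```python
# Numerical check of the identity g + max(0, -mu/r - g) = max(g, -mu/r)
def identity_holds(g, mu, r):
    lhs = g + max(0.0, -mu / r - g)   # g_j(x) + theta_j*
    rhs = max(g, -mu / r)             # the claimed closed form
    return abs(lhs - rhs) < 1e-12

checks = [identity_holds(g, mu, r)
          for g in (-2.0, -0.5, 0.0, 0.5, 2.0)
          for mu in (0.0, 1.0, 3.0)
          for r in (1.0, 10.0)]       # all True
```

The two branches agree: when \(g<-\mu /r\) both sides equal \(-\mu /r\); otherwise both sides equal \(g\).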
Hence, the augmented Lagrangian for inequality constraints is transformed into the following simpler functional, which no longer contains the slack variables.
\(A\left( \mathbf{x},\mathbf{\mu },\mathbf{r} \right)=f\left( \mathbf{x} \right)+\sum\limits_{j=1}^{m}{\left[ {{\mu }_{j}}{{\Psi }_{j}}\left( \mathbf{x},{{\mu }_{j}},{{r}_{j}} \right)+\frac{1}{2}{{r}_{j}}{{\Psi }_{j}}{{\left( \mathbf{x},{{\mu }_{j}},{{r}_{j}} \right)}^{2}} \right]}\)
where \({{\Psi }_{j}}\left( \mathbf{x},{{\mu }_{j}},{{r}_{j}} \right)=\max \left\{ {{g}_{j}}\left( \mathbf{x} \right),-\frac{{{\mu }_{j}}}{{{r}_{j}}} \right\}\). The Lagrange multiplier update rule is defined as
\(\mu _{j}^{k+1}=\mu _{j}^{k}+r_{j}^{k}{{\Psi }_{j}}\left( {{\mathbf{x}}^{k}},\mu _{j}^{k},r_{j}^{k} \right),\text{ }j=1,2,...,m\).
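Putting the \(\Psi\) functional and the multiplier update together gives the inequality-constrained ALM loop. The sketch below is again a toy illustration, not AutoDesign's solver: it uses gradient descent for the inner minimization on the problem minimize \(x^2\) subject to \(g(x)=1-x\le 0\), whose solution is \(x^*=1\) with \(\mu^*=2\).

```python
# Minimal ALM sketch for one inequality constraint (illustrative only).
# Problem: min f(x) = x^2  subject to  g(x) = 1 - x <= 0  (solution x* = 1, mu* = 2).

def g(x):                        # inequality constraint g(x) <= 0
    return 1.0 - x

def psi(x, mu, r):               # Psi(x, mu, r) = max{ g(x), -mu/r }
    return max(g(x), -mu / r)

def grad_A(x, mu, r):            # derivative of A = f + mu*Psi + (r/2)*Psi^2 w.r.t. x
    if g(x) > -mu / r:           # constraint branch active: Psi = g(x), dPsi/dx = -1
        return 2.0 * x - (mu + r * g(x))
    return 2.0 * x               # Psi is constant in x: only df/dx remains

def alm_inequality(r=10.0, outer=15, inner=400, step=0.05):
    x, mu = 0.0, 0.0
    for _ in range(outer):
        for _ in range(inner):   # unconstrained minimization of A for fixed mu, r
            x -= step * grad_A(x, mu, r)
        mu = mu + r * psi(x, mu, r)   # mu^{k+1} = mu^k + r*Psi(x^k, mu^k, r^k)
    return x, mu

x_opt, mu_opt = alm_inequality()
```

Because \(\Psi\) saturates at \(-\mu /r\), inactive constraints stop contributing to the gradient, and the multipliers of active constraints converge just as in the equality case.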
To sequentially solve the augmented Lagrangian subproblems, AutoDesign uses the quasi-Newton (BFGS) method for \(n\le 50\) and the Hestenes-Stiefel conjugate gradient method for \(n>50\).
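For reference, the Hestenes-Stiefel variant of the conjugate gradient method can be sketched as follows. This is a generic illustration, not AutoDesign's internal implementation; it uses an exact line search on a diagonal quadratic, where the closed-form step length is available.

```python
# Generic CG with the Hestenes-Stiefel beta formula (illustrative sketch).
# Minimizes f(x) = 0.5*sum(a_i*x_i^2) - sum(b_i*x_i), whose solution is x_i = b_i/a_i.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [1.0, 1.0, 1.0, 1.0, 1.0]

def grad(x):                     # gradient of the diagonal quadratic
    return [ai * xi - bi for ai, xi, bi in zip(a, x, b)]

def exact_step(x, d):            # exact line search: alpha = -g.d / (d^T A d)
    gg = grad(x)
    return -sum(gi * di for gi, di in zip(gg, d)) / sum(ai * di * di for ai, di in zip(a, d))

def hestenes_stiefel_cg(x, iters=20, tol=1e-12):
    g = grad(x)
    d = [-gi for gi in g]        # initial direction: steepest descent
    for _ in range(iters):
        alpha = exact_step(x, d)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        if sum(gi * gi for gi in g_new) < tol:
            break
        y = [gn - go for gn, go in zip(g_new, g)]                 # y_k = g_{k+1} - g_k
        beta = (sum(gn * yi for gn, yi in zip(g_new, y))
                / sum(di * yi for di, yi in zip(d, y)))           # Hestenes-Stiefel beta
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

x_opt = hestenes_stiefel_cg([0.0] * 5)
```

On a quadratic with exact line search this recovers linear CG, which converges in at most \(n\) iterations; the HS formula is preferred for large \(n\) because it needs only gradient differences, no Hessian storage.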