It's perhaps worth reading about Lagrangian duality and a broader relation (at times equivalence) between:
- optimization subject to hard (i.e. inviolable) constraints
- optimization with penalties for violating constraints.
Quick intro to weak duality and strong duality
Assume we have some function $f(x, y)$ of two variables. For any $\hat{x}$ and $\hat{y}$, we have:
$$\min_x f(x, \hat{y}) \le f(\hat{x}, \hat{y}) \le \max_y f(\hat{x}, y)$$
Since that holds for any $\hat{x}$ and $\hat{y}$, it also holds that:
$$\max_y \min_x f(x, y) \le \min_x \max_y f(x, y)$$
This is known as weak duality. In certain circumstances, you also have strong duality (also known as the saddle point property):
$$\max_y \min_x f(x, y) = \min_x \max_y f(x, y)$$
Call the min-max problem the primal and the max-min problem the dual. When strong duality holds, solving the dual problem also solves the primal problem. They're in a sense the same problem!
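As a rough numerical illustration, the two orderings can be compared on a finite grid. This is only a sketch: the helper name `max_min_and_min_max` and both example functions are my own choices for illustration. The first table has a gap between the two sides (weak duality only), while the saddle-shaped $f(x, y) = x^2 - y^2$ gives equality.

```python
import numpy as np

def max_min_and_min_max(F):
    """F[i, j] stands in for f(x_i, y_j). Returns (max_y min_x f, min_x max_y f)."""
    return F.min(axis=0).max(), F.max(axis=1).min()

# A 2x2 table ("matching pennies") with no saddle point: the inequality is strict.
F_pennies = np.array([[1.0, -1.0],
                      [-1.0, 1.0]])
pennies_max_min, pennies_min_max = max_min_and_min_max(F_pennies)

# f(x, y) = x^2 - y^2 on a grid: convex in x, concave in y, saddle point at (0, 0).
grid = np.linspace(-1.0, 1.0, 201)
F_saddle = grid[:, None] ** 2 - grid[None, :] ** 2
saddle_max_min, saddle_min_max = max_min_and_min_max(F_saddle)

print("pennies:", pennies_max_min, "<=", pennies_min_max)
print("saddle: ", saddle_max_min, "vs", saddle_min_max)
```

In both cases the max-min value never exceeds the min-max value; only the saddle-shaped function closes the gap.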
Lagrangian for constrained Ridge Regression
Let me define the function $L$ as:
$$L(b, \lambda) = \sum_{i=1}^n (y_i - x_i \cdot b)^2 + \lambda \left( \sum_{j=1}^p b_j^2 - t \right)$$
The min-max interpretation of the Lagrangian
The Ridge regression problem subject to hard constraints is:
$$\min_b \max_{\lambda \ge 0} L(b, \lambda)$$
You pick $b$ to minimize the objective, cognizant that after $b$ is picked, your opponent will set $\lambda$ to infinity if you chose $b$ such that $\sum_{j=1}^p b_j^2 > t$.
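To spell out the opponent's move: for a fixed $b$, the best $\lambda \ge 0$ is $0$ whenever the constraint is satisfied with slack, and is unbounded whenever it is violated, so the inner maximization works out to

$$\max_{\lambda \ge 0} L(b, \lambda) = \begin{cases} \sum_{i=1}^n (y_i - x_i \cdot b)^2 & \text{if } \sum_{j=1}^p b_j^2 \le t \\ +\infty & \text{otherwise} \end{cases}$$

Minimizing this over $b$ is therefore exactly least squares restricted to $\sum_{j=1}^p b_j^2 \le t$.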
If strong duality holds (which it does here: the problem is convex, and Slater's condition is satisfied for $t > 0$ because $b = 0$ is strictly feasible), you then achieve the same result by reversing the order:
$$\max_{\lambda \ge 0} \min_b L(b, \lambda)$$
Here, your opponent chooses $\lambda$ first! You then choose $b$ to minimize the objective, already knowing their choice of $\lambda$. The $\min_b L(b, \lambda)$ part (taking $\lambda$ as given) is equivalent to the 2nd form of your Ridge Regression problem.
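A small numerical check of this equivalence, as a sketch only: the simulated data, the value of `t`, and the helper names `ridge` and `dual` are all assumptions made for illustration. The idea is to solve the penalized form for a given $\lambda$, search for the $\lambda$ at which the solution sits exactly on the constraint boundary, and then compare the primal and dual values.

```python
import numpy as np

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=n)
t = 1.0  # bound on sum_j b_j^2, small enough that the constraint binds

def ridge(lam):
    """argmin over b of L(b, lam): the penalized form, via the normal equations."""
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def dual(lam):
    """g(lam) = min over b of L(b, lam)."""
    b = ridge(lam)
    return np.sum((y - X @ b) ** 2) + lam * (b @ b - t)

# The squared norm of ridge(lam) shrinks as lam grows, so bisect for the
# lam at which it equals t.
lo, hi = 0.0, 1e6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    b_mid = ridge(mid)
    if b_mid @ b_mid > t:
        lo = mid
    else:
        hi = mid
lam_star = hi
b_star = ridge(lam_star)

primal_value = np.sum((y - X @ b_star) ** 2)  # b_star satisfies the constraint
dual_value = dual(lam_star)
print(lam_star, primal_value, dual_value)
```

Because `b_star` is feasible for the hard-constrained problem and its objective value matches the dual value, weak duality implies that `b_star` is optimal for the constrained problem too.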
As you can see, this isn't a result particular to Ridge regression. It is a broader concept.
References
(I started this post following an exposition I read from Rockafellar.)
Rockafellar, R.T., Convex Analysis
You might also examine lectures 7 and 8 from Prof. Stephen Boyd's course on convex optimization.