In mathematical optimization, the method of Lagrange multipliers (named after Joseph-Louis Lagrange) is a strategy for finding the local maxima and minima of a function subject to equality constraints.
For instance, consider the optimization problem
maximize f(x, y)
subject to g(x, y) = c.
The method of Lagrange multipliers relies on the intuition that at a maximum, f(x, y) cannot be increasing in the direction of any neighboring point where g = c. If it were, we could walk along g = c to get higher, meaning that the starting point wasn't actually the maximum. Suppose we find a point where f does not change as we walk along g = c. There are two possible situations: either we are walking along a contour line of f, or we have reached a "level" part of f. In the first situation, the contour lines of f and g are tangent there, so the gradients of f and g are parallel. Thus we want points (x, y) where g(x, y) = c and ∇x,y f = −λ ∇x,y g for some λ. This condition also covers the second situation: when λ = 0, it means that f is "level".
To incorporate these conditions into one equation, we introduce an auxiliary function (the Lagrangian)
Λ(x, y, λ) = f(x, y) + λ (g(x, y) − c)
and solve
∇x,y,λ Λ(x, y, λ) = 0.
Note that ∇λ Λ(x, y, λ) = 0 implies g(x, y) = c.
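To make this concrete, here is a small worked example of my own (not from the original notes): maximize f(x, y) = x + y subject to g(x, y) = x² + y² = 1.

```latex
\begin{align*}
\Lambda(x, y, \lambda) &= x + y + \lambda\,(x^2 + y^2 - 1) \\
\partial_x \Lambda = 1 + 2\lambda x &= 0 \;\Rightarrow\; x = -\tfrac{1}{2\lambda} \\
\partial_y \Lambda = 1 + 2\lambda y &= 0 \;\Rightarrow\; y = -\tfrac{1}{2\lambda} \\
\partial_\lambda \Lambda = x^2 + y^2 - 1 &= 0 \;\Rightarrow\; \tfrac{1}{2\lambda^2} = 1 \;\Rightarrow\; \lambda = \pm\tfrac{1}{\sqrt{2}}
\end{align*}
```

Taking λ = −1/√2 gives the maximum f(1/√2, 1/√2) = √2, and indeed ∇x,y f = (1, 1) = −λ ∇x,y g at that point, since ∇x,y g = (√2, √2).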
The popular tf-idf scheme reduces documents of arbitrary length to fixed-length lists of numbers.
tf: term frequency, the word's frequency within a single document.
idf: inverse document frequency, the inverse of how widely the word appears across the corpus: if a word appears in N documents out of M total documents, its idf is M/N (in practice, usually the logarithm log(M/N) is used).
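A minimal sketch of the scheme in Python (my own toy example, not from the original notes; it assumes pre-tokenized documents and uses the common logarithmic idf variant):

```python
import math

# Toy corpus: each document is already tokenized (hypothetical data).
docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "barked"],
    ["cats", "and", "dogs"],
]

def tf(word, doc):
    """Term frequency: the word's relative frequency within one document."""
    return doc.count(word) / len(doc)

def idf(word, docs):
    """Inverse document frequency: log(M / N) for a word in N of M documents."""
    n = sum(1 for d in docs if word in d)
    return math.log(len(docs) / n) if n else 0.0

def tfidf_vector(doc, docs, vocab):
    """Reduce a variable-length document to a fixed-length tf-idf vector."""
    return [tf(w, doc) * idf(w, docs) for w in vocab]

vocab = sorted({w for d in docs for w in d})  # fixed vocabulary -> fixed length
for d in docs:
    print(tfidf_vector(d, docs, vocab))
```

Because every document is scored against the same fixed vocabulary, all vectors have the same length regardless of document length.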
Empirical Bayes, also known as maximum marginal likelihood, represents one approach for setting hyperparameters: choose the hyperparameters that maximize the marginal likelihood of the data, with the model's lower-level parameters integrated out.
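As an illustration (a minimal sketch of my own, assuming a beta-binomial model and made-up counts): each group's success rate gets a Beta(α, β) prior, the rates are integrated out analytically, and the hyperparameters are fit by maximizing the resulting marginal likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, gammaln

# Hypothetical data: k[i] successes out of n[i] trials in group i.
k = np.array([3.0, 7.0, 2.0, 9.0, 5.0])
n = np.array([10.0, 12.0, 8.0, 15.0, 11.0])

def neg_log_marginal(log_params):
    """Negative log marginal likelihood of a beta-binomial model.

    Each group's success rate is integrated out under a Beta(alpha, beta)
    prior, so this depends only on the hyperparameters alpha and beta.
    """
    alpha, beta = np.exp(log_params)  # optimize in log space to keep them positive
    log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    log_ml = log_binom + betaln(k + alpha, n - k + beta) - betaln(alpha, beta)
    return -np.sum(log_ml)

# Empirical Bayes / maximum marginal likelihood: fit the hyperparameters to the data.
res = minimize(neg_log_marginal, x0=np.log([1.0, 1.0]), method="Nelder-Mead")
alpha_hat, beta_hat = np.exp(res.x)
print(f"estimated prior: Beta({alpha_hat:.2f}, {beta_hat:.2f})")
```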
Here is a good tutorial on ML, MAP, and Bayesian estimation provided by Avi Kak (the "Trinity" tutorial).