Homework:

In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) is a strategy for finding the local maxima and minima of a function subject to equality constraints.

For instance, consider the optimization problem

maximize f(x,y)

subject to g(x,y)=c.

The method of Lagrange multipliers relies on the intuition that at a maximum, *f*(*x*, *y*) cannot be increasing in the direction of any neighboring point where *g* = *c*. If it were, we could walk along *g* = *c* to get higher, meaning that the starting point wasn't actually the maximum. Suppose we walk along the contour *g* = *c* and look for points where *f* does not change as we walk. There are two ways this can happen: either we are walking along one of the contour lines of *f*, or we have reached a "level" part of *f*, where *f* does not change in any direction. To check the first possibility, note that the gradient of a function is perpendicular to its contour lines, so the contour lines of *f* and *g* are parallel exactly when the gradients of *f* and *g* are parallel. Thus we want points (*x*, *y*) where *g*(*x*, *y*) = *c* and ∇_{x,y} f = -λ ∇_{x,y} g for some λ. The same condition also covers the second possibility: when *f* is "level", its gradient is zero, so λ = 0 is a solution regardless of *g*.

To incorporate these conditions into one equation, we introduce an auxiliary function

Λ(x,y,λ)=f(x,y)+λ(g(x,y)-c)

and solve

∇_{x,y,λ} Λ(x,y,λ)=0. Note that ∇_λ Λ(x,y,λ)=0 implies g(x,y)=c.
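
As a small worked example of the recipe above, take the illustrative choice f(x,y) = x + y with the constraint g(x,y) = x² + y² = 1. This particular f, g and c are assumptions picked for simplicity and are not part of the original notes. Setting each partial derivative of the auxiliary function to zero gives:

```latex
\begin{align*}
\Lambda(x, y, \lambda) &= x + y + \lambda\,(x^2 + y^2 - 1) \\
\frac{\partial \Lambda}{\partial x} &= 1 + 2\lambda x = 0
  \quad\Rightarrow\quad x = -\frac{1}{2\lambda} \\
\frac{\partial \Lambda}{\partial y} &= 1 + 2\lambda y = 0
  \quad\Rightarrow\quad y = -\frac{1}{2\lambda} \\
\frac{\partial \Lambda}{\partial \lambda} &= x^2 + y^2 - 1 = 0
  \quad\Rightarrow\quad \frac{1}{2\lambda^2} = 1
  \quad\Rightarrow\quad \lambda = \pm\frac{1}{\sqrt{2}}
\end{align*}
```

With λ = -1/√2 we get x = y = 1/√2 and f = √2, the constrained maximum. With λ = +1/√2 we get x = y = -1/√2 and f = -√2, the constrained minimum. The method only finds stationary points, so the candidates still have to be compared to decide which is the maximum.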

The popular tf-idf scheme reduces documents of arbitrary length to fixed-length lists of numbers.

tf: term frequency, the word’s frequency in a document.

idf: inverse document frequency, the inverse of the fraction of documents that contain the word. For example, if a word appears in N of the M documents in total, then its idf is M/N.
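
The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes tf is the raw count of a word in a document, uses the plain ratio idf = M/N exactly as stated above (many implementations take a logarithm of this ratio instead), and the function name `tf_idf_vectors` and the toy documents are hypothetical.

```python
from collections import Counter


def tf_idf_vectors(documents):
    """Map tokenized documents to fixed-length lists of tf * idf weights.

    Assumptions: tf is the raw word count, idf is the plain ratio M/N.
    """
    # The shared vocabulary fixes the length and order of every vector.
    vocabulary = sorted({word for doc in documents for word in doc})
    total_docs = len(documents)  # M in the text
    # N in the text: the number of documents that contain each word.
    doc_freq = {
        word: sum(1 for doc in documents if word in doc)
        for word in vocabulary
    }
    vectors = []
    for doc in documents:
        counts = Counter(doc)  # tf: missing words count as 0
        vectors.append(
            [counts[word] * (total_docs / doc_freq[word]) for word in vocabulary]
        )
    return vocabulary, vectors


docs = [
    "the cat sat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
vocab, vecs = tf_idf_vectors(docs)
for doc, vec in zip(docs, vecs):
    print(" ".join(doc), "->", vec)
```

Every document, whatever its length, ends up as a list with one entry per vocabulary word. A word such as "the" that occurs in every document gets the smallest idf (M/N = 1), while a word that occurs in only one document gets the largest.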

Empirical Bayes, also known as maximum marginal likelihood, represents one approach for setting hyperparameters.
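
Written as a formula, the idea is to pick the hyperparameters that maximize the likelihood of the data after the model parameters have been integrated out. The symbols here are assumptions for illustration: D denotes the observed data, θ the model parameters, and α the hyperparameters.

```latex
\hat{\alpha}
  = \arg\max_{\alpha} \; p(\mathcal{D} \mid \alpha)
  = \arg\max_{\alpha} \int p(\mathcal{D} \mid \theta)\, p(\theta \mid \alpha)\, d\theta
```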

Here is a good tutorial on ML, MAP, and Bayesian estimation provided by Avi Kak: Trinity