References:
[1] E.T. Jaynes, "Information Theory and Statistical Mechanics", Physical Review,
106, pp. 620-630, 1957.
[2] S.C. Zhu, Y.N. Wu, and D.B. Mumford, "Minimax Entropy Principle and Its Applications to
Texture Modeling", Neural Computation, 9, pp. 1627-1660, Nov. 1997.
[3] S.C. Zhu, Y.N. Wu, and D.B. Mumford, "FRAME: Filters, Random fields And Maximum Entropy -
towards a unified theory for texture modeling", Int'l Journal of Computer Vision, 27(2),
pp. 1-20, 1998.
Minimax Entropy: A Mathematical Theory for Descriptive Learning
Left figure: The red cloud represents a joint density P in a high-dimensional space. A set of axes (blue) is chosen to observe the distribution, like viewing a room through an aperture.
Right figure: What is observed along each axis is a marginal distribution (the projection of the density P onto that axis), which is conveniently estimated by an empirical histogram.
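As a minimal sketch of this observation step, the Python below projects samples of a toy 3-D density onto a few axes and estimates each marginal with a normalized histogram. The correlated Gaussian "cloud", the axes, the bin range, and the function names `unit` and `marginal_histogram` are all assumptions made for illustration.

```python
import math
import random


def unit(axis):
    """Scale an axis vector to unit length so projections are comparable."""
    norm = math.sqrt(sum(a * a for a in axis))
    return [a / norm for a in axis]


def marginal_histogram(samples, axis, bins=12, lo=-3.0, hi=3.0):
    """Normalized histogram of the samples projected onto `axis`.

    Projections outside [lo, hi) are clipped into the end bins, and the
    counts are divided by the sample size so the bins sum to 1.
    """
    u = unit(axis)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        t = sum(xi * ui for xi, ui in zip(x, u))  # projection onto the axis
        k = int((t - lo) // width)
        counts[min(max(k, 0), bins - 1)] += 1
    return [c / len(samples) for c in counts]


# Toy "red cloud" in K = 3 dimensions: the first two coordinates share a
# common factor z, the third is independent (an assumption for illustration).
random.seed(0)
cloud = []
for _ in range(5000):
    z = random.gauss(0.0, 1.0)
    cloud.append((z + 0.3 * random.gauss(0.0, 1.0),
                  z + 0.3 * random.gauss(0.0, 1.0),
                  random.gauss(0.0, 1.0)))

# Hypothetical viewing axes; each yields one observed marginal distribution.
for axis in [(1, 0, 0), (0, 0, 1), (1, -1, 0)]:
    h = marginal_histogram(cloud, axis)
    print(axis, " ".join(f"{p:.2f}" for p in h))
```

Along (1, -1, 0) the shared factor cancels, so that marginal is much narrower than the one along (1, 0, 0); a different choice of axes exposes different structure of the same cloud.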

The minimax entropy learning theory studied by Zhu, Wu, and Mumford
[2,3] provides a mathematically rigorous scheme for learning
high-dimensional densities. The general idea is illustrated in the
figure above. Consider a density in a K-dimensional space (the red
cloud in the left figure, with K = 3). As observations, one can measure
its marginal distributions along various axes, which are projections of
the density onto those axes (see the right figure). One then constructs
a model that reproduces all observed statistics (marginal distributions).
Among all densities satisfying these constraints, we choose the one with
maximum entropy [3]. This is posed as a constrained optimization problem,
whose solution is a Gibbs (Markov random field) model [1,2]. The axes
(observations) must be chosen to be informative, in the sense that the
constructed model approximates the underlying density by minimizing the
Kullback-Leibler divergence (or cross entropy). This leads to the
general minimax entropy principle:
1. We choose the best features and statistics F to minimize the entropy
of the model.
2. For the chosen F, we choose the parameters β (the model) that give
maximum entropy.
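The two steps can be written compactly as follows. The notation (statistics H_α with observed values μ̂_α, true density f, the set Ω_F of densities that reproduce the observed statistics, normalizing constant Z(β)) is an assumption loosely modeled on [2,3], not a transcription of those papers. The KL identity holds when the observed statistics equal their expectations under f.

```latex
% Step 2 (maximum entropy): for a fixed feature set F,
%   \Omega_F = \{\, p : E_p[H_\alpha(\mathbf{I})] = \hat{\mu}_\alpha,\ \alpha = 1,\dots,n \,\}.
% Maximizing entropy over \Omega_F yields the Gibbs model
p(\mathbf{I};\beta,F) = \frac{1}{Z(\beta)}
  \exp\Big\{-\sum_{\alpha=1}^{n} \langle \beta_\alpha, H_\alpha(\mathbf{I}) \rangle\Big\}.

% Step 1 (minimum entropy): if \hat{\mu}_\alpha = E_f[H_\alpha(\mathbf{I})], then
\mathrm{KL}\big(f \,\|\, p(\cdot;\beta,F)\big)
  = \mathrm{entropy}\big(p(\cdot;\beta,F)\big) - \mathrm{entropy}(f),
% and entropy(f) does not depend on F, so minimizing KL over F means
F^{*} = \arg\min_{F}\ \max_{p \in \Omega_F}\ \mathrm{entropy}(p).
```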
Minimax entropy provides a unified scheme both for learning homogeneous and inhomogeneous Gibbs models and for verifying them. The figure below illustrates this scheme.
The minimax entropy
learning scheme [2,3]: Given
a set of training images as instances of a pattern (texture, shape)
generated by some underlying stochastic process, we pursue a Gibbs
model by choosing a set of informative features (minimizing entropy);
a maximum entropy distribution is then learned and verified through
a general MCMC sampling process. The process ends when the MCMC samples
appear indistinguishable from the observed ones.
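The loop in this caption can be sketched on a deliberately tiny problem where every quantity is computable. Everything here is an assumption made for illustration, not the method of [2,3]: "images" are cyclic binary strings of length N = 8, the candidate features are agreement rates at shifts 1 to 3 (instead of filter histograms), the maximum entropy parameters are fitted by gradient ascent with exact expectations over all 256 states, feature pursuit greedily picks the candidate whose fitted model has the lowest entropy, and verification compares single-site Gibbs-sampler statistics with the observed ones within a hypothetical tolerance. All function names are hypothetical.

```python
import itertools
import math
import random

N = 8               # length of a toy cyclic binary "image" (assumption)
SHIFTS = [1, 2, 3]  # hypothetical candidate features: agreement rate at shift d

def stats(x):
    """h_d(x): fraction of sites i with x[i] == x[(i + d) % N], one value per shift d."""
    return [sum(x[i] == x[(i + d) % N] for i in range(N)) / N for d in SHIFTS]

STATES = list(itertools.product((0, 1), repeat=N))  # all 2^N configurations
STATS = [stats(x) for x in STATES]

def gibbs(beta, feats):
    """Exact p(x) proportional to exp(-sum_a beta_a * h_a(x)) over the selected features."""
    w = [math.exp(-sum(b * s[a] for b, a in zip(beta, feats))) for s in STATS]
    z = sum(w)
    return [wi / z for wi in w]

def expect(p, a):
    return sum(pi * s[a] for pi, s in zip(p, STATS))

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def fit(feats, h_obs, steps=300, lr=4.0):
    """Max-entropy step: gradient ascent on the log-likelihood until E_p[h_a] matches h_obs[a]."""
    beta = [0.0] * len(feats)
    for _ in range(steps):
        p = gibbs(beta, feats)
        beta = [b + lr * (expect(p, a) - h_obs[a]) for b, a in zip(beta, feats)]
    return beta, gibbs(beta, feats)

def sample(beta, feats, rng, sweeps=2000, burn_in=100):
    """Single-site Gibbs sampler for the learned model; returns h(x) after each post-burn-in sweep."""
    x = [rng.randint(0, 1) for _ in range(N)]
    out = []
    for sweep in range(sweeps):
        for i in range(N):
            energy = []
            for v in (0, 1):
                x[i] = v
                h = stats(x)
                energy.append(sum(b * h[a] for b, a in zip(beta, feats)))
            # P(x_i = 1 | rest) = 1 / (1 + exp(E(1) - E(0)))
            x[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(energy[1] - energy[0])) else 0
        if sweep >= burn_in:
            out.append(stats(x))
    return out

def verify(beta, feats, h_obs, rng, tol=0.03):
    """Verification: do MCMC samples reproduce every observed statistic within tol?"""
    hs = sample(beta, feats, rng)
    return all(abs(sum(h[a] for h in hs) / len(hs) - h_obs[a]) < tol
               for a in range(len(SHIFTS)))

def learn(h_obs, rng):
    """Minimax loop: add the feature whose max-ent model has minimum entropy, until verified."""
    feats, beta = [], []
    while len(feats) < len(SHIFTS) and not verify(beta, feats, h_obs, rng):
        best = None
        for a in range(len(SHIFTS)):
            if a not in feats:
                b, p = fit(feats + [a], h_obs)
                if best is None or entropy(p) < best[0]:
                    best = (entropy(p), feats + [a], b)
        _, feats, beta = best
    return feats, beta

def make_training_data(m, rng, flip=0.1):
    """Hypothetical 'texture': x[i] copies x[i - 2] except with probability flip."""
    data = []
    for _ in range(m):
        x = [rng.randint(0, 1), rng.randint(0, 1)]
        for i in range(2, N):
            x.append(x[i - 2] if rng.random() >= flip else 1 - x[i - 2])
        data.append(x)
    return data

rng = random.Random(0)
data = make_training_data(500, rng)
h_obs = [sum(stats(x)[a] for x in data) / len(data) for a in range(len(SHIFTS))]
feats, beta = learn(h_obs, rng)
print("observed statistics:", [round(h, 3) for h in h_obs])
print("selected shifts:", [SHIFTS[a] for a in feats], "beta:", [round(b, 2) for b in beta])
```

With this generating process only the shift-2 statistic departs from 0.5, so shift 2 should be selected first; whether further features are added depends on the Monte Carlo noise in the verification step. Exact enumeration is used only because the toy state space is small; realistic image models need MCMC for the expectations as well.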