Apply doubly stochastic matrix M to a probability vector, then entropy increases?
Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.
Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.
$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$
Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).
Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?
Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
add a comment |
Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.
Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.
$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$
Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).
Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?
Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
add a comment |
Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.
Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.
$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$
Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).
Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?
Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.
Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.
$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$
Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).
Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?
Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
co.combinatorics pr.probability it.information-theory entropy stochastic-matrices
asked Dec 1 at 17:44
Alexander Chervov
11.1k1260139
11.1k1260139
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
add a comment |
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)
Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.
add a comment |
This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$
add a comment |
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.
Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "504"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316661%2fapply-doubly-stochastic-matrix-m-to-a-probability-vector-then-entropy-increases%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
add a comment |
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
add a comment |
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.
This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.
Now, use the fact that $f=-H$ is convex.
For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).
Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
$$frac1{1-alpha}logsum_jp_j^alpha.$$
The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.
edited Dec 2 at 7:29
answered Dec 1 at 19:10
Denis Serre
29.1k791195
29.1k791195
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
add a comment |
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
Thank you. That seems to cover Renyi entropy also, is not it?
– Alexander Chervov
Dec 1 at 19:32
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
@AlexanderChervov. Yes, see my edit.
– Denis Serre
Dec 2 at 7:23
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
Thank you ! I should buy your book )))
– Alexander Chervov
Dec 11 at 19:04
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
@AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
– Denis Serre
Dec 11 at 19:27
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
Great collection of exercises ! I sent you a mail may be you add some from that...
– Alexander Chervov
Dec 11 at 20:04
add a comment |
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)
Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.
add a comment |
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)
Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.
add a comment |
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)
Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.
This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)
Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.
answered Dec 1 at 18:39
Puck Rombach
418212
418212
add a comment |
add a comment |
This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$
add a comment |
This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$
add a comment |
This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$
This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
$$
D(alpha|beta) ge D(alpha P|beta P) ;.
$$
In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
$$
D(alpha | m_X) = log text{card} X - H(alpha) ;.
$$
answered Dec 1 at 18:55
R W
10.1k22047
10.1k22047
add a comment |
add a comment |
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.
Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.
add a comment |
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.
Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.
add a comment |
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.
Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.
The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.
The proof (Question 2) is quite simple:
Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
begin{align}
mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
end{align}
Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.
Now,
begin{align}
H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
end{align}
which implies
begin{align}
H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
&= H(mathbf{x}) + I(mathbf{y};pmb{pi})
end{align}
where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.
For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.
answered Dec 1 at 19:25
Algernon
9591712
9591712
add a comment |
add a comment |
Thanks for contributing an answer to MathOverflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316661%2fapply-doubly-stochastic-matrix-m-to-a-probability-vector-then-entropy-increases%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown