Apply doubly stochastic matrix M to a probability vector, then entropy increases?












2














Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.



Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.



$$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$



Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).



Question 3 What are generalizations,
in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?





Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.










share|cite|improve this question



























    2














    Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
    and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.



    Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.



    $$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$



    Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).



    Question 3 What are generalizations,
    in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?





    Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.










    share|cite|improve this question

























      2












      2








      2







      Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
      and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.



      Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.



      $$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$



      Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).



      Question 3 What are generalizations,
      in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?





      Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.










      share|cite|improve this question













      Consider a vector $p =(p_1,...p_n)$, $p_i>0$, $sum p_i = 1$
      and a matrix $M_{ij}$, which is doubly stochastic: $sum_i M_{ij} = 1, sum_j M_{ij} = 1, M_{ij} > 0$.



      Question 1 Just apply matrix M to a vector $p$ , i.e. $q = Mp$ (i.e. $q_i = sum_j M_{ij} p_j$) is it true that entropy of new vector $q$ is greater then of original vector $p$ ? I.e.



      $$ H(q) = -sum_i q_i ln(q_i) > -sum_i p_i ln(p_i) = H(p) $$



      Question 2 Is there a simple proof of it ? (It might follow from the Gibb's inequality, but it does not seem obvious for me).



      Question 3 What are generalizations,
      in view of meta-principle "entropy always grows" that might be an example of some more general phenomena ?





      Motivation: There is a meta-principle that entropy grows is some natural systems, matrix applied to a vector is probably the most simple system one can consider (Markov chain), so the question above arises. After some thinking one can restrict from all matrices to doubly stochastic, because $M^n v$ tends to a uniform distribution (which has maximal entropy of all) only for doubly stochastic $M$ , so it cannot be true for general $M$.







      co.combinatorics pr.probability it.information-theory entropy stochastic-matrices






      share|cite|improve this question













      share|cite|improve this question











      share|cite|improve this question




      share|cite|improve this question










      asked Dec 1 at 17:44









      Alexander Chervov

      11.1k1260139




      11.1k1260139






















          4 Answers
          4






          active

          oldest

          votes


















          4














          This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.



          Now, use the fact that $f=-H$ is convex.



          For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



          Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
          $$frac1{1-alpha}logsum_jp_j^alpha.$$
          The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.






          share|cite|improve this answer























          • Thank you. That seems to cover Renyi entropy also, is not it?
            – Alexander Chervov
            Dec 1 at 19:32










          • @AlexanderChervov. Yes, see my edit.
            – Denis Serre
            Dec 2 at 7:23










          • Thank you ! I should buy your book )))
            – Alexander Chervov
            Dec 11 at 19:04










          • @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
            – Denis Serre
            Dec 11 at 19:27










          • Great collection of exercises ! I sent you a mail may be you add some from that...
            – Alexander Chervov
            Dec 11 at 20:04



















          1














          This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



          Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.






          share|cite|improve this answer





























            1














            This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
            $$
            D(alpha|beta) ge D(alpha P|beta P) ;.
            $$

            In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
            $$
            D(alpha | m_X) = log text{card} X - H(alpha) ;.
            $$






            share|cite|improve this answer





























              1














              The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



              The proof (Question 2) is quite simple:



              Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
              begin{align}
              mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
              end{align}

              Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



              Now,
              begin{align}
              H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
              H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
              end{align}

              which implies
              begin{align}
              H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
              &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
              end{align}

              where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



              For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.






              share|cite|improve this answer





















                Your Answer





                StackExchange.ifUsing("editor", function () {
                return StackExchange.using("mathjaxEditing", function () {
                StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
                StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
                });
                });
                }, "mathjax-editing");

                StackExchange.ready(function() {
                var channelOptions = {
                tags: "".split(" "),
                id: "504"
                };
                initTagRenderer("".split(" "), "".split(" "), channelOptions);

                StackExchange.using("externalEditor", function() {
                // Have to fire editor after snippets, if snippets enabled
                if (StackExchange.settings.snippets.snippetsEnabled) {
                StackExchange.using("snippets", function() {
                createEditor();
                });
                }
                else {
                createEditor();
                }
                });

                function createEditor() {
                StackExchange.prepareEditor({
                heartbeatType: 'answer',
                autoActivateHeartbeat: false,
                convertImagesToLinks: true,
                noModals: true,
                showLowRepImageUploadWarning: true,
                reputationToPostImages: 10,
                bindNavPrevention: true,
                postfix: "",
                imageUploader: {
                brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
                contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
                allowUrls: true
                },
                noCode: true, onDemand: true,
                discardSelector: ".discard-answer"
                ,immediatelyShowMarkdownHelp:true
                });


                }
                });














                draft saved

                draft discarded


















                StackExchange.ready(
                function () {
                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316661%2fapply-doubly-stochastic-matrix-m-to-a-probability-vector-then-entropy-increases%23new-answer', 'question_page');
                }
                );

                Post as a guest















                Required, but never shown

























                4 Answers
                4






                active

                oldest

                votes








                4 Answers
                4






                active

                oldest

                votes









                active

                oldest

                votes






                active

                oldest

                votes









                4














                This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.



                Now, use the fact that $f=-H$ is convex.



                For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



                Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
                $$frac1{1-alpha}logsum_jp_j^alpha.$$
                The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.






                share|cite|improve this answer























                • Thank you. That seems to cover Renyi entropy also, is not it?
                  – Alexander Chervov
                  Dec 1 at 19:32










                • @AlexanderChervov. Yes, see my edit.
                  – Denis Serre
                  Dec 2 at 7:23










                • Thank you ! I should buy your book )))
                  – Alexander Chervov
                  Dec 11 at 19:04










                • @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                  – Denis Serre
                  Dec 11 at 19:27










                • Great collection of exercises ! I sent you a mail may be you add some from that...
                  – Alexander Chervov
                  Dec 11 at 20:04
















                4














                This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.



                Now, use the fact that $f=-H$ is convex.



                For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



                Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
                $$frac1{1-alpha}logsum_jp_j^alpha.$$
                The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.






                share|cite|improve this answer























                • Thank you. That seems to cover Renyi entropy also, is not it?
                  – Alexander Chervov
                  Dec 1 at 19:32










                • @AlexanderChervov. Yes, see my edit.
                  – Denis Serre
                  Dec 2 at 7:23










                • Thank you ! I should buy your book )))
                  – Alexander Chervov
                  Dec 11 at 19:04










                • @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                  – Denis Serre
                  Dec 11 at 19:27










                • Great collection of exercises ! I sent you a mail may be you add some from that...
                  – Alexander Chervov
                  Dec 11 at 20:04














                4












                4








                4






                This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.



                Now, use the fact that $f=-H$ is convex.



                For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



                Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
                $$frac1{1-alpha}logsum_jp_j^alpha.$$
                The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.






                share|cite|improve this answer














                This is a particular case of the following general principle. On the one hand, a bistochastic matrix $M$ has the property that for every non-negative vector (say, a probability vector), $Mpsucc p$. On another hand, the order $succ$ can be defined by $xsucc y$ iff $sum_if(x_i)lesum_if(y_i)$ for every convex function $f$.



                Now, use the fact that $f=-H$ is convex.



                For proofs, see my book Matrices (Springer-Verlag GTM #216, second edition) at sections 6.5 (Proposition 6.4) and 8.5 (Theorem 8.5).



                Edit (at Alexander's request.) Let $alphane1$ be a positive parameter. The Renyi entropy is
                $$frac1{1-alpha}logsum_jp_j^alpha.$$
                The map $pmapsto Mp$ does increase Renyi entropy. Proof: if $alpha>1$, the map $tmapsto t^alpha$ is convex, and $smapstofrac1{1-alpha}log s$ is increasing. If instead $alpha<1$, the map $tmapsto t^alpha$ is concave, and $smapstofrac1{1-alpha}log s$ is decreasing.







                share|cite|improve this answer














                share|cite|improve this answer



                share|cite|improve this answer








                edited Dec 2 at 7:29

























                answered Dec 1 at 19:10









                Denis Serre

                29.1k791195




                29.1k791195












                • Thank you. That seems to cover Renyi entropy also, is not it?
                  – Alexander Chervov
                  Dec 1 at 19:32










                • @AlexanderChervov. Yes, see my edit.
                  – Denis Serre
                  Dec 2 at 7:23










                • Thank you ! I should buy your book )))
                  – Alexander Chervov
                  Dec 11 at 19:04










                • @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                  – Denis Serre
                  Dec 11 at 19:27










                • Great collection of exercises ! I sent you a mail may be you add some from that...
                  – Alexander Chervov
                  Dec 11 at 20:04


















                • Thank you. That seems to cover Renyi entropy also, is not it?
                  – Alexander Chervov
                  Dec 1 at 19:32










                • @AlexanderChervov. Yes, see my edit.
                  – Denis Serre
                  Dec 2 at 7:23










                • Thank you ! I should buy your book )))
                  – Alexander Chervov
                  Dec 11 at 19:04










                • @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                  – Denis Serre
                  Dec 11 at 19:27










                • Great collection of exercises ! I sent you a mail may be you add some from that...
                  – Alexander Chervov
                  Dec 11 at 20:04
















                Thank you. That seems to cover Renyi entropy also, is not it?
                – Alexander Chervov
                Dec 1 at 19:32




                Thank you. That seems to cover Renyi entropy also, is not it?
                – Alexander Chervov
                Dec 1 at 19:32












                @AlexanderChervov. Yes, see my edit.
                – Denis Serre
                Dec 2 at 7:23




                @AlexanderChervov. Yes, see my edit.
                – Denis Serre
                Dec 2 at 7:23












                Thank you ! I should buy your book )))
                – Alexander Chervov
                Dec 11 at 19:04




                Thank you ! I should buy your book )))
                – Alexander Chervov
                Dec 11 at 19:04












                @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                – Denis Serre
                Dec 11 at 19:27




                @AlexanderChervov. It is concise, and stands at the graduate level at every chapter but the two first. Visit also my web page perso.ens-lyon.fr/serre/DPF/exobis.pdf with 469 exercises (up today) which display additional material.
                – Denis Serre
                Dec 11 at 19:27












                Great collection of exercises ! I sent you a mail may be you add some from that...
                – Alexander Chervov
                Dec 11 at 20:04




                Great collection of exercises ! I sent you a mail may be you add some from that...
                – Alexander Chervov
                Dec 11 at 20:04











                1














                This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



                Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.






                share|cite|improve this answer


























                  1














                  This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



                  Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.






                  share|cite|improve this answer
























                    1












                    1








                    1






                    This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



                    Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.






                    share|cite|improve this answer












                    This is true, except that we have equality when $p$ is the uniform distribution, which is the limit distribution of your matrix. (Proof in here, 15.5, for example.)



                    Whenever $p_igeq 1/n$, it is not so hard to show that $q_ileq p_i$, and when $p_ileq 1/n$, we have $q_igeq p_i$. From there, it follows that $-sum_i p_i ln (q_i) leq -sum_i q_i ln (q_i) $, and this gives the result combined with Gibbs'.







                    share|cite|improve this answer












                    share|cite|improve this answer



                    share|cite|improve this answer










                    answered Dec 1 at 18:39









                    Puck Rombach

                    418212




                    418212























                        1














                        This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
                        $$
                        D(alpha|beta) ge D(alpha P|beta P) ;.
                        $$

                        In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
                        $$
                        D(alpha | m_X) = log text{card} X - H(alpha) ;.
                        $$






                        share|cite|improve this answer


























                          1














                          This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
                          $$
                          D(alpha|beta) ge D(alpha P|beta P) ;.
                          $$

                          In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
                          $$
                          D(alpha | m_X) = log text{card} X - H(alpha) ;.
                          $$






                          share|cite|improve this answer
























                            1












                            1








                            1






                            This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
                            $$
                            D(alpha|beta) ge D(alpha P|beta P) ;.
                            $$

                            In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
                            $$
                            D(alpha | m_X) = log text{card} X - H(alpha) ;.
                            $$






                            share|cite|improve this answer












                            This is a particular case of a well-known monotonicity property of the Kullback-Leibler divergence $D(cdot|cdot)$. Namely, if $alpha$ and $beta$ are any two distributions on the same (say, finite) space $X$, and $P$ is a Markov operator on $X$, then
                            $$
                            D(alpha|beta) ge D(alpha P|beta P) ;.
                            $$

                            In your case double stochasticity of $P$ means that its stationary distribution is the uniform distribution $m_X$ on $X$, whereas
                            $$
                            D(alpha | m_X) = log text{card} X - H(alpha) ;.
                            $$







                            share|cite|improve this answer












                            share|cite|improve this answer



                            share|cite|improve this answer










                            answered Dec 1 at 18:55









                            R W

                            10.1k22047




                            10.1k22047























                                1














                                The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                The proof (Question 2) is quite simple:



                                Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
                                begin{align}
                                mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
                                end{align}

                                Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



                                Now,
                                begin{align}
                                H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
                                H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
                                end{align}

                                which implies
                                begin{align}
                                H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
                                &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
                                end{align}

                                where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



                                For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.






                                share|cite|improve this answer


























                                  1














                                  The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                  The proof (Question 2) is quite simple:



                                  Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
                                  begin{align}
                                  mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
                                  end{align}

                                  Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



                                  Now,
                                  begin{align}
                                  H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
                                  H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
                                  end{align}

                                  which implies
                                  begin{align}
                                  H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
                                  &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
                                  end{align}

                                  where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



                                  For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.






                                  share|cite|improve this answer
























                                    1












                                    1








                                    1






                                    The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                    The proof (Question 2) is quite simple:



                                    Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
                                    begin{align}
                                    mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
                                    end{align}

                                    Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



                                    Now,
                                    begin{align}
                                    H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
                                    H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
                                    end{align}

                                    which implies
                                    begin{align}
                                    H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
                                    &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
                                    end{align}

                                    where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



                                    For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.






                                    share|cite|improve this answer












                                    The answer to Question 1 is yes as long as $p$ is not the uniform distribution $(frac{1}{n},frac{1}{n},ldots,frac{1}{n})$.



                                    The proof (Question 2) is quite simple:



                                    Recall from the Birkhoff–von Neumann theorem that every doubly stochastic matrix $M$ is a convex combination of permutation matrices. We can interpret this as saying that there is a distribution $theta$ on the set of all permutations of $A:={1,2,ldots,n}$ such that whenever $mathbf{x}$ is a random variable from $A$ and $pmb{pi}$ is a random permutation of $A$ with distribution $theta$ independent of $mathbf{x}$, and we set $mathbf{y}:=pmb{pi}(mathbf{x})$, then we have
                                    begin{align}
                                    mathbb{P}(mathbf{y}=i,|,mathbf{x}=j) &= M_{i,j} ;.
                                    end{align}

                                    Note that if $mathbf{x}$ is distributed as $p$, then $mathbf{y}$ is distributed as $q:=Mp$, so $H(mathbf{x})=H(p)$ and $H(mathbf{y})=H(q)$.



                                    Now,
                                    begin{align}
                                    H(mathbf{y},pmb{pi}) &= H(mathbf{y}) + H(pmb{pi},|,mathbf{y}) ;, \
                                    H(mathbf{y},pmb{pi}) &= H(pmb{pi}) + underbrace{H(mathbf{y},|,pmb{pi})}_{H(mathbf{x})} ;,
                                    end{align}

                                    which implies
                                    begin{align}
                                    H(mathbf{y}) &= H(mathbf{x}) + H(pmb{pi}) - H(pmb{pi},|,mathbf{y}) \
                                    &= H(mathbf{x}) + I(mathbf{y};pmb{pi})
                                    end{align}

                                    where $I(mathbf{y};pmb{pi})$ is the mutual information between $mathbf{y}$ and $pmb{pi}$. We know that $I(mathbf{y};pmb{pi})geq 0$ with equality if and only if $mathbf{y}$ and $pmb{pi}$ are independent, which is the case if and only if $mathbf{x}$ has the uniform distribution. Q.E.D.



                                    For Question 3, suppose that $M$ is merely a positive stochastic matrix (not doubly stochastic). Let $r$ denote the unique stationary distribution of $M$. Then we have a similar ``entropy increase'' principle if we replace entropy with (minus) the Kullback-Leibler divergence relative to $r$. There is a nice discussion on this in the book of Cover and Thomas.







                                    share|cite|improve this answer












                                    share|cite|improve this answer



                                    share|cite|improve this answer










                                    answered Dec 1 at 19:25









                                    Algernon

                                    9591712




                                    9591712






























                                        draft saved

                                        draft discarded




















































                                        Thanks for contributing an answer to MathOverflow!


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        Use MathJax to format equations. MathJax reference.


                                        To learn more, see our tips on writing great answers.





                                        Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                                        Please pay close attention to the following guidance:


                                        • Please be sure to answer the question. Provide details and share your research!

                                        But avoid



                                        • Asking for help, clarification, or responding to other answers.

                                        • Making statements based on opinion; back them up with references or personal experience.


                                        To learn more, see our tips on writing great answers.




                                        draft saved


                                        draft discarded














                                        StackExchange.ready(
                                        function () {
                                        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f316661%2fapply-doubly-stochastic-matrix-m-to-a-probability-vector-then-entropy-increases%23new-answer', 'question_page');
                                        }
                                        );

                                        Post as a guest















                                        Required, but never shown





















































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown

































                                        Required, but never shown














                                        Required, but never shown












                                        Required, but never shown







                                        Required, but never shown







                                        Popular posts from this blog

                                        QoS: MAC-Priority for clients behind a repeater

                                        Ивакино (Тотемский район)

                                        Can't locate Autom4te/ChannelDefs.pm in @INC (when it definitely is there)