Bull. Math. Sci. https://doi.org/10.1007/s13373-018-0123-3

Some trace inequalities for exponential and logarithmic functions

Eric A. Carlen (1) · Elliott H. Lieb (2)

Received: 8 October 2017 / Revised: 17 April 2018 / Accepted: 24 April 2018
© The Author(s) 2018

Abstract Consider a function $F(X,Y)$ of pairs of positive matrices with values in the positive matrices such that whenever $X$ and $Y$ commute $F(X,Y) = X^pY^q$. Our first main result gives conditions on $F$ such that $\mathrm{Tr}[X\log(F(Z,Y))] \le \mathrm{Tr}[X(p\log X + q\log Y)]$ for all $X, Y, Z$ such that $\mathrm{Tr}\,Z = \mathrm{Tr}\,X$. (Note that $Z$ is absent from the right side of the inequality.) We give several examples of functions $F$ to which the theorem applies. Our theorem allows us to give simple proofs of the well known logarithmic inequalities of Hiai and Petz and several new generalizations of them which involve three variables $X, Y, Z$ instead of just $X, Y$ alone. The investigation of these logarithmic inequalities is closely connected with three quantum relative entropy functionals: the standard Umegaki quantum relative entropy $D(X\|Y) = \mathrm{Tr}[X(\log X - \log Y)]$, and two others, the Donald relative entropy $D_D(X\|Y)$, and the Belavkin–Staszewski relative entropy $D_{BS}(X\|Y)$. They are known to satisfy $D_D(X\|Y) \le D(X\|Y) \le D_{BS}(X\|Y)$. We prove that the Donald relative entropy provides the sharp upper bound, independent of $Z$, on $\mathrm{Tr}[X\log(F(Z,Y))]$ in a number of cases in which $F(Z,Y)$ is homogeneous of degree $1$ in $Z$ and $-1$ in $Y$. We also investigate the Legendre transforms in $X$ of $D_D(X\|Y)$ and $D_{BS}(X\|Y)$, and show how our results about these Legendre transforms lead to new refinements of the Golden–Thompson inequality.

Keywords Trace inequalities · Quantum relative entropy · Convexity

Communicated by Ari Laptev.
Work partially supported by U.S. National Science Foundation Grant DMS 1501007.
Work partially supported by U.S. National Science Foundation Grant PHY 1265118.

Corresponding author: Eric A. Carlen, carlen@math.rutgers.edu
(1) Department of Mathematics, Hill Center, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854-8019, USA
(2) Departments of Mathematics and Physics, Princeton University, Washington Road, Princeton, NJ 08544, USA

1 Introduction

Let $M_n$ denote the set of complex $n \times n$ matrices. Let $P_n$ and $H_n$ denote the subsets of $M_n$ consisting of strictly positive and self-adjoint matrices respectively. For $X, Y \in H_n$, $X \ge Y$ indicates that $X - Y$ is positive semi-definite; i.e., in the closure of $P_n$, and $X > Y$ indicates that $X - Y \in P_n$.

Let $p$ and $q$ be non-zero real numbers. There are many functions $F : P_n \times P_n \to P_n$ such that $F(X,Y) = X^pY^q$ whenever $X$ and $Y$ commute. For example,

$$F(X,Y) = X^{p/2}Y^{q}X^{p/2} \quad\text{or}\quad F(X,Y) = Y^{q/2}X^{p}Y^{q/2}. \tag{1.1}$$

Further examples can be constructed using geometric means: For positive $n \times n$ matrices $X$ and $Y$, and $t \in [0,1]$, the $t$-geometric mean of $X$ and $Y$, denoted by $X \#_t Y$, is defined by Kubo and Ando [26] to be

$$X \#_t Y := X^{1/2}(X^{-1/2}YX^{-1/2})^{t}X^{1/2}. \tag{1.2}$$

The geometric mean for $t = 1/2$ was initially defined and studied by Pusz and Woronowicz [36]. The formula (1.2) makes sense for all $t \in \mathbb{R}$ and it has a natural geometric meaning [40]; see the discussion around Definition 2.4 and in Appendix C. Then for all $r > 0$ and all $t \in (0,1)$,

$$F(X,Y) = X^{r} \#_t Y^{r} \tag{1.3}$$

is such a function with $p = r(1-t)$ and $q = rt$. Other examples will be considered below.

If $F$ is such a function, then $\mathrm{Tr}[X\log F(X,Y)] = \mathrm{Tr}[X(p\log X + q\log Y)]$ whenever $X$ and $Y$ commute. We are interested in conditions on $F$ that guarantee either

$$\mathrm{Tr}[X\log F(X,Y)] \ge \mathrm{Tr}[X(p\log X + q\log Y)] \tag{1.4}$$

or

$$\mathrm{Tr}[X\log F(X,Y)] \le \mathrm{Tr}[X(p\log X + q\log Y)] \tag{1.5}$$

for all $X, Y \in P_n$.
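The commuting case behind (1.3), (1.4) and (1.5) can be spelled out in a short worked computation. This is only a sketch under the single assumption that $X$ and $Y$ commute, so that all functions of $X$ and $Y$ can be rearranged freely; it is not taken from the paper.

```latex
% Assumption: XY = YX, so every function of X commutes with every function of Y.
X \#_t Y = X^{1/2}\,(X^{-1/2} Y X^{-1/2})^{t}\,X^{1/2}
         = X^{1/2}\,X^{-t}\,Y^{t}\,X^{1/2}
         = X^{1-t}\,Y^{t},
\qquad\text{hence}\qquad
X^{r} \#_t Y^{r} = X^{r(1-t)}\,Y^{rt}.
% Likewise, for commuting X and Y the logarithm is additive:
\log\!\big(X^{p} Y^{q}\big) = p\log X + q\log Y
\quad\Longrightarrow\quad
\mathrm{Tr}[X \log F(X,Y)] = \mathrm{Tr}[X(p\log X + q\log Y)].
```

So (1.4) and (1.5) both hold with equality when $X$ and $Y$ commute, and any strict inequality of this type is a purely non-commutative effect.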
Some examples of such inequalities are known: Hiai and Petz [23] proved that

$$\frac{1}{p}\mathrm{Tr}[X\log(Y^{p/2}X^{p}Y^{p/2})] \le \mathrm{Tr}[X(\log X + \log Y)] \le \frac{1}{p}\mathrm{Tr}[X\log(X^{p/2}Y^{p}X^{p/2})] \tag{1.6}$$

for all $X, Y > 0$ and all $p > 0$. Replacing $Y$ by $Y^{q/p}$ shows that for $F(X,Y) = X^{p/2}Y^{q}X^{p/2}$, (1.4) is valid, while for $F(X,Y) = Y^{q/2}X^{p}Y^{q/2}$, (1.5) is valid: Remarkably, the effects of non-commutativity go in different directions in these two examples. Other examples involving functions $F$ of the form (1.3) have been proved by Ando and Hiai [2].

Here we prove several new inequalities of this type, and we also strengthen the results cited above by bringing in a third operator $Z$: For example, Theorem 1.4 says that for all positive $X$, $Y$ and $Z$ such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$,

$$\mathrm{Tr}[X\log(Y^{p/2}Z^{p}Y^{p/2})] \le \mathrm{Tr}[X(\log X^{p} + \log Y^{p})] \tag{1.7}$$

with strict inequality if $Y$ and $Z$ do not commute. If $Y$ and $Z$ do commute, the left side of (1.7) is simply $\mathrm{Tr}[X(\log Z^{p} + \log Y^{p})]$, and the inequality (1.7) would then follow from the inequality $\mathrm{Tr}[X\log Z] \le \mathrm{Tr}[X\log X]$ for all positive $X$ and $Z$ with $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$. Our result shows that this persists in the non-commutative case, and we obtain similar results for other choices of $F$, in particular for those defined in terms of geometric means.

One of the reasons that inequalities of this sort are of interest is their connection with quantum relative entropy. By taking $Y = W^{-1}$, with $X$ and $W$ both having unit trace, so that both $X$ and $W$ are density matrices, the middle quantity in (1.6), $\mathrm{Tr}[X(\log X - \log W)]$, is the Umegaki relative entropy of $X$ with respect to $W$ [43]. Thus (1.6) provides upper and lower bounds on the relative entropy.

There is another source of interest in the inequalities (1.6), which Hiai and Petz refer to as logarithmic inequalities.
As they point out, logarithmic inequalities are dual, via the Legendre transform, to certain exponential inequalities related to the Golden–Thompson inequality. Indeed, the quantum Gibbs variational principle states that

$$\sup\{\mathrm{Tr}[XH] - \mathrm{Tr}[X(\log X - \log W)] \,:\, X \ge 0,\ \mathrm{Tr}[X] = 1\} = \log(\mathrm{Tr}[e^{H+\log W}]), \tag{1.8}$$

for all self-adjoint $H$ and all non-negative $W$. (The quantum Gibbs variational principle is a direct consequence of the Peierls–Bogoliubov inequality, see Appendix A.) It follows immediately from (1.6) and (1.8) that

$$\sup\{\mathrm{Tr}[XH] - \mathrm{Tr}[X\log(X^{1/2}W^{-1}X^{1/2})] \,:\, X \ge 0,\ \mathrm{Tr}[X] = 1\} \le \log(\mathrm{Tr}[e^{H+\log W}]). \tag{1.9}$$

The left side of (1.9) provides a lower bound for $\log(\mathrm{Tr}[e^{H+\log W}])$ in terms of a Legendre transform, which, unfortunately, cannot be evaluated explicitly.

An alternate use of the inequality on the right in (1.6) does yield an explicit lower bound on $\log(\mathrm{Tr}[e^{H+\log W}])$ in terms of a geometric mean of $e^{H}$ and $W$. This was done in [23]; the bound is

$$\mathrm{Tr}[(e^{rH} \#_t e^{rK})^{1/r}] \le \mathrm{Tr}[e^{(1-t)H+tK}], \tag{1.10}$$

which is valid for all self-adjoint $H, K$, and all $r > 0$ and $t \in [0,1]$. Since the Golden–Thompson inequality is $\mathrm{Tr}[e^{(1-t)H+tK}] \le \mathrm{Tr}[e^{(1-t)H}e^{tK}]$, (1.10) is viewed in [23] as a complement to the Golden–Thompson inequality.

Hiai and Petz show [23, Theorem 2.1] that the inequality (1.10) is equivalent to the inequality on the right in (1.6). One direction in proving the equivalence, starting from (1.10), is a simple differentiation argument; differentiating (1.10) at $t = 0$ yields the result. While the inequality on the left in (1.6) is relatively simple to prove, the one on the right appears to be deeper and more difficult to prove from the perspective of [23].

In our paper we prove a number of new inequalities, some of which strengthen and extend (1.6) and (1.10). Our results show, in particular, that the geometric mean provides a natural bridge between the pair of inequalities (1.6).
This perspective yields a fairly simple proof of the deeper inequality on the right of (1.6), and thereby places the appearance of the geometric mean in (1.10) in a natural context.

Before stating our results precisely, we recall the notions of operator concavity and operator convexity. A function $F : P_n \to H_n$ is concave in case for all $X, Y \in P_n$ and all $t \in [0,1]$,

$$F((1-t)X + tY) - (1-t)F(X) - tF(Y) \in \overline{P_n},$$

and $F$ is convex in case $-F$ is concave. For example, $F(X) := X^{p}$ is concave for $p \in [0,1]$ as is $F(X) := \log X$.

A function $F : P_n \times P_n \to H_n$ is jointly concave in case for all $X, Y, W, Z \in P_n$ and all $t \in [0,1]$

$$F((1-t)X + tY, (1-t)Z + tW) - (1-t)F(X,Z) - tF(Y,W) \in \overline{P_n},$$

and $F$ is jointly convex in case $-F$ is jointly concave. Strict concavity or convexity means that the left side is never zero for any $t \in (0,1)$ unless $X = Y$ and $Z = W$. A particularly well-known and important example is provided by the generalized geometric means. By a theorem of Kubo and Ando [26], for each $t \in [0,1]$, $F(X,Y) := X \#_t Y$ is jointly concave in $X$ and $Y$. Other examples of jointly concave functions are discussed below. Our first main result is the following:

1.1 Theorem Let $F : P_n \times P_n \to P_n$ be such that:

(1) For each fixed $Y \in P_n$, $X \mapsto F(X,Y)$ is concave, and for all $\lambda > 0$, $F(\lambda X, Y) = \lambda F(X,Y)$.

(2) For each $n \times n$ unitary matrix $U$, and each $X, Y \in P_n$,

$$F(UXU^{*}, UYU^{*}) = UF(X,Y)U^{*}. \tag{1.11}$$

(3) For some $q \in \mathbb{R}$, if $X$ and $Y$ commute then $F(X,Y) = XY^{q}$.

Then, for all $X, Y, Z \in P_n$ such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$,

$$\mathrm{Tr}[X\log(F(Z,Y))] \le \mathrm{Tr}[X(\log X + q\log Y)]. \tag{1.12}$$

If, moreover, $X \mapsto F(X,Y)$ is strictly concave, then the inequality in (1.12) is strict when $Z$ and $Y$ do not commute.

1.2 Remark Notice that (1.12) has three variables on the left, but only two on the right. The third variable $Z$ is related to $X$ and $Y$ only through the constraint $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$.
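To make the three hypotheses of Theorem 1.1 concrete, here is a worked check for the simplest candidate, $F(X,Y) = Y^{1/2}XY^{1/2}$. The choice of this particular $F$ as the illustration is ours; the computation uses only elementary facts about the functional calculus.

```latex
% Illustrative check (not from the paper's text) for F(X,Y) = Y^{1/2} X Y^{1/2}.
% (1) F is linear in X, hence concave in X, and
F(\lambda X, Y) = Y^{1/2}(\lambda X)Y^{1/2} = \lambda\,F(X,Y).
% (2) Unitary covariance, using (U Y U^*)^{1/2} = U Y^{1/2} U^*:
F(UXU^{*}, UYU^{*}) = U Y^{1/2} U^{*}\,U X U^{*}\,U Y^{1/2} U^{*} = U\,F(X,Y)\,U^{*}.
% (3) If XY = YX, then
F(X,Y) = X\,Y^{1/2}Y^{1/2} = XY, \qquad\text{so } q = 1.
% Conclusion of Theorem 1.1 for this F, valid whenever Tr[Z] = Tr[X]:
\mathrm{Tr}[X\log(Y^{1/2} Z Y^{1/2})] \le \mathrm{Tr}[X(\log X + \log Y)].
```

Since this $F$ is linear rather than strictly concave in $X$, Theorem 1.1 by itself gives no information here about when the inequality is strict.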
Different choices for the function $F(X,Y)$ yield different corollaries. For our first corollary, we take the function

$$F(X,Y) = \int_0^\infty \frac{1}{\lambda + Y}\,X\,\frac{1}{\lambda + Y}\,d\lambda,$$

which evidently satisfies the conditions of Theorem 1.1 with $q = -1$. We obtain, thereby, the following inequality:

1.3 Theorem Let $X, Y, Z \in P_n$ be such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$. Then

$$\mathrm{Tr}\left[X\log\left(\int_0^\infty \frac{1}{\lambda + Y}\,Z\,\frac{1}{\lambda + Y}\,d\lambda\right)\right] \le \mathrm{Tr}[X(\log X - \log Y)]. \tag{1.13}$$

Another simple application can be made to the function $F(X,Y) = Y^{1/2}XY^{1/2}$; however, in this case, an adaptation of the method of proof of Theorem 1.1 yields a more general result for the family of functions $F(X,Y) = Y^{p/2}X^{p}Y^{p/2}$ for all $p > 0$.

1.4 Theorem For all $X, Y, Z \in P_n$ such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$, and all $p > 0$,

$$\mathrm{Tr}[X\log(Y^{p/2}Z^{p}Y^{p/2})] \le \mathrm{Tr}[X(\log X^{p} + \log Y^{p})]. \tag{1.14}$$

The inequality in (1.14) is strict unless $Z$ and $Y$ commute.

Specializing to the case $Z = X$, (1.14) reduces to the inequality on the left in (1.6). Theorem 1.4 thus extends the inequality of [23] by inclusion of the third variable $Z$, and specifies the cases of equality there.

1.5 Remark If $Z$ does commute with $Y$, (1.14) reduces to $\mathrm{Tr}[X\log Z] \le \mathrm{Tr}[X\log X]$ which is well-known to be true under the condition $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$, with equality if and only if $Z = X$.

We also obtain results for the two parameter family of functions

$$F(X,Y) = Y^{r} \#_s X^{r}$$

with $s \in [0,1]$ and $r > 0$. In this case, when $X$ and $Y$ commute, $F(X,Y) = X^{p}Y^{q}$ with

$$p = rs \quad\text{and}\quad q = r(1-s). \tag{1.15}$$

It would be possible to deduce at least some of these results directly from Theorem 1.1 if we knew that, for example, $X \mapsto Y^{2} \#_{1/2} X^{2} = Y(Y^{-1}X^{2}Y^{-1})^{1/2}Y$ is concave in $X$. While we have no such result, it turns out that we can use Theorem 1.4 to obtain the following:

1.6 Theorem Let $X, Y, Z \in P_n$ be such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$. Then for all $s \in [0,1]$ and all $r > 0$,

$$\mathrm{Tr}[X\log(Y^{r} \#_s Z^{r})] \le \mathrm{Tr}[X(s\log X^{r} + (1-s)\log Y^{r})]. \tag{1.16}$$

For $s \in (0,1)$, when $Z$ does not commute with $Y$, the inequality is strict.

The case in which $Z = X$ is proved in [2] using log-majorization methods. The inequality (1.16) is an identity at $s = 1$. As we shall show, differentiating it at $s = 1$ in the case $Z = X$ yields the inequality on the right in (1.6). Since the geometric mean inequality (1.16) is a consequence of our generalization of the inequality on the left in (1.6), this derivation shows how the geometric means construction 'bridges' the pair of inequalities (1.6).

Theorems 1.3, 1.4 and 1.6 provide infinitely many new lower bounds on the Umegaki relative entropy, one for each choice of $Z$. The trace functional on the right side of (1.6) bounds the Umegaki relative entropy from above, and is in many ways better-behaved than the trace functional on the left, or any of the individual new lower bounds. By a theorem of Fujii and Kamei [17]

$$X, W \mapsto X^{1/2}\log(X^{1/2}W^{-1}X^{1/2})X^{1/2}$$

is jointly convex as a function from $P_n \times P_n$ to $H_n$, and then as a trivial consequence,

$$X, W \mapsto \mathrm{Tr}[X\log(X^{1/2}W^{-1}X^{1/2})]$$

is jointly convex. When $X$ and $W$ are density matrices, $\mathrm{Tr}[X\log(X^{1/2}W^{-1}X^{1/2})] =: D_{BS}(X\|W)$ is the Belavkin–Staszewski relative entropy [6]. The joint convexity of the Umegaki relative entropy is a theorem of Lindblad [32], who deduced it as a direct consequence of the main concavity theorem in [30]; see also [42].

A seemingly small change in the arrangement of the operators, with $X^{1/2}W^{-1}X^{1/2}$ replaced by $W^{-1/2}XW^{-1/2}$, obliterates convexity;

$$X, W \mapsto \mathrm{Tr}[X\log(W^{-1/2}XW^{-1/2})] \tag{1.17}$$

is not jointly convex, and even worse, the function $W \mapsto \mathrm{Tr}[X\log(W^{-1/2}XW^{-1/2})]$ is not convex for all fixed $X \in P_n$. Therefore, although the function in (1.17) agrees with the Umegaki relative entropy when $X$ and $W$ commute, its lack of convexity makes it unsuitable for consideration as a relative entropy functional.
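The agreement in the commuting case can be verified in one line. The following is a sketch assuming only $XW = WX$; it shows that the functional in (1.17) and the Belavkin–Stasewski functional both collapse to the Umegaki expression, so the three can differ only through non-commutativity.

```latex
% Assumption: XW = WX.
W^{-1/2} X W^{-1/2} = X W^{-1} = X^{1/2} W^{-1} X^{1/2},
\qquad
\log\!\big(X W^{-1}\big) = \log X - \log W,
% and therefore
\mathrm{Tr}[X\log(W^{-1/2} X W^{-1/2})]
  = \mathrm{Tr}[X\log(X^{1/2} W^{-1} X^{1/2})]
  = \mathrm{Tr}[X(\log X - \log W)].
```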
We discuss the failure of convexity at the end of Sect. 3. However, Theorem 1.4 provides a remedy by introducing a third variable Z with respect to which we can maximize. The resulting functional is still bounded above by the Umegaki relative entropy: that is, for all density matrices X and W , −1/2 −1/2 sup{Tr[ X log(W ZW )]: Z ≥ 0Tr[ Z]≤ 1}≤ D( X ||W ). (1.18) One might hope that the left side is a jointly convex function of X and W , which does turn out to be the case. In fact, the left hand side is a quantum relative entropy originally introduced by Donald [14], through a quite different formula. Given any orthonormal basis {u ,..., u } of C , deﬁne a “pinching” map : M → M by 1 n n n deﬁning ( X ) to be the diagonal matrix whose jth diagonal entry is u , Xu .Let j j P denote the sets of all such pinching operations. For density matrices X and Y,the Donald relative entropy, D ( X ||Y ) is deﬁned by D ( X ||Y ) = sup{ D(( X )||(Y )) : ∈ P}. (1.19) Hiai and Petz [22] showed that for all density matrices X and all Y ∈ P , D ( X ||Y ) = sup{Tr[XH]− log Tr[e Y ] : H ∈ H }, (1.20) D n arguing as follows. Fix any orthonormal basis {u ,..., u } of C .Let X be any density 1 n matrix and let Y be any positive matrix. Deﬁne x = u , Xu and y = u , Yu j j j j j j for j = 1,..., n.For (h ,..., h ) ∈ R , deﬁne H to be the self-adjoint operator 1 n given by Hu = h u , j = 1,..., n. j j j Then by the classical Gibb’s variational principle. ⎧ ⎛ ⎞ ⎫ n n n ⎨ ⎬ h n ⎝ ⎠ x (log x − log y ) = sup x h − log e y : (h ,..., h ) ∈ R j j j j j j 1 n ⎩ ⎭ j =1 j =1 j =1 H n = sup Tr[XH]− log Tr[e Y ] : (h ,..., h ) ∈ R . 1 n Taking the supremum over all choices of the orthonormal basis yields (1.20). For our purposes, a variant of (1.20)isuseful: 1.7 Lemma For all density matrices X , and all Y ∈ P , D ( X ||Y ) = sup{Tr[XH]: H ∈ H Tr[e Y]≤ 1}. 
(1.21) D n Proof Observe that we may add a constant to H without changing Tr[XH]− log Tr[e Y ] , and thus in taking the supremum in (1.20) we may restrict our attention H H to H ∈ H such that Tr[e Y]= 1. Then Tr[XH]− log Tr[e Y ] = Tr[XH ] and 123 E. A. Carlen,E.H.Lieb the constraint in (1.21) is satisﬁed. Hence the supremum in (1.20) is no larger than the supremum in (1.21). Conversely, if Tr[e Y]≤ 1, then Tr[XH]≤ Tr[XH]− log Tr[e Y ] , and thus the supremum in (1.21) is no larger than the supremum in (1.20). By the joint convexity of the Umegaki relative entropy, for each ∈ P, D(( X )||(Y )) is jointly convex in X and Y , and then since the supremum of a family of convex functions is convex, the Donald relative entropy D ( X ||Y ) is jointly 1/2 H 1/2 convex. Making the change of variables Z = W e W in (1.18), one sees that the supremum in (1.20) is exactly the same as the supremum in (1.21), and thus for all density matrices X and W , D ( X ||W ) ≤ D( X ||W ) which can also be seen as a consequence of the joint convexity of the Umegaki relative entropy. Theorems 1.3 and 1.6 give two more lower bounds to the Umegaki relative entropy for density matrices X and Y , namely 1 1 sup Tr X log Z dλ (1.22) λ + Y λ + Y Z ∈P ,Tr[ Z ]=Tr[ X ] 0 and −1 2 sup Tr[ X log(Y # Z ) ] (1.23) 1/2 Z ∈P ,Tr[ Z ]=Tr[ X ] Proposition 3.1 shows that both of the supremums are equal to D ( X ||Y ). Our next results concern the partial Legendre transforms of the three relative entropies D ( X ||Y ), D( X ||Y ) and D ( X ||Y ). For this, it is natural to consider D BS them as functions on P × P , and not only on density matrices. The natural extension n n of the Umegaki relative entropy functional to P × P is n n D( X ||W ) := Tr[ X (log X − log W )]+ Tr[W]− Tr[ X ]. (1.24) It is homogeneous of degree one in X and W and, with this deﬁnition, D( X ||Y ) ≥ 0 with equality only in case X = W , which is a consequence of Klein’s inequality, as discussed in Appendix A. 
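Both properties claimed for the extension (1.24) follow from short computations, sketched here. The form of Klein's inequality used, $\mathrm{Tr}[X(\log X - \log W)] \ge \mathrm{Tr}[X - W]$ for $X, W \in P_n$, is the standard one and is assumed to be the version meant in Appendix A.

```latex
% Homogeneity of degree one: for \lambda > 0, \log(\lambda X) = (\log\lambda) I + \log X, so
D(\lambda X\|\lambda W)
  = \mathrm{Tr}[\lambda X(\log X + \log\lambda - \log W - \log\lambda)]
    + \lambda\,\mathrm{Tr}[W] - \lambda\,\mathrm{Tr}[X]
  = \lambda\, D(X\|W).
% Non-negativity from Klein's inequality (assumed form):
\mathrm{Tr}[X(\log X - \log W)] \ge \mathrm{Tr}[X] - \mathrm{Tr}[W]
\quad\Longrightarrow\quad
D(X\|W) = \mathrm{Tr}[X(\log X - \log W)] + \mathrm{Tr}[W] - \mathrm{Tr}[X] \ge 0.
```

The added terms $\mathrm{Tr}[W] - \mathrm{Tr}[X]$ are exactly what is needed for non-negativity to hold without the normalization $\mathrm{Tr}[X] = \mathrm{Tr}[W] = 1$.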
The natural extension of the Belavkin–Stasewski relative entropy functional to P × P is n n 1/2 −1 1/2 D ( X ||W ) = Tr[ X log( X W X )]+ Tr[W]− Tr[ X ]. (1.25) BS Introducing Q := e , the supremum in (1.21)is sup{Tr[ X log Q]: Q ≥ 0Tr[WQ]≤ 1}, 123 Some trace inequalities for exponential and logarithmic… and the extension of the Donald relative entropy to P × P is n n D ( X ||W ) = sup {Tr[ X log Q]: Tr[WQ]≤ Tr[ X ]} + Tr[W]− Tr[ X ]. (1.26) Q>0 To avoid repetition, it is useful to note that all three of these functionals are examples of quantum relative entropy functionals in the sense of satisfying the following axioms. This axiomatization differs from many others, such as the ones in [14,18], which are designed to single out the Umegaki relative entropy. 1.8 Deﬁnition A quantum relative entropy is a function R( X ||W ) on P × P with n n values in [0, ∞] such that (1) X, Y → R( X ||W ) is jointly convex. (2) For all X, W ∈ P and all λ> 0, R(λ X,λW ) = λ R( X, W ) and R(λ X, W ) = λ R( X, W ) + λ log λTr[ X]+ (1 − λ)Tr[W ]. (1.27) (3) If X and W commute, R( X ||W ) = D( X ||W ). The deﬁnition does not include the requirement that R( X ||W ) ≥ 0 with equality if and only if X = W because this follows directly from (1), (2) and (3): 1.9 Proposition Let R( X ||W ) be any quantum relative entropy. Then X W R( X ||W ) ≥ Tr[ X ] − (1.28) Tr[ X ] Tr[W ] where · denotes the trace norm. The proof is given towards the end of Sect. 3. It is known for the Umegaki relative entropy [21], but the proof uses only the properties (1), (2) and (3). The following pair of inequalities summarizes the relation among the three relative entropies. For all X, W ∈ P , D ( X ||W ) ≤ D( X ||W ) ≤ D ( X ||W ). (1.29) D BS These inequalities will imply a corresponding pair of inequalities for the partial Leg- endre transforms in X. 
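The passage from (1.29) to inequalities between Legendre transforms rests on the fact that the transform reverses order. A sketch, writing $R_1$, $R_2$ for two generic functionals (placeholder names of ours, not the paper's notation):

```latex
% If R_1(X\|W) \le R_2(X\|W) for all X, then for every self-adjoint H
\mathrm{Tr}[XH] - R_2(X\|W) \le \mathrm{Tr}[XH] - R_1(X\|W)
\quad\Longrightarrow\quad
\sup_{X}\{\mathrm{Tr}[XH] - R_2(X\|W)\} \le \sup_{X}\{\mathrm{Tr}[XH] - R_1(X\|W)\}.
% Applied to D_D \le D \le D_{BS} from (1.29), the partial Legendre transforms in X
% are therefore ordered the opposite way: the transform of D_{BS} is the smallest
% and the transform of D_D is the largest.
```

The same argument works whether the supremum is taken over all $X \in P_n$ or only over those with $\mathrm{Tr}[X] = 1$.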
1.10 Remark The partial Legendre transform of the relative entropy, which ﬁgures in the Gibbs variational principle, is in many ways better behaved than the full Legendre ∗ n transform. Indeed the Legendre transform F of a function F on R that is convex and homogenous of degree one always has the form 0 y ∈ C F ( y) = ∞ y ∈ / C 123 E. A. Carlen,E.H.Lieb for some convex set C [38]. The set C ﬁguring in the full Legendre transform of the Umegaki relative entropy was ﬁrst computed by Pusz and Woronowicz [37], and somewhat more explicitly by Donald in [14]. Consider any function R( X ||Y ) on P ×P that is convex and lower semicontinuous n n in X. There are two natural partial Legendre transforms that are related to each other, namely ( H, Y ) and ( H, Y ) deﬁned by R R ( H, Y ) = sup {Tr[XH]− R( X ||Y ) : Tr[ X]= 1} (1.30) X ∈P and ( H, Y ) = sup {Tr[XH]− R( X ||Y )} (1.31) X ∈P where H ∈ H is the conjugate variable to X. For example, let R( X ||Y ) = D( X ||Y ), the Umegaki relative entropy. Then, by the Gibbs variational principle, H +log Y ( H, Y ) = 1 − Tr[Y ]+ log(Tre ) (1.32) and H +log Y ( H, Y ) = Tre − TrY. (1.33) 1.11 Lemma Let R( X ||Y ) be any function on P × P that is convex and lower n n semicontinuous in X , and which satisﬁes the scaling relation (1.27). Then for all H ∈ H and all Y ∈ P . n n ( X,Y )+Tr[Y ]−1 ( H, Y ) = e − Tr[Y ]. (1.34) This simple relation between the two Legendre transforms is a consequence of scaling, and hence the corresponding relation holds for any quantum relative entropy. 
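As a consistency check, the relation (1.34) can be tested against the explicit Umegaki formulas (1.32) and (1.33). The symbols are lost in this copy of the text, so as a labeled assumption we write $\Phi$ for the transform (1.30) with the constraint $\mathrm{Tr}[X] = 1$ and $\Psi$ for the unconstrained transform (1.31).

```latex
% Assumed notation: \Phi = constrained transform (1.30), \Psi = unconstrained transform (1.31).
% From (1.32) for the Umegaki relative entropy:
\Phi(H,Y) + \mathrm{Tr}[Y] - 1 = \log\!\big(\mathrm{Tr}\,e^{H+\log Y}\big).
% Exponentiating and subtracting Tr[Y]:
e^{\Phi(H,Y) + \mathrm{Tr}[Y] - 1} - \mathrm{Tr}[Y]
  = \mathrm{Tr}\,e^{H+\log Y} - \mathrm{Tr}[Y]
  = \Psi(H,Y),
% which is exactly (1.33), in agreement with (1.34).
```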
Consider the Donald relative entropy and deﬁne ( H, Y ) := sup {Tr[XH]− D ( X ||Y )}, (1.35) D D X >0 and ( H, Y ) := sup {Tr[XH]− D ( X ||Y )} (1.36) D D X >0,Tr[ X ]=1 In Lemma 3.7, we prove the following analog of (1.32): For H ∈ H and Y ∈ P , n n ( H, Y ) = 1 − Tr[Y ]+ inf λ H − log Q : Q ∈ P Tr[QY]≤ 1 ( ) D max n (1.37) 123 Some trace inequalities for exponential and logarithmic… where for any self-adjoint operator K , λ (K ) is the largest eigenvalue of K , and max we prove that ( H, Y ) is concave in Y . As a consequence of this we prove in Theorem 3.10 that for all H ∈ H , the function Y → exp inf λ H − log Q (1.38) ( ) max Q>0,Tr[QY ]≤1 is concave on P . Moreover, for all H, K ∈ H , n n H +K H K log(Tr[e ]) ≤ inf λ ( H − log Q) ≤ log(Tr[e e ]). (1.39) max Q>0,Tr[Qe ]≤1 These inequalities improve upon the Golden–Thompson inequality. Note that by Lemma 1.11,(1.33) and (1.37), the inequality on the left in (1.39) is equivalent to ( H, Y ) ≤ ( H, Y ), which in turn is equivalent under the Legendre transform to D ( X ||Y ) ≤ D( X ||Y ). The inequality on the right in (1.39) arises through the simple of choice Q = H H e /Tr[Ye ] in the variational formula for ( H, Y ).The Q chosen here is optimal only when H and Y commute. Otherwise, there is a better choice for Q, which we shall identify in Sect. 4, and which will lead to a tighter upper bound. In Sect. 4 we shall also discuss the Legendre transform of the Belavkin–Staszewski relative entropy and form this we derive further reﬁnements of the Golden–Thompson inequality. Finally, in Theorem 4.3 we prove a sharpened form of (1.10), the complementary Golden– Thompsen inequality of Hiai and Petz, incorporating a relative entropy remainder term. Three appendices collect background material for the convenience of the reader. 2 Proof of Theorem 1.1 and related inequalities Proof of Theorem 1.1 Our goal is to prove that for all X, Y, Z ∈ P such that Tr[ Z]= Tr[ X ]. 
Tr[ X log( F ( Z , Y ))]≤ Tr[ X (log X + q log Y )] (2.1) whenever F has the properties (1), (2) and (3) listed in the statement of Theorem 1.1. By the homogeneity speciﬁed in (3), we may assume without loss of generality that Tr[ X]= Tr[ Z]= 1. Note that (2.1) is equivalent to Tr X (log( F ( Z , Y )) − log X − q log Y )) ≤0(2.2) By the Peierls–Bogoliubov inequality (A.3), it sufﬁces to prove that Tr exp (log( F ( Z , Y )) − q log Y )) ≤ 1. (2.3) Let J denote an arbitrary ﬁnite index set with cardinality |J |.Let U ={U ,..., U } 1 |J | be any set of unitary matrices each of which commutes with Y . Then for each j ∈ J , by (2) 123 E. A. Carlen,E.H.Lieb ! " Tr exp (log( F ( Z , Y )) − q log Y ) = Tr U exp (log( F ( Z , Y )) − q log Y ) U ! " = Tr exp log( F (U ZU , Y )) − q log Y (2.4) Deﬁne Z = U ZU , |J | j ∈J H +log W Recall that W → Tr[e ] is concave [30]. Using this, the concavity of Z → F ( Z , Y ) speciﬁed in (1), and the monotonicity of the logarithm, averaging both sides of (2.4) over j yields Tr exp (log( F ( Z , Y )) − q log Y ) ≤ Tr exp log( F ( Z , Y )) − q log Y . Now making an appropriate choice of U [13], Z becomes the “pinching” of Z with respect to Y ; i.e., the orthogonal projection in M onto the ∗-subalgebra generated by Y and 1. In this case, Z and Y commute so that by (3), # # log( F ( Z , Y )) − q log Y = log Z + q log Y. Altogether, Tr exp (log( F ( Z , Y )) − q log Y ) ≤ Tr[ Z]= Tr[ Z]= 1 and this proves (2.3). p/2 p p/2 For the case F ( X, Y ) = Y X Y , we can make a similar use of the Peierls– Bogoliubov inequality but can avoid the appeal to convexity. Proof of Theorem 1.4 The inequality we seek to prove is equivalent to p/2 p p/2 Tr X log(Y Z Y ) − log X − log Y ) ≤ 0, (2.5) and again by the Peierls–Bogoliubov inequality it sufﬁces to prove that p/2 p p/2 Tr exp log(Y Z Y ) − log Y ) ≤ 1. 
(2.6) A reﬁned version of the Golden–Thompson inequality due to Friedland and So [16] says that for all positive A, B, and all r > 0, log A+log B r/2 r r/2 1/r Tr[e ]≤ Tr[( A B A ) ]. (2.7) 123 Some trace inequalities for exponential and logarithmic… and moreover the right hand side is a strictly increasing function of r, unless A and B commute, in which case it is constant in r. The fact that the right side of (2.7)is increasing in r is a conseqence of the Araki–Lieb–Thirring inequality [4], but here we shall need to know that the increase is strict when A and B do not commute; this is the contribution of [16]. Applying (2.7) with r = p, p/2 p p/2 Tr exp log(Y Z Y ) − log Y ) − p/2 p/2 p p/2 − p/2 1/ p ≤ Tr[(Y (Y Z Y )Y ) ]= Tr[ Z]= 1. (2.8) By the condition for equality in (2.7), there is equality in (2.8) if and only if p/2 p p/2 1/ p (Y Z Y ) and Y commute, and evidently this is the case if and only if Z and Y commute. In the one parameter family of inequalities provided by Theorem 1.4,someare stronger than others. It is worth noting that the lower the value of p > 0in (1.14)the stronger this inequality is, in the following sense: 2.1 Proposition The validity of (1.14) for p = p and for p = p implies its validity 1 2 for p = p + p . 1 2 Proof Since there is no constraint on Y other than that Y is positive, we may replace Y by any power of Y . Therefore, it is equivalent to prove that for all X, Y, Z ∈ P such that Tr[ Z]= Tr[ X ] and all p > 0, Tr[ X log(YZ Y ))]≤ Tr[ X ( p log X + 2log Y )]. (2.9) If (2.9) is valid for p = p and for p = p , then it is also valid for p = p + p : 1 2 1 2 p + p p /2 p p /2 1 2 2 1 2 YZ Y = (YZ ) Z ( Z Y ) p 1/2 ∗ p p 1/2 2 1 2 = (YZ Y ) U Z U (YZ Y ) p 1/2 ∗ p p 1/2 2 1 2 = (YZ Y ) (U ZU ) (YZ Y ) p 1/2 p /2 ∗ 2 2 where U (YZ Y ) is the polar factorization of Z Y . Since Tr[U ZU]= p + p 1 2 Tr[ Z]= Tr[ X ], we may apply (2.9)for p to conclude Tr[ X log(YZ Y )]≤ p Tr[ X log X]+ Tr[ X log(YZ Y )]. 
One more application of (2.9), this time with p = p , yields p + p 1 2 Tr[ X log(YZ Y )]≤ ( p + p )Tr[ X log X]+ 2Tr[ X log Y ]. (2.10) 1 2 By the last line of Corollary 1.4, the inequality (2.10) is strict if Z and Y do not commute and at least one of p or p belongs to (0, 1). 1 2 Our next goal is to prove Theorem 1.6. As indicated in the Introduction, we will show that Theorem 1.6 is a consequence of Theorem 1.4. The determination of cases 123 E. A. Carlen,E.H.Lieb of equality in Theorem 1.4 is essential for the proof of the key lemma, which we give now. 2.2 Lemma Fix X, Y, Z ∈ P such that Tr[ Z]= Tr[ X ], and ﬁx p > 0. Then there is some > 0 so that (1.16) is valid for all s ∈[0,], and such that when Y and Z do not commute, (1.16) is valid as a strict for all s ∈ (0,). Proof We may suppose, without loss of generality, that Y and Z do not commute since, if they do commute, the inequality is trivially true, just as in Remark 1.5.We compute ∞ p/2 p/2 d Y Y p p − p/2 − p/2 Tr[ X log(Y # Z ))] = Tr X log(Y ZY ) dt p p ds t + Y t + Y s=0 ! " − p/2 − p/2 = Tr W log(Y ZY ) where ∞ p/2 p/2 Y Y W := X dt. p p t + Y t + Y Evidently, Tr[W]= Tr[ X]= Tr[ Z ]. Therefore, by Theorem 1.4 (with X replaced by −1 W and Y replaced by Y ), ! " − p/2 − p/2 p p Tr W log(Y ZY ) ≤ Tr W (log W − log Y ) . Now note that ∞ p/2 ∞ p/2 Y Y p p p Tr W log Y = Tr X log Y dt = Tr[ X log Y ]. p p t + Y t + Y 0 0 Moreover, by Deﬁnition W = ( X ) where is a completely positive, trace and identity preserving linear map. By Lemma B.2 this implies that p p Tr[W log W ]≤ Tr[ X log X ]. Consequently, p p p p Tr[ X log(Y # Z ) − s log X − (1 − s) log Y ))] ds s=0 p p ≤ Tr[W log W ]− Tr[ X log X ]. Therefore, unless Y and Z commute, the derivative on the left is strictly negative, and hence, for some > 0, (1.16) is valid as a strict inequality for all s ∈ (0,).If Y and Z commute, (1.16) is trivially true for all p > 0 and all s ∈[0, 1]. 
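Two facts used in the proof of Lemma 2.2, that $\mathrm{Tr}[W] = \mathrm{Tr}[X]$ and that the map $X \mapsto W$ preserves the identity, both reduce to one scalar integral. A sketch, using only the spectral theorem for $Y^{p}$ and cyclicity of the trace:

```latex
% Scalar integral, for any y > 0:
\int_0^\infty \frac{y}{(t+y)^2}\,dt = y\left[-\frac{1}{t+y}\right]_{t=0}^{t=\infty} = 1.
% By the spectral theorem applied to Y^p:
\int_0^\infty \frac{Y^{p}}{(t+Y^{p})^{2}}\,dt = I.
% Trace preservation, by cyclicity of the trace:
\mathrm{Tr}[W]
  = \mathrm{Tr}\!\left[X \int_0^\infty \frac{Y^{p}}{(t+Y^{p})^{2}}\,dt\right]
  = \mathrm{Tr}[X].
% Identity preservation: putting X = I in the definition of W gives the same integral, so W = I.
```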
123 Some trace inequalities for exponential and logarithmic… Proof of Theorem 1.6 Suppose that (1.16) is valid for s = s and s = s , Since (by 1 2 eqs. (C.7) and (C.8) below) p p p p p (Y # Z )# Z = Y # Z s s s +s −s s 1 2 1 2 1 2 p p p p p Tr[ X log(Y # Z )]= Tr[ X log((Y # Z )# Z )] s +s −s s s s 1 2 1 2 1 2 p p p ≥ Tr[ X (s log X + (1 − s ) log(Y # Z ))] 2 2 s p p ≥ Tr[ X ((s + s − s s ) log X + (1 − s )(1 − s ) log Y )]. 1 2 1 2 2 1 Therefore, whenever (1.16) is valid for s = s and s = s ,itisvalid for s = s + s − 1 2 1 2 s s . 1 2 By Lemma 2.2, there is some > 0 so that (1.16) is valid as a strict inequality for all s ∈ (0,). Deﬁne an increasing sequence {t } recursively by t = and n 1 n∈N t = 2t − t for n > 1. Then by what we have just proved, (1.16) is valid as a strict n n inequality for all s ∈ (0, t ). Since lim t = 1, the proof is complete. n n→∞ n The next goal is to show that the inequality on the right in (1.6) is a consequence of Theorem 1.6 by a simple differentiation argument. This simple proof is the new feature, The statement concerning cases of equality was proved in [20]. 2.3 Theorem For all X, Y ∈ P and all p > 0, p p p/2 p p/2 Tr[ X (log X + log Y )]≤ Tr[ X log( X Y X )] (2.11) and this inequality is strict unless X and Y commute. Proof Specializing to the case Z = X in Theorem 1.6, r r r r Tr[ X log(Y # X ))]≤ Tr[ X (s log X + (1 − s) log Y )] (2.12) At s = 1 both sides of (2.12) equal Tr[ X log X ], Therefore, we may differentiate at s = 1 to obtain a new inequality. Rearranging terms in (2.12) yields r r r Tr[ X log X ]− Tr[ X log(Y # X ))] r r ≥ Tr[ X (log X − log Y )]. (2.13) 1 − s r r Taking the limit s ↑ 1 on the left side of (2.15) yields Tr[ X log(Y # X ))] . ds s=1 1 1 From the integral representation for the logarithm, namely log A = − dλ, 0 λ λ+ A it follows that for all A ∈ P and H ∈ H , n n d 1 1 log( A + uH ) = H dλ. 
du λ + A λ + A u=0 r r s s r/2 −r/2 r −r/2 1− p r/2 Since (see (C.8)) Y # X = X # Y = X ( X Y X ) X , s 1−s r r r/2 −r/2 r −r/2 r/2 r/2 r/2 −r r/2 r/2 Y # X =− X log( X Y X ) X = X log( X Y X ) X s=1 ds 123 E. A. Carlen,E.H.Lieb Altogether, by the cyclicitiy of the trace, ∞ 1+r d X r r r/2 −r r/2 Tr[ X log(Y # X ))] = Tr dλ log( X Y X ) r 2 d p (λ + X ) s=1 r/2 −r r/2 = Tr[ X log( X Y X )]. −1 Replacing Y by Y yields (2.11). This completes the proof of the inequality itself, and it remains to deal with the cases of equality. Fix r > 0 and X and Y that do hot commute. By Theorem 1.3 applied with Z = X and s = 1/2, there is some δ> 0 such that 1 1 1 Tr[ X log(Y # X )]≤ Tr[ X ( log X + log Y )]− δ. (2.14) 1/2 2 2 2 Now use the fact that Y # X = (Y # X )# X, and apply (2.11) and then (2.14): 3/4 1/2 1/2 1 1 Tr[ X log(Y # X )]= Tr[ X log((Y # X )# X )]≤ Tr[ X ( log X + log(Y # X ))] 3/4 1/2 1/2 1/2 2 2 1 1 = Tr[ X log X]+ Tr[ X (Y # X ))] 1/2 2 2 1 1 1 1 1 ≤ Tr[ X log X]+ (Tr[ X ( log X + log Y )]− δ) 2 2 2 2 2 3 1 1 = Tr[ X ( log X + log Y )]− δ. 4 4 4 We may only apply strict in the last step since δ depends on X and Y , and strict need not hold if Y is replaced by Y # X. However, in this case, we may apply (2.11). 1/2 Further iteration of this argument evidently yields the inequalities −k Tr[ X log(Y # X )]≤ Tr[ X ((1 − t ) log X + s log Y )]− t δ, t = 2 , 1−t k k k k for each k ∈ N. We may now improve (2.15)to r r r Tr[ X log X ]− Tr[ X log(Y # X ))] r r ≥ Tr[ X (log X − log Y )]+ δ (2.15) 1 − s −k for s = 1 − 2 , k ∈ N . By the calculations above, taking s → 1 along this sequence yields the desired strict inequality. Further inequalities, which we discuss now, involve an extension of the notion of geometric means. This extension is introduced here and explained in more detail in Appendix C. 
Recall that for $t \in [0,1]$ and $X, Y \in \mathcal{P}_n$, $X \#_t Y := X^{1/2}(X^{-1/2} Y X^{-1/2})^t X^{1/2}$. As noted earlier, this formula makes sense for all $t \in \mathbb{R}$, and it has a natural geometric meaning. The map $t \mapsto X \#_t Y$, defined for $t \in \mathbb{R}$, is a constant speed geodesic running between $X$ and $Y$ for a particular Riemannian metric on the space of positive matrices.

2.4 Definition For $X, Y \in \mathcal{P}_n$ and for $t \in \mathbb{R}$,

$$X \#_t Y := X^{1/2}(X^{-1/2} Y X^{-1/2})^t X^{1/2}. \tag{2.16}$$

The geometric picture leads to an easy proof of the following identity: Let $X, Y \in \mathcal{P}_n$, and $t_0, t_1 \in \mathbb{R}$. Then for all $t \in \mathbb{R}$

$$X \#_{(1-t)t_0 + t t_1} Y = (X \#_{t_0} Y) \#_t (X \#_{t_1} Y). \tag{2.17}$$

See Theorem C.4 for the proof. As a special case, take $t_1 = 0$ and $t_0 = 1$. Then, for all $t$,

$$X \#_{1-t} Y = Y \#_t X. \tag{2.18}$$

With this definition of $X \#_t Y$ for $t \in \mathbb{R}$ we have:

2.5 Theorem For all $X, Y, Z \in \mathcal{P}_n$ such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$,

$$\mathrm{Tr}[X \log(Z^r \#_t Y^r)] \ge \mathrm{Tr}[X((1-t)\log X^r + t \log Y^r)] \tag{2.19}$$

is valid for all $t \in [1,\infty)$ and $r > 0$. If $Y$ and $Z$ do not commute, the inequality is strict for all $t > 1$.

The inequalities in Theorem 2.5 and in Theorem 1.6 are equivalent. The following simple identity is the key to this observation:

2.6 Lemma For $B, C \in \mathcal{P}_n$ and $s \ne 1$, let $A = B \#_s C$. Then

$$B = C \#_{1/(1-s)} A. \tag{2.20}$$

Proof Note that by (2.16) and (2.18), $A = B \#_s C$ is equivalent to $A = C^{1/2}(C^{-1/2} B C^{-1/2})^{1-s} C^{1/2}$, so that $C^{-1/2} A C^{-1/2} = (C^{-1/2} B C^{-1/2})^{1-s}$. Raising both sides to the power $1/(1-s)$ and conjugating with $C^{1/2}$ gives (2.20).

2.7 Lemma Let $X, Y, Z \in \mathcal{P}_n$ be such that $\mathrm{Tr}[Z] = \mathrm{Tr}[X]$. Let $r > 0$. Then (1.16) is valid for $s \in (0,1)$ if and only if (2.19) is valid for $t = 1/(1-s)$.

Proof Define $W \in \mathcal{P}_n$ by $W^r := Y^r \#_s Z^r$. The identity (2.20) then says that $Y^r = Z^r \#_{1/(1-s)} W^r$. Therefore,

$$\mathrm{Tr}[X(\log(Y^r \#_s Z^r) - s \log X^r - (1-s)\log Y^r)] = \mathrm{Tr}[X(\log W^r - s \log X^r - (1-s)\log(Z^r \#_{1/(1-s)} W^r))]. \tag{2.21}$$

Since $s \in (0,1)$, the right side of (2.21) is non-positive if and only if

$$\mathrm{Tr}[X \log(Z^r \#_{1/(1-s)} W^r)] \ge \mathrm{Tr}\left[X\left(\frac{-s}{1-s}\log X^r + \frac{1}{1-s}\log W^r\right)\right].$$

With this lemma we can now prove Theorem 2.5.

Proof of Theorem 2.5 Lemma 2.7 says that Theorem 2.5 is equivalent to Theorem 1.6.

There is a complement to Theorem 2.5 in the case $Z = X$ that is equivalent to a result of Hiai and Petz, who formulate it differently and do not discuss extended geometric means. The statement concerning cases of equality is new.

2.8 Theorem For all $X, Y \in \mathcal{P}_n$,

$$\mathrm{Tr}[X \log(X^r \#_t Y^r)] \ge \mathrm{Tr}[X((1-t)\log X^r + t \log Y^r)] \tag{2.22}$$

is valid for all $t \in (-\infty, 0]$ and $r > 0$. If $Y$ and $X$ do not commute, the inequality is strict for all $t < 0$.

Proof By Definition 2.4,

$$X^r \#_t Y^r = X^{r/2}(X^{r/2} Y^{-r} X^{r/2})^{|t|} X^{r/2} = X^{r/2} W^r X^{r/2}, \qquad\text{where } W := (X^{r/2} Y^{-r} X^{r/2})^{|t|/r}.$$

Therefore, by (2.11),

$$\mathrm{Tr}[X \log(X^r \#_t Y^r)] = \mathrm{Tr}[X \log(X^{r/2} W^r X^{r/2})] \ge r\,\mathrm{Tr}[X \log X] + r\,\mathrm{Tr}[X \log W].$$

By the definition of $W$ and (2.11) once more,

$$\mathrm{Tr}[X \log W] = \frac{|t|}{r}\mathrm{Tr}[X \log(X^{r/2} Y^{-r} X^{r/2})] \ge \frac{|t|}{r}\, r\, \mathrm{Tr}[X(\log X - \log Y)].$$

By combining the inequalities we obtain (2.22).

The proof given by Hiai and Petz is quite different. It uses a tensorization argument.

3 Quantum relative entropy inequalities

Theorems 1.3, 1.4 and 1.6 show that the three functions

$$X, Y \mapsto \sup_{Z \in \mathcal{P}_n,\ \mathrm{Tr}[Z] = \mathrm{Tr}[X]} \left\{\mathrm{Tr}[X \log(Y^{-1/2} Z Y^{-1/2})]\right\} + \mathrm{Tr}[Y] - \mathrm{Tr}[X] \tag{3.1}$$

$$X, Y \mapsto \sup_{Z \in \mathcal{P}_n,\ \mathrm{Tr}[Z] = \mathrm{Tr}[X]} \left\{\mathrm{Tr}[X \log((Y^{-1} \#_{1/2} Z)^2)]\right\} + \mathrm{Tr}[Y] - \mathrm{Tr}[X] \tag{3.2}$$

and

$$X, Y \mapsto \sup_{Z \in \mathcal{P}_n,\ \mathrm{Tr}[Z] = \mathrm{Tr}[X]} \left\{\mathrm{Tr}\left[X \log\left(\int_0^\infty \frac{1}{\lambda + Y}\, Z\, \frac{1}{\lambda + Y}\, d\lambda\right)\right]\right\} + \mathrm{Tr}[Y] - \mathrm{Tr}[X] \tag{3.3}$$

are all bounded above by the Umegaki relative entropy $X, Y \mapsto \mathrm{Tr}[X(\log X - \log Y)] + \mathrm{Tr}[Y] - \mathrm{Tr}[X]$. The next lemma shows that these functions are actually one and the same.

3.1 Proposition The three functions defined in (3.1), (3.2) and (3.3) are all equal to the Donald relative entropy $D_D(X\|Y)$. Consequently, for all $X, Y \in \mathcal{P}_n$,

$$D_D(X\|Y) \le D(X\|Y). \tag{3.4}$$

Proof The first thing to notice is that the relaxed constraint $\mathrm{Tr}[YQ] \le \mathrm{Tr}[X]$ imposes the same restriction in (1.26) as does the hard constraint $\mathrm{Tr}[YQ] = \mathrm{Tr}[X]$ since, if $\mathrm{Tr}[YQ] < \mathrm{Tr}[X]$, we may replace $Q$ by $(\mathrm{Tr}[X]/\mathrm{Tr}[YQ])\,Q$ so that the hard constraint is satisfied. Thus we may replace the relaxed constraint in (1.26) by the hard constraint without affecting the function $D_D(X\|Y)$. This will be convenient in the lemma, though elsewhere the relaxed constraint will be essential.

Next, for each of (3.1), (3.2) and (3.3) we make a change of variables. In the first case, define $\Phi : \mathcal{P}_n \to \mathcal{P}_n$ by $\Phi(Z) = Y^{-1/2} Z Y^{-1/2} := Q$. Then $\Phi$ is invertible with $\Phi^{-1}(Q) = Y^{1/2} Q Y^{1/2}$. Under this change of variables, the constraint $\mathrm{Tr}[X] = \mathrm{Tr}[Z]$ becomes $\mathrm{Tr}[X] = \mathrm{Tr}[Y^{1/2} Q Y^{1/2}] = \mathrm{Tr}[YQ]$. Thus (3.1) gives us another expression for the Donald relative entropy.

For the function in (3.2), we make a similar change of variables. Define $\Phi : \mathcal{P}_n \to \mathcal{P}_n$ by $\Phi(Z) = (Z \#_{1/2} Y^{-1})^2 := Q$. This map is invertible: It follows by direct computation from the definition (1.2) that for $Q^{1/2} := Z \#_{1/2} Y^{-1}$, $Z = Q^{1/2} Y Q^{1/2}$, so that $\Phi^{-1}(Q) = Q^{1/2} Y Q^{1/2}$. (This has an interesting and useful geometric interpretation that is discussed in Appendix C.) Under this change of variables, the constraint $\mathrm{Tr}[X] = \mathrm{Tr}[Z]$ becomes $\mathrm{Tr}[X] = \mathrm{Tr}[Q^{1/2} Y Q^{1/2}] = \mathrm{Tr}[YQ]$. Thus (3.2) gives another expression for the Donald relative entropy.

Finally, for the function in (3.3), we make a similar change of variables. Define $\Phi : \mathcal{P}_n \to \mathcal{P}_n$ by

$$\Phi(Z) = \int_0^\infty \frac{1}{\lambda + Y}\, Z\, \frac{1}{\lambda + Y}\, d\lambda := Q.$$

This map is invertible: $\Phi^{-1}(Q) = \int_0^1 Y^{1-s} Q Y^s\, ds$. Under this change of variables, the constraint $\mathrm{Tr}[X] = \mathrm{Tr}[Z]$ becomes $\mathrm{Tr}[X] = \mathrm{Tr}\left[\int_0^1 Y^{1-s} Q Y^s\, ds\right] = \mathrm{Tr}[YQ]$. Thus (3.3) gives yet another expression for the Donald relative entropy.

With the Donald relative entropy having taken center stage, we now bend our efforts to establishing some of its properties.

3.2 Lemma Fix $X, Y \in \mathcal{P}_n$, and define $K_{X,Y} := \{Q \in \overline{\mathcal{P}_n} : \mathrm{Tr}[QY] \le \mathrm{Tr}[X]\}$. There exists a unique $Q_{X,Y} \in K_{X,Y}$ such that $\mathrm{Tr}[Q_{X,Y} Y] \le \mathrm{Tr}[X]$ and such that

$$\mathrm{Tr}[X \log Q_{X,Y}] > \mathrm{Tr}[X \log Q]$$

for all other $Q \in K_{X,Y}$. The equation

$$\int_0^\infty \frac{1}{t + Q}\, X\, \frac{1}{t + Q}\, dt = Y \tag{3.5}$$

has a unique solution in $\mathcal{P}_n$, and this unique solution is the unique maximizer $Q_{X,Y}$.

Proof Note that $K_{X,Y}$ is a compact, convex set. Since $Q \mapsto \log Q$ is strictly concave, $Q \mapsto \mathrm{Tr}[X \log Q]$ is strictly concave on $K_{X,Y}$, and since it has the value $-\infty$ on $\partial\mathcal{P}_n \cap K_{X,Y}$, there is a unique maximizer $Q_{X,Y}$ that lies in $\mathcal{P}_n \cap K_{X,Y}$.

Let $H \in \mathcal{H}_n$ be such that $\mathrm{Tr}[HY] = 0$. For all $u$ in a neighborhood of 0, $Q_{X,Y} + uH \in \mathcal{P}_n \cap K_{X,Y}$. Differentiating in $u$ at $u = 0$ yields

$$0 = \mathrm{Tr}\left[X \int_0^\infty \frac{1}{t + Q_{X,Y}}\, H\, \frac{1}{t + Q_{X,Y}}\, dt\right] = \mathrm{Tr}\left[H \int_0^\infty \frac{1}{t + Q_{X,Y}}\, X\, \frac{1}{t + Q_{X,Y}}\, dt\right],$$

and hence

$$\int_0^\infty \frac{1}{t + Q_{X,Y}}\, X\, \frac{1}{t + Q_{X,Y}}\, dt = \lambda Y$$

for some $\lambda \in \mathbb{R}$. Multiplying through on both sides by $Q_{X,Y}^{1/2}$ and taking the trace yields $\lambda = 1$, which shows that $Q_{X,Y}$ solves (3.5). Conversely, any solution of (3.5) yields a critical point of our strictly concave functional, and hence must be the unique maximizer.

3.3 Remark There is one special case for which we can give a formula for the solution $Q_{X,Y}$ to (3.5): When $X$ and $Y$ commute, $Q_{X,Y} = XY^{-1}$.

3.4 Lemma For all $X, Y \in \mathcal{P}_n$ and all $\lambda > 0$,

$$D_D(\lambda X, \lambda Y) = \lambda D_D(X, Y), \tag{3.6}$$

and

$$D_D(\lambda X, Y) = \lambda D_D(X, Y) + \lambda \log\lambda\, \mathrm{Tr}[X] + (1-\lambda)\mathrm{Tr}[Y]. \tag{3.7}$$

Proof By (3.5) the maximizer $Q_{X,Y}$ in Lemma 3.2 satisfies the scaling relations

$$Q_{\lambda X, Y} = \lambda Q_{X,Y} \quad\text{and}\quad Q_{X, \lambda Y} = \lambda^{-1} Q_{X,Y}, \tag{3.8}$$

and (3.6) follows immediately. Next, by (3.8) again,

$$D_D(\lambda X\|Y) = \lambda\left(\mathrm{Tr}[X \log Q_{X,Y}] + \mathrm{Tr}[Y] - \mathrm{Tr}[X]\right) + \lambda\log\lambda\,\mathrm{Tr}[X] + (1-\lambda)\mathrm{Tr}[Y],$$

which proves (3.7).

3.5 Lemma If $X$ and $Y$ commute, $D_D(X\|Y) = D(X\|Y)$.

Proof Let $\{U_1, \ldots, U_N\}$ be any set of unitary matrices that commute with $X$ and $Y$. Then for each $j = 1, \ldots, N$, $\mathrm{Tr}[Y(U_j^* Q U_j)] = \mathrm{Tr}[YQ]$. Define $\widetilde{Q} = \frac{1}{N}\sum_{j=1}^N U_j^* Q U_j$. For an appropriate choice of the set $\{U_1, \ldots, U_N\}$, $\widetilde{Q}$ is the orthogonal projection of $Q$, with respect to the Hilbert-Schmidt inner product, onto the abelian subalgebra of $\mathcal{M}_n$ generated by $X$, $Y$ and $1$ [13]. By the concavity of the logarithm,

$$\mathrm{Tr}[X \log \widetilde{Q}] \ge \frac{1}{N}\sum_{j=1}^N \mathrm{Tr}[X \log(U_j^* Q U_j)] = \frac{1}{N}\sum_{j=1}^N \mathrm{Tr}[U_j X U_j^* \log Q] = \mathrm{Tr}[X \log Q].$$

Therefore, in taking the supremum, we need only consider operators $Q$ that commute with both $X$ and $Y$. The claim now follows by Remark 3.3.

3.6 Remark Another simple proof of this can be given using Donald's original formula (1.19).

We have now proved that $D_D$ has properties (2) and (3) in the Definition 1.8 of relative entropy, and have already observed that it inherits joint convexity from the Umegaki relative entropy through its original definition by Donald. We now compute the partial Legendre transform of $D_D(X\|Y)$. In doing so we arrive at a direct proof of the joint convexity of $D_D(X\|Y)$, independent of the joint convexity of the Umegaki relative entropy. We first prove Lemma 1.11.

Proof of Lemma 1.11 For $X \in \mathcal{P}_n$, define $a = \mathrm{Tr}[X]$ and $W := a^{-1}X$, so that $W$ is a density matrix. Then

$$\mathrm{Tr}[XH] - R(X\|Y) = a\,\mathrm{Tr}[WH] - aR(W\|Y) - a\log a - (1-a)\mathrm{Tr}[Y].$$

Therefore,

$$\begin{aligned}
\Psi_R(H,Y) &= \sup_{a>0}\left\{a \sup_{W \in \mathcal{P}_n}\{\mathrm{Tr}[WH] - R(W\|Y) : \mathrm{Tr}[W] = 1\} + a\,\mathrm{Tr}[Y] - a\log a\right\} - \mathrm{Tr}[Y] \\
&= \sup_{a>0}\{a(\Phi_R(H,Y) + \mathrm{Tr}[Y]) - a\log a\} - \mathrm{Tr}[Y].
\end{aligned}$$

Now use the fact that for all $a > 0$ and all $b \in \mathbb{R}$, $a\log a + e^{b-1} \ge ab$ with equality if and only if $b = 1 + \log a$ to conclude that (1.34) is valid.

The function $D_D$ evidently satisfies the conditions of this lemma. Our immediate goal is to compute $\Phi_D(H,Y)$ for this choice of $R$, and to show its concavity as a function of $Y$. Recall the definition

$$\Phi_D(H,Y) := \sup_{X > 0,\ \mathrm{Tr}[X] = 1}\{\mathrm{Tr}[XH] - D_D(X\|Y)\}. \tag{3.9}$$

We wish to evaluate the supremum as explicitly as possible.
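Before doing so, it may help to spell out the scalar optimization that closes the proof of Lemma 1.11. The following is only a worked sketch; the auxiliary names $g$ and $a_*$ are introduced here for illustration, and $b$ stands for the density-matrix supremum plus $\mathrm{Tr}[Y]$ that appears inside the supremum over $a$.

```latex
% For fixed b in R, maximize g(a) = ab - a log a over a > 0.
\begin{aligned}
g(a) &= ab - a\log a, \qquad g'(a) = b - 1 - \log a, \qquad g''(a) = -\frac{1}{a} < 0,\\
g'(a_*) &= 0 \iff a_* = e^{\,b-1},\\
\sup_{a>0} g(a) &= g(a_*) = e^{\,b-1}\bigl(b - (b-1)\bigr) = e^{\,b-1}.
\end{aligned}
% This is the statement  a log a + e^{b-1} >= ab,  with equality iff  b = 1 + log a.
% Subtracting Tr[Y] from the supremum then gives the unrestricted transform as
%   e^{\,b-1} - \mathrm{Tr}[Y].
```

Since $g$ is strictly concave, the critical point $a_*$ is the unique maximizer, which is why equality holds only for $b = 1 + \log a$.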
3.7 Lemma For $H \in \mathcal{H}_n$ and $Y \in \mathcal{P}_n$,

$$\Phi_D(H,Y) = 1 - \mathrm{Tr}[Y] + \inf\{\lambda_{\max}(H - \log Q) : Q \in \mathcal{P}_n,\ \mathrm{Tr}[QY] \le 1\} \tag{3.10}$$

where for any self-adjoint operator $K$, $\lambda_{\max}(K)$ is the largest eigenvalue of $K$.

Our proof of (3.10) makes use of a Minimax Theorem; such theorems give conditions under which a function $f(x,y)$ on $A \times B$ satisfies

$$\sup_{x \in A}\inf_{y \in B} f(x,y) = \inf_{y \in B}\sup_{x \in A} f(x,y). \tag{3.11}$$

The original Minimax Theorem was proved by von Neumann [44]. While most of his paper deals with the case in which $f$ is a bilinear function on $\mathbb{R}^m \times \mathbb{R}^n$ for some $m$ and $n$, and $A$ and $B$ are simplexes, he also proves [44, p. 309] a more general result for functions on $\mathbb{R} \times \mathbb{R}$ that are quasi-concave in $x$ and quasi-convex in $y$. According to Kuhn and Tucker [27, p. 113], a multidimensional version of this is implicit in the paper. von Neumann's work inspired a host of researchers to undertake extensions and generalizations; [15] contains a useful survey. A theorem of Peck and Dulmage [34] serves our purpose. See [39] for a more general extension.

3.8 Theorem (Peck and Dulmage) Let $\mathcal{X}$ be a topological vector space, and let $\mathcal{Y}$ be a vector space. Let $A \subset \mathcal{X}$ be non-empty compact and convex, and let $B \subset \mathcal{Y}$ be non-empty and convex. Let $f$ be a real valued function on $A \times B$ such that for each fixed $y \in B$, $x \mapsto f(x,y)$ is concave and upper semicontinuous, and for each fixed $x \in A$, $y \mapsto f(x,y)$ is convex. Then (3.11) is valid.

Proof of Lemma 3.7 The formula (3.10) has been proved above. Define $\mathcal{X} = \mathcal{Y} = \mathcal{M}_n$, $A = \{W \in \mathcal{P}_n : \mathrm{Tr}[W] = 1\}$ and $B := \{W \in \mathcal{P}_n : \mathrm{Tr}[WY] \le 1\}$. For $H \in \mathcal{H}_n$, define $f(X,Q) := \mathrm{Tr}[X(H - \log Q)]$. Then the hypotheses of Theorem 3.8 are satisfied, and hence

$$\sup_{X \in A}\inf_{Q \in B} f(X,Q) = \inf_{Q \in B}\sup_{X \in A} f(X,Q). \tag{3.12}$$

Using the definition (3.9) and the identity (3.12),

$$\begin{aligned}
\Phi_D(H,Y) + \mathrm{Tr}[Y] - 1 &= \sup_{X>0,\ \mathrm{Tr}[X]=1}\left\{\mathrm{Tr}[XH] - \sup_{Q>0,\ \mathrm{Tr}[QY]\le 1}\{\mathrm{Tr}[X\log Q]\}\right\} \\
&= \sup_{X>0,\ \mathrm{Tr}[X]=1}\ \inf_{Q>0,\ \mathrm{Tr}[QY]\le 1} \mathrm{Tr}[X(H - \log Q)] \\
&= \inf_{Q>0,\ \mathrm{Tr}[QY]\le 1}\ \sup_{X>0,\ \mathrm{Tr}[X]=1} \mathrm{Tr}[X(H - \log Q)] \\
&= \inf_{Q>0,\ \mathrm{Tr}[QY]\le 1} \lambda_{\max}(H - \log Q).
\end{aligned} \tag{3.13}$$

3.9 Lemma For each $H \in \mathcal{H}_n$, $Y \mapsto \Phi_D(H,Y)$ is concave.

Proof Fix $Y > 0$ and let $A \in \mathcal{H}_n$ be such that $Y_\pm := Y \pm A$ are both positive. Let $Q$ be optimal in the variational formula (3.10) for $\Phi_D(H,Y)$. We claim that there exists $c \in \mathbb{R}$ so that

$$\mathrm{Tr}[Y_+ Q e^c] \le 1 \quad\text{and}\quad \mathrm{Tr}[Y_- Q e^{-c}] \le 1. \tag{3.14}$$

Suppose for the moment that this is true. Then

$$\lambda_{\max}(H - \log Q) = \tfrac12 \lambda_{\max}(H - \log(Qe^c)) + \tfrac12 \lambda_{\max}(H - \log(Qe^{-c})).$$

By (3.14),

$$\Phi_D(H,Y) \ge \tfrac12 \Phi_D(H,Y_+) + \tfrac12 \Phi_D(H,Y_-),$$

which proves midpoint concavity. The general concavity statement follows by continuity.

To complete this part of the proof, it remains to show that we can choose $c \in \mathbb{R}$ so that (3.14) is satisfied. Define $a := \mathrm{Tr}[QA]$. Since $Y \pm A > 0$, $\mathrm{Tr}[Q(Y \pm A)] > 0$, which is the same as $1 \pm a > 0$. That is, $|a| < 1$. We then compute

$$\mathrm{Tr}[Y_+ Q e^c] = e^c\,\mathrm{Tr}[YQ + AQ] = e^c(1 + a)$$

and likewise, $\mathrm{Tr}[Y_- Q e^{-c}] = e^{-c}(1 - a)$. We wish to choose $c$ so that

$$e^c(1+a) \le 1 \quad\text{and}\quad e^{-c}(1-a) \le 1.$$

This is the same as $\log(1-a) \le c \le -\log(1+a)$. Since $-\log(1+a) - \log(1-a) = -\log(1-a^2) > 0$, the interval $[\log(1-a), -\log(1+a)]$ is non-empty, and we may choose any $c$ in this interval.

We may now improve on Lemma 3.9: Not only is $\Phi_D(H,Y)$ concave in $Y$; its exponential is also concave in $Y$.

3.10 Theorem For all $H \in \mathcal{H}_n$, the function

$$Y \mapsto \exp\left(\inf_{Q>0,\ \mathrm{Tr}[QY]\le 1} \lambda_{\max}(H - \log Q)\right) \tag{3.15}$$

is concave on $\mathcal{P}_n$. Moreover, for all $H, K \in \mathcal{H}_n$,

$$\log(\mathrm{Tr}[e^{H+K}]) \le \inf_{Q>0,\ \mathrm{Tr}[Qe^K]\le 1} \lambda_{\max}(H - \log Q) \le \log(\mathrm{Tr}[e^H e^K]). \tag{3.16}$$

These inequalities improve upon the Golden–Thompson inequality.
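The two-sided bound (3.16) can be probed numerically by evaluating $\lambda_{\max}(H - \log Q)$ at a few feasible $Q$. The sketch below is an illustration only, restricted to real symmetric 2x2 matrices with distinct eigenvalues; the helper names and the trial matrices are our own choices. The first trial, a multiple of $e^H$, should reproduce the Golden–Thompson value $\log \mathrm{Tr}[e^H e^K]$, and every trial should stay above $\log \mathrm{Tr}[e^{H+K}]$.

```python
import math

def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def mat_fun(A, f):
    """f(A) for a real symmetric 2x2 matrix with two distinct eigenvalues."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    lo, hi = mid - rad, mid + rad
    c1 = (f(hi) - f(lo)) / (hi - lo)
    c0 = f(lo) - c1 * lo
    return [[c1 * a + c0, c1 * b], [c1 * b, c1 * d + c0]]

def lam_max(A):
    """Largest eigenvalue of a real symmetric 2x2 matrix."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    return (a + d) / 2 + math.hypot((a - d) / 2, b)

# Hypothetical sample data: two non-commuting self-adjoint matrices.
H = [[1.0, 0.5], [0.5, -1.0]]
K = [[0.0, 1.0], [1.0, 0.5]]
exp_H, exp_K = mat_fun(H, math.exp), mat_fun(K, math.exp)
H_plus_K = [[H[i][j] + K[i][j] for j in range(2)] for i in range(2)]

lower = math.log(trace(mat_fun(H_plus_K, math.exp)))   # log Tr e^{H+K}
upper = math.log(trace(mat_mul(exp_H, exp_K)))          # log Tr e^H e^K

def objective(M):
    """lam_max(H - log Q) for Q = M / Tr[M e^K], so that Tr[Q e^K] = 1."""
    c = trace(mat_mul(M, exp_K))
    Q = [[M[i][j] / c for j in range(2)] for i in range(2)]
    log_Q = mat_fun(Q, math.log)
    return lam_max([[H[i][j] - log_Q[i][j] for j in range(2)] for i in range(2)])

trials = [exp_H, [[2.0, 0.3], [0.3, 1.0]], [[1.0, -0.4], [-0.4, 3.0]]]
trial_values = [objective(M) for M in trials]
print("lower bound:", lower)
print("trial values:", trial_values)
print("Golden-Thompson bound:", upper)
```

A search over $Q$ (not attempted here) would be needed to approach the infimum itself; the trial values only bracket it from above.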
Proof Let $\Psi_D(H,Y)$ be the partial Legendre transform of $D_D(X\|Y)$ in $X$ without any restriction on $X$:

$$\Psi_D(H,Y) := \sup_{X>0}\{\mathrm{Tr}[XH] - D_D(X\|Y)\}. \tag{3.17}$$

By [9, Theorem 1.1], and the joint convexity of $D_D(X\|Y)$, $\Psi_D(H,Y)$ is concave in $Y$ for each fixed $H \in \mathcal{H}_n$. By Lemma 1.11, $\Psi_D(H,Y) = e^{\Phi_D(H,Y) + \mathrm{Tr}[Y] - 1} - \mathrm{Tr}[Y]$, and thus we conclude

$$\Psi_D(H,Y) = \exp\left(\inf_{Q>0,\ \mathrm{Tr}[QY]\le 1}\lambda_{\max}(H - \log Q)\right) - \mathrm{Tr}[Y]. \tag{3.18}$$

The inequality $\Psi(H,Y) \le \Psi_D(H,Y)$ follows from $D_D(X\|Y) \le D(X\|Y)$ and the order reversing property of Legendre transforms. Taking logarithms and writing $Y = e^K$ yields the first inequality in (3.16). Finally, choosing $Q := e^H/\mathrm{Tr}[e^H Y]$ so that the constraint $\mathrm{Tr}[QY] \le 1$ is satisfied, we obtain $\inf_{Q>0,\ \mathrm{Tr}[QY]\le 1}\lambda_{\max}(H - \log Q) \le \log(\mathrm{Tr}[e^H Y])$. Writing $Y = e^K$ now yields the second inequality in (3.16).

The proof that the function in (3.15) is concave has two components. One is the identification (3.18) of this function with $\Psi_D(H,Y) + \mathrm{Tr}[Y]$. The second makes use of the direct analog of an argument of Tropp [41] proving the concavity in $Y$ of $\mathrm{Tr}[e^{H + \log Y}] = \Psi(H,Y) + \mathrm{Tr}[Y]$ as a consequence of the joint convexity of the Umegaki relative entropy. Once one has the formula (3.18), the concavity of the function in (3.15) follows from the same argument, applied instead to the Donald relative entropy, which is also jointly convex.

However, it is of interest to note here that this argument can be run in reverse to deduce the joint convexity of the Donald relative entropy without invoking the joint convexity of the Umegaki relative entropy. To see this, note that Lemma 3.9 provides a simple direct proof of the concavity in $Y$ of $\Phi_D(H,Y)$. By the Fenchel-Moreau Theorem, for all density matrices $X$

$$D_D(X\|Y) = \sup_{H \in \mathcal{H}_n}\{\mathrm{Tr}[XH] - \Phi_D(H,Y)\}. \tag{3.19}$$

For each fixed $H \in \mathcal{H}_n$, $X, Y \mapsto \mathrm{Tr}[XH] - \Phi_D(H,Y)$ is evidently jointly convex. Since the supremum of any family of convex functions is convex, we conclude that with the $X$ variable restricted to be a density matrix, $X, Y \mapsto D_D(X\|Y)$ is jointly convex. The restriction on $X$ is then easily removed; see Lemma 3.11 below. This gives an elementary proof of the joint convexity of $D_D(X\|Y)$.

It is somewhat surprising that the joint convexity of the Umegaki relative entropy is deeper than the joint convexity of either $D_D(X\|Y)$ or $D_{BS}(X\|Y)$. In fact, the simple proof by Fujii and Kamei that the latter is jointly convex stems from a joint operator convexity result; see the discussion in Appendix C. The joint convexity of the Umegaki relative entropy, in contrast, stems from the basic concavity theorem in [30].

3.11 Lemma Let $f(x,y)$ be a $(-\infty,\infty]$ valued function on $\mathbb{R}^m \times \mathbb{R}^n$ that is homogeneous of degree one. Let $a \in \mathbb{R}^m$, and let $K_a = \{x \in \mathbb{R}^m : \langle a, x\rangle = 1\}$, and suppose that whenever $f(x,y) < \infty$, $\langle a, x\rangle > 0$. If $f$ is convex on $K_a \times \mathbb{R}^n$, then it is convex on $\mathbb{R}^m \times \mathbb{R}^n$.

Proof Let $x_1, x_2 \in \mathbb{R}^m$ and $y_1, y_2 \in \mathbb{R}^n$. We may suppose that $f(x_1,y_1), f(x_2,y_2) < \infty$. Define $\alpha_1 = \langle a, x_1\rangle$ and $\alpha_2 = \langle a, x_2\rangle$. Then $\alpha_1, \alpha_2 > 0$, and $x_1/\alpha_1, x_2/\alpha_2 \in K_a$. With $\lambda := \alpha_1/(\alpha_1 + \alpha_2)$,

$$\begin{aligned}
f(x_1 + x_2, y_1 + y_2) &= (\alpha_1 + \alpha_2)\, f\!\left(\lambda\frac{x_1}{\alpha_1} + (1-\lambda)\frac{x_2}{\alpha_2},\ \lambda\frac{y_1}{\alpha_1} + (1-\lambda)\frac{y_2}{\alpha_2}\right) \\
&\le (\alpha_1 + \alpha_2)\lambda\, f\!\left(\frac{x_1}{\alpha_1}, \frac{y_1}{\alpha_1}\right) + (\alpha_1 + \alpha_2)(1-\lambda)\, f\!\left(\frac{x_2}{\alpha_2}, \frac{y_2}{\alpha_2}\right) \\
&= f(x_1, y_1) + f(x_2, y_2).
\end{aligned}$$

Thus, $f$ is subadditive on $\mathbb{R}^m \times \mathbb{R}^n$, and by the homogeneity once more, jointly convex.

We next provide the proof of Proposition 1.9, which we recall says that any quantum relative entropy functional satisfies the inequality

$$R(X\|W) \ge \frac12\,\mathrm{Tr}[X]\left\|\frac{X}{\mathrm{Tr}[X]} - \frac{W}{\mathrm{Tr}[W]}\right\|_1^2 \tag{3.20}$$

for all $X, W \in \mathcal{P}_n$, where $\|\cdot\|_1$ denotes the trace norm.

Proof of Proposition 1.9 By scaling, it suffices to show that when $X$ and $W$ are density matrices,

$$R(X\|W) \ge \frac12\|X - W\|_1^2. \tag{3.21}$$

Let $X$ and $W$ be density matrices and define $H = X - W$. Let $P$ be the spectral projection onto the subspace of $\mathbb{C}^n$ spanned by the eigenvectors of $H$ with non-negative eigenvalues. Let $\mathcal{A}$ be the $*$-subalgebra of $\mathcal{M}_n$ generated by $H$ and $1$, and let $E_{\mathcal{A}}$ be the orthogonal projection in $\mathcal{M}_n$ equipped with the Hilbert-Schmidt inner product onto $\mathcal{A}$. Then $A \mapsto E_{\mathcal{A}}A$ is a convex operation [13], and then by the joint convexity of $R$,

$$R(X\|W) \ge R(E_{\mathcal{A}}X\|E_{\mathcal{A}}W). \tag{3.22}$$

Since both $E_{\mathcal{A}}X$ and $E_{\mathcal{A}}W$ belong to the commutative algebra $\mathcal{A}$, (3.22) together with property (3) in the definition of quantum relative entropies then gives us $R(X\|W) \ge D(E_{\mathcal{A}}X\|E_{\mathcal{A}}W)$. Since $\|E_{\mathcal{A}}X - E_{\mathcal{A}}W\|_1 = \|X - W\|_1$, the inequality now follows from the classical Csiszar–Kullback–Leibler–Pinsker inequality [12,28,29,33,35] on a two-point probability space.

3.12 Remark The proof of the lower bound (3.21) given here is essentially the same as the proof for the case of the Umegaki relative entropy given in [21]. The proof gives one reason for attaching importance to the joint convexity property, and since it is short, we spelled it out to emphasize this.

We conclude this section with a brief discussion of the failure of convexity of the function $\phi(X,Y) = \mathrm{Tr}[X^{1/2}\log(Y^{-1/2} X Y^{-1/2})X^{1/2}]$. We recall that if we write this in the other order, i.e., define the function $\psi(X,Y) = \mathrm{Tr}[X^{1/2}\log(X^{1/2} Y^{-1} X^{1/2})X^{1/2}]$, the function $\psi$ is jointly convex. In fact $\psi$ is operator convex if the trace is omitted. We might have hoped, therefore, that $\phi$ would at least be convex in $Y$ alone, and even have hoped that $\log(Y^{-1/2} X Y^{-1/2})$ is operator convex in $Y$. Neither of these things is true. The following lemma precludes the operator convexity.

3.13 Lemma Let $F$ be a function mapping the set of positive semidefinite matrices into itself.
Let $f : [0,\infty) \to \mathbb{R}$ be a concave, monotone increasing function. If $Y \mapsto f(F(Y))$ is operator convex, then $Y \mapsto F(Y)$ is operator convex.

Proof If $Y \mapsto F(Y)$ is not operator convex, then there is a unit vector $v$ and there are density matrices $Y_1$ and $Y_2$ such that with $Y = \tfrac12(Y_1 + Y_2)$,

$$\langle v, F(Y)v\rangle > \tfrac12\left(\langle v, F(Y_1)v\rangle + \langle v, F(Y_2)v\rangle\right).$$

By Jensen's inequality, for all density matrices $X$, $\langle v, f(F(X))v\rangle \le f(\langle v, F(X)v\rangle)$. Therefore,

$$\begin{aligned}
\tfrac12\left(\langle v, f(F(Y_1))v\rangle + \langle v, f(F(Y_2))v\rangle\right) &\le \tfrac12\left(f(\langle v, F(Y_1)v\rangle) + f(\langle v, F(Y_2)v\rangle)\right) \\
&\le f\!\left(\tfrac12\left(\langle v, F(Y_1)v\rangle + \langle v, F(Y_2)v\rangle\right)\right) \\
&< \langle v, f(F(Y))v\rangle.
\end{aligned}$$

By the lemma, if $Y \mapsto \log(Y^{-1/2} Z Y^{-1/2})$ were convex, $Y \mapsto Y^{-1/2} Z Y^{-1/2}$ would be convex. But this may be shown to be false in the $2 \times 2$ case by simple computations in a neighborhood of the identity with $Z$ a rank-one projector. A more intricate computation of the same type shows that, even with the trace, convexity fails.

4 Exponential inequalities related to the Golden–Thompson inequality

Let $\Psi(H,Y)$ be given in (1.33) and $\Psi_D(H,Y)$ be given in (3.17). We have seen in the previous section that the inequality $D_D(X\|Y) \le D(X\|Y)$ leads to the inequality $\Psi(H,Y) \le \Psi_D(H,Y)$. This inequality, which may be written explicitly as

$$\mathrm{Tr}[e^{H + \log Y}] \le \exp\left(\inf\{\lambda_{\max}(H - \log Q) : Q \in \mathcal{P}_n,\ \mathrm{Tr}[QY] \le 1\}\right), \tag{4.1}$$

immediately implies the Golden–Thompson inequality through the simple choice $Q = e^H/\mathrm{Tr}[Ye^H]$. The $Q$ chosen here is optimal only when $H$ and $Y$ commute. Otherwise, there is a better choice for $Q$, which will lead to a tighter upper bound.

A similar analysis can be made with respect to the BS relative entropy. Define $\Psi_{BS}(H,Y)$ by

$$\Psi_{BS}(H,Y) := \sup\{\mathrm{Tr}[HX] - D_{BS}(X\|Y) : X \in \mathcal{P}_n\}. \tag{4.2}$$

The inequality $D(X\|Y) \le D_{BS}(X\|Y)$ together with Lemma 1.11 gives

$$\Psi_{BS}(H,Y) \le \Psi(H,Y) = \mathrm{Tr}[e^{H + \log Y}] - \mathrm{Tr}[Y]. \tag{4.3}$$

It does not seem possible to compute $\Psi_{BS}(H,Y)$ explicitly, but it is possible to give an alternate expression for it in terms of the solutions of a non-linear matrix equation similar to the one (3.5) that arises in the context of the Donald relative entropy.

Writing out the identity $X \#_t Y = Y \#_{1-t} X$ gives

$$X^{1/2}(X^{-1/2} Y X^{-1/2})^t X^{1/2} = Y^{1/2}(Y^{-1/2} X Y^{-1/2})^{1-t} Y^{1/2}.$$

Differentiating at $t = 0$ yields

$$X^{1/2}\log(X^{1/2} Y^{-1} X^{1/2})X^{1/2} = Y^{1/2}(Y^{-1/2} X Y^{-1/2})\log(Y^{-1/2} X Y^{-1/2})Y^{1/2}.$$

This provides an alternate expression for $D_{BS}(X\|Y)$ that involves $X$ in a somewhat simpler way that is advantageous for the partial Legendre transform in $X$:

$$D_{BS}(X\|Y) = \mathrm{Tr}[Y f(Y^{-1/2} X Y^{-1/2})] - \mathrm{Tr}[X] + \mathrm{Tr}[Y] \tag{4.4}$$

where $f(x) = x\log x$. A different derivation of this formula may be found in [23].

Introducing the variable $R = Y^{-1/2} X Y^{-1/2}$ we have, for all $H \in \mathcal{H}_n$,

$$\begin{aligned}
\mathrm{Tr}[XH] - D_{BS}(X\|Y) &= \mathrm{Tr}[X(H+1)] - \mathrm{Tr}[Y f(Y^{-1/2} X Y^{-1/2})] - \mathrm{Tr}[Y] \\
&= \mathrm{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \mathrm{Tr}[Y f(R)] - \mathrm{Tr}[Y].
\end{aligned}$$

Therefore,

$$\Psi_{BS}(H,Y) + \mathrm{Tr}[Y] = \sup_{R \in \mathcal{P}_n}\left\{\mathrm{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \mathrm{Tr}[Y f(R)]\right\}. \tag{4.5}$$

When $Y$ and $H$ commute, the supremum on the right is achieved at $R = e^H$ since for this choice of $R$,

$$\mathrm{Tr}[R(Y^{1/2}(H+1)Y^{1/2})] - \mathrm{Tr}[Y f(R)] = \mathrm{Tr}[Ye^H] = \mathrm{Tr}[e^{H + \log Y}]$$

and by (4.3), this is the maximum possible value. In general, without assuming that $H$ and $Y$ commute, this choice of $R$ and (4.3) yields an interesting inequality.

4.1 Theorem For all self-adjoint $H$ and $L$,

$$\mathrm{Tr}[e^H e^L] - \mathrm{Tr}[e^{H+L}] \le \mathrm{Tr}[e^H H e^L] - \mathrm{Tr}[e^H e^{L/2} H e^{L/2}]. \tag{4.6}$$

Proof With the choice $R = e^H$, the inequality (4.3) together with (4.5) yields

$$\mathrm{Tr}[e^H(Y^{1/2} H Y^{1/2} + Y)] - \mathrm{Tr}[Y e^H H] \le \mathrm{Tr}[e^{H + \log Y}]$$

or, rearranging terms,

$$\mathrm{Tr}[e^H Y] - \mathrm{Tr}[e^{H + \log Y}] \le \mathrm{Tr}[e^H H Y] - \mathrm{Tr}[e^H(Y^{1/2} H Y^{1/2})].$$

The inequality is proved by writing $Y = e^L$.

We now turn to the specification of the actual maximizer.
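Returning briefly to Theorem 4.1, inequality (4.6) is easy to probe numerically. The following sketch evaluates both sides for one pair of non-commuting real symmetric 2x2 matrices; the helper names and the sample matrices are assumptions made for illustration, and a single example is of course no substitute for the proof.

```python
import math

def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def mat_fun(A, f):
    """f(A) for a real symmetric 2x2 matrix with two distinct eigenvalues."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    lo, hi = mid - rad, mid + rad
    c1 = (f(hi) - f(lo)) / (hi - lo)
    c0 = f(lo) - c1 * lo
    return [[c1 * a + c0, c1 * b], [c1 * b, c1 * d + c0]]

# Hypothetical sample data: H and L are self-adjoint and do not commute.
H = [[1.0, 0.5], [0.5, -1.0]]
L = [[0.0, 1.0], [1.0, 0.5]]

exp_H = mat_fun(H, math.exp)
exp_L = mat_fun(L, math.exp)
exp_half_L = mat_fun(L, lambda x: math.exp(x / 2))
H_plus_L = [[H[i][j] + L[i][j] for j in range(2)] for i in range(2)]

# Left side of (4.6): the Golden-Thompson gap Tr[e^H e^L] - Tr[e^{H+L}].
golden_thompson_gap = (trace(mat_mul(exp_H, exp_L))
                       - trace(mat_fun(H_plus_L, math.exp)))

# Right side of (4.6): Tr[e^H H e^L] - Tr[e^H e^{L/2} H e^{L/2}].
sandwiched_H = mat_mul(mat_mul(exp_half_L, H), exp_half_L)
right_side = (trace(mat_mul(mat_mul(exp_H, H), exp_L))
              - trace(mat_mul(exp_H, sandwiched_H)))

print("Golden-Thompson gap:", golden_thompson_gap)
print("upper bound from (4.6):", right_side)
```

Read this way, (4.6) gives an explicit upper bound on how far the Golden–Thompson inequality can be from equality for the given pair.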
4.2 Lemma For $K \in \mathcal{H}_n$ and $Y \in \mathcal{P}_n$, the function

$$R \mapsto \mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)]$$

on $\overline{\mathcal{P}_n}$ has a unique maximizer $R_{K,Y}$ in $\overline{\mathcal{P}_n}$ which is contained in $\mathcal{P}_n$, and $R_{K,Y}$ is the unique critical point of this function in $\mathcal{P}_n$.

Proof Since $f$ is strictly operator convex, $R \mapsto \mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)]$ is strictly concave. There are no local maximizers on the boundary of $\mathcal{P}_n$ since $\lim_{x \downarrow 0}(-f'(x)) = \infty$, so that if $R$ has a zero eigenvalue, a small perturbation of $R$ will yield a higher value. Finally,

$$\mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)] \le \|K\|\,\mathrm{Tr}[R - aR\log R]$$

where $a = \|K\|^{-1}\|Y^{-1}\|^{-1}$. This shows that

$$\sup_{R \in \mathcal{P}_n}\{\mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)]\} = \sup\{\mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)] : R \ge 0,\ R \le e^{1/a}\}.$$

Since the set on the right is compact and convex, and since the function $R \mapsto \mathrm{Tr}[RK] - \mathrm{Tr}[Y f(R)]$ is strictly concave and upper-semicontinuous on this set, there exists a unique maximizer, which we have seen must be in the interior, and by the strict concavity, there can be no other interior critical point.

It is now a simple matter to derive the Euler–Lagrange equation that determines the maximizer in Lemma 4.2. The integral representation for $f(A) = A\log A$ is

$$A\log A = \int_0^\infty \left(\frac{A}{\lambda + 1} - 1 + \frac{\lambda}{\lambda + A}\right) d\lambda$$

and then one readily concludes that the unique maximizer $R_{H,Y}$ to the variational problem in (4.5) is the unique solution in $\mathcal{P}_n$ of

$$\int_0^\infty \left(\frac{Y}{\lambda + 1} - \lambda\,\frac{1}{\lambda + R}\, Y\, \frac{1}{\lambda + R}\right) d\lambda = Y^{1/2}(H+1)Y^{1/2}.$$

When $H$ and $Y$ commute, one readily checks that $R = e^H$ is the unique solution in $\mathcal{P}_n$.

We now show how some of the logarithmic inequalities that follow from Theorem 1.1 may be used to get upper and lower bounds on $\mathrm{Tr}[e^{H + \log Y}]$. Given two positive matrices $W$ and $V$, one way to show that $\mathrm{Tr}[W] \le \mathrm{Tr}[V]$ is to show that

$$\mathrm{Tr}[W\log W] \le \mathrm{Tr}[W\log V]. \tag{4.7}$$

Then

$$0 \le D(W\|V) = \mathrm{Tr}[W\log W] - \mathrm{Tr}[W\log V] - \mathrm{Tr}[W] + \mathrm{Tr}[V] \le -\mathrm{Tr}[W] + \mathrm{Tr}[V]. \tag{4.8}$$

Thus, when (4.7) is satisfied, one not only has $\mathrm{Tr}[W] \le \mathrm{Tr}[V]$, but the stronger bound $D(W\|V) + \mathrm{Tr}[W] \le \mathrm{Tr}[V]$.
4.3 Theorem Let $H, K \in \mathcal{H}_n$. For $r > 0$, define

$$W := (e^{rH} \#_s e^{rK})^{1/r} \quad\text{and}\quad V := e^{(1-s)H + sK}. \tag{4.9}$$

Then for $s \in [0,1]$,

$$D(W\|V) + \mathrm{Tr}[W] \le \mathrm{Tr}[V]. \tag{4.10}$$

Proof By the remarks preceding the theorem, it suffices to show that for this choice of $V$ and $W$, $\mathrm{Tr}[W\log W] \le \mathrm{Tr}[W\log V]$. Define $X = e^H$ and $Y = e^K$. The identity

$$A = (A \#_s B) \#_{-s/(1-s)} B, \tag{4.11}$$

valid for $A, B \in \mathcal{P}_n$, is the special case of Theorem C.4 in which $t_1 = 1$, $t = -t_0/(t_1 - t_0)$ and $t_0 = s$. Taking $A = X^r = e^{rH}$ and $B = Y^r = e^{rK}$, we have $X^r = W^r \#_\beta Y^r$, with $\beta = -s/(1-s)$. Therefore, by (2.22),

$$\mathrm{Tr}[W\log X^r] = \mathrm{Tr}[W\log(W^r \#_\beta Y^r)] \ge \mathrm{Tr}[W((1-\beta)\log W^r + \beta\log Y^r)].$$

Since

$$\frac{1}{1-\beta}\log X^r - \frac{\beta}{1-\beta}\log Y^r = (1-s)\log X^r + s\log Y^r = \log V^r,$$

this last inequality is equivalent to $\mathrm{Tr}[W\log W] \le \mathrm{Tr}[W\log V]$.

4.4 Remark Since $D(W\|V) > 0$ unless $W = V$, (4.10) is stronger than the inequality $\mathrm{Tr}[W] \le \mathrm{Tr}[V]$ which is the complemented Golden–Thompson inequality of Hiai and Petz [23]. Their proof is also based on (2.22), together with an identity equivalent to (4.11), but they employ these differently, thereby omitting the remainder term $D(W\|V)$.

We remark that one may obtain at least one of the cases of (1.10) directly from (4.2) and (4.3) by making an appropriate choice of $X$ in terms of $H$ and $Y$: Define $X^{1/2} := Y \# e^H$. Then

$$X^{1/2} Y^{-1} X^{1/2} = X^{1/2} \#_{-1} Y = e^H,$$

and, therefore, making this choice of $X$,

$$\begin{aligned}
\Psi_{BS}(H,Y) &\ge \mathrm{Tr}[(Y \# e^H)^2 H] - \mathrm{Tr}[(Y \# e^H)^2 H] + \mathrm{Tr}[(Y \# e^H)^2] - \mathrm{Tr}[Y] \\
&= \mathrm{Tr}[(Y \# e^H)^2] - \mathrm{Tr}[Y].
\end{aligned}$$

This proves $\mathrm{Tr}[(Y \# e^H)^2] \le \mathrm{Tr}[e^{H + \log Y}]$ which is equivalent to the $r = 1/2$, $t = 1/2$ case of (1.10).
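Theorem 4.3 can be illustrated with a small numerical experiment. The sketch below builds $W$ and $V$ from (4.9) for one choice of $r$ and $s$ and checks both the logarithmic inequality (4.7) and the resulting trace bound with remainder. The restriction to real symmetric 2x2 matrices, the helper names (`geo_mean`, `mat_fun`) and the sample values are assumptions of this illustration.

```python
import math

def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def mat_fun(A, f):
    """f(A) for a real symmetric 2x2 matrix with two distinct eigenvalues."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    lo, hi = mid - rad, mid + rad
    c1 = (f(hi) - f(lo)) / (hi - lo)
    c0 = f(lo) - c1 * lo
    return [[c1 * a + c0, c1 * b], [c1 * b, c1 * d + c0]]

def geo_mean(A, B, t):
    """A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2} for positive A, B."""
    root = mat_fun(A, math.sqrt)
    inv_root = mat_fun(A, lambda x: 1 / math.sqrt(x))
    inner = mat_mul(mat_mul(inv_root, B), inv_root)
    return mat_mul(mat_mul(root, mat_fun(inner, lambda x: x ** t)), root)

# Hypothetical sample data.
H = [[1.0, 0.5], [0.5, -1.0]]
K = [[0.0, 1.0], [1.0, 0.5]]
r, s = 2.0, 0.3

exp_rH = mat_fun(H, lambda x: math.exp(r * x))
exp_rK = mat_fun(K, lambda x: math.exp(r * x))
W = mat_fun(geo_mean(exp_rH, exp_rK, s), lambda x: x ** (1 / r))
log_V = [[(1 - s) * H[i][j] + s * K[i][j] for j in range(2)] for i in range(2)]
V = mat_fun(log_V, math.exp)

tr_W_log_W = trace(mat_mul(W, mat_fun(W, math.log)))
tr_W_log_V = trace(mat_mul(W, log_V))
# Umegaki relative entropy D(W||V) for non-normalized positive matrices.
rel_entropy = tr_W_log_W - tr_W_log_V - trace(W) + trace(V)

print("Tr[W log W] =", tr_W_log_W, " Tr[W log V] =", tr_W_log_V)
print("D(W||V) + Tr[W] =", rel_entropy + trace(W), " Tr[V] =", trace(V))
```

The quantity `rel_entropy` is the remainder term that (4.10) adds to the complemented Golden–Thompson inequality.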
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Appendices

A The Peierls–Bogoliubov Inequality and the Gibbs Variational Principle

For $A \in \mathcal{H}_n$, let $\sigma(A)$ denote the spectrum of $A$, and let $A = \sum_{\lambda \in \sigma(A)} \lambda P_\lambda$ be the spectral decomposition of $A$. For a function $f$ defined on $\sigma(A)$, $f(A) = \sum_{\lambda \in \sigma(A)} f(\lambda)P_\lambda$. Likewise, for $B \in \mathcal{H}_n$, let $B = \sum_{\mu \in \sigma(B)} \mu Q_\mu$ be the spectral decomposition of $B$. Let $f$ be convex and differentiable on an interval containing $\sigma(A) \cup \sigma(B)$. Then, since $\sum_{\lambda \in \sigma(A)} P_\lambda = \sum_{\mu \in \sigma(B)} Q_\mu = 1$,

$$\mathrm{Tr}[f(B) - f(A) - f'(A)(B - A)] = \sum_{\lambda \in \sigma(A)}\sum_{\mu \in \sigma(B)} [f(\mu) - f(\lambda) - f'(\lambda)(\mu - \lambda)]\,\mathrm{Tr}[P_\lambda Q_\mu]. \tag{A.1}$$

For each $\mu$ and $\lambda$ both $[f(\mu) - f(\lambda) - f'(\lambda)(\mu - \lambda)]$ and $\mathrm{Tr}[P_\lambda Q_\mu]$ are non-negative, and hence the right side of (A.1) is non-negative. This yields Klein's inequality [25]:

$$\mathrm{Tr}[f(B)] \ge \mathrm{Tr}[f(A)] + \mathrm{Tr}[f'(A)(B - A)]. \tag{A.2}$$

Now suppose that the function $f$ is strictly convex on an interval containing $\sigma(A) \cup \sigma(B)$. Then for $\mu \ne \lambda$, $[f(\mu) - f(\lambda) - f'(\lambda)(\mu - \lambda)] > 0$. If there is equality in (A.2), then for each $\lambda \in \sigma(A)$ and $\mu \in \sigma(B)$ such that $\lambda \ne \mu$, $\mathrm{Tr}[P_\lambda Q_\mu] = 0$. Since $\sum_{\mu \in \sigma(B)} \mathrm{Tr}[P_\lambda Q_\mu] = \mathrm{Tr}[P_\lambda] > 0$, $\lambda \in \sigma(B)$ and $P_\lambda \le Q_\lambda$. The same reasoning shows that for each $\mu \in \sigma(B)$, $\mu \in \sigma(A)$ and $Q_\mu \le P_\mu$. Thus, there is equality in Klein's inequality if and only if $A = B$.

Taking $f(t) = e^t$, (A.2) becomes $\mathrm{Tr}[e^B] \ge \mathrm{Tr}[e^A] + \mathrm{Tr}[e^A(B - A)]$. For $c \in \mathbb{R}$ and $H, K \in \mathcal{H}_n$, choose $A = c + H$ and $B = H + K$ to obtain

$$\mathrm{Tr}[e^{H+K}] \ge e^c\,\mathrm{Tr}[e^H] + e^c\,\mathrm{Tr}[e^H(K - c)].$$

Choosing $c = \mathrm{Tr}[e^H K]/\mathrm{Tr}[e^H]$, we obtain $\mathrm{Tr}[e^{H+K}] \ge e^{\mathrm{Tr}[e^H K]/\mathrm{Tr}[e^H]}\,\mathrm{Tr}[e^H]$ which can be written as

$$\frac{\mathrm{Tr}[e^H K]}{\mathrm{Tr}[e^H]} \le \log(\mathrm{Tr}[e^{H+K}]) - \log(\mathrm{Tr}[e^H]), \tag{A.3}$$

the Peierls–Bogoliubov inequality [8], valid for all $H, K \in \mathcal{H}_n$.

The original application of Klein's inequality was to the entropy. It may be used to prove the non-negativity of the relative entropy. Let $A, B \in \mathcal{P}_n$, and apply Klein's inequality with $f(x) = x\log x$ to obtain

$$\mathrm{Tr}[B\log B] \ge \mathrm{Tr}[A\log A] + \mathrm{Tr}[(1 + \log A)(B - A)] = \mathrm{Tr}[B] - \mathrm{Tr}[A] + \mathrm{Tr}[B\log A].$$

Rearranging terms yields $\mathrm{Tr}[B(\log B - \log A)] + \mathrm{Tr}[A] - \mathrm{Tr}[B] \ge 0$; that is, $D(B\|A) \ge 0$.

The Peierls–Bogoliubov Inequality has as a direct consequence the quantum Gibbs Variational Principle. Suppose that $H \in \mathcal{H}_n$ and $\mathrm{Tr}[e^H] = 1$. Define $X := e^H$ so that $X$ is a density matrix. Then (A.3) specializes to

$$\mathrm{Tr}[XK] \le \log(\mathrm{Tr}[e^{\log X + K}]), \tag{A.4}$$

which is valid for all density matrices $X$ and all $K \in \mathcal{H}_n$. Replacing $K$ in (A.4) with $K - \log X$ yields

$$\mathrm{Tr}[XK] \le \log(\mathrm{Tr}[e^K]) + \mathrm{Tr}[X\log X]. \tag{A.5}$$

For fixed $X$, there is equality in (A.5) for $K = \log X$, and for fixed $K$, there is equality in (A.5) for $X := e^K/\mathrm{Tr}[e^K]$. It follows that for all density matrices $X$,

$$\mathrm{Tr}[X\log X] = \sup\{\mathrm{Tr}[XK] - \log(\mathrm{Tr}[e^K]) : K \in \mathcal{H}_n\} \tag{A.6}$$

and that for all $K \in \mathcal{H}_n$,

$$\log(\mathrm{Tr}[e^K]) = \sup\{\mathrm{Tr}[XK] - \mathrm{Tr}[X\log X] : X \in \mathcal{P}_n,\ \mathrm{Tr}[X] = 1\}. \tag{A.7}$$

This is the Gibbs variational principle for the entropy $S(X) = -\mathrm{Tr}[X\log X]$.

Now let $Y \in \mathcal{P}_n$ and replace $K$ with $K + \log Y$ in (A.5) to conclude that for all density matrices $X$, all $Y \in \mathcal{P}_n$ and all $K \in \mathcal{H}_n$,

$$\begin{aligned}
\mathrm{Tr}[XK] &\le \log(\mathrm{Tr}[e^{K + \log Y}]) + \mathrm{Tr}[X(\log X - \log Y)] \\
&= (\log(\mathrm{Tr}[e^{K + \log Y}]) + 1 - \mathrm{Tr}[Y]) + D(X\|Y).
\end{aligned} \tag{A.8}$$

For fixed $X$, there is equality in (A.8) for $K = \log X - \log Y$, and for fixed $K$, there is equality in (A.8) for $X := e^{K + \log Y}/\mathrm{Tr}[e^{K + \log Y}]$.
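The inequality (A.5) and its equality case at the Gibbs state can be checked directly. The sketch below, limited to real symmetric 2x2 matrices, computes the gap $\log(\mathrm{Tr}[e^K]) + \mathrm{Tr}[X\log X] - \mathrm{Tr}[XK]$ for a generic density matrix and for $X = e^K/\mathrm{Tr}[e^K]$; the function names and sample matrices are chosen here purely for illustration.

```python
import math

def mat_mul(A, B):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def mat_fun(A, f):
    """f(A) for a real symmetric 2x2 matrix with two distinct eigenvalues."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid, rad = (a + d) / 2, math.hypot((a - d) / 2, b)
    lo, hi = mid - rad, mid + rad
    c1 = (f(hi) - f(lo)) / (hi - lo)
    c0 = f(lo) - c1 * lo
    return [[c1 * a + c0, c1 * b], [c1 * b, c1 * d + c0]]

def gibbs_gap(X, K):
    """log Tr[e^K] + Tr[X log X] - Tr[XK], which (A.5) says is non-negative."""
    return (math.log(trace(mat_fun(K, math.exp)))
            + trace(mat_mul(X, mat_fun(X, math.log)))
            - trace(mat_mul(X, K)))

# Hypothetical sample data: K is self-adjoint, X is a density matrix.
K = [[1.0, 0.5], [0.5, -1.0]]
X = [[0.7, 0.2], [0.2, 0.3]]

exp_K = mat_fun(K, math.exp)
partition = trace(exp_K)
gibbs_state = [[exp_K[i][j] / partition for j in range(2)] for i in range(2)]

gap_generic = gibbs_gap(X, K)
gap_gibbs = gibbs_gap(gibbs_state, K)
print("gap for a generic density matrix:", gap_generic)
print("gap at the Gibbs state:", gap_gibbs)
```

The gap for a generic $X$ is the relative entropy of $X$ with respect to the Gibbs state, which is why it vanishes only there.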
Recalling that for Tr[ X]= 1, Tr[ X (log X − log Y )]= D( X ||Y ) + 1 − Tr[Y ], we have that for all density matrices X, and all Y ∈ P , K +log Y D( X ||Y ) = sup{Tr[XK]− (log(Tr[e ]) + Tr[Y ]− 1) : K ∈ H } (A.9) and that for all K ∈ H and all Y ∈ P , n n K +log Y log(Tr[e ]) + 1−Tr[Y ]= sup{Tr[XK]− D( X ||Y ) : X ∈ P Tr[ X]= 1}. (A.10) The paper [3] of Araki contains a discussion of the Peierls–Bogoliubov and Golden– Thompson inequalities in a very general von Neumann algebra setting. 123 Some trace inequalities for exponential and logarithmic… B Majorization inequalities Let x = (x ,..., x ) and y = ( y ,..., y ) be two vectors in R such that x ≤ x 1 n 1 n j +1 j and y ≤ y for each j = 1,..., n − 1. Then y is said to majorize x in case j +1 j k k n n x ≤ y for k = 1,..., n − 1 and x = y . (B.1) j j j j j =1 j =1 j =1 j =1 and in this case we write x ≺ y. Amatrix P ∈ M is doubly stochastic in case P has non-negative entries and the entries in each row and column sum to one. By a theorem of Hardy, Littlewood and Pólya [19], x ≺ y if and only if there is a doubly stochastic matrix P such that x = Py. Therefore, if φ is convex on R and x ≺ y,let P be a doubly stochastic matrix such that x = Py. By Jensen’s inequality ' ( n n n n n φ(x ) = φ P y ≤ P φ ( y ) = φ( y ). j j,k k j,k k k j =1 j =1 k=1 j,k=1 k=1 That is, for every convex function φ, n n x ≺ y ⇒ φ(x ) ≤ φ( y ). (B.2) j j j =1 j =1 X Y Let X, Y ∈ H , and let λ and λ be the eigenvalue sequences of X and Y respectively with the eigenvalues repeated according to their geometric multiplicity and arranged in decreasing order considered as vectors in R . Then Y is said to majorize X Y X in case λ ≺ λ , and in this case we write X ≺ Y . It follows immediately from (B.2) that if φ is an increasing convex function, X ≺ Y ⇒ Tr[φ( X )]≤ Tr[φ(Y )] and Tr[ X]= Tr[Y ]. 
(B.3) The following extends a theorem of Bapat and Sunder [5]: B.1 Theorem Let : M → M be a linear transformation such that ( A) ≥ 0 for n n all A ≥ 0, (1) = 1 and Tr[( A)]= Tr[ A] for all A ∈ M . Then for all A ∈ H , n n ( X ) ≺ X. (B.4) Proof Note that ( X ) ∈ H .Let ( X ) = λ |v v | be the spectral resolution n j j j j =1 of ( X ) with λ ≥ λ for j = 1,..., n − 1, Fix k ∈{1,..., n − 1}.and let j j +1 P = |v v |. Then with denoting the adjoint of with respect to the k j j j =1 Hilbert-Schmidt inner product, 123 E. A. Carlen,E.H.Lieb λ = Tr[ P ( X )] j k j =1 = Tr[ ( P ) X]≤ sup{Tr[QX ], 0 ≤ Q ≤ 1, Tr[ Q]= k}= μ k j j =1 where {μ ,...,μ } is the eigenvalue sequence of X arranged in decreasing order. 1 k Bapat and Sunder prove this for of the form ( A) = V AV where Let j =1 j V ,..., V ∈ M satisfy 1 m n m m ∗ ∗ V V = 1 = V V . (B.5) j j j j j =1 j =1 Choi [10,11] has shown that, for all n ≥ 2, the transformation ( A) = ((n − 1)Tr[ A]1 − A) n − n − 1 cannot be written in the form (B.5), yet it satisﬁes the conditions of Theorem B.1. B.2 Lemma Let A ∈ P and let be deﬁned by ∞ 1/2 1/2 A A ( X ) = X dλ (B.6) λ + A λ + A . Then for all X ∈ H ,(B.4) is satisﬁed, and for all p ≥ 1, p p Tr[|( X )| ]≤ Tr[| X | ]. (B.7) Proof evidently satisﬁes the conditions of Theorem B.1, and then (B.4) implies (B.7) as discussed above. C Geodesics and geometric means There is a natural Riemannian metric on P such that the corresponding distance δ( X, Y ) is invariant under conjugation: ∗ ∗ δ( A XA, A YA) = δ( X, Y ) for all X, Y ∈ P and all invertible n × n matrices A. It turns out that for A, B ∈ P , n n t → A# B, t ∈[0, 1], is a constant speed geodesic for this metric that connects A and B. This geometric point of view, originating in the work of statisticians, and was developed in the form presented here by Bhatia and Holbrook [7]. 123 Some trace inequalities for exponential and logarithmic… C.1 Deﬁnition Let t → X (t ), t ∈[a, b], be a smooth path in P . 
The arc-length along this path in the conjugation invariant metric is

$$\int_a^b \big\| X(t)^{-1/2} X'(t) X(t)^{-1/2} \big\|_2 \, dt,$$

where $\|\cdot\|_2$ denotes the Hilbert–Schmidt norm and the prime denotes the derivative. The corresponding distance between $X, Y \in P_n$ is defined by

$$\delta(X, Y) = \inf\left\{ \int_0^1 \big\| X(t)^{-1/2} X'(t) X(t)^{-1/2} \big\|_2 \, dt \,:\, X(t) \in P_n \text{ for } t \in (0, 1),\ X(0) = X,\ X(1) = Y \right\}.$$

To see the conjugation invariance, let the smooth path $X(t)$ be given, let an invertible matrix $A$ be given, and define $Z(t) := A^* X(t) A$. Then by cyclicity of the trace,

$$\big\| Z(t)^{-1/2} Z'(t) Z(t)^{-1/2} \big\|_2^2 = \mathrm{Tr}[Z(t)^{-1} Z'(t) Z(t)^{-1} Z'(t)] = \mathrm{Tr}[A^{-1} X(t)^{-1} X'(t) X(t)^{-1} X'(t) A] = \big\| X(t)^{-1/2} X'(t) X(t)^{-1/2} \big\|_2^2.$$

Given any smooth path $t \mapsto X(t)$, define $H(t) := \log(X(t))$ so that $X(t) = e^{H(t)}$, and then

$$X'(t) = \int_0^1 X(t)^{1-s} H'(t) X(t)^{s} \, ds \tag{C.1}$$

or equivalently,

$$H'(t) = \int_0^\infty \frac{1}{\lambda + X(t)}\, X'(t)\, \frac{1}{\lambda + X(t)} \, d\lambda = \int_0^\infty \frac{X(t)^{1/2}}{\lambda + X(t)} \big( X(t)^{-1/2} X'(t) X(t)^{-1/2} \big) \frac{X(t)^{1/2}}{\lambda + X(t)} \, d\lambda. \tag{C.2}$$

Lemma B.2 yields $H'(t) \prec X(t)^{-1/2} X'(t) X(t)^{-1/2}$ and its consequence

$$\| H'(t) \|_2 \le \big\| X(t)^{-1/2} X'(t) X(t)^{-1/2} \big\|_2. \tag{C.3}$$

Now let $X(t)$ be a smooth path in $P_n$ with $X(0) = X$ and $X(1) = Y$. Then, with $H(t) = \log X(t)$,

$$\| \log Y - \log X \|_2 = \Big\| \int_0^1 H'(t)\, dt \Big\|_2 \le \int_0^1 \| H'(t) \|_2 \, dt \le \int_0^1 \big\| X(t)^{-1/2} X'(t) X(t)^{-1/2} \big\|_2 \, dt. \tag{C.4}$$

Taking the infimum over all such paths gives $\| \log Y - \log X \|_2 \le \delta(X, Y)$.

If $X$ and $Y$ commute, this lower bound is exact: Given $X, Y \in P_n$ that commute, define $H(t) = (1 - t) \log X + t \log Y$, and $X(t) = e^{H(t)}$. Then $H'(t) = \log Y - \log X$, independent of $t$. Hence all of the inequalities in (C.4) are equalities. Moreover, if there is equality in (C.4), then necessarily $H'(s) = \int_0^1 H'(t)\, dt = \log Y - \log X$ for all $s \in [0, 1]$. This proves:

C.2 Lemma When $X, Y \in P_n$ commute, there is exactly one constant speed geodesic running from $X$ to $Y$ in unit time, namely, $X(t) = e^{(1-t) \log X + t \log Y}$, and

$$\delta(X, Y) = \| \log Y - \log X \|_2.$$
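As a numerical sanity check of Lemma C.2, the following sketch restricts to diagonal (hence commuting) matrices, so that every quantity reduces to a computation on vectors of positive eigenvalues. The helper names `speed` and `arc_length`, the sample vectors `x` and `y`, and the midpoint-rule discretization are illustrative assumptions and are not taken from the paper. The sketch compares the arc-length of the path $e^{(1-t)\log X + t \log Y}$ with that of the straight segment $(1-t)X + tY$.

```python
import math

def speed(x, dx):
    """Hilbert-Schmidt norm of X^{-1/2} X' X^{-1/2} for diagonal X = diag(x), X' = diag(dx)."""
    return math.sqrt(sum((d / v) ** 2 for v, d in zip(x, dx)))

def arc_length(path, dpath, n=2000):
    """Midpoint-rule approximation of the arc-length integral over t in [0, 1]."""
    h = 1.0 / n
    return h * sum(speed(path((k + 0.5) * h), dpath((k + 0.5) * h)) for k in range(n))

# Hypothetical sample data: eigenvalues of two commuting positive matrices.
x = [1.0, 4.0, 0.5]
y = [3.0, 1.0, 2.0]

def geodesic(t):
    # X(t) = exp((1 - t) log X + t log Y), computed entrywise on the diagonal
    return [a ** (1 - t) * b ** t for a, b in zip(x, y)]

def d_geodesic(t):
    # X'(t) = X(t) (log Y - log X)
    return [a ** (1 - t) * b ** t * math.log(b / a) for a, b in zip(x, y)]

def segment(t):
    # straight line (1 - t) X + t Y, which also stays in the positive matrices
    return [(1 - t) * a + t * b for a, b in zip(x, y)]

def d_segment(t):
    return [b - a for a, b in zip(x, y)]

# ||log Y - log X||_2, the value of delta(X, Y) claimed by Lemma C.2
delta = math.sqrt(sum(math.log(b / a) ** 2 for a, b in zip(x, y)))
geodesic_length = arc_length(geodesic, d_geodesic)
segment_length = arc_length(segment, d_segment)
print(delta, geodesic_length, segment_length)
```

Under these assumptions the first two printed values should agree up to rounding, while the straight segment should come out strictly longer, as (C.4) predicts for any path other than the geodesic.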
Since conjugation is an isometry in this metric, it is now a simple matter to find the explicit formula for the geodesic connecting $X$ and $Y$ in $P_n$. Apart from the statement on uniqueness, the following theorem is due to Bhatia and Holbrook [7].

C.3 Theorem For all $X, Y \in P_n$, there is exactly one constant speed geodesic running from $X$ to $Y$ in unit time, namely,

$$X(t) = X \#_t Y := X^{1/2} (X^{-1/2} Y X^{-1/2})^{t} X^{1/2} \tag{C.5}$$

and

$$\delta(X, Y) = \| \log(X^{-1/2} Y X^{-1/2}) \|_2.$$

Proof By Lemma C.2, the unique constant speed geodesic running from $1$ to $X^{-1/2} Y X^{-1/2}$ in unit time is $W(t) = (X^{-1/2} Y X^{-1/2})^{t}$; it has the constant speed $\| \log(X^{-1/2} Y X^{-1/2}) \|_2$, and

$$\delta(1, X^{-1/2} Y X^{-1/2}) = \| \log(X^{-1/2} Y X^{-1/2}) \|_2.$$

By the conjugation invariance of the metric, $\delta(X, Y) = \delta(1, X^{-1/2} Y X^{-1/2})$, and $X(t)$ as defined in (C.5) has the constant speed $\delta(X, Y)$ and runs from $X$ to $Y$ in unit time. Thus it is a constant speed geodesic running from $X$ to $Y$ in unit time. If there were another such geodesic, say $\widetilde{X}(t)$, then $X^{-1/2} \widetilde{X}(t) X^{-1/2}$ would be a constant speed geodesic running from $1$ to $X^{-1/2} Y X^{-1/2}$ in unit time, and different from $W(t)$, but this would contradict the uniqueness in Lemma C.2.

In particular, the midpoint of the unique constant speed geodesic running from $X$ to $Y$ in unit time is the geometric mean of $X$ and $Y$ as originally defined by Pusz and Woronowicz [36]:

$$X \# Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^{1/2} X^{1/2}.$$

In fact, the Riemannian manifold $(P_n, \delta)$ is geodesically complete: The smooth path

$$t \mapsto X^{1/2} (X^{-1/2} Y X^{-1/2})^{t} X^{1/2} =: X \#_t Y$$

is well defined for all $t \in \mathbb{R}$. By the conjugation invariance and Lemma C.2, for all $s, t \in \mathbb{R}$,

$$\delta(X \#_s Y, X \#_t Y) = \delta\big( (X^{-1/2} Y X^{-1/2})^{s}, (X^{-1/2} Y X^{-1/2})^{t} \big) = |t - s| \, \| \log(X^{-1/2} Y X^{-1/2}) \|_2.$$

Since the speed along the curve $t \mapsto X \#_t Y$ has the constant value $\| \log(X^{-1/2} Y X^{-1/2}) \|_2$, this, together with the uniqueness in Theorem C.3, shows that for all $t_0 < t_1$ in $\mathbb{R}$, the restriction of $t \mapsto X \#_t Y$ to $[t_0, t_1]$ is the unique constant speed geodesic running from $X \#_{t_0} Y$ to $X \#_{t_1} Y$ in time $t_1 - t_0$.

This has a number of consequences.

C.4 Theorem Let $X, Y \in P_n$, and $t_0, t_1 \in \mathbb{R}$. Then for all $t \in \mathbb{R}$,

$$X \#_{(1-t) t_0 + t t_1} Y = (X \#_{t_0} Y) \#_t (X \#_{t_1} Y). \tag{C.6}$$

Proof By what we have noted above, $t \mapsto X \#_{(1-t) t_0 + t t_1} Y$ is a constant speed geodesic running from $X \#_{t_0} Y$ to $X \#_{t_1} Y$ in unit time, as is $t \mapsto (X \#_{t_0} Y) \#_t (X \#_{t_1} Y)$. The identity (C.6) now follows from the uniqueness in Theorem C.3.

Taking $t_0 = 0$ and $t_1 = s$, we have the special case

$$X \#_{ts} Y = X \#_t (X \#_s Y). \tag{C.7}$$

Taking $t_0 = 1$ and $t_1 = 0$, we have the special case

$$X \#_{1-t} Y = Y \#_t X. \tag{C.8}$$

The identity (C.8) is well-known, and may be derived directly from the formula in (C.5).

We are particularly concerned with $t \mapsto X \#_t Y$ for $t \in [-1, 2]$. Indeed, from the formula in (C.5),

$$X \#_{-1} Y = X \frac{1}{Y} X \quad \text{and} \quad X \#_{2} Y = Y \frac{1}{X} Y. \tag{C.9}$$

Let $t \in (0, 1)$. By combining the formula

$$X \#_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^{t} X^{1/2} = X^{1/2} (X^{1/2} Y^{-1} X^{1/2})^{-t} X^{1/2}$$

with the integral representation

$$A^{-t} = \frac{\sin(\pi t)}{\pi} \int_0^\infty \lambda^{-t} \frac{1}{\lambda + A} \, d\lambda = \frac{\sin(\pi t)}{\pi} \int_0^\infty \lambda^{t-1} \frac{1}{1 + \lambda A} \, d\lambda$$

we obtain, for $t \in (0, 1)$,

$$X \#_t Y = \frac{\sin(\pi t)}{\pi} \int_0^\infty \lambda^{t-1} X^{1/2} \frac{1}{1 + \lambda X^{1/2} Y^{-1} X^{1/2}} X^{1/2} \, d\lambda = \frac{\sin(\pi t)}{\pi} \int_0^\infty \lambda^{t-1} \frac{1}{X^{-1} + \lambda Y^{-1}} \, d\lambda. \tag{C.10}$$

The merit of this formula lies in the following lemma [1]:

C.5 Lemma (Ando) The function $(A, B) \mapsto (A^{-1} + B^{-1})^{-1}$ is jointly concave on $P_n$.

Proof Note that $A^{-1} + B^{-1} = A^{-1} (A + B) B^{-1}$, so that

$$(A^{-1} + B^{-1})^{-1} = B (A + B)^{-1} A = ((A + B) - A)(A + B)^{-1} A = A - A (A + B)^{-1} A$$

and the claim now follows from the convexity of $(A, B) \mapsto A (A + B)^{-1} A$ [24].
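The identities (C.7), (C.8) and (C.9) can be spot-checked numerically. The sketch below does this for real symmetric $2 \times 2$ matrices using only the Python standard library; the helpers `mat_mul`, `mat_fun`, `geo_mean`, `inv`, `max_diff` and the sample matrices `X`, `Y` are hypothetical names introduced here for illustration, and the functional calculus relies on the closed-form spectral projections of a $2 \times 2$ symmetric matrix. It is a check on one example, not a proof.

```python
import math

def mat_mul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_fun(A, f):
    """Apply the scalar function f to a real symmetric 2x2 matrix via spectral projections."""
    a, b, c = A[0][0], 0.5 * (A[0][1] + A[1][0]), A[1][1]   # symmetrize rounding noise
    mean, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    if rad < 1e-14:                       # multiple of the identity
        return [[f(mean), 0.0], [0.0, f(mean)]]
    l1, l2 = mean + rad, mean - rad       # eigenvalues, l1 > l2
    S = [[a, b], [b, c]]
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            e = 1.0 if i == j else 0.0
            # f(S) = f(l1) (S - l2 I)/(l1 - l2) + f(l2) (l1 I - S)/(l1 - l2)
            out[i][j] = (f(l1) * (S[i][j] - l2 * e) + f(l2) * (l1 * e - S[i][j])) / (l1 - l2)
    return out

def geo_mean(X, Y, t):
    """X #_t Y = X^{1/2} (X^{-1/2} Y X^{-1/2})^t X^{1/2}, as in (C.5), for any real t."""
    Xh = mat_fun(X, math.sqrt)
    Xmh = mat_fun(X, lambda v: v ** -0.5)
    M = mat_mul(mat_mul(Xmh, Y), Xmh)
    return mat_mul(mat_mul(Xh, mat_fun(M, lambda v: v ** t)), Xh)

def inv(A):
    return mat_fun(A, lambda v: 1.0 / v)

def max_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))

# Hypothetical sample matrices, both positive definite.
X = [[2.0, 0.5], [0.5, 1.0]]
Y = [[1.0, -0.3], [-0.3, 3.0]]
t, s = 0.4, 0.7

err_c7 = max_diff(geo_mean(X, Y, t * s), geo_mean(X, geo_mean(X, Y, s), t))
err_c8 = max_diff(geo_mean(X, Y, 1 - t), geo_mean(Y, X, t))
err_c9a = max_diff(geo_mean(X, Y, -1), mat_mul(mat_mul(X, inv(Y)), X))
err_c9b = max_diff(geo_mean(X, Y, 2), mat_mul(mat_mul(Y, inv(X)), Y))
print(err_c7, err_c8, err_c9a, err_c9b)
```

All four printed discrepancies should be at the level of floating-point rounding if the identities hold for this example.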
The harmonic mean of positive operators $A$ and $B$, $A : B$, is defined by

$$A : B := 2 (A^{-1} + B^{-1})^{-1} \tag{C.11}$$

and hence Lemma C.5 says that $(A, B) \mapsto A : B$ is jointly concave. Moreover, (C.10) can be written in terms of the harmonic mean as

$$X \#_t Y = \frac{\sin(\pi t)}{2\pi} \int_0^\infty X : (\lambda^{-1} Y) \, \lambda^{t-1} \, d\lambda \tag{C.12}$$

which expresses weighted geometric means as averages over harmonic means. By the operator monotonicity of the map $A \mapsto A^{-1}$, the map $X, Y \mapsto X : Y$ is monotone in each variable, and then by (C.12) this is also true of $X, Y \mapsto X \#_t Y$. This proves the following result of Ando and Kubo [26]:

C.6 Theorem (Ando and Kubo) For all $t \in [0, 1]$, $(X, Y) \mapsto X \#_t Y$ is jointly concave, and monotone increasing in $X$ and $Y$.

The method of Ando and Kubo can be used to prove joint operator concavity theorems for functions on $P_n \times P_n$ that are not connections. The next theorem, due to Fujii and Kamei [17], provides an important example.

C.7 Theorem The map $(X, Y) \mapsto -X^{1/2} \log(X^{1/2} Y^{-1} X^{1/2}) X^{1/2}$ is jointly concave.

Proof The representation

$$\log A = \int_0^\infty \left( \frac{1}{\lambda + 1} - \frac{1}{\lambda + A} \right) d\lambda$$

yields

$$-X^{1/2} \log(X^{1/2} Y^{-1} X^{1/2}) X^{1/2} = \int_0^\infty \left( \frac{1}{\lambda X^{-1} + Y^{-1}} - \frac{1}{\lambda + 1} X \right) d\lambda$$

from which the claim follows by Lemma C.5.

C.8 Theorem For all $t \in [-1, 0] \cup [1, 2]$, the map $(X, Y) \mapsto X \#_t Y$ is jointly convex.

Proof First suppose that $t \in [-1, 0]$. The case $t = 0$ is trivial, and since $X \#_{-1} Y = X Y^{-1} X$, which is convex, we may suppose that $t \in (-1, 0)$. Let $s = -t$ so that $s \in (0, 1)$. We use the integral representation

$$A^{s} = \frac{\sin \pi s}{\pi} \int_0^\infty \lambda^{s} \left( \frac{1}{\lambda} - \frac{1}{\lambda + A} \right) d\lambda$$

valid for $A \in P_n$ and $s \in (0, 1)$ to obtain

$$X \#_t Y = \frac{\sin \pi s}{\pi} \int_0^\infty \lambda^{s} \left( X - \frac{1}{X^{-1} + (\lambda Y)^{-1}} \right) \frac{d\lambda}{\lambda}$$

which by Lemma C.5 is jointly convex. Finally, the identity $Y \#_{1-t} X = X \#_t Y$ shows that the joint convexity for $t \in [1, 2]$ follows from the joint convexity for $t \in [-1, 0]$.

The special cases $t = -1$ and $t = 2$, which by (C.9) can be expressed without discussing means, are proved in [1,31].

References

1. Ando, T.: Concavity of certain maps on positive definite matrices and applications to Hadamard products. Linear Algebra Appl. 26, 203–241 (1979)
2. Ando, T., Hiai, F.: Log majorization and complementary Golden–Thompson type inequalities. Linear Algebra Appl. 197, 113–131 (1994)
3. Araki, H.: Golden–Thompson and Peierls–Bogoliubov inequalities for a general von Neumann algebra. Commun. Math. Phys. 34, 167–178 (1973)
4. Araki, H.: On an inequality of Lieb and Thirring. Lett. Math. Phys. 19, 167–170 (1990)
5. Bapat, R.B., Sunder, V.S.: On majorization and Schur products. Linear Algebra Appl. 72, 107–117 (1995)
6. Belavkin, V.P., Staszewski, P.: C*-algebraic generalization of relative entropy and entropy. Ann. Inst. Henri Poincaré Sect. A 37, 51–58 (1982)
7. Bhatia, R., Holbrook, J.: Riemannian geometry and matrix geometric means. Linear Algebra Appl. 181, 594–168 (1993)
8. Bogoliubov, N.N.: On a variational principle in the many body problem. Soviet Phys. Doklady 3, 292 (1958)
9. Carlen, E.A., Lieb, E.H.: A Minkowski-type trace inequality and strong subadditivity of quantum entropy II: convexity and concavity. Lett. Math. Phys. 83, 107–126 (2008)
10. Choi, M.D.: Positive linear maps on C*-algebras. Can. J. Math. 24, 520–529 (1972)
11. Choi, M.D.: Completely positive linear maps on complex matrices. Linear Algebra Appl. 10, 285–290 (1975)
12. Csiszár, I.: Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 2, 299–318 (1967)
13. Davis, C.: Various averaging operations onto subalgebras. Ill. J. Math. 3, 528–553 (1959)
14. Donald, M.J.: On the relative entropy. Commun. Math. Phys. 105, 13–34 (1986)
15. Frenk, J.B.G., Kassay, G., Kolumbán, J.: On equivalent results in minimax theory. Eur. J. Oper. Res. 157, 46–58 (2004)
16. Friedland, S., So, W.: On the product of matrix exponentials. Linear Algebra Appl. 196, 193–205 (1994)
17. Fujii, J.I., Kamei, E.: Relative operator entropy in noncommutative information theory. Math. Japon. 34, 341–348 (1989)
18. Hansen, F.: Quantum entropy derived from first principles. J. Stat. Phys. 165, 799–808 (2016)
19. Hardy, G.H., Littlewood, J.E., Pólya, G.: Some simple inequalities satisfied by convex functions. Messenger Math. 58, 145–152, 310 (1929)
20. Hiai, F.: Equality cases in matrix norm inequalities of Golden–Thompson type. Linear Multilinear Algebra 36, 239–249 (1994)
21. Hiai, F., Ohya, M., Tsukada, M.: Sufficiency, KMS condition, and relative entropy in von Neumann algebras. Pac. J. Math. 96, 99–109 (1981)
22. Hiai, F., Petz, D.: The proper formula for relative entropy and its asymptotics in quantum probability. Commun. Math. Phys. 413, 99–114 (2006)
23. Hiai, F., Petz, D.: The Golden–Thompson trace inequality is complemented. Linear Algebra Appl. 181, 153–185 (1993)
24. Kiefer, J.: Optimum experimental designs. J. R. Stat. Soc. Ser. B 21, 272–310 (1959)
25. Klein, O.: Zur Quantenmechanischen Begründung des zweiten Hauptsatzes der Wärmelehre. Z. Physik 72, 767–775 (1931)
26. Kubo, F., Ando, T.: Means of positive linear operators. Math. Ann. 246, 205–224 (1980)
27. Kuhn, H.W., Tucker, A.W.: John von Neumann's work in the theory of games and mathematical economics. Bull. Am. Math. Soc. 64, 100–122 (1958)
28. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951)
29. Kullback, S.: Lower bound for discrimination information in terms of variation. IEEE Trans. Inf. Theory 13, 126–127 (1967). Correction: 16, 652 (1970)
30. Lieb, E.H.: Convex trace functions and the Wigner–Yanase–Dyson conjecture. Adv. Math. 11, 267–288 (1973)
31. Lieb, E.H., Ruskai, M.B.: Some operator inequalities of the Schwarz type. Adv. Math. 12, 269–273 (1974)
32. Lindblad, G.: Expectations and entropy inequalities for finite quantum systems. Commun. Math. Phys. 39, 111–119 (1974)
33. Moakher, M.: A differential geometric approach to the geometric mean of symmetric positive definite matrices. SIAM J. Matrix Anal. Appl. 26, 735–747 (2005)
34. Peck, J.E.L., Dumage, A.L.: Games on a compact set. Can. J. Math. 9, 450–458 (1957)
35. Pinsker, M.S.: Information and Information Stability of Random Variables and Processes. Holden Day (1964)
36. Pusz, W., Woronowicz, S.L.: Functional calculus for sesquilinear forms and the purification map. Rep. Math. Phys. 8, 159–170 (1975)
37. Pusz, W., Woronowicz, S.L.: Form convex functions and the WYDL and other inequalities. Lett. Math. Phys. 2, 505–512 (1978)
38. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
39. Sion, M.: On general minimax theorems. Pac. J. Math. 8, 171–175 (1958)
40. Skovgaard, L.T.: A Riemannian geometry of the multivariate normal model. Scand. J. Stat. 11, 211–223 (1984)
41. Tropp, J.: From joint convexity of quantum relative entropy to a concavity theorem of Lieb. Proc. Am. Math. Soc. 140, 1757–1760 (2012)
42. Uhlmann, A.: Relative entropy and the Wigner–Yanase–Dyson–Lieb concavity in an interpolation theory. Commun. Math. Phys. 54, 21–32 (1977)
43. Umegaki, H.: Conditional expectation in an operator algebra, IV (entropy and information). Kodai Math. Sem. Rep. 14, 59–85 (1962)
44. Von Neumann, J.: Zur Theorie der Gesellschaftsspiele. Math. Ann. 100, 295–320 (1928)

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Bulletin of Mathematical Sciences – Springer Journals
Published: May 29, 2018