Transcription

Acta Applicandae Mathematicae, Volume 80, Issue 2, pp. 199–220, January 2004Riemannian geometry of Grassmann manifoldswith a view on algorithmic computationP.-A. Absil R. Mahony†R. Sepulchre‡Last revised: 14 Dec 2003PREPRINTAbstract. We give simple formulas for the canonical metric, gradient, Liederivative, Riemannian connection, parallel translation, geodesics and distanceon the Grassmann manifold of p-planes in Rn . In these formulas, p-planes arerepresented as the column space of n p matrices. The Newton method onabstract Riemannian manifolds proposed by S. T. Smith is made explicit onthe Grassmann manifold. Two applications –computing an invariant subspaceof a matrix and the mean of subspaces– are worked out.Key words. Grassmann manifold, noncompact Stiefel manifold, principal fiber bundle, Levi-Civitaconnection, parallel transportation, geodesic, Newton method, invariant subspace, mean of subspaces.AMS subject classification: 65J05 General theory (Numerical analysis in abstract spaces); 53C05Connections, general theory; 14M15 Grassmannians, Schubert varieties, flag manifolds.1IntroductionThe majority of available numerical techniques for optimization and nonlinear equations assume an underlying Euclidean space. Yet many computational problems are posed on nonEuclidean spaces. Several authors [Gab82, Smi94, Udr94, Mah96, MM02] have proposedabstract algorithms that exploit the underlying geometry (e.g. symmetric, homogeneous, Riemannian) of manifolds on which problems are cast, but the conversion of these abstractgeometric algorithms into numerical procedures in practical situations is often a nontrivialtask that critically relies on an adequate representation of the manifold.The present paper contributes to addressing this issue in the case where the relevantnon-Euclidean space is the set of fixed dimensional subspaces of a given Euclidean space.This non-Euclidean space is commonly called the Grassmann manifold. Our motivation for School of Computational Science and Information Technology, Florida State University, TallahasseeFL 32306-4120. Part of this work was done while the author was a Research Fellow with the Belgian National Fund for Scientific Research (Aspirant du F.N.R.S.) at the University of Liège. URL:www.montefiore.ulg.ac.be/ absil†Department of Engineering, Australian National University, ACT, 0200, Australia.‡Department of Electrical Engineering and Computer Science, Université de Liège, Bât. B28 Systèmes,Grande Traverse 10, B-4000 Liège, Belgium. URL: www.montefiore.ulg.ac.be/systems1

considering the Grassmann manifold comes from the number of applications that can beformulated as finding zeros of fields defined on the Grassmann manifold. Examples includeinvariant subspace computation and subspace tracking; see e.g. [Dem87, CG90] and referencestherein.A simple and robust manner of representing a subspace in computer memory is in the formof a matrix array of double precision data whose columns span the subspace. Using this representation technique, we produce formulas for fundamental Riemannian-geometric objects onthe Grassmann manifold endowed with its canonical metric: gradient, Riemannian connection, parallel translation, geodesics and distance. The formulas for the Riemannian connectionand geodesics directly yield a matrix expression for a Newton method on Grassmann, andwe illustrate the applicability of this Newton method on two computational problems cast onthe Grassmann manifold.The classical Newton method for computing a zero of a function F : Rn Rn can beformulated as follows [DS83, Lue69]: Solve the Newton equationDF (x)[η] F (x)(1)for the unknown η Rn and compute the updatex : x η.(2)When F is defined on a non-Euclidean manifold, a possible approach is to choose local coordinates and use the Newton method as in (1)-(2). However, the successive iterates onthe manifold will depend on the chosen coordinate system. Smith [Smi93, Smi94] proposesa coordinate-independent Newton method for computing a zero of a C one-form µ on anabstract complete Riemannian manifold M . He suggests to solve the Newton equation η µ µx(3)for the unknown η Tx M , where denotes the Riemannian connection (also called LeviCivita connection) on M , and update along the geodesic as x : Expx η. It can be proventhat if x is chosen suitably close to a point x̂ in M such that µx̂ 0 and Tx̂ M 3 η 7 η µ isnondegenerate, then the algorithm converges quadratically to x̂. We will refer to this iterationas the Riemann-Newton method.In practical cases it may not be obvious to particularize the Riemann-Newton method intoa concrete algorithm. Given a Riemannian manifold M and an initial point x on M , one maypick a coordinate system containing x, compute the metric tensor in these coordinates, deducethe Christoffel symbols and obtain a tensorial equation for (3), but this procedure is oftenexceedingly complicated and computationally inefficient. One can also recognize that theRiemann-Newton method is equivalent to the classical Newton method in normal coordinatesat x [MM02], but obtaining a tractable expression for these coordinates is often elusive.On the Grassmann manifold, a formula for the Riemannian connection was given byMachado and Salavessa in [MS85]. They identify the Grassmann manifold with the set ofprojectors into subspaces of Rn , embed the set of projectors in the set of linear maps from Rn toRn (which is an Euclidean space), and endow this set with the Hilbert-Schmidt inner product.The induced metric on the Grassmann manifold is then the essentially unique On -invariantmetric mentioned above. The embedding of the Grassmann manifold in an Euclidean space2

allows the authors to compute the Riemannian connection by taking the derivative in theEuclidean space and projecting the result into the tangent space of the embedded manifold.They obtain a formula for the Riemannian connection in terms of projectors.Edelman, Arias and Smith [EAS98] have proposed an expression of the Riemann-Newtonmethod on the Grassmann manifold in the particular case where µ is the differential df of areal function f on M . Their approach avoids the derivation of a formula for the Riemannianconnection on Grassmann. Instead, they obtain a formula for the Hessian ( 1 df ) 2 bypolarizing the second derivative of f along the geodesics.In the present paper, we derive an easy-to-use formula for the Riemannian connection η ξwhere η and ξ are arbitrary smooth vector fields on the Grassmann manifold of p-dimensionalsubspaces of Rn . This formula, expressed in terms of n p matrices, intuitively relates to thegeometry of the Grassmann manifold expressed as a set of equivalence classes of n p matrices.Once the formula for Riemannian connection is available, expressions for parallel transportand geodesics directly follow. Expressing the Riemann-Newton method on the Grassmannmanifold for concrete vector fields ξ reduces to a directional derivative in Rn followed by aprojection.We work out an example where the zeros of ξ are the p-dimensional right invariant subspaces of an arbitrary n n matrix A. This generalizes an application considered in [EAS98]where ξ was the gradient of a generalized scalar Rayleigh quotient of a matrix A AT . TheNewton method for our ξ converges locally quadratically to the nondegenerate zeros of ξ. Weshow that the rate of convergence is cubic if and only if the targeted zero of ξ is also a leftinvariant subspace of A. In a second example, the zero of ξ is the mean of a collection ofp-dimensional subspaces of Rn . We illustrate on a numerical experiment the fast convergenceof the Newton algorithm to the mean subspace.The present paper only requires from the reader an elementary background in Riemanniangeometry (tangent vectors, gradient, parallel transport, geodesics, distance), which can beread e.g. from Boothby [Boo75], do Carmo [dC92] or the introductory chapter of [Cha93]. Therelevant definitions are summarily recalled in the text. Concepts of reductive homogeneousspace and symmetric spaces (see [Boo75, Nom54, KN63, Hel78] and particularly sections II.4,IV.3, IV.A and X.2 in the latter) are not needed, but they can help to get insight into theproblem. Although some elementary concepts of principal fiber bundle theory [KN63] areused, no specific background is needed.The paper is organized as follows. In Section 2, the linear subspaces of Rn are identifiedwith equivalent classes of matrices and the manifold structure of Grassmann is defined. Section 3 defines a Riemannian structure on Grassmann. Formulas are given for Lie brackets,Riemannian connection, parallel transport, geodesics and distance between subspaces. TheGrassmann-Newton algorithm is made explicit in Section 4 and practical applications areworked out in details in Section 5.2The Grassmann manifoldThe goal of this section is to recall relevant facts about the Grassmann manifolds. Moredetails can be read from [Won67, Boo75, DM90, HM94, FGP94].Let n be a positive integer and let p be a positive integer not greater than n. The set ofp-dimensional linear subspaces of Rn (“linear” will be omitted in the sequel) is termed theGrassmann manifold, denoted here by Grass(p, n).3

An element Y of Grass(p, n), i.e. a p-dimensional subspace of Rn , can be specified by abasis, i.e. a set of p vectors y1 , . . . , yp such that Y is the set of all their linear combinations.When the y’s are ordered as the columns of an n-by-p matrix Y , then Y is said to spanY and Y is said to be the column space (or range, or image, or span) of Y , and we writeY span(Y ). The span of an n-by-p matrix Y is an element of Grass(p, n) if and only if Yhas full rank. The set of such matrices is termed the noncompact Stiefel manifold 1ST(p, n) : {Y Rn p : rank(Y ) p}.Given Y Grass(p, n), the choice of a Y in ST(p, n) such that Y spans Y is not unique.There are infinitely many possibilities. Given a matrix Y in ST(p, n), the set of the matricesin ST(p, n) that have the same span as Y isY GLp : {Y M : M GLp }(4)where GLp denotes the set of the p-by-p invertible matrices. This identifies Grass(p, n) withthe quotient space ST(p, n)/GLp : {Y GLp : Y ST(p, n)}. In fiber bundle theory, thequadruple (GLp , ST(p, n), π, Grass(p, n)) is called a principal GLp fiber bundle, with totalspace ST(p, n), base space Grass(p, n) ST(p, n)/GLp , group actionST(p, n) GLp 3 (Y, M ) 7 Y M ST(p, n)and projection mapπ : ST(p, n) 3 Y 7 span(Y ) Grass(p, n).See e.g. [KN63] for the general theory of principal fiber bundles and [FGP94] for a detailedtreatment of the Grassmann case. In this paper, we use the notation span(Y ) and π(Y ) todenote the column space of Y .To each subspace Y corresponds an equivalence class (4) of n-by-p matrices that span Y,and each equivalence class contains infinitely many elements. It is however possible to locallysingle out a unique matrix in (almost) each equivalence class, by means of cross sections.Here we will consider affine cross sections, which are defined as follows (see illustration onFigure 1). Let W ST(p, n). The matrix W defines an affine cross sectionSW : {Y ST(p, n) : W T (Y W ) 0}(5)orthogonal to the fiber W GLp . Let Y ST(p, n). If W T Y is invertible, then the equivalenceclass Y GLp (i.e. the set of matrices with the same span as Y ) intersects the cross section SWat the single point Y (W T Y ) 1 W T W . If W T Y is not invertible, which means that the spanof W contains an orthogonal direction to the span of Y , then the intersection between thefiber W GLp and the section SW is empty. LetUW : {span(Y ) : W T Y is invertible}(6)be the set of subspaces whose representing fiber Y GLp intersects the section SW . The mappingσW : UW 3 span(Y ) 7 Y (W T Y ) 1 W T W SW ,(7)1The (compact) Stiefel manifold is the set of orthonormal n p matrices.4

W GLpξ W MY GLpWMWξ WSWσW (span(Y ))Y0Figure 1: This is an illustration of Grass(p, n) as the quotient ST(p, n)/GLp for the case p 1,n 2. Each point, the origin excepted, is an element of ST(p, n) R2 {0}. Each line is anequivalence class of elements of ST(p, n) that have the same span. So each line corresponds toan element of Grass(p, n). The affine subspace SW is an affine cross section as defined in (5).The relation (10) satisfied by the horizontal lift ξ of a tangent vector ξ TW Grass(p, n) isalso illustrated. This picture can help to get insight into the general case. One has nonethelessto be careful when drawing conclusions from this picture. For example, in general there doesnot exist a submanifold of Rn p that is orthogonal to the fibers Y GLp at each point, althoughit is obviously the case when p 1 (any centered sphere in Rn will do).which we will call cross section mapping, realizes a bijection between the subset UW ofGrass(p, n) and the affine subspace SW of ST(p, n). The classical manifold structure ofGrass(p, n) is the one that, for all W ST(p, n), makes σW a diffeomorphism between UWand SW (embedded in the Euclidean space Rn p ) [FGP94]. Parameterizations of Grass(p, n)are then given byR(n p) p 3 K 7 π(W W K) span(W W K) UW ,where W is any element of ST(n p, n) such that W T W 0.3Riemannian structure on Grass(p, n) ST(p, n)/GLpThe goal of this section is to define a Riemannian metric on Grass(p, n) and then derive formulas for the associated gradient, connection and geodesics. For an introduction to Riemanniangeometry, see e.g. [Boo75], [dC92] or the introductory chapter of [Cha93].Tangent vectorsA tangent vector ξ of Grass(p, n) at W can be thought of as an elementary variation of the pdimensional subspace W (see [Boo75, dC92] for a more formal definition of a tangent vector).Here we give a way to represent ξ by a matrix. The principle is to decompose variations of abasis W of W into a component that does not modify the span and a component that doesmodify the span. The latter represents a tangent vector of Grass(p, n) at W.Let