INTERIOR-POINT ALGORITHMS FOR CONVEX OPTIMIZATION BASED ON PRIMAL-DUAL METRICS
TOR G. J. MYKLEBUST, LEVENT TUNÇEL

Abstract. We propose and analyse primal-dual interior-point algorithms for convex optimization problems in conic form. The families of algorithms whose iteration complexity we analyse are so-called short-step algorithms. Our iteration complexity bounds match the current best iteration complexity bounds for the primal-dual symmetric interior-point algorithms of Nesterov and Todd, for symmetric cone programming problems with given self-scaled barriers. Our results apply to any self-concordant barrier for any convex cone. We also prove that certain specializations of our algorithms to hyperbolicity cone programming problems (which lie strictly between symmetric cone programming and general convex optimization problems in terms of generality) can take advantage of the favourable special structure of hyperbolic barriers. We make new connections to Riemannian geometry, integrals over operator spaces, Gaussian quadrature, and strengthen the connection of our algorithms to quasi-Newton updates and hence first-order methods in general.

Date: November 2014; revised: April 6, 2016.
Key words and phrases: primal-dual interior-point methods, convex optimization, variable metric methods, local metric, self-concordant barriers, Hessian metric, polynomial-time complexity.
AMS subject classification (MSC): 90C51, 90C22, 90C25, 90C60, 90C05, 65Y20, 52A41, 49M37, 90C30.

Tor Myklebust: Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada (e-mail: tmyklebu@csclub.uwaterloo.ca). Research of this author was supported in part by an NSERC Doctoral Scholarship, ONR Research Grant N00014-12-10049, and a Discovery Grant from NSERC. Some of the results in this paper appeared in the PhD Thesis of the first author [19].
Levent Tunçel: Department of Combinatorics and Optimization, Faculty of Mathematics, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada (e-mail: ltuncel@uwaterloo.ca). Research of this author was supported in part by ONR Research Grants N00014-12-10049, N00014-15-12171, and Discovery Grants from NSERC.

1. Introduction

Convex optimization problems (of minimizing a given convex function over a given convex set) form a beautiful research area with very powerful theory and algorithms and many far-reaching applications. Among the main algorithms to solve convex optimization problems are modern interior-point methods. The modern theory of interior-point methods has flourished since Karmarkar's ground-breaking paper [11].

Every convex optimization problem can be paired with another convex optimization problem based on the same data, called its dual. Rigorous solution methods for convex optimization problems typically generate, together with solutions to the problem at hand (primal problem), solutions for its dual. A particularly successful line of research pursued methods that work "equally hard" at solving both primal and dual problems simultaneously, called primal-dual symmetric methods (for a rigorous definition, see [45]). In the special cases of linear programming (LP) and semidefinite programming (SDP), these primal-dual symmetric methods, in addition to carrying a certain elegance, led to improved results in theory, in computation, and in applications.

Part of the success of primal-dual symmetric methods for LP and SDP might stem from the fact that both classes admit convex conic formulations where the underlying cone is self-dual (the primal convex cone and the dual convex cone are linearly isomorphic) and homogeneous (the automorphism group of the cone acts transitively on its interior). Convex cones that are both homogeneous and self-dual are called symmetric cones. The success of primal-dual symmetric interior-point methods was further extended to the setting of symmetric cone programming, which led to deeper connections with other areas of mathematics. In this paper, we will extend many of the underlying mathematical entities, analyses and iteration complexity bounds of these algorithms to a general convex optimization setting (with arbitrary convex cones).

The primal variable x lives in a finite-dimensional vector space E, and the dual variables y and s live in the finite-dimensional vector spaces Y and E∗ respectively, where E∗ is the dual space of E. Every convex optimization problem can be put into the following conic form (under some mild assumptions):

(P)   inf 〈c, x〉
      Ax = b,
      x ∈ K,

where A : E → Y∗ is a linear map, b ∈ Y∗, c ∈ E∗ and K ⊂ E is a pointed, closed, convex cone with nonempty interior. We assume that A, b, c are given explicitly and K is described via F : int(K) → ℝ, a logarithmically homogeneous self-concordant barrier function for K (defined in the next section). We assume without loss of generality that A is surjective (i.e., AE = Y∗).

We define the dual of (P) as

(D)   sup 〈b, y〉_D
      A∗y + s = c,
      s ∈ K∗,

where K∗ is the dual of the cone K, namely

K∗ := {s ∈ E∗ : 〈s, x〉 ≥ 0, ∀x ∈ K}.

We are using 〈·,·〉 to denote the dual pairing on (E, E∗) and 〈·,·〉_D to denote the dual pairing on (Y, Y∗). A∗ : Y → E∗ denotes the adjoint of A, defined by the equations

〈A∗y, x〉 = 〈Ax, y〉_D, ∀x ∈ E, y ∈ Y.
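The following is a small numerical sketch (ours, not from the paper) of the primal-dual pair (P), (D) in the LP special case K = K∗ = ℝⁿ₊; all data and names are illustrative. It checks the defining identity of the adjoint and the weak-duality identity 〈c, x〉 − 〈b, y〉_D = 〈s, x〉 ≥ 0.

```python
# Minimal sketch of (P)/(D) for the LP case K = K* = R^n_+ (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))   # surjective for generic random data
x = rng.random(n) + 0.1           # x in int(K)
y = rng.standard_normal(m)
b = A @ x                         # makes x primal-feasible
s = rng.random(n) + 0.1           # s in int(K*)
c = A.T @ y + s                   # makes (y, s) dual-feasible

# Adjoint identity <A*y, x> = <Ax, y>_D; in matrix form A* is the transpose.
assert np.isclose((A.T @ y) @ x, (A @ x) @ y)

# Weak duality: <c,x> - <b,y> = <s,x> >= 0.
print(c @ x - b @ y, s @ x)       # the two numbers agree and are positive
```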
For both of these spaces E and Y, we identify the dual of the dual space with the original space, i.e., E∗∗ = E and Y∗∗ = Y. In this setting, under our assumptions, K∗ is also a pointed, closed convex cone with nonempty interior and K∗∗ = K. Now, it is not difficult to verify that, under our definitions, the optimization problem dual to the problem (D) is indeed equivalent to (P). Primal-dual symmetric conic formulations of convex optimization problems with self-concordant barrier functions were suggested by Nesterov and Nemirovski [28], and since their inception in the early 1990s, such formulations have had a significant influence on modern convex optimization.

In this paper, we utilize many of the fundamental techniques developed throughout the history of interior-point methods. One of the main algorithms we design and analyse is a predictor-corrector algorithm generalizing the algorithm of Mizuno, Todd and Ye [15] from the LP setting. However, even in the LP setting, some of our algorithms are new. Our algorithms use Newton directions as in Renegar's algorithm for LP [35], and one of our main algorithms uses a similar predictor-corrector scheme, but with both predictor and corrector steps computed in a primal-dual setting.

The modern theory of interior-point methods employed the concept of self-concordant barrier functions (see Nesterov and Nemirovski [29]). The Hessians of these nice convex functions induce local metrics with excellent properties. For instance, the unit ball induced by the Hessian at a point, called the Dikin ellipsoid, is contained in the cone. The self-concordance property implies that, as we make local moves in the domain of the barrier function, the local metrics do not change fast unless their norms are large. More precisely, the speed of the change in the local metric can be bounded in terms of the local metric itself.

One indicator of how good a barrier function is (in terms of the local metrics it generates) is how well the Dikin ellipsoid approximates the domain of the function. This leads to the notion of the long-step Hessian estimation property (defined in Section 2) of barriers. This property amounts to extending the controlled change of the Hessian of the barrier to the "best, norm-like local approximation" of the domain of the barrier. Various long-step strategies for interior-point methods have been investigated extensively in [21]. This long-step Hessian estimation property has been proven for self-scaled barriers [30, 31] (whose domains are

symmetric cones, and these barriers have been completely classified [8, 9, 42]) and hyperbolic barriers [6] (whose domains are hyperbolicity cones; for results on the structure of these cones, see [6, 1, 37, 38]), but there exist barriers for which this property fails. For a related, but weaker, notion of long-step, also see [26]. Indeed, we would like our algorithms to exploit such properties when they are present. (Beyond the set-up of symmetric cones and self-scaled barriers, exploitation of this long-step property forces us to break the primal-dual symmetry of our algorithms in favor of the problem whose barrier function admits this property.) Our general approach and many of our main technical results are primal-dual symmetric (in the sense of [45]); however, beyond symmetric cones and self-scaled barriers, there may be advantages to breaking the primal-dual symmetry in favor of the better behaved (or better posed) problem. A large part of the favorable structure of hyperbolic barriers that we understand lies with the long-step Hessian estimation property; another large part consists of the special algebraic structures given by the polynomials defining these self-concordant barriers (such structures created advantageous situations for breaking primal-dual symmetry, even in the case of convex optimization with univariate polynomials; see [22, 5, 34]). Our approach also provides useful tools for exploiting such structures when they are present. Independently of this work, and simultaneously with an announcement of this work [18], Renegar announced a primal-dual affine-scaling method for hyperbolicity cone programming [39] (also see Renegar and Sondjaja [40]).

Nesterov and Todd [30, 31] present very elegant primal-dual interior-point algorithms with outstanding mathematical properties in the setting of self-scaled barriers; however, beyond self-scaled barriers, there are many other primal-dual approaches that are not as symmetric, but retain some of the desired properties (see [47, 20, 2, 3, ?, 25]). Some recent study of interior-point methods and the central paths defined by self-concordant barriers led to connections with Riemannian geometry; see [32, 27]. We also make some additional connections to Riemannian geometry through the local primal-dual metrics that we utilize in this paper.

Hessians of self-concordant barriers, in addition to inducing local metrics, provide linear isomorphisms between the primal and dual spaces E and E∗. In the special case of self-scaled barriers, we have F″(w) int(K) = int(K∗) for every w ∈ int(K). We focus on a generalization of such behaviour and simulate it with a symmetric positive-definite bilinear form T² mapping E∗ to E. Upon fixing an inner product (e.g., based on the Hessians of F and F∗ evaluated at the starting pair (x(0), s(0))) and considering a realization of T² : ℝⁿ → ℝⁿ, this leads (via its unique self-adjoint, positive-definite square-root T) to the construction of a v-space as a space that is "half-way between" E and E∗, i.e., E^{∗/2}.

The overall structure of the remainder of this paper is as follows:
• Section 2 presents some fundamental notation, definitions and properties of underlying primal-dual spaces and the class of convex barrier functions.

• Section 3 presents a new primal-dual scaling map based on integration of the barrier's Hessian.
• Section 4 presents some low-rank update formulae for the construction of local primal-dual metrics and connects these formulae to classical work on quasi-Newton updates.
• Section 5 delves deeper into investigating the relationship between the set of Hessians of self-concordant barriers and the set of local primal-dual metrics we use.
• Sections 6 and 7 combine a crude approximation of the scaling from Section 4 with the technique of Section 3 to derive and analyse some theoretically efficient interior-point algorithms for convex programming.
• Section 8 examines some of the special properties of our approach for hyperbolic cones that may make our techniques particularly effective.
• Some side issues, such as technical details of the primal-dual symmetry of our approach (which is established via geodesic convexity of the underlying sets of local metrics), details of the complexity analysis in terms of individual tensor terms and with specific absolute constants, and the numerical evaluation of integrals of Hessians, are relegated to Appendices A, B and C respectively.

Before moving into the technical development, we next summarize the major goals of the paper, highlight some of the major ideas, and describe how our results fit in the general framework of interior-point methods and in which ways our results improve the state-of-the-art in interior-point methods for general convex cones:

• Since the proposal of primal-dual symmetric polynomial-time interior-point algorithms in the late 1980s, the problems of determining which classes of convex optimization problems admit such algorithms and of determining what worst-case iteration complexity bounds are achievable for such algorithms have been of great interest to many researchers. One of our major goals is to prove that for all convex optimization problems in conic form, there exists such a primal-dual symmetric interior-point algorithm attaining the current best worst-case iteration complexity bound for interior-point algorithms. Another major goal is to construct these proofs and develop the underlying theory in a manner that, in addition to meeting the usual theoretical goals, paves the way for numerical implementations and further inspires the design and analysis of better algorithms.
• Some of the highlights of the paper include the construction of primal-dual metrics with special properties via integration of Hessians of convex barrier functions, and new decompositions of the existing rank-4 update formulae for interior-point methods [47], written as two consecutive rank-2 update formulas, with the second rank-2 formula exposing new, very meaningful terms in regard to proximity to the central path. Another feature of the paper is, on the one hand, a high-level complexity analysis that reveals geometric and algebraic insights and emphasizes the generality of the framework, and on the other hand, a very detailed term-by-term analysis (relegated to an appendix)

that is very useful in designing robust algorithms which perform numerical computations accurately.
• Even though the theoretical analysis with the polynomial iteration complexity bound only applies to the short-step version of the algorithm, our development of the underlying theory does properly generalize Nesterov and Todd's primal-dual symmetric, long-step algorithms and, in more general settings, is applicable to the analysis of long-step algorithms, provided one is willing to sacrifice at least some of the primal-dual symmetry when necessary (that is, beyond the symmetric cone case).
• We also provide short proofs of the existence of integral scalings, and a relatively transparent proof of the nonexistence of pointwise evaluated primal-dual Hessian metrics beyond symmetric cones. This proof utilizes an axiomatic characterization of Euclidean Jordan algebras in an elementary way.
• We further provide deeper connections from primal-dual metrics for interior-point methods to quasi-Newton methods and Riemannian geometry (both via integration of Hessians, and via a primal-dual symmetry proof (Appendix A)).
• The algorithms that we analyse are new even for Linear Programming problems. In our computational experiments, we observed that practical versions of the algorithms with low-rank updates save a few Cholesky decompositions on many of the test instances (see [19]). This is already a promising development indicating some computational promise of the approach. The algorithms allow for adjusting the amount of second-order information utilized from one iteration to the next.

2. Preliminaries

In this section, we introduce some of the fundamental concepts, definitions and properties that will be useful in the rest of the paper. For a more detailed exposition of these concepts, see the standard references [29, 30, 31, 36, 23] as well as [45, 47].

Definition 2.1. (Self-concordance) Let K ⊂ E be a pointed, closed convex cone with nonempty interior and let F : int(K) → ℝ be a C³-smooth, closed convex function such that the domain of F is int(K) and there exists ϑ ∈ ℝ such that, for every t > 0,

F(tx) = F(x) − ϑ ln(t),

and

|D³F(x)[h, h, h]| ≤ 2 (D²F(x)[h, h])^{3/2}   (1)

for all x ∈ int(K) and for all h ∈ E. Then F is called a ϑ-logarithmically homogeneous self-concordant barrier (ϑ-LHSCB) for K.

It follows from the above definition that ϑ ≥ 1 and that for every sequence in the interior of K converging to a boundary point of K, the corresponding function values F(x) → +∞.
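As a quick illustration (our addition, not from the paper), one can check Definition 2.1 numerically for the standard log-barrier F(x) = −Σᵢ ln xᵢ of the nonnegative orthant, which is an n-LHSCB (so ϑ = n in this sketch):

```python
# Sketch: check Definition 2.1 for F(x) = -sum(log x_i) on R^n_+ (vartheta = n).
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.random(n) + 0.5
h = rng.standard_normal(n)

F = lambda z: -np.sum(np.log(z))
# Logarithmic homogeneity: F(tx) = F(x) - vartheta*log(t).
t = 2.7
assert np.isclose(F(t * x), F(x) - n * np.log(t))

# Self-concordance inequality (1): |D^3F(x)[h,h,h]| <= 2 (D^2F(x)[h,h])^{3/2}.
D2 = np.sum(h**2 / x**2)       # D^2 F(x)[h,h]
D3 = -2 * np.sum(h**3 / x**3)  # D^3 F(x)[h,h,h]
assert abs(D3) <= 2 * D2**1.5 + 1e-12
print("Definition 2.1 holds at this sample point.")
```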

If F is a ϑ-LHSCB for K, then its Legendre–Fenchel conjugate

F∗(s) := sup{−〈s, x〉 − F(x) : x ∈ int(K)}

is a ϑ-LHSCB for the dual cone K∗ (Nesterov and Nemirovskii [29]). We refer to F∗ simply as the conjugate barrier.

Once we have a ϑ-LHSCB F for K, at every point x ∈ int(K), F″(x)[·,·] : E ⊕ E → ℝ is a symmetric, bilinear, positive-definite form. Hence, the Hessian of F defines a local metric. For every h ∈ E, the local norm induced by F at x is

||h||_x := 〈F″(x)h, h〉^{1/2}.

Moreover, for every x ∈ int(K), we may think of F″(x) : E → E∗ as a linear map from the primal space to the dual space. Therefore, [F″(x)]⁻¹ : E∗ → E is well-defined and [F″(x)]⁻¹[·,·] : E∗ ⊕ E∗ → ℝ is a symmetric, bilinear, positive-definite form. For every u ∈ E∗, we define

||u||*_x := 〈u, [F″(x)]⁻¹u〉^{1/2}.

Proposition 2.2. Let F be a ϑ-LHSCB for K. Then for every x ∈ int(K), ||·||*_x is the norm dual to ||·||_x. That is, for every x ∈ int(K),

||u||*_x = sup{〈u, h〉 : ||h||_x ≤ 1, h ∈ E}, ∀u ∈ E∗.

The above proposition implies

|〈u, h〉| ≤ ||u||*_x ||h||_x, ∀u ∈ E∗, h ∈ E.

We use the above "Cauchy–Schwarz" inequality quite often. Note that ||F′(x)||*_x = ||h||_x, where h := [F″(x)]⁻¹F′(x) is the Newton step at x for minimizing F.

Theorem 2.3. Let F be a ϑ-LHSCB for K. Then, for every x ∈ int(K), the open unit ellipsoid centered at x and defined by the positive-definite Hessian F″(x) is contained in the interior of the cone. That is,

E(x; F″(x)) := {z ∈ E : 〈F″(x)(z − x), z − x〉 < 1} ⊂ int(K).

Moreover, for every z ∈ int(K) such that δ := ||x − z||_x < 1, we have

(1 − δ)² F″(x)[h, h] ≤ F″(z)[h, h] ≤ 1/(1 − δ)² F″(x)[h, h], for all h ∈ E.

We use (as above) F″(x)[h⁽¹⁾, h⁽²⁾] to denote the second derivative of F evaluated along the directions h⁽¹⁾, h⁽²⁾ ∈ E. We also use the notation 〈F″(x)h⁽¹⁾, h⁽²⁾〉 to denote the same quantity. In both expressions, F″(x) is the Hessian of F.
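A small sketch (ours, not from the paper) of the Dikin ellipsoid bound of Theorem 2.3, again for the log-barrier of the nonnegative orthant; the point z below is sampled with δ = ||z − x||_x = 0.9 < 1:

```python
# Sketch: Dikin ellipsoid bound for F(x) = -sum(log x_i):
# (1-delta)^2 F''(x) <= F''(z) <= (1-delta)^{-2} F''(x) when ||z-x||_x = delta < 1.
import numpy as np

rng = np.random.default_rng(2)
n = 5
x = rng.random(n) + 0.5
H = lambda z: np.diag(1.0 / z**2)     # F''(z) for the log-barrier

delta = 0.9
d = rng.standard_normal(n)
d *= delta / np.sqrt(d @ H(x) @ d)    # scale so that ||d||_x = delta
z = x + d
assert np.all(z > 0)                  # z is in the Dikin ellipsoid, hence in K

lo, hi = (1 - delta)**2 * H(x), H(x) / (1 - delta)**2
print(np.linalg.eigvalsh(H(z) - lo).min() >= -1e-10,
      np.linalg.eigvalsh(hi - H(z)).min() >= -1e-10)   # both True
```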
As we deal with the Hessians and other symmetric bilinear forms on the same space, we sometimes utilize the Löwner order: we write A ≼ B to mean that (B − A) is a symmetric, bilinear, positive semidefinite form. With these clarifications, the inequalities in the statement of the last theorem can be equivalently written as

(1 − δ)² F″(x) ≼ F″(z) ≼ 1/(1 − δ)² F″(x);

we refer to the above relations as the Dikin ellipsoid bound.

For every x ∈ int(K) and every h ∈ E, define

σ_x(h) := 1 / sup{t : (x − th) ∈ K}.

We say that F has the long-step Hessian estimation property if

1/[1 + tσ_x(−h)]² F″(x) ≼ F″(x − th) ≼ 1/[1 − tσ_x(h)]² F″(x),   (2)

for every x ∈ int(K), h ∈ E and t ∈ [0, 1/σ_x(h)). Nesterov and Todd [30] proved that every self-scaled barrier has this property. Güler [6] extended this property to hyperbolic barriers. However, Nesterov proved (see Theorem 7.2 in [6]) that the relation (2) can hold for a self-concordant barrier and its conjugate only if K is a symmetric cone. Essentially equivalent properties are expressed as the convexity of 〈−F′(x), u〉 (in x) for every u ∈ K, or, for sufficiently smooth F, as F having negative curvature: F‴(x)[u] ≼ 0 for every x ∈ int(K) and for every u ∈ K.

All of the properties listed in the next theorem can be derived directly from the logarithmic homogeneity property of F. We denote the kth derivative of F by DᵏF.

Theorem 2.4. Let F be a ϑ-LHSCB for K. Then, for all x ∈ int(K), F has the following properties:
(1) For every integer k ≥ 1 and every t > 0, if F is k times differentiable, then DᵏF(tx) = (1/tᵏ) DᵏF(x);
(2) 〈−F′(x), x〉 = ϑ;
(3) F″(x)x = −F′(x);
(4) 〈F″(x)x, x〉 = ϑ = ||x||²_x,  〈[F″(x)]⁻¹F′(x), F′(x)〉 = ϑ = (||F′(x)||*_x)²;
(5) F‴(x)[x] = −2F″(x).

The LHSCB F and its conjugate barrier F∗ interact very nicely due to the elegant and powerful analytic structure imposed by Legendre–Fenchel conjugacy:

Theorem 2.5. Let F be a ϑ-LHSCB for K. Then for all x ∈ int(K) and s ∈ int(K∗), F and F∗ satisfy the following properties:
(1) F∗(−F′(x)) = −ϑ − F(x) and F(−F′∗(s)) = −ϑ − F∗(s);
(2) −F′∗(−F′(x)) = x and −F′(−F′∗(s)) = s;
(3) F″∗(−F′(x)) = [F″(x)]⁻¹ and F″(−F′∗(s)) = [F″∗(s)]⁻¹.
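The identities of Theorems 2.4 and 2.5 are easy to check numerically in the orthant case, where F(x) = −Σᵢ ln xᵢ and the conjugate is F∗(s) = −Σᵢ ln sᵢ − n; the following sketch (ours) does so:

```python
# Sketch: Theorem 2.4/2.5 identities for the log-barrier and its conjugate.
import numpy as np

rng = np.random.default_rng(3)
n = 5
x = rng.random(n) + 0.5
g = lambda z: -1.0 / z                # F'(x) componentwise
vartheta = n

assert np.isclose(-g(x) @ x, vartheta)               # <-F'(x), x> = vartheta
assert np.allclose(np.diag(1.0 / x**2) @ x, -g(x))   # F''(x)x = -F'(x)

# Conjugacy: x~ = -F*'(s) = 1/s componentwise, so -F*'(-F'(x)) = x.
s = -g(x)                             # s = -F'(x) = 1/x
assert np.allclose(1.0 / s, x)
# F*(-F'(x)) = -vartheta - F(x), with F*(s) = -sum(log s_i) - n.
Fs = lambda z: -np.sum(np.log(z)) - n
assert np.isclose(Fs(-g(x)), -vartheta - (-np.sum(np.log(x))))
print("Theorem 2.4/2.5 identities verified for the log-barrier.")
```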
Maps between the primal space E and the dual space E∗ play very fundamental roles. Among such maps, those which take int(K) to int(K∗) bijectively are particularly useful. One such map is −F′(·) (i.e., the duality mapping). In the special case of symmetric cones and self-scaled barriers F, F″(u) gives a linear isomorphism between int(K) and int(K∗) for every choice of u ∈ int(K). Next, we define three sets of such linear transformations: T²₀, T²₁, T²₂ [47]. We further require in the definitions that these linear transformations, just like the Hessians of self-concordant barriers above, be symmetric bilinear positive-definite forms. These sets are indexed based on their usage of information from the underlying self-concordant barrier functions (zeroth-order information only, up to first-order information only, and up to second-order information only, respectively).

We denote by S₊₊(E, E) the set of symmetric bilinear positive-definite forms H[·,·] : E ⊕ E → ℝ. Note that such forms are also interpreted in this paper as linear maps H : E → E∗. Sometimes it is suitable to fix some inner product on E, and hence some linear isomorphism between E and E∗, and identify both E and E∗ with ℝⁿ (n := dim(E)). We can then represent every T² ∈ S₊₊(E∗, E∗) by a symmetric positive-definite matrix over the reals. We use the same notation T² ∈ Sⁿ₊₊ for such representations. Then, we are able to utilize the unique symmetric positive-definite square-root T ∈ Sⁿ₊₊ of T². Since T² : E∗ ≅ ℝⁿ → E ≅ ℝⁿ, one may conceptually think of T : E∗ → E^{∗/2}, as we alluded to in the Introduction. Such representations are only necessary for our explicit discussions of the v-space constructions and for drawing closer parallels to the Linear Programming special case. In fact, the algorithms can be stated, implemented and analysed without any need to compute T directly, or even to refer to it (we can fully operate with T² and T⁻²).

Definition 2.6. For every pair (x, s) ∈ int(K) ⊕ int(K∗), we define

T²₀(x, s) := {T² ∈ S₊₊(E∗, E∗) : T²s = x}.

In words, T²₀(x, s) is the set of all symmetric bilinear positive-definite forms which map s to x. It may not be obvious, but this set is nonempty for every convex cone K and for every pair of points x and s in the interiors of the respective cones. A proof of this fact was given, in the context of self-scaled barriers, by Nesterov and Todd [30], and two other proofs were given in [47] (Theorems 3.2 and 3.3). In fact, this set is convex and its dimension is Ω(n²).

For a given pair x, s, let us define

µ(x, s) := 〈s, x〉 / ϑ.

Note that if x and s are feasible in their respective problems, then ϑµ(x, s) is their duality gap:

ϑµ(x, s) = 〈s, x〉 = 〈c, x〉 − 〈b, y〉_D ≥ 0.

We will often write simply µ for µ(x, s). We immediately further abuse this notation and consider, for every µ > 0, the unique minimizer of the problem

(Pµ)   min (1/µ)〈c, x〉 + F(x)
       Ax = b,
       (x ∈ int(K)).

Under the assumptions that both problems (P) and (D) have Slater points and that A is surjective, (Pµ) has a unique solution for every µ > 0. The necessary and sufficient conditions for optimality applied to (Pµ) yield that, for every µ > 0, the unique solution x(µ) of the minimization problem above makes up the x-part of the unique solution (x(µ), y(µ), s(µ)) to the system:

Ax = b, x ∈ int(K),
A∗y + s = c,
s = −µF′(x).

Moreover, these solutions (x(µ), y(µ), s(µ)) define a smooth path, called the central path. Now we can explain the latest abuse of notation; for such points on the central path we have

〈s(µ), x(µ)〉 = ϑµ.

Therefore, our definition of µ(x, s) is consistent with this usage of µ as well. To pass from the general set-up of T² to the Euclidean structure requiring set-up of T, we may use a "static" Hessian to fix an inner product for the primal and dual spaces as in [34] (for example, assume that (x(0), s(0)) lies on the central path with parameter µ = 1, and base the inner product on F″∗(s(0)) = [F″(x(0))]⁻¹).

Let T² ∈ T²₀(x, s), and define T as above. Upon further defining v := Ts = T⁻¹x, we observe

ϑµ = 〈s, x〉 = (Ts)ᵀ(T⁻¹x) = vᵀv.

These linear transformations allow generalizations of so-called v-space approaches to primal-dual interior-point methods from LP and SDP (see, for instance, [17, 12, 10, 44, 24]) as well as from the symmetric cone programming setting [30, 31, 45, 7] to the general convex optimization setting [45, 47]. What perhaps started as a convenience of notation, as well as simplicity and elegance of analysis, in the 1980s has by now turned into a solid theoretical framework in which deep problems can be posed and solved, whether they relate to the mathematical structures or are directly motivated by a family of algorithms.
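In the LP special case this construction is completely explicit: T² = Diag(x)[Diag(s)]⁻¹ ∈ T²₀(x, s), T = Diag(√(xᵢ/sᵢ)), and vᵢ = √(xᵢsᵢ). A short sketch (ours, with illustrative data):

```python
# Sketch of the v-space construction in the LP case.
import numpy as np

rng = np.random.default_rng(4)
n = 6
x = rng.random(n) + 0.1
s = rng.random(n) + 0.1
vartheta = n
mu = (s @ x) / vartheta

T = np.diag(np.sqrt(x / s))                    # T^2 = Diag(x/s) maps s to x
v = T @ s
assert np.allclose(v, np.linalg.solve(T, x))   # v = Ts = T^{-1}x
assert np.isclose(v @ v, vartheta * mu)        # <s,x> = v^T v
print("v =", v)
```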
Note that the set T²₀ does not use any information about the most difficult constraints of the convex optimization problem. Indeed, in the worst case, certain algorithms using only T²₀ need not even converge to an optimal solution (see [50]). Using first-order information via F′(x) and F′∗(s), we can focus on a smaller, but more interesting and useful, subset of T²₀:

Definition 2.7. For every pair (x, s) ∈ int(K) ⊕ int(K∗), we define

T²₁(x, s) := {H ∈ S₊₊(E∗, E∗) : Hs = x, H[−F′(x)] = −F′∗(s)}.

Just like T²₀, T²₁ is also convex, nonempty and has dimension Ω(n²). However, the nonemptiness may be less obvious (for a proof, see [47], or Sections 3 and 4 of this paper).

For convenience, we sometimes write x̃ := x̃(s) := −F′∗(s) and s̃ := s̃(x) := −F′(x). One can think of x̃ and s̃ as the shadow iterates, as x̃ ∈ int(K) and s̃ ∈ int(K∗), and if (x, s) is a feasible pair, then µx̃ = x iff µs̃ = s iff (x, s) lies on the central path. In this paper, s and decorated versions of it (s̃, š, s̄, etc.) all belong to the dual s-space E∗; similarly, x and all of its decorated versions belong to the x-space E. We also denote µ̃ := µ̃(x, s) := 〈s̃, x̃〉/ϑ. In this context, x̃ is the primal shadow of the dual solution s.

Analogous to the definition of v, for every T obtained from a T² ∈ T²₁(x, s), we can now define w := Ts̃ = T⁻¹x̃. Note that wᵀw = ϑµ̃.

Once we have such a transformation T, we can map the primal space using T⁻¹ and the dual space using T to arrive at the data of the underlying (equivalent) scaled primal-dual pair. We define Ā := A(T(·)), and we have

(P̄)   inf (Tc)ᵀx̄
      Āx̄ = b,
      x̄ ∈ T⁻¹K,

(D̄)   sup 〈b, y〉_D
      Ā∗y + s̄ = Tc,
      s̄ ∈ TK∗.

Then the search directions for the scaled primal-dual pair (P̄) and (D̄) are (respectively)

d̄_x := T⁻¹d_x,  d̄_s := Td_s,

where d_x, d_s denote the search directions in the original primal and dual spaces respectively.

Lemma 2.8. (Nesterov and Todd [31]; also see [47, 48]) For every (x, s) ∈ int(K) ⊕ int(K∗),

(〈s, x〉/ϑ) (〈F′(x), F′∗(s)〉/ϑ) = µµ̃ ≥ 1.

Equality holds above iff x = −µF′∗(s) (and hence s = −µF′(x)).

Note that the equality above, together with feasibility of x and s in their respective problems, defines the primal-dual central path: {(x(µ), s(µ)) : µ > 0}.
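A small numerical illustration (ours, not from the paper) of the shadow iterates and of Lemma 2.8 in the LP case, where s̃ = 1/x and x̃ = 1/s componentwise:

```python
# Sketch: shadow iterates and mu*mu~ >= 1 (Lemma 2.8) for the log-barrier.
import numpy as np

rng = np.random.default_rng(5)
n = 6
x = rng.random(n) + 0.1
s = rng.random(n) + 0.1
s_t, x_t = 1.0 / x, 1.0 / s          # shadow iterates s~ = -F'(x), x~ = -F*'(s)
mu = (s @ x) / n
mu_t = (s_t @ x_t) / n
print(mu * mu_t >= 1.0)              # True; equality iff on the central path

# On the central path (take s = 1/x, so mu = 1) the product is exactly 1.
s = 1.0 / x
print(np.isclose(((s @ x) / n) * (((1.0 / x) @ (1.0 / s)) / n), 1.0))
```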

Further note that for each T ∈ T₁, we can equivalently express the centrality condition as v = µw (of course, together with the requirement that the underlying x and s be feasible in their respective problems). The primal and dual deviations from the central path are

δ_P := x − µx̃  and  δ_D := s − µs̃.

In v-space, these deviations are both represented by δ_v := v − µw.

Corollary 2.9. For every (x, s) ∈ int(K) ⊕ int(K∗),

〈s + µF′(x), x + µF′∗(s)〉 = 〈δ_D, δ_P〉 = ||δ_v||₂² ≥ 0.

Equality holds above iff x = −µF′∗(s) (and hence s = −µF′(x)).

The above quantities have their counterpart, called the gradient proximity measure and denoted by γ_G(x, s), in the work of Nesterov and Todd [31]. In their paper, the corresponding measure is

γ_G(x, s) = ϑ(µµ̃ − 1) = ||δ_v||₂² / µ.

Now we consider the following system of linear equations (we utilize Newton's method to follow the central path approximately):

Ād̄_x = 0,   (3)
Ā∗d_y + d̄_s = 0,   (4)
d̄_x + d̄_s = −v + γµw,   (5)

where γ ∈ [0, 1] is a centering parameter. Clearly, (d̄_x, d̄_s) solves the above linear system of equations iff d̄_x and d̄_s are the orthogonal projections of (−v + γµw) onto the kernel of Ā and the image of Ā∗ respectively.

Given a pair of search directions, we define

x(α) := x + αd_x,  s(α) := s + αd_s.

Using the above observations, the following result is immediate.

Lemma 2.10. Let α ∈ [0, 1]. Then 〈s(α), x(α)〉 = [1 − α(1 − γ)]〈s, x〉.
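The system (3)-(5) can be solved by two orthogonal projections, as noted above. The following sketch (ours, for the LP case with T = Diag(√(xᵢ/sᵢ)), all names illustrative) does exactly that and then verifies Lemma 2.10:

```python
# Sketch: solve the scaled Newton system (3)-(5) for LP and check Lemma 2.10.
import numpy as np

rng = np.random.default_rng(6)
m, n = 3, 7
A = rng.standard_normal((m, n))
x = rng.random(n) + 0.1
s = rng.random(n) + 0.1
mu = (s @ x) / n
gamma = 0.5                            # centering parameter

T = np.diag(np.sqrt(x / s))            # T^2 = Diag(x/s) lies in T^2_1 here
v = T @ s
w = T @ (1.0 / x)                      # w = T s~, with s~ = -F'(x) = 1/x
Abar = A @ T

rhs = -v + gamma * mu * w
P = Abar.T @ np.linalg.solve(Abar @ Abar.T, Abar)   # projector onto im(Abar*)
ds_bar = P @ rhs                       # component in the image of Abar*
dx_bar = rhs - ds_bar                  # component in the kernel of Abar
dx, ds = T @ dx_bar, np.linalg.solve(T, ds_bar)

alpha = 0.4
lhs = (s + alpha * ds) @ (x + alpha * dx)
print(np.isclose(lhs, (1 - alpha * (1 - gamma)) * (s @ x)))   # Lemma 2.10
```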
For every s ∈ int(K∗), we define the operator norm

||M||_s := sup_{||u||_s ≤ 1} ||Mu||*_s.

Then, by definition, ||Mu||*_s ≤ ||M||_s ||u||_s.

We will also utilize bilinear forms defined via tensors. Just like the Hessians and the bilinear forms T² above, these new bilinear forms will also be interpreted as linear transformations between two spaces: for u ∈ E₁, z ∈ E₂, we define u ⊗ z : E∗₂ → E₁ by

(u ⊗ z)h := 〈z, h〉u, ∀h ∈ E∗₂,

as well as a bilinear form u ⊗ z : E∗₁ ⊕ E∗₂ → ℝ. This bilinear form acts as

(u ⊗ z)[h₁, h₂] := 〈u, h₁〉〈z, h₂〉, ∀h₁ ∈ E∗₁, h₂ ∈ E∗₂.
Next, let us observe that for every pair h, u ∈ E, we have

2(h ⊗ h − u ⊗ u) = [(h − u) ⊗ (h + u)] + [(h + u) ⊗ (h − u)].

We make considerable use of the following difference-of-squares bound for the operator norm of the difference of two tensors:

Lemma 2.11. Let h and u lie in E and s ∈ int(K∗). Then

||(h ⊗ h) − (u ⊗ u)||_s ≤ ||h − u||*_s ||h + u||*_s.

Proof. By the triangle inequality,

2||(h ⊗ h) − (u ⊗ u)||_s = ||[(h − u) ⊗ (h + u)] + [(h + u) ⊗ (h − u)]||_s ≤ ||(h − u) ⊗ (h + u)||_s + ||(h + u) ⊗ (h − u)||_s.

Now we compute

||(h − u) ⊗ (h + u)||_s = sup_{||z||_s ≤ 1} ||(h − u)〈z, h + u〉||*_s = sup_{||z||_s ≤ 1} 〈z, h + u〉 ||h − u||*_s = ||h + u||*_s ||h − u||*_s;

and similarly ||(h + u) ⊗ (h − u)||_s = ||h + u||*_s ||h − u||*_s. Adding these together gives the advertised result. □

The next lemma is used many times in the following analysis.

Lemma 2.12. Let s ∈ int(K∗) and h ∈ E∗ be such that ||h||_s < 1. Then

(1 − ||h||_s)² ≤ sup_{u ∈ E∗ : ||u||_s ≤ 1} ||F″∗(s + h)u||*_s ≤ 1/(1 − ||h||_s)².

Proof. Let s and h be as in the statement of the lemma. Then F″∗(s + h) ≼ 1/(1 − ||h||_s)² F″∗(s) by the Dikin ellipsoid bound. Thus, the maximum eigenvalue of [F″∗(s)]^{−1/2} F″∗(s + h) [F″∗(s)]^{−1/2} is bounded above by 1/(1 − ||h||_s)². The square of this quantity is an upper bound on the largest eigenvalue of [F″∗(s)]^{−1/2} F″∗(s + h) [F″∗(s)]⁻¹ F″∗(s + h) [F″∗(s)]^{−1/2}. Therefore, the supremum in the statement of the lemma is bounded above by 1/(1 − ||h||_s)², as desired. The left-hand side inequality can be proved similarly. □

Now we are ready to define the set of local primal-dual metrics which will utilize local second-order information from the underlying pair of self-concordant barriers. For each F and each pair (x, s) ∈ int(K) ⊕ int(K∗), consider the following optimization problem with variables T² ∈ S₊₊(E∗, E∗) and ξ ∈ ℝ:

(PD)²   inf  ξ
        T²s = x,  T²[−F′(x)] = −F′∗(s),
        µ/(ξ[ϑ(µµ̃ − 1) + 1]) F″∗(s) ≼ T² ≼ (ξ[ϑ(µµ̃ − 1) + 1]/µ) [F″(x)]⁻¹,
        ξ ≥ 1,  T² ∈ S₊₊(E∗, E∗).

This problem is indeed an SDP (a SemiDefinite Programming problem in the variables T² and ξ). Let ξ∗ be the optimum value of (PD)².

Definition 2.13. For every pair (x, s) ∈ int(K) ⊕ int(K∗), we define

T²₂(η; x, s) := { H ∈ S₊₊(E∗, E∗) : Hs = x, H[−F′(x)] = −F′∗(s),
                  µ/(ηξ∗[ϑ(µµ̃ − 1) + 1]) F″∗(s) ≼ H ≼ (ηξ∗[ϑ(µµ̃ − 1) + 1]/µ) [F″(x)]⁻¹ },

for η ≥ 1. Sometimes we find it convenient to refer to T directly. So, we define T₀, T₁, T₂ as the sets of T ∈ Sⁿ₊₊ whose squares T² lie in T²₀, T²₁, T²₂ respectively.

When the underlying cone K is symmetric, (P) and (D) are symmetric cone programming problems. Symmetric cones (are homogeneous and self-dual, and) admit self-scaled barriers. For such problems, there is a very small upper bound on ξ∗ for every pair of interior points. The following theorem follows mainly from Theorem 5.2 and Corollary 4.1 of Nesterov and Todd [30].

Theorem 2.14. (see [47]) Let K be a symmetric cone and F be a self-scaled barrier for K. Then, for every (x, s) ∈ int(K) ⊕ int(K∗), ξ∗ ≤ 4/3.

Being able to establish a nice upper bound on ξ∗ for all iterates, and then, for some small η ≥ 1, finding an efficient way of computing an element of T²₂(η; x, s), directly yields primal-dual interior-point algorithms with good properties. In particular, if the upper bounds on ξ∗ and η are both O(1), then such results directly yield primal-dual interior-point algorithms with iteration complexity bound O(√ϑ ln(1/ϵ)); see [47]. Next, we consider various ways of constructing good symmetric positive-definite bilinear forms in T²₁ and T²₂.

3. Primal-dual metrics via Hessian integration

In the special case where the cone K is symmetric and F is a self-scaled barrier for K, Nesterov and Todd [30, 31] identified a specific element of the sets T₀, T₁ and T₂ in terms of the Hessians at certain scaling points (i.e., for every pair (x, s), there exists w ∈ int(K∗) such that T² = F″∗(w)). There is also an explicit linear algebraic formula which expresses the Hessian at w in terms of the Hessians of F and the conjugate barrier F∗ at x and s respectively. The next theorem achieves an analogous goal in the fully general case of an arbitrary convex cone K and an arbitrary self-concordant barrier F for K, by expressing a primal-dual scaling in T²₁ as an integral of Hessians along the line segment joining the dual iterate to the dual shadow of the primal iterate. Later, in Section 5, we prove that beyond symmetric cones and self-scaled barriers one should not expect, in general, to find for every pair of primal-dual interior points (x, s) a w ∈ int(K∗) such that F″∗(w) ∈ T²₁(x, s). Perhaps surprisingly, we prove next that "an average of the Hessians" works!

Theorem 3.1. Let F be a LHSCB for K and (x, s) ∈ int(K) ⊕ int(K∗). Then the linear transformation

T²_D := µ ∫₀¹ F″∗(s − tδ_D) dt

maps s to x and s̃ to x̃, and T²_D[·,·] is a symmetric positive-definite bilinear form. Therefore, T²_D is in T²₁(x, s).
Proof. Using the fundamental theorem of calculus (for the second equation below) followed by the property −F′∗(−F′(x)) = x (for the third equation below), we obtain

T²_D δ_D = µ ∫₀¹ F″∗(s − tδ_D) δ_D dt = µ[−F′∗(s − δ_D) + F′∗(s)] = µ(x/µ − x̃) = δ_P.

We next compute, using the substitution t̃ = 1/t,

T²_D s = µ ∫₀¹ F″∗(s − tδ_D) s dt = µ ∫₀¹ (1/t²) F″∗(s/t − δ_D) s dt = µ ∫₁^∞ F″∗(t̃s − δ_D) s dt̃ = −µF′∗(s − δ_D) = x.

Further, T²_D is the mean of some symmetric positive-definite bilinear forms, so T²_D itself is a symmetric positive-definite bilinear form. □

We call this scaling operator the dual integral scaling. Note that the above theorem holds under weaker assumptions (we only used the facts that F is logarithmically homogeneous and F, F∗ are Legendre-type functions in the sense of Rockafellar [41]).
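Since T²_D is a one-dimensional operator integral of a smooth Hessian map, it can be evaluated to high accuracy by Gaussian quadrature (cf. Appendix C). The following sketch (ours, not from the paper) does this for K = Sⁿ₊ with F(X) = −ln det X, and verifies the two mapping properties of Theorem 3.1; the 40-node Gauss–Legendre rule is an arbitrary illustrative choice:

```python
# Sketch: Theorem 3.1 for the PSD cone, via Gauss-Legendre quadrature.
import numpy as np

rng = np.random.default_rng(7)
n = 4
def rand_pd():
    M = rng.standard_normal((n, n)); return M @ M.T + n * np.eye(n)

X, S = rand_pd(), rand_pd()                    # primal and dual iterates
vartheta = n
mu = np.trace(S @ X) / vartheta
S_t, X_t = np.linalg.inv(X), np.linalg.inv(S)  # shadow iterates s~, x~
D_D = S - mu * S_t                             # delta_D

vec = lambda M: M.reshape(-1)
hess = lambda Z: np.kron(np.linalg.inv(Z), np.linalg.inv(Z))  # F*''(Z)H = Z^-1 H Z^-1

nodes, weights = np.polynomial.legendre.leggauss(40)
t = 0.5 * (nodes + 1.0); wts = 0.5 * weights   # map [-1,1] to [0,1]
T2D = mu * sum(w * hess(S - ti * D_D) for ti, w in zip(t, wts))

print(np.allclose(T2D @ vec(S), vec(X), atol=1e-6),      # maps s to x
      np.allclose(T2D @ vec(S_t), vec(X_t), atol=1e-6))  # maps s~ to x~
```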
The dual integral scaling is expected to inherit many of the nice properties of the Hessians. Thus, if F∗ is well-behaved, then one can prove nice bounds on the deviation of the dual integral scaling from the dual Hessians at s and µs̃:

Theorem 3.2. If α < 1 is such that, for every t ∈ [0, 1],

(1 − tα)² F″∗(s) ≼ F″∗(s − tδ_D) ≼ 1/(1 − tα)² F″∗(s),

then

(1 − α) µF″∗(s) ≼ T²_D ≼ 1/(1 − α) µF″∗(s).

Proof. This follows directly from the definition of T²_D. □
Interestingly, the dual integral scaling (the mean of the Hessians along the line segment joining s and µs̃) is not as "canonical" as the Nesterov–Todd scaling (the geodesic mean of the Hessians joining the same two points in the interior of K∗) in terms of primal-dual symmetry properties. For the remainder of this section, we elaborate on this and related issues. Also, in Section 8, when we specialize to Hyperbolicity Cone Programming problems, we show that the integral scaling can have advantages (when one of the primal and dual problems is more tractable or nicer for the approach at hand, as a benefit of breaking this primal-dual symmetry well, and of properties like those given by Theorem 3.2).

Notice that the dual integral scaling

∫₀¹ µF″∗(ts + (1 − t)µs̃) dt

and the primal integral scaling

T²_P := ( ∫₀¹ µF″(tx + (1 − t)µx̃) dt )⁻¹

are both scalings that map s to x and s̃ to x̃. These are not in general the same, though they do coincide with the usual scaling Diag(x)[Diag(s)]⁻¹ in the case of Linear Programming.

Example 3.3. (A comparison of primal-dual local metrics for the positive semidefinite cone) We work out the integral scaling for the positive semidefinite cone Sⁿ₊. If X is the primal iterate and X̃ is the primal shadow of the dual iterate, then we see that

(T²_P)⁻¹[H, H] = µ ∫₀¹ 〈(tX + (1 − t)µX̃)⁻¹ H (tX + (1 − t)µX̃)⁻¹, H〉 dt.

One can make this slightly more explicit. There always exists a U ∈ GL(n) such that UXUᵀ = I and U(µX̃)Uᵀ is diagonal; one can compose a Q that orthogonally diagonalises X, an S that scales QᵀXQ to the identity matrix, and a Q′ that orthogonally diagonalises SᵀQᵀ(µX̃)QS.

Say U(µX̃)Uᵀ = D. Then we can compute

T²_D[UHUᵀ, UHUᵀ] = µ ∫₀¹ 〈(tI + (1 − t)D)⁻¹ H (tI + (1 − t)D)⁻¹, H〉 dt.

In particular, if H is E_ij, we have

T²_D[UE_ijUᵀ, UE_ijUᵀ] = µ ∫₀¹ (t + (1 − t)D_i)⁻¹ (t + (1 − t)D_j)⁻¹ dt = µ (ln D_j − ln D_i)/(D_j − D_i).

Special attention needs to be given to the case where D_i = D_j; here, the integral evaluates to µ/D_i. If H = E_ij + E_kl (with (i, j) ≠ (k, l)), then T²_D[U(E_ij + E_kl)Uᵀ, U(E_ij + E_kl)Uᵀ] is given by

µ ∫₀¹ (t + (1 − t)D_i)⁻¹(t + (1 − t)D_j)⁻¹ + (t + (1 − t)D_k)⁻¹(t + (1 − t)D_l)⁻¹ dt.

This is the sum of T²_D[UE_ijUᵀ, UE_ijUᵀ] and T²_D[UE_klUᵀ, UE_klUᵀ], meaning that

T²_D[UE_ijUᵀ, UE_klUᵀ] = δ_ik δ_jl T²_D[UE_ijUᵀ, UE_ijUᵀ].

Put another way, the operator C_Uᵀ T²_D C_U is diagonal, where C_U is the conjugation operator given by C_U(Z) = UZUᵀ.

For every nonsingular U, the map C_U preserves operator geometric means; that is,

U A^{1/2}(A^{−1/2}BA^{−1/2})^{1/2} A^{1/2} Uᵀ = (UAUᵀ)^{1/2}((UAUᵀ)^{−1/2} UBUᵀ (UAUᵀ)^{−1/2})^{1/2} (UAUᵀ)^{1/2}.

One can show this as follows: the geometric mean of X ≻ 0 and Y ≻ 0 is the unique positive-definite G := X^{1/2}(X^{1/2}Y⁻¹X^{1/2})^{−1/2}X^{1/2} such that GX⁻¹G = Y. Taking H = UGUᵀ, we have

H U^{−ᵀ}X⁻¹U⁻¹ H = UGUᵀ U^{−ᵀ}X⁻¹U⁻¹ UGUᵀ = UGX⁻¹GUᵀ = UYUᵀ.

Interestingly, Molnár [16] proved that every linear automorphism of the semidefinite cone over a complex Hilbert space that preserves geometric means is a conjugation operator. The converse is also true, since the set of automorphisms of Sⁿ₊ is the set of conjugations given by nonsingular U (see a discussion in [46] of Güler's proof utilizing the proof technique of Waterhouse [49]), and, as we observed above, it is easy to verify that the conjugation operator preserves operator geometric means.

Thus, we can make a natural comparison with the Nesterov–Todd scaling given by

N[H, H] = 〈(V D^{1/2} Vᵀ)⁻¹ H (V D^{1/2} Vᵀ)⁻¹, H〉,

where V D Vᵀ is the spectral decomposition of the operator geometric mean G of X and µX̃. Notice that

N[V E_ij Vᵀ, V E_ij Vᵀ] = 〈D^{−1/2} E_ij D^{−1/2}, E_ij〉 = 1/√(D_i D_j),

and that N is similarly diagonal with respect to that basis. (The conjugation C_Uᵀ N C_U does not in general result in a diagonal matrix, so we need to take this different V instead in order to diagonalise N.) Notice that

N[U E_ij Uᵀ, U E_ij Uᵀ] = (e_iᵀ Uᵀ G^{−1/2} U e_i)(e_jᵀ Uᵀ G^{−1/2} U e_j)

whatever U is. Thus, when we form the matrix whose (i, j) entry is N[UE_ijUᵀ, UE_ijUᵀ], we always obtain a rank-one matrix. However, such a matrix formed from the integral scaling T²_D can have full rank. We proceed with an example. Consider D = Diag(ϵ, ϵ², ..., ϵⁿ). Notice that ln D_i − ln D_j = (i − j) ln ϵ, while D_i − D_j = ϵⁱ − ϵʲ. Thus, the (i, j) entry of this matrix is on the order of ϵ^{−min(i,j)}. For sufficiently small ϵ, then, the determinant of this matrix is dominated by the product along the diagonal, which is positive; it follows that this matrix is nonsingular.
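The closed-form entries above are easy to confirm numerically; the following sketch (ours, not from the paper) compares them against Gauss–Legendre quadrature:

```python
# Sketch: verify T^2_D[UE_ijU^T, UE_ijU^T] = mu*(log Dj - log Di)/(Dj - Di),
# with the limiting value mu/Di when Di = Dj (Example 3.3).
import numpy as np

rng = np.random.default_rng(8)
n, mu = 4, 1.3
D = rng.random(n) + 0.5                  # diagonal of D = U (mu X~) U^T

def closed_form(i, j):
    if np.isclose(D[i], D[j]):
        return mu / D[i]
    return mu * (np.log(D[j]) - np.log(D[i])) / (D[j] - D[i])

nodes, weights = np.polynomial.legendre.leggauss(30)
t = 0.5 * (nodes + 1.0); wts = 0.5 * weights

for i, j in [(0, 1), (2, 3), (1, 1)]:
    quad = mu * np.sum(wts / ((t + (1 - t) * D[i]) * (t + (1 - t) * D[j])))
    print(np.isclose(quad, closed_form(i, j)))   # True in each case
```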
4. Local primal-dual metrics expressed as low-rank updates

It may be impractical in many cases to evaluate the integral scaling from Section 3 exactly, or to high enough accuracy. Due to this, or perhaps due to numerical instabilities arising in computations, we may have to make do with an approximation H that does not necessarily satisfy the equations Hs = x and Hs̃ = x̃. The second author [47] constructed the following low-rank update, which "fixes" such problems encountered by any symmetric, positive-definite H:

T²_H := H + a₁ x ⊗ x + g₁ Hs ⊗ Hs + ã₁ x̃ ⊗ x̃ + g̃₁ Hs̃ ⊗ Hs̃ + a₂[(x ⊗ x̃) + (x̃ ⊗ x)] + g₂[(Hs ⊗ Hs̃) + (Hs̃ ⊗ Hs)],   (6)

where

a₁ = µ̃/(ϑ(µµ̃ − 1)),  ã₁ = µ/(ϑ(µµ̃ − 1)),  a₂ = −1/(ϑ(µµ̃ − 1)),
g₁ = −H[s̃, s̃]/(H[s, s]H[s̃, s̃] − (H[s̃, s])²),  g̃₁ = −H[s, s]/(H[s, s]H[s̃, s̃] − (H[s̃, s])²),  g₂ = H[s, s̃]/(H[s, s]H[s̃, s̃] − (H[s̃, s])²).

The second author [47] proved that, as long as H is positive-definite, T²_H is positive-definite, maps s to x, and maps s̃ to x̃.

As written, Equation (6) is somewhat unwieldy for purposes of analysis. It is not immediately obvious that, in the case that the pair (x, s) satisfies the centrality condition x = µx̃ (or, equivalently, µµ̃ = 1), the formula collapses to a rank-two update. (Indeed, it has a singularity there.) We prove that the above complicated operator has an equivalent form as two simple, consecutive updates, due to the special structure of our set-up:

Theorem 4.1. Let x, x̃ ∈ E and s, s̃ ∈ E∗ satisfy the following conditions (under the definitions δ_P := x − µx̃ and δ_D := s − µs̃):
• 0 < 〈s̃, x〉 = 〈s, x̃〉 =: ϑ;
• 0 < 〈s, x〉 =: ϑµ;
• 0 < 〈s̃, x̃〉 =: ϑµ̃;
• 〈δ_D, δ_P〉 > 0.
Further, let H ∈ S₊₊(E∗, E∗). Then H₂ in the following formula is a symmetric, positive-definite bilinear form which maps s to x and s̃ to x̃:

H₁ := H + (1/〈s, x〉) x ⊗ x − (1/〈s, Hs〉) Hs ⊗ Hs,   (7)
H₂ := H₁ + (1/〈δ_D, δ_P〉) δ_P ⊗ δ_P − (1/〈δ_D, H₁δ_D〉) H₁δ_D ⊗ H₁δ_D.

Proof. Notice that H₁s = Hs + x − Hs = x. Notice further that 〈s, δ_P〉 = 〈δ_D, x〉 = 0 by expanding the dual pairing conditions, so H₂s = H₁s. Thus, H₂ maps s to x. Next, note that H₂δ_D = H₁δ_D + δ_P − H₁δ_D = δ_P. Thus, H₂ also maps δ_D to δ_P. Hence, H₂ maps s̃ to x̃. Clearly, H₂ is a linear combination of symmetric bilinear forms and hence is itself a symmetric bilinear form. We recall from the theory of quasi-Newton updates (see, for instance, Lemma 9.2.1 in [4]) that the "curvature condition" 〈s, x〉 > 0 is necessary and sufficient to guarantee that H₁ is positive-definite and, given that H₁ is positive-definite, the curvature condition 〈δ_D, δ_P〉 > 0 is necessary and sufficient for H₂ to be positive-definite. Therefore, H₂ is positive-definite as well. □

Note that the positivity of the scalar products 〈s, x〉 and 〈δ_D, δ_P〉, together with the orthogonality conditions 〈s, δ_P〉 = 〈δ_D, x〉 = 0, suffices for the above theorem to hold. Thus, there may be a potential use of these formulae in classical quasi-Newton approaches. Such considerations are left for future work. We remark that we can apply the above formulas after switching x and s and then inverting the resulting T² to obtain the following low-rank updates (under the same conditions):

Theorem 4.2. Let H, x, x̃, s, and s̃ be as in Theorem 4.1. Then H₂ in the following formula is symmetric, positive-definite, maps s to x, and maps s̃ to x̃:

H₁ := (I − x ⊗ s/〈s, x〉) H (I − s ⊗ x/〈s, x〉) + x ⊗ x/〈s, x〉,   (8)
H₂ := (I − δ_P ⊗ δ_D/〈δ_D, δ_P〉) H₁ (I − δ_D ⊗ δ_P/〈δ_D, δ_P〉) + δ_P ⊗ δ_P/〈δ_D, δ_P〉.

Proof. We again see that H₁s = x and H₂δ_D = δ_P by correctness of the BFGS/DFP update. We compute

H₂s = (I − δ_P ⊗ δ_D/〈δ_D, δ_P〉) H₁ (I − δ_D ⊗ δ_P/〈δ_D, δ_P〉) s + (δ_P ⊗ δ_P/〈δ_D, δ_P〉) s = (I − δ_P ⊗ δ_D/〈δ_D, δ_P〉) x + 0 = x.

It is easy to verify that both H₁ and H₂ are symmetric bilinear forms. H₁ is positive-definite because the curvature condition 〈s, x〉 > 0 is satisfied. H₂ is positive-definite because the curvature condition 〈δ_D, δ_P〉 > 0 is satisfied. □
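A direct numerical check (ours, not from the paper) of the two consecutive rank-two updates (7), in the LP case; H below is an arbitrary symmetric positive-definite matrix:

```python
# Sketch: the updates (7) of Theorem 4.1 for the log-barrier of R^n_+.
import numpy as np

rng = np.random.default_rng(9)
n = 6
x = rng.random(n) + 0.1
s = rng.random(n) + 0.1
x_t, s_t = 1.0 / s, 1.0 / x            # shadow iterates
mu = (s @ x) / n
dP, dD = x - mu * x_t, s - mu * s_t    # delta_P, delta_D
assert dD @ dP > 0                     # curvature condition (Corollary 2.9)

M = rng.standard_normal((n, n))
H = M @ M.T + np.eye(n)                # any symmetric positive-definite H

H1 = H + np.outer(x, x) / (s @ x) - np.outer(H @ s, H @ s) / (s @ H @ s)
H2 = H1 + (np.outer(dP, dP) / (dD @ dP)
           - np.outer(H1 @ dD, H1 @ dD) / (dD @ H1 @ dD))

print(np.allclose(H2 @ s, x),          # maps s to x
      np.allclose(H2 @ s_t, x_t),      # maps s~ to x~
      np.linalg.eigvalsh(H2).min() > 0)  # positive-definite
```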

Due to the variational interpretations of quasi-Newton update formulae, we have the corresponding interpretations in our set-up (i.e., H₁ is the closest (minimum-distance) symmetric bilinear form to H satisfying H₁s = x; similarly for H₂ and H₁). Moreover, as in [47], the above formulae can be used in convex combinations (analogous to Broyden's convex class in the classical quasi-Newton context) due to the convexity of T₁. Of course, we may also use the formula for H₁ in Theorem 4.1 together with the formula for H₂ in Theorem 4.2, etc. In Appendix B, we will look at these properties more deeply from a strict primal-dual symmetry viewpoint.

5. Primal-dual Hessian local metrics based on a single scaling point

In this section, we ask and partially answer the following question: For which ϑ-LHSCBs F∗ does there exist a unique scaling point w ∈ int(K∗) such that F″∗(w) ∈ T²₁(x, s) for every x ∈ int(K) and s ∈ int(K∗)?

Note that, on the one hand, for every ϑ-LHSCB F∗, there exists a scaling point w ∈ int(K∗) such that F″∗(w) ∈ T²₀(x, s) for every x ∈ int(K) and s ∈ int(K∗) (as was already proved by Nesterov and Todd [30]; see [47], Theorem 3.1). On the other hand, the Nesterov–Todd scaling point given by the geodesic mean of s and −F′(x) provides an example of such a w in the symmetric cone case (in this special case, F″∗(w) ∈ T²₁(x, s)). We show that this property does not generalise to slices of a symmetric cone that are not themselves symmetric cones. Every symmetric cone is a slice of a positive semidefinite cone. Therefore, for the sake of simplicity of notation, we will stick with slices of positive semidefinite cones in the following.

Lemma 5.1. Let L be a linear subspace of Sⁿ that contains I but is not closed under matrix squaring. Then there exists a B ∈ L such that Tr B = 0 and B² ∉ L.

Proof. Select a C ∈ L such that C² ∉ L. Let t := Tr C/n and B := C − tI. Then Tr B = 0 and B² = C² − 2tC + t²I. The latter two terms of this expansion lie in L while C² does not, so B² ∉ L. □

Given a linear subspace L, let π_L denote the orthogonal projection onto L.

Proposition 5.2. Let L be a linear subspace of Sⁿ that contains I but is not closed under matrix squaring. Choose the barrier F(X) = −ln det X for the cone K := Sⁿ₊ ∩ L. (The dual cone of K is K∗ = {S ∈ L : ∀X ∈ K, Tr(SX) ≥ 0} = π_L(Sⁿ₊).) There exist X ∈ int(K) and S ∈ int(K∗) such that, for all W ∈ int(K), either F″(W)[X] ≠ S or F″(W)[−F′∗(S)] ≠ −F′(X).

Proof. For the claim about K∗, note that

K∗ = (Sⁿ₊ ∩ L)∗ ∩ L = (Sⁿ₊ + L⊥) ∩ L = π_L(Sⁿ₊),

where the second equation above uses the facts that L ∩ Sⁿ₊₊ ≠ ∅, (Sⁿ₊)∗ = Sⁿ₊ and L∗ = L⊥.

Assume, for the purpose of contradiction, that there are n and L such that, for every choice of X ∈ int(K) and S ∈ int(K∗), there exists a W ∈ int(K) for which F″(W)[X] = S and F″(W)[−F′∗(S)] = −F′(X). For Z ∈ int(K), we compute (e.g., by using Taylor's formula for F(x + td), for x ∈ int(K), d ∈ L)

F′(Z) = −π_L(Z⁻¹) and F″(Z)[H] = lim_{h→0⁺} −(1/h) π_L((Z + hH)⁻¹ − Z⁻¹) = π_L(Z⁻¹HZ⁻¹).

Select B ∈ L such that Tr B = 0 and B² ∉ L. Let S := X := I + ϵB; we shall choose ϵ > 0 later. (For sufficiently small ϵ, X lies in the interior of K and S in the interior of K∗.) Notice that this choice of S and X implies that S = F″(W)[X] = π_L(W⁻¹XW⁻¹).

Next, we check that W = I is the only solution to S = F″(W)[X]. Suppose W is such that S = F″(W)[X]. Consider the problem

min Tr(Z⁻¹X)
subject to Tr(ZX) ≤ Tr(WX),
Z ∈ L,
Z ≽ 0.

Notice that the objective is strictly convex (this is indeed related to the fact that the long-step Hessian estimation property holds for −ln det(·)) and that the feasible region is nonempty, compact and convex (since X is positive-definite). Thus, this problem has a unique optimal solution. This optimal solution does not lie on the boundary of the positive semidefinite cone, since the objective is +∞ there and finite elsewhere. Thus, we may drop the constraint Z ≽ 0. Z := W/2 is a feasible solution satisfying the positive semidefiniteness constraint and the linear inequality strictly, so Slater's condition holds. Notice that the gradient of the objective is −Z⁻¹XZ⁻¹. By the Karush–Kuhn–Tucker theorem applied to the problem without the explicit constraint Z ≽ 0, every point Z for which Tr(ZX) = Tr(WX) and there exist γ ≥ 0 and l⊥ ∈ L⊥ such that Z⁻¹XZ⁻¹ − γX + l⊥ = 0 is optimal. Note that Z = W is such a solution with γ = 1. Note that Z := (Tr(WX)/Tr X) I is also such a solution with γ = [Tr(X)/Tr(WX)]². Thus, W must be a scalar multiple of I; since π_L(W⁻¹XW⁻¹) = X, that scalar must be 1.

Define X̃ := −F′∗(S). Then we must have F″(W)[X̃] = −F′(X); X̃ = π_L(X⁻¹) = X⁻¹ − P for some P ∈ L⊥. Observe that, for sufficiently small ϵ,

X⁻¹ = I − ϵB + ϵ²B² + O(ϵ³).

Both I and B lie in L, so P = π_{L⊥}(ϵ²B²) + O(ϵ³). We compute, for sufficiently small ϵ,

n = Tr S = Tr[−F′(X̃)] = Tr[(X⁻¹ − P)⁻¹]
  = Tr[X + XPX + XPXPX + O(ϵ⁶)]
  = Tr[I + ϵB + P + ϵBP + ϵPB + ϵ²BPB + P² + O(ϵ⁵)]
  = n + 0 + 0 + 0 + 0 + ϵ² Tr(B²P) + Tr(P²) + O(ϵ⁵).

For the last equation above, we used the fact that Tr(P) = 0 (since P ∈ L⊥ and I ∈ L). Noting that P = π_{L⊥}(ϵ²B²) + O(ϵ³), we see that ϵ² Tr(B²P) = Tr(P²), which is the squared Frobenius norm of P. By our choice of B, this is positive. Thus, for sufficiently small ϵ,

Tr X = Tr S = Tr[−F′(X̃)] > Tr X,

a contradiction. □

There is a five-dimensional semidefinite cone slice where a T₁ scaling defined by a single point does not exist for all x and s, namely the cone of symmetric, positive semidefinite matrices with the sparsity pattern

[ ∗ ∗ ∗ ]
[ ∗ ∗ 0 ]
[ ∗ 0 ∗ ].

This cone (described above as a slice of the 3-by-3 positive semidefinite cone) is also known as the Vinberg cone. (Here we are working with its SDP representation.) It is the smallest-dimensional homogeneous cone that is not self-dual, whence not a symmetric cone. This particular subspace L contains I, and it is not closed under matrix squaring.

Let us note that if L is a linear subspace of Sⁿ that contains I and is closed under matrix squaring, then

∀U, V ∈ L,  UV + VU = (U + V)² − U² − V² ∈ L.

Moreover, since for every pair U, V ∈ L the equation

U(U²V + VU²) + (U²V + VU²)U = U²(UV + VU) + (UV + VU)U²

is self-evident, we have a Euclidean Jordan algebra over L. Hence, (Sⁿ₊ ∩ L) is an SDP representation of a symmetric cone and indeed the function −ln det(·) : int(K) → ℝ is a self-scaled barrier for (Sⁿ₊ ∩ L). Therefore, Proposition 5.2 proves that among all SDP-representable cones (as slices of Sⁿ₊), symmetric cones are the only ones for which

{F″∗(w) : w ∈ int(K∗)} ∩ T²₁(x, s) ≠ ∅, ∀(x, s) ∈ int(K) ⊕ int(K∗),

where F(X) := −ln det(X) (over the representing cone Sⁿ₊).
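For the Vinberg cone, the premise of Lemma 5.1 is easy to exhibit concretely; in the sketch below (ours, not from the paper), B ∈ L has zero trace while B² ∉ L:

```python
# Sketch: the 5-dimensional subspace L of S^3 behind the Vinberg cone contains I
# but is not closed under matrix squaring (premise of Lemma 5.1).
import numpy as np

# L: symmetric matrices with sparsity pattern [[*,*,*],[*,*,0],[*,0,*]].
def in_L(M):
    return np.isclose(M[1, 2], 0) and np.isclose(M[2, 1], 0)

I = np.eye(3)
assert in_L(I)

B = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])           # B in L with trace zero
print(in_L(B), np.isclose(np.trace(B), 0))   # True True
print(in_L(B @ B))                     # False: B^2 has a nonzero (2,3) entry
```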
The proof technique of Proposition 5.2 is more generally applicable. As a result, a more general and sharper characterization of the underlying behaviour is possible. This will be addressed, in detail, in future work.

6. The norm of the low-rank updates near the central path

We know that if we can compute T² ∈ T²₂(η; x, s) efficiently, while ensuring ξ∗ = O(1) and η = O(1) for all iterates, then we have one of the most important ingredients of a general family of primal-dual interior-point algorithms with iteration complexity O(√ϑ ln(1/ϵ)). Up to this point, we have seen many ways of constructing T² ∈ T²₁(x, s). However, we also discovered that we cannot expect to have a z ∈ int(K∗) such that F″∗(z) ∈ T²₁(x, s), in general.

We will later present and analyse an algorithm for convex programming based on Mizuno, Todd and Ye's predictor-corrector approach [15]. We will first assume explicit access to oracles computing a ϑ-self-concordant primal barrier F for the primal cone and the conjugate barrier F∗ for the dual cone. We will argue later that the algorithms can be modified to work without an explicit F∗ oracle.

These oracles may be expensive; it may be unreasonable (or simply unnecessary) to try to compute the dual integral scaling directly to high precision at every iteration. We thus consider computing an approximation H to T²_D and then using a low-rank update to H to get a scaling T²_H ∈ T²₁(x, s). Specifically, we approximate the operator integral by evaluating it at the midpoint of the line segment joining s and µs̃, corresponding to the two extremes of the set of symmetric bilinear positive-definite forms {F″∗(s − tδ_D) : t ∈ [0, 1]}. Then, we use the low-rank updates of Section 4 to restore membership in T²₁(x, s). Define

š := (s + µs̃)/2 and H := µF″∗(š),

and take H₁ and T²_H := H₂ as in (7).
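The construction just described is summarized in the following sketch (ours, not from the paper), for K = Sⁿ₊: evaluate H = µF″∗(š) at the midpoint š and then apply the updates (7); the resulting T²_H maps s to x and s̃ to x̃, i.e., it lies in T²₁(x, s):

```python
# Sketch of the Section 6 construction for the PSD cone: midpoint Hessian
# followed by the two rank-two repairs of (7).
import numpy as np

rng = np.random.default_rng(10)
n = 3
def rand_pd():
    M = rng.standard_normal((n, n)); return M @ M.T + n * np.eye(n)

X, S = rand_pd(), rand_pd()
vartheta = n
mu = np.trace(S @ X) / vartheta
Xt, St = np.linalg.inv(S), np.linalg.inv(X)    # shadow iterates x~, s~
vec = lambda M: M.reshape(-1)
x, s, xt, st = vec(X), vec(S), vec(Xt), vec(St)
dP, dD = x - mu * xt, s - mu * st

Sc = (S + mu * St) / 2.0                       # midpoint s_check
Sci = np.linalg.inv(Sc)
H = mu * np.kron(Sci, Sci)                     # H = mu * F*''(s_check)

H1 = H + np.outer(x, x) / (s @ x) - np.outer(H @ s, H @ s) / (s @ H @ s)
TH = H1 + (np.outer(dP, dP) / (dD @ dP)
           - np.outer(H1 @ dD, H1 @ dD) / (dD @ H1 @ dD))

print(np.allclose(TH @ s, x), np.allclose(TH @ st, xt))   # membership in T^2_1
```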
Our analysis hinges on the resulting scaling (the local metric T²_H) being close to µF″∗(s) and [µF″(x)]⁻¹ (in the sense of Definition 2.13) in every iteration of the algorithm. We therefore devote the remainder of this section to computing bounds, dependent on the error in approximating T²_D by H, on the additional error introduced by the low-rank updates.

We will prove in this section that for every pair of interior points (x, s) ∈ int(K) ⊕ int(K∗) for which x is close to µx̃, and hence s is close to µs̃ (in the sense that, e.g., ||δ_D||_s < 1/64), ξ∗ is O(1) (in fact, less than 4/3), and T²_H is a 4/3-approximate solution to the SDP defining T²₂(1; x, s). We made a particular choice of 1/64 for the neighbourhood parameter; indeed, a

continuous parametrization of the following analysis with respect to the neighbourhood parameter is possible (and implicit). More importantly, we chose a measure of proximity in the dual space for the analysis (rather than, for example, a primal-dual symmetric measure of proximity). For the order of the iteration bounds this does not seem to make a difference; moreover, we wanted to state at least some of the analysis in a way that would be useful in implementations of the algorithms in this framework that break the symmetry in favor of the dual problem.

For x ∈ int(K), define ||h||_{K,x} to be the smallest positive λ such that x + h/λ and x − h/λ both lie in K. Then

||h||_{K,x} = max{σ_x(h), σ_x(−h)}.

One can check that ||·||_{K,x} is a norm. Let ||h||*_{K,x} be its dual norm. If H : E∗ ⊕ E∗ → ℝ is a symmetric bilinear form, define

||H||_{K,x} := max_{||u||*_{K,x} ≤ 1} max_{||l||*_{K,x} ≤ 1} H[u, l];

this is the operator norm induced by the norm ||·||_{K,x}. Good bounds on the difference between two vectors in ||·||_{K,x} imply some very useful conic inequalities:

Proposition 6.1. If ||u − l||_{K,l} ≤ β < 1, then there are x and y in K such that

u = (1 − β)l + x = (1 + β)l − y.

Proof. Let s ∈ K∗. Then 〈s, l〉 ≥ 0 and 〈s, l ± (u − l)/β〉 ≥ 0. Thus 〈s, (β − 1)l + u〉 ≥ 0 and 〈s, (1 + β)l − u〉 ≥ 0. This is true for every s ∈ K∗, so (β − 1)l + u and (1 + β)l − u both lie in K, as desired. □

The operator norm still behaves as expected on rank-one matrices:

Proposition 6.2. Let u, l ∈ E. Then the operator norm of the bilinear form u ⊗ l is

||u ⊗ l||_{K,x} = ||u||_{K,x} ||l||_{K,x}.

Proof. We use the definitions to deduce

||u ⊗ l||_{K,x} = max_{||y||*_{K,x} ≤ 1} max_{||z||*_{K,x} ≤ 1} 〈u, y〉〈l, z〉 = ( max_{||y||*_{K,x} ≤ 1} 〈u, y〉 )( max_{||z||*_{K,x} ≤ 1} 〈l, z〉 ) = ||u||_{K,x} ||l||_{K,x}. □

The difference-of-squares bound also still holds:

Proposition 6.3. Let u, l ∈ E. Then

||(u ⊗ u) − (l ⊗ l)||_{K,x} ≤ ||u − l||_{K,x} ||u + l||_{K,x}.

Proof. We apply the definitions: (u ⊗ u) − (l ⊗ l) = 1 2 ([(u − l) ⊗ (u + l)] + [(u + l) ⊗ (u − l)]). Now, we apply the triangle inequality and the previous proposition. D We can control the ||·||
K,x
norm of the quasi-Newton updates from Section 4 in terms of the ||·||
K,x
norm of the correction to be made. The following theorem makes special use of the fact that s lies in K∗ and x in K to get a strong bound: Theorem 6.4. Let x �� int(K) and s �� int(K∗). Let H : E∗ ⊕ E∗ �� R be a symmetric bilinear positive-definite form. Suppose ||Hs − x||
K,x
�� ϵ1 < 1. Then, ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ Hs ⊗ Hs 〈s, Hs〉 − x ⊗ x 〈s, x〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣K,x �� ϵ1 3 + ϵ1 1 − ϵ1 . Proof. By the triangle inequality, (9) ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ Hs ⊗ Hs 〈s, Hs〉 − x ⊗ x 〈s, x〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣K,x �� ∣ ∣ ∣ ∣ 〈s, x − Hs〉 〈s, x〉〈s, Hs〉 ∣ ∣ ∣ ∣||Hs ⊗ Hs||
K,x
+ 1 〈s, x〉 ||(x ⊗ x) − (Hs ⊗ Hs)||
K,x
. We use Proposition 6.1 to write Hs = (1 − ϵ1)x + y = (1 + ϵ1)x − z for y and z in K. Then, 〈s, x − Hs〉 = 〈s, ϵ1x − y〉 �� ϵ1 〈s, x〉 and 〈s, x − Hs〉 = 〈s, z − ϵ1x〉��−ϵ1 〈s, x〉. Thus, |〈s, x − Hs〉| �� ϵ1 〈s, x〉. Hence, ∣ ∣ ∣ ∣ 〈s, x − Hs〉 〈s, x〉〈s, Hs〉 ∣ ∣ ∣ ∣ �� ∣ ∣ ∣ ∣ ϵ1 〈s, x〉 〈s, x〉(〈s, x〉 + 〈s, Hs − x〉) ∣ ∣ ∣ ∣ �� ϵ1 (1 − ϵ1)〈s, x〉 . Since ||Hs||
K,x
�� ||x||
K,x
+||Hs − x||
K,x
�� 1+ϵ1, it follows that the first term on the right-hand side of (9) is bounded above by ϵ1 ||Hs||2
K,x
(1 − ϵ1)〈s, x〉 �� ϵ1(1 + ϵ1)2 (1 − ϵ1)〈s, x〉 . The second term on the right-hand side of (9) can be bounded above by (2 + ϵ1)ϵ1 〈s, x〉 . This implies the desired bound. D The following theorem applies more generally, but the bound is weaker: Theorem 6.5. Let x �� int(K). Further let ��P �� E and ��D �� E∗ and H and T2 : E∗ ⊕ E∗ �� R be a pair of symmetric, positive-definite bilinear forms such that

Page 26
26 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
• 〈��D,��P 〉 > 0; • T2��D = ��P ; • there exists �� < 1 such that, for every z �� E∗, (1 − ��)2T2[z,z] �� H[z,z] �� T2[z,z] (1 − ��)2 ; • there exists ϵ2 < 1 such that ||H��D − ��P ||
K,x
�� ϵ2 ||H��D||
K,x
. Then, ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ��P ⊗ ��P 〈��D,��P 〉 − H��D ⊗ H��D 〈��D,H��D〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣K,x �� (��(2 + ��) 1 − 4�� + ϵ2(2 + ϵ2) ) ||H��D||2
K,x
〈��D,��P 〉 . Proof. Note that ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ��P ⊗ ��P 〈��D,��P 〉 − H��D ⊗ H��D 〈��D,H��D〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣K,x �� ∣ ∣ ∣ ∣ 〈��D,H��D − ��P 〉 〈��D,��P 〉〈��D,H��D〉 ∣ ∣ ∣ ∣||H��D ⊗ H��D||
K,x
+ 1 〈��D,��P 〉 ||(��P ⊗ ��P ) − (H��D ⊗ H��D)|| �� ∣ ∣ ∣ ∣ 〈��D,H��D − ��P 〉 〈��D,H��D〉 ∣ ∣ ∣ ∣ ||H��D||2
K,x
〈��D,��P 〉 + ϵ2(2 + ϵ2)||H��D||2
K,x
〈��D,��P 〉 . We write 〈��D,H��D − ��P 〉 = 〈��D,(H − T2)��D〉. Then, ((1 − ��)2 − 1)〈��D,T2��D〉 �� 〈��D,(H − T2)��D〉 �� ( 1 (1 − ��)2 − 1 ) 〈��D,T2��D〉. Thus, |〈��D,H��D − ��P 〉| �� 2�� + ��2 (1 − ��)2 〈��D,��P 〉. Writing 〈��D,H��D〉 = 〈��D,��P 〉 + 〈��D,H��D − ��P 〉, it follows that ∣ ∣ ∣
〈��D,H��D−��P 〉 〈��D,H��D〉
∣ ∣ ∣ ��
2��+��2 1−4��
. Hence, ∣ ∣ ∣ ∣ 〈��D,H��D − ��P 〉 〈��D,��P 〉〈��D,H��D〉 ∣ ∣ ∣ ∣||H��D ⊗ H��D||
K,x
�� 2�� + ��2 1 − 4�� ||H��D||2
K,x
〈��D,��P 〉 . The desired result follows. D Now that we have seen a high-level analysis on bounding the norms of low rank updates in a neighbourhood of the central path, we see that providing upper bounds on ||Hs − x||
∗ s
, ||H��D − ��P ||
∗ s
, and lower bounds on 〈��D,��P 〉, 〈��D,H��D〉, and 〈��D,H1��D〉 will be useful in the complexity analysis as well as in designing robust implementations of the algorithms. For the rest of the analysis, we will increase the use of explicit absolute constants, for the sake of concreteness. We either write these constants as ratios of two integers or as decimals which

Page 27
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 27
represent rationals with denominator: 106 (whence, we are able to express the latter constants exactly as decimals with six digits after the point). Theorem 6.6. Assume ||��D||
s
�� 1/64, and take H := µF∗ (š) and H1 := H + x ⊗ x 〈s, x〉 − Hs ⊗ Hs 〈s, Hs〉 . Then, ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ��P ⊗ ��P 〈��D,��P 〉 − H1��D ⊗ H1��D 〈��D,H1��D〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣
∗ s
�� 3.502346µ||��D||s. Proof. We write ��P ⊗ ��P 〈��D,��P 〉 − H1��D ⊗ H1��D 〈��D,H1��D〉 = 1 〈��D,��P 〉 [(��P ⊗ ��P ) − (H1��D ⊗ H1��D)] + ( 1 〈��D,H1��D〉 − 1 〈��D,��P 〉 ) H1��D ⊗ H1��D. Notice that ∣ ∣ ∣ ∣ 1 〈��D,H1��D〉 − 1 〈��D,��P 〉 ∣ ∣ ∣ ∣ = ∣ ∣ ∣ ∣ 〈��D,H1��D − ��P 〉 〈��D,H1��D〉〈��D,��P 〉 ∣ ∣ ∣ ∣ �� ||��D||s||H1��D − ��P ||∗
s
|〈��D,H1��D〉〈��D,��P 〉| �� (1065087000000/907152721283) 1 µ||��D||s . Further, recall that ||H1��D||∗
s �� 1.024263µ||��D||s. Thus, the second term��s norm is bounded
above by 1.231765µ||��D||s. Using the lower bound on 〈��D,��P 〉 and the upper bounds on ||H1��D − ��P ||∗
s and ||H1��D + ��P ||∗ s, we get a bound on the first term��s norm of
(1.065087) · (2.048265) 0.960803 µ||��D||s �� 2.270581µ||��D||s. Adding the bounds on the two terms together gives the advertised bound. D Theorem 6.7. Assume ||��D||
s
�� 1/64, and take H := µF∗ (š), H1 := H + x ⊗ x 〈s, x〉 − Hs ⊗ Hs 〈s, Hs〉 ,and T2 := H1 + ��P ⊗ ��P 〈��D,��P 〉 − H1��D ⊗ H1��D 〈��D,H1��D〉 . Then, ||T2 − H||
∗ s
�� 9.859785µ||��D||
s
�� 0.154060µ. Proof. We consider the two rank-two updates separately; Theorem B.3 controls the size of the first rank-two update and Theorem 6.6 controls the size of the second update. We simply add the two bounds together. D

Page 28
28 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
Theorem 6.8. If ||��D||s �� 1/64, then there exists a T2 �� S++(E∗,E∗) satisfying the following properties: • T2s = x; • T2˜s = ˜x; • 0.8303376µF∗ (s) ≼ T2 ≼ 1.169871µF∗ (s); • 0.825446
µ
[F (x)]
−1
≼ T2 ≼ 1.174800
µ
[F (x)]
−1
. That is, T2 �� T 2
2 (1.211468;x, s).
Proof. We use Theorem 6.7 which gives a T2 �� T 2
1 (x, s) such that ||T2 − µF∗ (š)|| �� 0.154060µ.
Now, we use the Dikin ellipsoid bound, the fact that
1
(1−||��D||s)2 �� 1.031998 and Theorem 2.5 to obtain the last two operator relations. D Note that in the above analysis, we did not utilize the additional flexibility provided by the term (µ˜µ − 1). This establishes, in the language of [47], that ��∗ is O(1) within a particular neighbourhood of the central path. Moreover, our specific choice TH is in T2(��;x, s) for �� = O(1), for every pair (x, s) that is in the same neighbourhood. Therefore, Theorem 5.1 of [47] implies that a wide range of potential reduction algorithms (whose iterates are restricted in a neighbourhood of the central path) have the iteration com- plexity of O (�� ϑln (1/ϵ) ) . 6.1. Bounds in v-space. In this subsection and the next section, we assume that some suitable bases (and the underlying inner product) have been chosen for the underlying spaces (and we identify both E and E∗ with Rn) and for the sake of concreteness, we write A for the underlying matrix representation of A etc. The following lemma is useful to the convergence analysis in the next section. Lemma 6.9. Suppose x �� int(K) and s �� int(K∗) are such that ||��D||s < 1/64. Take T2 as in Theorem 6.8 and take T �� Sn
++ to be its unique symmetric positive-definite square root. Let
v := Ts = T−1x and ��v := T��D. Let z be an arbitrary vector in v-space. Let x �� int(K) and s �� int(K∗); define ��D := s + µF (x ) and ��v := T��D. Then, (1) ||Tz|| �� 1.081606 �� µ||z||
s
; (2) ||z||
s
�� 1.097395||Tz||/ �� µ; (3) ||��v|| �� 0.016901 �� µ; (4) ||��D||s ��
||��D||s 1−||s−s ||s
; (5) if ||��v|| �� 0.003703 �� µ and ||s − s ||s �� 1/25, then ||��D||s �� 0.004234; (6) if ||��v|| �� 0.010519 �� µ and ||s − s ||s �� 1/25, then ||��D||s �� 1/64.

Page 29
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 29
Proof. Recall from Theorem 6.8 that (10) 0.830376µ||z||2
s �� ||Tz||2 �� 1.1169871µ||z||2 s.
(1) This is the square root of ||Tz||
2 �� 1.169871µ||z||2 s
with a constant rounded up. (2) This is the square root of µ||z||
2 s
��
1 0.830376
||Tz||2 with a constant rounded up. (3) Take z = ��D in part (1) and then use ||��D||s �� 1/64; we see ||��v|| < 1.081606 �� µ||��D||s �� 0.016901 �� µ, as desired. (4) This is the Dikin ellipsoid bound for comparing the s -norm with the s-norm. (5) If ||��v|| �� 0.003703 �� µ, then by part (2) ||��D||s �� 0.004064. By part (4), then, ||��D||s �� 0.004234, as desired. (6) If ||��v|| �� 0.010519 �� µ, then by part (2) ||��D||s �� 0.011544. By part (4), then, ||��D||s �� 0.012025, which implies the desired result. D 7. Algorithms In this section, we complete our proof of O( �� ϑln 1
ϵ
) iteration complexity bounds on variants of the following feasible-start primal-dual interior point algorithm with different choices of �� and ��: Take k := 0 and (x(0),y(0),s(0)) to be feasible and central (we can also accept approximately central points). while (s(k)) x(k) > ϵϑ do Let µk := (s(k)) x(k)/ϑ. Take T2 as in Theorem 6.7. Select �� �� [0,1]. Solve   0 A I A 0 0 I 0 T2     dx dy ds   =   0 0 −x(k) − ��µkF∗(s)  . Select ��k �� [0,��). (x(k+1),y(k+1),s(k+1)) �� (x(k),y(k),s(k)) + ��k(dx,dy,ds). k �� k + 1. end while Lemma 7.1. Let rv := −v + ��µw. Then, the system of equations in Line 3 of the above algorithm imply T−1dx = ��ker(AT)rv, Tds = ��
im((AT) )rv.

Page 30
30 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
In particular, ||T−1dx|| �� ||rv|| and ||Tds|| �� ||rv||. Proof. The third equation ensures that T−1dx + Tds = rv. The first two equations imply that T−1dx must lie in ker(AT) while Tds must lie in im((AT) .) Since these two linear spaces are orthogonal, the result follows. D The following result does not hint at quadratic convergence. However, a tighter analysis of the low-rank updates showing that the approximation error is linear in ||��D||s within the 1
64
- neighbourhood would suffice to establish quadratic convergence. This is not hard to do, since the ingredients are already given above. We do not do this here, because quadratic convergence of centering is not needed to establish the desired complexity result. Lemma 7.2. Suppose x(k) �� int(K) and s(k) �� int(K∗) define a feasible solution. If ��k = 1 and ��k = 1 and ||��k
D||s(k) �� 1 64
, then • Ax(k+1) = b and A y(k+1) + s(k+1) = c. • x(k+1) �� int(K) and s(k+1) �� int(K∗). • ||��
(k+1) D
||s(k+1) �� 0.004234. • µk+1 = µk. Proof. The system of linear equations that determine dx, dy, and ds guarantee that dx �� kerA and ds = −A dy; since Ax(k) = b and A y(k) + s(k) = c, it follows that Ax(k+1) = b and A y(k+1) +s(k+1) = c. We drop the superscript k when speaking of the kth iterate in this proof. Notice that, with this choice of ��, T−1dx + Tds = −��v. Since ||dx||x �� �� 1.174800 �� µ ||T−1dx|| �� �� 1.174800 �� µ ||��v|| �� 0.018319 < 1, strict primal feasibility is retained (for the last inequality above, we used Lemma 6.9 part (3)). A similar argument shows that strict dual feasibility is retained. For the first equation below, we use the identity F (x + dx) = F (x) + ��
1 0
F (x + ��dx)dxd�� and obtain ||T(s + ds + µF (x + dx))|| = ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣T [ ��D + ds + µ (�� 1
0
F (x + ��dx)d�� ) dx ]∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ �� ||��v + Tds + T−1dx|| + ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣T [ T−2 − µ (�� 1
0
F (x + ��dx)d�� )] dx ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣.

Page 31
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 31
The first term is, of course, zero. However, notice that, by the Dikin ellipsoid bound and Theorem 6.8 µ (�� 1
0
F (x + ��dx)d�� ) ≼ 1.219055T−2 and, similarly, µ (�� 1
0
F (x + ��dx)d�� ) ≽ 0.795480T−2 We therefore bound, using Lemma 6.9 part (3), ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣T [ T−2 − µ (�� 1
0
F (x + ��dx)d�� )] dx ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ �� 0.219055∣∣∣∣T−1dx ∣ ∣ ∣ ∣ = 0.219055||��v|| < 0.003703 �� µ, which implies, by Lemma 6.9 part (5), the advertised bound on the new ||��D||s. As we observed in Section 2, µ is unchanged by a centering iteration. D Lemma 7.3. Suppose ||��
(k) D ||s(k) �� 0.004234, let ��k := 0 and ��k := 0.005 �� ϑ
. Then, • Ax(k+1) = b and A y(k+1) + s(k+1) = c; • x(k+1) �� int(K) and s(k+1) �� int(K∗); • µk+1 �� (1 − ��k)µk; • ||��
(k+1) D
||s(k+1) �� 1
64
. Proof. We recall that ||��D||s �� 0.004234 means that, by Lemma 6.9 part (1), ||��v|| �� 0.004580 �� µ. Notice that by Theorem ??, we have ||dx||x �� ��1.174800 µk ||T−1dx|| �� ��1.174800 µk ||v|| �� 1.0839 �� ϑ. Consequently, any step with �� < 1
2 �� ϑ
retains strict primal feasibility. A similar analysis (due to the primal-dual symmetry of our set-up) reveals that ||ds||s �� 1.097393 �� ϑ and hence the dual step retains strict dual feasibility for �� similarly bounded. Notice that ||��kds||s �� 1/25; this permits us to use Lemma 6.9 part (6) later. As we observed in Section 2, 〈s(��),x(��)〉 = (1 − ��)ϑµ. This establishes the desired reduction in µ.

Page 32
32 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
We compute ∣ ∣ ∣ ∣ ∣ ∣T��
(k+1) D
∣ ∣ ∣ ∣ ∣ ∣ = ||T [s + ��ds + µF (x + ��dx)]|| = ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣T [ s + ��ds + µF (x) + ��µ (�� 1
0
F (x + �Ӧ�dx)d�� ) dx ]∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ �� ||��v|| + �� ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣Tds + µT (�� 1
0
F (x + �Ӧ�dx)d�� ) dx ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣, where for the second equation above, we used the identity F (x + ��dx) = F (x) + �� (�� 1
0
F (x + �Ӧ�dx)d�� ) dx. Let us write E := µ (�� 1
0
F (x + �Ӧ�dx)d�� ) − T−2. Then, by Theorem 6.8, and the Dikin ellipsoid bound, we obtain −0.183477T−2 ≼ E ≼ 0.1876638T−2. We thus get an upper bound of ∣ ∣ ∣ ∣ ∣ ∣T��
(k+1) D
∣ ∣ ∣ ∣ ∣ ∣ �� ||��v|| + ��k ∣ ∣ ∣ ∣Tds + T (E + T−2)dx ∣ ∣ ∣ ∣ �� ||��v|| + ��k ∣ ∣ ∣ ∣Tds + T−1dx ∣ ∣ ∣ ∣ + ��k ||TEdx|| �� ||��v|| + ��k ||v|| + ��k ||TET||∣∣∣∣T−1dx ∣ ∣ ∣ ∣ �� ||��v|| + ��k (1 + ||TET||)||v|| �� 0.004580 �� µ + (0.005) · (1.187638) �� µ < 0.010519 �� µ. The norm in the expression ||TET|| above is the spectral (operator) 2-norm. This implies, by Lemma 6.9 part (6), that ∣ ∣ ∣ ∣ ∣ ∣��
(k+1) D
∣ ∣ ∣ ∣ ∣ ∣s(k+1) �� 1
64
, as desired. D We immediately have Corollary 7.4. Starting from an initial, primal-dual pair of feasible and central points, one can alternately apply the predictor and corrector steps outlined from the last two lemmata and recover an algorithm that takes at most 278 �� ϑ iterations to reduce µ by a factor of two. In particular, this gives an O (�� ϑln (1/ϵ) ) bound on the iteration complexity of the algorithm using this choice of ��. Proof. By Lemma 7.3, µ is reduced by a factor of ( 1 − 0.005
�� ϑ
) in each iteration performing a predictor step. By Lemma 7.2 every corrector step (which follows every predictor step) gets us in a neighbourhood of the central path given by ||��D||
s
�� 1/64 without changing µ. Therefore,

Page 33
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 33
to provide an upperbound on the number of predictor steps (to reduce µ by a factor of two), it suffices to solve the inequality ( 1 − 0.005 �� ϑ )k �� 1 2 , for a (small) positive integer k. We obtain (using the fact that ln(1 − ��) �� −��, for all �� < 1), choosing k �� ln(2) 0.005 �� ϑ suffices. Doubling such a feasible k (to account for the corrector steps) yields the claimed bound. D 8. Hyperbolicity cone programming: The hyperbolic barriers special case In this section, we assume that we are given access to a hyperbolic polynomial p and K is a corresponding hyperbolicity cone. As we mentioned earlier, on the one hand, F(x) := −ln(p(x)) is a self-concordant barrier for K with long-step Hessian estimation property. On the other hand, we do not necessarily have explicit and efficient access to F∗ or its derivatives; moreover, F∗ will not have the long-step Hessian estimation property, unless K is a symmetric cone. We will discuss two issues: • How do we evaluate F∗? • Can we compute the primal integral scaling? 8.1. Evaluating the dual barrier. Given an oracle returning F(x), F (x), and F (x) on input x, we describe an algorithm for approximating F��s Legendre-Fenchel conjugate, F∗, and discuss its convergence. loop r �� F (x) + s if ||r||∗
x < ϵ then return x
end if N �� −[F (x)]
−1
r x �� x + N end loop Intuitively, this is steepest descent in the local x-norm (i.e., Newton��s method for solving F (x) + s = 0). This algorithm is locally quadratically convergent, since the dual s-norm is well-approximated by the dual x-norm when −F (x) is ��close to�� s. In particular, one can show that if ||r||∗
x �� 1 4
, then the x-norm of the new residual is at most 16
27
of that of the old. This implies that the dual s-norm of the new residual is at most 64
81
of that of the old residual,

Page 34
34 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
ensuring descent. The dual s-norm is scaled by a factor of at most ||r||x/(1 − ||r||x)4 �� 3.2||r||x in each iteration, establishing local quadratic convergence. Note that replacing s with −F (x), for some sufficiently good approximation x, can degrade the complementarity gap and the measure of centrality ||��P ||x and ruins the equation A y + s = c. Thus one needs to work with an infeasible interior-point method or a self-dual embedding technique in the absence of an exactly-evaluated dual barrier. However, it is straightforward to bound the local s-norm of the increase in residual by ||s + F (x)||s. We can use various integration techniques to evaluate the primal integral scaling as we outline in Appendix C. 9. Conclusions and future work We presented a new primal-dual scaling map based on a line integral for convex programming where only a ϑ-LHSCB is supplied. We derived some properties of this scaling, notably that it points to the richness of potential primal-dual local metrics, enriches the connection of primal- dual interior-point methods to Riemannian geometry. We presented a new analysis of low-rank updates of [47] showing that, if one is close to the central path and one begins with a certain approximation to the integral scaling, the low-rank update has small norm close to the central path. Some steps of this analysis were peculiar to the particular approximation chosen; we leave it to future work to generalise the analysis to something depending more directly on the approximation error. We presented a generalization of the Mizuno-Todd-Ye predictor-corrector scheme that uses the above tools and showed that it matches the current best, worst-case iteration complexity of O( �� ϑln(1/ϵ)), of the special case of symmetric cone programming. We presented an algorithm for computing an approximation to the conjugate barrier given an oracle that computes the primal barrier and discussed some bounds that, within the context of an infeasible-start interior-point method or a self-dual embedding technique (see [51, 33]), do not degrade the worst-case iteration complexity. We presented two techniques based on Gaussian quadrature for evaluating the new primal-dual scaling map for hyperbolic barrier functions; one exact, and one with bounded approximation error. Again, we leave to future work the problem of tying such an approximation error bound to a bound on the magnitude of the necessary low-rank update to the approximation. References
[1] Heinz H. Bauschke, Osman G��ler, Adrian S. Lewis, and Hristo S. Sendov. Hyperbolic polynomials and convex analysis. Canad. J. Math., 53(3):470–488, 2001. [2] Chek Beng Chua. The primal-dual second-order cone approximations algorithm for symmetric cone pro- gramming. Found. Comput. Math., 7(3):271–302, 2007.

Page 35
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 35
[3] Chek Beng Chua. A T-algebraic approach to primal-dual interior-point algorithms. SIAM J. Optim., 20(1):503–523, 2009. [4] John E. Dennis, Jr. and Robert B. Schnabel. Numerical methods for unconstrained optimization and non- linear equations. Prentice Hall Series in Computational Mathematics. Prentice Hall, Inc., Englewood Cliffs, NJ, 1983. [5] Yves Genin, Yvan Hachez, Yu Nesterov, and Paul Van Dooren. Optimization problems over positive pseu- dopolynomial matrices. SIAM Journal on Matrix Analysis and Applications, 25(1):57–79, 2003. [6] Osman G��ler. Hyperbolic polynomials and interior point methods for convex programming. Math. Oper. Res., 22(2):350–377, 1997. [7] Raphael Hauser. The Nesterov-Todd direction and its relation to weighted analytic centers. Found. Comput. Math., 4(1):1–40, 2004. [8] Raphael A. Hauser and Osman G��ler. Self-scaled barrier functions on symmetric cones and their classifi- cation. Found. Comput. Math., 2(2):121–143, 2002. [9] Raphael A. Hauser and Yongdo Lim. Self-scaled barriers for irreducible symmetric cones. SIAM J. Optim., 12(3):715–723, 2002. [10] B. Jansen, C. Roos, and T. Terlaky. A polynomial primal-dual Dikin-type algorithm for linear programming. Math. Oper. Res., 21(2):341–353, 1996. [11] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984. [12] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A unified approach to interior point algorithms for linear complementarity problems, volume 538 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1991. [13] Masakazu Kojima, Shinji Mizuno, and Akiko Yoshise. An O( �� nL) iteration potential reduction algorithm for linear complementarity problems. Math. Programming, 50(3, (Ser. A)):331–342, 1991. [14] Yongdo Lim. Maximum-volume symmetric gauge ball problem on the convex cone of positive definite matrices and convexity of optimal sets. SIAM J. Optim., 21(4):1275–1288, 2011. [15] Shinji Mizuno, Michael J. Todd, and Yinyu Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Math. Oper. Res., 18(4):964–981, 1993. [16] Lajos Moln��r. Maps preserving the geometric mean of positive operators. Proc. Amer. Math. Soc., 137(5):1763–1770, 2009. [17] Renato D. C. Monteiro, Ilan Adler, and Mauricio G. C. Resende. A polynomial-time primal-dual affine scaling algorithm for linear and convex quadratic programming and its power series extension. Math. Oper. Res., 15(2):191–214, 1990. [18] Tor Myklebust and Levent Tunçel. Hyperbolic cone programming: Structure and interior-point algorithms. Talk presented at the SIAM Applied Algebraic Geometry Conference, Fort Collins, CO, USA, 2013. [19] Tor Gunnar Josefsson Myklebust. On primal-dual interior-point algorithms for convex optimisation. PhD thesis, Department of Combinatorics and Optimization, University of Waterloo, 2015. [20] Arkadi Nemirovski and Levent Tunçel. ��Cone-free�� primal-dual path-following and potential-reduction polynomial time interior-point methods. Math. Program., 102(2, Ser. A):261–294, 2005. [21] Yu. Nesterov. Long-step strategies in interior-point primal-dual methods. Math. Programming, 76(1, Ser. B):47–94, 1997. Interior point methods in theory and practice (Iowa City, IA, 1994). [22] Yurii Nesterov. Squared functional systems and optimization problems. In High performance optimization, pages 405–440. Springer, 2000. [23] Yurii Nesterov. Introductory lectures on convex optimization, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004. 
A basic course. [24] Yurii Nesterov. Parabolic target space and primal-dual interior-point methods. Discrete Appl. Math., 156(11):2079–2100, 2008. [25] Yurii Nesterov. Towards non-symmetric conic optimization. Optim. Methods Softw., 27(4-5):893–917, 2012.

Page 36
36 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
[26] Yurii Nesterov and Arkadi Nemirovski. Multi-parameter surfaces of analytic centers and long-step surface- following interior point methods. Math. Oper. Res., 23(1):1–38, 1998. [27] Yurii Nesterov and Arkadi Nemirovski. Primal central paths and Riemannian distances for convex sets. Found. Comput. Math., 8(5):533–560, 2008. [28] Yurii Nesterov and Arkadii Nemirovskii. Conic duality and its applications in convex programming. Optim. Methods Softw, 1:95–115, 1992. [29] Yurii Nesterov and Arkadii Nemirovskii. Interior-point polynomial algorithms in convex programming, vol- ume 13 of SIAM Studies in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1994. [30] Yurii Nesterov and Michael J. Todd. Self-scaled barriers and interior-point methods for convex program- ming. Math. Oper. Res., 22(1):1–42, 1997. [31] Yurii Nesterov and Michael J. Todd. Primal-dual interior-point methods for self-scaled cones. SIAM J. Optim., 8(2):324–364, 1998. [32] Yurii Nesterov and Michael J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Found. Comput. Math., 2(4):333–361, 2002. [33] Yurii Nesterov, Michael J. Todd, and Yinyu Ye. Infeasible-start primal-dual methods and infeasibility detectors for nonlinear programming problems. Math. Program., 84:227–267, 1999. [34] Yurii Nesterov and Levent Tunçel. Local superlinear convergence of polynomial-time interior-point methods for hyperbolicity cone optimization problems. SIAM Journal on Optimization, 26(1):139–170, 2016. [35] James Renegar. A polynomial-time algorithm, based on Newton��s method, for linear programming. Math. Programming, 40(1, (Ser. A)):59–93, 1988. [36] James Renegar. A mathematical view of interior-point methods in convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA; Mathematical Programming Society (MPS), Philadelphia, PA, 2001. [37] James Renegar. Hyperbolic programs, and their derivative relaxations. Found. Comput. Math., 6(1):59–79, 2006. [38] James Renegar. Central swaths: a generalization of the central path. Found. Comput. Math., 13(3):405–454, 2013. [39] James Renegar. Primal-dual algorithms for optimization over hyperbolicity cones. Talk presented at the SIAM Applied Algebraic Geometry Conference, Fort Collins, CO, USA, 2013. [40] James Renegar and Mutiara Sondjaja. A polynomial-time affine-scaling method for semidefinite and hy- perbolic programming. arXiv:1410.6734, 2014. [41] R. Tyrrell Rockafellar. Convex analysis. Princeton Mathematical Series, No. 28. Princeton University Press, Princeton, N.J., 1970. [42] Stefan H. Schmieta. Complete classification of self-scaled barrier functions. Tech. Report, Dept. of IEOR, Columbia Univ., NY, USA, 2000. [43] Josef Stoer, Roland Bulirsch, Richard H. Bartels, Walter Gautschi, and Christoph Witzgall. Introduction to numerical analysis. Texts in applied mathematics. Springer, New York, 2002. [44] Jos F. Sturm and Shuzhong Zhang. Symmetric primal-dual path-following algorithms for semidefinite programming. In Proceedings of the Stieltjes Workshop on High Performance Optimization Techniques (HPOPT ��96) (Delft), volume 29, pages 301–315, 1999. [45] Levent Tunçel. Primal-dual symmetry and scale invariance of interior-point algorithms for convex opti- mization. Math. Oper. Res., 23(3):708–718, 1998. [46] Levent Tunçel. Potential reduction and primal-dual methods. In Handbook of semidefinite programming, volume 27 of Internat. Ser. Oper. 
Res. Management Sci., pages 235–265. Kluwer Acad. Publ., Boston, MA, 2000.

Page 37
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 37
[47] Levent Tunçel. Generalization of primal-dual interior-point methods to convex optimization problems in conic form. Found. Comput. Math., 1(3):229–254, 2001. [48] Levent Tunçel. Polyhedral and semidefinite programming methods in combinatorial optimization, volume 27 of Fields Institute Monographs. American Mathematical Society, Providence, RI; Fields Institute for Re- search in Mathematical Sciences, Toronto, ON, 2010. [49] William C. Waterhouse. Linear transformations preserving symmetric rank one matrices. J. Algebra, 125(2):502–518, 1989. [50] Hua Wei. Convergence analysis of generalized primal-dual interior-point algorithms for linear optimization. Department of Combinatorics and Optimization, Faculty of Mathematics, Waterloo, Ontario, Canada, 2002. Thesis (M.Math.)–University of Waterloo. [51] Yinyu Ye, Michael J. Todd, and Shinji Mizuno. An O( �� nL)-iteration homogeneous and self-dual linear programming algorithm. Math. Oper. Res., 19(1):53–67, 1994.
Appendix A. Primal-dual symmetry based on the local metrics In [47] a very general framework for primal-dual symmetric algorithms were provided. In this section, we make the richness for the choice of such algorithms within the framework provided by the sets T0,T1,T2, more explicit. Basically, every consistent choice of a local primal-dual metric T2 from any of the sets T0,T1,T2 can be used to design a primal-dual symmetric interior-point algorithm as we prove below. Proposition A.1. Let (x, s) �� int(K) ⊕ int(K∗). Then, for every pair H, T �� T 2
0 (x, s),
1 2 (H + T), (H−1 + T−1 2 )−1 �� T 2
0 (x, s).
The same property holds for T 2
1 (x, s) and for T 2 2 (��;x, s) (for every �� �� 1).
Proof. Follows from the definitions and convexity of T 2
0 ,T 2 1 ,T 2 2 .
D Corollary A.2. For every convex cone K ⊂ E and for every pair (x, s) �� int(K) ⊕ int(K∗), the sets T 2
0 (x, s), T 2 1 (x, s) and T 2 2 (��;x, s) (for every �� �� 1) are geodesically convex.
Proof. Since T 2
2 (��;x, s) is a closed subset of Sn ++, employing Proposition A.1 above and Lemma
2.3 of [14], we conclude that T 2
2 (��;x, s) is geodesically convex for all ��. For T0 and T1 we can
adapt the proof of the same lemma (even though our sets are not closed, we can argue that all limits of all mean iterations on the elements of T 2
0 and T 2 1 are positive-definite and hence stay
in the corresponding set). D The above corollary indicates a way to convert any consistent choice of T2 to a scaling for a primal-dual symmetric interior-point algorithm in the sense of [45]. (Simply take the operator geometric mean of the consistent choice of T2 with the inverse of the same formula for T2 applied to (P) and (D) switched.)

Page 38
38 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
Appendix B. Details of the complexity and error analysis B.1. Zeroth-order low-rank update. We bound the operator norm of the zeroth-order low- rank update in a small neighbourhood of the central path defined by the condition ||��D||s �� 1
64
. Theorem B.1. Assume ||��D||
s
�� 1/64. Then (1) ||F∗ (s)��D||∗
s = ||��D||s;
(2) ||F∗ (š)��D||
∗ s
�� 1.031998||��D||
s
; (3) for every v �� E∗, |〈v, µF∗ (š)��D − ��P 〉| �� 0.524190µ||v||
s
||��D||2
s
; (4) ||µF∗ (š)��D − ��P ||
∗ s
�� 0.524190µ||��D||2
s
. Proof. (1) This follows straightforwardly from the definitions. (2) Recall that F∗ (¯s) ≼
1 (1−||��D||)2 F∗ (s). Now we apply the previous part. Next, we notice
that
1 (1−||��D||s)2 �� 1.031998.
(3) Let f(t) = 〈u, F∗(š+ t��D)〉. We consider an order-two Taylor expansion of f around zero; we see that, for every t �� [−1/2,1/2], there exists a ¯t�� [min(0,t),max(0,t)] such that f(t) = f(0) + tf (0) + 1 2 t2f (¯t). Notice that |f (¯t)| = |F∗ (š+ ¯t��D)[u, ��D,��D]| �� 2||u||||��D||2, where both norms are the local (š+ ¯t��D) norms (we used self-concordance property of F∗). We then apply the Dikin ellipsoid bound to these norms to relate ||u||š+¯t��D �� 1 1 − ||��D||s ||u||s and similarly for ||��D||. Consequently, using 1/(1 − ||��D||s) �� 1.015874, we see that |f (¯t)| �� 1.048378||u||s||��D||2
s.
Thus, for some ¯t1 �� [−1/2,0] and ¯t2 �� [0,1/2], we have f(1/2) − f(−1/2) = f (0) + 1 8 (f (¯t1) − f (¯t2)). Consequently, |f (0) − f(1/2) + f(−1/2)| �� 0.262094||u||s||��D||2
s.
Notice that, by substitution and the chain rule, • f (0) = 〈u, F∗ (š)��D〉; • f(−1/2) = 〈u, F∗(µ˜s)〉 = −〈u, x/µ〉; • f(1/2) = 〈u, F∗(s)〉 = −〈u, ˜x〉.

Page 39
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 39
The claimed bound now follows. (4) We use the definition of a dual norm: ||F∗ (š)��D − x/µ + ˜x||∗
s
= sup
||u||s=1
〈u, F∗ (š)��D − x/µ + ˜x〉 �� sup
||u||s=1
0.524190||u||s||��D||2
s = 0.524190||��D||2 s.
D Lemma B.2. Assume ||��D||
s
�� 1/64. Then, (1) for every v �� E∗, |〈v, F∗ (š)[��D]〉| �� 1.015811||v||
s
||��D||
s
; (2) for every v �� E∗ and ¯s �� [s, µ˜s], |F∗ (¯s)[v, ��D,µ˜s]| �� 2.096758||v||
s
||��D||
s
�� ϑ; (3) ||Hs − x||
∗ s
�� 2.064190 �� ϑµ||��D||
s
; (4) ||x||
∗ s
�� 1.015874 �� ϑµ; (5) ||Hs||
∗ s
�� 1.015811 �� ϑµ; (6) ||Hs + x||
∗ s
�� 2.031685 �� ϑµ; (7) 〈s, Hs〉 �� 0.984436ϑµ. Proof. (1) We compute, using Cauchy-Schwarz and the Dikin ellipsoid bound, |〈v, F∗ (š)[��D]〉| = |F∗ (š)[��D,v]| �� ||��D||
š
||v||
š
�� 1.015810||��D||
s
||v||
s
. (2) We compute |F∗ (¯s)[v, ��D,µ˜s]| �� 2||v||
¯s
||��D||
¯s
||µ˜s||
¯s
�� 2.096758||v||
s
||��D||
s
||µ˜s||
µ˜s
. (3) We write Hs = µF∗ (š)(µ˜s) + µF∗ (š)��D. On the first term, we perform a Taylor expansion around µ˜s; for every v there is a ¯s on the line segment between s and µ˜s such that F∗ (š)[µ˜s, v] = F∗ (s − ��D)[µ˜s, v] + 1 2 F∗ (¯s)[µ˜s, ��D,v] = 〈v, x〉/µ + 1 2 F∗ (¯s)[µ˜s, ��D,v]. �� 〈v, x〉/µ + 1.048379||v||
s
||��D||
s
�� ϑ. We also bound (using the Dikin ellipsoid bound first, followed by Lemma 7.2) 〈v, F∗ (š)��D〉 �� 1.015811||v||s||��D||s.

Page 40
40 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
Adding these bounds and taking a supremum over all v such that ||v||s = 1, since ϑ �� 1, yields the bound ||Hs − x||∗
s �� 2.064190
�� ϑµ||��D||s, as desired. (4) This follows directly from the Dikin ellipsoid bound. (5) Notice that ||Hs||∗
s = µ〈s, F∗ (š)(F∗ (s))−1F∗ (š)s〉 1/2
�� µ 〈s, F∗ (s)s〉
1/2
(1 − ||��D||s/2)2 �� 1.015810 �� ϑµ. (6) We apply the triangle inequality to the last two parts. (7) Note that 〈s, Hs〉 = µF (š)[s, s] �� µ(1 − ||��D||
s
/2)2F (s)[s, s] �� (16129/16384)ϑµ. D Theorem B.3. Assume ||��D||
s
�� 1/64. Then, the zeroth-order low rank update has a small norm: ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ x ⊗ x 〈s, x〉 − Hs ⊗ Hs 〈s, Hs〉 ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣s �� 6.357439µ||��D||
s
. Proof. We write the first low-rank update as (x ⊗ x) − (Hs ⊗ Hs) 〈s, x〉 + ( 1 〈s, x〉 − 1 〈s, Hs〉 ) Hs ⊗ Hs. Then, using the triangle inequality, Lemma 2.11, and the equation ||h⊗h||s = ||h||∗2
s we bound
its norm above by 1 〈s, x〉 ||x − Hs||
∗ s
||x + Hs||
∗ s
+ ∣ ∣ ∣ ∣ 1 〈s, x〉 − 1 〈s, Hs〉 ∣ ∣ ∣ ∣||Hs||
∗2 s
. The first term is bounded above by (83875677203/20000000000)µ||��D||s. To bound the second term, note that ∣ ∣ ∣ ∣ 1 〈s, x〉 − 1 〈s, Hs〉 ∣ ∣ ∣ ∣ = ∣ ∣ ∣ ∣ 〈s, Hs − x〉 ϑµ〈s, Hs〉 ∣ ∣ ∣ ∣ �� ||s||s||Hs − x||∗
s
ϑµ〈s, Hs〉 �� 2.064190 ||��D||s 〈s, Hs〉 . Now, we bound 〈s, Hs〉 below by 0.984436ϑµ to get a bound of (105686528/50403125) ||��D||s ϑµ . The bound ||Hs||∗
s �� 1.015811
�� ϑµ then gives an overall bound on the second term of (2129979833381099/98443603515625000)µ||��D||s Adding fractions gives (something slightly stronger than) the desired bound. D

Page 41
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 41
B.2. First-order low-rank update. Next, we bound the operator norm of the first-order low- rank update in a small neighbourhood of the central path defined by the condition ||��D||s �� 1
64
. Lemma B.4. Assume ||��D||
s
�� 1/64. Let H1 := H + x⊗x
〈s,x〉
− Hs⊗Hs
〈s,Hs〉
. Then, (1) ||H��D − ��P ||
∗ s
�� 0.524190µ||��D||2
s
; (2) 〈��D,H��D〉 �� 0.968994µ||��D||2
s
; (3) 〈��D,��P 〉 �� 0.960803µ||��D||2
s
; (4) ||H��D||
∗ s
�� 1.015811µ||��D||
s
; (5) ||��P ||
∗ s
�� 1.024002µ||��D||
s
; (6) ||H��D + ��P ||
∗ s
�� 2.039813µ||��D||
s
; (7) ||H1��D − H��D||
∗ s
�� 0.540897µ||��D||2
s
; (8) ||H1��D − ��P ||∗
s �� 1.065087µ||��D||2 s;
(9) ||H1��D||∗
s �� 1.024263µ||��D||s;
(10) ||H1��D + ��P ||∗
s �� 2.048265µ||��D||s;
(11) 〈��D,H1��D〉 �� 0.944161µ||��D||2
s.
Proof. (1) This was proven in Theorem B.1, part (4). (2) This follows from the Dikin ellipsoid bound; H ≽ 0.968994µF∗ (s). (3) Notice that 〈��D,��P 〉 = 〈��D,H��D〉 + 〈��D,��P − H��D〉. We bound the second term by Cauchy-Schwarz: 〈��D,��P − H��D〉 �� ||��D||s||H��D − ��P ||∗
s �� 0.524190µ||��D||3 s.
Using this with the bound from the previous part gives the advertised inequality. (4) We compute, using Lemma 7.2 ||H��D||∗
s = µ
〈 ��D,F∗ (š)(F∗ (s))
−1
F∗ (š)��D 〉1/2 �� 1.015811µ〈��D,F∗ (s)��D〉
1/2
= 1.015811µ||��D||s, as advertised. (5) We use the triangle inequality followed by parts (1) and (4): ||��P ||
∗ s
�� ||H��D||
∗ s
+ ||��P − H��D||
∗ s
�� 1.015811µ||��D||
s
+ 0.524190µ||��D||2
s
�� 1.024002µ||��D||
s
. (6) We use the triangle inequality, part (4) and the bound ||��D||s �� 1/64: ||H��D + ��P ||∗
s �� 2||H��D||∗ s + ||H��D − ��P ||∗ s
�� 2.031622µ||��D||s + 0.524190µ||��D||2
s
�� 2.031622µ||��D||s + (52419/6400000)µ||��D||s, which is the claimed bound.

Page 42
42 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
(7) Recall that 〈��D,x〉 = 0, so H��D − H1��D = 〈s, H��D〉 〈s, Hs〉 Hs. Now, we bound using 〈s, ��P 〉 = 0, triangle inequality and part (1): |〈s, H��D〉| = |〈s, ��P 〉 + 〈s, H��D − ��P 〉|�� 0 + ||s||s||H��D − ��P ||∗
s �� 0.524190
�� ϑµ||��D||2
s
and recall (Lemma B.2 part (7)) 〈s, Hs〉 �� 0.984436ϑµ and (Lemma B.2 part (5)) ||Hs||∗
s �� 1.015811
�� ϑµ. Thus, ||H1��D − H��D||∗
s �� 0.540897µ||��D||2 s.
(8) We use the triangle inequality followed by parts (1) and (7). (9) We use the triangle inequality followed by parts (4), (7) and the fact that ||��D||
s
�� 1/64. (10) We use the triangle inequality and parts (5) and (9). (11) We compute, using previous parts of this lemma, 〈��D,H1��D〉 = 〈��D,��P 〉 + 〈��D,H1��D − ��P 〉 �� 0.960803µ||��D||2
s
− ||��D||
s
||H1��D − ��P ||
∗ s
�� 0.960803µ||��D||2
s − 1.065087µ||��D||3 s
�� 0.944161µ||��D||2
s.
D Appendix C. Evaluating the primal integral scaling Given an oracle that can: • Evaluate F(x), F (x), and F (x) for any x �� int(K), and • Compute, in some explicit form, the univariate polynomial t ↦�� exp(−F(x + td)) for any x �� int(K) and d �� Rn, we describe how to compute the primal integral scaling ( µ �� 1
0
F (x − t��P )dt )−1

Page 43
CONVEX OPTIMIZATION VIA PRIMAL-DUAL METRICS 43
exactly. We do not claim that this method is practical or useful; in particular, it requires ϑ evaluations of F . However, we later describe a slightly more practical variant that admits concrete bounds on approximation error. The method is a straightforward application of the theory of Gaussian quadrature. A reader unfamiliar with Gaussian quadrature might consult Section 3.6 of the excellent book by Stoer and Bulirsch [43]. Notice that, with F = −lnp, we have F = p (p ) − pp p2 . The denominator is a polynomial of degree 2ϑ and the numerator is a matrix whose entries are polynomials of degree 2ϑ − 2. Theorem C.1. There exist ϑ points r1,...,rϑ in [0,1] and associated weights w1,...,wϑ such that, if q is a univariate polynomial of degree less than 2ϑ, then �� 1
0
q(t) p2(x − t��P ) dt =
ϑ
��
i=1
wiq(ri). Proof. Notice that 1/p2(x − t��P ) is positive and bounded for t in [0,1]. In particular, it is measurable, all moments exist and are finite, and any nonnegative polynomial q such that �� 1
0
q(t)/p2(x − t��P )dt = 0 is itself zero. Thus a Gaussian quadrature rule with weight function 1/p2(x − t��P ) exists. That is, there exist ϑ points r1,...,rϑ in [0,1] and associated weights w1,...,wϑ such that, for any C2ϑ function f, ∣ ∣ ∣ ∣ ∣ �� 1
0
f(t) p2(x − t��P ) dt −
ϑ
��
i=1
wif(ri) ∣ ∣ ∣ ∣ ∣ �� f(2ϑ)(��) (2ϑ)! C for some 0 �� �� �� 1 and some constant C �� 0 dependent on ϑ. Take f = q; the derivative of order 2ϑ vanishes and hence the difference must be zero. D Indeed, the entries of p (p ) − pp have degree 2ϑ − 2; the primal integral scaling is exactly ( µ ��
i=1
wi(p (ri)(p (ri)) − p(ri)p (ri)) )
−1
. It may be practical to use a well-known Gaussian quadrature rule instead of the one arising from 1/p2. The following theorem considers Gauss-Legendre quadrature of fixed order:

Page 44
44 TOR G. J. MYKLEBUST, LEVENT TUNÇEL
Theorem C.2. If 1 �� k is an integer and ||��P ||x �� 1, then ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ T−2
P
− µ
k
��
i=1
wL
i F (x − rL i ��P )
∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣x �� 1 (1 − ||��P ||x)2 max
t��[0,1]
2||��P ||2k
x−t��P
||F (x − t��P )||
k/2 x
where rL
i are the order-k Gauss-Legendre nodes and wL i are the associated weights.
Proof. Note that ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ T−2
P
− µ
k
��
i=1
wL
i F (x − rL i ��P )
∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣ ∣x = max
||h||��1
∣ ∣ ∣ ∣ ∣ T−2
P [h, h] − µ k
��
i=1
wL
i F (x − rL i ��P )[h, h]
∣ ∣ ∣ ∣ ∣ . Thus let h be the point at which this maximum is attained. By the error bound for Gaussian quadrature ([43], Theorem 3.6.24), (11) ∣ ∣ ∣ ∣ ∣ T−2
P [h, h] − µ k
��
i=1
wL
i F (x − rL i ��P )[h, h]
∣ ∣ ∣ ∣ ∣ �� max
t��[0,1]
∣ ∣ ∣ ∣ F(2k+2)(x − t��P )[��P ,...,��P , h, h]〈pk,pk〉 (2k)! ∣ ∣ ∣ ∣, where pk is the kth Legendre polynomial. A theorem of G��ler ([6], Theorem 4.2), together with the results from Appendix 1 of Nesterov and Nemirovskii��s book [29], shows that (12) ∣ ∣F(2k+2)(x − t��P )[��P ,...,��P , h, h]∣∣ �� (2k + 1)!||��P ||2k
x−t��P
||h||2
x−t��P
. Using self-concordance, we bound ||h||
x−t��P
�� 1 1 − t||��P ||x ||h||
x
= 1 1 − t||��P ||x . The squared L2 norm of the kth Legendre polynomial is 2/(2k + 1). Substituting these and (12) into (11), we get the advertised bound. D

Set Home | Add to Favorites

All Rights Reserved Powered by Free Document Search and Download

Copyright © 2011
This site does not host pdf,doc,ppt,xls,rtf,txt files all document are the property of their respective owners. complaint#nuokui.com
TOP