Abstract and 1. Introduction

  2. Related Works

  3. Convex Relaxation Techniques for Hyperbolic SVMs

    3.1 Preliminaries

    3.2 Original Formulation of the HSVM

    3.3 Semidefinite Formulation

    3.4 Moment-Sum-of-Squares Relaxation

  4. Experiments

    4.1 Synthetic Dataset

    4.2 Real Dataset

  5. Discussions, Acknowledgements, and References

A. Proofs

B. Solution Extraction in Relaxed Formulation

C. On Moment Sum-of-Squares Relaxation Hierarchy

D. Platt Scaling [31]

E. Detailed Experimental Results

F. Robust Hyperbolic Support Vector Machine

5 Discussions

In this paper, we obtained stronger performance on the hyperbolic support vector machine (HSVM) problem by applying semidefinite and sparse moment-sum-of-squares relaxations in place of projected gradient descent (PGD). These relaxations achieve better classification accuracy and F1 scores than the existing PGD approach on both simulated and real datasets. Additionally, we observe small optimality gaps for the moment-sum-of-squares relaxation, which approximately certify global optimality of the moment solutions; a toy illustration of this certification step is sketched below.
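To make the certification step concrete, here is a minimal sketch in Python with cvxpy: Shor's SDP relaxation of a toy nonconvex QCQP (not the paper's HSVM formulation), a rank-one rounding, and the relative optimality gap between the rounded cost and the relaxation's lower bound. The gap formula follows a common convention from the certifiable-optimization literature [41]; all names here are illustrative.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n = 4
C = rng.standard_normal((n, n))
C = (C + C.T) / 2  # random symmetric cost matrix

# Nonconvex QCQP: minimize x^T C x subject to ||x||^2 = 1.
# Lift x x^T -> X >= 0 and drop the rank-one constraint (Shor relaxation).
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                  [cp.trace(X) == 1, X >> 0])
lower_bound = prob.solve()  # valid lower bound on the QCQP optimum

# Round: take the top eigenvector of X and project back to the sphere.
eigvals, eigvecs = np.linalg.eigh(X.value)
x_hat = eigvecs[:, -1]
x_hat /= np.linalg.norm(x_hat)
upper_bound = float(x_hat @ C @ x_hat)  # cost of a feasible point

# Relative optimality gap; a small value certifies near-global optimality.
gap = abs(upper_bound - lower_bound) / (1 + abs(upper_bound) + abs(lower_bound))
print(f"lower={lower_bound:.6f}  upper={upper_bound:.6f}  gap={gap:.2e}")
```

For this toy problem the relaxation happens to be tight, so the printed gap is numerically zero; in the paper's experiments the analogous gaps are small but nonzero.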

Perhaps the most critical drawback of SDP and sparse moment-sum-of-squares relaxations is their limited scalability. Runtime and memory consumption grow quickly with data size, so the problem must be divided into sub-tasks, for example via a one-vs-one training scheme (sketched below), to alleviate the issue. For relatively large datasets, more heuristic approaches to solving the relaxed optimization problems may be needed to achieve runtimes comparable with projected gradient descent. Combining the gradient-descent dynamics with interior-point iterates in a problem-dependent manner could be useful [41].
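As an illustration of that sub-task decomposition, the following sketch shows a generic one-vs-one training loop; `train_binary_relaxation` and the `decision_function` interface are hypothetical stand-ins for a relaxed binary HSVM solver, not the paper's actual API.

```python
from itertools import combinations
import numpy as np

def train_one_vs_one(X, y, train_binary_relaxation):
    """Train one relaxed binary classifier per pair of classes."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        # Relabel the pair as {-1, +1} for the binary sub-problem.
        y_bin = np.where(y[mask] == a, 1, -1)
        models[(a, b)] = train_binary_relaxation(X[mask], y_bin)
    return models

def predict_one_vs_one(X, models):
    """Majority vote over all pairwise classifiers."""
    votes = []
    for (a, b), model in models.items():
        pred = model.decision_function(X)  # hypothetical interface
        votes.append(np.where(pred >= 0, a, b))
    votes = np.stack(votes)  # shape: (num_models, num_samples)
    # Most frequent vote per sample; assumes nonnegative integer labels.
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

Each pairwise sub-problem only involves two classes' data, so the per-solve SDP stays small; the cost is training K(K-1)/2 classifiers for K classes.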

It remains to be seen whether gains in runtime or optimality can be obtained by working through the dual of the problem or by designing feature kernels that map hyperbolic features to another set of hyperbolic features [17]. Nonetheless, we believe that our work introduces a valuable perspective to the geometric machine learning community: applying SDP and moment relaxations.

Acknowledgements

We thank Jacob A. Zavatone-Veth for helpful comments. CP was supported by NSF Award DMS-2134157 and NSF CAREER Award IIS-2239780. CP is further supported by a Sloan Research Fellowship. This work has been made possible in part by a gift from the Chan Zuckerberg Initiative Foundation to establish the Kempner Institute for the Study of Natural and Artificial Intelligence.

References

[1] Valentin Khrulkov, Leyla Mirvakhabova, Evgeniya Ustinova, Ivan Oseledets, and Victor Lempitsky. Hyperbolic image embeddings. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6418–6428, 2020.

[2] Maximillian Nickel and Douwe Kiela. Poincaré embeddings for learning hierarchical representations. Advances in Neural Information Processing Systems, 30, 2017.

[3] Anna Klimovskaia, David Lopez-Paz, Léon Bottou, and Maximilian Nickel. Poincaré maps for analyzing complex hierarchies in single-cell data. Nature Communications, 11(1):2966, 2020.

[4] Hyunghoon Cho, Benjamin DeMeo, Jian Peng, and Bonnie Berger. Large-margin classification in hyperbolic space. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1832–1840. PMLR, 2019.

[5] Eli Chien, Chao Pan, Puoya Tabaghi, and Olgica Milenkovic. Highly scalable and provably accurate classification in Poincaré balls. In 2021 IEEE International Conference on Data Mining (ICDM), pages 61–70. IEEE, 2021.

[6] Gal Mishne, Zhengchao Wan, Yusu Wang, and Sheng Yang. The numerical stability of hyperbolic representation learning. In International Conference on Machine Learning, pages 24925–24949. PMLR, 2023.

[7] Melanie Weber, Manzil Zaheer, Ankit Singh Rawat, Aditya K Menon, and Sanjiv Kumar. Robust large-margin learning in hyperbolic space. Advances in Neural Information Processing Systems, 33:17863–17873, 2020.

[8] Naum Z Shor. Quadratic optimization problems. Soviet Journal of Computer and Systems Sciences, 25:1–11, 1987.

[9] Jiawang Nie. Moment and Polynomial Optimization. SIAM, 2023.

[10] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.

[11] John Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. Technical Report MSR-TR-98-14, Microsoft Research, 1998.

[12] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874, 2008.

[13] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):1–27, 2011.

[14] Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic neural networks. Advances in Neural Information Processing Systems, 31, 2018.

[15] Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. Low-dimensional hyperbolic knowledge graph embeddings. arXiv preprint arXiv:2005.00545, 2020.

[16] Ines Chami, Zhitao Ying, Christopher Ré, and Jure Leskovec. Hyperbolic graph convolutional neural networks. Advances in Neural Information Processing Systems, 32, 2019.

[17] Keegan Lensink, Bas Peters, and Eldad Haber. Fully hyperbolic convolutional neural networks. Research in the Mathematical Sciences, 9(4):60, 2022.

[18] Andrii Skliar and Maurice Weiler. Hyperbolic convolutional neural networks. arXiv preprint arXiv:2308.15639, 2023.

[19] Ryohei Shimizu, Yusuke Mukuta, and Tatsuya Harada. Hyperbolic neural networks++. arXiv preprint arXiv:2006.08210, 2020.

[20] Wei Peng, Tuomas Varanka, Abdelrahman Mostafa, Henglin Shi, and Guoying Zhao. Hyperbolic deep neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(12):10023–10044, 2021.

[21] Grigoriy Blekherman, Pablo A Parrilo, and Rekha R Thomas. Semidefinite Optimization and Convex Algebraic Geometry. SIAM, 2012.

[22] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.

[23] Anders Eriksson, Carl Olsson, Fredrik Kahl, and Tat-Jun Chin. Rotation averaging and strong duality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 127–135, 2018.

[24] David M Rosen, Luca Carlone, Afonso S Bandeira, and John J Leonard. A certifiably correct algorithm for synchronization over the special euclidean group. In Algorithmic Foundations of Robotics XII: Proceedings of the Twelfth Workshop on the Algorithmic Foundations of Robotics, pages 64–79. Springer, 2020.

[25] Lanhui Wang and Amit Singer. Exact and stable recovery of rotations for robust synchronization. Information and Inference: A Journal of the IMA, 2(2):145–193, 2013.

[26] Afonso S Bandeira, Nicolas Boumal, and Amit Singer. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163:145–167, 2017.

[27] Lucas Brynte, Viktor Larsson, José Pedro Iglesias, Carl Olsson, and Fredrik Kahl. On the tightness of semidefinite relaxations for rotation estimation. Journal of Mathematical Imaging and Vision, pages 1–11, 2022.

[28] Richard Zhang. On the tightness of semidefinite relaxations for certifying robustness to adversarial examples. Advances in Neural Information Processing Systems, 33:3808–3820, 2020.

[29] Jean B Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.

[30] Didier Henrion and Jean-Bernard Lasserre. Detecting global optimality and extracting solutions in GloptiPoly. In Positive Polynomials in Control, pages 293–310. Springer, 2005.

[31] John Platt et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3):61–74, 1999.

[32] Raúl E Curto and Lawrence A Fialkow. Truncated k-moment problems in several variables. Journal of Operator Theory, pages 189–226, 2005.

[33] Jean B Lasserre. The moment-SOS hierarchy. In Proceedings of the International Congress of Mathematicians: Rio de Janeiro 2018, pages 3773–3794. World Scientific, 2018.

[34] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.

[35] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017.

[36] Franziska Paul, Ya’ara Arkin, Amir Giladi, Diego Adhemar Jaitin, Ephraim Kenigsberg, Hadas Keren-Shaul, Deborah Winter, David Lara-Astiaso, Meital Gury, Assaf Weiner, et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell, 163(7):1663–1677, 2015.

[37] Andre Olsson, Meenakshi Venkatasubramanian, Viren K Chaudhri, Bruce J Aronow, Nathan Salomonis, Harinder Singh, and H Leighton Grimes. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature, 537(7622):698–702, 2016.

[38] Jan Krumsiek, Carsten Marr, Timm Schroeder, and Fabian J Theis. Hierarchical differentiation of myeloid progenitors is encoded in the transcription factor network. PLoS ONE, 6(8):e22649, 2011.

[39] Victoria Moignard, Steven Woodhouse, Laleh Haghverdi, Andrew J Lilly, Yosuke Tanaka, Adam C Wilkinson, Florian Buettner, Iain C Macaulay, Wajid Jawaid, Evangelia Diamanti, et al. Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nature Biotechnology, 33(3):269–276, 2015.

[40] MOSEK ApS. MOSEK Optimizer API for Python, 2022.

[41] Heng Yang, Ling Liang, Luca Carlone, and Kim-Chuan Toh. An inexact projected gradient method with rounding and lifting by nonlinear programming for solving rank-one semidefinite relaxation of polynomial optimization. Mathematical Programming, 201(1):409–472, 2023.

[42] Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust Optimization, volume 28. Princeton University Press, 2009.

[43] Dimitris Bertsimas and Dick den Hertog. Robust and Adaptive Optimization. Dynamic Ideas LLC, 2022.

Authors:

(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA ([email protected]).


This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.