Table of Links
A. Convex Relaxation Techniques for Hyperbolic SVMs
B. Solution Extraction in Relaxed Formulation
C. On Moment Sum-of-Squares Relaxation Hierarchy
E. Detailed Experimental Results
F. Robust Hyperbolic Support Vector Machine
Abstract
Hyperbolic spaces have increasingly been recognized for their outstanding performance in handling data with inherent hierarchical structures compared to their Euclidean counterparts. However, learning in hyperbolic spaces poses significant challenges. In particular, extending support vector machines to hyperbolic spaces is in general a constrained non-convex optimization problem. Popular previous attempts to solve hyperbolic SVMs, primarily via projected gradient descent, are generally sensitive to hyperparameters and initialization, often yielding suboptimal solutions. In this work, by first rewriting the problem as a polynomial optimization, we apply semidefinite relaxation and sparse moment-sum-of-squares relaxation to effectively approximate the optima. Extensive empirical experiments show that these methods outperform the projected gradient descent approach.
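To make the relaxation step concrete, here is a minimal sketch (not the paper's actual HSVM program) of Shor's semidefinite relaxation applied to a toy two-dimensional problem of the same flavor: minimizing a quadratic over the non-convex set cut out by an indefinite quadratic constraint. The data matrix `A`, the variable names, and the use of the `cvxpy` modeling package with the SCS solver are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy instance (not the paper's actual HSVM program): minimize
# ||x||^2 over the non-convex set {x in R^2 : x^T A x >= 1} with indefinite A.
A = np.diag([1.0, -1.0])
n = 2

# Shor's semidefinite relaxation: lift to M = [[1, x^T], [x, X]] >= 0 and
# replace every occurrence of x x^T by the matrix variable X.
M = cp.Variable((n + 1, n + 1), PSD=True)
x = M[1:, 0]          # linear part of the lifted variable
X = M[1:, 1:]         # stands in for x x^T
constraints = [
    M[0, 0] == 1,
    cp.trace(A @ X) >= 1,     # relaxed version of x^T A x >= 1
]
prob = cp.Problem(cp.Minimize(cp.trace(X)), constraints)  # relaxed ||x||^2
prob.solve(solver=cp.SCS)

print("SDP lower bound:", prob.value)
print("candidate x:", x.value)   # meaningful when M is (numerically) rank one
```

If the optimal `M` is (numerically) rank one, its first column directly yields a feasible minimizer; otherwise a rounding or extraction step of the kind discussed in the appendix on solution extraction is needed.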
1 Introduction
On the other hand, learning and optimization in hyperbolic spaces are typically more involved than in Euclidean spaces. Problems that are convex in Euclidean spaces become constrained non-convex problems in hyperbolic spaces. The hyperbolic Support Vector Machine (HSVM), as explored in recent studies [4, 5], exemplifies these challenges: it is a constrained non-convex program that has predominantly been solved with projected gradient descent. Attempts have been made to alleviate its non-convex nature through reparametrization [6], or by developing a hyperbolic perceptron algorithm that converges to a separator and is then fine-tuned with adversarial samples to approximate the large-margin solution [7]. To the best of our knowledge, these attempts are all grounded in gradient descent dynamics, which are highly sensitive to initialization and hyperparameters and cannot certify optimality.
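For contrast, the following is a minimal sketch of a projected-gradient-descent style baseline under simplifying assumptions: a plain hinge surrogate max(0, 1 - y_i <w, x_i>_L) stands in for the hyperbolic margin loss, and feasibility is taken to mean <w, w>_L >= 0, i.e., the non-convex exterior of the light cone. The helper names `project_feasible` and `pgd_hsvm`, the learning rate, and the data layout are all hypothetical, not the exact formulation of [4, 5].

```python
import numpy as np

def minkowski(u, v):
    """Minkowski (Lorentz) inner product <u, v>_L = -u_0 v_0 + sum_{k>=1} u_k v_k."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def project_feasible(w):
    """Project w onto the non-convex set {w : <w, w>_L >= 0}: if w lies inside
    the light cone, return the closest point on the cone's boundary."""
    head, tail = w[0], w[1:]
    nt = np.linalg.norm(tail)
    if nt >= abs(head):                       # already feasible
        return w
    s = (abs(head) + nt) / 2.0                # closest boundary radius
    direction = tail / nt if nt > 0 else np.zeros_like(tail)
    return np.concatenate(([np.sign(head) * s], s * direction))

def pgd_hsvm(feats, labels, lr=1e-2, steps=2000, seed=0):
    """Projected gradient descent on a hinge surrogate max(0, 1 - y_i <w, x_i>_L)."""
    rng = np.random.default_rng(seed)
    w = project_feasible(rng.normal(size=feats.shape[1]))   # result depends on this draw
    G = np.diag([-1.0] + [1.0] * (feats.shape[1] - 1))      # <w, x>_L = w @ (G @ x)
    for _ in range(steps):
        grad = np.zeros_like(w)
        for xi, yi in zip(feats, labels):
            if 1.0 - yi * minkowski(w, xi) > 0.0:           # hinge term is active
                grad -= yi * (G @ xi)
        w = project_feasible(w - lr * grad / len(labels))
    return w
```

Because the feasible set is non-convex, where the iterates end up depends on the random initialization and the step size, which is precisely the sensitivity that motivates the relaxation-based approach.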
Authors:
(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);
(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);
(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA ([email protected]).