Abstract and 1. Introduction

  2. Related Works

  3. Convex Relaxation Techniques for Hyperbolic SVMs

    3.1 Preliminaries

    3.2 Original Formulation of the HSVM

    3.3 Semidefinite Formulation

    3.4 Moment-Sum-of-Squares Relaxation

  4. Experiments

    4.1 Synthetic Dataset

    4.2 Real Dataset

  5. Discussions, Acknowledgements, and References

A. Proofs

B. Solution Extraction in Relaxed Formulation

C. On Moment Sum-of-Squares Relaxation Hierarchy

D. Platt Scaling [31]

E. Detailed Experimental Results

F. Robust Hyperbolic Support Vector Machine

4.2 Real Dataset

The real datasets consist of embeddings in ℍ² of various sizes and numbers of classes, visualized in Figure 4. We first report the performance of the three models under the one-vs-rest training scheme, described in Appendix D, in Tables 5 to 7 for 𝐢 ∈ {0.1, 1.0, 10} respectively, and report aggregated performances in Table 2, obtained by selecting the result with the highest average test weighted F1 score. In general, we observe that Moment achieves the best test accuracy and weighted F1 score, particularly on biological datasets with clear hyperbolic structure, and has smaller optimality gaps than the SDP relaxation on nearly all selected datasets. However, the optimality gaps of both methods remain far from zero, suggesting that these relaxations are not tight enough for these datasets. Nevertheless, both relaxed models outperform projected gradient descent (PGD) by a wide margin. Furthermore, Tables 5 to 7 show that under the one-vs-rest training scheme, PGD is highly sensitive to the choice of the regularization parameter 𝐢, whereas SDP and Moment are less affected, demonstrating better stability and consistency across different 𝐢’s.
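For concreteness, the following is a minimal sketch of how a one-vs-rest scheme with Platt-scaled decision scores (Appendix D) can be wired together. The binary solver here is a toy Euclidean least-squares stand-in rather than any of the paper's HSVM solvers, and the names `train_binary_stub`, `fit_platt`, and `one_vs_rest_predict` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def train_binary_stub(X, y_pm1):
    """Toy Euclidean least-squares stand-in for the per-class binary solver
    (the paper would use the PGD, SDP, or Moment HSVM here)."""
    A = np.c_[X, np.ones(len(X))]
    w, *_ = np.linalg.lstsq(A, y_pm1, rcond=None)
    return lambda Z: np.c_[Z, np.ones(len(Z))] @ w

def fit_platt(scores, y01):
    """Platt scaling: fit p = 1 / (1 + exp(a*s + b)) by minimizing log-loss."""
    def nll(params):
        a, b = params
        p = np.clip(1.0 / (1.0 + np.exp(a * scores + b)), 1e-12, 1 - 1e-12)
        return -np.mean(y01 * np.log(p) + (1 - y01) * np.log(1 - p))
    return minimize(nll, x0=np.array([-1.0, 0.0])).x

def one_vs_rest_predict(X_train, y_train, X_test, num_classes):
    """One classifier per class; the most confident calibrated score wins."""
    probs = np.zeros((len(X_test), num_classes))
    for k in range(num_classes):
        y_pm1 = np.where(y_train == k, 1.0, -1.0)        # class k vs. the rest
        decision = train_binary_stub(X_train, y_pm1)
        a, b = fit_platt(decision(X_train), (y_pm1 + 1) / 2)
        probs[:, k] = 1.0 / (1.0 + np.exp(a * decision(X_test) + b))
    return probs.argmax(axis=1)
```

The calibration step is what makes the per-class scores comparable: raw decision values from independently trained classifiers live on different scales, so they are mapped to probabilities before taking the argmax.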

One critical drawback of the semidefinite and sparse moment-sum-of-squares relaxations is that they do not scale efficiently with the number of data samples, leading to excessive time and memory consumption, for example on CIFAR10 and Fashion-MNIST under the one-vs-rest training scheme. The workaround is a one-vs-one training scheme, in which we train π’ͺ(𝐾²) classifiers, one on the data from each pair of classes, and make the final prediction by majority voting (sketched below). We summarize the performance in Table 3 by aggregating the results for different 𝐢 in Tables 8 to 10, as in the one-vs-rest case. We observe that in one-vs-one training, the improvement from the relaxations is generally not as significant as in the one-vs-rest scheme, and the SDP relaxation now gives the best average test accuracy and test F1, albeit with large optimality gaps. Note that in the one-vs-one scheme, PGD is more consistent across different 𝐢’s, potentially because each subproblem, binary classification of one class against another, contains less data than in the one-vs-rest case, making it easier to identify good solutions.
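As a rough illustration of this voting scheme (not the paper's implementation), the sketch below accepts any pairwise binary trainer, e.g. the toy stand-in above, as an argument; `one_vs_one_predict` and `train_binary` are assumed names.

```python
import numpy as np
from itertools import combinations

def one_vs_one_predict(X_train, y_train, X_test, classes, train_binary):
    """Train one classifier per pair of classes and predict by majority vote.

    `train_binary(X, y_pm1)` is assumed to return a decision function
    mapping points to real-valued scores, positive for the first class."""
    votes = np.zeros((len(X_test), len(classes)), dtype=int)
    for i, j in combinations(range(len(classes)), 2):        # O(K^2) pairs
        mask = np.isin(y_train, [classes[i], classes[j]])    # data from these two classes only
        y_pm1 = np.where(y_train[mask] == classes[i], 1.0, -1.0)
        scores = train_binary(X_train[mask], y_pm1)(X_test)
        votes[scores >= 0, i] += 1                           # vote for class i
        votes[scores < 0, j] += 1                            # vote for class j
    return np.asarray(classes)[votes.argmax(axis=1)]         # majority voting
```

Although this trains 𝐾(𝐾 − 1)/2 classifiers, each one only sees the data from two classes, which keeps the per-problem size small enough for the larger datasets.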

A more detailed analysis on the effect of regularization 𝐢 and runtime comparisons are provided in Appendix E.3 and Table 11.

Authors:

(1) Sheng Yang, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(2) Peihan Liu, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA ([email protected]);

(3) Cengiz Pehlevan, John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, Center for Brain Science, Harvard University, Cambridge, MA, and Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Cambridge, MA ([email protected]).


This paper is available on arXiv under the CC BY-SA 4.0 Deed (Attribution-ShareAlike 4.0 International) license.