Abstract and 1. Introduction

  2. Some recent trends in theoretical ML

    2.1 Deep Learning via continuous-time controlled dynamical system

    2.2 Probabilistic modeling and inference in DL

    2.3 Deep Learning in non-Euclidean spaces

    2.4 Physics Informed ML

  3. Kuramoto model

    3.1 Kuramoto models from the geometric point of view

    3.2 Hyperbolic geometry of Kuramoto ensembles

    3.3 Kuramoto models with several globally coupled sub-ensembles

  4. Kuramoto models on higher-dimensional manifolds

    4.1 Non-Abelian Kuramoto models on Lie groups

    4.2 Kuramoto models on spheres

    4.3 Kuramoto models on spheres with several globally coupled sub-ensembles

    4.4 Kuramoto models as gradient flows

    4.5 Consensus algorithms on other manifolds

  5. Directional statistics and swarms on manifolds for probabilistic modeling and inference on Riemannian manifolds

    5.1 Statistical models over circles and tori

    5.2 Statistical models over spheres

    5.3 Statistical models over hyperbolic spaces

    5.4 Statistical models over orthogonal groups, Grassmannians, homogeneous spaces

  6. Swarms on manifolds for DL

    6.1 Training swarms on manifolds for supervised ML

    6.2 Swarms on manifolds and directional statistics in RL

    6.3 Swarms on manifolds and directional statistics for unsupervised ML

    6.4 Statistical models for the latent space

    6.5 Kuramoto models for learning (coupled) actions of Lie groups

    6.6 Grassmannian shallow and deep learning

    6.7 Ensembles of coupled oscillators in ML: Beyond Kuramoto models

  7. Examples

    7.1 Wahba’s problem

    7.2 Linked robot’s arm (planar rotations)

    7.3 Linked robot’s arm (spatial rotations)

    7.4 Embedding multilayer complex networks (Learning coupled actions of Lorentz groups)

  8. Conclusion and References

5 Directional statistics and swarms on manifolds for probabilistic modeling and inference on Riemannian manifolds

Many ML algorithms involve stochastic policies over finite sets of outcomes. In such cases the statistical model is provided by the family of categorical distributions, i.e. the family of probability distributions over a finite set of outcomes. This family is isomorphic to the unit simplex

\[ \Delta^{D-1} = \left\{ (p_1, \dots, p_D) \in \mathbb{R}^D : \; p_i \geq 0, \; \sum_{i=1}^{D} p_i = 1 \right\}. \]

In the absence of case-specific information, the natural choice for the prior is the distribution in which all probabilities are equal: $p_1 = \cdots = p_D = 1/D$. Clearly, this is the maximum entropy distribution in this family.
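As a quick illustration, one can verify numerically that the uniform categorical distribution maximizes Shannon entropy on the simplex. The following is a minimal numpy sketch; all numerical values are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

D = 4
uniform = np.full(D, 1.0 / D)               # the prior p_1 = ... = p_D = 1/D
skewed = np.array([0.7, 0.1, 0.1, 0.1])     # some other point on the simplex

print(entropy(uniform), np.log(D))          # H attains its maximum, log D
print(entropy(skewed))                      # strictly smaller
```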

In those cases when the representative power of the Gaussian family is insufficient, mixtures of Gaussians are the most popular option.
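Sampling from such a mixture combines the two families just mentioned: a component index is drawn from the categorical distribution over mixture weights, and then a sample is drawn from the chosen Gaussian. A minimal sketch, with illustrative weights, means and covariances:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture of K = 3 Gaussians in R^2 (all parameter values are made up).
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])
covs = np.stack([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])

def sample_mixture(n):
    # Categorical draw over components, then a Gaussian draw per sample.
    comps = rng.choice(len(weights), size=n, p=weights)
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])

samples = sample_mixture(1000)
```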

Such dominance of the Gaussian family has a strong justification. Indeed, this family possesses a combination of advantages which render it a universal model in probabilistic ML. The only potential alternative is the family of multivariate Cauchy distributions. However, in most cases this family is immediately ruled out by the fact that Cauchy distributions do not have finite moments. We list four favorable properties of the family $\mathcal{N}(a, \Sigma)$:

P1) For a fixed mean vector and covariance matrix, Gaussian distributions are the maximum entropy distributions. Hence, they are natural choices for priors from the Bayesian point of view [91].

P2) The Gaussian family belongs to the class of exponential families of probability distributions, which implies all the desirable properties concerning reparametrization, sufficient statistics and duality in convex optimization (the Legendre-Fenchel transform) [21].

P3) The Fisher information metric for the Gaussian family has a simple closed-form expression. This property enables the design of efficient and reasonably simple algorithms that follow the natural gradient update [22, 23, 24]. In particular, it provides a theoretical justification for Natural Evolution Strategies in black-box optimization [89] (see the sketch following this list).

P4) The Gaussian family is closed under the action of the affine group: if $X \sim \mathcal{N}(a, \Sigma)$, then $AX + b \sim \mathcal{N}(Aa + b, A \Sigma A^\top)$. This group-invariance property underlies the reparametrization trick and many sampling schemes.
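To make (P3) and (P4) concrete, here is a minimal numpy sketch of a simplified isotropic Natural Evolution Strategy: samples are drawn via the affine reparametrization $x = m + \sigma z$ (property (P4)), and the mean and scale are updated along an estimate of the natural gradient (property (P3)). All names, constants and the toy objective are illustrative; this is a sketch of the idea under simplifying assumptions, not the full algorithm of [89].

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Toy objective to maximize (illustrative); optimum at the origin."""
    return -np.sum(x ** 2)

d, n = 5, 50                      # dimension, population size
eta_m, eta_s = 1.0, 0.1           # learning rates (illustrative)
m, sigma = rng.normal(size=d), 1.0

for step in range(200):
    z = rng.normal(size=(n, d))               # standard normal noise
    x = m + sigma * z                         # reparametrization: property (P4)
    fit = np.array([f(xi) for xi in x])
    u = (fit - fit.mean()) / (fit.std() + 1e-8)   # normalized fitness weights
    # Updates in the local coordinates of N(m, sigma^2 I): property (P3).
    m = m + eta_m * sigma * (u @ z) / n
    g_sigma = np.sum(u * (np.sum(z ** 2, axis=1) - d)) / n
    sigma *= np.exp(0.5 * eta_s * g_sigma)

print(m, sigma)   # m drifts toward the optimum; sigma shrinks near it
```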

With recent advances in Geometric Deep Learning, there is a growing number of tasks which require probabilistic modeling and inference over non-Euclidean spaces. When dealing with problems of this kind, ignoring the intrinsic geometry of the data leads to incorrect or inaccurate algorithms. In such setups the Gaussian family becomes an inappropriate choice. Instead, one needs suitable and tractable statistical models over Riemannian manifolds. This fact has been widely recognized only recently, and investigations in this direction are at a very early stage. Future advances will rely on results of directional statistics.

No family satisfying all of the properties (P1)-(P4) exists on curved spaces. With that in mind, the key issue is to determine which of these properties are the most essential for ML.

Property (P1) is not very significant in practice. On compact manifolds the uniform distribution possesses the MaxEnt property and provides an obvious choice for the prior. Therefore, it is necessary to work with statistical models which contain the uniform distribution.
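As a concrete illustration, uniform samples on the sphere $S^{d-1}$ can be obtained by normalizing standard Gaussian vectors, and the resulting law is invariant under the orthogonal group. A minimal numpy sketch, with illustrative names and sample sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

def uniform_sphere(n, d):
    """Uniform samples on S^{d-1}: normalize standard Gaussian vectors."""
    x = rng.normal(size=(n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

pts = uniform_sphere(10_000, 3)

# Rotation invariance: applying any orthogonal Q leaves the distribution
# unchanged (here checked crudely via the sample mean).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
rotated = pts @ Q.T
print(pts.mean(axis=0), rotated.mean(axis=0))   # both close to the origin
```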

On the other hand, property (P4) plays an essential role in many ML algorithms which employ the Gaussian family. Therefore, it is highly desirable to deal with families of probability distributions on spheres, tori and homogeneous spaces that exhibit a group-invariance property analogous to (P4).

This section contains an overview of families of probability distributions over Riemannian manifolds that have been (or should be) used in ML. The overview is accompanied by a brief discussion of the advantages and drawbacks of the corresponding statistical models. The main point of the present section is that some of the most suitable statistical models on curved spaces have been neglected in ML so far.

We will point out those distributions which are associated with Kuramoto and swarming dynamics in the continuum limit, i.e. obtained by passing to a continuum of oscillators or particles. The dynamics are then governed by evolution equations (first-order PDEs on manifolds). In such cases, symmetries impose low-dimensional evolution of densities on invariant submanifolds.

In the presence of noise, the continuum limit yields a second-order PDE involving a diffusion term (the Fokker-Planck equation). We will be interested in distributions which can be generated by the swarming dynamics, either as stationary distributions of noisy models in the long-time limit, or as invariant families of the deterministic (noiseless) dynamics. Notice that the latter satisfy the group-invariance property (P4).
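The first mechanism is easy to demonstrate numerically. Below is a minimal Euler-Maruyama sketch of the noisy Kuramoto model on the circle, $d\theta_i = \big(\omega_i + \frac{K}{N}\sum_j \sin(\theta_j - \theta_i)\big)\,dt + s\,dW_i$; after a long run the empirical phase density approximates a stationary solution of the associated Fokker-Planck equation. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K, s = 2000, 2.0, 0.5                   # oscillators, coupling, noise level
dt, steps = 0.01, 5000
omega = rng.normal(0.0, 0.1, size=N)       # natural frequencies (illustrative)
theta = rng.uniform(0.0, 2.0 * np.pi, size=N)

for _ in range(steps):
    z = np.mean(np.exp(1j * theta))        # complex order parameter r e^{i psi}
    # Mean-field form of the coupling: (K/N) sum_j sin(theta_j - theta_i)
    # equals K r sin(psi - theta_i).
    drift = omega + K * np.abs(z) * np.sin(np.angle(z) - theta)
    noise = s * np.sqrt(dt) * rng.normal(size=N)
    theta = (theta + drift * dt + noise) % (2.0 * np.pi)

# Empirical long-time density of the phases.
density, edges = np.histogram(theta, bins=60, range=(0.0, 2.0 * np.pi),
                              density=True)
```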

In this way we identify models that are potentially relevant for probabilistic-geometric ML. At the same time, we systematize some mutually related recent advances in directional statistics and the physics of complex systems. We refer to [92] for a more comprehensive overview of probability distributions over circular, toroidal and cylindrical geometries. Moreover, [92] also contains a discussion of inference methods on Riemannian manifolds with an extensive list of references.

Author:

(1) Vladimir Jacimovic, Faculty of Natural Sciences and Mathematics, University of Montenegro, Cetinjski put bb., 81000 Podgorica, Montenegro ([email protected]).


This paper is available on arxiv under CC BY 4.0 Deed (Attribution 4.0 International) license.