Table of Links
-
Some recent trends in theoretical ML
2.1 Deep Learning via continuous-time controlled dynamical system
2.2 Probabilistic modeling and inference in DL
-
3.1 Kuramoto models from the geometric point of view
3.2 Hyperbolic geometry of Kuramoto ensembles
3.3 Kuramoto models with several globally coupled sub-ensembles
-
Kuramoto models on higher-dimensional manifolds
4.1 Non-Abelian Kuramoto models on Lie groups
4.2 Kuramoto models on spheres
4.3 Kuramoto models on spheres with several globally coupled sub-ensembles
-
5.1 Statistical models over circles and tori
5.2 Statistical models over spheres
5.3 Statistical models over hyperbolic spaces
5.4 Statistical models over orthogonal groups, Grassmannians, homogeneous spaces
-
6.1 Training swarms on manifolds for supervised ML
6.2 Swarms on manifolds and directional statistics in RL
6.3 Swarms on manifolds and directional statistics for unsupervised ML
6.4 Statistical models for the latent space
6.5 Kuramoto models for learning (coupled) actions of Lie groups
6.6 Grassmannian shallow and deep learning
6.7 Ensembles of coupled oscillators in ML: Beyond Kuramoto models
-
Examples
7.2 Linked robot’s arm (planar rotations)
7.3 Linked robot’s arm (spatial rotations)
7.4 Embedding multilayer complex networks (Learning coupled actions of Lorentz groups)
5.1 Statistical models over circles and tori
5.1.1 von Mises distribution
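For reference, the univariate von Mises density vM(κ, µ) in its standard form, which is presumably what (25) denotes. The symbol I_0 (the modified Bessel function of the first kind of order zero) is an assumption, since it does not appear in the surrounding text.

```latex
p(\varphi \mid \kappa, \mu) = \frac{1}{2\pi I_0(\kappa)} \, e^{\kappa \cos(\varphi - \mu)},
\qquad \varphi \in [0, 2\pi), \quad \kappa \ge 0 .
```

Setting κ = 0 gives the uniform density 1/(2π) for any µ.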
Multivariate von Mises distributions provide a statistical model for probabilistic modeling and inference on tori. This statistical model has been applied in bioinformatics [93].
On the other hand, the von Mises family does not satisfy any group-theoretic property analogous to (P4). This drawback makes it an inconvenient choice in some architectures and setups.
The von Mises family is associated with Kuramoto models of the following form
where ξj is the Gaussian noise defined by (2).
Taking the continuum limit of the Langevin dynamics (26), we obtain the Fokker-Planck PDE for the evolution of densities on the circle. This PDE has the stationary density (25).
In conclusion, Kuramoto models with noise can encode von Mises distributions vM(κ, µ) and their mixtures. The algorithm starts from the uniform distribution (corresponding to κ = 0 and arbitrary µ) and updates κ and µ by learning K and β in (26). For toroidal data, multivariate von Mises distributions can be approximated by adding noise to models with several sub-ensembles (8).
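The link between the noisy dynamics and the von Mises family can be checked numerically. The sketch below assumes that the noisy model has the form dφ = K sin(β − φ) dt + √(2D) dW; this form, the noise intensity D, and all function names are assumptions made for illustration, not taken from (26). For this assumed form, the stationary Fokker-Planck density is von Mises with κ = K/D and µ = β, so the empirical mean resultant length should approach I_1(κ)/I_0(κ).

```python
import numpy as np


def simulate_noisy_kuramoto(K, beta, D, n=5000, dt=0.02, steps=1000, seed=0):
    """Euler-Maruyama integration of the ASSUMED Langevin model
    d(phi) = K sin(beta - phi) dt + sqrt(2 D) dW,
    starting from the uniform distribution on the circle."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(-np.pi, np.pi, size=n)
    noise_scale = np.sqrt(2.0 * D * dt)
    for _ in range(steps):
        drift = K * np.sin(beta - phi)
        phi = phi + drift * dt + noise_scale * rng.standard_normal(n)
    # Wrap the angles back to (-pi, pi].
    return np.angle(np.exp(1j * phi))


def von_mises_resultant(kappa, grid=20000):
    """Mean resultant length of vM(kappa, mu), i.e. I_1(kappa) / I_0(kappa),
    computed by quadrature on a uniform grid (avoids a SciPy dependency)."""
    theta = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    weights = np.exp(kappa * np.cos(theta))
    return float(np.sum(np.cos(theta) * weights) / np.sum(weights))


K, beta, D = 2.0, 0.5, 1.0
phi = simulate_noisy_kuramoto(K, beta, D)
moment = np.mean(np.exp(1j * phi))            # first trigonometric moment
empirical_R, empirical_mu = abs(moment), np.angle(moment)
expected_R = von_mises_resultant(K / D)       # kappa = K / D under the assumption
print(f"R: empirical {empirical_R:.3f}, von Mises {expected_R:.3f}")
print(f"mu: empirical {empirical_mu:.3f}, beta {beta:.3f}")
```

In this reading, learning K and β moves the stationary density within the von Mises family: β sets the location µ, and the ratio K/D sets the concentration κ.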
5.1.2 Wrapped Cauchy distribution
One of the most important families of probability measures on the unit circle is obtained by "wrapping" the Cauchy (Lorentzian) distributions on the real line to the circle. This yields so-called wrapped (or circular) Cauchy distributions with the density functions [94]
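In a common parameterization, assumed here to correspond to (27), the parameter α is a complex number in the open unit disc and the density is the Poisson kernel:

```latex
p(\varphi \mid \alpha) = \frac{1}{2\pi} \, \frac{1 - |\alpha|^2}{\left| e^{i\varphi} - \alpha \right|^2},
\qquad \varphi \in [0, 2\pi), \quad |\alpha| < 1 .
```

With α = ρ e^{iµ}, the denominator equals 1 + ρ² − 2ρ cos(φ − µ), and α = 0 gives the uniform distribution.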
Proposition 4. The family wC(α) is invariant with respect to actions of the Möbius group G. All wrapped Cauchy distributions are obtained as Möbius transformations of the uniform distribution on the circle.
Recalling Proposition 1, we conclude that densities (27) can be approximated by the simplest Kuramoto models. Consider the system (6) in the continuum limit and assume that the initial distribution of oscillators is uniform on S¹. Proposition 4 then implies that the density of oscillators at each moment t is of the form (27).
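Proposition 4 can be illustrated without integrating any dynamics. The sketch below pushes uniform samples through a disc-preserving Möbius map and compares the result with the Poisson-kernel form of the wrapped Cauchy density. The parameterization of the map by a point a in the unit disc and a rotation angle, the density formula, and the function names are assumptions for illustration; they may differ from the notation of (27).

```python
import numpy as np


def mobius(z, a, psi=0.0):
    """Disc-preserving Moebius map z -> e^{i psi} (z + a) / (1 + conj(a) z),
    with |a| < 1 (an assumed parameterization of the group G)."""
    return np.exp(1j * psi) * (z + a) / (1.0 + np.conj(a) * z)


def wrapped_cauchy_pdf(phi, alpha):
    """Wrapped Cauchy density in Poisson-kernel form, alpha in the unit disc."""
    return (1.0 - abs(alpha) ** 2) / (
        2.0 * np.pi * np.abs(np.exp(1j * phi) - alpha) ** 2
    )


rng = np.random.default_rng(1)
alpha = 0.6 * np.exp(1j * 0.8)

# Uniform points on the circle, then their image under the Moebius map.
z_uniform = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=200_000))
phi = np.angle(mobius(z_uniform, alpha))

# The first trigonometric moment of the image should be close to alpha.
first_moment = np.mean(np.exp(1j * phi))

# The histogram of the image should follow the wrapped Cauchy density.
hist, edges = np.histogram(phi, bins=60, range=(-np.pi, np.pi), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_density_error = np.max(np.abs(hist - wrapped_cauchy_pdf(centers, alpha)))

print("first moment:", first_moment, "alpha:", alpha)
print("max histogram deviation:", max_density_error)
```

The map sends the origin to a, which is why the image of the uniform distribution has first moment a and why a point of the disc is enough to label a member of the family.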
Obviously, the representative power of wC(α) is very limited. This statistical manifold is the two-dimensional orbit of the Möbius group[1] and consists of unimodal and symmetric densities on the circle.
Most importantly, we have a clear understanding of the representative power of architectures associated with this model. One can overcome these limitations by using mixtures of m wrapped Cauchy distributions. Such mixtures can be generated by m uncoupled Kuramoto models.
The family of multivariate wrapped Cauchy distributions provides a suitable model for learning toroidal data; we refer to [96, 97] for some theoretical results. An exact formula for the Fisher information is not available on higher-dimensional tori, which makes it difficult to design accurate algorithms. However, multivariate wrapped Cauchy distributions can be encoded in Kuramoto models with several sub-ensembles (8).
5.1.3 Kato-Jones distribution
Proposition 5 [98]. The family KJ(µ, ν, r, κ) is invariant with respect to actions of the Möbius group G. All Kato-Jones distributions with fixed κ are obtained as Möbius transformations of the von Mises distributions vM(κ, 0) on the circle.
Hence, the K-J families with fixed κ > 0 are three-dimensional orbits of the Möbius group. Applying Proposition 1, we conclude that densities (28) arise from a two-stage dynamics. In the first stage, the von Mises distribution vM(κ, 0) arises as the stationary distribution of the model (26) for 0 ≤ t ≤ T, with T sufficiently large. In the second stage, the dynamics (6) for t > T transforms the von Mises distribution into K-J distributions; we refer to [99] for a more detailed explanation.
In conclusion, the statistical manifold KJ(µ, ν, r, κ) is four-dimensional and contains both vM(κ, µ) (for r = 0) and wC(α) (for κ = 0) as submanifolds. Its representative power is higher than that of either subfamily, as it contains bimodal and asymmetric densities.
The Fisher information for this family has been calculated explicitly [98]. Therefore, the K-J family provides a statistical model with improved representative power satisfying properties (P2) and (P4). These distributions have not been used in ML so far. We point out the recent preprint [100], which works with mixtures of two K-J distributions for an inference problem on daily periodic traffic data.
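The two-stage construction behind Proposition 5 suggests a direct sampling scheme: draw angles from vM(κ, 0), then apply a Möbius transformation. The sketch below assumes the map z ↦ e^{iµ}(z + re^{iν})/(1 + re^{−iν}z); this parameterization and the function name `sample_kato_jones` are assumptions for illustration and may not match the exact convention of (28).

```python
import numpy as np


def sample_kato_jones(mu, nu, r, kappa, size, rng):
    """Draw angles by transforming von Mises samples with a Moebius map.

    ASSUMED parameterization: z -> e^{i mu} (z + a) / (1 + conj(a) z),
    with a = r e^{i nu} and 0 <= r < 1.
    """
    # Stage 1: samples from vM(kappa, 0); kappa = 0 gives the uniform law.
    theta = rng.vonmises(0.0, kappa, size=size)
    z = np.exp(1j * theta)
    # Stage 2: Moebius transformation of the points on the unit circle.
    a = r * np.exp(1j * nu)
    w = np.exp(1j * mu) * (z + a) / (1.0 + np.conj(a) * z)
    return np.angle(w)


rng = np.random.default_rng(2)

# r = 0: the map is a pure rotation, so the samples follow vM(kappa, mu).
von_mises_case = sample_kato_jones(mu=1.0, nu=0.3, r=0.0, kappa=2.0,
                                   size=100_000, rng=rng)

# kappa = 0: the base law is uniform, so the samples are wrapped Cauchy
# with first trigonometric moment r e^{i (mu + nu)}.
wrapped_cauchy_case = sample_kato_jones(mu=1.0, nu=0.3, r=0.5, kappa=0.0,
                                        size=100_000, rng=rng)

# General case: both kappa > 0 and r > 0.
general_case = sample_kato_jones(mu=1.0, nu=2.0, r=0.5, kappa=2.0,
                                 size=100_000, rng=rng)

print("r = 0 moment:     ", np.mean(np.exp(1j * von_mises_case)))
print("kappa = 0 moment: ", np.mean(np.exp(1j * wrapped_cauchy_case)))
print("general moment:   ", np.mean(np.exp(1j * general_case)))
```

The two special cases mirror the submanifolds vM(κ, µ) and wC(α) inside the K-J family, which gives a quick sanity check on any sampler of this kind.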
5.1.4 Hyperbolic von Mises distribution
For the last family in this overview, consider the Kuramoto model with multiplicative noise
For C = 0, (29) reduces to the model (26) with additive noise. In general (for C ≠ 0), the stationary distributions for (29) are hyperbolic von Mises distributions [101]. Their densities are of the form
The parameters of (30) are η > 0, α ∈ R and ψ ∈ [0, 2π). The family (30) contains the uniform distribution for η = 0.
In conclusion, the Kuramoto model with multiplicative noise (29) can be used to learn over the three-dimensional manifold of hyperbolic von Mises distributions.
Author:
(1) Vladimir Jacimovic, Faculty of Natural Sciences and Mathematics, University of Montenegro, Cetinjski put bb., 81000 Podgorica, Montenegro ([email protected]).
[1] In this context we recall negative results on the representative power of NN architectures based on group orbits reported in the famous classical book [95].