Shayan Hundrieser
Marcel Klatt
Axel Munk
Optimal transport (OT) based distances compare probability measures while incorporating the geometry of the underlying ground space. For this reason, OT has recently been recognized as a highly informative and effective tool for statistical data analysis and inference. For more details on OT and its applications we refer to Villani (2003, 2008), Santambrogio (2015), Peyré and Cuturi (2019), and Panaretos and Zemel (2020).
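In this work, we discuss statistical properties of OT on the circle $\mathbb{S}^1$. For this purpose, we parametrize $\mathbb{S}^1$ by $[0,1)$, equipped with its geodesic distance $\rho$, and consider the circular OT (COT) distance between probability measures $\mu$ and $\nu$ on $\mathbb{S}^1$, defined as the minimal cost, measured in terms of $\rho$, of transporting $\mu$ onto $\nu$.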
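As a minimal sketch of the ground distance, assume the circle has circumference one, so that the geodesic distance between two points of $[0,1)$ is the length of the shorter of the two arcs joining them, $\rho(x,y) = \min(|x-y|,\, 1-|x-y|)$. The function name `geodesic_dist` is illustrative and not part of the circularOT package.

```r
# Geodesic distance on the circle parametrized by [0, 1).
# Assumption: circumference one, so the distance is the shorter arc length.
geodesic_dist <- function(x, y) {
  d <- abs(x - y) %% 1   # arc length in one direction
  pmin(d, 1 - d)         # shorter of the two arcs
}

geodesic_dist(0.1, 0.9)    # short arc passes through 0, so the result is about 0.2
geodesic_dist(0.25, 0.75)  # antipodal points, distance 0.5
```

Unlike the distance on the unit interval, points near 0 and near 1 are close to each other, which is what makes the transport problem on the circle differ from the one on the real line.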
For applications, the true population measure $\mu$ is typically not available and has to be estimated, e.g. by the empirical measure $\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}$, where $X_1,\dots,X_n$ are i.i.d. samples from $\mu$. This yields the random quantity $\mathrm{COT}(\hat{\mu}_n,\nu)$, whose asymptotic fluctuation around $\mathrm{COT}(\mu,\nu)$ is characterized by a central limit theorem. This paves the way for a variety of statistical inference tasks based on empirical OT for circular data. In particular, we formulate an asymptotically consistent test for the assessment of goodness of fit of circular data, the circular optimal transport test (COTT). According to COTT, a sample $X_1,\dots,X_n$ on $\mathbb{S}^1$ cannot be drawn from the hypothesized measure $\nu$ if the statistic $\mathrm{COT}(\hat{\mu}_n,\nu)$ is too large. Statistically speaking, we investigate the null hypothesis $H_0\colon \mu = \nu$.
For testing uniformity, $\nu = \mathrm{Unif}[0,1)$, it turns out that the COTT performs particularly well for unimodal alternatives and is almost as powerful as Rayleigh's test, which is known to be the most powerful invariant test in the case of von Mises alternatives (Figure 2). For alternatives with many modes, the COTT is found to be less powerful, which is explained by the shape of the corresponding transport plan.
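For reference, the Rayleigh test used as the benchmark above can be sketched in a few lines. This is a minimal illustration, assuming data in $[0,1)$ and the standard large-sample approximation in which $2n\bar{R}^2$ is compared with a chi-square distribution with two degrees of freedom, where $\bar{R}$ is the mean resultant length. The function name `rayleigh_test` is hypothetical and not taken from any package.

```r
# Rayleigh test for uniformity of data on the circle parametrized by [0, 1).
# Assumption: large-sample chi-square approximation with 2 degrees of freedom.
rayleigh_test <- function(x) {
  n <- length(x)
  theta <- 2 * pi * x                                     # map [0, 1) to radians
  rbar <- sqrt(mean(cos(theta))^2 + mean(sin(theta))^2)   # mean resultant length
  stat <- 2 * n * rbar^2
  p_value <- pchisq(stat, df = 2, lower.tail = FALSE)     # equals exp(-n * rbar^2)
  list(statistic = stat, p.value = p_value)
}

# Evenly spaced points: resultant length is numerically zero, no evidence
# against uniformity.
rayleigh_test((0:9) / 10)$p.value

# Points concentrated around a single location: large statistic, small p-value.
rayleigh_test(c(0.48, 0.49, 0.50, 0.51, 0.52))$p.value
```

Because the statistic depends on the data only through the first trigonometric moment, it is sensitive to a single preferred direction and blind to alternatives whose mass is balanced around the circle.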
The performance of the COTT when testing for uniformity reflects the nature of OT itself. Spreading mass uniformly around the circle is more costly when that mass is concentrated almost exclusively around a single mode than when it is already sufficiently spread (Video 3). As a result, the associated COT distance tends to be larger, leading to a higher rejection probability.
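This effect can be checked numerically with a small sketch. It rests on several assumptions that are not taken from the text: the cost is $\rho^p$ with $p = 2$, the uniform distribution is replaced by an evenly spaced grid, and the optimal matching between two sets of equally many, equally weighted points is searched among the cyclic shifts of the sorted points, a property known to hold for costs that are convex in the geodesic distance. All function names are illustrative and not part of circularOT.

```r
geodesic_dist <- function(x, y) {
  d <- abs(x - y) %% 1
  pmin(d, 1 - d)
}

# COT cost between two point sets of equal size n with uniform weights.
# Assumption: cost rho^p with p >= 1, so that an optimal matching can be
# found among the n cyclic shifts of the sorted points.
cot_equal_n <- function(x, y, p = 2) {
  n <- length(x)
  stopifnot(length(y) == n)
  xs <- sort(x %% 1)
  ys <- sort(y %% 1)
  shift_cost <- function(k) {
    idx <- (seq_len(n) - 1 + k) %% n + 1   # cyclic shift of the matching by k
    mean(geodesic_dist(xs, ys[idx])^p)
  }
  min(vapply(0:(n - 1), shift_cost, numeric(1)))
}

n <- 120
uniform_grid <- (seq_len(n) - 0.5) / n     # discretized Unif[0, 1)

# Mass occupying a total arc length of 0.2, split evenly over `modes` arcs
# whose centers are equally spaced around the circle.
arc_points <- function(modes, total_width = 0.2) {
  per_mode <- n / modes
  centers <- (seq_len(modes) - 0.5) / modes
  offsets <- ((seq_len(per_mode) - 0.5) / per_mode - 0.5) * total_width / modes
  as.vector(outer(offsets, centers, "+"))
}

cost_one_mode   <- cot_equal_n(arc_points(1), uniform_grid)
cost_four_modes <- cot_equal_n(arc_points(4), uniform_grid)
cost_one_mode / cost_four_modes   # well above 1: one mode is far costlier
```

With a single mode, mass has to travel up to almost half the circle, whereas with four modes each cluster only has to fill its own quarter, so the transport cost, and with it the test statistic, is much smaller.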
Code Examples
### Install package "circularOT"
install.packages("circularOT")
### Load package
library(circularOT)
library(circular) # Package for random number generation of
# von Mises distribution
### Test for uniformity using COTT
set.seed(0)
cot.test_Uniformity(runif(15, 0.2, 0.8), typeOfData="UnitInt")
#
# One-Sample COTT for Uniformity
# Test statistic: 0.3717834
# P-Value: 0.046
### Test if two samples stem from identical distribution
set.seed(5)
cot.test_Bivariate_Bootstrap(
  rvonmises(10, circular(0), 3, control.circular = list(units = "radians")),
  rvonmises(10, circular(pi), 3, control.circular = list(units = "radians")),
  typeOfData = "Radian")
#
# Bivariate (Bootstrap) COTT for Goodness of Fit
# Test statistic: 0.4943681
# P-Value: 0.0046