Abstract
The scarcity of labeled data is a long-standing challenge for cross-domain machine learning tasks. This paper leverages an existing dataset (i.e., source) to generate new samples that are close to the dataset of interest (i.e., target). To avoid the need to learn a metric on the feature-label space, we lift both datasets to the space of probability distributions on the feature-Gaussian manifold, and then develop a gradient flow that minimizes the maximum mean discrepancy loss. To perform the gradient flow of distributions on the curved feature-Gaussian space, we unravel the Riemannian structure of the space and compute explicitly the Riemannian gradient of the loss function induced by the optimal transport metric. For practical purposes, we also propose a discretized flow, and provide conditional results guaranteeing the global convergence of the flow to the optimum. We illustrate the results of our proposed gradient flow method on several real-world datasets.
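To convey the core idea, the following is a minimal sketch of a maximum mean discrepancy (MMD) gradient flow in the simplest setting: source particles in Euclidean feature space descending the squared MMD toward a target sample, with a Gaussian kernel. This is only an illustration of the generic MMD flow; the paper's actual method operates on probability distributions over the feature-Gaussian manifold and uses the Riemannian gradient induced by the optimal transport metric, neither of which is reproduced here. All function names, the bandwidth `h`, the step size, and the iteration count are illustrative choices, not from the paper.

```python
import numpy as np

def gaussian_kernel(a, b, h):
    """Pairwise Gaussian kernel matrix between rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h ** 2))

def mmd2(x, y, h):
    """Squared MMD between the empirical measures of x and y."""
    return (gaussian_kernel(x, x, h).mean()
            - 2.0 * gaussian_kernel(x, y, h).mean()
            + gaussian_kernel(y, y, h).mean())

def mmd_grad(x, y, h):
    """Euclidean gradient of mmd2 with respect to the particles x."""
    Kxx = gaussian_kernel(x, x, h)
    Kxy = gaussian_kernel(x, y, h)
    n, m = len(x), len(y)
    # sum_j K[i, j] * (x_i - z_j) for z = x and z = y
    gxx = (Kxx[:, :, None] * (x[:, None, :] - x[None, :, :])).sum(axis=1)
    gxy = (Kxy[:, :, None] * (x[:, None, :] - y[None, :, :])).sum(axis=1)
    return (-2.0 / (n ** 2 * h ** 2)) * gxx + (2.0 / (n * m * h ** 2)) * gxy

def mmd_flow(x, y, h=2.0, step=0.5, iters=200):
    """Discretized gradient flow: explicit Euler steps on the particles."""
    x = x.copy()
    for _ in range(iters):
        x -= step * mmd_grad(x, y, h)
    return x
```

Running the flow on two Gaussian clouds moves the source particles toward the target, decreasing the squared MMD at each step for a sufficiently small step size.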
Authors
Truyen Nguyen, Xinru Hua, Tam Le, Jose Blanchet, Viet Anh Nguyen