A manifold fitting approach for high-dimensional data reduction beyond Euclidean space

National University of Singapore (NUS) statisticians have introduced a new technique that accurately describes high-dimensional data using lower-dimensional smooth structures. This innovation marks a significant step forward in addressing the challenges of complex nonlinear dimension reduction.

Traditional data analysis methods often rely on Euclidean (linear) dependencies among features. While this approach simplifies data representation, it struggles to capture the complex patterns underlying high-dimensional data, which typically lie close to low-dimensional manifolds. To bridge this gap, manifold-learning techniques have emerged as a promising solution. However, existing methods, such as manifold embedding and denoising, have been limited by a lack of detailed geometric understanding and robust theoretical underpinnings.
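To make the limitation of linear methods concrete, here is a minimal sketch. It is not the team's method. It assumes a toy dataset of noisy points on a circle (a one-dimensional manifold) placed in a 10-dimensional space. The helper names `circle_data` and `explained_variance_ratio`, the noise level, and the sample size are all illustrative choices. A linear reduction computed with a singular value decomposition needs two directions to account for the data, even though a single parameter (the angle) describes every point.

```python
import numpy as np


def circle_data(n=500, ambient_dim=10, noise=0.01, seed=0):
    """Sample noisy points near a unit circle embedded in a higher-dimensional space.

    Hypothetical toy data: the circle lives in the first two coordinates,
    and small Gaussian noise is added in every coordinate.
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    points = np.zeros((n, ambient_dim))
    points[:, 0] = np.cos(theta)
    points[:, 1] = np.sin(theta)
    points += noise * rng.normal(size=points.shape)
    return points


def explained_variance_ratio(points):
    """Fraction of total variance captured by each linear (principal) direction."""
    centered = points - points.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


data = circle_data()
ratios = explained_variance_ratio(data)

# One linear direction captures only about half of the variance,
# while two directions capture almost all of it.
print("first direction:", round(float(ratios[0]), 3))
print("first two directions:", round(float(ratios[0] + ratios[1]), 3))
```

In this toy setting the intrinsic dimension is one, yet the linear summary cannot go below two dimensions without discarding roughly half of the variation. Manifold-based approaches aim to close that gap.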

The team, led by Associate Professor Zhigang YAO from the NUS Department of Statistics and Data Science together with his PhD student Jiaji SU, pioneered a novel method for effectively estimating low-dimensional manifolds hidden within high-dimensional data (see Figure 1 for illustration). This approach not only achieves cutting-edge estimation accuracy and convergence rates but also enhances computational efficiency through the use of deep Generative Adversarial Networks (GANs). The work was carried out in collaboration with Professor Shing-Tung YAU from the Yau Mathematical Sciences Centre (YMSC) at Tsinghua University. Part of it stems from Assoc Prof Yao’s collaboration with Prof Yau during his sabbatical visit to the Centre of Mathematical Sciences and Applications (CMSA) at Harvard University.

Their findings were published as a methodology paper in the Proceedings of the National Academy of Sciences of the United States of America on 24 January 2024.

Assoc Prof Yao delivered a 45-minute invited lecture on this research at the recent International Congress of Chinese Mathematicians (ICCM) held in Shanghai in 2024.

Highlighting the significance of the work, Assoc Prof Yao said, “By accurately fitting manifolds, we can reduce data dimensionality while preserving crucial information, including the underlying geometric structure. This represents a major leap in data analysis, enhancing both accuracy and efficiency. By providing a solution that overcomes the limitations of previous methods, our research paves the way for enhanced data analysis and offers valuable insights for diverse applications in the scientific community.”

Looking ahead, Assoc Prof Yao’s research team is developing a new framework to process even more complex data, such as single-cell RNA sequencing data, while continuing to collaborate with the YMSC team. This ongoing work promises to revolutionise how complex datasets are reduced and processed, potentially offering new insights into a range of scientific fields.
