As scientific user facilities upgrade and expand, the volume of data they generate has begun to exceed scientists’ ability to stream, archive, and analyze it. This has created an urgent need for new mathematical and computer-science techniques that shrink these data sets by removing trivial or repetitive data while preserving the important scientific information that can lead to discovery.
The need for data-reduction techniques is clear, but a key challenge remains: the scientists who use those techniques must be able to trust that they are not losing important scientific information. Research supported by this program must therefore address not only the efficiency and effectiveness of a data-reduction technique but also its trustworthiness.
“Scientific user facilities across the nation, including the DOE Office of Science, are producing data that could lead to exciting and important scientific discoveries, but the size of that data is creating new challenges,” said Barb Helland, Associate Director for Advanced Scientific Computing Research, DOE Office of Science. “Those discoveries can only be uncovered if the data is made manageable, and the techniques employed to do that are trusted by the scientists.”
The projects selected in today’s announcement span a wide range of topics that promise important innovations in data reduction, including approaches based on advanced machine learning, large-scale statistical calculations, and novel hardware accelerators. Examples include:
- Methods to compress streaming data: Researchers at Oak Ridge National Laboratory will develop techniques to compress data coming directly from a scientific instrument or a computer model by exploiting the data’s specific structure and integrating advanced machine-learning techniques, while allowing scientists to control certain features of the data.
- Methods to intelligently select and tune compression techniques: Researchers at Texas State University will develop techniques to search the vast space of potential data compression techniques and select the best method based on the user’s requirements for fidelity, speed, and memory usage.
- Compression methods for related groups of data sets: Researchers at the University of California, San Diego will develop scalable techniques for compressing multiple related streams of data, such as those from multiple sensors observing the same physical system, by taking advantage of the relationships between the data sets.
- Methods for programming custom hardware accelerators for streaming compression: Researchers at Fermi National Accelerator Laboratory will develop techniques for implementing advanced compression and filtering algorithms, including those based on machine-learning methods, as custom hardware accelerators for use in a wide array of experimental settings, from particle physics experiments to electron microscopes.
The projects are managed by the Advanced Scientific Computing Research (ASCR) program within the DOE Office of Science.
The full list of projects and more information can be found here.