Waking up to an interactive coffee cup of data

Sarah Jonas

3 years ago

When coffee is sold as single origin or as the more expensive Arabica beans, do you really know whether you are getting what you’re paying for? Coffee-producing regions need to enforce the standards and protect the reputation of their coffee, so there is a growing industry looking at different technologies to more accurately classify and test coffee beans from different origins. Researchers in Colombia from Universidad del Valle and Universidad del Atlantico and the company Almacafe have taken steps toward making it easier to validate the variety under which the coffee is being sold. To do this, they analyzed hundreds of coffee samples from multiple countries using highly sensitive Nuclear Magnetic Resonance (NMR) and made these data broadly available in an inexpensive and interactive manner, allowing researchers to look at their coffee to see ‘what’s in that cup’ (or should be). This study was published in the open science journal GigaByte¹.

NMR is an extremely sensitive technique that provides very detailed information, down to the level of molecular structure, about the contents of any sample analyzed. NMR has long been the gold standard in medical and pharmacology studies for content identification, but it is less often used in the food industry as it has been far too expensive for more general use. To open up the use of this technique in the coffee sector, the researchers here gathered 715 coffee samples from 27 different countries and used NMR to obtain detailed information on the content of those samples. They then made all of these data openly available for general use.

The researchers have primarily been using their technique to help the Colombian Coffee Federation enforce the Protected Geographical Indication (PGI) that monitors agricultural products, such as Colombian coffee, whose quality and reputation are linked to a specific geographical area. To this end, their work has focused on using different technologies to classify coffee beans from different origins. However, it quickly became apparent to the scientists that NMR could also give very accurate information about coffee quality.

Lead author Julien Wist from Universidad del Valle noted that “Although roasting is very important as it can ruin the best beans, it is impossible to make good coffee out of bad beans.” With a hint of humor, he added: “Our research group has had a wonderful time working with coffee samples. The whole lab was, for once, smelling nice. The sample preparation is so simple that we just prepared coffee: cold for the [NMR] magnet and hot for us!”

Most important to this work, the authors have made this huge collection of samples and spectra freely available so that it can be shared without restriction quickly, cheaply, and interactively. Readers can directly engage with these gigabytes of NMR data because, in addition to the datasets, the authors have made software called the NMRium-browser² available so readers can look through the spectra for themselves. NMRium is the newest iteration of a project that started two decades ago to bring NMR spectra to the browser.

Wist says of this interactive paper: “Visualization of data is often difficult and requires expensive pieces of software. Often, the consequence is that data is overlooked and simply fed into a black box. I think the first step should always be to look at the data. NMRium does that in the browser and for free.”

By sharing what is thought to be the largest available database of NMR spectra of coffee samples, the authors have given researchers across the world a baseline for assessing the effectiveness of the technology in applications such as determining coffee origin, purity and adulteration, as well as the effect of roasting.

Making large data sets interactive within the article is possible because the work was published in the journal GigaByte, which uses custom-built publishing technology that includes the ability to integrate interactive content. Making the data available and interactive as part of the publishing process increases trust in article content and creates living documents rather than the more common publishing-industry standard of posting articles online in a static format. Other GigaByte articles have included many different types of visualization tools, chosen to suit the data being presented. These include Hi-C maps², 3D imaging viewers³ that can run on VR headsets, interactive maps⁴, and interactive protocols⁵. These types of embedded interactive tools showcase new things that can be done in publishing, and demonstrate this more hands-on approach as a way to share research in a manner better suited to communicating modern research and data, even to the point of letting readers explore the contents of their morning cup of coffee.

About GigaScience Press

GigaScience Press is BGI’s Open Access Publishing division, which publishes scientific journals and data. Its publishing projects are carried out with international publishing partners and infrastructure providers, including Oxford University Press and River Valley Technologies. It currently publishes two data-centric journals: its premier journal GigaScience (launched 2012) and its new journal GigaByte (launched 2020). It also publishes data, software, and other research objects via its GigaDB.org database. To encourage transparent reporting of scientific research as well as enable future access and analyses, it is a requirement of manuscript submission to all GigaScience Press journals that all supporting data and source code be made available in GigaDB or in a community approved, publicly available repository. See GigaSciencePress.com