It seems obvious. But twenty-five years ago, obtaining that population data was nearly impossible.
Researchers at Oak Ridge National Laboratory have spent more than two decades working to solve this challenge by developing cutting-edge population distribution models. Now their suite of LandScan datasets is available online to the global public under a new open-source Creative Commons license.
Peanut butter and jello?
In the mid-1990s, the world had no good, consistent data on populations around the globe.
Most countries conducted censuses, but the data was wildly inconsistent. Recognizing that this lack of data could prove fatal when disaster struck, ORNL’s geospatial scientists and human geographers started a project to develop the world’s first reliable, standardized population distribution model.
“The idea when we started was to provide a realistic estimate of population counts for every square kilometer of the planet,” said Amy Rose, an ORNL senior staff scientist and LandScan project principal investigator.
The project began not by counting people but by analyzing and characterizing existing environmental and infrastructure data. Understanding what the physical space looked like enabled researchers to identify where people were likely to be and, importantly, where they weren’t.
“When you factor in things like terrain, land cover and usage, and facility characterizations, you can start to build a sort of jello mold that identifies the relative likelihood of people being in a given location,” Rose said.
The alternative, called the “peanut butter” method, was to spread the known population count evenly across the geographic space. But that approach frequently produced models that placed people in water, on steep mountain slopes or in other uninhabitable locations.
Instead, ORNL’s researchers took the available census data for a given area and dropped it into their so-called jello mold: in reality, a complex algorithm that gives greater weight to the areas more attractive to populations. The result was a much clearer picture of where within a given region people were likely to be, along with an estimate of how many were in each area.
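The difference between the two approaches can be illustrated with a toy example. The Python sketch below is a simplified proportional (dasymetric-style) allocation, not LandScan’s actual algorithm: the four grid cells, the census count and the likelihood weights are all hypothetical, and the function names `peanut_butter` and `jello_mold` are illustrative only. Cells with a weight of zero, such as a lake or a steep slope, receive no one, while the total count is preserved.

```python
def peanut_butter(total: int, n_cells: int) -> list[float]:
    """Spread the census count evenly across every cell, habitable or not."""
    return [total / n_cells] * n_cells


def jello_mold(total: int, weights: list[float]) -> list[float]:
    """Allocate the census count in proportion to each cell's likelihood weight.

    A weight of 0 marks a cell where people cannot live (water, steep slope),
    so that cell receives nobody. The allocated values always sum to `total`.
    """
    weight_sum = sum(weights)
    if weight_sum == 0:
        raise ValueError("at least one cell must have a positive weight")
    return [total * w / weight_sum for w in weights]


# Hypothetical district of 4 grid cells with a census count of 1,000 people.
# Illustrative weights for: lake, steep slope, farmland, town.
weights = [0.0, 0.0, 1.0, 4.0]

even = peanut_butter(1000, len(weights))
weighted = jello_mold(1000, weights)

print(even)      # [250.0, 250.0, 250.0, 250.0]
print(weighted)  # [0.0, 0.0, 200.0, 800.0]
```

Both methods distribute the same 1,000 people; only the weighted version keeps them out of the lake and off the slope. In practice the weights would be derived from terrain, land cover and facility data rather than typed in by hand.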
Rose likened this approach to a conference room with a table, chairs and a podium.
“People are more likely to be in those chairs around the table, possibly someone up at the podium, than they would be in the remainder of the space in the conference room,” she said. “So, people aren’t equally distributed throughout the room, but we can intelligently estimate where they might be based on the characteristics of the room.”
Not all conference rooms — or countries — are alike
The team quickly realized that modeling global population distribution was not a one-size-fits-all proposition. A single jello mold would never capture the distribution worldwide.
“One of the main distinguishing characteristics of LandScan, as opposed to other population datasets that came along later, is that when we first started we realized that population distribution was certainly not a case where a single model would adequately capture the distribution world-wide,” said Eddie Bright, a retired ORNL scientist and one of the original developers of the LandScan program.
Bright added that some regions are dominated by nucleated, or clustered, settlements while others may be more linear along rivers and roads. “So, we developed separate models for each country based upon both the cultural and natural factors present for that country as well as the quality of data that was available for each nation.”
LandScan Global was launched in 1998 and made available to the U.S. government and mission partners, offering the first global population distribution dataset capturing the full potential activity space of people over the course of 24 hours.
“The initial release was a pretty big deal because it was the first comprehensive population dataset of its kind,” Rose said, “and it was also the first to specifically consider populations at risk — whether from man-made events or natural disasters.”
It didn’t take long for LandScan, which won an R&D 100 Award in 2006, to show its value and life-saving impact. By 2003, the database was being used to aid recovery efforts after an earthquake in Iran, and a year later after the tsunamis that devastated Sri Lanka and Indonesia, giving first responders and relief workers quick insight into where potential victims were and how best to reach them.
“If you have a good sense of where people were at the time of an event, you have a very educated guess as to how many people may have been affected in a particular location,” said Marie Urban, Human Geography group leader at ORNL. “From there, you can adjust the level of response and assistance for the greatest impact.”
Expansion, enhancement and geospatial innovation
As technologies improved, compute capabilities advanced, and the amount of data increased, the LandScan team refined their models and branched off on related efforts. In 2004, they built LandScan USA, an enhanced version of LandScan Global that offers day and night population variations for the entire U.S. at a much finer spatial resolution — approximately 90 meters. And the ongoing rollout of LandScan HD provides similar granularity for additional countries and regions.
Along the way, the team has developed tools and resources that support a variety of other research missions, and those missions have in turn fed back into LandScan, helping ORNL remain a leader in geospatial science discovery and innovation.
“We take advantage of work done at ORNL in GeoAI, which enables us to incorporate automated building feature extraction, as well as population density tables, which provide estimates of people per 1,000 square feet for different types of facilities globally,” Rose said.
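The density tables Rose describes lend themselves to simple arithmetic: divide a building’s floor area by 1,000 and multiply by the density for its facility type. The sketch below assumes hypothetical density values and building sizes chosen only for illustration; they are not LandScan figures, and `estimate_occupancy` is an invented helper name.

```python
# Hypothetical density table: people per 1,000 square feet by facility type.
# These values are made up for illustration; they are not LandScan's figures.
DENSITY_PER_1000_SQFT = {
    "office": 4.0,
    "school": 20.0,
    "warehouse": 0.5,
}


def estimate_occupancy(facility_type: str, floor_area_sqft: float) -> float:
    """Estimate occupants as (floor area / 1,000) x density for that facility type."""
    return floor_area_sqft / 1000 * DENSITY_PER_1000_SQFT[facility_type]


# Hypothetical buildings as (facility type, floor area in square feet).
buildings = [("office", 25_000), ("school", 40_000), ("warehouse", 80_000)]

# 25 x 4 + 40 x 20 + 80 x 0.5 = 100 + 800 + 40
total = sum(estimate_occupancy(kind, area) for kind, area in buildings)
print(total)  # 940.0
```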
Similarly, a project to develop foundational critical infrastructure data for the U.S. Department of Homeland Security grew out of LandScan USA, and much of ORNL’s machine learning and computer vision research for settlement mapping has been supported by the LandScan program.
“We are very conscious of the relationships among these different research efforts,” Urban said. “Because they’re all funded by the government, we are constantly sharing our results and re-integrating to get the best value out of each taxpayer dollar.”
The team is excited to make this groundbreaking population data free and open to the global public. The National Geospatial-Intelligence Agency, which funds the LandScan program, has worked to make the datasets publicly available, in particular to help the humanitarian community. The COVID-19 pandemic added urgency to the efforts.
“We had always been talking about trying to make these open-source datasets, but when COVID came around, it presented a very clear case that more people needed access to good population data,” Rose said.
The LandScan dataset has been cited in all kinds of research outside ORNL, including disaster preparedness and recovery planning, urban design and walkability plans, and facility siting studies. With the new open-source license and unrestricted transfer of this technology to the public, the possibilities seem endless.
“It’s really amazing to me how far this data has reached, across multiple government agencies, the humanitarian community and international partners,” Urban said. “And the usage and impacts on people’s lives will absolutely grow because of this public release. I’m excited to see what people might do with it.”
UT-Battelle manages ORNL for the Department of Energy’s Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.