The National Institutes of Health has awarded $3.2 million to the University of Minnesota’s Minnesota Population Center for a multi-year project to expand the Integrated Public Use Microdata Series. The project will add demographic and geographic data on the entire U.S. population from 1790 to 1930, more than 600 million persons, to the IPUMS, quadrupling the quantity of U.S. census microdata available for scientific research.
These new data, which will allow for longitudinal surveys that trace individuals and families over a much longer period of time than was previously possible, will have far-reaching implications for research across the social and behavioral sciences as well as medicine and healthcare.
The project is made possible through the donation of a database of U.S. census information from 1790 to 1930 by two of the nation’s largest genealogical organizations: Ancestry.com and FamilySearch. These data, which the two organizations independently collected and digitized over the past decade and combined in 2012, offer the earliest information available on key social and economic characteristics.
The data cover entire populations with full geographic detail, providing contextual information on neighborhood characteristics, including ethnic composition, demographic behavior and population mobility. Until now, the demographic data available to researchers for this historical period were limited. The MPC project will leverage cutting-edge technology to convert this immense body of raw data into a format suitable for scientific analysis.
According to Steve Ruggles, MPC executive director, this evidence sheds new light on previously held theories regarding mortality, fertility, migration and other demographic and health-related trends in the U.S.
For example, social theorists previously believed that ever-increasing geographic mobility contributed to the growth of new institutions in the U.S. in the twentieth century. However, a sampling of these new data reveals that mobility actually declined dramatically from the mid-nineteenth century to the mid-twentieth, both providing new information about our society and overturning previously held assumptions.
“Revealing the causal mechanisms underlying the massive structural shifts of the last century is crucial for understanding the forces that are now shaping twenty-first century society,” said Ruggles. “These massive data collections represent a permanent and substantial addition to the nation’s statistical infrastructure.”
The project also reflects a larger trend in research toward the use of informatics, the process of examining large amounts of varied data (also known as “big data”) to uncover hidden patterns, unknown correlations and other useful information. For example, a medical researcher might use informatics to develop cures for certain types of cancer by examining large amounts of genetic data or data on social or demographic behavior.
Vice President for Research Brian Herman sees great promise in the use of informatics to enhance research excellence and to promote collaboration across disciplines and institutions.
“This NIH award represents a significant contribution to our understanding of American social behavior and health before 1930 that will have a direct impact on academic, scientific and clinical research,” said Herman. “Through this project and work in other areas, such as biomedical informatics, the U of M is leading the way in big data by increasing capacity and building a robust informatics infrastructure that will create a cutting-edge research ecosystem.”