The ability to collect and analyze massive amounts of data is rapidly transforming science, industry and everyday life, but what we have seen so far is likely just the tip of the iceberg. Many of the benefits of "Big Data" have yet to surface because interoperability is lacking, tools are missing and hardware is still evolving to meet the diverse needs of scientific communities.
One of the National Science Foundation's (NSF) priority goals is to improve the nation's capacity in data science by investing in the development of infrastructure, building multi-institutional partnerships to increase the number of U.S. data scientists, and making data more useful and easier to use.
As part of that effort, NSF today announced $31 million in new funding to support 17 innovative projects under the Data Infrastructure Building Blocks (DIBBs) program. The program is now in its second year, and the 2014 DIBBs awards support research in 22 states and touch on research topics in computer science, information technology and nearly every field of science supported by NSF.
"Developed through extensive community input and vetting, NSF has an ambitious vision and strategy for advancing scientific discovery through data," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This vision requires a collaborative national data infrastructure that is aligned to research priorities and that is efficient, highly interoperable and anticipates emerging data policies."
This year's data cyberinfrastructure awards build capacity and capability across the nation and across research communities and complement previous awards.
"Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," Qualters said. "This assures that solutions will be applied and use-inspired."
NSF sees these building blocks as digital components that can be joined together to develop the foundations for a robust data infrastructure. The building blocks encompass hardware, software and networking tools, as well as the communities and people who manage data and who are the practitioners of data science.
Of the 17 awards, two support early implementations of research projects that are more mature; the others support pilot demonstrations. Each is a partnership between researchers in computer science and other science domains.
One of the two early implementation grants will support a research team led by Geoffrey Fox, a professor of computer science and informatics at Indiana University. Fox's team plans to create middleware and analytics libraries to allow data science to work at large scale on high-performance computing systems (also known as supercomputers).
Fox and his interdisciplinary team plan to test their platform with several different applications, including those used in geospatial information systems (GIS), biomedicine, epidemiology and remote sensing.
"Our innovative architecture integrates key features of open source cloud computing software with supercomputing technology," Fox said. "And our outreach involves 'data analytics as a service' with training and curricula set up in a Massive Open Online Course or MOOC."
Other institutions collaborating on the project include: Arizona State University, Emory University, Rutgers University, University of Kansas, University of Utah and Virginia Tech.
The other early implementation project is led by Ken Koedinger, professor of human-computer interaction and psychology at Carnegie Mellon University. Whereas Fox's team focuses on problems in sensing and the life sciences, Koedinger's team concentrates on developing infrastructure that will drive innovation in education.
The team will develop a distributed data infrastructure called LearnSphere that will make more educational data accessible to course developers, while also motivating more researchers and companies to share their data with the greater learning sciences community. LearnSphere will include a graphical user interface, a library of analytical methods and a wide variety of educational data gathered from such sources as interactive tutoring systems, educational games and MOOCs.
"We've seen the power that data has to improve performance in many fields, from medicine to movie recommendations," Koedinger said. "Educational data holds the same potential to guide the development of courses that enhance learning while also generating even more data to give us a deeper understanding of the learning process."
Other institutions collaborating on this project include: MIT, Stanford University and the University of Memphis.
The DIBBs program awarded each early implementation project $5 million over 5 years.
The second group of awards supports pilot demonstrations. These build upon the advanced cyberinfrastructure capabilities of existing research communities to address specific challenges in science and engineering research, and they extend those data capabilities to meet broad community needs. The awards provide $1.5 million over 3 years.
Among the projects supported by DIBBs awards are efforts to develop cyberinfrastructure to visualize geochronological data, such as uranium dating of corals (College of Charleston); to capture and curate data for materials science research (University of Illinois Urbana-Champaign); and to manage data emerging from the Laser Interferometer Gravitational-wave Observatory, or LIGO (Syracuse University).
The DIBBs program is part of a coordinated strategy within NSF to advance data-driven cyberinfrastructure. It complements other major efforts including the DataONE project, the Research Data Alliance and Wrangler, a groundbreaking data analysis and management system for the national open science community.
2014 NSF DIBBs Awards
Aaron Dubrow, NSF, (703) 292-4489
Irene Qualters, NSF, (703) 292-2339
Data Infrastructure Building Blocks: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504776
The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2014, its budget is $7.2 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives about 50,000 competitive requests for funding, and makes about 11,500 new funding awards. NSF also awards about $593 million in professional and service contracts yearly.