A data-code-compute resource for research and education in information visualization

National Science Foundation (NSF) Grant Award Data

Description | Origins | Data Format and Size | Data Quality | Data Cleaning | Acknowledgments


Description

The National Science Foundation (NSF) funds research and education in science and engineering. It does this through grants, contracts, and cooperative agreements to more than 2,000 colleges, universities, and other research and/or education institutions in all parts of the United States. The Foundation accounts for about 20 percent of federal support to academic institutions for basic research.

Each year, NSF receives approximately 30,000 new or renewal support proposals for research, graduate and postdoctoral fellowships, and math/science/engineering education projects; it makes approximately 9,000 new awards. These typically go to universities, colleges, academic consortia, nonprofit institutions, and small businesses.


Origins

In June 2004, a comprehensive set of NSF award data was downloaded from the NSF website, in XML format, via the NSF Award Search. A procedure for automatic retrieval of new award information was put in place at this time. Through the NSF interface, results may be downloaded in sets of 1000 records, in Excel, XML or CSV format. Full records go back as far as 1960. Less complete records from 1952-1975, which are deemed 'historical', are also available through this search mechanism.

NSF's Budget Internet Information System provides general statistics on NSF funding.


Data Format and Size

Raw Data:

Please use the NSF Award Search to familiarize yourself with this data set.

According to the NSF website, the award numbering scheme is as follows:

NSF award numbers are 7-digit numbers. The first two digits represent the last two digits of the NSF fiscal year in which the award was made (NSF was created in 1950, but there are no awards in this database before 1970).
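The numbering rule can be expressed as a small helper. This is a minimal sketch, not an NSF tool: the function name `award_fiscal_year` is hypothetical, and because the prefix holds only two digits, the century has to be inferred. The cut-off used here (prefixes 50-99 read as 19xx, 00-49 as 20xx) is an assumption based on NSF's 1950 founding.

```python
from typing import Union

# Assumed century pivot: NSF was created in 1950, so a two-digit prefix of
# 50-99 is read as 19xx and 00-49 as 20xx. Not part of NSF's published scheme.
CENTURY_PIVOT = 50


def award_fiscal_year(award_number: Union[str, int]) -> int:
    """Return the NSF fiscal year encoded in a 7-digit award number."""
    # Award numbers stored as integers lose leading zeros (0012345 -> 12345),
    # so normalize to a zero-padded 7-character string first.
    digits = str(award_number).zfill(7)
    if len(digits) != 7 or not digits.isdigit():
        raise ValueError(f"not a 7-digit award number: {award_number!r}")
    yy = int(digits[:2])  # last two digits of the fiscal year
    return (1900 if yy >= CENTURY_PIVOT else 2000) + yy
```

For example, `award_fiscal_year("0312345")` returns 2003 and `award_fiscal_year(8912345)` returns 1989.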

To search within a given year, see these guidelines:

The NSF search engine treats the award number as just a number, so if you search for award numbers greater than "0000001", you will get all the awards, not just those made since fiscal year 2000 began. If you want all the awards in fiscal year 2000, you could search for "award numbers between 0000001 and 0099999". Better yet, use the Award Search's "start date" searches, which handle dates as dates and avoid such surprises.
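The same property explains the range trick: read as a number, an award made in fiscal year YY falls between YY00000 and YY99999. The sketch below (function names are hypothetical) computes those bounds and filters a local list of numeric award numbers; for fiscal year 2000 it reproduces the guideline's lower bound of 0000001.

```python
from typing import Iterable, List, Tuple


def fiscal_year_award_range(fiscal_year: int) -> Tuple[int, int]:
    """Inclusive numeric bounds of award numbers for one fiscal year.

    The first two of the seven digits hold the year, so fiscal year 2003
    spans 0300000-0399999 when the award number is read as an integer.
    """
    yy = fiscal_year % 100
    # Mirror the guideline's lower bound of 0000001 for fiscal year 2000.
    low = max(yy * 100_000, 1)
    high = yy * 100_000 + 99_999
    return low, high


def awards_in_fiscal_year(award_numbers: Iterable[int], fiscal_year: int) -> List[int]:
    """Keep only the numeric award numbers that belong to one fiscal year."""
    low, high = fiscal_year_award_range(fiscal_year)
    return [n for n in award_numbers if low <= n <= high]
```

A one-sided bound such as "greater than 0000001" has no upper limit, which is why it matches every award; the two-sided range restricts the results to one year.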

XML Data Fields (field name: Oracle-style column type):

  • NSF Org: varchar2(4000)
  • Date Last Amended: date
  • Award Number: number
  • Award Instrument: varchar2(4000)
  • Date Started: date
  • Date Expires: date
  • Expected Total Amount: number
  • Data Is Ok: char
  • Program: varchar2(500)
  • Field Application: varchar2(500)
  • Element Code: number
  • Program Ref Code: varchar2(4)
  • Abstract: clob
  • (Program Manager) Name: varchar2(1000)
  • (Program Manager) Division: varchar2(1000)
  • (Program Manager) Directorate: varchar2(1000)
  • (Sponsor) Institution Name: varchar2(1000)
  • (Sponsor) Address: varchar2(1000)
  • (Sponsor) City: varchar2(1000)
  • (Sponsor) State: varchar2(1000)
  • (Sponsor) Zipcode1: number
  • (Sponsor) Zipcode2: number
  • (Sponsor) Country: varchar2(1000)
  • (Sponsor) Phone: number
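Because Zipcode1 and Zipcode2 are stored as numbers, ZIP codes with leading zeros (for example 02139) lose those zeros. The sketch below rebuilds a ZIP string from the two fields. It assumes that Zipcode1 holds the 5-digit base ZIP and Zipcode2 the optional 4-digit ZIP+4 extension; that interpretation and the helper name `format_zip` are assumptions, not documented by NSF.

```python
from typing import Optional


def format_zip(zipcode1: Optional[int], zipcode2: Optional[int] = None) -> Optional[str]:
    """Rebuild a US ZIP or ZIP+4 string from the two numeric Zipcode fields.

    Assumption: Zipcode1 is the 5-digit base ZIP and Zipcode2 the 4-digit
    ZIP+4 extension. Numeric storage drops leading zeros, so pad them back.
    """
    if zipcode1 is None:
        return None  # e.g. a foreign sponsor without a US ZIP code
    base = f"{int(zipcode1):05d}"
    if zipcode2 is None:
        return base
    return f"{base}-{int(zipcode2):04d}"
```

For example, `format_zip(2139, 4307)` returns `"02139-4307"`.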

Statistics:

See this Excel file for statistics on the records in our NSF datasets.

Storage Space Required:

1985-2005
  • Number of Entries: ~181,132
  • Estimated size: 400 MB

1960-1984 (non-historical)
  • Number of Entries: ~85,649
  • Estimated size: 200 MB
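Dividing the estimated sizes by the entry counts gives a rough per-record footprint, which can help when planning storage for a similar download. This is a minimal sketch that assumes decimal megabytes (1 MB = 1,000,000 bytes); the page does not say which convention its estimates use.

```python
# Storage estimates from the table above: (number of entries, estimated size in MB).
SUBSETS = {
    "1985-2005": (181_132, 400),
    "1960-1984 (non-historical)": (85_649, 200),
}
BYTES_PER_MB = 1_000_000  # assumption: decimal megabytes


def bytes_per_record(entries: int, size_mb: float) -> float:
    """Average storage per award record, in bytes."""
    return size_mb * BYTES_PER_MB / entries


total_entries = sum(entries for entries, _ in SUBSETS.values())
total_mb = sum(size_mb for _, size_mb in SUBSETS.values())

for name, (entries, size_mb) in SUBSETS.items():
    print(f"{name}: ~{bytes_per_record(entries, size_mb) / 1000:.1f} KB per record")
print(f"Combined: {total_entries:,} entries, ~{total_mb} MB")
```

Under this assumption both subsets come to roughly 2.2 to 2.3 KB per record, and about 600 MB for the 266,781 entries combined.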


Data Quality
  • Some titles and abstracts contain ":", "|", or other characters that could be mistaken for field delimiters.
  • Some abstracts are corrupted by a field-header string, such as a line containing "Investigator:".
  • Many abstracts (at least 384) contain portions of raw, non-text files.
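Because titles and abstracts may contain ":" or "|", splitting exported records on a bare delimiter is unsafe. One option, sketched below with Python's standard `csv` module, is to let a CSV writer quote any field that contains the delimiter, the quote character, or a line break, so the data round-trips intact. The column names in `FIELDS` and the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List

FIELDS = ["award_number", "title", "abstract"]  # hypothetical column subset


def to_pipe_delimited(records: List[Dict[str, str]]) -> str:
    """Serialize records with '|' as the delimiter.

    QUOTE_MINIMAL wraps a field in double quotes whenever it contains the
    delimiter, the quote character, or a line-break character.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="|",
                            quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def from_pipe_delimited(text: str) -> List[Dict[str, str]]:
    """Parse the output of to_pipe_delimited back into records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="|")]
```

A title such as `Ratio a:b | c` is written as a quoted field and read back unchanged, rather than being split into two columns.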

To see how foreign sponsor addresses are formatted by NSF, please view this list of NSF sponsors lacking a US state abbreviation and the list of country abbreviations used in NSF records.


Data Cleaning

Files with corrupt abstracts are being flagged.
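The flagging step can be approximated with two heuristics that match the Data Quality notes: a line that begins like a field header, and a high share of control characters left over from raw non-text files. This is a sketch under stated assumptions, not the procedure actually used to flag these records. The header names other than "Investigator:" and the 5% threshold are hypothetical, and legitimate non-ASCII letters (such as "ö") are not counted as debris.

```python
import re
from typing import List

# Only "Investigator:" is documented above; the other header names are
# hypothetical examples drawn from the field list and may need adjusting.
FIELD_HEADER_RE = re.compile(
    r"^\s*(?:Investigator|Program Manager|Sponsor)\s*:", re.MULTILINE
)


def flag_abstract(abstract: str, max_non_text: float = 0.05) -> List[str]:
    """Return the reasons (possibly none) for treating an abstract as corrupt.

    max_non_text is a hypothetical threshold: the share of control or
    replacement characters above which raw binary content is assumed.
    """
    reasons = []
    if FIELD_HEADER_RE.search(abstract):
        reasons.append("field-header string")
    if abstract:
        # Tabs and line breaks are normal text; other non-printable characters
        # and U+FFFD (the Unicode replacement character) suggest binary debris.
        non_text = sum(
            1 for ch in abstract
            if (not ch.isprintable() and ch not in "\t\n\r") or ch == "\ufffd"
        )
        if non_text / len(abstract) > max_non_text:
            reasons.append("non-text content")
    return reasons
```

Returning a list of reasons, rather than a single boolean, keeps track of why each record was flagged when reviewing them later.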


Acknowledgments

We would like to thank Michael Pazzani, Division Director, Information and Intelligent Systems, at NSF for making the 1990-2003 NSF data set available to us in September 2003.
Rahul Doshi made a first attempt to download records from 1985-1989 through the NSF FastLane interface during the summer of 2002. This data set description was compiled by Andrew Bangert, Ruchi Kapoor, Katy Börner, and Caroline Courtney.

Information Visualization CyberInfraStructure @ SLIS, Indiana University
Last Modified June 04, 2004