A data-code-compute resource for research and education in information visualization

Community of Science Data (COS)



Description

The Community of Science (COS) is a global resource for scientific funding opportunities. As of June 8, 2004, more than 23,000 records, representing nearly 400,000 opportunities worth over $33 billion, can be searched.


Origins

In summer 2002, COS Funding Opportunities data was retrieved from http://fundingopps.cos.com/. Since June 2004, COS data has been harvested on a continuous basis.


Data Format and Size

Raw Data: Use the COS Search to get familiar with this data set.

Data Fields:
There are 28 attributes for each COS entry.

COS Unique Id number
Title varchar2(2550)
Keywords varchar2(255)
Funding Type varchar2(1000)
Sponsor Institution Name varchar2(1000)
Sponsor Institution Address varchar2(1000)
Sponsor Institution City varchar2(1000)
Sponsor Institution State varchar2(1000)
Sponsor Institution Zip Code 1 number
Sponsor Institution Zip Code 2 number
Sponsor Institution Phone number varchar2(512)
Sponsor Type varchar2(1000)
Deadline date
Deadline Note varchar2(1000)
Amount number
Upper Amount clob
Amount Note clob
Eligibility clob
Citizenship or Residency varchar2(255)
Activity Location varchar2(1000)
Requirements varchar2(1000)
Abstract clob
Contact First Name varchar2(255)
Contact Middle Name varchar2(255)
Contact Last Name varchar2(255)
Contact Address varchar2(500)
Contact City varchar2(255)
Contact State varchar2(100)
Contact Zip Code number
Contact Country varchar2(1000)
Contact Phone varchar2(255)
Contact Fax varchar2(255)
Contact Email varchar2(1000)
URL for more information varchar2(1000)
Date Last Revised date
URL from COS to Bookmark this record varchar2(1000)

Statistics:
See the Excel file for details on the number of records and the number of empty fields in the data harvested in 2002.
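The kind of per-field statistic mentioned above (how many records leave a field empty) can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the Excel file: the function name `empty_field_stats`, the dictionary-based record layout, and the sample values are all assumptions made up for the example.

```python
from collections import Counter


def empty_field_stats(records, field_names):
    """Return {field: (empty_count, percent_empty)} for a list of dict records.

    A field counts as empty when it is missing, None, or only whitespace.
    """
    total = len(records)
    empty = Counter()
    for rec in records:
        for name in field_names:
            value = rec.get(name)
            if value is None or str(value).strip() == "":
                empty[name] += 1
    return {
        name: (empty[name], 100.0 * empty[name] / total if total else 0.0)
        for name in field_names
    }


# Invented sample records, using three of the field names listed above.
sample = [
    {"Title": "Grant A", "Keywords": "", "Contact Fax": None},
    {"Title": "Grant B", "Keywords": "biology", "Contact Fax": None},
]
stats = empty_field_stats(sample, ["Title", "Keywords", "Contact Fax"])
for name, (count, percent) in stats.items():
    print(f"{name}: {count} empty ({percent:.1f}%)")
```

Treating whitespace-only values as empty is a design choice of this sketch; a stricter count would test for None only.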

Storage Space Required:
Number of Entries: 38,154. With about 5,000 new records per month, we estimate that 100 MB of disk space will suffice to store the raw data for 2002-2005.


Data Quality

Seven records had an incorrect number of fields. Five had too few (two of these were duplicate records), and two had too many, apparently because each was a concatenation of parts of two different records.

Every short record was missing exactly one field, the last one ("Funding Type"). Some columns also contain missing (NULL) data; see number/percentage of null fields.
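A field-count check of the kind described in this section could look roughly like the following Python sketch. The raw file layout is not documented here, so the tab delimiter, the one-record-per-line assumption, and the name `classify_records` are hypothetical; the demo uses three fields per record only to keep the example short.

```python
def classify_records(lines, expected, delimiter="\t"):
    """Group 1-based line numbers by field-count problem.

    Assumes one record per line and a single-character delimiter
    (both are assumptions; the real raw format may differ).
    A line can appear under "duplicates" and under a count category.
    """
    report = {"ok": [], "too_few": [], "too_many": [], "duplicates": []}
    seen = set()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line in seen:
            report["duplicates"].append(lineno)
        seen.add(line)
        n_fields = len(line.split(delimiter))
        if n_fields < expected:
            report["too_few"].append(lineno)
        elif n_fields > expected:
            report["too_many"].append(lineno)
        else:
            report["ok"].append(lineno)
    return report


# Invented demo data with 3 expected fields per record.
demo = [
    "1\tTitle A\tGrant",
    "2\tTitle B",                          # short by one field
    "2\tTitle B",                          # exact duplicate of the line above
    "3\tTitle C\tGrant\t4\tTitle D",       # looks like two records run together
]
report = classify_records(demo, expected=3)
print(report)
```

Reporting duplicates separately from the count categories mirrors the situation described above, where two of the short records were also duplicates.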


Data Cleaning

Of the records with an incorrect number of fields, the two duplicate records were deleted, and Funding Type information was added to two other records for which it was available online (via the Bookmark URL). Information on the remaining faulty records could not be found online, so those records could not be corrected.
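The cleaning steps above (drop duplicates, fill in a missing Funding Type where it could be looked up, set aside what cannot be fixed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the original cleaning script: records are assumed to be lists of strings with the unique ID first and Funding Type last, and `funding_type_lookup` is a hypothetical dictionary standing in for values found by hand via the Bookmark URL.

```python
def clean_records(rows, expected, funding_type_lookup):
    """Return (cleaned, unresolved) lists of records.

    rows: records as lists of strings, unique ID assumed to be first.
    expected: number of fields a well-formed record should have.
    funding_type_lookup: hypothetical {record_id: funding_type} mapping
        holding manually researched values for short records.
    """
    cleaned, unresolved = [], []
    seen = set()
    for row in rows:
        key = tuple(row)
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        if len(row) == expected:
            cleaned.append(list(row))
        elif len(row) == expected - 1 and row[0] in funding_type_lookup:
            # Short by one field: append the looked-up Funding Type.
            cleaned.append(list(row) + [funding_type_lookup[row[0]]])
        else:
            unresolved.append(list(row))  # cannot be repaired automatically
    return cleaned, unresolved


# Invented demo data with 3 expected fields per record.
rows = [
    ["1", "A", "Grant"],
    ["2", "B"],
    ["2", "B"],
    ["3", "C"],
    ["4", "D", "Grant", "5", "E"],
]
cleaned, unresolved = clean_records(rows, expected=3,
                                    funding_type_lookup={"2": "Fellowship"})
print(cleaned)
print(unresolved)
```

Unrepairable records are returned separately rather than discarded, since the text does not say whether they were kept in the data set.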


Acknowledgements

This data set description was compiled by Jay Askren, Saiful Bahari, Chris Friend, Katy Börner and Caroline Courtney.

Information Visualization CyberInfraStructure @ SLIS, Indiana University
Last Modified June 04, 2004