Community of Science Data (COS)
Description
The Community of Science (COS) is a
global resource for scientific funding opportunities. As of June 8, 2004,
more than 23,000 records, representing nearly 400,000 opportunities worth
over $33 billion, can be searched.
Origins
In summer 2002, COS Funding Opportunities data was retrieved from http://fundingopps.cos.com/.
Since June 2004, COS data has been harvested on a continuous basis.
Data Format and Size
Raw Data: Use the COS
Search to get familiar with this data set.
Data Fields:
There are 28 attributes for each COS entry.
COS Unique Id number
Title varchar2(2550)
Keywords varchar2(255)
Funding Type varchar2(1000)
Sponsor Institution Name varchar2(1000)
Sponsor Institution Address varchar2(1000)
Sponsor Institution City varchar2(1000)
Sponsor Institution State varchar2(1000)
Sponsor Institution Zip Code 1 number
Sponsor Institution Zip Code 2 number
Sponsor Institution Phone number varchar2(512)
Sponsor Type varchar2(1000)
Deadline date
Deadline Note varchar2(1000)
Amount number
Upper Amount clob
Amount Note clob
Eligibility clob
Citizenship or Residency varchar2(255)
Activity Location varchar2(1000)
Requirements varchar2(1000)
Abstract clob
Contact First Name varchar2(255)
Contact Middle Name varchar2(255)
Contact Last Name varchar2(255)
Contact Address varchar2(500)
Contact City varchar2(255)
Contact State varchar2(100)
Contact Zip Code number
Contact Country varchar2(1000)
Contact Phone varchar2(255)
Contact Fax varchar2(255)
Contact Email varchar2(1000)
URL for more information varchar2(1000)
Date Last Revised date
URL from COS to Bookmark this record varchar2(1000)
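The declared lengths above can be used as a simple sanity check before harvested records are loaded. The following is a minimal sketch, not the project's actual loader: it covers only a few of the listed fields, and the names `FIELD_LIMITS` and `oversized_fields` are hypothetical.

```python
# Maximum lengths copied from the varchar2(...) declarations listed above.
# Only a handful of fields are shown; the dictionary name is hypothetical.
FIELD_LIMITS = {
    "Keywords": 255,
    "Funding Type": 1000,
    "Sponsor Institution Name": 1000,
    "Contact Address": 500,
    "Contact State": 100,
}


def oversized_fields(record):
    """Return the names of fields whose text exceeds the declared limit.

    `record` maps field names to string values (or None for NULL).
    The check counts characters; a database may measure varchar2 lengths
    in bytes, so treat the result as approximate for non-ASCII text.
    """
    too_long = []
    for name, limit in FIELD_LIMITS.items():
        value = record.get(name)
        if value is not None and len(value) > limit:
            too_long.append(name)
    return too_long
```

Calling `oversized_fields(record)` on each harvested record gives a list of field names to inspect by hand; an empty list means every checked field fits its declared length.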
Statistics:
See Excel file for details on the number
of records and the number of empty fields for the data harvested in 2002.
Storage Space Required:
Number of Entries: 38,154. With about 5,000 new records per month, we estimate
that 100 MB of disk space will suffice to store the raw data for 2002-2005.
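The storage estimate can be checked with a little arithmetic. The sketch below computes the average space available per record if the 100 MB budget were exactly used up. The entry count and monthly rate come from the text; the 18-month harvesting horizon (mid-2004 through 2005) and the function name are assumptions made for illustration.

```python
def implied_bytes_per_record(budget_mb, existing_records, new_per_month, months):
    """Average bytes available per record if the budget is exactly used up."""
    total_records = existing_records + new_per_month * months
    budget_bytes = budget_mb * 1024 * 1024
    return budget_bytes / total_records


# 38,154 existing entries and about 5,000 new records per month come from
# the text; the 18-month horizon (mid-2004 through 2005) is an assumption.
per_record = implied_bytes_per_record(100, 38_154, 5_000, 18)
print(f"about {per_record:.0f} bytes per record")
```

Under these assumptions the budget works out to roughly 800 bytes per record, so the 100 MB figure holds only if the average raw record stays below that size.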
Data Quality
A total of seven records had an incorrect number of fields. Five had too
few fields; two of those five were duplicate records. The other two had
too many fields and appeared to be concatenations of parts of two
different records.
Every short record was missing exactly one field, the last one
("Funding Type"). Some columns also contain NULL data; see
number/percentage of null fields.
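A field-count check of the kind described above can be sketched as follows. This is an illustration rather than the procedure actually used: the raw file format is not stated here, so the tab delimiter is an assumption, and `field_count_report` is a hypothetical name.

```python
from collections import Counter


def field_count_report(lines, expected_fields, delimiter="\t"):
    """Report how many fields each record has.

    Returns (histogram, bad), where `histogram` counts records per field
    count and `bad` lists (line_number, field_count) for every record
    whose count differs from `expected_fields`. Line numbers are 1-based.
    """
    histogram = Counter()
    bad = []
    for line_number, line in enumerate(lines, start=1):
        # Strip only the newline so trailing empty fields are still counted.
        count = len(line.rstrip("\n").split(delimiter))
        histogram[count] += 1
        if count != expected_fields:
            bad.append((line_number, count))
    return histogram, bad
```

Records that are one field short and records that are too long both end up in `bad`, which keeps the list of rows to inspect by hand small.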
Data Cleaning
Of the seven records with an incorrect number of fields, the two duplicate
records were deleted, and Funding Type information was added to two other
records for which it was available online (via the Bookmark
URL). Information on the remaining faulty records could not be
found online, so those records could not be corrected.
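The cleaning steps described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the original procedure: records are represented as dictionaries, only exact duplicates are detected, and `clean_records` and `funding_type_lookup` are hypothetical names. The lookup stands in for Funding Type values found by hand, for example via the Bookmark URL.

```python
def clean_records(records, funding_type_lookup):
    """Drop exact duplicates and fill in a missing Funding Type where known.

    `records` is a list of dicts; `funding_type_lookup` maps a COS Unique Id
    to a Funding Type that was looked up by hand.
    Returns (cleaned, unresolved_ids).
    """
    seen = set()
    cleaned = []
    unresolved = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue  # exact duplicate of an earlier record
        seen.add(key)
        fixed = dict(record)  # copy so the input is left untouched
        if not fixed.get("Funding Type"):
            found = funding_type_lookup.get(fixed.get("COS Unique Id"))
            if found:
                fixed["Funding Type"] = found
            else:
                unresolved.append(fixed.get("COS Unique Id"))
        cleaned.append(fixed)
    return cleaned, unresolved
```

Records whose Funding Type cannot be found stay in the cleaned list unchanged, and their ids are returned separately so they can be reported as uncorrected.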
Acknowledgements
This data set description was compiled by Jay Askren, Saiful Bahari,
Chris Friend, Katy Börner and Caroline Courtney.
Information Visualization CyberInfraStructure @ SLIS, Indiana University
Last Modified June 04, 2004