Public Data Set Concepts
Amazon EC2 provides a repository of public data sets that can be integrated into AWS cloud-based applications. Amazon stores the data sets at no charge to the community. As with all AWS services, users pay only for the compute and storage they use for their own applications.
Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now anyone can access these data sets from an Amazon EC2 instance and start computing on the data within minutes. Users can also take advantage of the broader AWS ecosystem and collaborate with other AWS users. For example, users can create or reuse prebuilt server images that include tools and applications for analyzing the data sets. By hosting this data on cost-efficient services such as Amazon EC2, AWS aims to give researchers across a variety of disciplines and industries the tools to innovate more quickly.
Note: For more information, go to the Public Data Sets page.
Available Public Data Sets
Public data sets are currently available in the following categories:
- Biology: includes the Human Genome Project, GenBank, and other content.
- Chemistry: includes multiple versions of PubChem and other content.
- Economics: includes census data, labor statistics, transportation statistics, and other content.
- Encyclopedic: includes Wikipedia content from multiple sources and other content.