Guest blog post by Vincent Granville
A few data sets are accessible from our data science apprenticeship web page.
- Source code and data for our Big Data keyword correlation API (see also the corresponding section in a separate chapter of our book)
- Great statistical analysis: forecasting meteorite hits (see also the corresponding section in a separate chapter of our book)
- Fast clustering algorithms for massive datasets (see also the corresponding section in a separate chapter of our book)
- A dataset of 53.5 billion clicks, available for benchmarking and testing
- Over 5,000,000 financial, economic and social datasets
- New pattern to predict stock prices, multiplying returns by a factor of 5 (stock market data, S&P 500; see also the corresponding section in a separate chapter of our book)
- 3.5 billion web pages: a hyperlink graph extracted from the Common Crawl 2012 web corpus, covering 3.5 billion web pages and the 128 billion hyperlinks between them
- Another large data set, with 250 million data points: the full-resolution GDELT event dataset, running from January 1, 1979 through March 31, 2013 and containing all data fields for each event record
- 125 Years of Public Health Data Available for Download
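To get a feel for the scale of two of the data sets listed above, here is a minimal back-of-the-envelope sketch in Python. It uses only the rounded figures quoted in the list (3.5 billion pages, 128 billion hyperlinks, 250 million GDELT data points, January 1, 1979 through March 31, 2013). The variable names are mine, and the sketch assumes that each GDELT data point is one event record and that the date range is inclusive, so treat the results as rough orders of magnitude rather than official statistics.

```python
from datetime import date

# Rounded figures as quoted in the list above (not exact counts).
WEB_PAGES = 3.5e9       # pages in the Common Crawl 2012 web graph
HYPERLINKS = 128e9      # hyperlinks between those pages
GDELT_EVENTS = 250e6    # data points in the GDELT event dataset
GDELT_START = date(1979, 1, 1)
GDELT_END = date(2013, 3, 31)

# Average number of hyperlinks per page in the web graph.
links_per_page = HYPERLINKS / WEB_PAGES

# Number of days covered by GDELT, counting both endpoints (assumption).
gdelt_days = (GDELT_END - GDELT_START).days + 1

# Average number of event records per day over that period.
events_per_day = GDELT_EVENTS / gdelt_days

print(f"Average hyperlinks per page: {links_per_page:.1f}")
print(f"Days covered by GDELT: {gdelt_days}")
print(f"Average GDELT events per day: {events_per_day:,.0f}")
```

Numbers like these help when deciding whether a data set fits in memory on one machine or calls for a distributed approach.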
You can find additional data sets at the Harvard University Data Science website. I was particularly interested in their LinkedIn data set. KDnuggets is also a great resource, and for more, check out this link.
Cross-disciplinary data repositories, data collections and data search engines:
Single datasets and data repositories