Dataset references

 

  • Amazon makes large data sets available on its Amazon Web Services platform.

  • The Million Song Dataset is a collection of audio features and metadata for a million contemporary popular music tracks.

  • Large-scale self-driving datasets: ApolloScape and BDD100K.

  • Waymo Open Dataset for Autonomous Driving 

  • Open Images Dataset, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. 

  • The CIFAR-10 / CIFAR-100 datasets (Canadian Institute For Advanced Research) are collections of images commonly used to train machine learning and computer vision algorithms.

  • The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.

  • GroupLens Research has collected and made available movie rating datasets from the MovieLens website.

  • FiveThirtyEight provides a good number of datasets on topics such as airline accidents, sports, and historical US weather.

  • The Twenty Newsgroups dataset contains Usenet newsgroup postings. It was curated by taking 1000 Usenet articles from each of 20 different newsgroups.

  • OGD (Open Government Data), a platform run by the Government of India, provides datasets on health, weather, agriculture, education, transport, etc.

  • The IIIT 5K-word dataset is a collection of images gathered using query words such as billboards, signboards, house numbers, house name plates, and movie posters. It contains 5000 cropped word images from scene texts and born-digital images.

  • The ApolloScape dataset, provided by Baidu, Inc., will include RGB videos with high-resolution images and per-pixel annotation, survey-grade dense 3D points with semantic segmentation, stereoscopic video, and panoramic images. It was developed to promote self-driving technologies.

  • Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset. It contains 1.5 million object instances across 80 object categories and 91 stuff categories, and is annotated with 5 captions per image.

  • Large Health Data Sets – Collection of large health-related datasets.

  • Enron Email Dataset – Enron email database, hosted by CMU, containing about 500 thousand emails between roughly 150 former Enron employees, mostly senior executives. It is often described as the only large public database of real emails, which adds to its value.

  • cricsheet.org – Ball-by-ball data for international and IPL cricket matches.
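
As an illustration for the MNIST entry above, here is a minimal sketch of reading a file in the IDX binary layout, the format in which MNIST is commonly distributed. The layout (a 4-byte header followed by big-endian dimension sizes and raw unsigned bytes) is an assumption about the download you have, not something stated in this list. The function name parse_idx and the synthetic sample are hypothetical and contain no real MNIST data.

```python
import struct


def parse_idx(data: bytes):
    """Parse an IDX-format byte string into (shape, flat list of byte values)."""
    # Header: two zero bytes, one type code, one dimension count.
    zero, dtype, ndim = struct.unpack(">HBB", data[:4])
    if zero != 0 or dtype != 0x08:
        raise ValueError("expected an unsigned-byte IDX file")
    # Each dimension size is a 4-byte big-endian unsigned integer.
    offset = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, data[4:offset])
    values = list(data[offset:])
    # Check that the payload length matches the product of the dimensions.
    expected = 1
    for size in shape:
        expected *= size
    if len(values) != expected:
        raise ValueError("payload length does not match header")
    return shape, values


# Synthetic stand-in for an image file: two 2x2 "images" (not real MNIST data).
sample = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))
shape, pixels = parse_idx(sample)
print(shape, len(pixels))
```

With a real training-image file, the shape would be expected to start with the 60,000 example count mentioned above, but verify this against the files you actually download.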
