Dataset References

- Amazon makes large data sets available on its Amazon Web Services (AWS) platform.
- The Million Song Dataset is a collection of audio features and metadata for a million contemporary popular music tracks.
- Large-scale self-driving datasets: ApolloScape and BDD100K.
- Waymo Open Dataset for autonomous driving.
- Open Images Dataset: a dataset of ~9 million image URLs annotated with labels spanning over 6,000 categories.
- The CIFAR-10 / CIFAR-100 datasets (Canadian Institute For Advanced Research) are collections of images commonly used to train machine learning and computer vision algorithms.
- The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.
- GroupLens Research has collected and made available movie rating datasets from the MovieLens website.
- FiveThirtyEight provides a good number of datasets on topics such as airline accidents, sports, and historical US weather.
- The Twenty Newsgroups dataset contains Usenet newsgroup posts: 1,000 articles taken from each of 20 different newsgroups (about 20,000 in total).
- OGD (Open Government Data) Platform India, run by the Government of India, provides datasets on health, weather, agriculture, education, transport, and more.
- The IIIT 5K-word dataset is a collection of query words from sources such as billboards, signboards, house numbers, house name plates, and movie posters. It contains 5,000 cropped word images from scene text and born-digital images.
- The ApolloScape dataset, provided by Baidu, Inc., includes RGB videos with high-resolution images and per-pixel annotation, survey-grade dense 3D points with semantic segmentation, stereoscopic video, and panoramic images. It was developed to promote self-driving technologies.
- Common Objects in Context (COCO) is a large-scale object detection, segmentation, and captioning dataset. It contains 1.5 million object instances across 80 object categories and 91 stuff categories, with 5 captions per image.
- Large Health Data Sets: a collection of large health-related datasets.
- Enron Email Dataset: the Enron email database contains about 500,000 emails between 150 former Enron employees, mostly senior executives. It is also the only large public database of real emails, which makes it especially valuable. It is hosted by CMU.
- cricsheet.org: ball-by-ball data for international and IPL cricket matches.