List of datasets for machine-learning research

These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less intuitively, the availability of high-quality training datasets. High-quality labeled training datasets for supervised and semi-supervised machine-learning algorithms are usually difficult and expensive to produce because of the large amount of time needed to label the data. Although they do not need to be labeled, high-quality unlabeled datasets for unsupervised learning can also be difficult and costly to produce.

Many organizations, including governments, publish and share their datasets, often using common metadata formats (such as Croissant). Datasets are classified by license into two groups: open data and non-open data.
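
The license-based grouping described above can be sketched in a few lines of Python. This is only a minimal illustration: the `DatasetRecord` type, the dataset names, and the choice of which license identifiers count as open are all assumptions made for the example, not part of any portal's actual schema or policy.

```python
from dataclasses import dataclass

# Assumption: an illustrative, non-exhaustive set of SPDX-style license
# identifiers treated as "open" for the purpose of this sketch.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0", "PDDL-1.0"}


@dataclass
class DatasetRecord:
    name: str     # hypothetical dataset name
    license: str  # license identifier attached to the dataset


def classify(records):
    """Split dataset names into 'open' and 'non-open' groups by license."""
    groups = {"open": [], "non-open": []}
    for record in records:
        key = "open" if record.license in OPEN_LICENSES else "non-open"
        groups[key].append(record.name)
    return groups


# Hypothetical records used only to exercise the function.
records = [
    DatasetRecord("example-text-corpus", "CC-BY-4.0"),
    DatasetRecord("example-speech-set", "Proprietary"),
]
groups = classify(records)
print(groups)
```

In practice, whether a given license qualifies as open depends on the definition a portal adopts, so the set above would need to be replaced with that portal's own list.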

Datasets from various governmental bodies are presented in List of open government data sites. These datasets are hosted on open data portals, where they can be searched, deposited, and accessed through interfaces such as Open API.[citation needed] They are organized into various types and subtypes.[citation needed]

Data portals are classified by their type of license. Portals based on open-source licenses are known as open data portals and are used by many government organizations and academic institutions.

A data portal sometimes lists a wide variety of dataset subtypes pertaining to many machine learning applications.

Data portals suited to a specific subtype of machine learning application are listed in the subsequent sections.

Text datasets are used for tasks such as natural language processing, sentiment analysis, translation, and cluster analysis.

Sound datasets consist of sounds and sound features used for tasks such as speech recognition and speech synthesis.
