How to Find a Dataset#

There are many excellent sources of open data online. You are free to use any of them. I would recommend the following.

Kaggle Datasets#

Kaggle is an online community platform for data scientists and machine learning enthusiasts.

  • Kaggle allows users to collaborate with other users, find and publish datasets, use GPU-integrated notebooks, and compete with other data scientists to solve data science challenges.

  • The aim of this online platform (founded in 2010 by Anthony Goldbloom and Jeremy Howard and acquired by Google in 2017) is to help professionals and learners reach their data science goals.

  • As of 2021, Kaggle has over 8 million registered users.

Look at Kaggle Competitions

The advantage of Kaggle is that its datasets are generally prepared with machine learning tasks in mind.

Zenodo#

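# Embed the Zenodo "About" page in the notebook output
from IPython.display import IFrame
IFrame('https://about.zenodo.org/', width=800, height=1200)

  • Zenodo is a general-purpose open repository, operated by CERN, that hosts a large amount of research data online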

  • You need to vet the quality of the data yourself, because anyone, even you, can upload a dataset
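One way to start vetting a downloaded dataset is to count missing values and duplicate rows before doing anything else. The sketch below is only an illustration using the Python standard library: the helper name `vet_csv` and the small `sample` table are made up for this example and do not come from Zenodo or any real dataset.

```python
import csv
import io
from collections import Counter


def vet_csv(text):
    """Return basic quality indicators for CSV text that has a header row.

    Hypothetical helper for illustration; not part of any library.
    """
    rows = list(csv.DictReader(io.StringIO(text)))

    # Count empty (or absent) cells per column.
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or (isinstance(value, str) and value.strip() == ""):
                missing[column] += 1

    # Count rows that exactly repeat an earlier row.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(str(v) for v in row.values())
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {"rows": len(rows), "missing": dict(missing), "duplicates": duplicates}


# Made-up sample standing in for the contents of a downloaded file.
sample = """id,temperature,site
1,20.5,A
2,,B
2,,B
3,19.0,
"""

report = vet_csv(sample)
print(report)
```

For this sample, the report shows four rows, two missing `temperature` values, one missing `site` value, and one duplicated row. Checks like these do not prove a dataset is trustworthy, but they quickly flag files that need a closer look.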

Open Science Framework#

The Open Science Framework (OSF) is a similar service to Zenodo.

Open Science Framework