Here is an overview of (some of) the data sets that we will be using in DATA SCIENCE SESSION VOL. 2 :: Introduction to R for Data Science.
This is a collection of frequently updated public Airbnb data sets that are well suited for practicing basic data visualization and Exploratory Data Analysis (EDA).
Data collected by the Wikimedia Foundation’s Product Analytics team on the development of the different language editions of Wikipedia, the free encyclopedia.
A classic binary classification problem: predict a binary response variable admit from gre, gpa, and rank.
An excellent data set from a classic GLM book, well suited for practicing Poisson regression in R.
The Boston Housing Dataset is derived from information collected by the U.S. Census Service concerning housing in the area of Boston, MA. We will use it to practice Random Forest models for regression problems.
Predict air quality from the data recorded by a gas multisensor device deployed in the field.
A database of common fish species from a fish market: build a predictive model to find out whether the weight of a fish can be predicted.
Predict the price of a property.
In the exercise based on the Wine Quality dataset, the goal is to train a regularized multinomial regression model that predicts the wine quality class.
The task is to predict the Exited variable, which makes this essentially a churn prediction problem.
The task is to predict the web popularity of a post: the number of shares it receives once it is published.
License
Goran S. Milovanović, Chief Scientist & Owner, DataKolektiv, Lead Data Scientist, Smartocto
Contact: goran.milovanovic@datakolektiv.com. This is free software: all content is GPL v2.0 licensed.