Session 00: Installations, Organization, Intro Readings

Feedback should be sent to goran.milovanovic@datakolektiv.com. These notebooks accompany the Intro to Data Science: Non-Technical Background course 2020/21.


Welcome to R!

What do we want to do today?

Our goal in Session 00 is to prepare ourselves technically for what follows. We need to (1) install the programming language R, (2) install the RStudio IDE (IDE stands for Integrated Development Environment), (3) understand how to organize our files and folders, and make sure that everything is working as expected.

0. Prerequisites

None. You have your machine in front of you, and that machine is running any of the following operating systems:

  • Windows 10 (earlier versions are fine too)
  • Linux Ubuntu/Debian
  • macOS

1. Install R

If you run into any problems during the installation of R and RStudio, do not worry: we will review the procedure completely in our first session.

NOTE. Please take care to install the latest available versions. At the time of this writing, those were:

Please follow the instructions provided here:

Earth Data Analytics Online Certificate, Lesson 1. Install & Set Up R and RStudio on Your Computer

Essentially, there are two installation steps:

  • install R (the programming language)
  • install RStudio (your IDE, i.e. your working environment, where you write code, inspect data, etc.)
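Once both steps are done, a quick way to make sure that everything is working as expected is to open RStudio and run a few lines in the Console. The snippet below is only a minimal sketch: the variable name `x` is an arbitrary choice for illustration, and the version string and working directory that get printed will depend on your own machine.

```r
# Print the version of R that RStudio is using
print(R.version.string)

# Run a trivial computation to confirm that R evaluates code
x <- 1 + 1
print(x)

# Show the current working directory:
# the folder where R reads and writes files by default
print(getwd())
```

If all three lines print something without an error message, R and RStudio are talking to each other.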

For Windows users: Video Instructions

For Mac users: Video Instructions

For Linux users:

2. Organization

It is essential to keep your files and folders neatly organized. This matters well beyond being able to follow this course: all successful Data Scientists suffer a bit from something similar to OCD (obsessive-compulsive disorder) when it comes to organizing their data and code into directories and code repositories.

For each new step that I make in Data Science, for each new project, my approach to organization is the following:

  • I start a new directory which bears the project name (NOTE. Avoid using empty spaces, " ", in naming your files and directories!);
  • In that directory, I make four new directories:
    • _data - where I intend to keep the raw data,
    • _analytics - where I intend to keep the processed data,
    • _results - where I intend to keep the outputs of my work, and
    • _img - where I intend to keep any images that were produced in the course of my work in the project.
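The directory layout described above can also be created from R itself. What follows is a sketch under stated assumptions, not a required procedure: the project name `my_first_project` is hypothetical, and the example builds the project inside R's temporary directory (`tempdir()`) so that it is safe to try out. For a real project, replace `parent_dir` with a folder of your own choosing.

```r
# Hypothetical project name (note: no empty spaces in the name)
project_name <- "my_first_project"

# Assumption for this demo: build the project in R's temporary directory.
# For real work, point parent_dir to your own folder instead.
parent_dir <- tempdir()

# Full path to the new project directory
project_dir <- file.path(parent_dir, project_name)

# The sub-directories used in this course
sub_dirs <- c("_data", "_analytics", "_results", "_img")

# Create the project directory and each sub-directory inside it;
# recursive = TRUE also creates project_dir if it does not exist yet
for (sub_dir in sub_dirs) {
  dir.create(file.path(project_dir, sub_dir),
             recursive = TRUE,
             showWarnings = FALSE)
}

# List what was created
print(list.files(project_dir))
```

Creating the same folders by hand in your file manager works just as well; the point is only that every project starts from the same structure.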

I suggest that, at least in the beginning, you use the same schema to organize your directories. Later on you can settle on whatever form of organization suits you best.

3. Intro Readings and Videos

R Markdown

R Markdown is what I have used to produce this beautiful Notebook. We will learn more about it near the end of the course, but if you already feel ready to dive deep, here’s a book: R Markdown: The Definitive Guide, by Yihui Xie, J. J. Allaire, and Garrett Grolemund.


Goran S. Milovanović

DataKolektiv, 2020/21

contact:


License: GPLv3 This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.


