Set up

To find other tutorials for this class, go to the main website.

Welcome to your first tutorial for this class, COMP/STAT 112: Introduction to Data Science! As you work through the different sections, there will be videos to watch (both embedded YouTube videos and links to the videos on VoiceThread), files to download, and exercises to work through. Solutions to the exercises are usually provided, but to get the most out of these tutorials, work through each exercise yourself and only look at the solution if you get really stuck. You can also work through the exercises in your own R Markdown file so that you keep a permanent copy of your results. If you do that, start the file with the three code chunks described below, then copy and paste the questions into your document and put your solutions in R code chunks.

If you haven’t done so already, please go through the R Basics document.

When you start your own document, you should have the following three code chunks at the top of your R Markdown file:

  1. Options that control how the R code chunks are displayed when the document is knit.
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE) # show code; hide messages and warnings
  1. Libraries that are used and other settings, like a theme you would like to use throughout the document. If you have not yet installed the libraries you are going to use, you will first have to install them. Go to the Packages tab (at the top of the lower-right pane) and choose Install, then list the packages you would like to install. Alternatively, run the install.packages() function in the console with the name of each package you want to install. Some packages (like my gardenR package) need to be installed in a special way, using the install_github() function from the remotes package. To install gardenR, uncomment the two commented lines of code below by deleting the # from the front of each line. Then either delete those two lines or comment them out again. You only need to install a package once, although you will need to reinstall it if you upgrade to a new version of R. You do need to load each package with a library() statement every time you use it. A good analogy is a light bulb: installing the package is like putting the bulb in the socket, and loading the package is like turning the light on.
library(tidyverse)         # for graphing and data cleaning
library(lubridate)         # for working with dates
library(palmerpenguins)    # for palmer penguin data
# library(remotes)        # for installing package from GitHub
# remotes::install_github("llendway/gardenR") # run if package is not already installed
library(gardenR)           # for Lisa's garden data
theme_set(theme_minimal()) # my favorite ggplot theme
  1. Load the data that will be used. Data from packages can be loaded using the data() function. Data outside of a package can be loaded in different ways, depending on where it is stored and what type of data it is. Later in the course, we will learn functions for reading in data from other places.
# Palmer Penguins data from palmerpenguins library
data("penguins")

# Lisa's garden data from gardenR library
data("garden_harvest")
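
You can check the install-once, load-every-time idea directly in base R. The sketch below is an illustration, not code from this tutorial. The helper names `package_status()` and `missing_packages()` are hypothetical. They rely on two base R behaviors: `system.file(package = ...)` returns an empty string when a package is not installed, and `search()` lists the packages that `library()` has attached in the current session.

```r
# A minimal sketch; package_status() and missing_packages() are hypothetical
# helpers written for illustration, not functions from any package.

# Is the package installed (the bulb is in the socket), and has library()
# attached it in this session (the light is switched on)?
package_status <- function(pkg) {
  c(
    installed = nzchar(system.file(package = pkg)),
    loaded    = paste0("package:", pkg) %in% search()
  )
}

# Return the names of the requested packages that are not installed yet.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!is_installed]
}

# Check the CRAN packages used above. gardenR is left out because it is
# installed from GitHub with remotes::install_github(), not install.packages().
to_install <- missing_packages(c("tidyverse", "lubridate", "palmerpenguins"))
to_install  # if this is not empty, run install.packages(to_install) in the console
```

Checking first means you never reinstall a package you already have. You still need the library() calls every time you start a new session.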

Before jumping into teaching you some Data Science skills in R, I want to give you some motivation. I picked three graphs I've recently seen on Twitter. All of them are responses to #TidyTuesday, which you'll be participating in very soon! Read more about it here if you're curious. There are many definitions of Data Science, but I broadly like to think of it as using data to tell a story. These three graphs are just a small sample of people doing exactly that.

One of my favorite Data Visualizers on Twitter:

One of my former students (and your preceptor!):