To find other tutorials for this class, go to the main website, https://ds112-lendway.netlify.app/.
Welcome to your first tutorial for this class, COMP/STAT 112: Introduction to Data Science! As you work through the different sections, there will be videos for you to watch (both embedded YouTube videos and links to the videos on Voicethread), files for you to download, and exercises for you to work through. The solutions to the exercises are usually provided, but in order to get the most out of these tutorials, you should work through the exercises and only look at the solutions if you get really stuck. You could also work through the exercises in your own R Markdown file in order to keep the results permanently. If you do that, start the file with the three code chunks I talk about below. Then copy and paste the questions into your document and put your solutions in R code chunks.
If you haven’t done so already, please go through the R Basics document.
When you start your own document, you should have the following three code chunks at the top of your R Markdown file:
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)
To install packages, use the install.packages() function in the console and write the name of each package you want to install. Some packages (like my gardenR package) need to be installed in a special way using the install_github() function in the remotes package. To install gardenR, uncomment (delete the hashtags from the front of) those two lines of code in the chunk below and run them. Then either delete those two lines or comment them out again. You only need to install packages once, although you will need to re-install them if you upgrade to a new version of R. You need to load them with library() statements each time you use them. There is a good analogy with lights: installing the package is like putting the light in the socket; loading the package is like turning the light on.
library(tidyverse) # for graphing and data cleaning
library(lubridate) # for working with dates
library(palmerpenguins) # for palmer penguin data
# library(remotes) # for installing package from GitHub
# remotes::install_github("llendway/gardenR") # run if package is not already installed
library(gardenR) # for Lisa's garden data
theme_set(theme_minimal()) # my favorite ggplot theme
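The install-once, load-every-session distinction can also be checked in code. The following is only a minimal sketch, not part of the required three chunks: the helper name missing_packages() is hypothetical, and the install line is left commented out so nothing is installed unless you choose to run it.

```r
# Hypothetical helper: return the names of packages that are not yet installed.
# requireNamespace() returns FALSE (instead of erroring) when a package is missing.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check the CRAN packages loaded in the chunk above.
to_install <- missing_packages(c("tidyverse", "lubridate", "palmerpenguins"))
print(to_install)

# Install only what is missing. Run this once, in the console.
# install.packages(to_install)
```

If to_install comes back empty, every package is already in the "socket" and only the library() calls are needed.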
Data that comes with a package can be loaded with the data() function. Data outside of a package can be loaded in different ways depending on where it is and what type of data it is. Later in the course, we will learn different functions that can be used to read in data from other places.
# Palmer Penguins data from palmerpenguins library
# Lisa's garden data from gardenR library
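As a rough illustration of loading package data with data(), here is a small sketch using the penguins data set from palmerpenguins. The requireNamespace() guard is my addition so the chunk still runs if the package is not installed; the same pattern should apply to data sets in gardenR once that package is installed.

```r
# Load a data set that ships with a package using data().
# Guarded so the chunk still runs if palmerpenguins is not installed.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  data("penguins", package = "palmerpenguins")  # creates `penguins` in the workspace
  print(dim(penguins))    # number of rows and columns
  print(names(penguins))  # variable names
}
```

After the call to data(), the object is available by name like any other data frame.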
Before jumping into teaching you some Data Science skills in R, I want to give you some motivation. I picked three graphs I’ve recently seen on Twitter. These are all responses to #TidyTuesday, which you’ll be participating in very soon! Read more about it here if you’re curious. There are many definitions of Data Science, but I broadly like to think of it as using data to tell a story. These three graphs are just a small sample of doing just that.
One of my favorite Data Visualizers on Twitter:
(@geokaramanis) April 18, 2020
One of my former students (and your preceptor!):
This wk's @R4DScommunity #TidyTuesday: guess what a centered dot-plot of astronauts in space by year and nation looks a lot like? A space station in mid-orbit (or Cloud City)! #RStats #r4ds #DataScience #DataViz #tidyverse #ggplot2 pic.twitter.com/hqW7KLWmsn
lil bobby tables 🐳 (@robert_b_) July 15, 2020