August 2022, UC Berkeley
Chris Paciorek
This website and the associated GitHub site (https://github.com/berkeley-scf/r-bootcamp-fall-2022) are the main sites for the bootcamp. They have information on logistics and software installation, and the GitHub repository is the master repository for materials for the modules.
We have an Ed Discussion site for discussion and answering questions online during (and before) the bootcamp.
If you have an administrative question before or after the bootcamp, email r-bootcamp@lists.berkeley.edu.
The campus WiFi is now eduroam, not AirBears. Follow these instructions for how to set up your eduroam account. If you need wireless access as a guest (i.e., you don’t have a CalNet ID), connect to ‘CalVisitor’.
The bootcamp will be organized in modules, each of which will combine a lecture/demo presentation with a concluding breakout session in which you’ll work on a variety of problems of different levels of difficulty. The idea is for each person to find problems that challenge them but are not too hard. Solutions to the breakout problems will be presented before the start of the next module.
Many of the modules will use a common dataset as an example on which to carry out various operations. We’ll focus on a dataset of demographic/economic information (population, GDP per capita, life expectancy) for many of the countries in the world every five years, provided by the Gapminder project. (Note that this is almost the full population of countries – I’ll fit some statistical models but the interpretation is tricky as we are not working with a sample from a well-defined population.)
Your counseloRs are: Alan Aw (Statistics), Florica Constantine (Statistics), Corrine Elliott (Statistics), and Natalia Sarabia Vasquez (Statistics).
I encourage you to:
This is a bootcamp. So there may be some pain involved! If you find yourself not following everything, that’s ok. You may miss some details, but try to follow the basics and the big picture.
A few additional thoughts on my pedagogical philosophy here:
We’ll present most of the material from within RStudio, using R Markdown documents with embedded R code. R Markdown is an extension to the Markdown markup language that makes it easy to write HTML in a simple plain text format. This allows us both to run the R code directly and to compile on-the-fly to an HTML file that can be used for presentation. All files will be available on GitHub.
Note: The files named moduleX_blah.html have individual slides, while the files named moduleX_blah_onepage.html have the same content but all on one page.
Warning: in some cases the processing of the R code in the R Markdown is screwy and the slides have error messages that do not occur if you just run the code directly in R or RStudio.
To download the files from GitHub, you can do the following.
Within RStudio go to File->New Project->Version Control->Git and enter:
Then to update from the repository to get any changes we’ve made, you can select (from within RStudio): Tools->Version Control->Pull Branches, or from the Environment/History/Git window, click on the Git tab and then on the blue down arrow.
Be warned that you probably do not want to make your own notes or changes to the files we are providing. If you do, and you then do a “Git Pull” to update the materials, you’ll have to deal with the conflict between your local version and our version. Instead, you will probably want to make a personal copy of such files, either in another directory or under new names.
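One way to keep your notes separate from the provided files is to copy a file under a new name before editing it. The following is a minimal sketch, assuming a POSIX-style shell and a file name that has an extension; the function name `copy_for_notes`, the `_mynotes` suffix, and the file name in the usage comment are hypothetical, not part of the bootcamp materials.

```shell
# Copy a provided file to a personal copy with a new name,
# leaving the original untouched so that later pulls do not conflict.
# Hypothetical usage: copy_for_notes module1_basics.Rmd
copy_for_notes() {
  src="$1"
  ext="${src##*.}"     # text after the last dot, e.g. "Rmd"
  base="${src%.*}"     # everything before the last dot
  dest="${base}_mynotes.${ext}"
  # Refuse to overwrite an existing personal copy.
  if [ -e "$dest" ]; then
    echo "Not overwriting existing $dest" >&2
    return 1
  fi
  cp "$src" "$dest" && echo "$dest"
}
```

Because the renamed copy is not one of the files tracked in the repository, updating the materials should normally leave it alone.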
Alternatively, from a terminal window, run the following commands to clone the repository:
cd /directory/where/you/want/repository/located
git clone https://github.com/berkeley-scf/r-bootcamp-fall-2022
Then to update from the repository to get any changes we’ve made:
cd /directory/where/you/put/the/repository/r-bootcamp-fall-2022
git pull
If you don’t want to bother using Git or have problems, simply download a zip file with all the material from https://github.com/berkeley-scf/r-bootcamp-fall-2022/archive/main.zip.
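If you prefer the command line, the zip download can be sketched as follows. This assumes that `curl` and `unzip` are installed; the function name `fetch_materials` and the local zip file name are hypothetical, and the URL is the one given above.

```shell
# Download and unpack the bootcamp materials without using Git.
# Assumes curl and unzip are available on your system.
fetch_materials() {
  url="https://github.com/berkeley-scf/r-bootcamp-fall-2022/archive/main.zip"
  zipfile="r-bootcamp-fall-2022.zip"
  # -L follows the redirect to the actual archive; -f fails on HTTP errors.
  curl -fL -o "$zipfile" "$url" || return 1
  # Unpack into the current directory.
  unzip -q "$zipfile"
}
```

Running `fetch_materials` in the directory where you want the files should create a folder of materials there; GitHub archives typically unpack into a folder named after the repository and branch. Note that a zip download is a one-time snapshot, so you would need to download it again to get later changes.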
The pieces of an R session include:
RStudio provides an integrated development environment in which all of these pieces are in a single application and tightly integrated, with a built-in editor for your code/scripts.
Other software is better than R at various tasks.
E.g., Python is very good for text manipulation, interacting with the operating system, and as a glue for tying together various applications/software in a workflow.
R can be much slower than compiled languages (but is often quite fast with good coding practices!).
R’s packages are only as good as the person who wrote them; there is no explicit quality control.
R is a sprawling and unstandardized ecosystem.
In addition to learning some R, this workshop will expose you to a way of thinking about doing your computational work.
The building blocks of scientific computing include:
During the afternoon break tomorrow, we’ll ask everyone to fill out a feedback form, but if you leave early, please see the Ed Discussion board for the link.