Microbiome Analysis Workshop

GOAL

In addition to understanding the importance of experimental design, we will walk through turning raw sequence data into useful counts-data that we can use to visualize microbiome sample-composition. We will then discuss methods to extrapolate function from these abundance-data (and limitations thereof) to ultimately arrive at biological insight.

Pre-course Materials
Day One
- Presentation Slides
Day Two
Day Three
- Presentation Slides
- Microbiome Data Visualization
Resources

Pre-course Materials

⬣ Best Practices for Analyzing Microbiomes - This article discusses how all stages of conducting a microbiome study, from designing the experiment to collecting and storing the samples to obtaining insight from graphical displays of the sequence data, can substantially impact the result.

⬣ If you do not have R and RStudio installed already, you can follow these instructions for both Mac and Windows. The R versions listed in the instructions might be outdated, but the links are correct. If you already have RStudio, make sure you’re using R version 4.0.5 by clicking on ‘Global Options…’ in the Tools tab. The version is also stated in the first line of the console when you first open RStudio. If you are not using the most recent version, follow the previous installation instructions and restart R.

If you are on Windows, an easy way to update R and your packages is to run the following code in the RStudio console:

install.packages("installr")

library(installr)

updateR()

You should also use the latest version of RStudio. You can check this within RStudio by going to the Help tab and clicking ‘Check for Updates’.

Finally, update your packages by clicking ‘Check for Package Updates…’ in Tools.
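
The R version can also be confirmed from the console instead of the menus. This is a minimal sketch, assuming 4.0.5 (the version named above) is the minimum you want; `required_version` and `version_ok` are names chosen here for illustration.

```r
# Version the workshop instructions ask for (taken from the text above).
required_version <- "4.0.5"

# R.version.string holds the same text shown in the console's first line.
print(R.version.string)

# getRversion() returns a version object that can be compared to a string.
version_ok <- getRversion() >= required_version

if (version_ok) {
  message("R is at least version ", required_version)
} else {
  message("R is older than ", required_version, "; please update before the workshop")
}
```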

⬣ We know that programming can be very intimidating at first, so we created this introductory R course to help researchers like you start your programming journey. Even if you are somewhat familiar with R, please still check out this resource, as it covers how the workshop tutorials will be set up. We’ll be moving quickly through basic concepts in R to get to the actual data-analysis, so we strongly recommend reviewing the R tutorial to help you get started and keep up.

Day One

Presentation Slides

Day one of this workshop focuses on an overview of available methods and best-practice considerations for experimental design in microbiome studies. You can download the individual presentations for each topic in the agenda below. The full agenda can be downloaded here.

NOTE: Speakers may change with each workshop event. Presentation slides from every workshop are still listed here.

AGENDA	INSTRUCTOR
Intro to Microbiome Studies
USF Genomics Introduction and Workshop Overview	Dr. Jenna Oberstaller
USF Genomics Equipment Core	Dr. Min Zhang
Introduction to Microbiome Data Analysis	Dr. Anujit Sarkar
Best Practices for Microbiome Sample-handling and Nucleic Acid-processing	Swamy Rakesh Adapa, MS
Statistical Considerations for Microbiome Studies	Dr. Ryan McMinds
Experimental Design	Swamy Rakesh Adapa, MS
Overview of Microbiome Data-Visualization	Dr. Justin Gibbons
Functional Profiling with PICRUSt2	Dr. Thomas Keller

Day Two

Presentation Slides

Introduction to R and Plotting Data (Dr. Charley Wang)

Taxonomic Analysis (Dr. Anujit Sarkar)

R Hands-on Practice

Download Charley’s tutorial. Follow along below!

Initial ASV Analysis

Download the zip file and extract it to get started on Charley’s initial ASV analysis tutorial. Open the .Rmd file in RStudio by going to the File tab and clicking ‘Open.’ Your directory path will differ depending on where you extracted the folder on your computer. More on working directories can be found here. We will need to change the paths in the first chunk of code in this .Rmd file, which loads the text files we need to run the tutorial. The path is the first argument of the read.table function, surrounded by single quotation marks.

Follow along with Charley’s tutorial here!
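
To make the role of the path concrete, here is a small sketch of reading a text file with read.table. The folder, file name, and toy values are all hypothetical stand-ins (a temporary directory is used so the example runs anywhere); in the tutorial you would substitute the location where you extracted the zip file.

```r
# Stand-in for the extracted tutorial folder (hypothetical location).
tutorial_dir <- file.path(tempdir(), "tutorial_example")
dir.create(tutorial_dir, showWarnings = FALSE)

# Write a tiny tab-delimited file so the example runs on its own.
example_file <- file.path(tutorial_dir, "example_counts.txt")
toy <- data.frame(sample = c("S1", "S2"), count = c(10, 25))
write.table(toy, example_file, sep = "\t", quote = FALSE, row.names = FALSE)

# The path is the first argument to read.table(), given as a character string.
# file.path() joins the pieces with a separator that R accepts on any system.
counts <- read.table(example_file, header = TRUE, sep = "\t")
print(counts)
```

If you type a Windows path by hand, use forward slashes or doubled backslashes inside the quotation marks, since a single backslash is treated as an escape character in R strings.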

NOTE: We will be using an R Project format for the rest of our tutorials, so we will not need to worry about changing paths for the remainder of the workshop. It is still important to understand how file paths and directories work when loading data from your local computer, since you will most likely be doing it a lot.
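
The difference between relative and absolute paths can be sketched as follows. "Rdata" is one of the workshop's project folders, while "example.txt" is a hypothetical file name used only for illustration.

```r
# getwd() reports the current working directory; inside an R Project this is
# the project folder (the one containing the .Rproj file).
project_dir <- getwd()
print(project_dir)

# A relative path is resolved against the working directory.
# "example.txt" is a hypothetical file name.
relative_path <- file.path("Rdata", "example.txt")

# The same location written as an absolute path.
absolute_path <- file.path(project_dir, relative_path)
print(absolute_path)

# file.exists() is a quick check before trying to read a file.
print(file.exists(relative_path))
```

Because the relative path does not mention where the project lives on your computer, the same code works unchanged for anyone who opens the project.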

DADA2 Pipeline

Overview

Goal: The purpose of this analysis is to obtain an Amplicon Sequence Variant (ASV) table for all of our microbiome-sample example-data.
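
For readers new to the term, an ASV table is a matrix of read counts. The sketch below uses invented numbers (not workshop data) and assumes samples in rows and ASVs in columns; the names `asv_table` and `rel_abundance` are illustrative only.

```r
# Toy counts, invented purely for illustration (not workshop data).
# Assumed layout: rows are samples, columns are ASVs. Real tables often use
# the inferred DNA sequence itself as the column identifier.
asv_table <- matrix(
  c(120, 30,  0,
     45, 60, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("SampleA", "SampleB"), c("ASV1", "ASV2", "ASV3"))
)
print(asv_table)

# Convert counts to relative abundance so that each sample (row) sums to 1.
rel_abundance <- prop.table(asv_table, margin = 1)
print(round(rel_abundance, 3))
```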

Input data: We will start with demultiplexed fastq files for all samples. This analysis is for paired-end data. Thus, for each sample, there will be two files, named according to Illumina platform conventions:

Forward-reads, named *_R1_001.fastq
Reverse-reads, named *_R2_001.fastq
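
As a rough sketch of how this naming convention can be used, the code below lists forward and reverse files and recovers one sample name per pair. The folder and sample names are made up, and empty placeholder files are created in a temporary directory so the example runs on its own.

```r
# Stand-in for the folder of demultiplexed reads (hypothetical location).
fastq_dir <- file.path(tempdir(), "fastq_example")
dir.create(fastq_dir, showWarnings = FALSE)

# Create empty placeholder files with made-up sample names.
placeholder <- c("sampleA_R1_001.fastq", "sampleA_R2_001.fastq",
                 "sampleB_R1_001.fastq", "sampleB_R2_001.fastq")
file.create(file.path(fastq_dir, placeholder))

# Match forward and reverse reads by suffix; sorting keeps the pairs aligned.
forward_files <- sort(list.files(fastq_dir, pattern = "_R1_001\\.fastq$", full.names = TRUE))
reverse_files <- sort(list.files(fastq_dir, pattern = "_R2_001\\.fastq$", full.names = TRUE))

# Strip the suffix to recover one sample name per pair.
sample_names <- sub("_R1_001\\.fastq$", "", basename(forward_files))
print(sample_names)

# Every forward file should have a reverse partner.
stopifnot(length(forward_files) == length(reverse_files))
```

If your files are compressed, the patterns would need to end in `\\.fastq\\.gz$` instead.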

Creating the Project

1. Follow this link and download a zipped copy of the folder by clicking the icon in the top right corner and then clicking “Download”. If you are not logged in to a Box account, the button will simply say “Download”. You do not need a Box account to download this folder.

2. Extract the downloaded zip file to where you want it.

3. Open RStudio and click on New Project in the File tab.

4. Create the new project by choosing ‘Existing Directory’.

5. Browse to the directory where you extracted the zip file and make sure ‘Day2’ is the base name in your project directory file path.

You should see the folders (Ranalysis, Rdata, etc.) when you open the Day2 folder.

6. Click ‘Create Project’. You should now see a Day2.Rproj file in the lower right files pane. Double-click it to make sure you are within the project. If you are not already within your R Project, you will be asked to open it. You can tell you are in your R Project if you see its name at the top of your RStudio window.

For more info behind the logic of creating RStudio projects and adhering to an organizational directory-structure as you build your data-analysis skills, see this post on reproducible scientific data-analyses from Software Carpentry. We don’t use exactly the same structure they do, but the concepts are the same: structured analyses make sharing and reproducing analyses much easier!

Tutorial Structure

Before we begin, let’s take a moment to get organized. Documentation and good record-keeping are essential to producing high-quality, reproducible computational analyses, just as they are at the bench!

We recommend you keep your analyses organized by project (just as we organized this example).

Looking around in the file browser tab of the lower right section, you should find the following folders if you set the project directory correctly:

Rdata: this folder contains our input .fastq.gz files and our input database of 16S-sequences that we’ll use to identify taxa present in our samples.

Ranalysis: this folder contains any scripts we create to analyze our data, like this R-Markdown (.Rmd) document.

Routput: we will direct any output data-files from our analyses to this folder.

Rfigs: we will direct any figures we generate from our analyses to this folder.

Rsource: this folder contains any R source-scripts we create to set up our environment for our analyses: custom functions, which packages to load, etc. You don’t need to worry about this one since we made it for you.

You can think of the files in Rsource as set-up scripts: load them at the beginning of your session and forget about them.
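
One quick way to confirm that the project directory is set correctly is to check for these folders from the console. The sketch below uses the folder names listed above; `missing_dirs` is a hypothetical helper written for this example, and the demonstration runs on a throwaway directory in which two folders are deliberately left out.

```r
# Folder names taken from the list above.
expected_dirs <- c("Rdata", "Ranalysis", "Routput", "Rfigs", "Rsource")

# Hypothetical helper: report which expected folders are missing
# under a given project root.
missing_dirs <- function(project_root, expected = expected_dirs) {
  expected[!dir.exists(file.path(project_root, expected))]
}

# Demonstration on a throwaway directory where only three folders exist.
demo_root <- file.path(tempdir(), "Day2_demo")
dir.create(demo_root, showWarnings = FALSE)
for (d in c("Rdata", "Ranalysis", "Rsource")) {
  dir.create(file.path(demo_root, d), showWarnings = FALSE)
}

# Lists the folders that were not created above.
print(missing_dirs(demo_root))
```

Inside the real project, `missing_dirs(getwd())` should return an empty character vector if everything is in place.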

Setting up the Environment

Now that we are familiar with the project, we can set up the environment!

1. Go to the Ranalysis folder in the lower right files pane and open the .Rmd file.

2. Make sure your Knit Directory is set to the project directory as shown below.

3. Run only the second chunk of code, beginning at line 48, by clicking the green arrow in the upper right corner of the chunk. Running this code calls a source script from the Rsource folder that installs all of the packages needed to run the tutorial.
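
To give a sense of what a set-up script of this kind might do, here is a minimal sketch that installs only the packages that are missing. It is not the workshop's actual source script: `find_missing_packages` is a hypothetical helper, and the package list is a placeholder of packages that ship with R, so nothing is downloaded when you run it.

```r
# Hypothetical helper: return the packages from `pkgs` that are not installed.
find_missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Placeholder list: packages that ship with every R installation.
# The workshop's real source script defines its own list.
needed <- c("stats", "utils")

to_install <- find_missing_packages(needed)

if (length(to_install) > 0) {
  # Only reached when something is actually missing.
  install.packages(to_install)
} else {
  message("All required packages are already installed.")
}
```

Note that install.packages covers CRAN packages; packages distributed through Bioconductor are normally installed with BiocManager::install instead.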

This pipeline is written in R Markdown, a file format for making dynamic documents with R. An R Markdown document is written in markdown (an easy-to-write plain text format) and contains chunks of embedded R code. We rendered this R Markdown script into an HTML file, linked below, that shows the results of the code so you can follow along.

Let’s begin the Day 2 tutorial!