Basic_R

Data Types

R has a wide variety of data types including scalars, vectors, matrices, data frames and lists.

Creating new variables

a <- 1
b <- 2
c <- a + b

Vectors

a <- c(1,2,5.3,6,-2,4) # numeric vector
b <- c("one","two","three") # character vector
c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE) # logical vector

Refer to elements of a vector using subscripts

a # print the whole vector

## [1]  1.0  2.0  5.3  6.0 -2.0  4.0

a[c(2,4)] #2nd and 4th elements of vector

## [1] 2 6
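
Subscripts can also be negative or logical. The following is a minimal sketch; the vector v is hypothetical and used only for illustration.

```r
# v is a made-up vector for this illustration
v <- c(1, 2, 5.3, 6, -2, 4)

v[2:4]    # 2nd through 4th elements
v[-1]     # every element except the first
v[v > 2]  # elements greater than 2 (logical subscript)
```

A negative subscript drops positions, while a logical subscript keeps the positions where the condition is TRUE.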

Matrix

All columns in a matrix must have the same mode (numeric, character, etc.) and the same length. The general format is

mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE, dimnames=list(char_vector_rownames, char_vector_colnames))

byrow=TRUE indicates that the matrix should be filled by rows. byrow=FALSE indicates that the matrix should be filled by columns (the default). dimnames provides optional labels for the rows and columns, in that order.

# generates a 5 x 4 numeric matrix
y <- matrix(1:20, nrow=5, ncol=4)

# another example
cells <- c(1,26,24,68)
rnames <- c("R1", "R2")
cnames <- c("C1", "C2") 
mymatrix <- matrix(cells, nrow=2, ncol=2, byrow=TRUE,
  dimnames=list(rnames, cnames))

mymatrix

##    C1 C2
## R1  1 26
## R2 24 68
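
To see the effect of byrow side by side, here is a small sketch that fills the same values both ways. The names vals, by_col and by_row are hypothetical and chosen only for this example.

```r
# the same six values, filled two different ways
vals <- 1:6

by_col <- matrix(vals, nrow = 2, ncol = 3)                # default: column by column
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)  # row by row

by_col[1, ]  # first row is 1 3 5
by_row[1, ]  # first row is 1 2 3
```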

Identify rows, columns or elements using subscripts.

x <- matrix(1:16,nrow = 4,ncol = 4)
x

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

x[,4] # 4th column of matrix

## [1] 13 14 15 16

x[3,] # 3rd row of matrix

## [1]  3  7 11 15

x[2:4,1:3] # rows 2,3,4 of columns 1,2,3

##      [,1] [,2] [,3]
## [1,]    2    6   10
## [2,]    3    7   11
## [3,]    4    8   12
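
Two related points are worth a short sketch: a matrix with dimnames can be subscripted by name, and selecting a single row or column returns a plain vector unless drop = FALSE is given. The matrix m and its labels below are hypothetical.

```r
# m is a made-up 4 x 4 matrix with row and column labels
m <- matrix(1:16, nrow = 4, ncol = 4,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:4)))

m["r3", "c4"]         # single element selected by name
m[3, ]                # 3rd row, returned as a vector
m[3, , drop = FALSE]  # 3rd row, kept as a 1 x 4 matrix
```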

Data Frames

A data frame is more general than a matrix, in that different columns can have different modes (numeric, character, factor, etc.). This is similar to SAS and SPSS datasets.

expression <- runif(4) # draw four random values uniformly between 0 and 1
gene <- c("gene1", "gene2", "gene3", "gene4")
f <- c(TRUE,TRUE,TRUE,FALSE)
mydata <- data.frame(Gene       = gene, 
                     Expression = expression, 
                     pick       = f)
print(mydata)

##    Gene Expression  pick
## 1 gene1  0.9121338  TRUE
## 2 gene2  0.5403839  TRUE
## 3 gene3  0.1790798  TRUE
## 4 gene4  0.2871407 FALSE
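
Individual columns and rows of a data frame can be pulled out in several ways. The sketch below uses a hypothetical data frame df with fixed values (not the random ones above), and sets stringsAsFactors = FALSE explicitly so the column classes do not depend on the R version.

```r
# df is a made-up data frame with three columns of different modes
df <- data.frame(Gene       = c("gene1", "gene2", "gene3"),
                 Expression = c(0.9, 0.5, 0.2),
                 pick       = c(TRUE, TRUE, FALSE),
                 stringsAsFactors = FALSE)

df$Expression     # one column, selected by name
df[["Gene"]]      # the same idea with the [[ ]] convention
df[df$pick, ]     # only the rows where pick is TRUE
sapply(df, class) # class of each column
```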

List

A list is an ordered collection of objects (components). It allows you to gather a variety of (possibly unrelated) objects under one name.

# example of a list with 4 components:
# a string, a character vector, a numeric vector, and another string
w <- list(name = 'Marcus',  days = c('2nd', '4th', '20th', '28th'),
          exp = expression, height = '177cm')
w

## $name
## [1] "Marcus"
## 
## $days
## [1] "2nd"  "4th"  "20th" "28th"
## 
## $exp
## [1] 0.9121338 0.5403839 0.1790798 0.2871407
## 
## $height
## [1] "177cm"

Identify elements of a list using the [[]] convention.

w[[3]] # 3rd component of the list

## [1] 0.9121338 0.5403839 0.1790798 0.2871407

w[["exp"]] # component named exp in list

## [1] 0.9121338 0.5403839 0.1790798 0.2871407
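
The difference between single and double brackets is easy to miss: [[ ]] (and $) return the component itself, whereas [ ] returns a sub-list. A minimal sketch, using a hypothetical list lst:

```r
# lst is a made-up list for this illustration
lst <- list(name = "Marcus", exp = c(0.9, 0.5, 0.2))

class(lst[["exp"]])  # "numeric": the component itself
class(lst["exp"])    # "list": a list of length one
lst$exp              # shorthand for lst[["exp"]]
```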

Factor

Tell R that a variable is nominal by making it a factor. A factor stores the nominal values as a vector of integers in the range 1 to k (where k is the number of unique values in the nominal variable), together with an internal vector of character strings (the original values) mapped to these integers.

# variable gender with 20 "male" entries and 
# 30 "female" entries 
gender <- c(rep("male",20), rep("female", 30)) 
gender <- factor(gender) 
# stores gender as 20 2s and 30 1s and associates
# 1=female, 2=male internally (alphabetically)
# R now treats gender as a nominal variable 
summary(gender)

## female   male 
##     30     20
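
The integer codes and their labels can be inspected directly. The short factor g below is hypothetical and kept small so the mapping is easy to read.

```r
# g is a made-up factor with two levels
g <- factor(c("male", "female", "female", "male"))

levels(g)      # "female" "male" (sorted alphabetically)
as.integer(g)  # 2 1 1 2: the internal integer codes
table(g)       # counts per level
```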

Useful Functions

length(expression) # number of elements or components

## [1] 4

str(expression)    # structure of an object

##  num [1:4] 0.912 0.54 0.179 0.287

class(expression)  # class or type of an object

## [1] "numeric"

names(expression)  # names

## NULL

x1 <- 1:3; y1 <- 1:3
c(x1,y1)       # combine objects into a vector

## [1] 1 2 3 1 2 3

cbind(x1,y1)   # combine objects as columns

##      x1 y1
## [1,]  1  1
## [2,]  2  2
## [3,]  3  3

rbind(x1,y1)   # combine objects as rows

##    [,1] [,2] [,3]
## x1    1    2    3
## y1    1    2    3

x1     # prints the object

## [1] 1 2 3

ls()       # list current objects

##  [1] "a"          "b"          "c"          "cells"      "cnames"    
##  [6] "expression" "f"          "gender"     "gene"       "mydata"    
## [11] "mymatrix"   "rnames"     "w"          "x"          "x1"        
## [16] "y"          "y1"

rm(x1) # delete an object