Tutorial 1: Introduction to R Programming
========================================================
author: Mr. Zebin Yang
date: Sep. 11, 2018
autosize: true
width: 2400
height: 1000
* I am a second-year PhD student at SaaS, studying machine learning.
* Office Hour: Thursday 13:00-16:00
* Location: RR114
========================================================
**Welcome to R!**
* [R website: https://cran.r-project.org](https://cran.r-project.org)
* R is a freely available programming language and one of the most widely used languages for statistical computing.
* The core of R is an interpreted computer language that supports branching and looping as well as modular programming using functions.
* R has state-of-the-art graphics capabilities.
**Intended Learning Outcomes**
* Learn R's basic data structure and syntax.
* Be able to use R for simple data processing.
**Practice**
* DataCamp: [Free Introduction to R](https://campus.datacamp.com/courses/free-introduction-to-r/)
========================================================
**"Hello, World!" program**
* R Command Prompt
You can start using the R command prompt simply by typing commands at the console. For example:
```{r}
myString <- "Hello, World!"
print(myString)
```
========================================================
**"Hello, World!" program**
* R Script File
Suppose we have a file "test.R" containing the following code:
```{r, eval=FALSE}
myString <- "Hello, World!"
print(myString)
```
Then we can run this file from the command line:
```
Rscript test.R
```
========================================================
**Loading Data**
* Check & reset working directory
```{r}
getwd()
setwd("../") # move to the parent directory
```
* Load data from csv or txt file
```{r}
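# Both files are assumed to be in the current working directory
df = read.csv("City_df.csv")  # comma-separated file with a header row
df = read.table("MyData.txt", sep = ",", header = TRUE)  # general reader: give separator and header explicitly
head(df, 2)  # show the first 2 rows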
```
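========================================================
**Loading Data**
* Write a data frame to CSV and read it back
The chunk above needs "City_df.csv" and "MyData.txt" to exist. As a self-contained sketch, the chunk below writes a small made-up data frame to a temporary file (the path comes from `tempfile()`, so no real file is assumed) and reads it back with `read.csv()`.
```{r}
# Small made-up data frame, for illustration only
scores = data.frame(id = 1:3, score = c(1.5, 2, 3.25))
path = tempfile(fileext = ".csv")            # temporary file path
write.csv(scores, path, row.names = FALSE)   # row.names = FALSE avoids an extra index column
back = read.csv(path)                        # read the file back into a data frame
head(back, 2)
```
Setting `row.names = FALSE` when writing keeps the columns of the file identical to the columns of the original data frame.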
========================================================
**Loading Data**
* R built-in dataset
```{r, width=10}
head(iris)
summary(iris)
```
========================================================
**If Else Control**
An if statement can be followed by optional else if and else branches, which makes it easy to test several conditions in a single if...else if...else statement.
```{r, width=10}
x <- c("what","is","truth")
# %in% tests membership and is case-sensitive, so "Truth" is not found
if ("Truth" %in% x) {
  print("Truth is found the first time")
} else if ("truth" %in% x) {
  print("truth is found the second time")
} else {
  print("No truth found")
}
```
========================================================
**Loops Control**
* while loop
Repeats a statement or group of statements while a given condition is true. It tests the condition before executing the loop body.
* for loop
Executes the loop body once for each element of a vector, assigning that element to the loop variable on each pass.
* repeat loop
Executes a sequence of statements repeatedly until a break statement is reached; without a break it never stops.
```{r, width=10}
v <- c("Hello","loop")
cnt <- 2
repeat {
  print(v)
  cnt <- cnt + 1
  # Leave the loop once cnt exceeds 5 (the body runs 4 times)
  if (cnt > 5) {
    break
  }
}
```
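========================================================
**Loops Control**
* while and for loops
The example above only covers repeat. A minimal sketch of the other two loop types follows; the variable names `cnt`, `total` and `val` are chosen just for this illustration.
```{r}
# while: the condition is checked before each pass
cnt <- 1
while (cnt <= 3) {
  print(cnt)
  cnt <- cnt + 1
}
# for: the loop variable takes each element of the vector in turn
total <- 0
for (val in c(2, 4, 6)) {
  total <- total + val
}
total  # 12
```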
========================================================
**Function**
A function is a set of statements organized together to perform a specific task. R has a large number of built-in functions, and users can create their own functions.
* Function Name: This is the actual name of the function. It is stored in R environment as an object with this name.
* Arguments: An argument is a placeholder. When a function is invoked, you pass a value to the argument. Arguments are optional; that is, a function may contain no arguments. Also arguments can have default values.
* Function Body: The function body contains a collection of statements that defines what the function does.
* Return Value: The return value of a function is the value of the last expression evaluated in the function body, unless return() is called earlier.
```{r, width=10}
# Create a function whose arguments have default values.
new.function <- function(a = 3, b = 6) {
  print(a * b)
}
# Call the function without any arguments: the defaults give 3 * 6 = 18.
new.function()
# Call the function with new values: a is matched by name, 5 by position to b.
new.function(a = 9, 5)
```
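========================================================
**Function**
* Return values
`new.function` above prints its result. The sketch below shows the two ways a function hands back a value: the last evaluated expression, or an early `return()`. The function names `area` and `safe.sqrt` are hypothetical, chosen only for this example.
```{r}
# The value of the last evaluated expression is returned
area <- function(width, height = 2) {
  width * height
}
area(3)        # height uses its default: 6
# return() exits the function early with a value
safe.sqrt <- function(x) {
  if (x < 0) return(NA)
  sqrt(x)
}
safe.sqrt(16)  # 4
safe.sqrt(-1)  # NA
```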
========================================================
**Packages**
An R package is a collection of R functions, compiled code and sample data. Installed packages are stored under a directory called "library" in the R environment.
```{r, width=10}
#Get library locations containing R packages
.libPaths()
```
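* Install a package (downloads it from CRAN)
```{r, eval=FALSE}
install.packages("Package Name")
```
* Load an installed package into the current session
```{r, eval=FALSE}
library("Package Name", lib.loc = "path to library")  # lib.loc is optional
```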
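========================================================
**Packages**
* Check before loading
A common pattern, sketched here, is to test whether a package is available before attaching it. `requireNamespace()` returns TRUE or FALSE instead of raising an error. The package "stats" is used only because it ships with every R installation, so nothing needs to be downloaded.
```{r}
pkg <- "stats"  # example package; replace with the package you need
if (requireNamespace(pkg, quietly = TRUE)) {
  library(pkg, character.only = TRUE)  # character.only: pkg holds the name as a string
}
pkg %in% (.packages())  # TRUE once the package is attached
```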
========================================================
**Data Types**
* Atomic data types
| Data Type | Examples |
|-----------|----------|
| Logical | TRUE, FALSE |
| Numeric | 12.3, 5, 999 |
| Integer | 2L, 34L, 0L |
| Character | 'a', "good", "TRUE", '23.4' |
| Complex | 3 + 2i |
| Raw | "Hello" is stored as 48 65 6c 6c 6f |
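========================================================
**Data Types**
* Checking the type of a value
The `class()` function reports the type of a value, so each row of the table can be checked directly. `charToRaw()` shows the raw bytes behind a character string.
```{r}
class(TRUE)         # "logical"
class(12.3)         # "numeric"
class(2L)           # "integer"
class("23.4")       # "character"
class(3 + 2i)       # "complex"
charToRaw("Hello")  # raw bytes: 48 65 6c 6c 6f
```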
========================================================
**Data Types**
* R-objects
* Vectors
* Lists
* Matrices
* Arrays
* Factors
* Data Frames
========================================================
**Vectors**
A vector is the most basic R object. All elements of a vector must have the same type; elements of different types are coerced to a common type.
* Use c() function
```{r}
x = c("aa","nn","cc"); x
x = 1:10; x
```
* Use seq() function
```{r}
seq(1, 9, by=2) # From 1 to 9, with step 2
seq(1, 19, length=10) # From 1 to 19, with length 10 (step = (19-1)/(10-1) )
```
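========================================================
**Vectors**
* Type coercion
Because a vector holds a single type, combining values of different types with `c()` coerces them to a common type (character beats numeric, numeric beats logical). A short sketch:
```{r}
v = c(1, "a", TRUE)   # mixed input
v                     # "1" "a" "TRUE": everything became character
class(v)              # "character"
w = c(1, TRUE, FALSE) # numeric and logical
w                     # 1 1 0
class(w)              # "numeric"
```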
========================================================
**Vectors**
* Use rep() function
```{r}
rep(1:4,2) # Repeated the whole vector twice
rep(1:4,c(2,1,2,1)) # Each element is repeated individually; the second argument gives how many times each element is repeated.
rep(1:4,each=2,len=10) # Each element is repeated twice ("each"), and the result is recycled to length 10 ("len").
rep(1:4,each=2,times=3) # The argument "times" repeats this whole procedure 3 times.
```
========================================================
**Vectors**
* Basic operations of vectors
```{r}
# Length
x = c(3:7)
length(x)
# Naming
names(x) = c(letters[1:length(x)])
x
n = c(2,3,5); s = c("aa","bb","cc","dd","ee")
c(n, s) # Combine vectors (the numbers are coerced to character)
s[-4] #Exclude Elements
```
========================================================
**Vectors**
* Vector arithmetic
```{r}
a = c(1,3,5,7)
b = c(1,2,4,8)
5*a
a + b
```
========================================================
**Vectors**
* Vector arithmetic
```{r}
a/b
c(sum(a), mean(a), sd(a))
```
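========================================================
**Vectors**
* Recycling
When two vectors in an arithmetic operation have different lengths, R recycles the shorter one. A minimal sketch follows; R warns if the longer length is not a multiple of the shorter one.
```{r}
a = c(1, 3, 5, 7)
a + c(10, 20)  # c(10, 20) is reused: 11 23 15 27
a * 2          # a single number is a length-1 vector, recycled to every element
```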
========================================================
**Factors**
Factors are R objects created from a vector. A factor stores the vector together with the distinct values of its elements, which are called levels.
```{r}
data = c("East","West","East","North","North"); class(data)
factor_data = factor(data)
class(factor_data)
levels(factor_data)
```
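========================================================
**Factors**
* Level order and counts
By default the levels are sorted alphabetically. The `levels` argument of `factor()` sets the order explicitly, `table()` counts observations per level, and `as.integer()` exposes the integer codes stored internally. A short sketch:
```{r}
data = c("East","West","East","North","North")
factor_data = factor(data, levels = c("North", "East", "West"))
levels(factor_data)       # "North" "East" "West"
table(factor_data)        # counts per level: 2 2 1
as.integer(factor_data)   # integer codes: 2 3 2 1 1
```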
========================================================
**Lists**
A list is an R object that can contain elements of different types, such as vectors, functions, and even other lists.
```{r}
# Create a list.
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
print(list_data)
```
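========================================================
**Lists**
* Accessing list elements
Single brackets `[ ]` return a sub-list, while double brackets `[[ ]]` and `$` return the element itself. The sketch below rebuilds `list_data` so that it runs on its own.
```{r}
list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
                  list("green",12.3))
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
list_data[1]                      # a list of length 1
list_data[[1]]                    # the character vector itself
list_data$A_Matrix                # $ works for syntactic names
list_data[["A Inner list"]][[2]]  # names with spaces need [[ ]]: 12.3
```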
========================================================
**Lists**
```{r}
# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
c(v1,v2)
```
========================================================
**Lists**
* Merging Lists
```{r}
# You can merge several lists into one list by combining them with c().
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
print(merged.list)
```
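========================================================
**Lists**
* c() versus list()
Combining lists with `c()` joins their elements into one flat list, whereas wrapping them in `list()` creates a nested list whose elements are the original lists. Comparing the lengths makes the difference visible:
```{r}
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
length(c(list1, list2))     # 6: the elements of both lists
length(list(list1, list2))  # 2: a list containing two lists
```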
========================================================
**Matrix**
A matrix is a two-dimensional rectangular data set whose elements all have the same type. It can be created by passing a vector to the matrix() function.
* Construct Matrices
```{r}
A = matrix(c(2,4,3,1,5,7), #the data elements
nrow =2, #number of rows
ncol = 3, #number of columns
byrow = TRUE) #fill by row first
rownames(A) = c("row1","row2")
colnames(A) = c("col1","col2","col3")
A
```
========================================================
**Matrix**
* Submatrix
```{r}
A[,c(1,3)] #first and third columns
A[,-2] #all columns except the second
```
========================================================
**Matrix**
* Combine existing matrices
```{r}
B = matrix(c(2,4,3,1,5,7),nrow=3,ncol=2)
C = matrix(c(7,4,2),nrow=3,ncol=1)
D = matrix(c(6,2),nrow=1,ncol=2)
cbind(B,C) # Column Bind
rbind(B,D) # Row Bind
```
========================================================
**Matrix**
* Useful matrix operations
```{r}
x = matrix(1:4,2,2)
c(nrow(x), ncol(x))# dimensions of x; could also use dim(x)
c(rowSums(x), colSums(x), rowMeans(x), colMeans(x)) # row sums, column sums, row means, column means
```
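========================================================
**Matrix**
* Matrix algebra
Note that `*` multiplies matrices element by element; the matrix product uses `%*%`. A brief sketch with transpose, product and inverse:
```{r}
x = matrix(1:4, 2, 2)  # filled by column: rows (1, 3) and (2, 4)
t(x)                   # transpose
x %*% x                # matrix product: rows (7, 15) and (10, 22)
x * x                  # element-wise product: rows (1, 9) and (4, 16)
solve(x)               # inverse: rows (-2, 1.5) and (1, -0.5)
```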
========================================================
**Arrays**
A matrix is confined to two dimensions, whereas an array can have any number of dimensions.
```{r}
# Names for each dimension of the array.
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Build a 3 x 3 x 2 array; the two values are recycled to fill all 18 cells.
result <- array(c('green','yellow'), dim = c(3,3,2),
                dimnames = list(row.names, column.names, matrix.names))
print(result)
```
========================================================
**Arrays**
```{r}
# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Combine the vectors (9 values) into a 3 x 3 x 2 array; the values are recycled to fill all 18 cells.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)
```
========================================================
**Arrays**
* Calculations Across Array Elements
apply(X, MARGIN, FUN)
* X is an array.
* MARGIN gives the dimension(s) over which the function is applied: 1 for rows, 2 for columns, 3 for the third dimension (here, the individual matrices).
* FUN is the function to be applied.
```{r}
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)
```
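========================================================
**Arrays**
* Other margins
Changing MARGIN changes which dimension is kept. The sketch below rebuilds `new.array` so it runs on its own; each of its two matrices holds the values 5, 9, 3, 10, ..., 15, which sum to 92.
```{r}
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
new.array <- array(c(vector1,vector2), dim = c(3,3,2))
apply(new.array, 2, sum)        # column sums over both matrices: 34 66 84
apply(new.array, 3, sum)        # one total per matrix: 92 92
apply(new.array, c(1, 2), sum)  # the two matrices added cell by cell
```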
========================================================
**Data Frame**
Unlike in a matrix, each column of a data frame can contain a different type of data: for example, the first column can be numeric, the second character, and the third logical.
* Construct Data Frame
```{r}
df = data.frame(col1 = c("this","is","text"), col2 = c(FALSE, TRUE, FALSE), col3 = c(2,0,4))
colnames(df) = c("Text", "isVerbal", "Length")
rownames(df) = c("Word1","Word2","Word3")
head(df)
```
* Convert a matrix to data.frame
```{r}
cc = matrix(1:6,3,2)
df = as.data.frame(cc)
df
```
========================================================
**Data Frame**
* Print and summarize a data.frame
```{r}
head(df, 6)
summary(df)
```
========================================================
**Data Frame**
* Adding columns
```{r}
df$newx = c("IT","HR","Finance")
head(df)
```
* Removing columns
```{r}
df$newx = NULL
head(df)
```
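========================================================
**Data Frame**
* Selecting and sorting rows
Rows can be selected with a logical condition inside `[ , ]` (note the trailing comma, which keeps all columns) and sorted with `order()`. The small data frame below is made up for this sketch.
```{r}
df = data.frame(Text = c("this","is","text"), Length = c(4, 2, 4))
df[df$Length > 2, ]                    # rows where Length exceeds 2
df[order(df$Length), ]                 # rows sorted by Length
subset(df, Length > 2, select = Text)  # same filter with subset()
```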
========================================================
**Thanks**
* Have a good day!