# Data Structure in R Language

Posted by discofreakboot on Sat, 11 May 2019 15:18:37 +0200

This post summarizes the data structures of the R language, with a short example for each.
It covers vectors, matrices, arrays, data frames, factors, and lists, along with some commonly used functions.
Note: the code below can be run directly.

## 1. Vector

A vector is a one-dimensional array that stores numeric, character, or logical data.
All elements of a vector share a single mode; if values of different modes are combined, R coerces them to a common mode.
For example:

#### Create a vector containing the numbers 1 to 5

```r
a <- c(1:5)
a
```
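
The single-mode rule can be checked directly. The following is a minimal sketch; the names `mixed_num_chr` and `mixed_num_lgl` are hypothetical and used only for illustration. It shows that combining modes with `c()` does not fail but converts every element to the most general mode present.

```r
# Mixing modes in c() coerces all elements to one common mode.
mixed_num_chr <- c(1, "a", TRUE)    # everything becomes character
mixed_num_lgl <- c(1, TRUE, FALSE)  # logicals become numeric (TRUE -> 1, FALSE -> 0)

class(mixed_num_chr)  # "character"
class(mixed_num_lgl)  # "numeric"
mixed_num_chr         # "1" "a" "TRUE"
mixed_num_lgl         # 1 1 0
```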

```r
# Append 7 to the end of a
b <- append(a, 7)
b
```

#### Insert values into a vector (after the second position)

```r
c <- append(a, c(8:10), after = 2)
c
a  # a itself is unchanged
```

#### Sum

```r
sum(a)
sum(b)
sum(c)
```

#### Maximum

```r
max(a)
max(b)
max(c)
```

#### Minimum

```r
min(a)
min(b)
min(c)
```

#### Mean

```r
mean(a)
mean(b)
mean(c)
```

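#### Variance computed by hand

```r
# Sample variance: sum of squared deviations from the mean, divided by n - 1
f <- sum((a - mean(a))^2) / (length(a) - 1)
```

#### Standard deviation computed by hand

```r
bb <- sqrt(f)
bb
f
```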

#### Variance function

```r
var(a)
var(b)
var(c)
```

#### Standard deviation function

```r
sd(a)
sd(b)
sd(c)
```
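
The hand computation and the built-in functions should agree, because `var()` and `sd()` use the sample (n - 1) denominator. The sketch below checks this on a vector `v` (a hypothetical name) holding the same values as `a`.

```r
v <- c(1:5)  # same values as the vector a in this post
n <- length(v)

# Sample variance and standard deviation computed by hand
manual_var <- sum((v - mean(v))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

# Compare with the built-in functions (all.equal tolerates rounding error)
all.equal(manual_var, var(v))
all.equal(manual_sd, sd(v))
```
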
##### Sort (decreasing = FALSE, the default, sorts from small to large)
```r
sort(a, decreasing = TRUE)  # TRUE sorts from large to small
```
##### Reverse
```r
rev(a)
rev(sort(a))
```

#### Product of the elements of a vector

```r
prod(a)
```

#### Factorial (5! as the product of 1 to 5)

```r
prod(1:5)
```

## 2. Matrix

A matrix is a two-dimensional array that stores numeric, character, or logical data.
All elements of a matrix share a single mode.
For example:

#### Create a matrix with three rows and four columns

```r
mat <- matrix(c(1:12), nrow = 3, ncol = 4)
mat
```

#### Get the dimensions

```r
dim(mat)
```

#### Multiply every element of a matrix by 2

```r
mat_s <- mat * 2
mat_s
```

#### Condition for matrix multiplication

##### The number of columns of the left matrix must equal the number of rows of the right matrix
```r
mat2 <- matrix(c(1:12), nrow = 4, ncol = 3)
mat2
```

#### Matrix multiplication

```r
mat %*% mat2
```
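
The conformability condition can be verified by inspecting dimensions. This is a small sketch using hypothetical names `left` and `right` for matrices with the same shapes as `mat` and `mat2`; `tryCatch()` is used only to capture the error message instead of stopping the script.

```r
left <- matrix(1:12, nrow = 3, ncol = 4)   # 3 x 4
right <- matrix(1:12, nrow = 4, ncol = 3)  # 4 x 3

# (3 x 4) %*% (4 x 3) gives a 3 x 3 result
product <- left %*% right
dim(product)

# Swapping the operands is also conformable here, but gives a 4 x 4 result
dim(right %*% left)

# (3 x 4) %*% (3 x 4) violates the condition and raises an error
msg <- tryCatch(left %*% left, error = function(e) conditionMessage(e))
msg
```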

```r
# Name the columns of the matrix
colnames(mat) <- c('Serial number','Chinese','English','Mathematics')
mat
```

#### Get the value in row 1, column 2

```r
mat[1,2]
```

#### Get the second row

```r
mat[2,]
```

#### Get the second column

```r
mat[,2]
```

#### Get the values in rows 1 and 2, columns 1 and 2

```r
mat[c(1:2),c(1,2)]
```

#### Get rows 2 and 3

```r
mat[c(2,3),]
```

#### Logical tests

##### Test whether each value in the second column is greater than or equal to 5 (returns TRUE or FALSE)
```r
mat[,2] >= 5
```
##### Return the rows whose second-column value is greater than or equal to 5
```r
mat[mat[,2] >= 5,]
```
##### Return the indices of the rows whose fourth-column value is greater than or equal to 12
```r
which(mat[,4] >= 12)
```
##### Return the rows whose fourth-column value is greater than or equal to 12
```r
mat[which(mat[,4] >= 12),]
```
##### Return the indices of the rows whose fourth-column value is greater than or equal to 11
```r
which(mat[,4] >= 11)
```
##### Return the rows whose fourth-column value is greater than or equal to 11
```r
mat[which(mat[,4] >= 11),]
```
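
One detail worth knowing about row selection: when only a single row matches, R drops the result to a plain vector unless `drop = FALSE` is given. The sketch below illustrates this with a hypothetical matrix `m` that has the same values as `mat`.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

# Only row 3 has a fourth-column value >= 12, so the result is a vector
one_row <- m[which(m[, 4] >= 12), ]
is.matrix(one_row)  # FALSE

# drop = FALSE keeps the result as a 1 x 4 matrix
kept <- m[which(m[, 4] >= 12), , drop = FALSE]
dim(kept)
```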

### apply

##### View the documentation
```r
?apply
```

#### Mean of the second column

```r
mean(mat[,2])
```

#### Row-wise operation (MARGIN = 1)

```r
apply(mat,1,mean)
```

#### Column-wise operation (MARGIN = 2)

```r
apply(mat,2,mean)
```
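
As a cross-check, the row and column means from `apply()` should match the dedicated base functions `rowMeans()` and `colMeans()`. The sketch below uses a hypothetical matrix `m` with the same values as `mat`, and also passes an anonymous function to show that `apply()` is not limited to built-in summaries.

```r
m <- matrix(1:12, nrow = 3, ncol = 4)

row_means <- apply(m, 1, mean)  # one value per row
col_means <- apply(m, 2, mean)  # one value per column

# The dedicated helpers give the same results
all.equal(row_means, rowMeans(m))
all.equal(col_means, colMeans(m))

# Any function of a vector can be applied, e.g. the spread of each row
row_spread <- apply(m, 1, function(r) max(r) - min(r))
row_spread
```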

## 3. Array

An array is similar to a matrix, but it can have more than two dimensions.
For example:

#### Create a three-dimensional array

```r
arr <- array(c(1:24), dim = c(2,3,4))
arr
```

#### Get the array dimensions

```r
dim(arr)
```

#### Get the element in row 1, column 2 of every slice along the third dimension

```r
arr[1,2,]
```

#### Get the first row of every slice

```r
arr[1,,]
```

#### Using the sum function

```r
sum(arr[1,2,])
```
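
To make the three-index notation more concrete, the sketch below (with a hypothetical array `arr3` built the same way as `arr`) extracts one full slice and then sums each slice with `apply()` over the third dimension.

```r
arr3 <- array(1:24, dim = c(2, 3, 4))

# Leaving the first two indices empty selects a whole 2 x 3 slice
slice1 <- arr3[, , 1]
dim(slice1)

# Row 1, column 2 of each of the four slices
arr3[1, 2, ]

# MARGIN = 3 applies the function to each slice along the third dimension
slice_sums <- apply(arr3, 3, sum)
slice_sums
```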

## 4. Data Frame

A data frame is a table-like structure in which different columns can hold data of different modes.
For example:

```r
name <- c('zs','lss','ww')
sex <- c('n','v','v')
age <- c(22,21,23)
```

#### Create a data frame

```r
dat <- data.frame(name,sex,age)
dat
```
##### Inspect the type of the data
```r
class(dat)   # object class: "data.frame"
mode(dat)    # mode (e.g. numeric, character, logical); "list" for a data frame
typeof(dat)  # internal storage type (e.g. double, integer); "list" for a data frame
```
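
Because a data frame is stored as a list of columns, the type of each column can be inspected separately. The following sketch uses a hypothetical data frame `people` with the same values as `dat`; `stringsAsFactors = FALSE` is set explicitly so the result is the same on R versions before and after 4.0.

```r
people <- data.frame(
  name = c("zs", "lss", "ww"),
  sex = c("n", "v", "v"),
  age = c(22, 21, 23),
  stringsAsFactors = FALSE
)

typeof(people)  # "list": a data frame is a list of equal-length columns

# Class of each column
col_classes <- sapply(people, class)
col_classes

# str() prints a compact overview of the structure
str(people)
```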

#### Get the value in row 1, column 3

```r
dat[1,3]
```

#### Get a column by its name

```r
dat$age
```

```r
# Add a score column
dat$score <- c(98,99,80)
dat
```

#### The row with a score of 99

```r
dat[which(dat$score == 99),]
```

#### Create a matrix

```r
dat2 <- matrix(c(1:12), nrow = 3)
dat2
```

#### Convert a matrix to a data frame (columns are automatically named V1, V2, ...)

```r
dat3 <- as.data.frame(dat2)
dat3
```

#### Get a column of data by column name

```r
dat3$V1
```

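#### Set up a second data frame for joining

```r
dat
dat3

dat0 <- data.frame(name = name, weight = c(60:62))
dat0
```

#### Join data frames with merge()

```r
dat0$age <- c(20:22)
dat0
# Join on age, then keep the rows where the names from both sides agree
dat01 <- merge(dat, dat0, by.x = 'age', by.y = 'age')
dat01[which(dat01$name.x == dat01$name.y),]
```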
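
Joining on `age` and then filtering on matching names can also be written as a single join on both keys. This is a sketch under the assumption that both data frames use the same column names for the keys; `left` and `right` are hypothetical names standing in for `dat` and `dat0`.

```r
left <- data.frame(name = c("zs", "lss", "ww"), age = c(22, 21, 23),
                   stringsAsFactors = FALSE)
right <- data.frame(name = c("zs", "lss", "ww"), age = c(20, 21, 22),
                    weight = c(60, 61, 62), stringsAsFactors = FALSE)

# Inner join: keep only rows where both name and age match
both <- merge(left, right, by = c("name", "age"))
both

# all = TRUE keeps unmatched rows from both sides, filling gaps with NA
all_rows <- merge(left, right, by = c("name", "age"), all = TRUE)
all_rows
```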

#### Combine

```r
dat
dat0
```

The next block contains a deliberate mistake. Look for it (hint: compare the columns of `dat` and `dat0`).

#### Combine by row (the number of columns must be the same)

```r
dat_r <- rbind(dat,dat0)
dat_r
```
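
For comparison, here is a sketch of a row-bind that does work. For data frames, `rbind()` matches columns by name, so the inputs need the same set of column names, though not necessarily in the same order. All names below (`first`, `second`, `mismatch`) are hypothetical.

```r
first <- data.frame(name = c("zs", "lss"), age = c(22, 21),
                    stringsAsFactors = FALSE)
# Same column names as first, but in a different order
second <- data.frame(age = 23, name = "ww", stringsAsFactors = FALSE)

stacked <- rbind(first, second)  # columns are matched by name
stacked

# A data frame with a different column name cannot be stacked
mismatch <- data.frame(name = "xx", weight = 60, stringsAsFactors = FALSE)
res <- tryCatch(rbind(first, mismatch), error = function(e) conditionMessage(e))
res  # the error message reported by rbind()
```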

#### Combine by column (the number of rows must be the same)

```r
dat_c <- cbind(dat,dat0)
dat_c
```

#### Rename columns

```r
names(dat0) <- c('name','score','age')
dat0
dat0$sex <- c(23,23,24)
dat0
```

#### lapply() returns a list

```r
lapply(dat$age,sum)
```

#### sapply() returns a vector

```r
sapply(dat0$score,sum)
```
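
Applied to a single column, `lapply()` and `sapply()` visit each element in turn. Applied to a whole data frame, they visit each column, which is often the more useful case. A minimal sketch, with a hypothetical data frame `scores`:

```r
scores <- data.frame(math = c(98, 99, 80), english = c(70, 85, 90))

# One result per column, returned as a named list
per_column_list <- lapply(scores, mean)
per_column_list

# The same results simplified to a named numeric vector
per_column_vec <- sapply(scores, mean)
per_column_vec
```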

## 5. Factor

In R, nominal (categorical) variables and ordinal (ordered categorical) variables are represented as factors.
For example:

#### Create a factor

```r
a <- factor(c('A','B','C','C','A'))
a
```

#### levels: the possible levels of a factor

```r
# labels: display labels for the levels
# exclude: values to leave out when forming the levels
b <- factor(c('A','B','C','D','C','A'), levels = c('A','B','C','D'), labels = c('A Cup','B Cup','C Cup','D Cup'))
b

colour <- c('G','G','R','R','Y','G','G','Y','G','R','G')
col <- factor(colour)
col
col1 <- factor(colour, levels = c('G','R','Y'), labels = c('Green','Red','Yellow'))
col1
col2 <- factor(colour, levels = c('G','R','Y'), labels = c('1','2','3'))
col2

class(col2)
typeof(col2)
mode(col2)
```
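
Since a factor is stored as integer codes with character labels, converting a factor whose labels look like numbers needs some care. The sketch below uses a hypothetical factor `f`: `as.numeric()` returns the internal codes, while going through `as.character()` first recovers the labelled values.

```r
f <- factor(c("10", "20", "10", "30"))

# Internal integer codes (positions in levels(f)), not the labelled values
codes <- as.numeric(f)
codes  # 1 2 1 3

# Convert to character first to recover the labelled values
values <- as.numeric(as.character(f))
values  # 10 20 10 30
```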

#### Convert to a vector

```r
as.vector(col2)

# ordered() creates an ordered factor
score <- c('A','B','A','C','B')
score1 <- ordered(score)
score1
score1 <- ordered(score, levels = c('C','B','A'))
score1

# cut() bins a numeric vector into a factor of intervals
exam <- c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98, 65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)
exam1 <- cut(exam, breaks = 3)  # split into three equal-width groups
exam1
# Approximate width of each interval
(max(exam) - min(exam)) / 3

# Commonly used function: tapply()
gender <- c('f','m','m','m','f')
age <- c(12,35,32,34,25)
# tapply(vector, index, function)
tapply(age, gender, mean)
```
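
Besides a number of groups, `cut()` also accepts explicit break points and labels. The sketch below is illustrative only: the vector `marks`, the break points, and the band names are all hypothetical. `right = FALSE` makes each interval include its lower bound, and `ordered_result = TRUE` returns an ordered factor.

```r
marks <- c(98, 52, 75, 60, 88, 59)

# Bands: [0, 60) low, [60, 80) mid, [80, 100) high
bands <- cut(marks,
             breaks = c(0, 60, 80, 100),
             labels = c("low", "mid", "high"),
             right = FALSE,
             ordered_result = TRUE)
bands

# Mean mark within each band
band_means <- tapply(marks, bands, mean)
band_means
```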

## 6. List

A list is an ordered collection of objects; its elements can be vectors, matrices, data frames, or other lists.
For example:

```r
a <- 2
b<-'abc'
c<-c(1:4)
l<-list(a,b,c)
l
l[[1]]  # double brackets extract a single element
l[[2]]
```
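
The difference between single and double brackets matters for lists: `[ ]` returns a shorter list, while `[[ ]]` returns the element itself. A minimal sketch, using a hypothetical list `items` with the same contents as `l`:

```r
items <- list(2, "abc", 1:4)

sub <- items[3]     # a list of length 1 that contains the vector
elem <- items[[3]]  # the integer vector itself

class(sub)   # "list"
class(elem)  # "integer"

# Elements of the extracted vector can then be indexed as usual
elem[2]
```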

#### Indexing a list by name

```r
l <- list(a = a, b = b, c = c)
l
l$c
```

#### Attach a list

```r
# After attach(), the elements of the list can be referenced by name
attach(l)
a
c
```
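
An alternative to `attach()` that avoids leaving the list on the search path is `with()`, which evaluates one expression using the list elements as variables. A small sketch with a hypothetical list `rec`:

```r
rec <- list(a = 2, b = "abc", c = 1:4)

# Inside with(), a and c refer to rec$a and rec$c
total <- with(rec, a + sum(c))
total  # 2 + (1 + 2 + 3 + 4)
```

If `attach()` is used, a matching `detach()` call removes the list from the search path again.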

#### Convert to a vector

```r
l <- unlist(l)
l
```

#### Computing on a list

```r
x <- list(a = 1, b = c(1:4), c = c(3:6))
x
```

#### lapply() returns a list

```r
lapply(x,sum)
lapply(x,mean)

lapply(x,sum)[[1]]  # extract a single element of the result
```

#### sapply() returns a vector

```r
sapply(x,sum)
sapply(x,mean)
```
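
`sapply()` chooses its return shape from the results it gets. When a fixed shape is wanted, base R also offers `vapply()`, which takes a template for each result and fails if a result does not fit. A minimal sketch with a hypothetical list `vals` that mirrors `x`:

```r
vals <- list(a = 1, b = 1:4, c = 3:6)

# FUN.VALUE = numeric(1) declares that each result is a single number
means <- vapply(vals, mean, FUN.VALUE = numeric(1))
means

# Same values as sapply(), but the result type is guaranteed
all.equal(means, sapply(vals, mean))
```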
