Data Structure in R Language

Posted by discofreakboot on Sat, 11 May 2019 15:18:37 +0200


This post summarizes the data structures of the R language, with explanations and examples.
It covers vectors, arrays, lists, data frames, factors, matrices, and some commonly used functions.
Note: the code below can be run directly in R (except for one deliberate mistake in the data frame section).

1. Vector

A vector is a one-dimensional array that stores numeric, character, or logical data.
All elements of a vector must have the same mode: if you combine values of different modes with c(), they are coerced to one common mode.
For example:

Create a vector containing numbers from 1 to 5

a<-c(1:5)
a

Append an element to the end

b<-append(a,7)
b

Insert a vector after the second element

c<-append(a,c(8:10),after = 2)
c
a
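
The after argument of append() is the index after which the new values are placed, so after = 2 puts them in positions 3 onwards. A minimal sketch, rebuilding the same vector a, that checks this against splicing with c() and indexing:

```r
a <- 1:5
# append(..., after = 2) places 8:10 right after the second element
appended <- append(a, 8:10, after = 2)
# the same result built by hand: first two elements, new values, the rest
spliced <- c(a[1:2], 8:10, a[3:length(a)])
stopifnot(identical(spliced, appended))
print(appended)          # values: 1 2 8 9 10 3 4 5
# after = 0 puts the new values at the front
print(append(a, 0L, after = 0))
```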

Sum of the elements

sum(a)
sum(b)
sum(c)

Maximum

max(a)
max(b)
max(c)

Minimum

min(a)
min(b)
min(c)

Mean

mean(a)
mean(b)
mean(c)

Sample variance computed by hand (divide by n - 1, which is 4 here because a has 5 elements)

f <- sum((a - mean(a))^2) / (length(a) - 1)

Standard deviation (the square root of the variance)

bb<- sqrt(f)
bb
f
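
The hand computation divides by n - 1 rather than n, which is the sample variance that var() returns. A small sketch with a hypothetical helper sample_var() (not part of base R) that works for a vector of any length and agrees with var() and sd():

```r
# hypothetical helper for illustration: sample variance with n - 1 denominator
sample_var <- function(x) {
  n <- length(x)
  sum((x - mean(x))^2) / (n - 1)
}
a <- c(1:5)
b <- append(a, 7)
stopifnot(isTRUE(all.equal(sample_var(a), var(a))),
          isTRUE(all.equal(sample_var(b), var(b))),
          isTRUE(all.equal(sqrt(sample_var(a)), sd(a))))
sample_var(a)   # 2.5
```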

Built-in variance function

var(a)
var(b)
var(c)

Built-in standard deviation function

sd(a)
sd(b)
sd(c)

Sorting: sort() sorts from smallest to largest by default (decreasing = FALSE)
sort(a, decreasing = TRUE)  # decreasing = TRUE sorts from largest to smallest
Reverse the order of the elements
rev(a)
rev(sort(a))

Product of all elements of a vector

prod(a)

Factorial (5! is the product of the numbers 1 to 5)

prod(1:5)
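
Base R also provides factorial(), so the product trick can be checked against it. A minimal sketch:

```r
n <- 5
# n! as the product of 1..n, compared with the built-in factorial()
stopifnot(isTRUE(all.equal(prod(1:n), factorial(n))))
prod(1:n)      # 120
factorial(n)
```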

2. Matrix

A matrix is a two-dimensional array that stores numeric, character, or logical data.
As with vectors, all elements of a matrix must have the same mode.
For example:

Create a matrix with three rows and four columns (filled column by column by default)

mat<-matrix(c(1:12),nrow = 3,ncol = 4)
mat

Get the dimensions

dim(mat)

Multiply every element of the matrix by 2

mat_s<-mat*2
mat_s

Condition for matrix multiplication

The number of columns of the left matrix must equal the number of rows of the right matrix
mat2<-matrix(c(1:12),nrow = 4,ncol = 3)
mat2

Matrix multiplication (a 3 x 4 matrix times a 4 x 3 matrix gives a 3 x 3 matrix)

mat %*% mat2
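
Each element [i, j] of the product is the dot product of row i of the left matrix and column j of the right matrix. A short sketch that rebuilds the two matrices and checks one element by hand:

```r
mat  <- matrix(1:12, nrow = 3, ncol = 4)
mat2 <- matrix(1:12, nrow = 4, ncol = 3)
res <- mat %*% mat2
dim(res)   # 3 3
# element [1, 1]: row 1 of mat (1, 4, 7, 10) times column 1 of mat2 (1, 2, 3, 4)
manual <- sum(mat[1, ] * mat2[, 1])
stopifnot(res[1, 1] == manual)
manual     # 70
# mat2 %*% mat2 would fail: mat2 has 3 columns but 4 rows
```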

Add column names

colnames(mat) <- c('Serial number','Chinese','English','Mathematics')
mat

Extract the value in row 1, column 2

mat[1,2]

Extract the second row

mat[2,]

Extract the second column

mat[,2]

Extract the values in rows 1-2 and columns 1-2

mat[c(1:2),c(1,2)]

Extract rows 2 and 3

mat[c(2,3),]

Logical indexing

Check whether each value in the second column is greater than or equal to 5 (returns TRUE or FALSE)
mat[,2] >= 5
Return the rows whose second column is greater than or equal to 5
mat[mat[,2] >= 5,]

which()

Return the indices of the rows whose fourth column is greater than or equal to 12
which(mat[,4] >= 12)
Return the rows whose fourth column is greater than or equal to 12
mat[which(mat[,4] >= 12),]
Return the indices of the rows whose fourth column is greater than or equal to 11
which(mat[,4] >= 11)
Return the rows whose fourth column is greater than or equal to 11
mat[which(mat[,4] >= 11),]
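
When a condition matches only one row, as with >= 12, R drops the result to a plain vector. A sketch, on a freshly built mat without column names, showing that logical and which() indexing select the same rows here (there are no NA values) and that drop = FALSE keeps the matrix shape:

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
rows <- mat[, 4] >= 11
# logical indexing and which() give the same rows when there are no NAs
stopifnot(identical(mat[rows, ], mat[which(rows), ]))
one   <- mat[which(mat[, 4] >= 12), ]                # dropped to a vector of length 4
one_m <- mat[which(mat[, 4] >= 12), , drop = FALSE] # still a 1 x 4 matrix
is.matrix(one)   # FALSE
dim(one_m)       # 1 4
```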

apply()

View the documentation
?apply

Average of the second column

mean(mat[,2])

Apply a function to each row (MARGIN = 1)

apply(mat,1,mean)

Apply a function to each column (MARGIN = 2)

apply(mat,2,mean)
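
For means and sums, base R also has the shortcuts rowMeans(), colMeans(), rowSums() and colSums(); apply() is the general form that accepts any function. A minimal sketch comparing the two:

```r
mat <- matrix(1:12, nrow = 3, ncol = 4)
stopifnot(isTRUE(all.equal(apply(mat, 1, mean), rowMeans(mat))),
          isTRUE(all.equal(apply(mat, 2, mean), colMeans(mat))))
apply(mat, 1, sum)   # row totals
apply(mat, 2, max)   # column maxima, a function without a shortcut
```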

3. Array

An array is similar to a matrix, but it can have more than two dimensions.
For example:

Create a three-dimensional array

arr <- array(c(1:24),dim = c(2,3,4))
arr

Get the dimensions of the array

dim(arr)

Get the element in row 1, column 2 of every slice along the third dimension

arr[1,2,]

Get row 1 of every slice (the result is a 3 x 4 matrix)

arr[1,,]

Sum the elements of arr[1,2,]

sum(arr[1,2,])
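
Arrays are filled along the first dimension first, then the second, then the third. For this particular 2 x 3 x 4 shape, element [i, j, k] is therefore at position i + (j - 1) * 2 + (k - 1) * 6 of the data vector. A small check of that rule (the helper pos() is written only for this shape):

```r
arr <- array(1:24, dim = c(2, 3, 4))
# position of [i, j, k] in the underlying data for a 2 x 3 x 4 array
pos <- function(i, j, k) i + (j - 1) * 2 + (k - 1) * 2 * 3
stopifnot(arr[1, 2, 3] == pos(1, 2, 3))
arr[1, 2, ]        # 3 9 15 21
sum(arr[1, 2, ])   # 48
dim(arr[1, , ])    # 3 4: row 1 taken from each of the 4 slices
```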

4. Data Frame

A data frame is a two-dimensional table, like a matrix, but different columns can hold data of different modes (for example character and numeric). All columns must have the same length.
For example:

name<- c('zs','lss','ww')
sex<-c('n','v','v')
age<-c(22,21,23)

Create a data frame

dat<-data.frame(name,sex,age)
dat
View the class, mode, and type of the object
class(dat)#Class of the object: "data.frame"
mode(dat)#Mode (e.g. numeric, character, logical, list); a data frame is a "list"
typeof(dat)#Internal storage type (e.g. double, integer, list)

Get the value in row 1, column 3

dat[1,3]

Get the third value of a column selected by name

dat$age[3]

Add a column

dat$score<-c(98,99,80)
dat

Select the row with a score of 99

dat[which(dat$score==99),]
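
The which() call is optional: a logical vector can index the rows directly, and base R's subset() evaluates the condition inside the data frame so columns can be named without dat$. A sketch that rebuilds dat as above:

```r
name <- c('zs', 'lss', 'ww')
sex  <- c('n', 'v', 'v')
age  <- c(22, 21, 23)
dat  <- data.frame(name, sex, age)
dat$score <- c(98, 99, 80)
# logical indexing without which()
dat[dat$score == 99, ]
# subset(): row condition plus an optional selection of columns
top <- subset(dat, score >= 98, select = c(name, score))
top
```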

Create a matrix

dat2<-matrix(c(1:12),nrow = 3)
dat2

Convert the matrix into a data frame; since the matrix has no column names, as.data.frame() names the columns V1, V2, ...

dat3<-as.data.frame(dat2)
dat3

Get a column of data by column name

dat3$V1

Joining data frames

dat
dat3

dat0<-data.frame(name=name,weight=c(60:62))
dat0

Join two data frames on a common column with merge()

dat0$age<-c(20:22)
dat0
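dat01<-merge(dat,dat0,by.x = 'age',by.y = 'age')
Both data frames have a name column, so merge() renames them name.x and name.y; keep only the rows where the two names agree
dat01[which(dat01$name.x==dat01$name.y),]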
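
merge() also accepts several key columns in by, so a row is kept only when name and age both match. A sketch, rebuilding dat (with its score column) and dat0 as above, showing that this gives the same single row as merging on age and then filtering:

```r
name <- c('zs', 'lss', 'ww')
dat  <- data.frame(name, sex = c('n', 'v', 'v'), age = c(22, 21, 23),
                   score = c(98, 99, 80))
dat0 <- data.frame(name = name, weight = c(60:62), age = c(20:22))
# key on both columns: rows must agree on name AND age
both <- merge(dat, dat0, by = c('name', 'age'))
both
# merging on age alone, then keeping rows whose names agree
by_age <- merge(dat, dat0, by = 'age')
by_age[by_age$name.x == by_age$name.y, ]
```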

Combining data frames by rows and columns

dat
dat0

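# There is a deliberate mistake here: rbind(dat,dat0) fails because dat (name, sex, age, score) and dat0 (name, weight, age) do not have the same columns.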

Combine by rows with rbind(): the data frames must have the same number of columns, with matching names

dat_r<-rbind(dat,dat0)
dat_r

Combine by columns with cbind(): the data frames must have the same number of rows

dat_c<-cbind(dat,dat0)
dat_c

Rename the columns

names(dat0)<-c('name','score','age')
dat0
dat0$sex<-c(23,23,24)
dat0
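
For data frames, rbind() matches columns by name, so it succeeds once both data frames have exactly the same column names, even in a different order. A sketch with freshly built copies of dat and a dat0 whose columns are name, score, age and sex (the dat0 values are illustrative only):

```r
dat  <- data.frame(name = c('zs', 'lss', 'ww'), sex = c('n', 'v', 'v'),
                   age = c(22, 21, 23), score = c(98, 99, 80))
dat0 <- data.frame(name = c('zs', 'lss', 'ww'), score = c(60, 61, 62),
                   age = c(20, 21, 22), sex = c('v', 'n', 'n'))
# columns are matched by name; the result keeps dat's column order
dat_r <- rbind(dat, dat0)
dat_r
# with a different number of columns, rbind() stops with an error
res <- tryCatch(rbind(dat, dat0[, 1:3]),
                error = function(e) conditionMessage(e))
res
```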

lapply() returns a list (here sum() is applied to each element of dat$age)

lapply(dat$age,sum)

sapply() returns a vector

sapply(dat0$score,sum)

5. Factor

In R, nominal (categorical) variables and ordinal (ordered) variables are called factors.
For example:

Create a factor

a<-factor(c('A','B','C','C','A'))
a

Main arguments of factor()

#levels: the possible values (levels) of the factor
#labels: labels for the levels
#exclude: values to be removed from the levels
b<-factor(c('A','B','C','D','C','A'),levels = c('A','B','C','D'),labels = c('A Cup','B Cup','C Cup','D Cup'))
b

colour<-c('G','G','R','R','Y','G','G','Y','G','R','G')
col<- factor(colour)
col
col1<-factor(colour,levels = c('G','R','Y'),labels = c('Green','Red','Yellow'))
col1
col2<-factor(colour,levels = c('G','R','Y'),labels = c('1','2','3'))
col2

class(col2)
typeof(col2)
mode(col2)

Convert to a vector (the result is a character vector of the labels)

as.vector(col2)
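
A common pitfall: a factor stores integer codes, so as.integer() returns the codes, not the values the labels spell out. In col2 the labels happen to be '1', '2', '3', so both look the same; a sketch with a hypothetical factor f whose labels are other numbers shows the difference:

```r
f <- factor(c('10', '20', '10', '30'))
as.vector(f)                  # character: "10" "20" "10" "30"
as.integer(f)                 # internal codes: 1 2 1 3
as.numeric(as.character(f))   # the values themselves: 10 20 10 30
levels(f)                     # "10" "20" "30"
```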

#ordered() creates an ordered factor; by default the levels are sorted alphabetically
score<-c('A','B','A','C','B')
score1<-ordered(score)
score1
score1<-ordered(score,levels=c('C','B','A'))
score1

#cut() divides a numeric variable into intervals and returns a factor (set ordered_result = TRUE for an ordered factor)
exam<-c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98, 65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)
exam1<-cut(exam,breaks = 3)#Divide into three intervals of equal width
exam1
#Width of each interval
(max(exam)-min(exam))/3
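
Instead of a number of intervals, breaks can be a vector of cut points, and labels names the resulting groups. A sketch using the same scores and hypothetical grading bands [0, 60), [60, 80), [80, 100]; ordered_result = TRUE makes the result an ordered factor, so grades can be compared:

```r
exam <- c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98,
          65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)
# hypothetical bands; right = FALSE makes intervals closed on the left,
# include.lowest = TRUE then also closes the top interval at 100
grade <- cut(exam, breaks = c(0, 60, 80, 100), right = FALSE,
             include.lowest = TRUE,
             labels = c('fail', 'pass', 'good'), ordered_result = TRUE)
table(grade)          # fail 4, pass 13, good 13
grade[1] > grade[3]   # TRUE: "good" ranks above "fail"
```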

#A common function: tapply() applies a function to each group of a vector
gender<-c('f','m','m','m','f')
age<-c(12,35,32,34,25)
#tapply(vector, index, function)
tapply(age, gender, mean)
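
tapply() splits the vector by the grouping variable and applies the function to each piece. A sketch reproducing the same result with split() and sapply():

```r
gender <- c('f', 'm', 'm', 'm', 'f')
age    <- c(12, 35, 32, 34, 25)
means <- tapply(age, gender, mean)
means                          # f 18.5, m about 33.67
# the same grouping done in two explicit steps
by_hand <- sapply(split(age, gender), mean)
stopifnot(isTRUE(all.equal(as.vector(means), unname(by_hand))))
tapply(age, gender, length)    # group sizes: f 2, m 3
```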

6. List

A list is an ordered collection of objects, which can include vectors, matrices, data frames, or other lists.
For example:

Lists can contain numbers, strings, vectors...

a<-2
b<-'abc'
c<-c(1:4)
l<-list(a,b,c)
l
l[[2]]
l[[3]]

Named list elements can be accessed with $

l<-list(a=a,b=b,c=c)
l
l$c

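Attach the list so its elements can be accessed directly by name (objects with the same names in the global environment take precedence)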

attach(l)
a
c

Convert to a vector with unlist()

l<-unlist(l)
l
l[5]
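
Because the list contains a string, unlist() coerces every element to character, and vector elements get numbered names (c1, c2, ...). A sketch that rebuilds the list to show this:

```r
l <- list(a = 2, b = 'abc', c = c(1:4))
u <- unlist(l)
u           # names a, b, c1, c2, c3, c4; all values are character
class(u)    # "character"
u[5]        # c3 = "3"
# without the string element the result stays numeric
unlist(l[c('a', 'c')])
```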

Applying a function to each element

x<-list(a=1,b=c(1:4),c=c(3:6))
x

lapply() returns a list

lapply(x,sum)
lapply(x,mean)

lapply(x,sum)[[2]]

sapply() returns a vector

sapply(x,sum)
sapply(x,mean)
sapply(x,mean)[2]
sapply(x,mean)
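
sapply() guesses the shape of its result; vapply() does the same job but takes a template (FUN.VALUE) for each result, so an unexpected type or length raises an error instead of silently changing the output. A sketch, with the same x:

```r
x <- list(a = 1, b = c(1:4), c = c(3:6))
sapply(x, mean)                # a 1.0, b 2.5, c 4.5
# vapply(): each result must be a single number
vapply(x, mean, numeric(1))
# when every result has length 2, sapply() returns a 2 x 3 matrix
sapply(x, range)
```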
