Data Structures in R
This article summarizes the data structures in R and presents each one with explanations and examples.
It mainly covers vectors, arrays, lists, data frames, factors, matrices, and some commonly used functions.
Note: the code below can be run as written.
1, vector
A vector is a one-dimensional array that stores numeric, character, or logical data.
All elements of a vector must have the same mode; values of different modes cannot be kept side by side in one vector.
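As a quick check of the single-mode rule, the sketch below shows what R does when values of different modes are passed to c(): rather than raising an error, it coerces them to one common mode. The variable names mixed_num and mixed_chr are illustrative only.

```r
# Logical and numeric values combine into a numeric vector (TRUE becomes 1)
mixed_num <- c(1, TRUE, 3)
mixed_num
mode(mixed_num)

# Adding a character value coerces every element to character
mixed_chr <- c(1, "a", TRUE)
mixed_chr
mode(mixed_chr)
```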
Give an example:
Create a vector containing numbers from 1 to 5
a <- c(1:5)
a
Append a value to the end
b <- append(a, 7)
b
Insert a vector after the second element
c <- append(a, c(8:10), after = 2)
c
a
Sum
sum(a)
sum(b)
sum(c)
Maximum
max(a)
max(b)
max(c)
Minimum
min(a)
min(b)
min(c)
Mean
mean(a)
mean(b)
mean(c)
Variance, calculated by hand
f <- sum((a - mean(a))^2) / 4  # divide by n - 1, which is 4 for five values
Standard deviation, calculated by hand
bb <- sqrt(f)
bb
f
Variance with the built-in function
var(a)
var(b)
var(c)
Standard deviation with the built-in function
sd(a)
sd(b)
sd(c)
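The manual calculation above divides by 4, which is n - 1 for a five-element vector, so it should agree with var() and sd(). A minimal sketch checking that, using a hypothetical vector v so that the variables above are left untouched:

```r
v <- c(1:5)
n <- length(v)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((v - mean(v))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

# Both comparisons should print TRUE
all.equal(manual_var, var(v))
all.equal(manual_sd, sd(v))
```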
Sorting
decreasing = FALSE (the default) sorts from smallest to largest
sort(a, decreasing = TRUE)  # TRUE sorts from largest to smallest
Reverse
rev(a)
rev(sort(a))
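sort() returns the sorted values. A closely related base function, order(), returns the positions that would sort the vector, which is handy for reordering something else in step. A small sketch with an illustrative unsorted vector s (not part of the original example):

```r
s <- c(3, 1, 5, 2, 4)

sort(s)                      # ascending is the default
sort(s, decreasing = TRUE)   # descending

# rev() only reverses positions; applied to a sorted vector it gives descending order
identical(rev(sort(s)), sort(s, decreasing = TRUE))

# order() gives the indices that sort the vector
idx <- order(s)
idx
s[idx]    # same values as sort(s)
```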
Product of all elements of a vector
prod(a)
Factorial (5! as the product of 1 to 5)
prod(1:5)
2, matrix
A matrix is a two-dimensional array that stores numeric, character, or logical data.
All elements of a matrix must have the same mode; values of different modes cannot be mixed in one matrix.
Give an example:
Create a matrix with three rows and four columns
mat <- matrix(c(1:12), nrow = 3, ncol = 4)
mat
Get the dimensions
dim(mat)
Multiply each element of the matrix by 2
mat_s <- mat * 2
mat_s
Condition for matrix multiplication
The number of columns of the left matrix must equal the number of rows of the right matrix
mat2 <- matrix(c(1:12), nrow = 4, ncol = 3)
mat2
Matrix multiplication
mat %*% mat2
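To make the dimension rule concrete, the sketch below multiplies a 3 x 4 matrix by a 4 x 3 matrix and checks the shape of the result. It also shows that matrix() fills column by column unless byrow = TRUE is given. The names left, right, product and by_row are illustrative.

```r
left <- matrix(c(1:12), nrow = 3, ncol = 4)   # 3 rows, 4 columns
right <- matrix(c(1:12), nrow = 4, ncol = 3)  # 4 rows, 3 columns

# ncol(left) equals nrow(right), so the product is defined
ncol(left) == nrow(right)
product <- left %*% right
dim(product)  # rows of left, columns of right

# By default values fill down the columns; byrow = TRUE fills along the rows
left[1, ]
by_row <- matrix(c(1:12), nrow = 3, ncol = 4, byrow = TRUE)
by_row[1, ]
```

Note that `*` (used earlier for `mat * 2`) works element by element, while `%*%` is the matrix product.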
Add column names
colnames(mat) <- c('Serial number', 'Chinese', 'English', 'Mathematics')
mat
Extract the value in row 1, column 2
mat[1, 2]
Extract the second row
mat[2, ]
Extract the second column
mat[, 2]
Extract the values in rows 1 and 2, columns 1 and 2
mat[c(1:2), c(1, 2)]
Extract rows 2 and 3
mat[c(2, 3), ]
Logical tests
Test whether each value in the second column is greater than or equal to 5; returns TRUE or FALSE
mat[, 2] >= 5
Return the rows whose second column is greater than or equal to 5
mat[mat[, 2] >= 5, ]
which
Return the indices of the rows whose fourth column is greater than or equal to 12
which(mat[, 4] >= 12)
Return the rows whose fourth column is greater than or equal to 12
mat[which(mat[, 4] >= 12), ]
Return the indices of the rows whose fourth column is greater than or equal to 11
which(mat[, 4] >= 11)
Return the rows whose fourth column is greater than or equal to 11
mat[which(mat[, 4] >= 11), ]
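One thing worth noticing in the examples above: when only a single row matches, R returns a plain vector rather than a one-row matrix. A hedged sketch of how drop = FALSE keeps the matrix shape; m is an illustrative copy of the matrix used above, without column names.

```r
m <- matrix(c(1:12), nrow = 3, ncol = 4)

# Only row 3 has a fourth-column value >= 12, so the result drops to a vector
one_row <- m[which(m[, 4] >= 12), ]
is.matrix(one_row)

# drop = FALSE keeps the result as a 1 x 4 matrix
one_row_mat <- m[which(m[, 4] >= 12), , drop = FALSE]
dim(one_row_mat)

# Two rows match here, so the result is a matrix either way
two_rows <- m[which(m[, 4] >= 11), ]
dim(two_rows)
```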
apply
View the documentation
?apply
Mean of the second column
mean(mat[, 2])
Apply a function to each row (MARGIN = 1)
apply(mat, 1, mean)
Apply a function to each column (MARGIN = 2)
apply(mat, 2, mean)
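The second argument of apply() selects the margin: 1 applies the function to each row, 2 to each column. As a sketch, the results can be compared with the built-in rowMeans() and colMeans(); m is an illustrative matrix with the same values as above.

```r
m <- matrix(c(1:12), nrow = 3, ncol = 4)

row_means <- apply(m, 1, mean)  # one value per row (3 values)
col_means <- apply(m, 2, mean)  # one value per column (4 values)

# rowMeans() and colMeans() give the same numbers
all.equal(row_means, rowMeans(m))
all.equal(col_means, colMeans(m))

# Any function can be passed, for example the spread (max - min) of each column
col_spread <- apply(m, 2, function(x) max(x) - min(x))
col_spread
```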
3, array
An array is similar to a matrix, but it can have more than two dimensions.
Give an example:
Create a three-dimensional array
arr <- array(c(1:24), dim = c(2, 3, 4))
arr
Get the array dimensions
dim(arr)
Get the element in row 1, column 2 of each layer (the third dimension)
arr[1, 2, ]
Get the first row of each layer
arr[1, , ]
Use the sum function
sum(arr[1, 2, ])
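Array indices follow the order given in dim: row, column, then layer. A small sketch, using an illustrative array x3 with the same shape as above, shows how leaving an index empty selects everything along that dimension.

```r
x3 <- array(c(1:24), dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers

# Row 1, column 2, from every layer: a vector of length 4
x3[1, 2, ]

# Row 1 from every layer: a 3 x 4 matrix (columns x layers)
dim(x3[1, , ])

# One whole layer is an ordinary 2 x 3 matrix
x3[, , 1]

# apply() works on arrays too: the sum of each layer
layer_sums <- apply(x3, 3, sum)
layer_sums
```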
4, data frame
A data frame is a two-dimensional table, similar to a matrix, except that different columns can hold data of different modes.
Give an example:
name <- c('zs', 'lss', 'ww')
sex <- c('n', 'v', 'v')
age <- c(22, 21, 23)
Create a data frame
dat <- data.frame(name, sex, age)
dat
View the data types
class(dat)   # class of the object
mode(dat)    # mode (for example character, numeric, logical)
typeof(dat)  # internal storage type (for example double, integer...)
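For a data frame these three functions give different answers, because a data frame is stored internally as a list of equal-length columns. A minimal sketch with an illustrative data frame df:

```r
df <- data.frame(name = c("zs", "lss", "ww"), age = c(22, 21, 23))

class(df)    # "data.frame"
mode(df)     # "list"
typeof(df)   # "list"

# Each column has its own storage type
typeof(df$age)   # "double"

# For comparison, an integer vector
class(c(1:3))    # "integer"
typeof(c(1:3))   # "integer"
mode(c(1:3))     # "numeric"
```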
Get the value in row 1, column 3
dat[1, 3]
Get the third value of a column, selected by column name
dat$age[3]
Add a column
dat$score <- c(98, 99, 80)
dat
Get the row with a score of 99
dat[which(dat$score == 99), ]
Create a matrix
dat2 <- matrix(c(1:12), nrow = 3)
dat2
Convert a matrix to a data frame; columns without names receive the default names V1, V2, ...
dat3 <- as.data.frame(dat2)
dat3
Get a column by its name
dat3$V1
Prepare a second data frame to join
dat
dat3
dat0 <- data.frame(name = name, weight = c(60:62))
dat0
Join the data frames on the age column
dat0$age <- c(20:22)
dat0
dat01 <- merge(dat, dat0, by.x = 'age', by.y = 'age')
dat01[which(dat01$name.x == dat01$name.y), ]
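merge() performs a database-style join. Because both data frames contain a name column that is not used as the key, R appends the suffixes .x and .y to tell them apart, which is where name.x and name.y come from. The sketch below uses illustrative data frames left_df and right_df; when the key column has the same name on both sides, by = "age" is a shorter equivalent of by.x and by.y.

```r
left_df <- data.frame(name = c("zs", "lss", "ww"), age = c(22, 21, 23))
right_df <- data.frame(name = c("zs", "lss", "ww"), age = c(20:22))

# Inner join on age: only ages present in both data frames are kept
joined <- merge(left_df, right_df, by = "age")
names(joined)
joined

# all = TRUE keeps unmatched rows from both sides and fills the gaps with NA
outer <- merge(left_df, right_df, by = "age", all = TRUE)
outer

# Joining on both columns avoids the .x/.y suffixes altogether
both_keys <- merge(left_df, right_df, by = c("name", "age"))
both_keys
```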
Combine
dat
dat0
# There is a mistake in the code below. See if you can find it.
Combine by row; the number of columns must be the same
dat_r <- rbind(dat, dat0)
dat_r
Combine by column; the number of rows must be the same
dat_c <- cbind(dat, dat0)
dat_c
Rename the columns
names(dat0) <- c('name', 'score', 'age')
dat0
dat0$sex <- c(23, 23, 24)
dat0
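rbind() on data frames matches columns by name, so it needs the same set of column names on both sides, though not necessarily in the same order. The sketch below uses illustrative data frames d1, d2 and d3 (not from the original) and tryCatch() to show the failure without stopping the script.

```r
d1 <- data.frame(name = c("zs", "lss"), age = c(22, 21), score = c(98, 99))
d2 <- data.frame(name = c("ww", "zl"), weight = c(60, 61))

# Different number of columns: rbind() signals an error
result <- tryCatch(rbind(d1, d2), error = function(e) "rbind failed")
result

# Same column names in a different order: columns are matched by name
d3 <- data.frame(score = c(80, 70), name = c("ww", "zl"), age = c(23, 24))
stacked <- rbind(d1, d3)
stacked
```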
lapply() returns a list
lapply(dat$age, sum)
sapply() returns a vector
sapply(dat0$score, sum)
5, factor
In R, nominal (categorical) variables and ordinal variables are called factors.
Give an example:
Create a factor
a <- factor(c('A', 'B', 'C', 'C', 'A'))
a
Arguments of factor()
# levels: the possible levels of the factor
# labels: the labels to use for the levels
# exclude: values to leave out when the levels are formed
b <- factor(c('A', 'B', 'C', 'D', 'C', 'A'),
            levels = c('A', 'B', 'C', 'D'),
            labels = c('A Cup', 'B Cup', 'C Cup', 'D Cup'))
b
colour <- c('G', 'G', 'R', 'R', 'Y', 'G', 'G', 'Y', 'G', 'R', 'G')
col <- factor(colour)
col
col1 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('Green', 'Red', 'Yellow'))
col1
col2 <- factor(colour, levels = c('G', 'R', 'Y'), labels = c('1', '2', '3'))
col2
class(col2)
typeof(col2)
mode(col2)
Convert to a vector
as.vector(col2)
# ordered() creates an ordered factor
score <- c('A', 'B', 'A', 'C', 'B')
score1 <- ordered(score)
score1
score1 <- ordered(score, levels = c('C', 'B', 'A'))
score1
# cut() divides a numeric vector into intervals and returns a factor
exam <- c(98, 97, 52, 88, 85, 75, 97, 92, 77, 74, 70, 63, 97, 71, 98, 65, 79, 74, 58, 59, 60, 63, 87, 82, 95, 75, 79, 96, 50, 88)
exam1 <- cut(exam, breaks = 3)  # divide into three groups
exam1
# Approximate width of each interval
(max(exam) - min(exam)) / 3
# Commonly used function: tapply()
gender <- c('f', 'm', 'm', 'm', 'f')
age <- c(12, 35, 32, 34, 25)
# tapply(vector, index, function)
tapply(age, gender, mean)
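cut() also accepts explicit break points and labels, which is often easier to interpret than asking for three equal-width intervals. The sketch below is one possible grading scheme; the cut-offs 60 and 80, the labels, and the variable names marks and grade are assumptions made for illustration, not part of the original example.

```r
marks <- c(98, 52, 88, 75, 63, 59, 60, 82)

# Intervals are (0,60], (60,80], (80,100] because right = TRUE is the default
grade <- cut(marks,
             breaks = c(0, 60, 80, 100),
             labels = c("low", "mid", "high"),
             ordered_result = TRUE)
grade
is.ordered(grade)

# Count the marks in each group
group_counts <- table(grade)
group_counts

# tapply() then summarises the marks within each group
group_means <- tapply(marks, grade, mean)
group_means
```

Without ordered_result = TRUE, cut() returns an ordinary (unordered) factor.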
6, list
A list is an ordered collection of objects, which can include vectors, matrices, data frames, or other lists.
Give an example:
A list can contain numbers, strings, vectors...
a <- 2
b <- 'abc'
c <- c(1:4)
l <- list(a, b, c)
l
l[[2]]
l[[3]]
Access list elements by name
l <- list(a = a, b = b, c = c)
l
l$c
Attach the list so that its elements can be referred to by name
attach(l)
a
c
Convert to a vector
l <- unlist(l)
l
l[5]
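Because a vector holds a single mode, unlist() coerces every element to the most general type present, and it builds names such as c1 and c2 for elements that came from a longer vector. A brief sketch with an illustrative list lst:

```r
lst <- list(a = 2, b = "abc", c = c(1:4))

flat <- unlist(lst)
flat          # every element is now a character string
names(flat)   # "a" "b" "c1" "c2" "c3" "c4"
mode(flat)    # "character"

# With only numeric elements the result stays numeric
nums <- unlist(list(a = 1, b = c(1:4)))
mode(nums)    # "numeric"
sum(nums)     # 11
```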
Computations on a list
x <- list(a = 1, b = c(1:4), c = c(3:6))
x
lapply() returns a list
lapply(x, sum)
lapply(x, mean)
lapply(x, sum)[[2]]
sapply() returns a vector
sapply(x, sum)
sapply(x, mean)
sapply(x, mean)[2]
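lapply() always returns a list, while sapply() simplifies the result when it can. If the shape of the result matters, the base function vapply() states the expected type up front. A sketch with an illustrative list xs:

```r
xs <- list(a = 1, b = c(1:4), c = c(3:6))

as_list <- lapply(xs, sum)
class(as_list)      # "list"

as_vec <- sapply(xs, sum)
as_vec              # named vector: a = 1, b = 10, c = 18

# vapply() checks that every result is a single number
checked <- vapply(xs, mean, numeric(1))
checked             # a = 1, b = 2.5, c = 4.5

# When the results differ in length, sapply() cannot simplify and returns a list
uneven <- sapply(xs, rev)
class(uneven)       # "list"
```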