Catalogue: Operations on vectors, matrices, and lists
- Vector operations
- Recycling
- Fundamental functions
- Information on the overall structure
- Merging tables
- Function apply()
- Function sweep()
- Function stack()
- Function aggregate()
- Function transform()
- Operations on lists
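Part one: Operations on vectors, matrices, and lists
1: Vector Operations
In R, arithmetic and most mathematical functions operate directly on whole vectors and matrices, element by element, so no explicit loop is needed.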
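As a minimal sketch of what this means in practice (the variable names v, w and m are only illustrative), the following lines apply arithmetic to whole vectors and to a small matrix:

```r
# Arithmetic on vectors is applied element by element
v <- c(1, 2, 3)
w <- c(10, 20, 30)
v_plus_w  <- v + w      # 11 22 33
v_times_w <- v * w      # 10 40 90
roots     <- sqrt(c(4, 9, 16))   # 2 3 4

# The same holds for matrices: `*` is element-wise,
# while `%*%` is the matrix product
m <- matrix(1:4, nrow = 2)
m_elementwise <- m * m
m_product     <- m %*% m
```

Note the difference between the last two lines: m * m squares each entry, whereas m %*% m multiplies the matrix by itself.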
2: Recycling
When an operation is performed on two vectors of different lengths, R extends the shorter vector by repeating its values until it matches the length of the longer one.
For example, c(1,2,3,4) + c(10,20) is evaluated as c(1,2,3,4) + c(10,20,10,20).
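Two further cases are worth a short sketch (variable names are illustrative): a single number is itself a vector of length 1 and is recycled like any other, and when the longer length is not a multiple of the shorter one R still recycles but issues a warning.

```r
a <- c(1, 2, 3, 4)

# A single number is a vector of length 1, recycled to length 4
doubled <- a * 2                     # 2 4 6 8

# Lengths 3 and 2: the shorter vector is reused as 10 20 10.
# R computes the result but warns that the lengths do not divide evenly;
# the warning is suppressed here only to keep the example quiet.
uneven <- suppressWarnings(c(1, 2, 3) + c(10, 20))   # 11 22 13
```

The warning in the second case is usually a sign of a mistake in the data, so it is worth checking lengths before relying on recycling.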
3: Fundamental functions
- length(): returns the length of a vector.
> length(c(1,2,3,4,5))
[1] 5
- sort(): sorts the elements of a vector in increasing order, or in decreasing order with decreasing = TRUE.
> sort(c(1,4,2,9,7,6))
[1] 1 2 4 6 7 9
> sort(c(1,3,6,2,7,4,8,1,0),decreasing = TRUE)
[1] 8 7 6 4 3 2 1 1 0
- rev(): returns the elements of a vector in reverse order.
- order(), rank(): order() returns the positions, in the original vector, of the elements taken in sorted order, so that x[order(x)] equals sort(x). rank() returns the rank of each element. When several elements have the same value, order() keeps them in their original left-to-right order, while rank() by default gives them the average of their ranks (see the parameter ties.method).
- unique(): removes repeated elements from a vector.
- duplicated(): returns a logical vector (TRUE, FALSE) indicating, for each element, whether the same value has already appeared earlier in the vector.
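The functions listed above without an example can be tried on a small vector with repeated values. This is only an illustrative sketch; the vector v is made up for the purpose.

```r
v <- c(3, 1, 2, 3, 1)

reversed   <- rev(v)                          # 1 3 2 1 3
positions  <- order(v)                        # 2 5 3 1 4
sorted     <- v[order(v)]                     # 1 1 2 3 3, same as sort(v)
ranks_avg  <- rank(v)                         # 4.5 1.5 3.0 4.5 1.5 (ties averaged)
ranks_left <- rank(v, ties.method = "first")  # 4 1 3 5 2 (ties broken left to right)
distinct   <- unique(v)                       # 3 1 2
seen       <- duplicated(v)                   # FALSE FALSE FALSE TRUE TRUE
```

Comparing ranks_avg with ranks_left shows how the handling of ties differs between the default and ties.method = "first".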
3.1. Information on the overall structure
- dim(): dimensions of a matrix or data frame
- nrow(): number of rows
- ncol(): number of columns
- dimnames(): names of the rows and columns (as a list)
- names(), colnames(): names of the columns (names() gives the column names only for a data frame)
- rownames(): names of the rows
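A quick sketch of these functions on a small matrix and a small data frame (both objects are invented for illustration):

```r
# A 2 x 3 matrix with row and column names
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

m_dim   <- dim(m)        # 2 3
m_nrow  <- nrow(m)       # 2
m_ncol  <- ncol(m)       # 3
m_names <- dimnames(m)   # list: row names first, then column names
m_rows  <- rownames(m)   # "r1" "r2"
m_cols  <- colnames(m)   # "a" "b" "c"

# For a data frame, names() and colnames() give the same result
df <- data.frame(a = 1:2, b = c("x", "y"))
df_names <- names(df)    # "a" "b"

# For a matrix, names() is NULL unless it has been set explicitly
m_plain_names <- names(m)
```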
3.2. Merging tables
- Merging by column: cbind()
> cbind(1:4,5:8)
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8
However, cbind() simply places the tables side by side, so when two tables share individuals or columns the result contains duplicates. In this case, a very useful function is merge().
By default, merge() merges two data frames. Let x and y be the data frames we want to merge, and z the result of merging them.
Note that the merge is based on the columns that have the same name in both data frames. We call these the "common columns". The parameter by can be used to force which columns are treated as common. Its value can be a vector of names, a vector of indices, or a logical vector. All other columns are treated by merge() as separate columns, even if they have the same name. The function merge() works as follows:
(1) For each row (individual) of data frame x, merge() compares the elements of that row with those of every row of y, considering only the common columns.
(2) If a perfect match is found, the two rows are considered to be the same individual, which is added to z (as one row) and completed with the values of the non-common columns of x and y.
(3) If no perfect match is found, the individual is either added to z with the missing values filled with NA (if all = TRUE) or dropped (if all = FALSE, the default).
(4) The same operation is repeated for the next row, until the last row.
Example:
> x<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(50,65,67,55),income=c(80,90,60,150))
> y<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(55,65,67,85),income=c(70,90,40,40),row.names=4:7)
> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> y
  gender height weight income
4      f    165     55     70
5      m    182     65     90
6      m    178     67     40
7      f    160     85     40
> merge(x,y,by=c("gender","height"))
  gender height weight.x income.x weight.y income.y
1      f    160       55      150       85       40
2      f    165       50       80       55       70
3      m    178       67       60       67       40
4      m    182       65       90       65       90
> merge(x,y,by=c("gender","weight"))
  gender weight height.x income.x height.y income.y
1      f     55      160      150      165       70
2      m     65      182       90      182       90
3      m     67      178       60      178       40
> merge(x,y,by=c("gender","weight"),all=TRUE)
  gender weight height.x income.x height.y income.y
1      f     50      165       80       NA       NA
2      f     55      160      150      165       70
3      f     85       NA       NA      160       40
4      m     65      182       90      182       90
5      m     67      178       60      178       40
> merge(x,y,by=c("row.names","weight"))
  Row.names weight gender.x height.x income.x gender.y height.y income.y
1         4     55        f      160      150        f      165       70
> merge(x,y,by=c("row.names","weight"),all=TRUE)
  Row.names weight gender.x height.x income.x gender.y height.y income.y
1         1     50        f      165       80     <NA>       NA       NA
2         2     65        m      182       90     <NA>       NA       NA
3         3     67        m      178       60     <NA>       NA       NA
4         4     55        f      160      150        f      165       70
5         5     65     <NA>       NA       NA        m      182       90
6         6     67     <NA>       NA       NA        m      178       40
7         7     85     <NA>       NA       NA        f      160       40
Note: When identifying common individuals, merge() by default does not take into account the row names of the individuals in data frames x and y. To include them, you can either add an ID column to x and y to identify individuals, or use "row.names" as a value of the parameter by.
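The first option in the note, adding an ID column, could look like the sketch below. The column name id is a hypothetical choice made for this illustration, and the IDs are simply copied from the row names of the x and y used above.

```r
x <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(50, 65, 67, 55),
                income = c(80, 90, 60, 150))
y <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(55, 65, 67, 85),
                income = c(70, 90, 40, 40),
                row.names = 4:7)

# Hypothetical ID column built from the row names ("1".."4" and "4".."7")
x$id <- rownames(x)
y$id <- rownames(y)

# Only id is a common column here; every other column gets a .x / .y suffix
z <- merge(x, y, by = "id")
```

Only individual 4 appears in both data frames, so z has a single row.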
- Merging by row: rbind()
rbind() works in the same way as cbind(), but stacks its arguments as rows.
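As a brief sketch mirroring the cbind() example (the data frames d1 and d2 are invented for illustration), rbind() can stack vectors into a matrix or append the rows of one data frame to another with the same columns:

```r
# Two vectors become the two rows of a 2 x 4 matrix
m <- rbind(1:4, 5:8)

# Data frames with the same column names are appended row by row
d1 <- data.frame(gender = "f", height = 165)
d2 <- data.frame(gender = "m", height = 182)
d  <- rbind(d1, d2)
```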
3.3. Function apply()
apply() is a commonly used function that applies another function (given by the parameter FUN) to every row (MARGIN=1) or every column (MARGIN=2) of a matrix or data frame.
> x<-matrix(c(1:4,1,6:8),nrow=2)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> apply(x,MARGIN=1,FUN=mean)
[1] 3 5
> apply(x,MARGIN=2,FUN=mean)
[1] 1.5 3.5 3.5 7.5
Tip: When the operation is a sum or a mean over rows or columns, you can use the dedicated functions directly: rowSums(), colSums(), rowMeans(), colMeans().
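The tip can be checked with a small sketch on the same matrix x. The last line also shows that FUN may be a function written on the spot; the range computation is only an illustrative choice.

```r
x <- matrix(c(1:4, 1, 6:8), nrow = 2)

# The dedicated functions give the same results as apply()
same_row_means <- isTRUE(all.equal(apply(x, 1, mean), rowMeans(x)))
same_col_sums  <- isTRUE(all.equal(apply(x, 2, sum),  colSums(x)))

# FUN can be any function, including an anonymous one:
# here the spread (max minus min) of each column
col_spread <- apply(x, 2, function(col) max(col) - min(col))   # 1 1 5 1
```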
3.4. Function sweep()
The function sweep() "sweeps out" a statistic (given by the parameter STATS) from each row (MARGIN=1) or each column (MARGIN=2) of a table, using the operation given by FUN (subtraction by default).
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> sweep(x,MARGIN=1,STATS=c(3,5),FUN="-") ## Subtract 3 from the first row and 5 from the second row
     [,1] [,2] [,3] [,4]
[1,]   -2    0   -2    4
[2,]   -3   -1    1    3
> sweep(x,MARGIN=2,STATS=c(1,2,3,4),FUN="-") ## Subtract 1 from the first column, 2 from the second, 3 from the third, and 4 from the fourth
     [,1] [,2] [,3] [,4]
[1,]    0    1   -2    3
[2,]    1    2    3    4
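In practice STATS is often computed from the table itself. The sketch below, on the same matrix x, shows two such uses that are assumptions of this illustration rather than part of the example above: centering each column on its mean, and dividing each row by its total.

```r
x <- matrix(c(1:4, 1, 6:8), nrow = 2)

# Center the columns: subtract each column's mean from that column
centered <- sweep(x, MARGIN = 2, STATS = colMeans(x), FUN = "-")

# Turn each row into proportions: divide each row by its sum
proportions <- sweep(x, MARGIN = 1, STATS = rowSums(x), FUN = "/")
```

After these operations every column of centered has mean 0 and every row of proportions sums to 1.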
3.5. Function stack()
The function stack() concatenates the values of several columns of a data frame into a single vector. It returns a data frame whose first column is the stacked vector and whose second column is a factor indicating the origin of each observation. The function unstack() performs the reverse operation. These functions are very useful for ANOVA.
> x<-data.frame(trt1=c(1,4,6,9),trt2=c(2,5,7,8))
> x
  trt1 trt2
1    1    2
2    4    5
3    6    7
4    9    8
> stack(x)
  values  ind
1      1 trt1
2      4 trt1
3      6 trt1
4      9 trt1
5      2 trt2
6      5 trt2
7      7 trt2
8      8 trt2
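As a sketch of the reverse operation and of the ANOVA use mentioned above, the stacked data frame can be passed to unstack() to recover the original columns, or to aov() with the factor ind as the grouping variable. The names s, u and fit are illustrative.

```r
x <- data.frame(trt1 = c(1, 4, 6, 9), trt2 = c(2, 5, 7, 8))

s <- stack(x)     # long form: columns values and ind
u <- unstack(s)   # back to one column per treatment

# One-way ANOVA on the stacked form: values explained by the factor ind
fit <- aov(values ~ ind, data = s)
```

Calling summary(fit) would then print the ANOVA table.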
3.6. Function aggregate()
The function aggregate() splits a data frame into sub-populations according to a factor (given by the parameter by) and applies a given function to each sub-population.
> x<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(50,65,67,55),income=c(80,90,60,150))
> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> aggregate(x[,-1],by=list(gender=x[,1]),FUN=mean) ## x[,-1] selects all columns except the first
  gender height weight income
1      f  162.5   52.5    115
2      m  180.0   66.0     75
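aggregate() also accepts a formula, which some readers may find easier to read than the by = list(...) form. The sketch below is an alternative way of writing the same computation, not the form used in the example above; the dot on the left-hand side stands for all columns other than gender.

```r
x <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(50, 65, 67, 55),
                income = c(80, 90, 60, 150))

# Mean of every other column, computed separately for each gender
by_gender <- aggregate(. ~ gender, data = x, FUN = mean)
```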
3.7. Function transform()
This function modifies existing columns of a data frame or adds new ones. The following example converts height from cm to m and adds a new column BMI to the data frame.
> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> y<-transform(x,height=height/100,BMI=weight/(height/100)^2)
> y
  gender height weight income      BMI
1      f   1.65     50     80 18.36547
2      m   1.82     65     90 19.62323
3      m   1.78     67     60 21.14632
4      f   1.60     55    150 21.48437
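One detail of the example deserves a sketch: within a single call to transform(), every expression is evaluated on the original columns, which is why BMI is written with height/100 even though height is converted in the same call. Splitting the work into two calls, as below, lets the second call see the converted height. The names y1 and y2 are illustrative.

```r
x <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(50, 65, 67, 55),
                income = c(80, 90, 60, 150))

# Step 1: convert height from cm to m
y1 <- transform(x, height = height / 100)

# Step 2: height is now in metres, so BMI needs no further conversion
y2 <- transform(y1, BMI = weight / height^2)

# In a single call, height inside the BMI expression is still in cm
one_call <- transform(x, height = height / 100, BMI = weight / (height / 100)^2)
```

Both routes give the same BMI column.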
Note: The package plyr provides simple and efficient tools for manipulating data tables.
4: Operations on lists (looping functions)
The functions lapply() and sapply() are similar to apply(). Both apply a function to each component of a list, but lapply() always returns a list, while sapply() simplifies the result to a vector (or a matrix) when possible.
lapply(x, FUN), sapply(x, FUN)
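A minimal sketch of the difference between the two, on a small list invented for illustration:

```r
lst <- list(a = 1:5, b = c(2, 4, 6), c = 10)

# lapply() returns a list with one result per component
means_list <- lapply(lst, mean)

# sapply() simplifies the same results to a named vector
means_vec <- sapply(lst, mean)      # a = 3, b = 4, c = 10

# When every result has the same length greater than 1,
# sapply() returns a matrix with one column per component
ranges <- sapply(lst, range)        # 2 x 3 matrix of minima and maxima
```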