Operations on vectors, matrices, and lists

Posted by Option on Thu, 12 Sep 2019 03:50:21 +0200

Contents: Operations on vectors, matrices, and lists

  • Vector operations
  • Recycling
  • Fundamental functions
  1. Information on the overall structure
  2. Merging tables
  3. Function apply()
  4. Function sweep()
  5. Function stack()
  6. Function aggregate()
  7. Function transform()
  • Operations on lists

Part 1: Operations on vectors, matrices, and lists

1: Vector Operations

R can do arithmetic directly on vectors and matrices: the usual operators (+, -, *, /) work element by element.
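
As a minimal sketch of element-wise arithmetic (the vectors v and w and the matrix m are illustrative values, not taken from the original post):

```r
# Illustrative values, not from the original post
v <- c(1, 2, 3)
w <- c(10, 20, 30)

v + w    # element-wise sum: 11 22 33
v * w    # element-wise product: 10 40 90

m <- matrix(1:4, nrow = 2)  # filled column by column: rows are (1, 3) and (2, 4)
m * 2    # every entry is doubled
m %*% m  # matrix product, as opposed to the element-wise m * m
```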

2: Recycling

When an operation involves two vectors of different lengths, R extends the shorter vector by reusing its values from the beginning until it matches the length of the longer one.

For example:
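
A minimal sketch of recycling, using illustrative values (the names long and short are not from the original post):

```r
# Illustrative values, not from the original post
long  <- c(1, 2, 3, 4, 5, 6)
short <- c(10, 20)

# short is reused as 10 20 10 20 10 20 before the addition
long + short   # 11 22 13 24 15 26

# When the longer length is not a multiple of the shorter one,
# R still recycles but emits a warning (suppressed here)
res <- suppressWarnings(c(1, 2, 3) + short)
res            # 11 22 13
```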

3: Fundamental functions

  • length(): returns the length of a vector.
> length(c(1,2,3,4,5))
[1] 5
  • sort(): sorts the elements of a vector in increasing order or, with decreasing = TRUE, in decreasing order.
> sort(c(1,4,2,9,7,6))
[1] 1 2 4 6 7 9
> sort(c(1,3,6,2,7,4,8,1,0),decreasing = TRUE)
[1] 8 7 6 4 3 2 1 1 0
  • rev(): returns the elements of a vector in reverse order.
  • order(), rank(): order() returns the vector of indices that would sort the original vector; rank() returns the vector of ranks of the elements. When elements are tied, order() keeps them in their original left-to-right order, while rank() by default assigns them their average rank.
  • unique(): removes duplicate elements from a vector.
  • duplicated(): returns a logical vector (TRUE, FALSE) indicating, for each element, whether it has already appeared earlier in the vector.
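
The last four bullets have no example above; a short sketch on an illustrative vector v (not from the original post) might look like this:

```r
# Illustrative vector with ties
v <- c(3, 1, 2, 3, 1)

rev(v)         # 1 3 2 1 3
order(v)       # 2 5 3 1 4  (v[order(v)] is the sorted vector)
rank(v)        # 4.5 1.5 3.0 4.5 1.5  (ties get the average rank by default)
unique(v)      # 3 1 2
duplicated(v)  # FALSE FALSE FALSE TRUE TRUE
```

Note that v[order(v)] gives the same result as sort(v), which is the usual way to sort one vector by the values of another.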

3.1. Information on the overall structure

  • dim(): dimensions of a matrix or data frame
  • nrow(): number of rows
  • ncol(): number of columns
  • dimnames(): names of the rows and columns (as a list)
  • names(), colnames(): names of the columns
  • rownames(): names of the rows
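
As a quick sketch of these functions on a small data frame (df and its columns a and b are illustrative, not from the original post):

```r
# Illustrative data frame
df <- data.frame(a = 1:3, b = c("u", "v", "w"))

dim(df)       # 3 2
nrow(df)      # 3
ncol(df)      # 2
dimnames(df)  # list of row names ("1" "2" "3") and column names ("a" "b")
names(df)     # "a" "b"
colnames(df)  # "a" "b"
rownames(df)  # "1" "2" "3"
```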

3.2. Merging tables

  • Merge by column: cbind()
> cbind(1:4,5:8)
     [,1] [,2]
[1,]    1    5
[2,]    2    6
[3,]    3    7
[4,]    4    8

This kind of merge is not always ideal: when two tables are combined this way, shared information ends up duplicated. In that case, a very useful function is merge().

The merge() function merges two data frames. Let x and y be the data frames we want to merge and z the merged result. Note that by default the merge is based on the columns that have the same name in both data frames. We call these the "common columns". The parameter by can be used to force which columns are treated as common. Its value can be a vector of names, a vector of indices, or a logical vector. All other columns are treated as distinct by merge(), even if they have the same name. The function merge() works as follows:

(1) For each row (individual) of data frame x, merge() compares the elements of this row with those of each row of y, restricted to the common columns.

(2) If a perfect match is found, the two rows are considered to be the same individual, which is added to z (as a row) and filled in with the values from the non-common columns of x and y.

(3) If no perfect match is found, the individual is either added to z with the missing values filled with NA (if all = TRUE) or dropped (if all = FALSE, the default).

(4) The same operation is repeated on the next row, until the last row.

Example:

> x<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(50,65,67,55),income=c(80,90,60,150))
> y<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(55,65,67,85),income=c(70,90,40,40),row.names=4:7)
> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> y
  gender height weight income
4      f    165     55     70
5      m    182     65     90
6      m    178     67     40
7      f    160     85     40
> merge(x,y,by=c("gender","height"))
  gender height weight.x income.x weight.y income.y
1      f    160       55      150       85       40
2      f    165       50       80       55       70
3      m    178       67       60       67       40
4      m    182       65       90       65       90

> merge(x,y,by=c("gender","weight"))
  gender weight height.x income.x height.y income.y
1      f     55      160      150      165       70
2      m     65      182       90      182       90
3      m     67      178       60      178       40
> merge(x,y,by=c("gender","weight"),all=TRUE)
  gender weight height.x income.x height.y income.y
1      f     50      165       80       NA       NA
2      f     55      160      150      165       70
3      f     85       NA       NA      160       40
4      m     65      182       90      182       90
5      m     67      178       60      178       40
> merge(x,y,by=c("row.names","weight"))
  Row.names weight gender.x height.x income.x gender.y height.y income.y
1         4     55        f      160      150        f      165       70
> merge(x,y,by=c("row.names","weight"),all=TRUE)
  Row.names weight gender.x height.x income.x gender.y height.y income.y
1         1     50        f      165       80     <NA>       NA       NA
2         2     65        m      182       90     <NA>       NA       NA
3         3     67        m      178       60     <NA>       NA       NA
4         4     55        f      160      150        f      165       70
5         5     65     <NA>       NA       NA        m      182       90
6         6     67     <NA>       NA       NA        m      178       40
7         7     85     <NA>       NA       NA        f      160       40

Note: When identifying common individuals, merge() by default does not take the row names of data frames x and y into account. To include them, you can either add an ID column to x and y to identify individuals, or use "row.names" as (part of) the value of the parameter by.
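
The ID-column approach from the note can be sketched as follows. The tables x_id and y_id and the column name id are hypothetical and chosen for illustration only:

```r
# Hypothetical tables: id is an illustrative identifier column
x_id <- data.frame(id = 1:4, weight = c(50, 65, 67, 55))
y_id <- data.frame(id = 4:7, weight = c(55, 65, 67, 85))

# Only id is declared common, so both weight columns are kept with suffixes
z <- merge(x_id, y_id, by = "id")
z             # one row: id 4, weight.x 55, weight.y 55

# all = TRUE keeps unmatched individuals from both tables, padded with NA
z_all <- merge(x_id, y_id, by = "id", all = TRUE)
nrow(z_all)   # 7
```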

  • Merge by row: use the generic function rbind()
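
A minimal sketch of rbind(), mirroring the cbind() example above (the names top, bottom, a and b are illustrative):

```r
# Stack two vectors as the rows of a matrix
top    <- 1:4
bottom <- 5:8
rbind(top, bottom)   # 2 x 4 matrix with rows named "top" and "bottom"

# For data frames, the column names of the two tables must match
a <- data.frame(gender = "f", height = 165)
b <- data.frame(gender = "m", height = 182)
ab <- rbind(a, b)
ab                   # two rows, columns gender and height
```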

3.3. Function apply()

apply() is a commonly used function that applies another function (given by the parameter FUN) to every row (MARGIN=1) or every column (MARGIN=2) of a matrix or data frame.

> x<-matrix(c(1:4,1,6:8),nrow=2)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> apply(x,MARGIN=1,FUN=mean)
[1] 3 5
> apply(x,MARGIN=2,FUN=mean)
[1] 1.5 3.5 3.5 7.5

Tip: When the operation is a sum or a mean over rows or columns, you can use the dedicated functions directly: rowSums(), colSums(), rowMeans(), colMeans().
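
As a sketch, on the same matrix x as above, the shortcut functions agree with the corresponding apply() calls. The anonymous range function at the end is an illustrative addition:

```r
x <- matrix(c(1:4, 1, 6:8), nrow = 2)

rowSums(x)    # 12 20
colSums(x)    # 3 7 7 15
rowMeans(x)   # 3 5, same as apply(x, MARGIN = 1, FUN = mean)
colMeans(x)   # 1.5 3.5 3.5 7.5, same as apply(x, MARGIN = 2, FUN = mean)

# apply() remains the tool for operations without a shortcut,
# for example the range (max minus min) of each row
apply(x, MARGIN = 1, FUN = function(r) max(r) - min(r))   # 6 6
```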

3.4. Function sweep()

The sweep() function "sweeps out" a statistic (given by the parameter STATS) from each row (MARGIN=1) or each column (MARGIN=2) of a table, using the operation given by FUN (subtraction by default).

> x
     [,1] [,2] [,3] [,4]
[1,]    1    3    1    7
[2,]    2    4    6    8
> sweep(x,MARGIN=1,STATS=c(3,5),FUN="-")     ## Subtract 3 from the first row and 5 from the second row
     [,1] [,2] [,3] [,4]
[1,]   -2    0   -2    4
[2,]   -3   -1    1    3
> sweep(x,MARGIN=2,STATS=c(1,2,3,4),FUN="-")     ## Subtract 1 from the first column, 2 from the second, 3 from the third, and 4 from the fourth
     [,1] [,2] [,3] [,4]
[1,]    0    1   -2    3
[2,]    1    2    3    4
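
A common use of sweep() is to centre or rescale a table with statistics computed from the table itself. The following is a minimal sketch on the same matrix x; the names centered and props are illustrative:

```r
x <- matrix(c(1:4, 1, 6:8), nrow = 2)

# Centre each column by subtracting its own mean
centered <- sweep(x, MARGIN = 2, STATS = colMeans(x), FUN = "-")
colMeans(centered)   # all zeros

# FUN can be another binary operator: divide each row by its sum
props <- sweep(x, MARGIN = 1, STATS = rowSums(x), FUN = "/")
rowSums(props)       # each row now sums to 1
```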

3.5. Function stack()

The function stack() concatenates the values of several columns of a data frame into a single vector. It returns a data frame: the first column is the stacked vector, and the second column is a factor indicating which column each observation came from. The function unstack() performs the reverse operation. These functions are handy when preparing data for an ANOVA.

> x<-data.frame(trt1=c(1,4,6,9),trt2=c(2,5,7,8))
> x
  trt1 trt2
1    1    2
2    4    5
3    6    7
4    9    8
> stack(x)
  values  ind
1      1 trt1
2      4 trt1
3      6 trt1
4      9 trt1
5      2 trt2
6      5 trt2
7      7 trt2
8      8 trt2
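
As a sketch of the reverse operation and of the ANOVA use mentioned above (the names long, wide and fit are illustrative, and the one-way model is only an assumed example of how the stacked table might be used):

```r
x <- data.frame(trt1 = c(1, 4, 6, 9), trt2 = c(2, 5, 7, 8))

long <- stack(x)       # columns: values, ind
wide <- unstack(long)  # back to one column per treatment
wide

# The stacked layout is the one expected by model formulas,
# e.g. a one-way ANOVA of the values by treatment
fit <- aov(values ~ ind, data = long)
summary(fit)
```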

3.6. Function aggregate()

The function aggregate() splits a data frame into sub-populations according to one or more factors (given by the parameter by) and applies a given function to each sub-population.

> x<-data.frame(gender=c("f","m","m","f"),height=c(165,182,178,160),weight=c(50,65,67,55),income=c(80,90,60,150))
> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> aggregate(x[,-1],by=list(gender=x[,1]),FUN=mean)    ## x[,-1] extracts all columns except the first
  gender height weight income
1      f  162.5   52.5    115
2      m  180.0   66.0     75
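
aggregate() also accepts a formula, which some readers may find easier to read than the by = list(...) form. This is a sketch of an equivalent call on the same data (the names by_gender and max_income are illustrative):

```r
x <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(50, 65, 67, 55),
                income = c(80, 90, 60, 150))

# "." stands for all columns other than the grouping variable
by_gender <- aggregate(. ~ gender, data = x, FUN = mean)
by_gender

# A single column and a different summary function
max_income <- aggregate(income ~ gender, data = x, FUN = max)
max_income
```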

3.7. Function transform()

This function modifies existing columns of a data frame or adds new ones. The following example converts the height from cm to m and adds a new column BMI to the data frame.

> x
  gender height weight income
1      f    165     50     80
2      m    182     65     90
3      m    178     67     60
4      f    160     55    150
> y<-transform(x,height=height/100,BMI=weight/(height/100)^2)
> y
  gender height weight income      BMI
1      f   1.65     50     80 18.36547
2      m   1.82     65     90 19.62323
3      m   1.78     67     60 21.14632
4      f   1.60     55    150 21.48437
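
One detail worth spelling out: every expression passed to transform() is evaluated against the original columns, which is why the BMI formula above divides height by 100 again. A minimal sketch of the pitfall (the name bad is illustrative, and the income column is left out for brevity):

```r
x <- data.frame(gender = c("f", "m", "m", "f"),
                height = c(165, 182, 178, 160),
                weight = c(50, 65, 67, 55))

# Both expressions see the ORIGINAL height (in cm),
# so the BMI expression has to rescale it itself
y <- transform(x, height = height / 100, BMI = weight / (height / 100)^2)

# Pitfall: height is still in cm inside the BMI expression,
# so this BMI is 10000 times too small
bad <- transform(x, height = height / 100, BMI = weight / height^2)
```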

Note: The plyr package provides simple and efficient tools for manipulating data tables.

4. Operations on lists (looping functions)

The functions lapply() and sapply() are similar to apply(). Both apply a function to each component of a list, but the former always returns a list, while the latter simplifies the result to a vector (or matrix) when possible.

lapply(X, FUN), sapply(X, FUN)
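
A minimal sketch of the difference, using an illustrative list lst (not from the original post):

```r
# Illustrative list with two numeric components
lst <- list(a = 1:3, b = c(10, 20))

lapply(lst, mean)   # a list with components a = 2 and b = 15
sapply(lst, mean)   # a named numeric vector: a = 2, b = 15

# An anonymous function works as well
sapply(lst, function(v) length(v))   # a = 3, b = 2
```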