Topic 1. Meta for meta analysis in SCI articles

Posted by s_r_elliott on Tue, 08 Feb 2022 14:14:43 +0100

It's interesting to do meta analysis recently. Once you master these skills, you don't have to worry about publishing articles in the future. Meta analysis is also one of the very important methods in medicine. This issue analyzes several classic graphics based on existing data, including forest, vulnerability, radial, labe And Q-Q normal.

01 concept of meta # analysis

In 1976, Glass named this concept meta analysis for the first time. It is defined as a method of collecting, merging and statistical analysis of different research results. This method has gradually developed into a new discipline - "evidence-based medicine", the main content and research means. The main purpose of Meta-analysis is to comprehensively reflect the previous research results more objectively. Researchers do not conduct the original research, but comprehensively analyze the results obtained from the research.

02 input data mode

We use R package metafor, developed by Viechtbauer, which can not only complete the Meta-analysis of binary classification and continuity variables, but also carry out regression analysis. This software package is the only package that can carry out the fitting operation of mixed effect model, including:

  1. Single variable

  2. Multi classification variable

  3. Continuity variable

It can also detect the model coefficients and obtain the confidence interval, and accurately test the parameters, such as permutation test.

Package installation, as follows:

install.packages("metafor")
library("metafor")

The software package data were obtained from 13 studies that examined BCG's anti tuberculosis effects. The purpose of meta analysis was to examine the overall effectiveness of BCG in preventing TB and to examine possible mitigation agents. This data set has been used in some publications to illustrate the meta-analysis method, as follows:

data("dat.bcg")
print(dat.bcg, row.names = FALSE)
trial               author year tpos  tneg cpos  cneg ablat      alloc
     1              Aronson 1948    4   119   11   128    44     random
     2     Ferguson & Simes 1949    6   300   29   274    55     random
     3      Rosenthal et al 1960    3   228   11   209    42     random
     4    Hart & Sutherland 1977   62 13536  248 12619    52     random
     5 Frimodt-Moller et al 1973   33  5036   47  5761    13  alternate
     6      Stein & Aronson 1953  180  1361  372  1079    44  alternate
     7     Vandiviere et al 1973    8  2537   10   619    19     random
     8           TPT Madras 1980  505 87886  499 87892    13     random
     9     Coetzee & Berjak 1968   29  7470   45  7232    27     random
    10      Rosenthal et al 1961   17  1699   65  1600    42 systematic
    11       Comstock et al 1974  186 50448  141 27197    18 systematic
    12   Comstock & Webster 1969    5  2493    3  2338    33 systematic
    13       Comstock et al 1976   27 16886   29 17825    33 systematic

#######Data description
?dat.bcg
  Format
The data frame contains the following columns:

trial  numeric  trial number
author  character  author(s)
year  numeric  publication year
tpos  numeric  number of TB positive cases in the treated (vaccinated) group
tneg  numeric  number of TB negative cases in the treated (vaccinated) group
cpos  numeric  number of TB positive cases in the control (non-vaccinated) group
cneg  numeric  number of TB negative cases in the control (non-vaccinated) group
ablat  numeric  absolute latitude of the study location (in degrees)
alloc  character  method of treatment allocation (random, alternate, or systematic assignment)  

The 13 studies provide data details in the following tables:


TB positiveTB negative
vaccinated grouptpostneg
control groupcposcneg

03 example analysis

Calculate the log relative risk and the corresponding sampling variance as follows:

dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, 
              di = cneg, data = dat.bcg, append = TRUE)
print(dat[,-c(4:7)], row.names = FALSE
trial               author year ablat      alloc      yi     vi 
     1              Aronson 1948    44     random -0.8893 0.3256 
     2     Ferguson & Simes 1949    55     random -1.5854 0.1946 
     3      Rosenthal et al 1960    42     random -1.3481 0.4154 
     4    Hart & Sutherland 1977    52     random -1.4416 0.0200 
     5 Frimodt-Moller et al 1973    13  alternate -0.2175 0.0512 
     6      Stein & Aronson 1953    44  alternate -0.7861 0.0069 
     7     Vandiviere et al 1973    19     random -1.6209 0.2230 
     8           TPT Madras 1980    13     random  0.0120 0.0040 
     9     Coetzee & Berjak 1968    27     random -0.4694 0.0564 
    10      Rosenthal et al 1961    42 systematic -1.3713 0.0730 
    11       Comstock et al 1974    18 systematic -0.3394 0.0124 
    12   Comstock & Webster 1969    33 systematic  0.4459 0.5325 
    13       Comstock et al 1976    33 systematic -0.0173 0.0714

The demo formula interface is as follows:

k <- length(dat.bcg$trial)
dat.fm <- data.frame(study = factor(rep(1:k, each = 4)))
dat.fm$grp <- factor(rep(c("T", "T", "C", "C"), k), levels = c("T", "C"))
dat.fm$out <- factor(rep(c("+", "-", "+", "-"), k), levels = c("+", "-"))
dat.fm$freq <- with(dat.bcg, c(rbind(tpos, tneg, cpos, cneg)))
dat.fm
   study grp out  freq
1      1   T   +     4
2      1   T   -   119
3      1   C   +    11
4      1   C   -   128
5      2   T   +     6
6      2   T   -   300
7      2   C   +    29
8      2   C   -   274
9      3   T   +     3
10     3   T   -   228
11     3   C   +    11
12     3   C   -   209
13     4   T   +    62
14     4   T   - 13536
15     4   C   +   248
16     4   C   - 12619
17     5   T   +    33
18     5   T   -  5036
19     5   C   +    47
20     5   C   -  5761
21     6   T   +   180
22     6   T   -  1361
23     6   C   +   372
24     6   C   -  1079
25     7   T   +     8
26     7   T   -  2537
27     7   C   +    10
28     7   C   -   619
29     8   T   +   505
30     8   T   - 87886
31     8   C   +   499
32     8   C   - 87892
33     9   T   +    29
34     9   T   -  7470
35     9   C   +    45
36     9   C   -  7232
37    10   T   +    17
38    10   T   -  1699
39    10   C   +    65
40    10   C   -  1600
41    11   T   +   186
42    11   T   - 50448
43    11   C   +   141
44    11   C   - 27197
45    12   T   +     5
46    12   T   -  2493
47    12   C   +     3
48    12   C   -  2338
49    13   T   +    27
50    13   T   - 16886
51    13   C   +    29
52    13   C   - 17825
  • The random effect model is as follows:
#The default model is REML model random effect model using restricted maximum likelihood estimator
res <- rma(yi, vi, data = dat)
res
Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)
tau (square root of estimated tau^2 value):      0.5597
I^2 (total heterogeneity / total variability):   92.22%
H^2 (total variability / sampling variability):  12.86

Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb    ci.ub 
 -0.7145  0.1798  -3.9744  <.0001  -1.0669  -0.3622  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

res <- rma(ai = tpos, bi = tneg, ci = cpos, 
           di = cneg, data = dat, measure = "RR")
res
Random-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of total heterogeneity): 0.3132 (SE = 0.1664)
tau (square root of estimated tau^2 value):      0.5597
I^2 (total heterogeneity / total variability):   92.22%
H^2 (total variability / sampling variability):  12.86

Test for Heterogeneity:
Q(df = 12) = 152.2330, p-val < .0001

Model Results:

estimate      se     zval    pval    ci.lb    ci.ub 
 -0.7145  0.1798  -3.9744  <.0001  -1.0669  -0.3622  *** 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For the confidence interval of heterogeneity, the confidence interval of heterogeneity coefficient (tau2,tau,I2(%),H^2) can be calculated by confint () in metafor, as follows:

### confidence intervals for tau^2, I^2, and H^2

confint(res)
     estimate   ci.lb   ci.ub 
tau^2    0.3132  0.1197  1.1115 
tau      0.5597  0.3460  1.0543 
I^2(%)  92.2214 81.9177 97.6781 
H^2     12.8558  5.5303 43.0680 
# Prediction / confidence interval of estimated effect quantity
predict(res, digits = 3)
 pred    se  ci.lb  ci.ub  pi.lb pi.ub 
 -0.715 0.180 -1.067 -0.362 -1.867 0.438
  • forest map

Forest map is based on statistical indicators and statistical analysis methods and drawn with numerical calculation results. In the plane rectangular coordinate system, it takes a vertical invalid line (the abscissa scale is 1 or 0) as the center, uses multiple line segments parallel to the horizontal axis to describe the effect quantity and confidence interval of each research, and uses a prism (or other graphics) to describe the effect quantity and confidence interval of multiple research combinations. It is very simple and intuitive to describe the statistical results of Meta-analysis. It is the most commonly used result expression in Meta-analysis. As follows:

### forst plot
forest(res, slab = paste(dat$author, dat$year, sep = ", "), 
       xlim = c(-16, 6), at = log(c(.05, .25, 1, 4)), atransf = exp,
       ilab = cbind(dat$tpos, dat$tneg, dat$cpos, dat$cneg), 
       ilab.xpos = c(-9.5, -8, -6, -4.5), cex = .75)
op <- par(cex = .75, font = 2)
text(c(-9.5, -8, -6, -4.5), 15, c("TB+", "TB-", "TB+", "TB-"))
text(c(-8.75, -5.25),       16, c("Vaccinated", "Control"))
text(-16,                   15, "Author(s) and Year",     pos = 4)
text(6,                     15, "Relative Risk [95% CI]", pos = 2)
par(op)

The mixed effect model is as follows:

### mixed-effects model with absolute latitude and publication year as moderators

res <- rma(yi, vi, mods = cbind(ablat, year), data = dat)
res
Mixed-Effects Model (k = 13; tau^2 estimator: REML)

tau^2 (estimated amount of residual heterogeneity):     0.1108 (SE = 0.0845)
tau (square root of estimated tau^2 value):             0.3328
I^2 (residual heterogeneity / unaccounted variability): 71.98%
H^2 (unaccounted variability / sampling variability):   3.57
R^2 (amount of heterogeneity accounted for):            64.63%

Test for Residual Heterogeneity:
QE(df = 10) = 28.3251, p-val = 0.0016

Test of Moderators (coefficients 2:3):
QM(df = 2) = 12.2043, p-val = 0.0022

Model Results:

         estimate       se     zval    pval     ci.lb    ci.ub 
intrcpt   -3.5455  29.0959  -0.1219  0.9030  -60.5724  53.4814     
ablat     -0.0280   0.0102  -2.7371  0.0062   -0.0481  -0.0080  ** 
year       0.0019   0.0147   0.1299  0.8966   -0.0269   0.0307     

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

### predicted average relative risks for 10, 20, 30, 40, and 50 degrees 
### absolute latitude holding publication year constant at 1970

predict(res, newmods = cbind(seq(from = 10, to = 60, by = 10), 1970), 
        transf = exp, addx = TRUE)

### plot of the predicted average relative risk as a function of absolute latitude
preds <- predict(res, newmods = cbind(0:60, 1970), transf = exp)
wi    <- 1/sqrt(dat$vi)
size  <- 0.5 + 3 * (wi - min(wi))/(max(wi) - min(wi))
plot(dat$ablat, exp(dat$yi), pch = 19, cex = size, 
     xlab = "Absolute Latitude", ylab = "Relative Risk", 
     las = 1, bty = "l", log = "y")
lines(0:60, preds$pred)
lines(0:60, preds$ci.lb, lty = "dashed")
lines(0:60, preds$ci.ub, lty = "dashed")
abline(h = 1, lty = "dotted")

  • funnel diagram

Funnel chart is a scatter chart proposed by Light et al. In 1984, which generally takes the effect quantity of a single study as the abscissa and the sample content as the ordinate. The effect quantity can be RR, OR and death ratio OR its logarithm. Theoretically, the point estimation of each independent research effect included in Meta-analysis should be an inverted funnel shape in the plane coordinate system, so it is called funnel diagram.

  1. The sample size is small and the research accuracy is low. It is distributed at the bottom of the funnel diagram and scattered around;

  2. The sample size is large and the research accuracy is high. It is distributed at the top of the funnel diagram and concentrated in the middle.

In actual use, when the number of studies for Meta-analysis is small, it is not suitable to make a funnel chart. Generally, it is recommended that the funnel chart should be made only when the number of studies for Meta-analysis is 10 or more. Funnel chart is mainly used to observe whether there is bias in the results of a system evaluation or Meta-analysis, such as publication bias or other bias, as shown in the following figure:

### funnel plots
res <- rma(yi, vi, data = dat)
funnel(res, main = "Random-Effects Model")
res <- rma(yi, vi, mods = cbind(ablat), data = dat)
funnel(res, main = "Mixed-Effects Model")
#######Begger's test
ranktest(res)
Rank Correlation Test for Funnel Plot Asymmetry

Kendall's tau = 0.0256, p = 0.9524
##########Egger's test
regtest(res)
Regression Test for Funnel Plot Asymmetry

Model:     mixed-effects meta-regression model
Predictor: standard error

Test for Funnel Plot Asymmetry: z = -0.4623, p = 0.6439

  • radial star

It mainly reflects the heterogeneity of various studies, so as to find the points of heterogeneity. The effect corresponding to the arc is evaluated by size distribution. As follows:

### radial plots
res <- rma(yi, vi, data = dat, method = "REML")
radial(res, main = "Random-Effects Model")
res <- rma(yi, vi, data = dat, method = "ML")
radial(res, main = "Mixed-Effects Model")

  • Q-Q bitmap

Draw the prediction results and observe whether they are within the confidence interval, as follows:

### qq-normal plots
res <- rma(yi, vi, data = dat)
qqnorm(res, main = "Random-Effects Model")
res <- rma(yi, vi, mods = cbind(ablat), data = dat)
qqnorm(res, main = "Mixed-Effects Model")

  • labbe labetu

This function calculates the arm level results of the two groups (e.g., treatment and control) and compares them with each other. Incidence rate of incidence rate is incidence of the two groups of functional blocks, especially in the analysis of risk differences. The log odds ratio is analyzed in the log log log ratio. The log odds ratio is analyzed in the analysis of the odds ratio (log), the square root inverse sine transformation ratio is used to analyze the risk difference of the square root sine transform, and the incidence rate is analyzed with the original incidence rate, and the incidence ratio is analyzed by the logarithm of incidence (log). The square root conversion of incidence rate is analyzed as follows: incidence rate difference.

res <- rma(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)

### default plot
labbe(res)

### funnel plot with risk values on the x- and y-axis and add grid
labbe(res, transf=exp, grid=TRUE)
# }

04 interpretation of analysis results

It can be seen that I^2 is 71.98%, which belongs to high heterogeneity, and the random effect model can be used. When heterogeneity is low, fixed effect model and random effect model can be used, and the results are not different, but high heterogeneity can only choose random effect model, otherwise the extrapolation of results will be constrained. The random effect model is selected here for conservative consideration.

The random effect model is based on the assumption that there is heterogeneity across studies. The merger model recognizes the existence of heterogeneity among studies, but does not deal with heterogeneity; If there is heterogeneity among the studies included in the merger, although it does not reach the I ^ 2 > 50% we normally set, when merging with fixed effect model, the default operation directly ignores the existence of this part of heterogeneity, so the merger result will cause false positive error. When merging with random effect model, although the heterogeneity is not handled, However, its default operation recognizes the existence of heterogeneity, and the merging result is more reliable!

From the funnel chart and the results of Begg's test and Egger's test, we can see that the P value is greater than 0.05, which means there is no publication bias.

Meta analysis is also an option for publishing articles. To find out these diagrams and internal methods, as for the score of SCI, we still need to see the novelty of the subject we study, and the methods are basically these. As long as you have good data, it's just around the corner to publish a high score of SCI. In the next issue, we'll talk about other software!

Pay attention to the official account, scan the code into groups, free answers, and later there will be free live tutorials. Please look forward to it!

Reference:

  1. Viechtbauer, W. (2010). Conducting Meta-Analyses in R with the metafor Package. Journal of Statistical Software, 36(3), 1–48.

  2. Colditz, G. A., Brewer, T. F., Berkey, C. S., Wilson, M. E., Burdick, E., Fineberg, H. V., & Mosteller, F. (1994). Efficacy of BCG vaccine in the prevention of tuberculosis: Meta-analysis of the published literature. Journal of the American Medical Association, 271(9), 698–702.

  3. Berkey, C. S., Hoaglin, D. C., Mosteller, F., & Colditz, G. A. (1995). A random-effects regression model for meta-analysis. Statistics in Medicine, 14(4), 395–411.

  4. van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine, 21(4), 589–624.

  5. Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor package. Journal of Statistical Software, 36(3), 1–48.

Huanfeng gene

Bioinformatics analysis, SCI article writing and bioinformatics basic knowledge learning: R language learning, perl basic programming, linux system commands, Python meet you better

34 original content

official account

Topics: R Language