Read single cell file in room format

Posted by tommmmm on Mon, 17 Jan 2022 15:29:53 +0100

Everything is difficult at the beginning. Considering that many small partners often read the expression matrix uploaded by the author in the first step to build the seurat object when doing single-cell public data analysis, it is very necessary to sort out the 18 single-cell data format files.

Now let's demonstrate how to read a single cell file in room format. First, install and load some packages:

library(hdf5r) 
library(loomR)
library(LoomExperiment)
# remotes::install_github("aertslab/SCopeLoomR")
# remotes::install_local('mojaveazure-loomR-0.2.0-beta-54-g1eca16a.tar.gz')
library(SCopeLoomR)

It is worth noting that some packages are actually on GitHub. If your network is poor, you need to find a solution by yourself. If you can't install package reading, you might as well try our * * marathon teaching (live one month interactive teaching), and you can watch the 200 Q & a selected from more than 2000 questions and interactive exchanges! 2021 phase II_ Introduction class of Shengxin_ Wechat group Q & a sorting , and 2021 phase II_ Data mining class_ Wechat group Q & A notes

With 100000 students, you deserve the following classes:

Next, we take gse160756 as an example to enter https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE160756 It can be seen that the seven samples of its data set are shared with you in room format.

Share it with you in loom format

Our example code is as follows;

###### step1:Import data ######  
path='GSE160756_RAW/'
samples=list.files(path )
samples 
sceList = lapply(samples,function(pro){ 
  # pro=samples[1] 
  print(pro) 
  folder=file.path(path,pro) 
  print(pro)
  print(folder)
  # Import room file
  lfile <- connect(filename =folder, mode = "r+")
  ct <- as.data.frame(lfile[["matrix"]][,])
  colnames(ct) <- lfile[["row_attrs/gene_names"]][] # Add gene_name
  rownames(ct) <- lfile[["col_attrs/cell_names"]][] # Add cell_name
  ct[1:5,1:5]
  ct <- t(ct) # Note transpose
  sce=CreateSeuratObject(counts = ct,
                         project =  pro )
  return(sce)
})
names(sceList)  
samples  

After reading all the room files, it becomes a list. There are independent seurat objects for each sample, which need to be merged. The code is as follows:

sce.all=merge(x=sceList[[1]],
              y=sceList[ -1 ],
              add.cell.ids = samples) 
as.data.frame(sce.all@assays$RNA@counts[1:10, 1:2])
head(sce.all@meta.data, 10)
table(sce.all@meta.data$orig.ident) 


library(stringr)
phe=str_split(rownames(sce.all@meta.data),'_',simplify = T)
head(phe)
table(phe[,4])
table(phe[,3])

sce.all@meta.data$rep=phe[,4]
sce.all@meta.data$group=phe[,3]
sce.all@meta.data$orig.ident = paste0(
  sce.all@meta.data$group,
  sce.all@meta.data$rep
)
table(sce.all@meta.data$orig.ident) 

Because there are multiple samples, it is usually recommended to follow the harmnoy process for integration.

You can find 10x single cell transcriptome data in the following articles, which are published in the link of geo database:

If you don't have a basic understanding of single cell data analysis, you can see basic 10:

The most basic is dimension reduction clustering. Refer to the previous example: Single cell clustering and clustering annotation that everyone can learn