Chip SEQ data analysis 1 chip SEQ technology 2 chip SEQ data analysis

Posted by toms on Fri, 21 Jan 2022 00:42:50 +0100

1 chip SEQ Technology
- 1.1 concept
- 1.2 chip SEQ technology principle
2 chip SEQ data analysis
- 2.1 data download
- 2.2 data_assess ment
- 2.3 mapping_analysis
- 2.4 Peak_calling
  - MACS2
    - 2.4.1 MACS2 core: callpack usage
    - 2.4.2 description of callpack result file
    - 2.4.3 bdg file → wig file
- 2.5 Peak_anno
  - ChIPseeker

Chip SEQ is only the first mature technology in the field of epigenetics. At present, there are many other technologies, such as

DNA modification: DNA methylation, immunocoprecipitation (MeDIP), target region methylation, genome-wide methylation (WGBS), oxidation bisulfite sequencing (oxbs SEQ),
TET assisted bisulfite sequencing (tab SEQ)

RNA modification: RNA methylation immunocoprecipitation (MeRIP)

Protein nucleic acid interaction: Rip SEQ, chip SEQ, clip seq

And what can ATAC SEQ do? (
http://www.biotrainee.com/thread-1218-1-1.html
)

1 chip SEQ Technology

1.1 concept

Chromatin Immunoprecipitation (ChIP)
)Also known as binding site analysis, it is a method to study the interaction between protein and DNA in vivo. It is usually used to study the binding sites of transcription factors or histone specific modification sites.
ChIP SEQ technology combining ChIP and second-generation sequencing technology
, it can efficiently detect DNA fragments interacting with histones and transcription factors in the whole genome.

1.2 chip SEQ technology principle

In the physiological state, the DNA in the cell is crosslinked with protein, the cell is cleaved, the chromosome is separated, and the chromatin is randomly cut by ultrasound or enzyme treatment;
The DNA fragment combined with the target protein was precipitated by the specific recognition reaction of antigen and antibody;
The DNA fragment of the binding protein was released by reverse cross-linking;
Purification;
The sequences of DNA fragments were obtained by sequencing, and finally these DNA fragments were aligned to the corresponding reference genome.
! [write picture description here]( https://img-
blog.csdn.net/20180814104707287?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

2 chip SEQ data analysis

2.1 data download

GSE98149 (including all stages of H3K9me3, zygote, E6.5 Epi, E6.5 Exe and E7.5 of H3K4me3 and H3K27me3)
Epi,E7.5 Exe,E8.5 embryo,Esc)
Title: Reprogramming of H3K9me3-dependent heterochromatin during mammalian
early embryo development [ChIP-seq]
Organism: Mus musculus

    for ((i=594;i<=670;i++));
    do
    wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP105/SRP105176/SRR5479$i/SRR5479$i.sra;
    done &
[/code]

```code
    # sratookit:  . sra file → fastq file
    ls *sra |while read id;
    do
    /home/chen/sratoolkit.2.8.2-ubuntu64/bin/fastq-dump --gzip --split-3 $id;
    done &

    # Download the index of mouse reference genome
    wget -c "ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip" &
    
    # decompression
    unzip mm10.zip &
[/code]

##  2.2 data_assess ment

```code
    # Fastqc for quality control
    ls *fq | while read id; do fastqc -t 4 $id; done &
    
    # multiqc: batch viewing of QC results
    multiqc *fastqc.zip --export &
[/code]

```code
    ## trimmomatic 
    
    # Installing trimmatic
    wget -c http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip &
    unzip Trimmomatic-0.38.zip
    
    # Data cleaning
    # -threads setting up multithreading
    java -jar "/data/chen/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar" PE -threads 2 -phred33 \
    
    # 2 input files
    ${name}_1.fq.gz ${name}_2.trim.fq.gz \
    
    # 4 output files
    ${name}_R1.clean.fq.gz ${name}_R1.unpaired.fq.gz \
    ${name}_R2.clean.fq.gz ${name}_R2.unpaired.fq.gz \
    
    # ILLUMINACLIP: de connector
    # "$adapter"/Exome.fa: fasta file of adapter sequence
    # 2: There can be 2 mismatches in the 16 base length seed sequence
    # 30: with palindrome pattern, the matching score is at least 30 (about 50 bases)
    # 10: When using the simple pattern, the matching score is at least 10 (about 17 bases)
    ILLUMINACLIP:"$adapter"/Exome.fa:2:30:10 \
    
    # LEADING:3, remove the base with mass value less than 3 from the beginning of the sequence;
    # Training: 3, remove bases with mass value less than 3 from the end of the sequence;
    # SLIDINGWINDOW:4:15, calculate the average base mass in a 4 bp window from the 5 'end,
    # If the average value is lower than 15, the read is truncated from this position;
    # Headcrop: < length > cut the specified length at the head end of reads;
    # MINLEN:36. If the length of reads is less than 36 bp, throw away the whole read.
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:10 MINLEN:36
[/code]

![Write a picture description here](https://img-
blog.csdn.net/20180904092004409?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)  
![Write a picture description here](https://img-
blog.csdn.net/20180904092011835?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

##  2.3 mapping_analysis

Bowtie2 or BWA

```code
    # bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]
    
    # -P / - threads nthreads sets the number of threads Default: 1
    # -q reads is in fastq format
    # -X < BT2 IDX > index path
    # -1 < M1 > double terminal sequencing_ 1.fastq path. Can be multiple files separated by commas; Multiple documents must correspond to the documents formulated in - 2 < M2 >.
    # -2 < M2 > double ended sequencing_ 2.fastq path
    # -U < R > fastq path for non two terminal sequencing. Can be multiple files separated by commas.
    # -S < hit > output Sam format file.
    # -3 / - - trim3 < int > cut the base with the length of < int > at the 3 'end, and then use it for comparison. (default: 0).
    # After looking at the data quality with fastqc, I found that there was a problem with the quality of the three terminals, so I used the - 3 5 --local parameter,
    # --Local if the fq file has not been subjected to trim, you can perform soft clipping with local comparison and add the parameter -- local. In this mode, read is locally compared, so that some bases at both ends of read are not compared, so that the comparison score meets the requirements
    
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479594_1.fastq -2 SRR5479594_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479595_1.fastq -2 SRR5479595_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479597_1.fastq -2 SRR5479597_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479598_1.fastq -2 SRR5479598_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479596_1.fastq -2 SRR5479596_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479599_1.fastq -2 SRR5479599_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479605_1.fastq -2 SRR5479605_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479606_1.fastq -2 SRR5479606_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479607_1.fastq -2 SRR5479607_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479608_1.fastq -2 SRR5479608_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479609_1.fastq -2 SRR5479609_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479610_1.fastq -2 SRR5479610_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479611_1.fastq -2 SRR5479611_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479612_1.fastq -2 SRR5479612_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479613_1.fastq -2 SRR5479613_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479614_1.fastq -2 SRR5479614_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479615_1.fastq -2 SRR5479615_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479616_1.fastq -2 SRR5479616_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479617_1.fastq -2 SRR5479617_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479618_1.fastq -2 SRR5479618_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479623_1.fastq -2 SRR5479623_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479624_1.fastq -2 SRR5479624_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479625_1.fastq -2 SRR5479625_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479626_1.fastq -2 SRR5479626_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479627_1.fastq -2 SRR5479627_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep3.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479628_1.fastq -2 SRR5479628_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479634_1.fastq -2 SRR5479634_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479635_1.fastq -2 SRR5479635_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479636_1.fastq -2 SRR5479636_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479642_1.fastq -2 SRR5479642_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479643_1.fastq -2 SRR5479643_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479644_1.fastq -2 SRR5479644_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479650_1.fastq -2 SRR5479650_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479651_1.fastq -2 SRR5479651_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479652_1.fastq -2 SRR5479652_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep3.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479653_1.fastq -2 SRR5479653_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep4.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479654_1.fastq -2 SRR5479654_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479661_1.fastq -2 SRR5479661_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479662_1.fastq -2 SRR5479662_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep2.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479663_1.fastq -2 SRR5479663_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_input.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479669_1.fastq -2 SRR5479669_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep1.sam &
    bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479670_1.fastq -2 SRR5479670_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep2.sam &
[/code]

![Write a picture description here](https://img-
blog.csdn.net/2018090411251642?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)  
![Write a picture description here](https://img-
blog.csdn.net/20180904112404199?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)  
![Write a picture description here](https://img-
blog.csdn.net/20180904161257837?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)  
![Write a picture description here](https://img-
blog.csdn.net/2018090416131067?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)

##  2.4 Peak_calling

###  **MACS2**

peaks calling: Look for possible binding sites, that is, a large number of binding sites in the genome reads Enriched areas.

####  2.4.1 MACS2 core: callpack usage

```code
       # Example for regular peak calling
        macs2 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01
       # Example for broad peak calling
       macs2 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1
[/code]

```code
    # Batch callpack
    macs2 callpeak -c IgGold.bam -t suz12.bam -q 0.05 -f BAM -g mm -n suz12 2> suz12.macs2.log &
    macs2 callpeak -c IgGold.bam -t ring1B.bam -q 0.05 -f BAM -g mm -n ring1B 2> ring1B.macs2.log &
    macs2 callpeak -c IgGold.bam -t cbx7.bam -q 0.05 -f BAM -g mm -n cbx7 2> cbx7.macs2.log &
    macs2 callpeak -c IgGold.bam -t RYBP.bam -q 0.01 -f BAM -g mm -n RYBP 2>RYBP.macs2.log &
[/code]

**-t** /–treatment FILENAME-Processing group input  
This is the only REQUIRED parameter for MACS. File can be in any supported
format specified by –format option. Check –format for detail. If you have more
than one alignment files, you can specify them as ` -t A B C ` . MACS will
pool up all these files together.

**-c** /–control-Control group input  
The control or mock data file. Please follow the same direction as for
-t/–treatment.

**-f** /–format FORMAT--t and-c Provide file format, current MACS Recognizable formats are“ ELAND", "BED",
"ELANDMULTI", "ELANDEXPORT", "ELANDMULTIPET" (Two terminal sequencing), "SAM", "BAM", "BOWTIE",
"BAMPE", "BEDPE". "Except" BAMPE", "BEDPE"Except for special declaration, other formats can be used AUTO Automatic detection. If this option is not provided, the selection is automatically detected.  
Format of tag file, can be "ELAND", "BED", "ELANDMULTI", "ELANDEXPORT",
"ELANDMULTIPET" (for pair-end tags), "SAM", "BAM", "BOWTIE", "BAMPE" or
"BEDPE". Default is "AUTO" which will allow MACS to decide the format
automatically. "AUTO" is also usefule when you combine different formats of
files. Note that MACS can't detect "BAMPE" or "BEDPE" format with "AUTO", and
you have to implicitly specify the format for "BAMPE" and "BEDPE".

**-g** /–gsize-Genome size, provided by default hs, mm, ce,
dm If the options are not included, for example, Arabidopsis, you need to provide them yourself (according to Arabidopsis) NCBI The display is 119,667,750，That is 1.2e8).   
PLEASE assign this parameter to fit your needs!  
It's the mappable genome size or effective genome size which is defined as the
genome size which can be sequenced. Because of the repetitive features on the
chromsomes, the actual mappable genome size will be smaller than the original
size, about 90% or 70% of the genome size. The default hs – 2.7e9 is
recommended for UCSC human hg18 assembly. Here are all precompiled parameters
for effective genome size:  
hs: 2.7e9 (Human is 2.7e9，That is 2.7G)  
mm: 1.87e9  
ce: 9e7  
dm: 1.2e8

**-n** /–name-Prefix name of the output file. Indicates the name of the experiment, Please choose a meaningful name.  
The name string of the experiment. MACS will use this string NAME to create
output files like 'NAME_peaks.xls', 'NAME_negative_peaks.xls',
'NAME_peaks.bed' , 'NAME_summits.bed', 'NAME_model.r' and so on. So please
avoid any confliction between these filenames and your existing files.

**-B** /–bdg  
Will save more information in bedGraph File, such as fragment pileup, control lambda, -log10pvalue and
-log10qvalue scores.   
If this flag is on, MACS will store the fragment pileup, control lambda,
-log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files
will be stored in current directory named NAME+'_treat_pileup.bdg' for
treatment data, NAME+'_control_lambda.bdg' for local lambda values from
control, NAME+'_treat_pvalue.bdg' for Poisson pvalue scores (in -log10(pvalue)
form), and NAME+'_treat_qvalue.bdg' for q-value scores from
Benjamini–Hochberg–Yekutieli procedure [
http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
](http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests)

**-q** /–qvalue-q Value, that is, the smallest PDR Threshold, 0 by default.05. q Value is based on p Value utilization BH Calculation, that is, the corrected results of multiple tests.  
The qvalue (minimum FDR) cutoff to call significant regions. Default is 0.01.
For broad marks, you can try 0.05 as cutoff. Q-values are calculated from
p-values using Benjamini-Hochberg procedure.

**-p** /–pvalue-p Values, specifying p After value MACS2 It won't work q It's worth it.  
The pvalue cutoff. If -p is specified, MACS2 will use pvalue instead of
qvalue.

**-m**
/–mfold-and MFOLD Relevant, and MFOLD and MACS About the pre built model, the default is 5:50, MACS Will look for more than 100 first peak Area construction model, generally do not need to change, because you do not understand.  
This parameter is used to select the regions within MFOLD range of high-
confidence enrichment ratio against background to build model. The regions
must be lower than upper limit, and higher than the lower limit of fold
enrichment. DEFAULT:5,50 means using all regions not too low (>5) and not too
high (<50) to build paired-peaks model. If MACS can not find more than 100
regions to build model, it will use the –extsize parameter to continue the
peak detection ONLY if –fix-bimodal is set.

####  2.4.2 description of callpack result file

```code
    # Count the number of * bed rows (peak number) in the current directory
    wc -l *bed
      2384 cbx7_summits.bed
      8342 ring1B_summits.bed
         0 RYBP_summits.bed
      7619 suz12_summits.bed
    # Count the number of hello lines in file a:
    # grep hello a | wc -l
    # wc(word count)
    #-c count the number of bytes.
    #-l count the number of rows. line
    #-m counts the number of characters. This flag cannot be used with the - c flag.
    #-w count the number of words. A word is defined as a string separated by white space, skip, or newline characters.
    #-L prints the length of the longest line.
[/code]

callpeak The following result files will be obtained:

**NAME_summits.bed** : Browser Extensible Data，Record each peak of peak
summits，In other words, record the position of the extreme point. MACS It is suggested to use this document to find the binding sites motif. Can load directly UCSC
browser，The first line needs to be removed when analyzing with other software.

**NAME_peaks.xls** : Store in form peak Information, although the suffix is xls，But it can actually be opened with a text editor, and bed The format is similar, but **Based on 1**
，and bed The file is based on 0.in other words xls All coordinates need to be minus one bed Coordinates of the file.

**NAME_peaks.narrowPeak** ， **NAME_peaks.broadPeak** similar. The last four columns are represented by, integer score
for display， fold-change，-log10pvalue，-log10qvalue，relative summit position to
peak start. Content and NAME_peaks.xls Basically consistent, suitable for import R Analysis.

**NAME_model.r** : Can pass ` $ Rscript NAME_model.r ` The drawing is based on the data you provide peak Model.

**.bdg** : Can use UCSC genome browser Convert to smaller bigWig File.

####  2.4.3 bdg file → wig file

> For convenience IGV View on ChIP-seq The results and later visual display need to be macs2 The results are translated into bw provide for IGV. There are three steps:

Step 1: use bdgcmp obtain **FE** perhaps **logLR Converted file** (Run MACS2 bdgcmp to generate
fold-enrichment and logLR track)

```code
    macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_FE.bdg -m FE
    
    macs2 bdgcmp -t  H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_logLR.bdg  -m logLR -p 0.00001
    
    # Parameter interpretation
    # -m FE calculation enrichment factor to reduce noise
    # -p to avoid an error when the input value of log is 0, a small value is given
[/code]

Step 2: **Pretreatment** File, Download corresponding **Reference genome chromosome length** file

use conda Install the following three processing software:  
bedGraphToBigWig  
bedClip  
bedtools

Download chromosome length file: [
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz
](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz)
And decompress (for human，Other species can be downloaded from similar websites)

Write a little sh Scripts facilitate one-step conversion name.sh: 

```code
    #!/bin/bash
    # check commands: slopBed, bedGraphToBigWig and bedClip
    which bedtools &>/dev/null || { echo "bedtools not found! Download bedTools: <http://code.google.com/p/bedtools/>"; exit 1; }
    which bedGraphToBigWig &>/dev/null || { echo "bedGraphToBigWig not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; }
    which bedClip &>/dev/null || { echo "bedClip not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; }
    # end of checking
    if [ $# -lt 2 ];then
    echo "Need 2 parameters! <bedgraph> <chrom info>"
    exit
    fi
    F=$1
    G=$2
    bedtools slop -i ${F} -g ${G} -b 0 | bedClip stdin ${G} ${F}.clip
    LC_COLLATE=C sort -k1,1 -k2,2n ${F}.clip > ${F}.sort.clip
    bedGraphToBigWig ${F}.sort.clip ${G} ${F/bdg/bw}
    rm -f ${F}.clip ${F}.sort.clip
[/code]

chmod +x name.sh

Step 3: generate **bw** file

```code
    ./name.sh H3K36me1_EE_rep1_FE.bdg hg19.len
    ./name.sh H3K36me1_EE_rep1_logLR.bdg hg19.len
[/code]

Finally get the product. As for which one to use as the input file, you can come as needed

H3K36me1EErep1_FE.bw  
H3K36me1EErep1_logLR.bw

##  2.5 Peak_anno

###  ChIPseeker

> ChIPseeker The functions of are divided into three categories:  
>  Notes: extracting peak Near the nearest gene, note peak Area.  
>  Comparison: estimation ChIP peak The significance of overlap in the dataset; integration GEO Data set to compare current results with known results.  
>  Visualization: peak Coverage of; TSS Region combined peak Mean expression profile and heat map of; Genome annotation; TSS Distance; peak And gene overlap.
```code
    # Load ChIPseeker, genome annotation package and bed data
    biocLite("ChIPseeker")
    biocLite("org.Mm.eg.db")
    biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")
    library("ChIPseeker")
    
    # download
    source ("https://bioconductor.org/biocLite.R")
    biocLite("ChIPseeker")
    biocLite("org.Mm.eg.db")
    biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")
    biocLite("clusterProfiler")
    biocLite("ReactomePA")
    biocLite("DOSE")
    
    #load
    library("ChIPseeker")
    library("org.Mm.eg.db")
    library("TxDb.Mmusculus.UCSC.mm10.knownGene")
    txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
    library("clusterProfiler")
    
    # Read in bed file
    ring1B <- readPeakFile("F:/Data/ChIP/ring1B_peaks.narrowPeak")
    cbx7 <- readPeakFile("F:/Data/ChIP/cbx7_peaks.narrowPeak")
    RYBP <- readPeakFile("F:/Data/ChIP/RYBP2_summits.bed")
    suz12 <- readPeakFile("f:/Data/ChIP/suz12_peaks.narrowPeak")
[/code]

Chip peaks coverage plot

see peak Genome wide location

covplot(ring1B,chrs=c("chr17", "chr18")) #specific chr

ring1B

suz12

cbx7

RYBP

ring1B(chr17&18)

● Heatmap of ChIP binding to TSS regions

promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)  
tagMatrix <- getTagMatrix(ring1B, windows=promoter)  
tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color="red")

Average Profile of ChIP peaks binding to TSS region

● Confidence interval estimated by bootstrap method

plotAvgProf(tagMatrix, xlim=c(-3000, 3000), conf = 0.95, resample = 1000)

peak Notes for

peak For comments annotatePeak**，TSS (transcription start site) region
 You can set it yourself. The default is(-3000，3000)，TxDb Refers to the genome of a species, e.g TxDb.Hsapiens.UCSC.hg38.knownGene,
TxDb.Hsapiens.UCSC.hg19.knownGene for human genome hg38 and hg19,
TxDb.Mmusculus.UCSC.mm10.knownGene and TxDb.Mmusculus.UCSC.mm9.knownGene for
mouse mm10 and mm9.

peakAnno <- annotatePeak(ring1B, tssRegion=c(-3000, 3000),  
TxDb=txdb, annoDb="org.Mm.eg.db")

visualization Pie and Bar plot

plotAnnoBar(peakAnno)  
vennpie(peakAnno)  
upsetplot(peakAnno)

Pie chart:

Bar chart:

upsetplot: upset The technique is applicable to the representation of more than 5 sets.

visualization TSS Regional TF binding loci

plotDistToTSS(peakAnno,  
title="Distribution of transcription factor-binding loci\nrelative to TSS")

Multiple peak Comparison of

Multiple peak set When commenting, build first list,Then use lapply.
list(name1=bed_file1,name2=bed_file2)RYBP There is a problem with the data. If you add it here, you will always report errors.

peaks <- list(cbx7=cbx7,ring1B=ring1B,suz12=suz12)  
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)  
tagMatrixList <- lapply(peaks, getTagMatrix, windows=promoter)  
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000))  
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000), conf=0.95,resample=500,
facet="row")  
tagHeatmap(tagMatrixList, xlim=c(-3000, 3000), color=NULL)

ChIP peak annotation comparision

peakAnnoList <- lapply(peaks, annotatePeak, TxDb=txdb,  
tssRegion=c(-3000, 3000), verbose=FALSE)  
plotAnnoBar(peakAnnoList)  
plotDistToTSS(peakAnnoList)

Overlap of peaks and annotated genes

genes= lapply(peakAnnoList, function(i) as.data.frame(i)$geneId)  
vennplot(genes)


![Insert picture description here](https://img-blog.csdnimg.cn/20210608151750993.gif)

Topics: Data Analysis

Programmer Think