- 1 chip SEQ Technology
- 1.1 concept
- 1.2 chip SEQ technology principle
- 2 chip SEQ data analysis
- 2.1 data download
- 2.2 data_assess ment
- 2.3 mapping_analysis
- 2.4 Peak_calling
- MACS2
- 2.4.1 MACS2 core: callpack usage
- 2.4.2 description of callpack result file
- 2.4.3 bdg file → wig file
- MACS2
- 2.5 Peak_anno
- ChIPseeker
Chip SEQ is only the first mature technology in the field of epigenetics. At present, there are many other technologies, such as
DNA modification: DNA methylation, immunocoprecipitation (MeDIP), target region methylation, genome-wide methylation (WGBS), oxidation bisulfite sequencing (oxbs SEQ),
TET assisted bisulfite sequencing (tab SEQ)
RNA modification: RNA methylation immunocoprecipitation (MeRIP)
Protein nucleic acid interaction: Rip SEQ, chip SEQ, clip seq
And what can ATAC SEQ do? (
http://www.biotrainee.com/thread-1218-1-1.html
)
1 chip SEQ Technology
1.1 concept
Chromatin Immunoprecipitation (ChIP)
)Also known as binding site analysis, it is a method to study the interaction between protein and DNA in vivo. It is usually used to study the binding sites of transcription factors or histone specific modification sites.
ChIP SEQ technology combining ChIP and second-generation sequencing technology
, it can efficiently detect DNA fragments interacting with histones and transcription factors in the whole genome.
1.2 chip SEQ technology principle
In the physiological state, the DNA in the cell is crosslinked with protein, the cell is cleaved, the chromosome is separated, and the chromatin is randomly cut by ultrasound or enzyme treatment;
The DNA fragment combined with the target protein was precipitated by the specific recognition reaction of antigen and antibody;
The DNA fragment of the binding protein was released by reverse cross-linking;
Purification;
The sequences of DNA fragments were obtained by sequencing, and finally these DNA fragments were aligned to the corresponding reference genome.
! [write picture description here]( https://img-
blog.csdn.net/20180814104707287?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70)
2 chip SEQ data analysis
2.1 data download
GSE98149 (including all stages of H3K9me3, zygote, E6.5 Epi, E6.5 Exe and E7.5 of H3K4me3 and H3K27me3)
Epi,E7.5 Exe,E8.5 embryo,Esc)
Title: Reprogramming of H3K9me3-dependent heterochromatin during mammalian
early embryo development [ChIP-seq]
Organism: Mus musculus
for ((i=594;i<=670;i++)); do wget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP105/SRP105176/SRR5479$i/SRR5479$i.sra; done & [/code] ```code # sratookit: . sra file → fastq file ls *sra |while read id; do /home/chen/sratoolkit.2.8.2-ubuntu64/bin/fastq-dump --gzip --split-3 $id; done &
# Download the index of mouse reference genome wget -c "ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip" & # decompression unzip mm10.zip & [/code] ## 2.2 data_assess ment ```code # Fastqc for quality control ls *fq | while read id; do fastqc -t 4 $id; done & # multiqc: batch viewing of QC results multiqc *fastqc.zip --export & [/code] ```code ## trimmomatic # Installing trimmatic wget -c http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip & unzip Trimmomatic-0.38.zip # Data cleaning # -threads setting up multithreading java -jar "/data/chen/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar" PE -threads 2 -phred33 \ # 2 input files ${name}_1.fq.gz ${name}_2.trim.fq.gz \ # 4 output files ${name}_R1.clean.fq.gz ${name}_R1.unpaired.fq.gz \ ${name}_R2.clean.fq.gz ${name}_R2.unpaired.fq.gz \ # ILLUMINACLIP: de connector # "$adapter"/Exome.fa: fasta file of adapter sequence # 2: There can be 2 mismatches in the 16 base length seed sequence # 30: with palindrome pattern, the matching score is at least 30 (about 50 bases) # 10: When using the simple pattern, the matching score is at least 10 (about 17 bases) ILLUMINACLIP:"$adapter"/Exome.fa:2:30:10 \ # LEADING:3, remove the base with mass value less than 3 from the beginning of the sequence; # Training: 3, remove bases with mass value less than 3 from the end of the sequence; # SLIDINGWINDOW:4:15, calculate the average base mass in a 4 bp window from the 5 'end, # If the average value is lower than 15, the read is truncated from this position; # Headcrop: < length > cut the specified length at the head end of reads; # MINLEN:36. If the length of reads is less than 36 bp, throw away the whole read. LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:10 MINLEN:36 [/code] ![Write a picture description here](https://img- blog.csdn.net/20180904092004409?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ![Write a picture description here](https://img- blog.csdn.net/20180904092011835?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ## 2.3 mapping_analysis Bowtie2 or BWA ```code # bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>] # -P / - threads nthreads sets the number of threads Default: 1 # -q reads is in fastq format # -X < BT2 IDX > index path # -1 < M1 > double terminal sequencing_ 1.fastq path. Can be multiple files separated by commas; Multiple documents must correspond to the documents formulated in - 2 < M2 >. # -2 < M2 > double ended sequencing_ 2.fastq path # -U < R > fastq path for non two terminal sequencing. Can be multiple files separated by commas. # -S < hit > output Sam format file. # -3 / - - trim3 < int > cut the base with the length of < int > at the 3 'end, and then use it for comparison. (default: 0). # After looking at the data quality with fastqc, I found that there was a problem with the quality of the three terminals, so I used the - 3 5 --local parameter, # --Local if the fq file has not been subjected to trim, you can perform soft clipping with local comparison and add the parameter -- local. In this mode, read is locally compared, so that some bases at both ends of read are not compared, so that the comparison score meets the requirements bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479594_1.fastq -2 SRR5479594_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479595_1.fastq -2 SRR5479595_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479597_1.fastq -2 SRR5479597_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479598_1.fastq -2 SRR5479598_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479596_1.fastq -2 SRR5479596_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479599_1.fastq -2 SRR5479599_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479605_1.fastq -2 SRR5479605_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479606_1.fastq -2 SRR5479606_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479607_1.fastq -2 SRR5479607_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479608_1.fastq -2 SRR5479608_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479609_1.fastq -2 SRR5479609_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479610_1.fastq -2 SRR5479610_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479611_1.fastq -2 SRR5479611_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479612_1.fastq -2 SRR5479612_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479613_1.fastq -2 SRR5479613_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479614_1.fastq -2 SRR5479614_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479615_1.fastq -2 SRR5479615_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479616_1.fastq -2 SRR5479616_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479617_1.fastq -2 SRR5479617_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479618_1.fastq -2 SRR5479618_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479623_1.fastq -2 SRR5479623_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479624_1.fastq -2 SRR5479624_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479625_1.fastq -2 SRR5479625_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479626_1.fastq -2 SRR5479626_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479627_1.fastq -2 SRR5479627_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep3.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479628_1.fastq -2 SRR5479628_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479634_1.fastq -2 SRR5479634_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479635_1.fastq -2 SRR5479635_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479636_1.fastq -2 SRR5479636_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479642_1.fastq -2 SRR5479642_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479643_1.fastq -2 SRR5479643_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479644_1.fastq -2 SRR5479644_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479650_1.fastq -2 SRR5479650_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479651_1.fastq -2 SRR5479651_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479652_1.fastq -2 SRR5479652_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep3.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479653_1.fastq -2 SRR5479653_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep4.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479654_1.fastq -2 SRR5479654_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479661_1.fastq -2 SRR5479661_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479662_1.fastq -2 SRR5479662_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep2.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479663_1.fastq -2 SRR5479663_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_input.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479669_1.fastq -2 SRR5479669_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep1.sam & bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479670_1.fastq -2 SRR5479670_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep2.sam & [/code] ![Write a picture description here](https://img- blog.csdn.net/2018090411251642?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ![Write a picture description here](https://img- blog.csdn.net/20180904112404199?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ![Write a picture description here](https://img- blog.csdn.net/20180904161257837?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ![Write a picture description here](https://img- blog.csdn.net/2018090416131067?watermark/2/text/aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3dlaXhpbl80MTc0NTg1OA==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70) ## 2.4 Peak_calling ### **MACS2** peaks calling: Look for possible binding sites, that is, a large number of binding sites in the genome reads Enriched areas. #### 2.4.1 MACS2 core: callpack usage ```code # Example for regular peak calling macs2 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01 # Example for broad peak calling macs2 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1 [/code] ```code # Batch callpack macs2 callpeak -c IgGold.bam -t suz12.bam -q 0.05 -f BAM -g mm -n suz12 2> suz12.macs2.log & macs2 callpeak -c IgGold.bam -t ring1B.bam -q 0.05 -f BAM -g mm -n ring1B 2> ring1B.macs2.log & macs2 callpeak -c IgGold.bam -t cbx7.bam -q 0.05 -f BAM -g mm -n cbx7 2> cbx7.macs2.log & macs2 callpeak -c IgGold.bam -t RYBP.bam -q 0.01 -f BAM -g mm -n RYBP 2>RYBP.macs2.log & [/code] **-t** /–treatment FILENAME-Processing group input This is the only REQUIRED parameter for MACS. File can be in any supported format specified by –format option. Check –format for detail. If you have more than one alignment files, you can specify them as ` -t A B C ` . MACS will pool up all these files together. **-c** /–control-Control group input The control or mock data file. Please follow the same direction as for -t/–treatment. **-f** /–format FORMAT--t and-c Provide file format, current MACS Recognizable formats are“ ELAND", "BED", "ELANDMULTI", "ELANDEXPORT", "ELANDMULTIPET" (Two terminal sequencing), "SAM", "BAM", "BOWTIE", "BAMPE", "BEDPE". "Except" BAMPE", "BEDPE"Except for special declaration, other formats can be used AUTO Automatic detection. If this option is not provided, the selection is automatically detected. Format of tag file, can be "ELAND", "BED", "ELANDMULTI", "ELANDEXPORT", "ELANDMULTIPET" (for pair-end tags), "SAM", "BAM", "BOWTIE", "BAMPE" or "BEDPE". Default is "AUTO" which will allow MACS to decide the format automatically. "AUTO" is also usefule when you combine different formats of files. Note that MACS can't detect "BAMPE" or "BEDPE" format with "AUTO", and you have to implicitly specify the format for "BAMPE" and "BEDPE". **-g** /–gsize-Genome size, provided by default hs, mm, ce, dm If the options are not included, for example, Arabidopsis, you need to provide them yourself (according to Arabidopsis) NCBI The display is 119,667,750,That is 1.2e8). PLEASE assign this parameter to fit your needs! It's the mappable genome size or effective genome size which is defined as the genome size which can be sequenced. Because of the repetitive features on the chromsomes, the actual mappable genome size will be smaller than the original size, about 90% or 70% of the genome size. The default hs – 2.7e9 is recommended for UCSC human hg18 assembly. Here are all precompiled parameters for effective genome size: hs: 2.7e9 (Human is 2.7e9,That is 2.7G) mm: 1.87e9 ce: 9e7 dm: 1.2e8 **-n** /–name-Prefix name of the output file. Indicates the name of the experiment, Please choose a meaningful name. The name string of the experiment. MACS will use this string NAME to create output files like 'NAME_peaks.xls', 'NAME_negative_peaks.xls', 'NAME_peaks.bed' , 'NAME_summits.bed', 'NAME_model.r' and so on. So please avoid any confliction between these filenames and your existing files. **-B** /–bdg Will save more information in bedGraph File, such as fragment pileup, control lambda, -log10pvalue and -log10qvalue scores. If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+'_treat_pileup.bdg' for treatment data, NAME+'_control_lambda.bdg' for local lambda values from control, NAME+'_treat_pvalue.bdg' for Poisson pvalue scores (in -log10(pvalue) form), and NAME+'_treat_qvalue.bdg' for q-value scores from Benjamini–Hochberg–Yekutieli procedure [ http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests ](http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests) **-q** /–qvalue-q Value, that is, the smallest PDR Threshold, 0 by default.05. q Value is based on p Value utilization BH Calculation, that is, the corrected results of multiple tests. The qvalue (minimum FDR) cutoff to call significant regions. Default is 0.01. For broad marks, you can try 0.05 as cutoff. Q-values are calculated from p-values using Benjamini-Hochberg procedure. **-p** /–pvalue-p Values, specifying p After value MACS2 It won't work q It's worth it. The pvalue cutoff. If -p is specified, MACS2 will use pvalue instead of qvalue. **-m** /–mfold-and MFOLD Relevant, and MFOLD and MACS About the pre built model, the default is 5:50, MACS Will look for more than 100 first peak Area construction model, generally do not need to change, because you do not understand. This parameter is used to select the regions within MFOLD range of high- confidence enrichment ratio against background to build model. The regions must be lower than upper limit, and higher than the lower limit of fold enrichment. DEFAULT:5,50 means using all regions not too low (>5) and not too high (<50) to build paired-peaks model. If MACS can not find more than 100 regions to build model, it will use the –extsize parameter to continue the peak detection ONLY if –fix-bimodal is set. #### 2.4.2 description of callpack result file ```code # Count the number of * bed rows (peak number) in the current directory wc -l *bed 2384 cbx7_summits.bed 8342 ring1B_summits.bed 0 RYBP_summits.bed 7619 suz12_summits.bed # Count the number of hello lines in file a: # grep hello a | wc -l # wc(word count) #-c count the number of bytes. #-l count the number of rows. line #-m counts the number of characters. This flag cannot be used with the - c flag. #-w count the number of words. A word is defined as a string separated by white space, skip, or newline characters. #-L prints the length of the longest line. [/code] callpeak The following result files will be obtained: **NAME_summits.bed** : Browser Extensible Data,Record each peak of peak summits,In other words, record the position of the extreme point. MACS It is suggested to use this document to find the binding sites motif. Can load directly UCSC browser,The first line needs to be removed when analyzing with other software. **NAME_peaks.xls** : Store in form peak Information, although the suffix is xls,But it can actually be opened with a text editor, and bed The format is similar, but **Based on 1** ,and bed The file is based on 0.in other words xls All coordinates need to be minus one bed Coordinates of the file. **NAME_peaks.narrowPeak** , **NAME_peaks.broadPeak** similar. The last four columns are represented by, integer score for display, fold-change,-log10pvalue,-log10qvalue,relative summit position to peak start. Content and NAME_peaks.xls Basically consistent, suitable for import R Analysis. **NAME_model.r** : Can pass ` $ Rscript NAME_model.r ` The drawing is based on the data you provide peak Model. **.bdg** : Can use UCSC genome browser Convert to smaller bigWig File. #### 2.4.3 bdg file → wig file > For convenience IGV View on ChIP-seq The results and later visual display need to be macs2 The results are translated into bw provide for IGV. There are three steps: Step 1: use bdgcmp obtain **FE** perhaps **logLR Converted file** (Run MACS2 bdgcmp to generate fold-enrichment and logLR track) ```code macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_FE.bdg -m FE macs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_logLR.bdg -m logLR -p 0.00001 # Parameter interpretation # -m FE calculation enrichment factor to reduce noise # -p to avoid an error when the input value of log is 0, a small value is given [/code] Step 2: **Pretreatment** File, Download corresponding **Reference genome chromosome length** file use conda Install the following three processing software: bedGraphToBigWig bedClip bedtools Download chromosome length file: [ http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz ](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz) And decompress (for human,Other species can be downloaded from similar websites) Write a little sh Scripts facilitate one-step conversion name.sh: ```code #!/bin/bash # check commands: slopBed, bedGraphToBigWig and bedClip which bedtools &>/dev/null || { echo "bedtools not found! Download bedTools: <http://code.google.com/p/bedtools/>"; exit 1; } which bedGraphToBigWig &>/dev/null || { echo "bedGraphToBigWig not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; } which bedClip &>/dev/null || { echo "bedClip not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; } # end of checking if [ $# -lt 2 ];then echo "Need 2 parameters! <bedgraph> <chrom info>" exit fi F=$1 G=$2 bedtools slop -i ${F} -g ${G} -b 0 | bedClip stdin ${G} ${F}.clip LC_COLLATE=C sort -k1,1 -k2,2n ${F}.clip > ${F}.sort.clip bedGraphToBigWig ${F}.sort.clip ${G} ${F/bdg/bw} rm -f ${F}.clip ${F}.sort.clip [/code] chmod +x name.sh Step 3: generate **bw** file ```code ./name.sh H3K36me1_EE_rep1_FE.bdg hg19.len ./name.sh H3K36me1_EE_rep1_logLR.bdg hg19.len [/code] Finally get the product. As for which one to use as the input file, you can come as needed H3K36me1EErep1_FE.bw H3K36me1EErep1_logLR.bw ## 2.5 Peak_anno ### ChIPseeker > ChIPseeker The functions of are divided into three categories: > Notes: extracting peak Near the nearest gene, note peak Area. > Comparison: estimation ChIP peak The significance of overlap in the dataset; integration GEO Data set to compare current results with known results. > Visualization: peak Coverage of; TSS Region combined peak Mean expression profile and heat map of; Genome annotation; TSS Distance; peak And gene overlap. ```code # Load ChIPseeker, genome annotation package and bed data biocLite("ChIPseeker") biocLite("org.Mm.eg.db") biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene") library("ChIPseeker") # download source ("https://bioconductor.org/biocLite.R") biocLite("ChIPseeker") biocLite("org.Mm.eg.db") biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene") biocLite("clusterProfiler") biocLite("ReactomePA") biocLite("DOSE") #load library("ChIPseeker") library("org.Mm.eg.db") library("TxDb.Mmusculus.UCSC.mm10.knownGene") txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene library("clusterProfiler") # Read in bed file ring1B <- readPeakFile("F:/Data/ChIP/ring1B_peaks.narrowPeak") cbx7 <- readPeakFile("F:/Data/ChIP/cbx7_peaks.narrowPeak") RYBP <- readPeakFile("F:/Data/ChIP/RYBP2_summits.bed") suz12 <- readPeakFile("f:/Data/ChIP/suz12_peaks.narrowPeak") [/code] Chip peaks coverage plot see peak Genome wide location covplot(ring1B,chrs=c("chr17", "chr18")) #specific chr ring1B suz12 cbx7 RYBP ring1B(chr17&18) ● Heatmap of ChIP binding to TSS regions promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000) tagMatrix <- getTagMatrix(ring1B, windows=promoter) tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color="red") Average Profile of ChIP peaks binding to TSS region ● Confidence interval estimated by bootstrap method plotAvgProf(tagMatrix, xlim=c(-3000, 3000), conf = 0.95, resample = 1000) peak Notes for peak For comments annotatePeak**,TSS (transcription start site) region You can set it yourself. The default is(-3000,3000),TxDb Refers to the genome of a species, e.g TxDb.Hsapiens.UCSC.hg38.knownGene, TxDb.Hsapiens.UCSC.hg19.knownGene for human genome hg38 and hg19, TxDb.Mmusculus.UCSC.mm10.knownGene and TxDb.Mmusculus.UCSC.mm9.knownGene for mouse mm10 and mm9. peakAnno <- annotatePeak(ring1B, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Mm.eg.db") visualization Pie and Bar plot plotAnnoBar(peakAnno) vennpie(peakAnno) upsetplot(peakAnno) Pie chart: Bar chart: upsetplot: upset The technique is applicable to the representation of more than 5 sets. visualization TSS Regional TF binding loci plotDistToTSS(peakAnno, title="Distribution of transcription factor-binding loci\nrelative to TSS") Multiple peak Comparison of Multiple peak set When commenting, build first list,Then use lapply. list(name1=bed_file1,name2=bed_file2)RYBP There is a problem with the data. If you add it here, you will always report errors. peaks <- list(cbx7=cbx7,ring1B=ring1B,suz12=suz12) promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000) tagMatrixList <- lapply(peaks, getTagMatrix, windows=promoter) plotAvgProf(tagMatrixList, xlim=c(-3000, 3000)) plotAvgProf(tagMatrixList, xlim=c(-3000, 3000), conf=0.95,resample=500, facet="row") tagHeatmap(tagMatrixList, xlim=c(-3000, 3000), color=NULL) ChIP peak annotation comparision peakAnnoList <- lapply(peaks, annotatePeak, TxDb=txdb, tssRegion=c(-3000, 3000), verbose=FALSE) plotAnnoBar(peakAnnoList) plotDistToTSS(peakAnnoList) Overlap of peaks and annotated genes genes= lapply(peakAnnoList, function(i) as.data.frame(i)$geneId) vennplot(genes) ![Insert picture description here](https://img-blog.csdnimg.cn/20210608151750993.gif)