
Using BioT2Ex

Introduction

BioT2Ex is a toolkit that converts tabular TXT files into well-annotated Excel files. It can facilitate the downstream interpretation of high-throughput genomic sequencing analyses.

BioT2Ex currently supports results from RNA-seq, ChIP-seq and WES analyses.

Installing BioT2Ex

Installing in R

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("BioT2Ex")

Installing in Python

pip install BioT2Ex

Running BioT2Ex in R

Once the package is installed, go to your analysis directory, select the render function that matches your output results, and apply format conversion and style modification as desired.

Rendering the RNA-seq analysis results

Currently BioT2Ex’s render functions support RNA-seq analysis results exported from edgeR, DESeq2, limma, NOISeq and clusterProfiler.

render_edgeR

  1. Sort the output results in descending order of logFC (log2-transformed fold change) and add the “UP”- or “DOWN”-regulation annotations (abs(logFC)>1 & PValue<0.05);
  2. Add ENTREZID;
  3. Add the GO number and GenBank link of each gene;
  4. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  5. Add gradient color to the logFC column to indicate the expression level, with red representing up-regulation and blue representing down-regulation;
  6. Add summary statistics for some specific variables; the results are presented in additional worksheets, both as data and as pie charts;
  7. Add gene description.
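
The sorting and annotation rule in step 1 can be sketched in a few lines of Python. This is only an illustration of the stated thresholds (abs(logFC)>1 & PValue<0.05), not BioT2Ex's own code; the function name, the `regulation` key and the `NO` label for non-significant genes are assumptions made for the example, and the sample rows are made up.

```python
def annotate_regulation(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label each row UP/DOWN/NO and sort by logFC, largest first."""
    annotated = []
    for row in rows:
        # A gene is called significant only if it passes both thresholds.
        significant = abs(row["logFC"]) > lfc_cutoff and row["PValue"] < p_cutoff
        if significant and row["logFC"] > 0:
            label = "UP"
        elif significant:
            label = "DOWN"
        else:
            label = "NO"  # hypothetical label for "no difference"
        annotated.append({**row, "regulation": label})
    return sorted(annotated, key=lambda r: r["logFC"], reverse=True)


# Made-up rows, for illustration only.
rows = [
    {"gene": "A", "logFC": -2.3, "PValue": 0.001},
    {"gene": "B", "logFC": 1.8, "PValue": 0.010},
    {"gene": "C", "logFC": 0.4, "PValue": 0.200},
    {"gene": "D", "logFC": 3.1, "PValue": 0.300},
]
for row in annotate_regulation(rows):
    print(row["gene"], row["logFC"], row["regulation"])
```

Note that gene D has the largest logFC but is not annotated as UP, because its PValue does not pass the cutoff.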

The original differential expression analysis was derived from the publication.

Procedures:

  1. Provide the data you want to process as input, such as a DGEExact (edgeR object) or an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names;

    For example:

    DGEExact

    or:

    edgerdata

  2. Provide the OrgDb (e.g., org.Hs.eg.db for human) for gene annotation, the corresponding taxonomy ID (e.g., taxID 9606 for human) and the number of genes you want to output.

#' @param input input data. [dataframe or DGEExact]
#' @param orgDB the org.xx.eg.db you need. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param filename the name of the output file. [character]
#' @param n the number of genes to output; default is 20000. [integer]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_edgeR(input, orgDB, taxid, n, filename, showInExcel)
data <- render_edgeR(y, orgDB = "org.Hs.eg.db", taxid = 9606, n = 20000, filename = "test", showInExcel = TRUE)

Output effect:

  1. The first worksheet (edgeR-3.34.0): the rendered differential expression analysis data:

    edgeR

  2. The second worksheet (up_down): the numbers of up-regulated genes, down-regulated genes and genes with no difference are counted and displayed in a pie chart. The pie chart is also saved locally:

    updown
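
The counts behind an up_down style summary are simple to reproduce. The sketch below tallies regulation labels and converts them to percentages, which is the kind of data a pie chart would display; the function name and the `NO` label are assumptions for this example, and the label list is made up.

```python
from collections import Counter


def summarize_labels(labels):
    """Return {label: (count, percentage)} for a list of regulation labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        label: (n, round(100.0 * n / total, 1))
        for label, n in counts.items()
    }


# Made-up labels, for illustration only.
labels = ["UP", "DOWN", "NO", "NO", "UP", "NO", "NO", "NO"]
summary = summarize_labels(labels)
for label, (n, pct) in summary.items():
    print(f"{label}: {n} genes ({pct}%)")
```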

render_DESeq2

  1. Sort the output results in descending order of log2FoldChange (log2-transformed fold change) and add the “UP”- or “DOWN”-regulation annotations (abs(log2FoldChange)>1 & padj<0.05);
  2. Add ENTREZID;
  3. Add the GO number and GenBank link of each gene;
  4. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  5. Add gradient color to the log2FoldChange column to indicate the expression level, with red representing up-regulation and blue representing down-regulation;
  6. Add summary statistics for some specific variables; the results are presented in additional worksheets, both as data and as pie charts;
  7. Add gene description.
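
The red/blue gradient in step 5 can be pictured as a mapping from a fold-change value to a cell color. The sketch below is one possible mapping (white at zero, saturated red or blue at a chosen maximum); the function name and the `max_abs` saturation point are assumptions, and BioT2Ex may compute its colors differently.

```python
def fold_change_color(value, max_abs=4.0):
    """Map a log2 fold change to a hex color: blue (down), white (0), red (up)."""
    # Clamp to [-max_abs, max_abs], then scale to [-1, 1].
    scaled = max(-max_abs, min(max_abs, value)) / max_abs
    # 255 leaves the cell white; 0 gives a fully saturated color.
    fade = int(round(255 * (1 - abs(scaled))))
    if scaled >= 0:
        red, green, blue = 255, fade, fade
    else:
        red, green, blue = fade, fade, 255
    return f"#{red:02X}{green:02X}{blue:02X}"


for lfc in (-4.0, 0.0, 4.0):
    print(lfc, fold_change_color(lfc))
```

Values beyond `max_abs` are clamped, so a few extreme genes do not wash out the gradient for the rest of the column.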

Compared with the render result of edgeR, the header comments will be different. The original differential expression analysis was derived from the publication.

Procedures:

  1. Provide the data you want to process as input, such as a DESeqDataSet (DESeq2 object) or an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names;

    For example:

    DESeqDataSet

    or:

    deseqdata

  2. Provide the OrgDb (e.g., org.Hs.eg.db for human) for gene annotation, the corresponding taxonomy ID (e.g., taxID 9606 for human) and the number of genes you want to output.

#' @param input input data. [dataframe or DESeqDataSet]
#' @param orgDB the org.xx.eg.db you need. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param filename the name of the output file. [character]
#' @param n the number of genes to output; default is 20000. [integer]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_DESeq2(input, orgDB, taxid, filename, n, showInExcel)
data <- render_DESeq2(deseq, orgDB = "org.Hs.eg.db", taxid = 9606, filename = "test_des", n = 20000, showInExcel = FALSE)

Output effect:

  1. The first worksheet (DESeq2-1.32.0): the rendered differential expression analysis data:

    deseq2

  2. The second worksheet (up_down): the numbers of up-regulated genes, down-regulated genes and genes with no difference are counted and displayed in a pie chart. The pie chart is also saved locally:

    updown_deseq

render_limma

  1. Sort the output results in descending order of logFC (log2-transformed fold change) and add the “UP”- or “DOWN”-regulation annotations (abs(logFC)>1 & adj.P.Val<0.05);
  2. Add ENTREZID;
  3. Add the GO number and GenBank link of each gene;
  4. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  5. Add gradient color to the logFC column to indicate the expression level, with red representing up-regulation and blue representing down-regulation;
  6. Add summary statistics for some specific variables; the results are presented in additional worksheets, both as data and as pie charts;
  7. Add gene description.

Compared with the render result of edgeR, the header comments will be different. The original differential expression analysis was derived from the publication.

Procedures:

  1. Provide the data you want to process as input, such as an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names.

    For example:

    limma

  2. Provide the OrgDb (e.g., org.Hs.eg.db for human) for gene annotation, the corresponding taxonomy ID (e.g., taxID 9606 for human) and the number of genes you want to output.

#' @param input input data. [dataframe]
#' @param orgDB the org.xx.eg.db you need. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param filename the name of the output file. [character]
#' @param n the number of genes to output; default is 20000. [integer]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_limma(input, orgDB, taxid, filename, n, showInExcel)
data <- render_limma(input, orgDB = "org.Hs.eg.db", taxid = 9606, filename = "test_limma", n = 20000, showInExcel = FALSE)

Output effect:

  1. The first worksheet (limma-3.48.1): the rendered differential expression analysis data:

    render_limma

  2. The second worksheet (up_down): the numbers of up-regulated genes, down-regulated genes and genes with no difference are counted and displayed in a pie chart. The pie chart is also saved locally:

    updown_limma

render_NOISeq

  1. Add ENTREZID;
  2. Add the GO number and GenBank link of each gene;
  3. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  4. Add gene description.

Procedures:

  1. Provide the data you want to process as input, such as an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names.

    noiseq

  2. Provide the OrgDb (e.g., org.Hs.eg.db for human) for gene annotation, the corresponding taxonomy ID (e.g., taxID 9606 for human) and the number of genes you want to output.

#' @param input input data. [dataframe]
#' @param orgDB the org.xx.eg.db you need. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param filename the name of the output file. [character]
#' @param n the number of genes to output; default is 20000. [integer]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_NOISeq(input, orgDB, taxid, filename, n, showInExcel)
data <- render_NOISeq(input, orgDB = "org.Hs.eg.db", taxid = 9606, filename = "test_noiseq", n = 20000, showInExcel = FALSE)

Output effect:

noiseqresult

render_clusterProfiler

BioT2Ex currently supports the results of clusterProfiler GO analysis, KEGG analysis and GSEA analysis (GSEA, gseGO and gseKEGG).

render_clusterP_GO
  1. Add the Gene Ontology database link for each term;
  2. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  3. Add statistical functions for some specific variables, and the statistical results are presented in additional worksheets, including the corresponding data results, and the visualized results are displayed in pie charts.
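
As an illustration of step 1, a link for a GO term can be written into a spreadsheet cell as an Excel HYPERLINK formula. The sketch below is an assumption-laden example rather than what BioT2Ex does: the function name is hypothetical, and the AmiGO URL pattern is simply one public Gene Ontology browser, not necessarily the link target the package uses.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")
# Assumed link target (AmiGO term page); the package may link elsewhere.
AMIGO_TERM_URL = "https://amigo.geneontology.org/amigo/term/{}"


def go_hyperlink(term_id):
    """Return an Excel HYPERLINK formula pointing at a GO term page."""
    if not GO_ID.fullmatch(term_id):
        raise ValueError(f"not a GO identifier: {term_id!r}")
    url = AMIGO_TERM_URL.format(term_id)
    return f'=HYPERLINK("{url}", "{term_id}")'


print(go_hyperlink("GO:0006915"))
```

Validating the identifier first avoids writing broken links for rows where the term column is empty or malformed.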

The original differential expression analysis was derived from the publication.

Procedures:

  1. Provide the data you want to process as input, such as an enrichResult object or an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names;

    For example:

    enrichresultgo

    or:

    go

  2. Provide the filename you want to set.

#' @param input input data. [dataframe or enrichResult]
#' @param filename the name of the output file. [character]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_clusterP_GO(input, filename, showInExcel)
data <- render_clusterP_GO(input, filename = "test", showInExcel = FALSE)

Output effect:

  1. The first worksheet (clusterProfiler-4.2.0): the rendered enrichment analysis data:

    goresult

  2. The second worksheet (ONTOLOGY): the numbers of BP, CC and MF terms are counted and displayed in a pie chart. The pie chart is also saved locally:

    goresult2

render_clusterP_KEGG
  1. Add the KEGG database link for each term;
  2. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width.

The original differential expression analysis was derived from the publication.

Procedures:

  1. Provide the data you want to process as input, such as an enrichResult object or an external TXT/CSV/Excel file imported as a data frame, with gene symbols as row names;

    For example:

    enrichresultkegg

    or:

    kegg

  2. Provide the filename you want to set.

#' @param input input data. [dataframe or enrichResult]
#' @param filename the name of the output file. [character]
#' @param showInExcel open the output file in Excel or not. [logical]
# Usage: render_clusterP_KEGG(input, filename, showInExcel)
data <- render_clusterP_KEGG(input, filename = "test", showInExcel = FALSE)

Output effect:

The first worksheet (clusterProfiler-4.2.0): the rendered enrichment analysis data:

KEGGresult

Rendering the ChIP-seq analysis results

Currently BioT2Ex supports the ChIP-seq analysis results from ChIPseeker and DiffBind.

render_ChIPseeker

  1. Output the results in XLSX format;
  2. Add corresponding “Ensembl”, “Symbol”, “GeneName” and description of gene;
  3. Render the style of the XLSX output and add header comments.

The original data were downloaded from the GEO database (GSE144195_combined_peaks_with_CTCF_motif.bed.gz).

Procedures:

  1. Provide the data you want to process as input, such as imported external TXT/CSV/Excel files as data frame;
  2. Provide the OrgDb (e.g., org.Hs.eg.db for human) for gene annotation and the corresponding taxonomy ID (e.g., taxID 9606 for human).

#' @param input input data. [dataframe]
#' @param orgDB the org.xx.eg.db you need. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param filename the name of the output file. [character]
#' @param op open the output file in Excel or not. [logical]
# Usage: render_ChIPseeker(input, orgDB, taxid, filename, op)
data <- render_ChIPseeker(y, orgDB = "org.Hs.eg.db", taxid = 9606, filename = "test", op = TRUE)

Output effect:

chipseeker

render_DiffBind

  1. Convert the output to a file in XLSX format, add comment information for the header, and render the default table style, such as font, size, and column width;
  2. Add the “UP”- or “DOWN”-regulation annotations;
  3. Add gradient color to the Fold column to indicate the expression level, with red representing up-regulation and blue representing down-regulation;
  4. Add statistical functions for some specific variables.

Procedures:

  1. Provide the data you want to process as input, such as an external TXT/CSV/Excel file imported as a data frame:

    diffbind

  2. Provide the filename and sheetname you want to set.

#' @param input input data.
#' @param n the number of records to output; default is 20000. [character]
#' @param filename name of the output file; a default name is used if omitted. [character]
#' @param sheetname name of the data sheet; a default name is used if omitted. [character]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier; default is 9606 (Homo sapiens). [character]
#' @param showInExcel show in Excel or not; default is FALSE. [logical]
# Usage: render_DiffBind(input, n, filename = "", sheetname = "", taxid, showInExcel)
data_db <- render_DiffBind(diffbd, n = 2000, filename = "diffbind_out", sheetname = "happysheet", showInExcel = TRUE)

Output effect:

diffbindresult

Basic functions of BioT2Ex in R

We also include some functions to further improve the readability of XLSX files.

add_Description

A function that adds gene descriptions based on EntrezID. You need to provide the TaxID of the species and the column number of the EntrezID column (ncol).

#' @param input input data. [dataframe]
#' @param taxid tax_id of the organism; can be looked up with TaxIdentifier. [character]
#' @param ncol the column number of geneId. [integer]
#' @param showInExcel show output in Excel or not. [logical]
# Usage: add_Description(input, taxid, ncol, showInExcel)
data_c1 <- add_Description(data_c, taxid = 9606, ncol = 9, showInExcel = TRUE)

Output effect:

description

write_Out

This function appends the previously processed data into a workbook and names the workbook.

#' @param input input data. [dataframe]
#' @param package which R package outputs the data. [character]
#' @param showInExcel show output in Excel or not. [logical]
# Usage: write_Out(input, package, showInExcel)
wbc1 <- write_Out(data_c1, package = "ChIPseeker", showInExcel = TRUE)

Output effect:

effect

add_HeadAnnotation

Add header comments to the output. Users can select the default header comments (currently supported only for the output of edgeR, DESeq2 and ChIPseeker) or supply a custom list of header comments.

#' @param type the way to add head annotation. [character]
#' @param workbook which workbook to add feature
#' @param worksheet which worksheet to add feature. [integer]
#' @param mycomments add the custom head annotation by yourself. [character]
#' @param showInExcel show output in Excel or not. [logical]
# Usage: add_HeadAnnotation(type, workbook, worksheet, mycomments, showInExcel)
add_HeadAnnotation(type = "default", wbe1, worksheet = 1, showInExcel = TRUE)
add_HeadAnnotation(type = "userdefine", wb, worksheet = 1, mycomments = c("a", "b", "c"), showInExcel = TRUE)

Output effect:

head

add_Feature

This function modifies the table style of the output. It sets a uniform font size for the output and adds a color gradient to the fold-change column that reflects the direction and size of the change: red represents up-regulation and blue represents down-regulation. Users need to enter the column number of log2foldchange.

#' @param input input data. [dataframe]
#' @param workbook which workbook to add feature
#' @param worksheet which worksheet to add feature. [integer]
#' @param colorcol the column number of log2foldchange. [numerical]
#' @param showInExcel show output in Excel or not
# Usage: add_Feature(input, workbook, worksheet, colorcol, showInExcel)
wbe3 <- add_Feature(data_e1, wbe2, worksheet = 1, colorcol = 2, showInExcel = TRUE)

Output effect:

color

fix_Number

This function takes a numeric column of the file and makes it display correctly.

a <- as.data.frame(a)
# Pass the second column of the data frame.
fix_Number(a[[2]])
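
The underlying problem is that numbers read from a text file often arrive as strings, which spreadsheet software then shows as text. The Python sketch below illustrates the general idea of converting numeric-looking strings while leaving everything else untouched; it is not the implementation of fix_Number, and the function name and sample column are assumptions for the example.

```python
def to_number(text):
    """Return an int or float when text looks numeric, else the original string."""
    stripped = text.strip()
    # Try the stricter type first so "12" stays an integer.
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            continue
    return text


# Made-up column values, for illustration only.
column = ["12", " 3.5", "1e-4", "NA", ""]
fixed = [to_number(value) for value in column]
print(fixed)
```

Placeholders such as "NA" and empty cells are passed through unchanged, so the conversion never discards information.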

showInExcel

A function to open the selected directory or file in Excel. This base function is copied from the o function of the yulab.utils package.

showInExcel(a, "mydata.xlsx")

op

A function to open selected files (in Excel) automatically. This function is copied from the o function of yulab.utils.

op(file = "myfile")

Running BioT2Ex in Python

BioT2Ex currently supports the results of MACS, MutSigCV, RSEM, dNdScv, MutPanning, RF5, OncodriveCLUSTL and OncodriveFM.

biot2ex_macs

This function renders the input data as follows: it outputs the pre-comment and post-comment data in XLSX format, adds a gradient to the -log10(pvalue) column to represent numerical size, adds header comments to the output table, and tidies the table font, alignment and formatting.

python -m biot2ex_macs inputfile

Output effect:

macs
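
To make the comment handling more concrete, the sketch below separates comment lines from the tab-separated table in a MACS-style text file. It assumes that comment lines start with "#" and that blank lines can be skipped; the function name and the sample lines are made up for illustration, and this is not the code behind biot2ex_macs.

```python
def split_comments(lines):
    """Split lines into '#' comment lines and tab-separated table rows."""
    comments, table = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            comments.append(line)
        elif line.strip():  # skip blank separator lines
            table.append(line.split("\t"))
    return comments, table


# Made-up sample content, for illustration only.
sample = [
    "# This file is generated by MACS\n",
    "# name = example\n",
    "\n",
    "chr\tstart\tend\t-log10(pvalue)\n",
    "chr1\t100\t200\t12.5\n",
]
comments, table = split_comments(sample)
print(len(comments), "comment lines;", len(table) - 1, "data rows")
```

With a real file, the same function can be applied to the lines of an opened file object, for example `split_comments(open(path, encoding="utf-8"))`.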

biot2ex_mutsigcv

This function converts the input data to XLSX format, adds header comments, and renders the table style. The data used for the example is from this site.

python -m biot2ex_mutsigcv inputfile

Output effect:

MutSigCV

biot2ex_rsem

This function converts the input data to XLSX format, adds header comments, and renders the table style.

python -m biot2ex_rsem inputfile

Output effect:

RSEM

biot2ex_mutpanning

This function converts the input data to XLSX format, adds header comments, and renders the table style.

python -m biot2ex_mutpanning inputfile

Output effect:

mutplanning

biot2ex_rf5

This function converts the input data to XLSX format, adds header comments, and renders the table style.

python -m biot2ex_rf5 inputfile

Output effect:

rf5

biot2ex_dndscv

This function converts the input data to XLSX format, adds header comments, and renders the table style. The file is split into several more detailed sheets.

python -m biot2ex_dndscv inputfile

Output effect:

dndscv dndscv_result

biot2ex_oncordriveclustl

This function converts the input data to XLSX format, adds header comments, and renders the table style.

python -m biot2ex_oncordriveclustl inputfile

Output effect:

  1. OncodriveCLUSTL cluster results:

    onchor

  2. OncodriveCLUSTL elements results:

    onchor_element

biot2ex_oncordrivefm

This function converts the input data to XLSX format, adds header comments, and renders the table style.

python -m biot2ex_oncordrivefm inputfile

Output effect:

  1. OncodriveFM CLL-genes results:

    oncordrivefm_CLL-genes

  2. OncodriveFM CLL-pathways results:

    oncordrivefm_CLL-pathways
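
When several result files need converting, the `python -m ...` commands shown in this guide can be assembled from a script. The sketch below only builds the command lists; the module names are those used above, while the short tool keys, the helper name and the file names are assumptions for the example.

```python
import sys

# Module names as they appear in this guide; the short keys are hypothetical.
MODULES = {
    "macs": "biot2ex_macs",
    "mutsigcv": "biot2ex_mutsigcv",
    "rsem": "biot2ex_rsem",
    "mutpanning": "biot2ex_mutpanning",
    "rf5": "biot2ex_rf5",
    "dndscv": "biot2ex_dndscv",
    "oncordriveclustl": "biot2ex_oncordriveclustl",
    "oncordrivefm": "biot2ex_oncordrivefm",
}


def build_command(tool, inputfile):
    """Return the argument list for `python -m <module> <inputfile>`."""
    if tool not in MODULES:
        raise KeyError(f"unsupported tool: {tool}")
    return [sys.executable, "-m", MODULES[tool], inputfile]


# Hypothetical input files, for illustration only.
jobs = [("macs", "peaks.xls"), ("rsem", "genes.results")]
commands = [build_command(tool, path) for tool, path in jobs]
for command in commands:
    print(" ".join(command))
```

Each list can then be passed to `subprocess.run(command, check=True)` to perform the conversion, provided BioT2Ex is installed in the same Python environment.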