Summarizes abundances, counts, and lengths (and inferential
replicates or variance) from the transcript level to the gene level.
Transcript IDs are stored as a CharacterList in the mcols
of the output object.
This function operates on SummarizedExperiment objects, and
will automatically access the relevant TxDb (by either finding it
in the BiocFileCache or by building it from an ftp location).
The summarization itself is performed by the tximport package,
which defines a method that works on simple lists.
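As a rough picture of what transcript-to-gene summarization means, the following base R sketch sums toy transcript-level counts within each gene and records which transcripts were grouped together. It is only an illustration under stated assumptions: the IDs, the transcript-to-gene map, and the values are made up, and it assumes counts are summed per gene. It is not the code that tximport or tximeta runs, and it does not build a real CharacterList.

```r
# Toy transcript-level data: 5 transcripts, 2 samples (all values made up).
tx_ids   <- c("tx1", "tx2", "tx3", "tx4", "tx5")
gene_ids <- c("geneA", "geneA", "geneB", "geneB", "geneB")  # hypothetical tx-to-gene map

# matrix() fills column by column: first the 5 values for s1, then for s2.
counts <- matrix(
  c(10, 5, 0, 20, 7,
    12, 3, 1, 18, 9),
  ncol = 2,
  dimnames = list(tx_ids, c("s1", "s2"))
)

# Sum transcript-level counts within each gene (rows grouped by gene ID).
gene_counts <- rowsum(counts, group = gene_ids)

# Record which transcripts went into each gene; a plain list here,
# standing in for the CharacterList kept in mcols of the real output.
tx_by_gene <- split(tx_ids, gene_ids)

print(gene_counts)
print(tx_by_gene)
```

In the real output the gene-level matrices live in the assays of the SummarizedExperiment, while the transcript IDs sit alongside the gene ranges in mcols.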
# S4 method for class 'SummarizedExperiment'
summarizeToGene(
  object,
  assignRanges = c("range", "abundant"),
  varReduce = FALSE,
  ...
)
object: a SummarizedExperiment produced by tximeta

assignRanges: "range" or "abundant"; this argument controls the way that
the rowRanges of the output object are assigned (note that this argument
does not affect data aggregation at all). The default, "range", outputs
the entire range of the gene, i.e. the leftmost basepair to the rightmost
basepair across all isoforms. Alternatively, with "abundant", expressed
genes are assigned the start and end of the most abundant isoform
(averaging over all samples), while non-expressed genes keep range-based
positions. For "abundant", for expressed genes, the name of the
range-assigned isoform, max_prop (maximum isoform proportion), and
iso_prop (numeric values for isoform proportions) are also returned
in mcols.

varReduce: whether to reduce per-sample inferential replicate information
into a matrix of sample variances, variance (default FALSE)

...: arguments passed to tximport
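The difference between the two assignRanges settings can be pictured with a small base R sketch. Everything here is hypothetical: the isoform table, the helper assign_range(), and the rule that a gene counts as "expressed" when its total mean abundance is above zero are assumptions made for illustration, not tximeta's implementation. The isoform and max_prop columns only mirror the names mentioned in the argument description.

```r
# Toy isoform table for two genes (all coordinates and abundances are made up).
iso <- data.frame(
  tx_id = c("tx1", "tx2", "tx3", "tx4"),
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100, 150, 120, 900),
  end   = c(400, 600, 300, 1200),
  # abundance of each isoform, already averaged over all samples
  mean_abund = c(2.0, 1.0, 7.0, 0.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assign one range to a single gene's isoforms.
assign_range <- function(df, how = c("range", "abundant")) {
  how <- match.arg(how)
  total <- sum(df$mean_abund)
  if (how == "abundant" && total > 0) {
    # Expressed gene (assumed: total abundance > 0): use the top isoform.
    top <- which.max(df$mean_abund)
    data.frame(gene = df$gene[1], start = df$start[top], end = df$end[top],
               isoform = df$tx_id[top], max_prop = df$mean_abund[top] / total,
               stringsAsFactors = FALSE)
  } else {
    # "range", or a non-expressed gene: leftmost start to rightmost end.
    data.frame(gene = df$gene[1], start = min(df$start), end = max(df$end),
               isoform = NA_character_, max_prop = NA_real_,
               stringsAsFactors = FALSE)
  }
}

by_gene <- split(iso, iso$gene)
range_based    <- do.call(rbind, lapply(by_gene, assign_range, how = "range"))
abundant_based <- do.call(rbind, lapply(by_gene, assign_range, how = "abundant"))

print(range_based)
print(abundant_based)
```

In this toy, geneA spans 100 to 600 under "range" but takes the coordinates of tx3 under "abundant", while the non-expressed geneB keeps its range-based positions in both cases. The quantification values themselves would be identical either way, since the argument only affects the ranges.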
a SummarizedExperiment with summarized quantifications
and transcript IDs as a CharacterList in the mcols
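To make the varReduce option more concrete, the base R sketch below collapses a set of toy inferential replicate matrices into a single matrix holding the variance across replicates for every gene and sample. The replicate values and the names infRep1 to infRep3 are made up, and the use of the ordinary sample variance is an assumption; this is a picture of the idea, not the code tximport uses.

```r
genes   <- c("geneA", "geneB")
samples <- c("s1", "s2")

# Toy inferential replicates: 3 matrices, each genes x samples
# (filled column by column; all values made up).
inf_reps <- list(
  infRep1 = matrix(c(10, 4, 20, 8), nrow = 2, dimnames = list(genes, samples)),
  infRep2 = matrix(c(12, 4, 22, 6), nrow = 2, dimnames = list(genes, samples)),
  infRep3 = matrix(c(14, 4, 24, 7), nrow = 2, dimnames = list(genes, samples))
)

# Stack the replicates into a genes x samples x replicates array.
arr <- array(
  unlist(inf_reps),
  dim = c(length(genes), length(samples), length(inf_reps)),
  dimnames = list(genes, samples, names(inf_reps))
)

# Variance across replicates for every gene/sample cell.
variance <- apply(arr, c(1, 2), var)

print(variance)
```

The result has the same shape as the other gene-level matrices, which is what allows a single variance matrix to stand in for the full set of replicates.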
example(tximeta)
#>
#> tximet> # point to a Salmon quantification file:
#> tximet> dir <- system.file("extdata/salmon_dm", package="tximportData")
#>
#> tximet> files <- file.path(dir, "SRR1197474", "quant.sf")
#>
#> tximet> coldata <- data.frame(files, names="SRR1197474", condition="A", stringsAsFactors=FALSE)
#>
#> tximet> # normally we would just run the following which would download the appropriate metadata
#> tximet> # se <- tximeta(coldata)
#> tximet>
#> tximet> # for this example, we instead point to a local path where the GTF can be found
#> tximet> # by making a linkedTxome:
#> tximet> indexDir <- file.path(dir, "Dm.BDGP6.22.98_salmon-0.14.1")
#>
#> tximet> fastaFTP <- c("ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/cdna/Drosophila_melanogaster.BDGP6.22.cdna.all.fa.gz",
#> tximet+ "ftp://ftp.ensembl.org/pub/release-98/fasta/drosophila_melanogaster/ncrna/Drosophila_melanogaster.BDGP6.22.ncrna.fa.gz")
#>
#> tximet> gtfPath <- file.path(dir, "Drosophila_melanogaster.BDGP6.22.98.gtf.gz")
#>
#> tximet> makeLinkedTxome(indexDir=indexDir, source="LocalEnsembl", organism="Drosophila melanogaster",
#> tximet+ release="98", genome="BDGP6.22", fasta=fastaFTP, gtf=gtfPath, write=FALSE)
#> linkedTxome is same as already in bfc
#>
#> tximet> se <- tximeta(coldata)
#> importing quantifications
#> reading in files with read.delim (install 'readr' package for speed up)
#> 1
#>
#> found matching linked transcriptome:
#> [ LocalEnsembl - Drosophila melanogaster - release 98 ]
#> loading existing TxDb created: 2024-10-18 15:14:14
#> loading existing transcript ranges created: 2024-10-18 15:14:14
#> Warning:
#>
#> Warning: the annotation is missing some transcripts that were quantified.
#> 5 out of 33706 txps were missing from GTF/GFF but were in the indexed FASTA.
#> (This occurs sometimes with Ensembl txps on haplotype chromosomes.)
#> In order to build a ranged SummarizedExperiment, these txps were removed.
#> To keep these txps, and to skip adding ranges, use skipMeta=TRUE
#>
#> Example missing txps: [FBtr0307759, FBtr0084079, FBtr0084080, ...]
#>
#> tximet> # to clear the entire linkedTxome table
#> tximet> # (don't run unless you want to clear this table!)
#> tximet> # bfcloc <- getTximetaBFC()
#> tximet> # bfc <- BiocFileCache(bfcloc)
#> tximet> # bfcremove(bfc, bfcquery(bfc, "linkedTxomeTbl")$rid)
gse <- summarizeToGene(se)
#> loading existing TxDb created: 2024-10-18 15:14:14
#> obtaining transcript-to-gene mapping from database
#> generating gene ranges
#> assignRanges='range': gene ranges assigned by total range of isoforms
#> see details at: ?summarizeToGene,SummarizedExperiment-method
#> summarizing abundance
#> summarizing counts
#> summarizing length