Update transcript metadata for data imported with importData()
Source: R/mixed_reference.R
This function takes as input a SummarizedExperiment as output by importData(),
and will update the metadata on the transcripts when possible
(updating rowData and/or rowRanges depending on the value of ranges).
importData() uses metadata pulled from digest matches in registries used by tximeta
(linkedTxome, linkedTxpData, and the pre-computed digests).
Additionally, GRanges or data.frame-type data can be provided on a one-time basis
via the argument txpData, which will annotate transcripts with index="user".
See inspectDigests() for how to inspect which indices have matching digests,
and how to link data to local metadata in a persistent manner.
Arguments
- se
the SummarizedExperiment (SE) output by importData()
- txpData
either a GRanges or data.frame-type object to use if there is no match based on digest. This is used on a one-time basis, and transcripts will be marked in metadata columns as index = "user". See makeLinkedTxome() or makeLinkedTxpData() for persistent metadata storage/retrieval
- ranges
logical, whether to add rowRanges (or just rowData)
- prefer
vector of length up to 3, giving the preferred order of tximeta's transcript registries to use when finding matches, with elements: txome: linkedTxome, txpdata: linkedTxpData, precomputed: the pre-computed digests in tximeta
- order
order of index in which to update the metadata; by default the order is annotation, then novel, then user (the info supplied here as txpData)
- key
a named character vector of length 3. For each index (annotated, novel, and user), key is the name of the column to use for merging metadata with rownames(se). The user index corresponds to data provided here as txpData. Defaults to "tx_name", which often matches the transcript names in GENCODE
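The role of key can be illustrated with a minimal base-R sketch. This is not tximeta's internal implementation: the element names on key (annotated, novel, user), the toy transcript names, and the txp_data table are all assumptions made for illustration. It only shows the general idea of aligning a metadata table to rownames(se) through a key column.

```r
# Hypothetical row names, standing in for rownames(se)
se_rownames <- c("ENST0001.1", "novel1", "novel2", "unmatched1")

# Hypothetical user-supplied table, standing in for a data.frame-type txpData
txp_data <- data.frame(
  tx_name = c("novel2", "novel1", "novelX"),
  gene_id = c("novel_gene2", "novel_gene1", "novel_geneX"),
  stringsAsFactors = FALSE
)

# One key column per index; the element names here are an assumption
key <- c(annotated = "tx_name", novel = "tx_name", user = "tx_name")

# Align the table to the row names via the key column for the "user" index
idx <- match(se_rownames, txp_data[[key[["user"]]]])

# Rows without a match keep NA metadata and are not labelled with an index
merged <- data.frame(
  row.names = se_rownames,
  gene_id = txp_data$gene_id[idx],
  index = ifelse(is.na(idx), NA_character_, "user"),
  stringsAsFactors = FALSE
)
print(merged)
```

Rows whose names do not appear in the key column are left with missing metadata in this sketch, which mirrors the "when possible" wording in the description.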
Examples
example(importData)
#>
#> imprtD> # oarfish files using a mix of --annotated and --novel transcripts
#> imprtD> dir <- system.file("extdata/oarfish", package="tximportData")
#>
#> imprtD> names <- paste0("rep", 2:4)
#>
#> imprtD> files <- file.path(dir, paste0("sgnex_h9_", names, ".quant.gz"))
#>
#> imprtD> coldata <- data.frame(files, names)
#>
#> imprtD> # returns an un-ranged SE object
#> imprtD> se <- importData(coldata, type="oarfish")
#> reading in files with read.delim (install 'readr' package for speed up)
#> 1
#> 2
#> 3
#>
#> returning un-ranged SummarizedExperiment, see functions:
#> -- inspectDigests() to check matching digests
#> -- makeLinkedTxome/makeLinkedTxpData() to link digests to metadata
#> -- updateMetadata() to update metadata and optionally add ranges
# build custom novel GRanges data
library(GenomicRanges)
novel <- data.frame(
  seqnames = paste0("chr", rep(1:22, each=500)),
  start = 1e6 + 1 + 0:499 * 1000,
  end = 1e6 + 1 + 0:499 * 1000 + 1000 - 1,
  strand = "+",
  tx_name = paste0("novel", 1:(22*500)),
  gene_id = paste0("novel_gene", rep(1:(22*10), each=50)),
  type = "protein_coding"
)
novel_gr <- as(novel, "GRanges")
names(novel_gr) <- novel$tx_name
# now update the metadata + ranges:
if (FALSE) { # \dontrun{
  # this requires connection to internet (will download GENCODE GTF via FTP)
  se_with_ranges <- updateMetadata(
    se, txpData=novel_gr, ranges=TRUE
  )
  mcols(se_with_ranges)
} # }
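
Before passing a custom table as txpData, it can help to check it with a few base-R assertions. The sketch below rebuilds the novel table from the example above and checks properties that a merge on "tx_name" would rely on. These checks are an optional suggestion, not something updateMetadata() is documented to require.

```r
# Rebuild the toy table from the example (22 chromosomes x 500 transcripts)
novel <- data.frame(
  seqnames = paste0("chr", rep(1:22, each = 500)),
  start = 1e6 + 1 + 0:499 * 1000,
  end = 1e6 + 1 + 0:499 * 1000 + 1000 - 1,
  strand = "+",
  tx_name = paste0("novel", 1:(22 * 500)),
  gene_id = paste0("novel_gene", rep(1:(22 * 10), each = 50)),
  type = "protein_coding"
)

# Transcript names must be unique to serve as a merge key
stopifnot(anyDuplicated(novel$tx_name) == 0)

# Every interval should have a valid, positive width
widths <- novel$end - novel$start + 1
stopifnot(all(widths > 0))

# Each gene should sit on a single chromosome
chr_per_gene <- tapply(novel$seqnames, novel$gene_id,
                       function(x) length(unique(x)))
stopifnot(all(chr_per_gene == 1))

# Summaries of the table
n_txp <- nrow(novel)
n_gene <- length(unique(novel$gene_id))
```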