Inspect digest matches from importData() imported data
Source: R/mixed_reference.R
inspectDigests.Rd
This function takes as input a SummarizedExperiment as output by importData() and returns a tibble with information about the digest-match status of two indices (annotated and novel), with respect to tximeta metadata.
Inspection of index digests can be run iteratively, checking whether the digests used in the mixed reference transcript set have a match against 1) pre-computed digests representing standard annotated sets (e.g. GENCODE, Ensembl, etc.) or 2) digests added by the user to a local registry with makeLinkedTxome() (GTF file) or makeLinkedTxpData() (GRanges-based metadata).
Optional columns may be added if specified by fullDigest=TRUE (include the full digest) and/or count=TRUE (add matching transcript ID counts per index).
Following inspection, one can run updateMetadata() to automatically update the transcript metadata using the sources indicated by this function.
Usage
inspectDigests(
se,
type = "oarfish",
prefer = c("txome", "txpdata", "precomputed"),
fullDigest = FALSE,
count = FALSE
)
Arguments
- se
the SummarizedExperiment output by importData(), or alternatively just metadata(se)$quantInfo, a list of metadata information from the quantification tool (assuming annotated and novel indices were both used)
- type
which quantifier was used (see tximport::tximport())
- prefer
vector of length up to 3, giving the preferred order of tximeta's transcript registries to use when finding matches, with elements: txome: linkedTxome, txpdata: linkedTxpData, precomputed: the pre-computed digests in tximeta
- fullDigest
whether to include the full digest string in the output, in addition to the shortened 6-character version
- count
whether to count the number of transcript IDs matching each index (only possible for those indices that have matching metadata). Counting requires loading transcript data, either from locally cached databases or from GTF files.
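To make the idea of a preference order concrete, here is a toy sketch in base R. It is not tximeta's implementation: pick_registry() and the has_match flags are hypothetical names invented for illustration, and the sketch only assumes that the first registry listed in prefer that reports a match is the one used.

```r
# Hypothetical helper, NOT part of tximeta: return the first registry in
# `prefer` that reports a match, or NA if none of them does.
pick_registry <- function(has_match,
                          prefer = c("txome", "txpdata", "precomputed")) {
  stopifnot(all(prefer %in% names(has_match)))
  # Keep only the preferred registries that matched, in preference order
  hits <- prefer[has_match[prefer]]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Made-up match flags for a single index (illustrative values only)
has_match <- c(txome = FALSE, txpdata = TRUE, precomputed = TRUE)

first_default <- pick_registry(has_match)
first_custom <- pick_registry(has_match, prefer = c("precomputed", "txome"))
```

With these made-up flags the default order selects txpdata, while a shorter prefer vector that lists precomputed first selects precomputed, which is why prefer may have fewer than 3 elements.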
Value
a 2-row tibble with one row each for the annotated and novel index, giving their matching information if available (source, organism, release); for matches, whether the match is a linkedTxome or a linkedTxpData (both FALSE for pre-computed); and a short 6-character version of the digest itself.
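One possible way to act on the returned table is to list the indices that have no metadata match yet. The sketch below uses a hand-built data.frame as a stand-in for the result of inspectDigests(se), so that it runs without tximeta. The column names follow the printed example output, while the cell values (including the GENCODE source) are illustrative assumptions rather than real results.

```r
# Stand-in for the tibble returned by inspectDigests(se).
# Column names follow the printed output; values are illustrative only.
res <- data.frame(
  index = c("annotated", "novel"),
  source = c("GENCODE", NA),
  linkedTxome = c(FALSE, NA),
  linkedTxpData = c(FALSE, NA),
  small_digest = c("6fc626", "43158f"),
  stringsAsFactors = FALSE
)

# An index without a match has NA in its matching-information columns
unmatched <- res$index[is.na(res$source)]

if (length(unmatched) > 0) {
  message("No metadata match for: ", paste(unmatched, collapse = ", "))
}
```

Indices reported this way are the candidates for linking with makeLinkedTxome() or makeLinkedTxpData() before re-running the inspection.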
Examples
example(importData)
#>
#> imprtD> # oarfish files using a mix of --annotated and --novel transcripts
#> imprtD> dir <- system.file("extdata/oarfish", package="tximportData")
#>
#> imprtD> names <- paste0("rep", 2:4)
#>
#> imprtD> files <- file.path(dir, paste0("sgnex_h9_", names, ".quant.gz"))
#>
#> imprtD> coldata <- data.frame(files, names)
#>
#> imprtD> # returns an un-ranged SE object
#> imprtD> se <- importData(coldata, type="oarfish")
#> reading in files with read.delim (install 'readr' package for speed up)
#> 1
#> 2
#> 3
#>
#> returning un-ranged SummarizedExperiment, see functions:
#> -- inspectDigests() to check matching digests
#> -- makeLinkedTxome/makeLinkedTxpData() to link digests to metadata
#> -- updateMetadata() to update metadata and optionally add ranges
# now we have an `se` created by importData()...
inspectDigests(se)
#> # A tibble: 2 × 8
#>   index    source organism release genome linkedTxome linkedTxpData small_digest
#>   <chr>    <chr>  <chr>    <chr>   <chr>  <lgl>       <lgl>         <chr>
#> 1 annotat… GENCO… Homo sa… 48      GRCh38 FALSE       FALSE         6fc626
#> 2 novel    NA     NA       NA      NA     NA          NA            43158f
# can then update the registry via makeLinkedTxome() and re-run inspection