The oarfish and salmon quantification tools allow a mix of
--annotated reference transcripts (e.g. GENCODE, Ensembl) and
--novel or custom transcripts (e.g. de novo assembled transcripts not present
in the annotated set) to be used as the index for quantification.
importData() and associated functions facilitate import, reference identification,
and addition of metadata across annotated and/or novel transcripts.
The importData() function alone imports the data, while inspection of the
recognized digests and updating of transcript metadata is handled by subsequent functions
(listed in See also section below).
Arguments
- coldata
data.frame with columns
filesandnamesas intximeta()- type
what quantifier was used (see
tximport::tximport());importData()supports"oarfish"and"salmon"(withmixedDigest)- mixedDigest
for
type="salmon", the filename (relative to each sample's quantification directory) of the JSON file containing per-sub-index sha256 digests, e.g."mixed_ref_digests.json". This file is produced by the companion Snakemake rule that runscompute_fasta_digeston the annotated and novel FASTAs at index time and writes the results in a structure compatible withinspectDigests()andupdateMetadata(). Not needed fortype="oarfish", which records digests natively.- quiet
whether to suppress printed messages
- ...
arguments passed to
tximport::tximport()
Value
an un-ranged SummarizedExperiment (SE) object, for use with subsequent functions described in See also section
Details
oarfish with mixed reference transcript sets may have been generated with e.g.
oarfish --only-index --annotated gencode.v48.transcripts.fa.gz \
--novel my_novel_txps.fa.gz --seq-tech ont-cdna --threads 32 \
--index-out gencode_plus_novel
oarfish --reads reads/experiment_rep1.fastq.gz --index gencode_plus_novel \
--output quants/experiment_rep1 --seq-tech ont-cdna \
--filter-group no-filters --threads 32
For salmon, per-sub-index digests are not recorded natively. Instead, a
companion Snakemake rule runs compute_fasta_digest on each FASTA at index
time and writes mixed_ref_digests.json into each quantification directory.
Pass the filename via mixedDigest so that importData() can find it:
se <- importData(coldata, type="salmon", mixedDigest="mixed_ref_digests.json")See also
inspectDigests() and updateMetadata() for subsequent tasks
of inspecting digest matches and updating metadata, respectively.
makeLinkedTxome() can be used to add custom metadata into the registry used
for inspecting digests and then updating transcript data. A user may
follow the workflow importData() > inspectDigests() >
makeLinkedTxome() > inspectDigests() > updateMetadata().
See also makeLinkedTxpData() for a lightweight alternative of linking
GRanges metadata to a digest.
Examples
# oarfish files using a mix of --annotated and --novel transcripts
dir <- system.file("extdata/oarfish", package="tximportData")
names <- paste0("rep", 2:4)
files <- file.path(dir, paste0("sgnex_h9_", names, ".quant.gz"))
coldata <- data.frame(files, names)
# returns an un-ranged SE object
se <- importData(coldata, type="oarfish")
#> reading in files with read.delim (install 'readr' package for speed up)
#> 1
#> 2
#> 3
#>
#> returning un-ranged SummarizedExperiment, see functions:
#> -- inspectDigests() to check matching digests
#> -- makeLinkedTxome/makeLinkedTxpData() to link digests to metadata
#> -- updateMetadata() to update metadata and optionally add ranges