importAllelicCounts.Rd
Read in Salmon quantification of allelic counts from a
diploid transcriptome. Assumes that diploid transcripts
are marked with the following suffix: an underscore and
a consistent symbol for each of the two alleles,
e.g. ENST123_M and ENST123_P, or ENST123_alt and ENST123_ref, etc.
importAllelicCounts requires the tximeta package.
Further information in Details below.
importAllelicCounts(
coldata,
a1,
a2,
format = c("wide", "assays"),
tx2gene = NULL,
...
)
coldata: a data.frame as used in tximeta
a1: the symbol for the effect allele
a2: the symbol for the non-effect allele
format: either "wide" or "assays" for whether
to combine the allelic counts as columns (wide) or put the allelic
count information in different assay slots (assays).
For wide output, the data for the non-effect allele (a2) comes first,
then the effect allele (a1), e.g. [a2 | a1]. The ref level
of the factor variable se$allele will be "a2"
(so by default comparisons will be: a1 vs a2).
For assays output, all of the original matrices are renamed with a prefix,
either a1- or a2-.
tx2gene: optional, a data.frame with first column indicating
transcripts, second column indicating genes (or any other transcript
grouping). Alternatively, this can be a GRanges object with
required columns tx_id and group_id (see makeTx2Tss).
For more information on this argument, see Details.
...: any arguments to pass to tximeta
a SummarizedExperiment, with allele counts (and other data)
combined into a wide matrix [a2 | a1], or as assays (a1, then a2).
The original strings associated with a1 and a2 are stored in the
metadata of the object, in the alleles list element.
Note the reference level of se$allele will be "a2",
such that comparisons by default will be a1 vs a2
(effect vs non-effect).
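As a rough illustration of the wide layout described above, the following base R sketch builds a toy [a2 | a1] matrix and a matching allele factor. The count values, sample names, the helper make_wide, and the "-a2"/"-a1" column naming are all assumptions made for this example; it is not the internal code of importAllelicCounts.

```r
# Toy counts for 2 transcripts x 2 samples, one matrix per allele (made-up values)
dims <- list(c("ENST123", "ENST456"), c("s1", "s2"))
a1_counts <- matrix(c(10, 20, 30, 40), nrow = 2, dimnames = dims)
a2_counts <- matrix(c(5, 15, 25, 35), nrow = 2, dimnames = dims)

# Hypothetical helper: non-effect allele (a2) columns first, then a1
make_wide <- function(a1, a2) {
  wide <- cbind(a2, a1)
  # Column naming here is an assumption chosen for readability
  colnames(wide) <- c(paste0(colnames(a2), "-a2"),
                      paste0(colnames(a1), "-a1"))
  wide
}

wide <- make_wide(a1_counts, a2_counts)

# Allele factor with "a2" as the reference level,
# so default comparisons are a1 vs a2
allele <- factor(rep(c("a2", "a1"), each = ncol(a1_counts)),
                 levels = c("a2", "a1"))
```

Because "a2" is the first factor level, a model using this factor would by default report effects for a1 relative to a2.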
Requirements - There must be exactly two alleles for each
transcript, and the --keep-duplicates option should be used
in Salmon indexing to avoid removal of transcripts with identical
sequence. The output object has half the number of transcripts,
with the two alleles either stored in a "wide" object, or as
re-named "assays". Note carefully that the symbol provided
to a1 is used as the effect allele, and a2 is used as
the non-effect allele (see the description of the format
argument and the description of the returned value).
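The two-alleles-per-transcript requirement can be checked on a vector of transcript names before import. The sketch below is one possible check in base R; the transcript IDs and the helper check_alleles are hypothetical and not part of the package.

```r
# Toy transcript names from a diploid transcriptome (made-up IDs)
txps <- c("ENST123_M", "ENST123_P", "ENST456_M", "ENST456_P")

# Hypothetical helper: TRUE if every transcript occurs exactly once
# with the a1 suffix and exactly once with the a2 suffix
check_alleles <- function(txps, a1, a2) {
  # Return base names (suffix removed) of the names ending in "_<symbol>"
  base_names <- function(symbol) {
    suffix <- paste0("_", symbol)
    hits <- txps[endsWith(txps, suffix)]
    substr(hits, 1, nchar(hits) - nchar(suffix))
  }
  a1_txps <- base_names(a1)
  a2_txps <- base_names(a2)
  length(a1_txps) + length(a2_txps) == length(txps) &&
    anyDuplicated(a1_txps) == 0 &&
    anyDuplicated(a2_txps) == 0 &&
    setequal(a1_txps, a2_txps)
}

ok <- check_alleles(txps, a1 = "P", a2 = "M")
```

A FALSE result would indicate a transcript that is missing one allele, has an unexpected suffix, or is duplicated.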
tx2gene - The two columns should include the a1 and
a2 suffix for the transcripts and genes/groups, or those
will be added internally, if it is detected that the first
transcript does not have these suffixes. For example, if
_alt or _ref, or _M or _P (as indicated
by the a1 and a2 arguments) are not present in the
table, the table rows will be duplicated with those suffixes added
on behalf of the user. If tx2gene is not provided, the
output object will be transcript-level. Do not attempt to set the
txOut argument, as it will conflict with internal calls to
downstream functions. If the a1/a2 suffixes are not at the end of
the transcript name in the quantification files,
e.g. ENST123_M|<metadata>, then ignoreAfterBar=TRUE
can be used to match regardless of the string following | in
the quantification files.
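The row duplication described above can be pictured with a small base R sketch. The table contents and the helper add_allele_suffix are hypothetical, and the final sub() call is only a stand-in showing the kind of matching that ignoreAfterBar=TRUE allows; none of this is the package's internal code.

```r
# Toy tx2gene table without allele suffixes (made-up IDs)
tx2gene <- data.frame(tx = c("ENST123", "ENST456"),
                      gene = c("ENSG1", "ENSG1"),
                      stringsAsFactors = FALSE)

# Hypothetical helper: if the first transcript lacks an allele suffix,
# duplicate the rows and add _a1 / _a2 to both columns
add_allele_suffix <- function(tx2gene, a1, a2) {
  first_tx <- tx2gene[[1]][1]
  has_suffix <- endsWith(first_tx, paste0("_", a1)) ||
    endsWith(first_tx, paste0("_", a2))
  if (has_suffix) return(tx2gene)
  with_suffix <- function(symbol) {
    out <- tx2gene
    out[[1]] <- paste0(out[[1]], "_", symbol)
    out[[2]] <- paste0(out[[2]], "_", symbol)
    out
  }
  rbind(with_suffix(a1), with_suffix(a2))
}

diploid_tx2gene <- add_allele_suffix(tx2gene, a1 = "alt", a2 = "ref")

# Names as they might appear in quantification files, with extra text after "|"
quant_names <- c("ENST123_alt|extra", "ENST123_ref|extra")
# Drop everything from the first "|" onward before matching
clean_names <- sub("\\|.*", "", quant_names)
```

After this step every transcript and gene/group appears once per allele, so the table has twice as many rows as the input.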
skipMeta=TRUE is used, as it is assumed the diploid
transcriptome does not match any reference transcript
collection. This may change in future iterations of the function,
depending on developments in upstream annotations and software.
If tx2gene is a GRanges object, the rowRanges of the output
will be the reduced ranges of the grouped input ranges, with
tx_id collapsed into a CharacterList, and TSS positions
saved as an IntegerList, if these are not equal among the transcripts
of a group.
Other metadata columns are not manipulated; only the metadata
for the first range is returned.
Euphy Wu, Noor P. Singh, Kwangbom Choi, Mohsen Zakeri, Matthew Vincent, Gary A. Churchill, Cheryl L. Ackert-Bicknell, Rob Patro, Michael I. Love. "Detecting isoform-level allelic imbalance accounting for inferential uncertainty" bioRxiv (2022) https://doi.org/10.1101/2022.08.12.503785