R/seqArchR_auxiliary_functionsI.R
collate_seqArchR_result.Rd
We use hierarchical clustering to reorder/collate the raw clusters from a given iteration of seqArchR.
The seqArchR result object.
Specify the iteration of seqArchR whose clusters are to be reordered/collated. Default is the last iteration of the seqArchR result object.
Specify 'hc' for hierarchical clustering. Currently, only hierarchical clustering is supported.
One of the linkage values accepted for hierarchical clustering with hclust. Default is 'ward.D'.
Distance measure to be used with hierarchical clustering. Available options are "euclid" (default) for Euclidean distance, "cor" for correlation, and "cosangle" for cosine angle.
Logical. Specify TRUE if regularization is to be performed before comparison. Default is FALSE. Also see argument 'topn'.
Use only the top N dimensions of each basis vector when comparing them. Each basis vector has 4L or 16L dimensions (for mono- or dinucleotide representation, respectively), where L is the sequence length, so each dimension is a combination of a nucleotide and its position in the sequence. This argument selects the top N of these dimensions. It is ignored when argument 'regularize' is FALSE.
Logical. Specify TRUE if collation using hierarchical agglomerative clustering is to be performed, otherwise FALSE.
Logical. Set to TRUE when you want hierarchical clustering to be performed without collating the clusters; the hierarchical clustering object itself is then returned. This enables custom downstream processing/analysis.
Pass the flags object, similar to the flags in the configuration of the seqArchR result object.
ignored
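The arguments above describe optional regularization, a pairwise distance between basis vectors, and agglomerative clustering with hclust. The following is a minimal base-R sketch of that idea, not seqArchR's implementation. It assumes the basis vectors are the columns of a numeric matrix; the helpers `regularize_topn()`, `basis_dist()` and `cluster_basis()` are hypothetical, and the exact definitions used here for regularization (keep the top N entries, zero the rest), correlation distance (1 - correlation) and cosine-angle distance (1 - cosine similarity) are assumptions.

```r
# Hypothetical sketch, NOT seqArchR's internal code. Basis vectors are
# assumed to be the columns of a numeric matrix B (rows = nucleotide-position
# dimensions, columns = basis vectors).

# Assumed regularization: keep the top `topn` entries of each basis vector
# and set all other entries to zero.
regularize_topn <- function(B, topn) {
  apply(B, 2, function(v) {
    keep <- order(v, decreasing = TRUE)[seq_len(min(topn, length(v)))]
    out <- numeric(length(v))
    out[keep] <- v[keep]
    out
  })
}

# Pairwise distances between the columns (basis vectors) of B.
basis_dist <- function(B, dist_method = c("euclid", "cor", "cosangle")) {
  dist_method <- match.arg(dist_method)
  switch(dist_method,
    # stats::dist() compares rows, so transpose to compare columns
    euclid = stats::dist(t(B)),
    # assumed correlation distance: 1 - Pearson correlation
    cor = stats::as.dist(1 - stats::cor(B)),
    # assumed cosine-angle distance: 1 - cosine similarity
    cosangle = {
      norms <- sqrt(colSums(B^2))
      stats::as.dist(1 - crossprod(B) / outer(norms, norms))
    }
  )
}

# Optional regularization, then agglomerative clustering with hclust().
cluster_basis <- function(B, aggl_method = "ward.D", dist_method = "euclid",
                          regularize = FALSE, topn = 50) {
  if (regularize) B <- regularize_topn(B, topn)
  stats::hclust(basis_dist(B, dist_method), method = aggl_method)
}

# Toy data: 5 random basis vectors with 4 * 10 = 40 dimensions
# (mononucleotide representation of 10-bp sequences).
set.seed(1)
B <- matrix(runif(40 * 5), nrow = 40, ncol = 5)
hc <- cluster_basis(B, aggl_method = "complete", dist_method = "cor",
                    regularize = TRUE, topn = 10)
hc$order  # leaf order of the five basis vectors in the dendrogram
```

Changing `dist_method` or `aggl_method` in this sketch changes only the distance object or the linkage rule handed to `hclust()`; the returned value is a plain `hclust` object, analogous to what 'return_order' = TRUE is described as returning.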
When `collate` is TRUE, a list with the following elements is returned:
A list storing collation information of the basis vectors, i.e., IDs of basis vectors that were collated into one.
A list of sequences in each collated cluster.
Cluster labels for all sequences according to the collated clustering.
When 'collate' is FALSE, the already existing basis vectors are returned, each as a singleton cluster. The sequence cluster labels and sequence clusters are handled accordingly. All are returned as elements of a list with the same structure as when 'collate' is TRUE.
When 'return_order' is set to TRUE, the hierarchical clustering result is returned instead.
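To illustrate how the returned elements relate to each other, the sketch below cuts a hierarchical clustering of basis vectors into groups and relabels the sequences accordingly. It is a hypothetical illustration, not seqArchR's code: `collate_clusters()` is an invented helper, the number of collated clusters `k` is supplied by hand, and the raw label of each sequence is assumed to be the integer index of its basis vector. The output element names only mirror those listed above.

```r
# Hypothetical sketch, NOT seqArchR's implementation: collate basis vectors
# by cutting an hclust tree into k groups and relabel sequences accordingly.
# raw_labels[i] is assumed to be the index of the basis vector (raw cluster)
# that sequence i belongs to.
collate_clusters <- function(hc, raw_labels, k = NULL, collate = TRUE) {
  n_basis <- length(hc$order)
  # With collate = FALSE, every basis vector remains a singleton cluster
  groups <- if (collate) stats::cutree(hc, k = k) else seq_len(n_basis)
  groups <- unname(groups)
  n_groups <- max(groups)
  # IDs of the basis vectors merged into each collated cluster
  basis_clust <- split(seq_len(n_basis),
                       factor(groups, levels = seq_len(n_groups)))
  # Collated cluster label of every sequence
  seq_labels <- groups[raw_labels]
  # Indices of the sequences in each collated cluster (empty if none)
  clusters <- split(seq_along(raw_labels),
                    factor(seq_labels, levels = seq_len(n_groups)))
  list(basisVectorsClust = unname(basis_clust),
       clusters = unname(clusters),
       seqsClustLabels = seq_labels)
}

# Toy data: basis vectors 1 & 2 and 3 & 4 are near-duplicates.
B <- cbind(c(1, 0, 0, 0), c(0.9, 0.1, 0, 0),
           c(0, 0, 1, 0), c(0, 0, 0.9, 0.1))
hc <- stats::hclust(stats::dist(t(B)), method = "ward.D")
raw_labels <- c(1, 2, 2, 3, 4, 4, 1)  # raw cluster of each of 7 sequences

res <- collate_clusters(hc, raw_labels, k = 2)
res$basisVectorsClust  # list(1:2, 3:4)
res$seqsClustLabels    # 1 1 1 2 2 2 1
```

Calling the sketch with `collate = FALSE` instead keeps each basis vector as its own cluster, which corresponds to the singleton behaviour described above.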
res <- readRDS(system.file("extdata", "example_seqArchRresult.rds",
package = "seqArchR", mustWork = TRUE))
# While the default settings for collation use Euclidean distance and
# ward.D agglomeration, one can choose to use different settings, say,
# correlation distance and complete linkage, and also regularizing to use
# only the top 50 dimensions (nucleotide-position combinations)
collated_res <- collate_seqArchR_result(result = res, iter = 2,
aggl_method = "complete", dist_method = "cor",
regularize = TRUE, topn = 50)
names(collated_res)
#> [1] "basisVectorsClust" "clusters" "seqsClustLabels"