`R/seqArchR_auxiliary_functionsI.R`

`collate_seqArchR_result.Rd`

Hierarchical clustering is used to reorder/collate the raw clusters from a given seqArchR iteration.

- result
The seqArchR result object.

- iter
The seqArchR iteration whose clusters are to be reordered/collated. Default is the last iteration of the seqArchR result object.

- clust_method
Specify 'hc' for hierarchical clustering. Currently, only hierarchical clustering is supported.

- aggl_method
One of the linkage values accepted by `hclust` for hierarchical clustering. Default is 'ward.D'.

- dist_method
Distance measure to be used with hierarchical clustering. Available options are "euclid" (default), "cor" for correlation, "cosangle" for cosine angle.

- regularize
Logical. Specify TRUE if regularization is to be performed before comparison. Default is FALSE. Also see argument 'topn'.

- topn
Use only the top N dimensions of each basis vector when comparing them. Each basis vector has 4L or 16L dimensions (for mono- or dinucleotides, respectively), so each dimension is a combination of a nucleotide and its position in the sequence. This argument selects the top N dimensions of the basis vector. It is ignored when argument 'regularize' is FALSE.

- collate
Logical. Specify TRUE if collation using hierarchical agglomerative clustering is to be performed, otherwise FALSE.

- return_order
Logical. Use this argument when you want hierarchical clustering to be performed but not collation of clusters. Therefore, setting return_order to TRUE will return the hierarchical clustering object itself. This enables custom downstream processing/analysis.

- flags
A flags object, similar to the flags in the configuration of the seqArchR result object.

- ...
ignored
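The interplay of 'regularize'/'topn', 'dist_method' and 'aggl_method' can be illustrated with a minimal base R sketch on a toy matrix. This is not seqArchR's internal code: the helper `keep_top_n` is hypothetical, zeroing the non-top entries is an assumed form of regularization, and the correlation and cosine distances are assumed to be one minus the respective similarity.

```r
# Toy basis-vector matrix: 8 dimensions (rows) x 4 basis vectors (columns).
# In seqArchR the rows would be nucleotide-position combinations.
set.seed(1)
basis <- matrix(runif(32), nrow = 8, ncol = 4,
                dimnames = list(NULL, paste0("bv", 1:4)))

# Hypothetical helper (an assumption, not a seqArchR function):
# keep the n largest entries of each column and set the rest to zero.
keep_top_n <- function(mat, n) {
    apply(mat, 2, function(v) {
        keep <- order(v, decreasing = TRUE)[seq_len(n)]
        v[setdiff(seq_along(v), keep)] <- 0
        v
    })
}
basis_reg <- keep_top_n(basis, n = 3)

# Pairwise distances between basis vectors (columns), one per measure.
d_euclid <- dist(t(basis_reg), method = "euclidean")
d_cor <- as.dist(1 - cor(basis_reg))
unit <- sweep(basis_reg, 2, sqrt(colSums(basis_reg^2)), "/")
d_cos <- as.dist(1 - crossprod(unit))

# Agglomerate with a linkage accepted by hclust (here 'ward.D').
hc <- hclust(d_euclid, method = "ward.D")

# Cut the tree into two groups of basis vectors.
groups <- cutree(hc, k = 2)
groups
```

Swapping `d_euclid` for `d_cor` or `d_cos`, or "ward.D" for "complete", shows how the grouping of basis vectors can change with the chosen settings.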

When `collate` is TRUE, a list with the following elements is returned:

- basisVectorsClust
A list storing collation information of the basis vectors, i.e., the IDs of basis vectors that were collated into one.

- clusters
A list of sequences in each collated cluster.

- seqsClustLabels
Cluster labels for all sequences according to the collated clustering.

When 'collate' is FALSE, the existing basis vectors are returned, each as a singleton cluster. The sequence cluster labels and sequence clusters are handled accordingly. All are returned in a list with the same elements as in the earlier case.

When 'return_order' is set to TRUE, the hierarchical clustering result is returned instead.
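A minimal sketch of the 'return_order' use case, reusing the example result file shipped with the package. It assumes the returned object is of class `hclust`; the text above only says that the hierarchical clustering result is returned, so the class is checked before any `hclust`-specific call.

```r
library(seqArchR)

res <- readRDS(system.file("extdata", "example_seqArchRresult.rds",
    package = "seqArchR", mustWork = TRUE))

# Request the hierarchical clustering result instead of collated clusters.
hc <- collate_seqArchR_result(result = res, iter = 2,
    return_order = TRUE)

# Assumption: the result is an hclust object. Guard before using it as one.
if (inherits(hc, "hclust")) {
    # Leaf order of the iteration-2 clusters in the dendrogram.
    print(hc$order)
    # Draw the dendrogram for visual inspection.
    plot(hc)
}
```

With the clustering object in hand, the tree can be cut or reordered as needed, independent of the collation that the function would otherwise perform.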

```
res <- readRDS(system.file("extdata", "example_seqArchRresult.rds",
    package = "seqArchR", mustWork = TRUE))
# While the default settings for collation use Euclidean distance and
# ward.D agglomeration, one can choose to use different settings, say,
# correlation distance and complete linkage, and also regularizing to use
# only top 50 dimensions (nucleotide-position combinations)
collated_res <- collate_seqArchR_result(result = res, iter = 2,
    aggl_method = "complete", dist_method = "cor",
    regularize = TRUE, topn = 50)
names(collated_res)
#> [1] "basisVectorsClust" "clusters" "seqsClustLabels"
```
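The returned list can then be inspected element by element. The following sketch relies only on the element names shown above and on their descriptions (a list of sequences per collated cluster, and one label per sequence); it makes no claim about the values the example data produce.

```r
library(seqArchR)

res <- readRDS(system.file("extdata", "example_seqArchRresult.rds",
    package = "seqArchR", mustWork = TRUE))
collated_res <- collate_seqArchR_result(result = res, iter = 2,
    aggl_method = "complete", dist_method = "cor",
    regularize = TRUE, topn = 50)

# Number of sequences in each collated cluster.
cluster_sizes <- lengths(collated_res$clusters)
cluster_sizes

# Which basis vectors were collated into each cluster.
str(collated_res$basisVectorsClust)

# Tabulate the per-sequence labels of the collated clustering.
table(collated_res$seqsClustLabels)
```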