We use hierarchical clustering to reorder/collate the raw clusters from a given iteration of seqArchR.

collate_seqArchR_result(
  result,
  iter = length(result$seqsClustLabels),
  clust_method = "hc",
  aggl_method = "ward.D",
  dist_method = "euclid",
  regularize = FALSE,
  topn = 50,
  collate = TRUE,
  return_order = FALSE,
  flags = list(debugFlag = FALSE, verboseFlag = TRUE),
  ...
)

Arguments

result

The seqArchR result object.

iter

Specify the seqArchR iteration whose clusters are to be reordered/collated. Default is the last iteration of the seqArchR result object.

clust_method

Specify 'hc' for hierarchical clustering. Currently, only hierarchical clustering is supported.

aggl_method

One of the linkage methods accepted by hclust for hierarchical clustering. Default is 'ward.D'.

dist_method

Distance measure to be used with hierarchical clustering. Available options are "euclid" (default), "cor" for correlation, "cosangle" for cosine angle.

regularize

Logical. Specify TRUE if regularization is to be performed before comparison. Default is FALSE. Also see argument 'topn'.

topn

Use only the top N dimensions of each basis vector when comparing them. Each basis vector has 4L or 16L dimensions (for mono- or dinucleotides, respectively), so each dimension corresponds to a combination of a nucleotide (or dinucleotide) and its position in the sequence. This argument is ignored when 'regularize' is FALSE.

collate

Logical. Specify TRUE if collation using hierarchical agglomerative clustering is to be performed, otherwise FALSE.

return_order

Logical. Set this to TRUE when you want hierarchical clustering to be performed but the clusters not to be collated. In that case, the hierarchical clustering object itself is returned, which enables custom downstream processing/analysis.

flags

A list of flags, similar to the flags in the configuration of the seqArchR result object.

...

Ignored.
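
The arguments 'dist_method', 'regularize', 'topn' and 'aggl_method' together determine how basis vectors are compared and clustered. The following base-R sketch illustrates the idea on toy data. It is not seqArchR's internal code: the helper keep_top_n, the toy matrix, the reading of regularization as "zero out all but the top N entries", and the reading of "cosangle" as one minus the cosine similarity are all assumptions made for illustration.

```r
# Hypothetical helper: keep the n largest entries of a vector, zero the rest.
keep_top_n <- function(v, n) {
  n <- min(n, length(v))
  keep <- order(v, decreasing = TRUE)[seq_len(n)]
  out <- numeric(length(v))
  out[keep] <- v[keep]
  out
}

# Toy basis matrix (made-up values): rows are dimensions, columns are basis vectors.
basis <- cbind(b1 = c(0.9, 0.1, 0.0, 0.0),
               b2 = c(0.8, 0.2, 0.0, 0.1),
               b3 = c(0.0, 0.0, 1.0, 0.3))

# Assumed form of regularization: keep only the top 2 dimensions per basis vector.
basis_reg <- apply(basis, 2, keep_top_n, n = 2)

# Euclidean distance between basis vectors (dist() compares rows, hence t()).
d_euclid <- dist(t(basis_reg), method = "euclidean")

# Correlation distance: one minus the Pearson correlation between columns.
d_cor <- as.dist(1 - cor(basis_reg))

# Cosine distance (assumed form): one minus the cosine similarity between columns.
norms <- sqrt(colSums(basis_reg^2))
d_cos <- as.dist(1 - crossprod(basis_reg) / outer(norms, norms))

# Any of these distance objects can be clustered with a chosen linkage.
hc <- hclust(d_euclid, method = "ward.D")
```

In this toy case b1 and b2 share their dominant dimensions, so they end up closer to each other than to b3 under all three measures.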

Value

When 'collate' is TRUE, a list with the following elements is returned:

basisVectorsClust

A list storing collation information of the basis vectors, i.e., the IDs of basis vectors that were collated into one.

clusters

A list of sequences in each collated cluster.

seqsClustLabels

Cluster labels for all sequences according to the collated clustering.

When 'collate' is FALSE, the existing basis vectors are returned, each as a singleton cluster. The sequence cluster labels and sequence clusters are handled accordingly, and all are returned in a list with the same elements as in the earlier case.

When 'return_order' is set to TRUE, the hierarchical clustering result is returned instead.
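
To make the relationship between the three list elements concrete, here is a small base-R sketch of how collation information could translate into new cluster labels and clusters. All values and variable names are hypothetical, and integer labels are assumed for simplicity; the actual label format in a seqArchR result may differ.

```r
# Hypothetical collation of 4 raw clusters: basis vectors 1 and 3 are merged.
basis_vectors_clust <- list(c(1, 3), 2, 4)

# Hypothetical raw cluster label of each of 7 sequences.
raw_labels <- c(1, 2, 3, 3, 4, 1, 2)

# Build a lookup from raw cluster ID to collated cluster ID.
lookup <- integer(4)
for (i in seq_along(basis_vectors_clust)) {
  lookup[basis_vectors_clust[[i]]] <- i
}

# Relabel every sequence, then group sequence indices by collated cluster.
collated_labels <- lookup[raw_labels]
clusters <- split(seq_along(collated_labels), collated_labels)
```

With 'collate' set to FALSE, the analogue of basis_vectors_clust would hold one basis vector per entry, so the lookup leaves every label unchanged.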

Examples

res <- readRDS(system.file("extdata", "example_seqArchRresult.rds",
         package = "seqArchR", mustWork = TRUE))

# The default settings for collation use Euclidean distance and ward.D
# agglomeration, but one can choose different settings, for example
# correlation distance and complete linkage, with regularization so that
# only the top 50 dimensions (nucleotide-position combinations) are used
collated_res <- collate_seqArchR_result(result = res, iter = 2,
                        aggl_method = "complete", dist_method = "cor",
                        regularize = TRUE, topn = 50)

names(collated_res)
#> [1] "basisVectorsClust" "clusters"          "seqsClustLabels"
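
When 'return_order' is TRUE, the hierarchical clustering result can be processed further. The sketch below shows typical downstream steps on a toy stand-in object. It assumes the returned object behaves like a standard stats::hclust result, which this page does not state explicitly; the toy data and the choice of three groups are made up for illustration.

```r
# With seqArchR one would obtain the object along these lines:
# hc <- collate_seqArchR_result(result = res, iter = 2, return_order = TRUE)

# Toy stand-in (assumption: the returned object is like a stats::hclust result).
set.seed(1)
toy <- matrix(runif(40), nrow = 8)        # 8 hypothetical basis vectors as rows
hc <- hclust(dist(toy), method = "ward.D")

# Leaf order of the dendrogram, usable for reordering clusters in plots.
ord <- hc$order

# Cut the tree into a chosen number of groups for custom collation.
groups <- cutree(hc, k = 3)
custom_collation <- split(seq_along(groups), groups)
```

Here custom_collation is a list of index vectors, one per group, analogous in spirit to the collation information returned when 'collate' is TRUE.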