This function sets the configuration for `seqArchR`.

set_config(
  chunk_size = 500,
  k_min = 1,
  k_max = 50,
  mod_sel_type = "stability",
  bound = 10^-6,
  cv_folds = 5,
  parallelize = FALSE,
  n_cores = NA,
  n_runs = 100,
  alpha_base = 0,
  alpha_pow = 1,
  min_size = 25,
  result_aggl = "complete",
  result_dist = "euclid",
  checkpointing = TRUE,
  flags = list(debug = FALSE, time = FALSE, verbose = TRUE, plot = FALSE)
)

Arguments

chunk_size

Numeric. Specify the size of the inner chunks of sequences. Default is 500.
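To illustrate what chunking means, the base R sketch below splits the indices of a hypothetical set of 1,234 sequences into consecutive chunks of at most `chunk_size`. This is an assumption for illustration only. seqArchR's internal chunking may assign sequences to chunks differently.

```r
# Illustration only (not seqArchR's internal code): split the indices of
# 1234 hypothetical sequences into consecutive chunks of <= chunk_size.
chunk_size <- 500
n_seqs <- 1234
idx <- seq_len(n_seqs)
chunks <- split(idx, ceiling(idx / chunk_size))
lengths(chunks)  # 500, 500, 234
```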

k_min

Numeric. Specify the minimum of the range of values to be tested for the number of NMF basis vectors. Default is 1.

k_max

Numeric. Specify the maximum of the range of values to be tested for the number of NMF basis vectors. Default is 50.

mod_sel_type

Character. Specify the model selection strategy to be used. Default is 'stability'. Another option is 'cv', short for cross-validation. Warning: The cross-validation approach can be more time consuming and computationally expensive than the stability-based approach.

bound

Numeric. Specify the lower bound value used as the criterion for choosing the most appropriate number of NMF factors. Default is 1e-06 (10^-6).

cv_folds

Numeric. Specify the number of cross-validation folds used for model selection. Only used when mod_sel_type is set to 'cv'. Default value is 5.
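As a generic illustration of what `cv_folds = 5` means, the sketch below randomly assigns 23 hypothetical items to 5 folds of near-equal size. Which data seqArchR actually holds out in each fold is not described here, so treat this as a sketch of k-fold partitioning in general, not of the package's implementation.

```r
# Generic k-fold assignment (illustration only): each of 23 hypothetical
# items is assigned to one of cv_folds folds of near-equal size.
set.seed(1)
cv_folds <- 5
n_items <- 23
fold_id <- sample(rep_len(seq_len(cv_folds), n_items))
table(fold_id)  # fold sizes 5, 5, 5, 4, 4
```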

parallelize

Logical. Specify whether to parallelize the procedure. Default is FALSE. Note that running seqArchR serially can be time consuming, especially when using cross-validation for model selection. See `n_cores`. Consider parallelizing with at least 2 (ideally 4) cores.

n_cores

Numeric. The number of cores to be used when `parallelize` is set to TRUE. If `parallelize` is FALSE, `n_cores` is ignored. Default is NA.
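One way to pick a value for `n_cores` is to query the machine with `parallel::detectCores()` from base R. The rule below (leave one core free, cap at 4) is an assumed heuristic for illustration, not a seqArchR recommendation.

```r
# Assumed heuristic: leave one core free for the system and cap at 4.
# detectCores() may return NA on some platforms, so fall back to 1.
available <- parallel::detectCores()
if (is.na(available)) available <- 1L
n_cores <- max(1L, min(4L, available - 1L))
parallelize <- n_cores > 1L
```

The resulting values can then be passed as `set_config(parallelize = parallelize, n_cores = n_cores)`.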

n_runs

Numeric. Specify the number of bootstrapped runs to be performed with NMF. Default value is 100. When using cross-validation, more runs may be needed (up to 500).

alpha_base, alpha_pow

Numeric. Specify the base and the power for computing 'alpha' in performing model selection for NMF, where alpha = alpha_base^alpha_pow. Alpha specifies the regularization for NMF. Defaults are 0 and 1, respectively. _Warning_: These arguments are currently not used; they are reserved for future use.
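The formula alpha = alpha_base^alpha_pow can be checked with plain arithmetic. `compute_alpha` below is a hypothetical helper, not part of seqArchR, and the arguments themselves are currently unused by the package.

```r
# Hypothetical helper: alpha = alpha_base^alpha_pow, as stated above.
compute_alpha <- function(alpha_base, alpha_pow) {
  alpha_base^alpha_pow
}
compute_alpha(0, 1)    # defaults: 0
compute_alpha(10, -2)  # 10^-2 = 0.01
```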

min_size

Numeric. Specify the minimum number of sequences, such that any cluster/chunk of size less than or equal to it will not be further processed. Default is 25.
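The "less than or equal to" rule means a cluster must be strictly larger than `min_size` to be processed further. The sketch below applies that rule to hypothetical cluster sizes.

```r
# Hypothetical cluster sizes; with min_size = 25, only clusters strictly
# larger than min_size would be processed further.
min_size <- 25
cluster_sizes <- c(a = 12, b = 25, c = 26, d = 340)
process_further <- cluster_sizes > min_size
names(cluster_sizes)[process_further]  # "c" "d"
```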

result_aggl

Character. Specify the agglomeration method to be used for final result collation with hierarchical clustering. Default is 'complete' linkage. Possible values are those allowed with hclust. Also see details below.

result_dist

Character. Specify the distance method to be used for final result collation with hierarchical clustering. Default is 'euclid' for Euclidean distance; 'cor' selects correlation. Possible values are those allowed with hclust. Also see details below.
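The base R sketch below shows what collating factors with an agglomeration method and a distance method looks like using `stats::dist` and `stats::hclust`. The random matrix `W` stands in for hypothetical NMF basis vectors. Mapping 'euclid' to `dist(method = "euclidean")` and 'cor' to one minus Pearson correlation is an assumption; how seqArchR computes these distances internally is not shown here.

```r
# Hypothetical NMF basis vectors: 4 factors (columns) of length 10.
set.seed(42)
W <- matrix(runif(40), nrow = 10, ncol = 4,
            dimnames = list(NULL, paste0("factor", 1:4)))

# Euclidean distance between factors (columns), complete linkage.
d_euc <- dist(t(W), method = "euclidean")
hc_euc <- hclust(d_euc, method = "complete")

# A correlation-based alternative: 1 - Pearson correlation between factors.
d_cor <- as.dist(1 - cor(W))
hc_cor <- hclust(d_cor, method = "complete")

# Group the four factors into two collated clusters.
cutree(hc_euc, k = 2)
```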

checkpointing

Logical. Specify whether to write intermediate checkpoints to disk as RDS files. Checkpoints and the final result are saved to disk provided the `o_dir` argument is set in seqArchR. When the `o_dir` argument is not provided or is NULL, this is ignored. Default is TRUE.

flags

List with four logical elements, as detailed below.

debug

Whether debug information for the run is printed

verbose

Whether verbose information for the run is printed

plot

Whether verbose plotting is performed for the run

time

Whether timing information is printed for the run
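A `flags` list with the four documented elements can be built and sanity-checked before calling `set_config()`. The check below is a hypothetical user-side validation; seqArchR may perform its own checks.

```r
# Build a flags list with the four documented logical elements.
flags <- list(debug = FALSE, time = TRUE, verbose = TRUE, plot = FALSE)

# Hypothetical user-side check: exactly these four names, each a single logical.
expected <- c("debug", "time", "verbose", "plot")
stopifnot(
  setequal(names(flags), expected),
  all(vapply(flags, function(x) is.logical(x) && length(x) == 1L, logical(1)))
)
```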

Value

A list with all parameters for seqArchR set.

Details

Setting suitable values for the following parameters is dependent on the data: 'chunk_size', 'k_min', 'k_max', 'mod_sel_type', 'min_size', 'result_aggl', 'result_dist'.

Examples

# Set seqArchR configuration
seqArchRconfig <- seqArchR::set_config(
    chunk_size = 100,
    parallelize = TRUE,
    n_cores = 2,
    n_runs = 100,
    k_min = 1,
    k_max = 20,
    mod_sel_type = "stability",
    bound = 10^-8,
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE,
        plot = FALSE)
)
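A configuration using cross-validation for model selection might look like the following. It uses only the arguments documented above; the values are illustrative, following the guidance that cross-validation may need more runs (up to 500). The `requireNamespace()` guard simply skips the call when seqArchR is not installed.

```r
# Cross-validation-based model selection; values are illustrative only.
if (requireNamespace("seqArchR", quietly = TRUE)) {
  cv_config <- seqArchR::set_config(
    chunk_size = 100,
    parallelize = TRUE,
    n_cores = 2,
    n_runs = 500,
    k_min = 1,
    k_max = 20,
    mod_sel_type = "cv",
    cv_folds = 5,
    flags = list(debug = FALSE, time = TRUE, verbose = TRUE, plot = FALSE)
  )
}
```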