This function sets the configuration for `seqArchR`.
set_config(
  chunk_size = 500,
  k_min = 1,
  k_max = 50,
  mod_sel_type = "stability",
  bound = 10^-6,
  cv_folds = 5,
  parallelize = FALSE,
  n_cores = NA,
  n_runs = 100,
  alpha_base = 0,
  alpha_pow = 1,
  min_size = 25,
  result_aggl = "complete",
  result_dist = "euclid",
  checkpointing = TRUE,
  flags = list(debug = FALSE, time = FALSE, verbose = TRUE, plot = FALSE)
)
Numeric. Specify the size of the inner chunks of sequences.
Numeric. Specify the minimum of the range of values to be tested for number of NMF basis vectors. Default is 1.
Numeric. Specify the maximum of the range of values to be tested for number of NMF basis vectors. Default is 50.
Character. Specify the model selection strategy to be used. Default is 'stability'. Another option is 'cv', short for cross-validation. Warning: The cross-validation approach can be more time-consuming and computationally expensive than the stability-based approach.
Numeric. Specify the lower bound value as criterion for choosing the most appropriate number of NMF factors. Default is 10^-6.
Numeric. Specify the number of cross-validation folds used for model selection. Only used when mod_sel_type is set to 'cv'. Default value is 5.
Logical. Specify whether to parallelize the procedure. Note that running seqArchR serially can be time consuming, especially when using cross-validation for model selection. See `n_cores`. Consider parallelizing with at least 2 or 4 cores.
The number of cores to be used when `parallelize` is set to TRUE. If `parallelize` is FALSE, `n_cores` is ignored.
Numeric. Specify the number of bootstrapped runs to be performed with NMF. Default value is 100. When using cross-validation, more than 100 iterations may be needed (up to 500).
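The parallelization and cross-validation arguments described above can be combined as in the following minimal sketch. It assumes that leaving one core free and capping at four cores is a sensible choice for the machine at hand, and the value `n_runs = 200` is purely illustrative; `n_use` and `cv_args` are names introduced here for the example.

```r
# Pick a modest number of cores, leaving one free for the system.
# parallel::detectCores() can return NA on some platforms, so fall back to 1.
n_avail <- parallel::detectCores()
n_use <- if (is.na(n_avail)) 1L else max(1L, min(4L, n_avail - 1L))

# Arguments for a cross-validation run; every name is a set_config() parameter.
cv_args <- list(
  mod_sel_type = "cv",
  cv_folds = 5,
  n_runs = 200,                              # illustrative, above the default of 100
  parallelize = n_use > 1L,
  n_cores = if (n_use > 1L) n_use else NA    # ignored when parallelize is FALSE
)

# Only call into seqArchR when it is installed.
if (requireNamespace("seqArchR", quietly = TRUE)) {
  cv_config <- do.call(seqArchR::set_config, cv_args)
}
```

Building the argument list first keeps the core-count logic separate from the call, so the same list can be reused or adjusted before it is passed to `set_config`.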
Numeric. Specify the base and the power for computing 'alpha' when performing model selection for NMF: alpha = alpha_base^alpha_pow. Alpha specifies the regularization for NMF. Defaults are 0 and 1, respectively. _Warning_: Currently not used (reserved for future use).
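As a quick worked example of the formula above, the default values can be evaluated directly in R. This is only the arithmetic, not a view into how seqArchR uses the value.

```r
# Defaults from the set_config() signature
alpha_base <- 0
alpha_pow <- 1

# alpha = alpha_base^alpha_pow
alpha <- alpha_base^alpha_pow
print(alpha)  # evaluates to 0 with the defaults
```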
Numeric. Specify the minimum number of sequences, such that any cluster/chunk of size less than or equal to it will not be further processed. Default is 25.
Character. Specify the agglomeration method to be used for final result collation with hierarchical clustering. Default is 'complete' linkage. Possible values are those allowed with hclust. Also see details below.
Character. Specify the distance method to be used for final result collation with hierarchical clustering. Default is 'euclid' for Euclidean distance. Possible values are those allowed with hclust. Also see details below.
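To illustrate what an agglomeration method and a distance method mean in hierarchical clustering, here is a minimal base R sketch on toy data. It uses `stats::dist` and `stats::hclust` directly and is not seqArchR's internal collation code. Note that base R spells the distance method "euclidean", whereas the `set_config` default is written "euclid"; how seqArchR interprets its own names is not shown here.

```r
# Toy data: six observations with two features (not seqArchR output).
set.seed(1)
x <- matrix(rnorm(12), nrow = 6)

# Pairwise Euclidean distances, then complete-linkage agglomeration.
d <- stats::dist(x, method = "euclidean")
hc <- stats::hclust(d, method = "complete")

# Cut the resulting tree into two groups and count their sizes.
groups <- stats::cutree(hc, k = 2)
print(table(groups))
```

Swapping `method = "complete"` for another linkage accepted by `hclust` (for example "average") changes how distances between merged groups are computed, which can change the final grouping.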
Logical. Specify whether to write intermediate checkpoints to disk as RDS files. Checkpoints and the final result are saved to disk provided the `o_dir` argument is set in seqArchR. When the `o_dir` argument is not provided or is NULL, this is ignored. Default is TRUE.
List with four logical elements, as detailed below.
`debug`: Whether debug information for the run is printed.
`verbose`: Whether verbose information for the run is printed.
`plot`: Whether verbose plotting is performed for the run.
`time`: Whether timing information is printed for the run.
A list with all parameters for seqArchR set.
Setting suitable values for the following parameters is dependent on the data: 'chunk_size', 'k_min', 'k_max', 'mod_sel_type', 'min_size', 'result_aggl', 'result_dist'.
# Set seqArchR configuration
seqArchRconfig <- seqArchR::set_config(
  chunk_size = 100,
  parallelize = TRUE,
  n_cores = 2,
  n_runs = 100,
  k_min = 1,
  k_max = 20,
  mod_sel_type = "stability",
  bound = 10^-8,
  flags = list(debug = FALSE, time = TRUE, verbose = TRUE,
    plot = FALSE)
)
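Since `set_config` returns a list, the result can be inspected with standard R tools. The sketch below is hedged: it guards the call so it only runs when seqArchR is installed, and it uses `str` rather than assuming any particular element names in the returned list. `has_pkg` and `cfg` are names introduced for this example.

```r
# Check whether seqArchR is available before calling into it.
has_pkg <- requireNamespace("seqArchR", quietly = TRUE)

if (has_pkg) {
  # Override two defaults; all other parameters keep their default values.
  cfg <- seqArchR::set_config(chunk_size = 100, k_max = 20)

  # Show the top-level structure of the returned configuration list.
  str(cfg, max.level = 1)
}
```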