This function sets the configuration for `archR`.

    archR_set_config(
      chunk_size = 500,
      k_min = 1,
      k_max = 50,
      mod_sel_type = "stability",
      bound = 10^-6,
      cv_folds = 5,
      parallelize = FALSE,
      n_cores = NA,
      n_runs = 100,
      alpha_base = 0,
      alpha_pow = 1,
      min_size = 25,
      result_aggl = "complete",
      result_dist = "cor",
      checkpointing = TRUE,
      flags = list(debug = FALSE, time = FALSE, verbose = TRUE, plot = FALSE)
    )

Argument | Description |
---|---|
chunk_size | Numeric. Specify the size of the inner chunks of sequences. Default is 500. |
k_min | Numeric. Specify the minimum of the range of values to be tested for the number of NMF basis vectors. Default is 1. |
k_max | Numeric. Specify the maximum of the range of values to be tested for the number of NMF basis vectors. Default is 50. |
mod_sel_type | Character. Specify the model selection strategy to be used. Default is 'stability'. The other option is 'cv', short for cross-validation. Warning: the cross-validation approach can be more time-consuming and computationally expensive than the stability-based approach. |
bound | Numeric. Specify the lower bound used as the criterion for choosing the most appropriate number of NMF factors. Default is 1e-06 (`10^-6`). |
cv_folds | Numeric. Specify the number of cross-validation folds used for model selection. Only used when mod_sel_type is set to 'cv'. Default value is 5. |
parallelize | Logical. Specify whether to parallelize the procedure. Default is FALSE. Note that running archR serially can be time-consuming, especially when using cross-validation for model selection. See `n_cores`. Consider parallelizing with at least 2 or 4 cores. If Slurm is available, archR's graphical user interface, accessed with |
n_cores | Numeric. The number of cores to be used when `parallelize` is set to TRUE. If `parallelize` is FALSE, `n_cores` is ignored. Default is NA. |
n_runs | Numeric. Specify the number of bootstrapped runs to be performed with NMF. Default value is 100. When using cross-validation, more than 100 runs (up to 500) may be needed. |
alpha_base, alpha_pow | Numeric. Specify the base and the power for computing 'alpha' when performing model selection for NMF: `alpha = alpha_base^alpha_pow`. Alpha specifies the regularization for NMF. Defaults are 0 and 1, respectively. _Warning_: currently not used (reserved for future use). |
min_size | Numeric. Specify the minimum number of sequences: any cluster/chunk whose size is less than or equal to this value is not processed further. Default is 25. |
result_aggl | Character. Specify the agglomeration method to be used for final result collation with hierarchical clustering. Default is 'complete' linkage. Possible values are those allowed with |
result_dist | Character. Specify the distance method to be used for final result collation with hierarchical clustering. Default is 'cor' for correlation. Possible values are those allowed with |
checkpointing | Logical. Specify whether to write intermediate checkpoints to disk as RDS files. Default is TRUE. Checkpoints and the final result are saved to disk provided the `o_dir` argument is set in |
flags | List with four logical elements: `debug` (default FALSE), `time` (default FALSE), `verbose` (default TRUE) and `plot` (default FALSE). |
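
The `result_aggl` and `result_dist` arguments describe a hierarchical-clustering step for collating the final result. The sketch below shows what such a step can look like in base R. It is an illustration only, not archR's internal code: the helper `collate_by_hclust`, the toy `basis` matrix and the choice of `1 - cor()` as the meaning of 'cor' are all assumptions.

```r
# Hypothetical helper (not part of archR): group the columns of a matrix
# (e.g. NMF basis vectors) by hierarchical clustering.
# Assumption: the 'cor' distance is taken to be 1 - Pearson correlation.
collate_by_hclust <- function(basis, result_aggl = "complete", n_groups = 2) {
  # cor() on a matrix correlates its columns
  d <- stats::as.dist(1 - stats::cor(basis))
  # result_aggl is passed as the hclust linkage method
  tree <- stats::hclust(d, method = result_aggl)
  stats::cutree(tree, k = n_groups)
}

set.seed(1)
x <- seq(0, 1, length.out = 20)
# Two near-identical increasing profiles and two decreasing ones
basis <- cbind(a1 = x, a2 = 2 * x + 0.01 * rnorm(20),
               b1 = rev(x), b2 = 3 * rev(x) + 0.01 * rnorm(20))
groups <- collate_by_hclust(basis, result_aggl = "complete", n_groups = 2)
print(groups)  # a1 and a2 share one group; b1 and b2 share the other
```

Because correlation distance ignores scale, `a2` (twice `a1`) and `b2` (three times `b1`) end up with their counterparts regardless of magnitude.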
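
When `mod_sel_type = "cv"`, the data are split into `cv_folds` folds. A minimal base-R sketch of one common way to assign sequences to balanced random folds follows. The helper `assign_cv_folds` is hypothetical and is not claimed to match how archR forms its folds.

```r
# Hypothetical helper (not part of archR): assign each of n sequences to
# one of cv_folds folds at random, keeping fold sizes within 1 of each other.
assign_cv_folds <- function(n, cv_folds = 5) {
  stopifnot(n >= cv_folds)
  # rep_len() cycles the labels 1..cv_folds over n slots; sample() shuffles them
  sample(rep_len(seq_len(cv_folds), n))
}

set.seed(42)
folds <- assign_cv_folds(n = 23, cv_folds = 5)
print(table(folds))

# Sequences held out when evaluating fold 1; the rest would be used for fitting
held_out <- which(folds == 1)
```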
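
With `checkpointing = TRUE`, intermediate results are written to disk as RDS files. The sketch below shows the generic base-R pattern of writing and reading back an RDS checkpoint. The helper `save_checkpoint`, the file name and the contents of the saved object are hypothetical; archR's own checkpoint naming and layout are not described here.

```r
# Hypothetical helper (not part of archR): write an intermediate object to
# <o_dir>/<name>.rds when checkpointing is enabled and an output dir is given.
save_checkpoint <- function(obj, o_dir, name, checkpointing = TRUE) {
  if (!checkpointing || is.null(o_dir)) return(invisible(NULL))
  dir.create(o_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(o_dir, paste0(name, ".rds"))
  saveRDS(obj, path)
  invisible(path)
}

out_dir <- file.path(tempdir(), "archR_checkpoint_demo")
p <- save_checkpoint(list(iter = 1, clusters = c(3, 5)), out_dir, "checkpoint_iter1")
# A later session could resume from the saved state
restored <- readRDS(p)
```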
A list containing all the parameters set for archR.
Suitable values for the following parameters depend on the data: 'chunk_size', 'k_min', 'k_max', 'mod_sel_type', 'min_size', 'result_aggl', 'result_dist'.
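
To see why `chunk_size` and `min_size` depend on the data, the sketch below splits N sequence indices into consecutive chunks of at most `chunk_size` and flags chunks that `min_size` would exclude from further processing. The helper `plan_chunks` and the consecutive-block splitting are assumptions made for illustration; archR may order or split sequences differently.

```r
# Hypothetical helper (not part of archR): how chunk_size and min_size
# interact with the number of input sequences.
plan_chunks <- function(n_seqs, chunk_size = 500, min_size = 25) {
  idx <- seq_len(n_seqs)
  # consecutive blocks of at most chunk_size sequence indices
  chunks <- split(idx, ceiling(idx / chunk_size))
  sizes <- unname(lengths(chunks))
  data.frame(chunk = seq_along(chunks),
             size = sizes,
             # chunks of size <= min_size are not processed further
             processed = sizes > min_size)
}

plan <- plan_chunks(n_seqs = 1020, chunk_size = 500, min_size = 25)
print(plan)  # sizes 500, 500, 20; the last chunk is not processed further
```

With 1020 sequences, the default `chunk_size` leaves a 20-sequence remainder that falls below the default `min_size`; a different `chunk_size` would change that.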

    # Set archR configuration
    archRconfig <- archR::archR_set_config(
      chunk_size = 100,
      parallelize = TRUE,
      n_cores = 2,
      n_runs = 100,
      k_min = 1,
      k_max = 20,
      mod_sel_type = "stability",
      bound = 10^-8,
      flags = list(debug = FALSE, time = TRUE, verbose = TRUE, plot = FALSE)
    )
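
The example above enables `parallelize` with `n_cores = 2` and requests `n_runs = 100` bootstrapped runs. The base-R sketch below shows the general pattern of spreading independent runs over cores with `parallel::mclapply` and falling back to `lapply` when running serially. It is not archR's implementation. `run_bootstrap` is hypothetical, and the column mean stands in for an actual NMF fit.

```r
library(parallel)

# Hypothetical helper (not part of archR): perform n_runs independent
# bootstrap replicates, in parallel when parallelize = TRUE.
run_bootstrap <- function(x, n_runs = 100, parallelize = FALSE, n_cores = NA) {
  one_run <- function(i) {
    # resample columns (e.g. sequences) with replacement
    cols <- sample(ncol(x), replace = TRUE)
    mean(x[, cols])  # placeholder statistic standing in for an NMF fit
  }
  if (parallelize) {
    stopifnot(!is.na(n_cores))  # n_cores is required when parallelizing
    # mclapply forks worker processes; mc.cores > 1 is not supported on Windows.
    # For reproducible parallel streams, call RNGkind("L'Ecuyer-CMRG") first.
    res <- mclapply(seq_len(n_runs), one_run, mc.cores = n_cores)
  } else {
    res <- lapply(seq_len(n_runs), one_run)  # n_cores is ignored here
  }
  unlist(res)
}

x <- matrix(3, nrow = 4, ncol = 10)
serial <- run_bootstrap(x, n_runs = 10)
parallel_res <- run_bootstrap(x, n_runs = 10, parallelize = TRUE, n_cores = 1)
```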