Configuration Arguments

This page provides a list of possible configuration arguments. For examples of what a config file could look like, check out the files:

Data entries

dataset: Defines which dataset will be used. Currently, only camelsus (the CAMELS-US dataset by Newman et al., 2015) is supported. The code is intended to support other datasets but might require specific adaptations, such as extending the parent class to handle differences in dataset structure or features.
concept_data_dir: Specifies the path to the data source required for the conceptual model. This path should be defined in the configuration file src/utils/data_dir.yml.
forcings: Only required if the dataset is camelsus or if a newly defined dataset strictly requires it; otherwise this entry can be ignored. It can be either a string or a list of strings corresponding to forcing products in the CAMELS dataset.

Examples: [daymet, maurer, maurer_extended, nldas].
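Because forcings accepts either a single string or a list, code that reads it usually normalizes it to a list first. The following is a minimal sketch, assuming a hypothetical helper normalize_forcings; the set of known products only contains the examples listed above and is not necessarily exhaustive.

```python
from typing import List, Union

# Only the example products listed above; extend as needed (assumption, not exhaustive).
KNOWN_FORCINGS = {"daymet", "maurer", "maurer_extended", "nldas"}


def normalize_forcings(forcings: Union[str, List[str]]) -> List[str]:
    """Return the forcings entry as a list of product names (hypothetical helper)."""
    if isinstance(forcings, str):
        forcings = [forcings]  # a single product given as a plain string
    unknown = [f for f in forcings if f not in KNOWN_FORCINGS]
    if unknown:
        raise ValueError(f"Unknown forcing product(s): {unknown}")
    return list(forcings)


print(normalize_forcings("daymet"))             # ['daymet']
print(normalize_forcings(["maurer", "nldas"]))  # ['maurer', 'nldas']
```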

General experiment entries

basin_file: Specifies the absolute or relative path to a text file containing the basin IDs used for training, validation, and testing. Each line in the file should contain a single basin ID, as defined in the dataset.
train_start_date: Start date of the training period (first day of discharge) in the format DD/MM/YYYY. Corresponding pairs of start and end dates denote the different periods.
train_end_date: End date of the training period (last day of discharge) in the format DD/MM/YYYY.
valid_start_date: Start date of the validation period (first day of discharge) in the format DD/MM/YYYY.
valid_end_date: End date of the validation period (last day of discharge) in the format DD/MM/YYYY.
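The date entries are day-first (DD/MM/YYYY), so they can be parsed with the %d/%m/%Y format of datetime.strptime. A minimal sketch, assuming a hypothetical helper parse_period and hypothetical dates; whether the code itself rejects an end date before the start date is an assumption. Since start and end denote the first and last day of discharge, the period length is counted inclusively.

```python
from datetime import datetime

DATE_FORMAT = "%d/%m/%Y"  # DD/MM/YYYY, as required by the *_date entries


def parse_period(start: str, end: str) -> tuple:
    """Parse a start/end date pair (hypothetical helper) and check their order."""
    start_dt = datetime.strptime(start, DATE_FORMAT)
    end_dt = datetime.strptime(end, DATE_FORMAT)
    if end_dt < start_dt:
        raise ValueError(f"End date {end} is before start date {start}")
    return start_dt, end_dt


config = {  # hypothetical values
    "train_start_date": "01/10/1999",
    "train_end_date": "30/09/2008",
    "valid_start_date": "01/10/1989",
    "valid_end_date": "30/09/1999",
}
train = parse_period(config["train_start_date"], config["train_end_date"])
valid = parse_period(config["valid_start_date"], config["valid_end_date"])
# Both the first and the last day belong to the period, hence the +1.
print((train[1] - train[0]).days + 1)
```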
metrics: Specifies the list of metrics to calculate during validation (testing). Available metrics include: NSE, Alpha-NSE, Beta-NSE, FHV, FMS, FLV, KGE, Beta-KGE, Peak-Timing, Peak-MAPE, Pearson-r.

Reference: For a full list of available metrics, see src/utils/metrics.
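For orientation, NSE (the first metric above) compares the squared simulation error with the variance of the observations around their mean: 1 means a perfect fit and 0 means the simulation is no better than the observed mean. A minimal sketch of that standard definition; the function name nse is hypothetical, and the implementation in src/utils/metrics may handle missing values or edge cases differently.

```python
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))  # squared simulation error
    sst = sum((o - mean_obs) ** 2 for o in obs)        # variability of the observations
    if sst == 0:
        raise ValueError("NSE is undefined for constant observations")
    return 1.0 - sse / sst


print(nse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0 (perfect fit)
print(nse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0 (same as predicting the mean)
```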
experiment_name: Defines the name of your experiment, which is used as the folder name (plus a date-time string suffix) where the model and results are saved.
device: Device to use, in the format cuda:0, cuda:1, etc. for GPUs, or cpu.
seed: Fixed random seed. If empty, a random seed is generated for this run.
precision: Sets the precision for the model. Supported options: float32, float64.
verbose: Specifies the verbosity level of the model’s logging and progress display.
- 0: Only log informational messages; progress bars are not shown.
- 1: Show progress bars along with informational messages.

Conceptual model entries

concept_model: Specifies the conceptual model to use. Supported models include exphydro. The code is intended to support other conceptual models but might require specific adaptations, such as extending the parent class to accommodate the specifics of a newly defined model.

exphydro: A two-bucket model (water and snow) with 5 processes and 6 parameters. (Höge et al., 2022.)
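Assuming the standard EXP-HYDRO formulation, the two buckets are coupled through the five processes as mass balances (the flux equations that involve the six parameters are omitted here):

```latex
\begin{aligned}
\frac{dS_{\text{snow}}}{dt}  &= P_s - M, \\
\frac{dS_{\text{water}}}{dt} &= P_r + M - ET - Q,
\end{aligned}
```

where S_snow and S_water are the bucket storages, P_s and P_r are snowfall and rainfall, M is snowmelt, ET is evapotranspiration, and Q is discharge. These appear to correspond to the s_snow, s_water, ps_bucket, pr_bucket, m_bucket, et_bucket, and q_bucket variables used in the Neural network entries.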
ode_solver_lib: Specifies the library used for solving ODEs. Supported options include scipy and torchdiffeq.
- scipy: Solves ODEs using solve_ivp from scipy.integrate. Reference: SciPy Documentation.
  
  Supported methods: RK45, RK23, DOP853, Radau, BDF, LSODA.
  
  Example:
  
  For adaptive-step solvers:
  - ode_solver_lib: scipy
  - odesmethod: RK23
  - rtol: 1e-4
  - atol: 1e-6
  Note: Fixed-step methods such as euler and rk4 are not part of scipy.integrate; they are implemented separately in the model class.
  
  For fixed-step solvers:
  - ode_solver_lib: scipy
  - odesmethod: euler
  - time_step: 0.5
- torchdiffeq: Solves ODEs using the torchdiffeq library. Reference: torchdiffeq documentation. Supported methods: euler, rk4, midpoint, adaptive_heun, bosh3, dopri5.
  
  Example:
  - ode_solver_lib: torchdiffeq
  - odesmethod: dopri5
  - rtol: 1e-4
  - atol: 1e-6
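To illustrate what a fixed-step solver does with time_step (the euler and rk4 options mentioned above), here is a stdlib-only sketch. The function names are hypothetical and the model class's own implementation may differ; the right-hand side is a toy linear reservoir, not the exphydro equations.

```python
import math
from typing import Callable, List

RHS = Callable[[float, List[float]], List[float]]


def euler_step(f: RHS, t: float, y: List[float], dt: float) -> List[float]:
    """One explicit Euler step: y_{n+1} = y_n + dt * f(t_n, y_n)."""
    return [yi + dt * ki for yi, ki in zip(y, f(t, y))]


def rk4_step(f: RHS, t: float, y: List[float], dt: float) -> List[float]:
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + dt / 2, [yi + dt / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + dt, [yi + dt * ki for yi, ki in zip(y, k3)])
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def integrate(f: RHS, y0: List[float], t_end: float, time_step: float,
              method: str = "euler") -> List[float]:
    """Integrate from t=0 to t_end with a fixed time_step (hypothetical helper)."""
    step = {"euler": euler_step, "rk4": rk4_step}[method]
    t, y = 0.0, list(y0)
    for _ in range(round(t_end / time_step)):
        y = step(f, t, y, time_step)
        t += time_step
    return y


def linear_reservoir(t: float, y: List[float], k: float = 0.5) -> List[float]:
    """Toy ODE dS/dt = -k*S with exact solution S0*exp(-k*t)."""
    return [-k * y[0]]


exact = 10.0 * math.exp(-0.5 * 4.0)
euler = integrate(linear_reservoir, [10.0], 4.0, 0.5, "euler")[0]
rk4 = integrate(linear_reservoir, [10.0], 4.0, 0.5, "rk4")[0]
print(f"euler={euler:.4f} rk4={rk4:.4f} exact={exact:.4f}")
```

With the same time_step, rk4 stays much closer to the exact solution than euler, at the cost of four right-hand-side evaluations per step instead of one.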

Neural network entries

data_dir: Specifies the folder that contains the data obtained by running the conceptual model. The path should be src/data/data_dir; make sure the data is located in this folder.
nn_model: Specifies the neural network model to use. Supported models include mlp and lstm. The code is intended to support other neural network models but might require specific adaptations.

mlp: A multi-layer perceptron model with fully connected layers.

lstm: A Long Short-Term Memory model.
hidden_size: Specifies the number of hidden units in each layer of the neural network.

Example: [32, 32, 32, 32, 32]
seq_length: Length of the input sequence. Only required for LSTM models.
nn_dynamic_inputs: Specifies the dynamic inputs to the neural network.

Example: [s_snow, s_water, prcp, tmean]
nn_mech_targets: Specifies the mechanistic targets of the neural network, i.e., the neural network outputs.

Example: [ps_bucket, pr_bucket, m_bucket, et_bucket, q_bucket]
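Taken together with hidden_size, the input and target lists roughly determine the shape of an mlp. A minimal sketch, assuming one input unit per entry in nn_dynamic_inputs and one output unit per entry in nn_mech_targets; the helper mlp_layer_sizes is hypothetical and the actual model may add normalization or other layers.

```python
from typing import List


def mlp_layer_sizes(nn_dynamic_inputs: List[str], hidden_size: List[int],
                    nn_mech_targets: List[str]) -> List[int]:
    """Layer widths: inputs, then each hidden layer, then one unit per mechanistic target."""
    return [len(nn_dynamic_inputs), *hidden_size, len(nn_mech_targets)]


sizes = mlp_layer_sizes(
    ["s_snow", "s_water", "prcp", "tmean"],
    [32, 32, 32, 32, 32],
    ["ps_bucket", "pr_bucket", "m_bucket", "et_bucket", "q_bucket"],
)
print(sizes)  # [4, 32, 32, 32, 32, 32, 5]
# Each consecutive pair is one fully connected layer: (4, 32), ..., (32, 5)
print(list(zip(sizes[:-1], sizes[1:])))
```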
target_variables: Specifies the main target variables for the neural network, i.e., the variables used to train the model.

Example: [obs_runoff]

Note: The nn_dynamic_inputs, nn_mech_targets, and target_variables entries should be consistent with the variables in the dataset and be included as model_inputs, nn_mech_targets, and target_variables, respectively, in the concept_model entry defined in the file src/utils/concept_model_vars.yml.
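The consistency rule in the note above can be checked before training. A minimal sketch, assuming src/utils/concept_model_vars.yml maps each concept model to lists named model_inputs, nn_mech_targets, and target_variables; its content is mirrored here as a hypothetical Python dict built from the examples on this page, and check_consistency is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical excerpt of src/utils/concept_model_vars.yml (layout and values assumed).
CONCEPT_MODEL_VARS: Dict[str, Dict[str, List[str]]] = {
    "exphydro": {
        "model_inputs": ["s_snow", "s_water", "prcp", "tmean"],
        "nn_mech_targets": ["ps_bucket", "pr_bucket", "m_bucket", "et_bucket", "q_bucket"],
        "target_variables": ["obs_runoff"],
    }
}

# Config entry -> key it must be listed under in concept_model_vars.yml.
ENTRY_TO_KEY = {
    "nn_dynamic_inputs": "model_inputs",
    "nn_mech_targets": "nn_mech_targets",
    "target_variables": "target_variables",
}


def check_consistency(cfg: dict) -> List[str]:
    """Return one message per config variable missing from the concept model definition."""
    model_vars = CONCEPT_MODEL_VARS[cfg["concept_model"]]
    problems = []
    for entry, key in ENTRY_TO_KEY.items():
        for var in cfg.get(entry, []):
            if var not in model_vars[key]:
                problems.append(f"{entry}: '{var}' not listed under {key}")
    return problems


cfg = {
    "concept_model": "exphydro",
    "nn_dynamic_inputs": ["s_snow", "s_water", "prcp", "tmean"],
    "nn_mech_targets": ["q_bucket"],
    "target_variables": ["obs_runoff"],
}
bad = dict(cfg, nn_dynamic_inputs=["s_snow", "pet"])  # "pet" is a made-up variable
print(check_consistency(cfg))  # []
print(check_consistency(bad))
```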
loss_pretrain: Specifies the loss function to use during the pre-training phase. Supported options include nse and mae, but the code is intended to support other loss functions.
lr_pretrain: Specifies the learning rate for the pre-training phase.
epochs_pretrain: Specifies the number of epochs for the pre-training phase.

Hybrid model entries

data_dir: Same as in the Neural network entries.

hybrid_model: Specifies the hybrid model to use. Supported models include exphydroM100. The code is intended to support other hybrid models but might require specific adaptations.

exphydroM100: A hybrid model that combines a conceptual model with a neural network model (Höge et al., 2022). See the class ExpHydroM100 in src/modelzoo_hybrid/exphydroM100.py for more details.
concept_model: Same as in the Conceptual model entries.
ode_solver_lib: Same as in the Conceptual model entries, but only torchdiffeq is supported for hybrid models.
basin_file: Same as in the General experiment entries.
nn_model_dir: Specifies the path to the pre-trained neural network model.

Note: If nn_model_dir is not specified, the model will be trained from scratch and all the Neural network entries should be defined in the configuration file.
scale_target_vars: Specifies whether to scale the target variables. If set to True, the target variables are scaled using the mean and standard deviation of the training period.
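Scaling with training-period statistics means the mean and standard deviation are computed once on the training data and then reused for every period, and predictions are mapped back with the inverse transform. A minimal sketch with hypothetical helper names; it uses the population standard deviation (statistics.pstdev), which may differ from the estimator used in the code.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def fit_scaler(train_values: List[float]) -> Tuple[float, float]:
    """Mean and standard deviation computed on the training period only."""
    return mean(train_values), pstdev(train_values)


def scale(values: List[float], mu: float, sigma: float) -> List[float]:
    """Standardize values with the training statistics."""
    return [(v - mu) / sigma for v in values]


def unscale(values: List[float], mu: float, sigma: float) -> List[float]:
    """Map scaled values (e.g. model outputs) back to physical units."""
    return [v * sigma + mu for v in values]


mu, sigma = fit_scaler([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical training runoff
scaled_valid = scale([3.0, 5.0], mu, sigma)         # validation data reuses mu, sigma
print(scaled_valid, unscale(scaled_valid, mu, sigma))
```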
loss: Specifies the loss function to use. Supported options include mse, nse, and nse-nh.
epochs: Specifies the number of epochs to train the model.
patience: Specifies the patience for early stopping, i.e., how many epochs without improvement are tolerated before training is stopped.
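A common way to implement patience is a counter of consecutive epochs without improvement. A minimal sketch, assuming the monitored quantity is a validation loss where lower is better; the class EarlyStopping is hypothetical, and which quantity the code actually monitors is an assumption.

```python
class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.epochs_without_improvement = 0  # improvement resets the counter
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([0.9, 0.7, 0.72, 0.71, 0.65], start=1):
    if stopper.step(loss):
        print(f"Stopping after epoch {epoch}")  # Stopping after epoch 4
        break
```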
clip_gradient_norm: If set, clips the norm of the gradients to this value during training.
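Norm clipping rescales all gradients by the same factor whenever their combined L2 norm exceeds the threshold, so the update direction is preserved. A pure-Python sketch of the idea with a hypothetical function name; in a PyTorch training loop this is typically done with torch.nn.utils.clip_grad_norm_, but whether the code uses that function is an assumption.

```python
import math
from typing import List


def clip_gradient_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale grads so that their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already within the limit: unchanged
    factor = max_norm / total_norm
    return [g * factor for g in grads]


print(clip_gradient_norm([3.0, 4.0], 1.0))  # norm 5.0 rescaled to 1.0, same direction
print(clip_gradient_norm([0.3, 0.4], 1.0))  # norm 0.5 is left untouched
```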
batch_size: Specifies the batch size for training. If set to -1, the whole dataset will be used in a single batch.
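The special value -1 simply turns the whole training set into one batch. A minimal sketch with a hypothetical helper make_batches; whether the code shuffles samples or keeps the last, smaller batch is an assumption (here the last partial batch is kept and no shuffling is done).

```python
from typing import List, Sequence


def make_batches(samples: Sequence, batch_size: int) -> List[Sequence]:
    """Split samples into batches; batch_size == -1 puts everything in a single batch."""
    if batch_size == -1:
        return [samples]
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or -1")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


data = list(range(10))
print(make_batches(data, 4))   # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(make_batches(data, -1))  # [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
```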
optimizer: Specifies the optimizer to use. Supported options include adam and sgd.
learning_rate: Learning rate. Can be either a single number (for a constant learning rate) or a dictionary. See How to adjust learning rate in the PyTorch documentation for more information.

Example:

learning_rate:
- initial: 0.001
- decay: 0.5
- decay_step_fraction: 2
Note: The learning rate will be decayed by a factor of decay every decay_step_fraction epochs.
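Following the note above, the schedule is a step decay: the rate is multiplied by decay once every decay_step_fraction epochs. A minimal sketch, assuming epochs are counted from 0; the function learning_rate_at is hypothetical. In PyTorch, torch.optim.lr_scheduler.StepLR with step_size=2 and gamma=0.5, stepped once per epoch, produces the same sequence for this example, though whether the code uses StepLR is an assumption.

```python
def learning_rate_at(epoch: int, initial: float, decay: float, decay_step_fraction: int) -> float:
    """Step decay: multiply `initial` by `decay` once per `decay_step_fraction` epochs."""
    return initial * decay ** (epoch // decay_step_fraction)


# epochs 0-1: 0.001, epochs 2-3: 0.0005, epochs 4-5: 0.00025
for epoch in range(6):
    print(epoch, learning_rate_at(epoch, initial=0.001, decay=0.5, decay_step_fraction=2))
```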
log_n_basins: Specifies the number of basins to log during training. If set to 0, no basins will be logged.
log_every_n_epochs: If set to a value greater than 0, logs figures and metrics and saves the model every n epochs.