Base Bootstrap
- class tsbootstrap.base_bootstrap.BaseDistributionBootstrap(n_bootstraps: Integral = 10, distribution: str = 'normal', refit: bool = False, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False, rng=None, **kwargs)
Implementation of the Distribution Bootstrap (DB) method for time series data.
The DB method generates bootstrapped samples by fitting a parametric distribution (selected with the distribution parameter) to the model residuals and then drawing new residuals from the fitted distribution. The new residuals are then added to the fitted values to create the bootstrapped samples.
- Parameters:
n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
distribution (str, default='normal') – The distribution to use for generating the bootstrapped samples. Must be one of ‘poisson’, ‘exponential’, ‘normal’, ‘gamma’, ‘beta’, ‘lognormal’, ‘weibull’, ‘pareto’, ‘geometric’, or ‘uniform’.
refit (bool, default=False) – Whether to refit the distribution to the resampled residuals for each bootstrap. If False, the distribution is fit once to the residuals and the same distribution is used for all bootstraps.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. An Integral is interpreted as the lag order and is accepted for AR, ARIMA, SARIMA, VAR, and ARCH; VAR and ARCH accept only an Integral. AR additionally accepts a list of non-consecutive ints. A tuple gives the full order: (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for tuple-valued orders it selects only p, not the full (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
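The selection criterion used by TSFitBestLag is not specified in this section. As a rough illustration of "choosing only the best lag, then setting the rest of the order to 0", the sketch below fits AR(p) models by ordinary least squares in NumPy, picks the p with the smallest Gaussian AIC, and pads it into an ARIMA or SARIMA tuple. The helpers `ar_aic`, `best_ar_lag`, and `pad_order` are hypothetical, and AIC is an assumed criterion, not necessarily the one TSFitBestLag uses.

```python
import numpy as np


# Hypothetical helpers for illustration; not part of tsbootstrap.
def ar_aic(x: np.ndarray, p: int, max_lag: int) -> float:
    """Gaussian AIC (up to a constant) of an OLS-fitted AR(p) with intercept.

    Every candidate lag is fitted on the same sample, x[max_lag:], so the
    AIC values are comparable across lags.
    """
    y = x[max_lag:]
    n = len(y)
    # Design matrix: intercept plus lags 1..p
    cols = [np.ones(n)] + [x[max_lag - k : len(x) - k] for k in range(1, p + 1)]
    design = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / n
    return n * np.log(sigma2) + 2 * (p + 1)


def best_ar_lag(x: np.ndarray, max_lag: int = 5) -> int:
    """Return the lag p in 1..max_lag with the smallest AIC."""
    aics = {p: ar_aic(x, p, max_lag) for p in range(1, max_lag + 1)}
    return min(aics, key=aics.get)


def pad_order(p: int, model_type: str):
    """Expand a selected lag into a full order, setting the other entries to 0."""
    if model_type == "arima":
        return (p, 0, 0)       # (p, d, q)
    if model_type == "sarima":
        return (p, 0, 0, 0)    # (p, d, q, s)
    return p                   # AR, VAR, ARCH: a single integer lag


# Simulate a stationary AR(2) series with a strong lag-2 coefficient
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.5 * x[t - 1] - 0.4 * x[t - 2] + rng.normal()

p = best_ar_lag(x, max_lag=5)
print(p, pad_order(p, "arima"))
```

Scoring every lag on the same trimmed sample keeps the comparison fair; fitting each lag on its own longest sample would give the AIC values different sample sizes.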
- resids_dist
The distribution object used to generate the bootstrapped samples. If None, the distribution has not been fit yet.
- Type:
scipy.stats.rv_continuous or None
- resids_dist_params
The parameters of the distribution used to generate the bootstrapped samples. If None, the distribution has not been fit yet.
- Type:
tuple or None
- __init__ : Initialize the BaseDistributionBootstrap class.
- fit_distribution(resids: np.ndarray) tuple[rv_continuous, tuple]
Fit the specified distribution to the residuals and return the distribution object and the parameters of the distribution.
Notes
The DB method is defined as:
\[\hat{X}_t = \hat{\mu} + \epsilon_t\]
where \(\epsilon_t \sim F_{\hat{\epsilon}}\) is a random variable sampled from the distribution \(F_{\hat{\epsilon}}\) fitted to the residuals \(\hat{\epsilon}\).
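A minimal NumPy sketch of the equation above, assuming an AR(1) mean model fitted by least squares (giving \(\hat{\mu}\)) and a normal distribution for \(F_{\hat{\epsilon}}\), whose maximum-likelihood fit is the residual mean and (ddof=0) standard deviation. The function `distribution_bootstrap` is hypothetical; the library fits its models through TSFit and supports the other distributions listed above. With refit=True the sketch refits the normal distribution to a with-replacement resample of the residuals on every iteration, mirroring the refit parameter.

```python
import numpy as np


# Hypothetical function for illustration; not the tsbootstrap implementation.
def distribution_bootstrap(x, n_bootstraps=10, refit=False, rng=None):
    """Distribution Bootstrap sketch: AR(1) mean model, normal residual distribution."""
    rng = np.random.default_rng(rng)  # accepts a seed, None, or an existing Generator
    # Fit X_t = c + phi * X_{t-1} by least squares to get the fitted values mu_hat
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coef, *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    mu_hat = design @ coef
    resids = x[1:] - mu_hat

    # Fit the normal distribution once; its MLE is (mean, std with ddof=0)
    loc, scale = resids.mean(), resids.std()

    samples = []
    for _ in range(n_bootstraps):
        if refit:
            # Refit F to a with-replacement resample of the residuals
            resampled = rng.choice(resids, size=len(resids), replace=True)
            loc, scale = resampled.mean(), resampled.std()
        eps = rng.normal(loc, scale, size=len(mu_hat))
        samples.append(mu_hat + eps)  # X_hat_t = mu_hat_t + eps_t
    return np.array(samples)


x = np.cumsum(np.random.default_rng(1).normal(size=200)) * 0.1
boot = distribution_bootstrap(x, n_bootstraps=5, rng=42)
print(boot.shape)  # (5, 199)
```

Each sample has one fewer time point than the input because the AR(1) fit has no fitted value for the first observation.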
- class tsbootstrap.base_bootstrap.BaseMarkovBootstrap(n_bootstraps: Integral = 10, method: Literal['first', 'middle', 'last', 'mean', 'mode', 'median', 'kmeans', 'kmedians', 'kmedoids'] = 'middle', apply_pca_flag: bool = False, pca=None, n_iter_hmm: Integral = 10, n_fits_hmm: Integral = 1, blocks_as_hidden_states_flag: bool = False, n_states: Integral = 2, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False, rng=None, **kwargs)
Base class for Markov bootstrap.
- Parameters:
n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
method (str, default="middle") – The method to use for compressing the blocks. Must be one of “first”, “middle”, “last”, “mean”, “mode”, “median”, “kmeans”, “kmedians”, “kmedoids”.
apply_pca_flag (bool, default=False) – Whether to apply PCA to the residuals before fitting the HMM.
pca (PCA, default=None) – The PCA object to use for applying PCA to the residuals.
n_iter_hmm (Integral, default=10) – Number of iterations for fitting the HMM.
n_fits_hmm (Integral, default=1) – Number of times to fit the HMM.
blocks_as_hidden_states_flag (bool, default=False) – Whether to use blocks as hidden states.
n_states (Integral, default=2) – Number of states for the HMM.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. An Integral is interpreted as the lag order and is accepted for AR, ARIMA, SARIMA, VAR, and ARCH; VAR and ARCH accept only an Integral. AR additionally accepts a list of non-consecutive ints. A tuple gives the full order: (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for tuple-valued orders it selects only p, not the full (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
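The method parameter reduces each block of observations to a single representative row. The sketch below covers only the simple options ("first", "middle", "last", "mean", "median") for a block of shape (block_length, n_features); "mode" and the clustering-based options ("kmeans", "kmedians", "kmedoids") are left out. `compress_block` is a hypothetical helper, and taking the upper-middle row for even-length blocks is an assumption of this sketch.

```python
import numpy as np


# Hypothetical helper for illustration; not part of tsbootstrap.
def compress_block(block: np.ndarray, method: str = "middle") -> np.ndarray:
    """Reduce a (block_length, n_features) block to one row of shape (n_features,)."""
    if method == "first":
        return block[0]
    if method == "middle":
        return block[len(block) // 2]  # upper-middle row when the length is even
    if method == "last":
        return block[-1]
    if method == "mean":
        return block.mean(axis=0)
    if method == "median":
        return np.median(block, axis=0)
    raise ValueError(f"method {method!r} is not covered by this sketch")


block = np.array([[1.0, 10.0], [2.0, 20.0], [6.0, 60.0], [7.0, 70.0]])
for m in ["first", "middle", "last", "mean", "median"]:
    print(m, compress_block(block, m))
```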
- hmm_object
The MarkovSampler object used for sampling.
- Type:
MarkovSampler or None
- __init__ : Initialize the Markov bootstrap.
Notes
Fitting Markov models is expensive, so refitting is not allowed. Instead, the model is fit once to the residuals, and new samples are generated by changing the random_seed.
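To illustrate the "fit once, then sample with different seeds" pattern, the sketch below substitutes a plain, fully observed Markov chain for the HMM: residuals are binned into n_states quantile states, a transition matrix is estimated once, and each bootstrap simulates a new state path and draws residuals from the matching bin. This is a simplified stand-in, not the MarkovSampler algorithm, and `fit_markov_chain` and `sample_markov_chain` are hypothetical names.

```python
import numpy as np


# Hypothetical helpers for illustration; a Markov chain stands in for the HMM.
def fit_markov_chain(resids: np.ndarray, n_states: int = 2):
    """Bin residuals into quantile states and estimate the transition matrix once."""
    edges = np.quantile(resids, np.linspace(0, 1, n_states + 1)[1:-1])
    states = np.digitize(resids, edges)  # state labels 0..n_states-1
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    # Add-one smoothing so every row is a valid probability vector
    trans = (counts + 1) / (counts + 1).sum(axis=1, keepdims=True)
    pools = [resids[states == s] for s in range(n_states)]
    return states[0], trans, pools  # start from the first observed state


def sample_markov_chain(model, n: int, seed: int) -> np.ndarray:
    """Simulate one bootstrap series from the already-fitted chain."""
    start, trans, pools = model
    rng = np.random.default_rng(seed)
    state, out = start, np.empty(n)
    for t in range(n):
        out[t] = rng.choice(pools[state])                # draw a residual from this state
        state = rng.choice(len(trans), p=trans[state])   # move to the next state
    return out


resids = np.random.default_rng(0).normal(size=300)
model = fit_markov_chain(resids, n_states=2)  # expensive step, done once
boots = [sample_markov_chain(model, len(resids), seed) for seed in range(5)]
```

Only the seed changes between bootstraps, so the cost of fitting is paid once regardless of n_bootstraps.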
- class tsbootstrap.base_bootstrap.BaseResidualBootstrap(n_bootstraps: Integral = 10, rng=None, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False)
Base class for residual bootstrap.
- Parameters:
n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. An Integral is interpreted as the lag order and is accepted for AR, ARIMA, SARIMA, VAR, and ARCH; VAR and ARCH accept only an Integral. AR additionally accepts a list of non-consecutive ints. A tuple gives the full order: (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for tuple-valued orders it selects only p, not the full (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
- fit_model
The fitted model.
- Type:
TSFitBestLag
- resids
The residuals of the fitted model.
- Type:
np.ndarray
- X_fitted
The fitted values of the fitted model.
- Type:
np.ndarray
- coefs
The coefficients of the fitted model.
- Type:
np.ndarray
- __init__ : Initialize self.
- _fit_model : Fits the model to the data and stores the residuals.
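The attributes above (coefficients, fitted values, residuals) are what a residual bootstrap consumes. As a sketch, assuming an AR(1) model fitted by least squares: resample the residuals with replacement and rebuild each series recursively from the fitted coefficients, starting from the first observation. `residual_bootstrap` is a hypothetical function, not the library's implementation, which fits through TSFitBestLag.

```python
import numpy as np


# Hypothetical function for illustration; not the tsbootstrap implementation.
def residual_bootstrap(x: np.ndarray, n_bootstraps: int = 10, rng=None) -> np.ndarray:
    """Recursive AR(1) residual bootstrap of a 1D series."""
    rng = np.random.default_rng(rng)
    # Fit X_t = c + phi * X_{t-1}; keep coefs, fitted values, and residuals
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    coefs, *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    X_fitted = design @ coefs
    resids = x[1:] - X_fitted
    c, phi = coefs

    out = np.empty((n_bootstraps, len(x)))
    for b in range(n_bootstraps):
        eps = rng.choice(resids, size=len(x) - 1, replace=True)
        out[b, 0] = x[0]  # condition on the first observation
        for t in range(1, len(x)):
            out[b, t] = c + phi * out[b, t - 1] + eps[t - 1]
    return out


x = np.cumsum(np.random.default_rng(1).normal(size=100)) * 0.1
boot = residual_bootstrap(x, n_bootstraps=3, rng=0)
```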
- class tsbootstrap.base_bootstrap.BaseSieveBootstrap(n_bootstraps: Integral = 10, rng=None, resids_model_type: Literal['ar', 'arima', 'sarima', 'var', 'arch'] = 'ar', resids_order=None, save_resids_models: bool = False, kwargs_base_sieve=None, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, **kwargs_base_residual)
Base class for Sieve bootstrap.
This class provides the core functionality for implementing the Sieve bootstrap: fitting one of several model types to the residuals and generating bootstrapped samples from it. The Sieve bootstrap is a parametric method that generates bootstrapped samples by fitting a model to the residuals and then generating new residuals from the fitted model. The new residuals are then added to the fitted values to create the bootstrapped samples.
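The two-stage procedure in this paragraph can be sketched as follows, assuming an AR(1) model for the series and an AR(1) model for its residuals (the roles of model_type/order and resids_model_type/resids_order), both fitted by least squares. New residuals are generated by running the residual AR model on resampled innovations and are then added to the fitted values. `fit_ar1` and `sieve_bootstrap` are hypothetical names.

```python
import numpy as np


# Hypothetical helpers for illustration; not the tsbootstrap implementation.
def fit_ar1(z: np.ndarray):
    """Least-squares AR(1) fit; returns (intercept, phi), fitted values, residuals."""
    design = np.column_stack([np.ones(len(z) - 1), z[:-1]])
    coef, *_ = np.linalg.lstsq(design, z[1:], rcond=None)
    fitted = design @ coef
    return coef, fitted, z[1:] - fitted


def sieve_bootstrap(x: np.ndarray, n_bootstraps: int = 10, rng=None) -> np.ndarray:
    """Fit a model to x, fit a second model to its residuals, regenerate residuals."""
    rng = np.random.default_rng(rng)
    _, X_fitted, resids = fit_ar1(x)              # stage 1: model the series
    (c, phi), _, innovations = fit_ar1(resids)    # stage 2: model the residuals

    out = np.empty((n_bootstraps, len(X_fitted)))
    for b in range(n_bootstraps):
        eps = rng.choice(innovations, size=len(resids), replace=True)
        new_resids = np.empty(len(resids))
        new_resids[0] = resids[0]
        for t in range(1, len(resids)):
            new_resids[t] = c + phi * new_resids[t - 1] + eps[t]
        out[b] = X_fitted + new_resids            # fitted values + new residuals
    return out


x = np.sin(np.linspace(0, 20, 150)) + np.random.default_rng(2).normal(scale=0.3, size=150)
boot = sieve_bootstrap(x, n_bootstraps=4, rng=1)
```

Unlike resampling residuals directly, the residual model lets the bootstrap reproduce any autocorrelation left in the residuals after the first fit.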
- Parameters:
resids_model_type (str, default="ar") – The model type to use for fitting the residuals. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
resids_order (Integral or list or tuple, default=None) – The order of the model to use for fitting the residuals. If None, the order is automatically determined.
save_resids_models (bool, default=False) – Whether to save the fitted models for the residuals.
kwargs_base_sieve (dict, default=None) – Keyword arguments to pass to the SieveBootstrap class.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. An Integral is interpreted as the lag order and is accepted for AR, ARIMA, SARIMA, VAR, and ARCH; VAR and ARCH accept only an Integral. AR additionally accepts a list of non-consecutive ints. A tuple gives the full order: (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for tuple-valued orders it selects only p, not the full (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
- resids_coefs
Coefficients of the fitted residual model.
- Type:
type or None
- resids_fit_model
Fitted residual model object.
- Type:
type or None
- __init__ : Initialize the BaseSieveBootstrap class.
- _fit_resids_model : Fit the residual model to the residuals.
- class tsbootstrap.base_bootstrap.BaseStatisticPreservingBootstrap(n_bootstraps: Integral = 10, statistic: Callable | None = None, statistic_axis: Integral = 0, statistic_keepdims: bool = False, rng=None)
Bootstrap class that generates bootstrapped samples preserving a specific statistic.
This class generates bootstrapped time series data that preserve a given statistic (such as the mean or median). The statistic is calculated from the original data and then used as a parameter for generating the bootstrapped samples. For example, if the statistic is np.mean, the mean of the original data is calculated and then used as a parameter for generating the bootstrapped samples.
- Parameters:
n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
statistic (Callable, default=np.mean) – A callable function to compute the statistic that should be preserved.
statistic_axis (Integral, default=0) – The axis along which the statistic should be computed.
statistic_keepdims (bool, default=False) – Whether to keep the dimensions of the statistic or not.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
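The effect of statistic_axis and statistic_keepdims follows NumPy's own axis/keepdims semantics, assuming the statistic is invoked as statistic(X, axis=statistic_axis, keepdims=statistic_keepdims). That call convention is an assumption; the snippet only demonstrates the resulting shapes with np.mean on an (n_timepoints, n_features) array.

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(6, 2)  # (n_timepoints, n_features)

# statistic_axis=0: one value per feature
col_mean = np.mean(X, axis=0)                      # shape (2,)
col_mean_kept = np.mean(X, axis=0, keepdims=True)  # shape (1, 2)

# statistic_axis=1: one value per time point
row_mean = np.mean(X, axis=1)                      # shape (6,)
row_mean_kept = np.mean(X, axis=1, keepdims=True)  # shape (6, 1)

# With keepdims=True the per-row statistic broadcasts against X;
# X - row_mean (shape (6,)) would raise a broadcasting ValueError instead.
centered = X - row_mean_kept                       # shape (6, 2)
```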
- statistic_X
The statistic calculated from the original data. This is used as a parameter for generating the bootstrapped samples.
- Type:
np.ndarray, default=None
- __init__ : Initialize the BaseStatisticPreservingBootstrap class.
- _calculate_statistic(X: np.ndarray) → np.ndarray : Calculate the statistic from the input data.
- class tsbootstrap.base_bootstrap.BaseTimeSeriesBootstrap(n_bootstraps: Integral = 10, rng=None)
Base class for time series bootstrapping.
- Raises:
ValueError – If n_bootstraps is not greater than 0.
- bootstrap(X: ndarray, return_indices: bool = False, y=None, test_ratio: float | None = None)
Generate bootstrapped samples of the time series X, optionally together with the indices used to build them.
- Parameters:
X (2D array-like of shape (n_timepoints, n_features)) – The endogenous time series to bootstrap. Dimension 0 is assumed to be the time dimension, ordered.
return_indices (bool, default=False) – If True, a second output is returned: the integer locations of index references for the bootstrap sample, relative to the original indices. Indexed values are not necessarily identical to bootstrapped values.
y (array-like of shape (n_timepoints, n_features_exog), default=None) – Exogenous time series to use in bootstrapping.
test_ratio (float, default=0.0) – The ratio of test samples to total samples. If provided, a test_ratio fraction of the data (rounded up) is removed from the end before the bootstrap logic is applied.
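The trimming described for test_ratio reduces to the arithmetic below: with n time points, ceil(test_ratio * n) rows are held out from the end and only the remaining prefix is bootstrapped. This is a sketch of the documented behaviour rather than the library code, and `split_for_bootstrap` is a hypothetical name.

```python
import math

import numpy as np


# Hypothetical helper for illustration; not part of tsbootstrap.
def split_for_bootstrap(X: np.ndarray, test_ratio=None):
    """Drop the last ceil(test_ratio * n) rows; None or 0.0 keeps everything."""
    n = X.shape[0]
    n_test = math.ceil((test_ratio or 0.0) * n)
    return X[: n - n_test], X[n - n_test :]


X = np.arange(10).reshape(10, 1)
train, test = split_for_bootstrap(X, test_ratio=0.25)
print(train.shape, test.shape)  # (7, 1) (3, 1)
```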
- Yields:
X_boot_i (2D np.ndarray-like of shape (n_timepoints_boot_i, n_features)) – i-th bootstrapped sample of X.
indices_i (1D np.ndarray of integers, shape (n_timepoints_boot_i,)) – Only returned if return_indices=True. Index references for the i-th bootstrapped sample of X. Indexed values are not necessarily identical to bootstrapped values.
- get_n_bootstraps(X=None, y=None) → int
Returns the number of bootstrap instances produced by the bootstrap.
- Parameters:
X (2D array-like of shape (n_timepoints, n_features)) – The endogenous time series to bootstrap. Dimension 0 is assumed to be the time dimension, ordered.
y (array-like of shape (n_timepoints, n_features_exog), default=None) – Exogenous time series to use in bootstrapping.
- Returns:
The number of bootstrap instances produced by the bootstrap.
- Return type:
int
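To make the BaseTimeSeriesBootstrap contract concrete, here is a toy class that mirrors it: reject n_bootstraps that is not greater than 0, yield one bootstrapped array per iteration from bootstrap (paired with its integer indices when return_indices=True), and report the count from get_n_bootstraps. It resamples rows IID, which ignores temporal dependence and serves only as a placeholder. `ToyIIDBootstrap` is hypothetical and does not inherit from tsbootstrap; test_ratio handling is omitted.

```python
import numpy as np


# Hypothetical class for illustration; it only mimics the documented interface.
class ToyIIDBootstrap:
    """Toy stand-in for the BaseTimeSeriesBootstrap interface (IID row resampling)."""

    def __init__(self, n_bootstraps: int = 10, rng=None):
        if n_bootstraps <= 0:
            raise ValueError("n_bootstraps must be greater than 0.")
        self.n_bootstraps = n_bootstraps
        self.rng = np.random.default_rng(rng)

    def get_n_bootstraps(self, X=None, y=None) -> int:
        # X and y are accepted for interface compatibility but unused here
        return self.n_bootstraps

    def bootstrap(self, X, return_indices: bool = False, y=None):
        X = np.asarray(X)
        n = X.shape[0]
        for _ in range(self.n_bootstraps):
            indices = self.rng.integers(0, n, size=n)  # rows drawn with replacement
            if return_indices:
                yield X[indices], indices
            else:
                yield X[indices]


X = np.arange(20, dtype=float).reshape(10, 2)
bs = ToyIIDBootstrap(n_bootstraps=3, rng=0)
for X_boot, idx in bs.bootstrap(X, return_indices=True):
    print(X_boot.shape, idx[:3])
```

Because bootstrap is a generator, samples are produced lazily, so memory use stays at one sample at a time even for large n_bootstraps.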