Base Bootstrap

class tsbootstrap.base_bootstrap.BaseDistributionBootstrap(n_bootstraps: Integral = 10, distribution: str = 'normal', refit: bool = False, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False, rng=None, **kwargs)

Implementation of the Distribution Bootstrap (DB) method for time series data.

The DB method is a parametric method that generates bootstrapped samples by fitting a distribution to the residuals and then drawing new residuals from the fitted distribution. The new residuals are added to the fitted values to create the bootstrapped samples.

Parameters:

n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
distribution (str, default='normal') – The distribution to use for generating the bootstrapped samples. Must be one of ‘poisson’, ‘exponential’, ‘normal’, ‘gamma’, ‘beta’, ‘lognormal’, ‘weibull’, ‘pareto’, ‘geometric’, or ‘uniform’.
refit (bool, default=False) – Whether to refit the distribution to the resampled residuals for each bootstrap. If False, the distribution is fit once to the residuals and the same distribution is used for all bootstraps.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. If Integral, it is the lag order for AR, ARIMA, SARIMA, VAR, and ARCH. For AR, it may also be a list of non-consecutive ints. If list or tuple, the order is (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for the tuple-valued orders it only chooses the best p, not the best (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
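The rng parameter accepts either an integer seed or a generator. A minimal sketch of how such an argument can be normalized, assuming NumPy's np.random.default_rng is used for this purpose; the helper name as_generator is hypothetical and not part of tsbootstrap.

```python
import numpy as np


def as_generator(rng=None):
    """Hypothetical helper: turn None, an int seed, or a Generator into a Generator."""
    # np.random.default_rng accepts None, an integer seed, or an existing
    # Generator (which it returns unchanged).
    return np.random.default_rng(rng)


# Two generators built from the same seed produce the same draws.
gen_a = as_generator(42)
gen_b = as_generator(42)
same_draws = np.array_equal(gen_a.normal(size=3), gen_b.normal(size=3))

# An existing Generator is passed through as the same object.
existing = np.random.default_rng(0)
passthrough = as_generator(existing) is existing
```

Passing an integer makes a run reproducible, while passing a shared Generator lets several components draw from one random stream.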

resids_dist

The distribution object used to generate the bootstrapped samples. If None, the distribution has not been fit yet.

Type: scipy.stats.rv_continuous or None

resids_dist_params

The parameters of the distribution used to generate the bootstrapped samples. If None, the distribution has not been fit yet.

Type: tuple or None

__init__ : Initialize the BaseDistributionBootstrap class.

fit_distribution(resids: np.ndarray) → tuple[rv_continuous, tuple]: Fit the specified distribution to the residuals and return the distribution object and the parameters of the distribution.

Notes

The DB method is defined as:

\[\hat{X}_t = \hat{\mu} + \epsilon_t\]

where \(\epsilon_t \sim F_{\hat{\epsilon}}\) is a random variable sampled from the distribution \(F_{\hat{\epsilon}}\) fitted to the residuals \(\hat{\epsilon}\).
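The definition above can be illustrated with a small NumPy sketch. It is not the library's implementation: the "fitted model" is assumed to be just the sample mean, the distribution is assumed to be normal (fitted by its maximum-likelihood mean and standard deviation), and all variable names are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series and a stand-in "fitted model": here simply the sample mean.
X = rng.normal(loc=5.0, scale=2.0, size=200)
X_fitted = np.full_like(X, X.mean())
resids = X - X_fitted

# Fit a normal distribution to the residuals (maximum-likelihood estimates).
mu_hat, sigma_hat = resids.mean(), resids.std()

# Each bootstrap sample is the fitted values plus residuals drawn from the
# fitted distribution; the distribution is fitted once and reused.
n_bootstraps = 10
samples = [
    X_fitted + rng.normal(mu_hat, sigma_hat, size=resids.shape)
    for _ in range(n_bootstraps)
]
```

Reusing the single fitted distribution for every sample corresponds to the refit=False behaviour described in the parameter list.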


class tsbootstrap.base_bootstrap.BaseMarkovBootstrap(n_bootstraps: Integral = 10, method: Literal['first', 'middle', 'last', 'mean', 'mode', 'median', 'kmeans', 'kmedians', 'kmedoids'] = 'middle', apply_pca_flag: bool = False, pca=None, n_iter_hmm: Integral = 10, n_fits_hmm: Integral = 1, blocks_as_hidden_states_flag: bool = False, n_states: Integral = 2, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False, rng=None, **kwargs)

Base class for Markov bootstrap.

Parameters:

n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
method (str, default="middle") – The method to use for compressing the blocks. Must be one of “first”, “middle”, “last”, “mean”, “mode”, “median”, “kmeans”, “kmedians”, “kmedoids”.
apply_pca_flag (bool, default=False) – Whether to apply PCA to the residuals before fitting the HMM.
pca (PCA, default=None) – The PCA object to use for applying PCA to the residuals.
n_iter_hmm (Integral, default=10) – Number of iterations for fitting the HMM.
n_fits_hmm (Integral, default=1) – Number of times to fit the HMM.
blocks_as_hidden_states_flag (bool, default=False) – Whether to use blocks as hidden states.
n_states (Integral, default=2) – Number of states for the HMM.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. If Integral, it is the lag order for AR, ARIMA, SARIMA, VAR, and ARCH. For AR, it may also be a list of non-consecutive ints. If list or tuple, the order is (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for the tuple-valued orders it only chooses the best p, not the best (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
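To make the method parameter more concrete, the sketch below assumes that "compressing" a block means reducing a (block_length, n_features) array to a single representative row. The helper compress_block is hypothetical and covers only the simple positional and averaging methods, not "mode" or the clustering-based ones.

```python
import numpy as np


def compress_block(block, method="middle"):
    """Hypothetical helper: reduce a (block_length, n_features) array to one row."""
    block = np.asarray(block, dtype=float)
    if method == "first":
        return block[0]
    if method == "middle":
        return block[len(block) // 2]
    if method == "last":
        return block[-1]
    if method == "mean":
        return block.mean(axis=0)
    if method == "median":
        return np.median(block, axis=0)
    raise ValueError(f"unsupported method in this sketch: {method!r}")


block = np.array([[1.0, 10.0], [2.0, 20.0], [6.0, 60.0]])
middle_row = compress_block(block, "middle")  # the row at index 1
mean_row = compress_block(block, "mean")      # column-wise average
```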

hmm_object

The MarkovSampler object used for sampling.

Type: MarkovSampler or None

__init__ : Initialize the Markov bootstrap.

Notes

Fitting Markov models is expensive, so refitting is not allowed. Instead, the model is fit once to the residuals, and new samples are generated by changing the random_seed.
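The fit-once, reseed-per-sample pattern can be sketched as follows. This is an illustration under simplifying assumptions: a plain first-order Markov chain over observed states stands in for the HMM, and the function names are hypothetical.

```python
import numpy as np


def fit_transition_matrix(states, n_states):
    """Estimate a first-order transition matrix by counting transitions."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows with no observed transitions fall back to a uniform distribution.
    P = np.full_like(counts, 1.0 / n_states)
    np.divide(counts, row_sums, out=P, where=row_sums > 0)
    return P


def sample_path(P, start, length, seed):
    """Draw one state path from the fitted chain using the given seed."""
    rng = np.random.default_rng(seed)
    path = [start]
    for _ in range(length - 1):
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path


states = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
P = fit_transition_matrix(states, n_states=2)  # the expensive step, done once

# New samples come from the same fitted model with different seeds.
paths = [sample_path(P, start=states[0], length=10, seed=s) for s in range(3)]
```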

class tsbootstrap.base_bootstrap.BaseResidualBootstrap(n_bootstraps: Integral = 10, rng=None, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, save_models: bool = False)

Base class for residual bootstrap.

Parameters:

n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. If Integral, it is the lag order for AR, ARIMA, SARIMA, VAR, and ARCH. For AR, it may also be a list of non-consecutive ints. If list or tuple, the order is (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for the tuple-valued orders it only chooses the best p, not the best (p, d, q) or (p, d, q, s), and the remaining values are set to 0.
save_models (bool, default=False) – Whether to save the fitted models.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.
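The accepted shapes of order depend on model_type. The check below is a hypothetical helper that only restates the shapes listed in the parameter description; it is not the library's validation logic.

```python
from numbers import Integral


def order_matches_model(order, model_type):
    """Hypothetical check of the order shapes described above; not library code."""
    if order is None:
        # The order is left to automatic lag selection.
        return True
    if isinstance(order, Integral):
        # A single lag order is accepted for every model type.
        return True
    if not isinstance(order, (list, tuple)):
        return False
    all_ints = all(isinstance(v, Integral) for v in order)
    if model_type == "ar":
        # AR additionally accepts a list of individual lags.
        return isinstance(order, list) and all_ints
    if model_type == "arima":
        return len(order) == 3 and all_ints
    if model_type == "sarima":
        return len(order) == 4 and all_ints
    # VAR and ARCH take only an integer lag order.
    return False


examples = {
    "ar": [1, 3],
    "arima": (1, 0, 1),
    "sarima": (1, 0, 1, 12),
    "var": 2,
}
all_valid = all(order_matches_model(o, m) for m, o in examples.items())
```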

fit_model

The fitted model.

Type: TSFitBestLag

resids

The residuals of the fitted model.

Type: np.ndarray

X_fitted

The fitted values of the fitted model.

Type: np.ndarray

coefs

The coefficients of the fitted model.

Type: np.ndarray

__init__ : Initialize self.

_fit_model : Fits the model to the data and stores the residuals.
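The relationship between coefs, X_fitted, and resids can be sketched with an AR(1) model fitted by ordinary least squares. This is an assumption-laden illustration: the library fits its models through TSFit, whereas the sketch uses np.linalg.lstsq directly and resamples residuals i.i.d. with replacement.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a toy AR(1) series: x_t = 0.6 * x_{t-1} + noise.
n = 300
X = np.zeros(n)
for t in range(1, n):
    X[t] = 0.6 * X[t - 1] + rng.normal()

# Least-squares fit of x_t on [1, x_{t-1}].
design = np.column_stack([np.ones(n - 1), X[:-1]])
coefs, *_ = np.linalg.lstsq(design, X[1:], rcond=None)
X_fitted = design @ coefs
resids = X[1:] - X_fitted

# One bootstrap sample: fitted values plus residuals resampled with replacement.
X_boot = X_fitted + rng.choice(resids, size=resids.shape, replace=True)
```

Because one lag is consumed by the regression, the fitted values and residuals have n - 1 entries rather than n.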

class tsbootstrap.base_bootstrap.BaseSieveBootstrap(n_bootstraps: Integral = 10, rng=None, resids_model_type: Literal['ar', 'arima', 'sarima', 'var', 'arch'] = 'ar', resids_order=None, save_resids_models: bool = False, kwargs_base_sieve=None, model_type: Literal['ar', 'arima', 'sarima', 'var'] = 'ar', model_params=None, order: Integral | List[Integral] | tuple[Integral, Integral, Integral] | tuple[Integral, Integral, Integral, Integral] | None = None, **kwargs_base_residual)

Base class for Sieve bootstrap.

This class provides the core functionalities for implementing the Sieve bootstrap method, allowing for the fitting of various models to the residuals and generation of bootstrapped samples. The Sieve bootstrap is a parametric method that generates bootstrapped samples by fitting a model to the residuals and then generating new residuals from the fitted model. The new residuals are then added to the fitted values to create the bootstrapped samples.
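The second stage described above (fitting a model to the residuals and generating new residuals from it) can be sketched as follows. The first-stage outputs are stand-ins, the residual model is assumed to be an AR(1) without intercept, and none of the names below are taken from the library.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for the outputs of a first-stage fit (assumed, for illustration).
n = 200
X_fitted = np.linspace(0.0, 1.0, n)
resids = np.zeros(n)
for t in range(1, n):
    resids[t] = 0.5 * resids[t - 1] + rng.normal(scale=0.1)

# Fit an AR(1) without intercept to the residuals:
# phi = <r_t, r_{t-1}> / <r_{t-1}, r_{t-1}>.
phi = float(resids[1:] @ resids[:-1] / (resids[:-1] @ resids[:-1]))
innovations = resids[1:] - phi * resids[:-1]

# Simulate new residuals from the fitted residual model, driven by
# innovations resampled with replacement.
shocks = rng.choice(innovations, size=n, replace=True)
new_resids = np.zeros(n)
for t in range(1, n):
    new_resids[t] = phi * new_resids[t - 1] + shocks[t]

# Add the simulated residuals to the fitted values.
X_boot = X_fitted + new_resids
```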

Parameters:

resids_model_type (str, default="ar") – The model type to use for fitting the residuals. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
resids_order (Integral or list or tuple, default=None) – The order of the model to use for fitting the residuals. If None, the order is automatically determined.
save_resids_models (bool, default=False) – Whether to save the fitted models for the residuals.
kwargs_base_sieve (dict, default=None) – Keyword arguments to pass to the SieveBootstrap class.
model_type (str, default="ar") – The model type to use. Must be one of “ar”, “arima”, “sarima”, “var”, or “arch”.
model_params (dict, default=None) – Additional keyword arguments to pass to the TSFit model.
order (Integral or list or tuple, default=None) – The order of the model. If None, the best order is chosen via TSFitBestLag. If Integral, it is the lag order for AR, ARIMA, SARIMA, VAR, and ARCH. For AR, it may also be a list of non-consecutive ints. If list or tuple, the order is (p, d, q) for ARIMA and (p, d, q, s) for SARIMA. Note that TSFitBestLag only chooses the best lag, not the best order: for the tuple-valued orders it only chooses the best p, not the best (p, d, q) or (p, d, q, s), and the remaining values are set to 0.

resids_coefs

Coefficients of the fitted residual model.

Type: type or None

resids_fit_model

Fitted residual model object.

Type: type or None

__init__ : Initialize the BaseSieveBootstrap class.

_fit_resids_model : Fit the residual model to the residuals.

class tsbootstrap.base_bootstrap.BaseStatisticPreservingBootstrap(n_bootstraps: Integral = 10, statistic: Callable | None = None, statistic_axis: Integral = 0, statistic_keepdims: bool = False, rng=None)

Bootstrap class that generates bootstrapped samples preserving a specific statistic.

This class generates bootstrapped time series data while preserving a given statistic (such as the mean or median). The statistic is calculated from the original data and then used as a parameter for generating the bootstrapped samples. For example, if the statistic is np.mean, the mean of the original data is calculated and used as a parameter when generating the bootstrapped samples.

Parameters:

n_bootstraps (Integral, default=10) – The number of bootstrap samples to create.
statistic (Callable, default=np.mean) – A callable function to compute the statistic that should be preserved.
statistic_axis (Integral, default=0) – The axis along which the statistic should be computed.
statistic_keepdims (bool, default=False) – Whether to keep the dimensions of the statistic or not.
rng (Integral or np.random.Generator, default=np.random.default_rng()) – The random number generator or seed used to generate the bootstrap samples.

statistic_X

The statistic calculated from the original data. This is used as a parameter for generating the bootstrapped samples.

Type: np.ndarray, default=None

__init__ : Initialize the BaseStatisticPreservingBootstrap class.

_calculate_statistic(X: np.ndarray) → np.ndarray : Calculate the statistic from the input data.
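The effect of statistic_axis and statistic_keepdims can be seen by calling a NumPy reduction directly. The sketch assumes the statistic is invoked as statistic(X, axis=..., keepdims=...), which is how np.mean and similar reductions are called; how the class passes these arguments internally is not shown in this reference.

```python
import numpy as np

# Toy data with shape (n_timepoints, n_features).
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])

# Reduce over the time axis; the time dimension is dropped.
stat_flat = np.mean(X, axis=0, keepdims=False)

# Same reduction, but the reduced axis is kept with length 1.
stat_kept = np.mean(X, axis=0, keepdims=True)
```

With keepdims=True the result broadcasts against the original array, which is convenient when the statistic is later added to or subtracted from X.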

class tsbootstrap.base_bootstrap.BaseTimeSeriesBootstrap(n_bootstraps: Integral = 10, rng=None)

Base class for time series bootstrapping.

Raises: ValueError – If n_bootstraps is not greater than 0.

bootstrap(X: ndarray, return_indices: bool = False, y=None, test_ratio: float | None = None)

Generate bootstrapped samples of the input time series, optionally together with the indices of each sample.

Parameters:

X (2D array-like of shape (n_timepoints, n_features)) – The endogenous time series to bootstrap. Dimension 0 is assumed to be the time dimension, ordered.
return_indices (bool, default=False) – If True, a second output is returned: the integer locations of index references for the bootstrap sample, in reference to the original indices. Indexed values are not necessarily identical with bootstrapped values.
y (array-like of shape (n_timepoints, n_features_exog), default=None) – Exogenous time series to use in bootstrapping.
test_ratio (float, default=0.0) – The ratio of test samples to total samples. If provided, a test_ratio fraction of the data (rounded up) is removed from the end before applying the bootstrap logic.

Yields:

X_boot_i (2D np.ndarray-like of shape (n_timepoints_boot_i, n_features)) – i-th bootstrapped sample of X.
indices_i (1D np.ndarray of shape (n_timepoints_boot_i,), integer values) – Only returned if return_indices=True. Index references for the i-th bootstrapped sample of X. Indexed values are not necessarily identical with bootstrapped values.
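The calling convention documented above can be mimicked with a small generator. This is a hypothetical stand-in, not a tsbootstrap class: the name bootstrap_sketch, its seed argument, and the use of plain i.i.d. row resampling are all assumptions made for illustration.

```python
import math

import numpy as np


def bootstrap_sketch(X, n_bootstraps=3, return_indices=False, test_ratio=None, seed=0):
    """Hypothetical generator mimicking the documented calling convention."""
    X = np.asarray(X)
    if test_ratio:
        # Drop ceil(test_ratio * n) points from the end before resampling.
        n_test = math.ceil(test_ratio * len(X))
        X = X[: len(X) - n_test]
    rng = np.random.default_rng(seed)
    for _ in range(n_bootstraps):
        # Plain i.i.d. resampling of rows stands in for a real time series scheme.
        indices = rng.integers(0, len(X), size=len(X))
        yield (X[indices], indices) if return_indices else X[indices]


X = np.arange(20, dtype=float).reshape(10, 2)

# With test_ratio=0.25, ceil(2.5) = 3 rows are removed, leaving 7 rows to resample.
pairs = list(bootstrap_sketch(X, return_indices=True, test_ratio=0.25))
X_boot_0, indices_0 = pairs[0]
```

In this sketch the indexed rows coincide exactly with the bootstrapped values; for residual-based methods the indices refer to resampled residuals, so that need not hold.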

get_n_bootstraps(X=None, y=None) → int

Returns the number of bootstrap instances produced by the bootstrap.

Parameters:

X (2D array-like of shape (n_timepoints, n_features)) – The endogenous time series to bootstrap. Dimension 0 is assumed to be the time dimension, ordered.
y (array-like of shape (n_timepoints, n_features_exog), default=None) – Exogenous time series to use in bootstrapping.

Returns:

The number of bootstrap instances produced by the bootstrap.

Return type:

int