Block Length Sampler

class tsbootstrap.block_length_sampler.BlockLengthSampler(*, avg_block_length: Annotated[int, Gt(gt=0)] = 2, block_length_distribution: DistributionTypes | None = None, rng: Generator | Integral | None = None, tags: dict[str, str] = None, **data)[source]

A class for sampling block lengths for the random block length bootstrap.

This class provides functionality to sample block lengths from various probability distributions. It is used in time series bootstrapping methods where variable block lengths are required.

Parameters:
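  • avg_block_length (PositiveInt, optional) – The average block length used for sampling. Must be a positive integer; values below MIN_AVG_BLOCK_LENGTH trigger a warning and are set to MIN_AVG_BLOCK_LENGTH (see validate_avg_block_length). Default is DEFAULT_AVG_BLOCK_LENGTH (2 in the signature above).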

  • block_length_distribution (Optional[Union[str, DistributionTypes]], optional) – The probability distribution to use for sampling block lengths. Must be one of the values in DistributionTypes or a corresponding string. Default is None.

  • rng (RngTypes, optional) – Random number generator used for sampling: either a Generator instance or an integer seed, for reproducibility. If omitted, a new default generator is created.

avg_block_length

The average block length used for sampling.

Type:

PositiveInt

block_length_distribution

The selected probability distribution for block length sampling.

Type:

Optional[DistributionTypes]

rng

The random number generator used for sampling.

Type:

RngTypes

sample_block_length()[source]

Sample a block length from the selected distribution.

Examples

>>> from tsbootstrap.block_length_sampler import BlockLengthSampler, DistributionTypes
>>> sampler = BlockLengthSampler(avg_block_length=5, block_length_distribution=DistributionTypes.GAMMA)
>>> block_length = sampler.sample_block_length()
>>> print(block_length)  # random draw: output varies between runs
6
>>> sampler_str = BlockLengthSampler(avg_block_length=5, block_length_distribution="gamma")
>>> block_length_str = sampler_str.sample_block_length()
>>> print(block_length_str)  # random draw: output varies between runs
7
>>> sampler_none = BlockLengthSampler(avg_block_length=5)
>>> block_length_none = sampler_none.sample_block_length()
>>> print(block_length_none)
5

Notes

The class uses Pydantic for data validation and settings management. It inherits from both pydantic.BaseModel and skbase.base.BaseObject.

model_computed_fields: ClassVar[Dict[str, ComputedFieldInfo]] = {}

A dictionary of computed field names and their corresponding ComputedFieldInfo objects.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True, 'extra': 'allow', 'validate_assignment': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

model_fields: ClassVar[Dict[str, FieldInfo]] = {'avg_block_length': FieldInfo(annotation=int, required=False, default=2, description='The average block length to use for sampling.', metadata=[Gt(gt=0)]), 'block_length_distribution': FieldInfo(annotation=Union[DistributionTypes, NoneType], required=False, default=None, description='The probability distribution to use for sampling block lengths. Must be one of the values in `DistributionTypes` or a corresponding string.'), 'rng': FieldInfo(annotation=Union[Generator, Integral, NoneType], required=False, default_factory=builtin_function_or_method, description='Random number generator for sampling.'), 'tags': FieldInfo(annotation=dict[str, str], required=False, default_factory=<lambda>, exclude=True)}

Metadata about the fields defined on the model: a mapping of field names to pydantic.fields.FieldInfo objects.

This replaces Model.__fields__ from Pydantic V1.

sample_block_length() int[source]

Sample a block length from the selected distribution.

This method uses the configured distribution type and parameters to generate a random block length.

Returns:

A sampled block length. The returned value is always an integer and is at least MIN_BLOCK_LENGTH.

Return type:

int

Notes

The raw sample may be a float, since several of the sampling functions return float. It is rounded to the nearest integer and then raised to MIN_BLOCK_LENGTH if it falls below that minimum.
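
The rounding and lower bound described in the note above can be sketched as follows. This is a minimal illustration, not the library's code: the helper name finalize_block_length is hypothetical, and the value 1 for MIN_BLOCK_LENGTH is an assumption, since this page does not state the constant.

```python
# Assumed value for illustration; the real MIN_BLOCK_LENGTH is not shown on this page.
MIN_BLOCK_LENGTH = 1


def finalize_block_length(raw_sample: float) -> int:
    """Round a raw draw to the nearest integer, then enforce the lower bound."""
    rounded = int(round(raw_sample))
    return max(rounded, MIN_BLOCK_LENGTH)


print(finalize_block_length(0.2))  # rounds to 0, then raised to the minimum
print(finalize_block_length(4.7))  # rounds to 5
```

Python's built-in round resolves exact ties to the nearest even integer, so round(2.5) gives 2. Whether the library rounds ties the same way is not stated here.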

classmethod validate_avg_block_length(v: int) int[source]

Validate that avg_block_length is greater than or equal to MIN_AVG_BLOCK_LENGTH.

If v is less than MIN_AVG_BLOCK_LENGTH, issue a warning and set it to MIN_AVG_BLOCK_LENGTH.

Parameters:

v (int) – The average block length to validate.

Returns:

The validated (and possibly adjusted) average block length.

Return type:

int
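
The warn-and-adjust behaviour described for this validator might look like the sketch below. It assumes a value of 2 for MIN_AVG_BLOCK_LENGTH, which this page does not state, and uses a hypothetical standalone function clamp_avg_block_length rather than the actual Pydantic validator.

```python
import warnings

# Assumed value for illustration; the real MIN_AVG_BLOCK_LENGTH is not shown on this page.
MIN_AVG_BLOCK_LENGTH = 2


def clamp_avg_block_length(v: int) -> int:
    """Warn and raise v to the minimum when it falls below it."""
    if v < MIN_AVG_BLOCK_LENGTH:
        warnings.warn(
            f"avg_block_length={v} is below {MIN_AVG_BLOCK_LENGTH}; "
            f"using {MIN_AVG_BLOCK_LENGTH} instead.",
            stacklevel=2,
        )
        return MIN_AVG_BLOCK_LENGTH
    return v


# Capture the warning so the adjustment can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    adjusted = clamp_avg_block_length(1)

print(adjusted, len(caught))
```

Warning instead of raising lets a too-small value through with a visible notice, which matches the "issue a warning and set it" wording above.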

classmethod validate_block_length_distribution(v: str | DistributionTypes | None) DistributionTypes | None[source]

Validate and normalize the block length distribution input.

String inputs for block_length_distribution are lowercased and then converted to the matching DistributionTypes enum member. None is also accepted and handled.

Parameters:

v (Optional[Union[str, DistributionTypes]]) – The input block length distribution to validate.

Returns:

The validated and normalized block length distribution.

Return type:

Optional[DistributionTypes]

Raises:

ValueError – If the input string is not a valid DistributionTypes value.
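
A rough sketch of this kind of normalization is shown below. The enum Dist is a hypothetical two-member stand-in for DistributionTypes, and returning None unchanged is an assumption based on the Optional return type. The real validator may differ in its details.

```python
from enum import Enum
from typing import Optional, Union


class Dist(Enum):
    """Hypothetical stand-in for DistributionTypes with two members."""

    GAMMA = "gamma"
    UNIFORM = "uniform"


def normalize_distribution(v: Union[str, Dist, None]) -> Optional[Dist]:
    """Lowercase string inputs and map them to an enum member."""
    if v is None or isinstance(v, Dist):
        return v  # nothing to convert
    try:
        return Dist(v.lower())
    except ValueError:
        valid = [member.value for member in Dist]
        raise ValueError(
            f"Unknown distribution {v!r}; expected one of {valid}"
        ) from None


print(normalize_distribution("GAMMA"))
```

Lowercasing first means "gamma", "Gamma", and "GAMMA" all resolve to the same member.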

classmethod validate_rng_field(v: Generator | int | None) Generator[source]

Validate the random number generator.

This method accepts an existing Generator, an integer seed, or None, and returns a Generator in each case.

Parameters:

v (Union[Generator, int, None]) – The input random number generator to validate.

Returns:

The validated random number generator.

Return type:

Generator

Raises:

ValueError – If the input is not a valid random number generator or seed.
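
The general pattern of coercing a generator, a seed, or None into a generator can be sketched with the standard library alone. In this sketch random.Random stands in for the Generator type, and coerce_rng is a hypothetical name. With NumPy, the analogous constructor is numpy.random.default_rng.

```python
import random
from typing import Union


def coerce_rng(v: Union[random.Random, int, None]) -> random.Random:
    """Return a generator given a generator, an integer seed, or None."""
    if v is None:
        return random.Random()  # fresh, unseeded generator
    if isinstance(v, random.Random):
        return v  # already a generator: use it as-is
    if isinstance(v, int) and not isinstance(v, bool):
        return random.Random(v)  # integer seed
    raise ValueError(
        f"Expected a generator, an integer seed, or None; got {type(v).__name__}"
    )


a = coerce_rng(123).random()
b = coerce_rng(123).random()
print(a == b)  # the same seed gives the same first draw
```

Passing the same integer seed to two samplers is the usual way to make their draws repeatable.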

class tsbootstrap.block_length_sampler.DistributionRegistry[source]

Registry for managing supported distributions and their sampling functions.

classmethod get_sampler(distribution: DistributionTypes) Callable[[Generator, int], int | float][source]

Retrieve the sampling function for a given distribution.

Parameters:

distribution (DistributionTypes) – The distribution type for which to retrieve the sampling function.

Returns:

The sampling function associated with the distribution.

Return type:

DistributionSamplerFunc

Raises:

ValueError – If the distribution is not registered.

classmethod register_distribution(distribution: DistributionTypes, sampler_func: Callable[[Generator, int], int | float]) None[source]

Register a new distribution and its sampling function.

Parameters:
  • distribution (DistributionTypes) – The distribution type to register.

  • sampler_func (DistributionSamplerFunc) – The sampling function corresponding to the distribution.

Raises:

ValueError – If the distribution is already registered.
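
The register/lookup contract described for this registry can be illustrated with a small standalone sketch. SamplerRegistry is a hypothetical stand-in keyed by plain strings rather than DistributionTypes, and random.Random replaces the Generator type. It mirrors only the documented behaviour: ValueError on duplicate registration and on unknown lookups.

```python
import random
from typing import Callable, Dict, Union

SamplerFunc = Callable[[random.Random, int], Union[int, float]]


class SamplerRegistry:
    """Hypothetical stand-in for DistributionRegistry, keyed by plain strings."""

    _samplers: Dict[str, SamplerFunc] = {}

    @classmethod
    def register_distribution(cls, name: str, sampler_func: SamplerFunc) -> None:
        if name in cls._samplers:
            raise ValueError(f"Distribution {name!r} is already registered")
        cls._samplers[name] = sampler_func

    @classmethod
    def get_sampler(cls, name: str) -> SamplerFunc:
        try:
            return cls._samplers[name]
        except KeyError:
            raise ValueError(f"Distribution {name!r} is not registered") from None


def sample_constant(rng: random.Random, avg_block_length: int) -> int:
    """Ignore the generator and return the average block length."""
    return avg_block_length


SamplerRegistry.register_distribution("none", sample_constant)
sampler = SamplerRegistry.get_sampler("none")
print(sampler(random.Random(0), 5))  # prints 5
```

Keeping every sampler behind the same (rng, avg_block_length) signature lets calling code look up a function by distribution and call it without special cases.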

tsbootstrap.block_length_sampler.sample_beta(rng: Generator, avg_block_length: int) float[source]

Sample from a Beta distribution.

tsbootstrap.block_length_sampler.sample_exponential(rng: Generator, avg_block_length: int) float[source]

Sample from an Exponential distribution.

tsbootstrap.block_length_sampler.sample_gamma(rng: Generator, avg_block_length: int) float[source]

Sample from a Gamma distribution.

tsbootstrap.block_length_sampler.sample_geometric(rng: Generator, avg_block_length: int) int[source]

Sample from a Geometric distribution.

tsbootstrap.block_length_sampler.sample_lognormal(rng: Generator, avg_block_length: int) float[source]

Sample from a Lognormal distribution.

tsbootstrap.block_length_sampler.sample_none(rng: Generator, avg_block_length: int) int[source]

Return the average block length.

tsbootstrap.block_length_sampler.sample_normal(rng: Generator, avg_block_length: int) float[source]

Sample from a Normal distribution.

tsbootstrap.block_length_sampler.sample_pareto(rng: Generator, avg_block_length: int) float[source]

Sample from a Pareto distribution.

tsbootstrap.block_length_sampler.sample_poisson(rng: Generator, avg_block_length: int) int[source]

Sample from a Poisson distribution.

tsbootstrap.block_length_sampler.sample_uniform(rng: Generator, avg_block_length: int) int[source]

Sample from a Uniform distribution.

tsbootstrap.block_length_sampler.sample_weibull(rng: Generator, avg_block_length: int) float[source]

Sample from a Weibull distribution.
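
All of the sampling functions above share the signature (rng, avg_block_length). This page does not say how each distribution is parameterized, so the sketch below assumes that the mean of the draw equals avg_block_length. It uses random.Random in place of the Generator type, and the function names are hypothetical. For an exponential distribution that means a rate of 1 / avg_block_length, and for a geometric distribution on 1, 2, ... a success probability of 1 / avg_block_length.

```python
import math
import random


def sample_exponential_sketch(rng: random.Random, avg_block_length: int) -> float:
    """Exponential draw with mean avg_block_length (rate = 1 / mean)."""
    return rng.expovariate(1.0 / avg_block_length)


def sample_geometric_sketch(rng: random.Random, avg_block_length: int) -> int:
    """Geometric draw on {1, 2, ...} with success probability 1 / avg_block_length."""
    p = 1.0 / avg_block_length
    if p >= 1.0:
        return 1  # with p = 1 the first trial always succeeds
    u = rng.random()  # uniform on [0, 1)
    # Inverse-transform sampling: smallest k with 1 - (1 - p)**k > u.
    return int(math.floor(math.log(1.0 - u) / math.log(1.0 - p))) + 1


# Check empirically that both samplers average close to the requested length.
rng = random.Random(42)
n = 20_000
mean_exp = sum(sample_exponential_sketch(rng, 5) for _ in range(n)) / n
mean_geo = sum(sample_geometric_sketch(rng, 5) for _ in range(n)) / n
print(round(mean_exp, 1), round(mean_geo, 1))
```

Both empirical means should land close to 5. The float returned by the exponential sketch would still need the rounding and minimum-length step described under sample_block_length.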