API Reference

API documentation for exhaustive search with Bayesian model averaging.

Estimators

class exhbma.exhaustive_search.ExhaustiveLinearRegression(sigma_noise_points: List[RandomVariable], sigma_coef_points: List[RandomVariable], alpha: float = 0.5, exclude_null: bool = False)

ExhaustiveSearchModel with linear_model.LinearRegression

  • Intercept of the linear model is assumed to be zero.

    • Assume that the target variable y is centered.

    • Assume that all features x are centered and normalized.
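The preprocessing assumed above can be sketched in plain Python. This is only an illustration: the helper names are hypothetical, and "normalized" is taken here to mean unit population standard deviation, which is an assumption rather than something this reference states.

```python
from statistics import fmean, pstdev


def standardize_columns(X):
    """Center each column of X to mean 0 and scale it to unit std.

    Assumption: "normalized" means unit population standard deviation.
    A constant column (std == 0) would raise ZeroDivisionError here.
    """
    columns = list(zip(*X))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def center(y):
    """Subtract the mean from the target values."""
    m = fmean(y)
    return [v - m for v in y]


X = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
y = [1.0, 2.0, 6.0]
X_std = standardize_columns(X)
y_centered = center(y)
```

With data prepared this way, a zero intercept is a natural modelling choice because both sides of the regression have mean zero.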

Parameters:
  • sigma_noise_points (List[RandomVariable]) – Data points at which the sigma_noise parameter is explored in the exhaustive search.

  • sigma_coef_points (List[RandomVariable]) – Data points at which the sigma_coef parameter is explored in the exhaustive search.

  • alpha (float (default: 0.5)) – Alpha parameter is fixed to this value.

  • exclude_null (bool (default: False)) – Whether to exclude the null model.

n_features_in_

Number of features seen during fit.

Type:

int

coef_

Coefficients of the regression model (mean of distribution).

Type:

List[float]

log_likelihood_

Log-likelihood of the model. Marginalization is performed over sigma_noise, sigma_coef, indicators.

Type:

float

log_likelihood_over_sigma_

Log-likelihood over \(\sigma_{noise}\) and \(\sigma_{coef}\), \(p(y| \sigma_{noise}, \sigma_{coef}, X)\), which is marginalized over indicators. The prior distributions of both sigmas are not included.

Type:

List[List[float]]

feature_posteriors_

Posterior probabilities for each feature.

Type:

List[float]

indicators_

List of indicator vectors. The null model [0, 0, …, 0] is excluded, so the length is \(2^{n\_features\_in\_} - 1\).

Type:

List[List[int]]
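As an illustration of the enumeration described above, the following sketch builds every 0/1 indicator vector and drops the null model. The function name and the ordering of the vectors are assumptions; the library may order them differently.

```python
from itertools import product


def make_indicators(n_features, exclude_null=True):
    """Enumerate all 0/1 indicator vectors of length n_features.

    Hypothetical helper: the ordering produced by itertools.product
    is an assumption, not necessarily the library's ordering.
    """
    indicators = [list(bits) for bits in product([0, 1], repeat=n_features)]
    if exclude_null:
        # Drop the all-zero vector, i.e. the model with no features.
        indicators = [ind for ind in indicators if any(ind)]
    return indicators


indicators = make_indicators(3)
```

For three features this yields 2^3 - 1 = 7 vectors, which is why exhaustive search is only practical for a modest number of features.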

log_priors_

List of log-prior probabilities for each model specified by indicator.

Type:

List[float]

log_likelihoods_

List of log-likelihoods for each model specified by indicator.

Type:

List[float]
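The sketch below shows how quantities like log_priors_ and log_likelihoods_ are typically combined in Bayesian model averaging to obtain posterior model weights and per-feature posteriors such as feature_posteriors_. It follows the textbook formulation and is not taken from the library's source; the function names and input numbers are made up for illustration.

```python
import math


def model_posteriors(log_likelihoods, log_priors):
    """Normalize exp(log-likelihood + log-prior) with a log-sum-exp shift."""
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    shift = max(log_joint)
    log_norm = shift + math.log(sum(math.exp(v - shift) for v in log_joint))
    return [math.exp(v - log_norm) for v in log_joint]


def feature_posteriors(indicators, weights):
    """Sum the posterior weights of every model that includes each feature."""
    n_features = len(indicators[0])
    return [
        sum(w for ind, w in zip(indicators, weights) if ind[j])
        for j in range(n_features)
    ]


# Illustrative inputs (not real results).
indicators = [[0, 1], [1, 0], [1, 1]]
log_likelihoods = [-12.0, -10.0, -10.5]
log_priors = [math.log(1 / 3)] * 3

weights = model_posteriors(log_likelihoods, log_priors)
posteriors = feature_posteriors(indicators, weights)
```

Subtracting the maximum before exponentiating keeps the computation stable when log-likelihoods are large and negative.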

models_

Information for all models specified by indicator vector. Its length equals that of indicators_, and entries at the same index correspond to each other.

Type:

List[ModelInfo]

fit(X: ndarray, y: ndarray, verbose: bool = True)

Train a model.

  1. Create indicator vectors.

  2. Fit a sub-model for each indicator vector, marginalizing over sigma_noise and sigma_coef.

  3. Calculate the final model as an average over the sub-models.
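The final averaging step can be sketched as a weighted sum of sub-model coefficients, where a feature absent from a sub-model contributes zero. This is a minimal illustration of the general idea; the function name, the data layout, and the numbers are assumptions, not the library's internals.

```python
def average_coefficients(indicators, sub_model_coefs, weights):
    """Average sub-model coefficients using posterior model weights.

    sub_model_coefs[k] holds one coefficient per active feature of
    indicators[k], in feature order (a hypothetical layout).
    """
    n_features = len(indicators[0])
    averaged = [0.0] * n_features
    for indicator, coefs, weight in zip(indicators, sub_model_coefs, weights):
        active = [j for j, flag in enumerate(indicator) if flag]
        for j, c in zip(active, coefs):
            averaged[j] += weight * c
    return averaged


# Illustrative inputs (not real results).
indicators = [[0, 1], [1, 0], [1, 1]]
sub_model_coefs = [[2.0], [1.0], [0.8, 1.5]]
weights = [0.2, 0.5, 0.3]

coef = average_coefficients(indicators, sub_model_coefs, weights)
```

Here the first coefficient is 0.5 * 1.0 + 0.3 * 0.8 and the second is 0.2 * 2.0 + 0.3 * 1.5.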

Parameters:
  • X (np.ndarray with shape (n_data, n_features)) – Feature matrix. Each row corresponds to a single data point.

  • y (np.ndarray with shape (n_data,)) – Target value vector.

  • verbose (bool (default: True)) – If True, a progress bar is displayed.

predict(X: ndarray, mode: str, threshold: float = 0.5) → ndarray

Predict target values for new data using the fitted model.

Parameters:
  • X (np.ndarray with shape (n_data, n_features)) – Feature matrix for prediction.

  • mode (str) – Mode of prediction.

  • threshold (float (default: 0.5)) – Feature selection threshold used in mode ‘select’.

Returns:

y – Prediction values.

Return type:

np.ndarray

select_variables(threshold: float = 0.5) → List[int]

Return the indicator of features whose posterior probability is greater than or equal to threshold.
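The thresholding rule can be illustrated as follows. This reference does not say whether the returned list holds feature indices or a 0/1 vector; the sketch assumes indices, and the function name is hypothetical.

```python
def select_by_threshold(feature_posteriors, threshold=0.5):
    """Return indices of features with posterior probability >= threshold.

    Assumption: the result is a list of indices; a 0/1 indicator vector
    would be an equally plausible reading of the description.
    """
    return [i for i, p in enumerate(feature_posteriors) if p >= threshold]


selected = select_by_threshold([0.91, 0.12, 0.50, 0.49])
```

Note that the comparison is inclusive, so a feature whose posterior is exactly equal to the threshold is selected.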

class exhbma.linear_regression.LinearRegression(sigma_noise: float, sigma_coef: float)

Model description:

  • Base model: Linear regression model

  • Observation noise: Gaussian distribution

  • Prior distribution for coefficient: Gaussian distribution

  • Intercept of the linear model is assumed to be zero.

    • Assume that the target variable y is centered.

    • Assume that all features x are centered and normalized.

    • Marginalization over intercept is performed.

Parameters:
  • sigma_noise (float) – Standard deviation of the Gaussian noise. The model assumes that observation noise follows a Gaussian distribution with mean = 0 and variance = sigma_noise^2.

  • sigma_coef (float) – Standard deviation of the Gaussian prior on the coefficients. The model assumes that each coefficient independently follows a Gaussian distribution with mean = 0 and variance = sigma_coef^2.
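To show how the two parameters interact, here is the textbook posterior for a single-feature linear model with Gaussian noise and a zero-mean Gaussian prior on the coefficient. It is a simplified sketch: it ignores the intercept marginalization mentioned above, covers one feature only, and uses a hypothetical function name.

```python
def posterior_coef_1d(x, y, sigma_noise, sigma_coef):
    """Posterior mean and std of w in y = w * x + noise.

    Standard result: mean = sum(x*y) / (sum(x*x) + sigma_noise^2 / sigma_coef^2).
    """
    ridge = (sigma_noise / sigma_coef) ** 2
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    precision = sxx + ridge
    mean = sxy / precision
    std = sigma_noise / precision ** 0.5
    return mean, std


# Illustrative centered data.
x = [-1.0, 0.0, 1.0]
y = [-2.1, 0.1, 2.0]

mean_wide, std_wide = posterior_coef_1d(x, y, sigma_noise=0.5, sigma_coef=10.0)
mean_tight, _ = posterior_coef_1d(x, y, sigma_noise=0.5, sigma_coef=0.1)
```

A large sigma_coef leaves the estimate close to the least-squares value, while a small sigma_coef shrinks it toward zero.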

n_features_in_

Number of features seen during fit.

Type:

int

coef_

Coefficients of the regression model (mean of distribution).

Type:

List[float]

log_likelihood_

Log-likelihood of the model for the given sigma_noise and sigma_coef.

Type:

float

fit(X: ndarray, y: ndarray, skip_preprocessing_validation: bool = False)

Calculate the coefficients used in prediction and the log-likelihood for the given data.

Parameters:
  • X (np.ndarray with shape (n_data, n_features)) – Feature matrix. Each row corresponds to a single data point.

  • y (np.ndarray with shape (n_data,)) – Target value vector.

predict(X)

Predict using the trained model.

Parameters:
  • X (np.ndarray with shape (n_data, n_features)) – Feature matrix for prediction.

Utilities

Utility classes and functions for constructing estimators.

class exhbma.scaler.StandardScaler(n_dim, scaling: bool = True)

exhbma.probabilities.gamma(x: ndarray, low: float | None = None, high: float | None = None, shape: float = 0.001, scale: float = 1000.0) → List[RandomVariable]

Gamma distribution: p(x) = (const.) * x^(shape - 1) * exp(-x / scale). The distribution is restricted to the range [low, high]. If low (high) is None, it is set to the minimum (maximum) value of x.
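One property of the default parameters worth seeing numerically: with shape = 0.001 and scale = 1000, the unnormalized density x^(shape - 1) * exp(-x / scale) stays close to 1 / x over a moderate range. The sketch below checks this with a hypothetical helper; it does not reproduce how the library normalizes or truncates the distribution.

```python
import math


def gamma_unnormalized(x, shape=0.001, scale=1000.0):
    """Unnormalized gamma density x^(shape - 1) * exp(-x / scale)."""
    return x ** (shape - 1.0) * math.exp(-x / scale)


points = [0.01, 0.1, 1.0, 10.0]

# Ratio to 1 / x; values near 1 mean the density behaves like 1 / x.
ratios = [gamma_unnormalized(p) * p for p in points]
```

For these points every ratio lies within one percent of 1, so the default acts as a weakly informative, nearly scale-invariant prior.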

exhbma.probabilities.inverse(x: ndarray, low: float | None = None, high: float | None = None) → List[RandomVariable]

x-inverse distribution: p(x) = 1 / x. This distribution becomes proper when a finite interval is considered.
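The remark about finite intervals can be checked directly: on [low, high] with 0 < low < high, the integral of 1 / x is ln(high / low), so dividing by that constant gives a proper density. The sketch below verifies this numerically. The function names are hypothetical, and whether the library applies this normalization is not stated here.

```python
import math


def inverse_density(x, low, high):
    """Normalized 1 / x density on [low, high], zero outside."""
    if not 0 < low < high:
        raise ValueError("require 0 < low < high")
    if x < low or x > high:
        return 0.0
    return 1.0 / (x * math.log(high / low))


def trapezoid(ys, xs):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    )


low, high, n = 0.1, 10.0, 2001
# Log-spaced grid, which suits a density that varies like 1 / x.
grid = [low * (high / low) ** (i / (n - 1)) for i in range(n)]
grid[-1] = high  # guard against rounding past the upper bound
total = trapezoid([inverse_density(g, low, high) for g in grid], grid)
```

On an unbounded range such as (0, ∞) the same integral diverges, which is why the bounds are needed.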

exhbma.probabilities.uniform(x: ndarray, low: float = 0.0, high: float = 1.0) → List[RandomVariable]

Uniform distribution: p(x) = 1 / (high - low)

exhbma.integrate.integrate_log_values_in_line(log_values: List[float], x1: List[float], weights: List[float] | None = None, expect_positive: bool = True)

Integrate along a line.

Parameters:
  • log_values (1-dimensional array with length len(x1)) – Log values to be integrated along x1.

  • x1 (List[float]) – 1st axis points.

  • weights (Optional[List[float]] (default: None)) – Weights for log_values.

  • expect_positive (bool (default: True)) – If True, ValueError is raised when the result is not positive.
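Integrating values that are stored as logarithms is usually done by shifting by the maximum before exponentiating, so that very negative log values do not underflow. The sketch below does this with the trapezoidal rule and returns the log of the integral. The quadrature rule, the return convention, and the function name are all assumptions and may differ from the library's implementation.

```python
import math


def log_integral_line(log_values, x1, weights=None):
    """Return log of the integral of weight * exp(log_value) over x1.

    Hypothetical helper using the trapezoidal rule and a max-shift
    for numerical stability.
    """
    if weights is None:
        weights = [1.0] * len(x1)
    shift = max(log_values)
    scaled = [w * math.exp(lv - shift) for w, lv in zip(weights, log_values)]
    area = sum(
        0.5 * (scaled[i] + scaled[i + 1]) * (x1[i + 1] - x1[i])
        for i in range(len(x1) - 1)
    )
    if area <= 0:
        raise ValueError("integral is not positive")
    return shift + math.log(area)


x1 = [i / 1000 for i in range(1001)]
# Integrand exp(-x) on [0, 1]; the exact integral is 1 - exp(-1).
log_result = log_integral_line([-x for x in x1], x1)
# Same integrand scaled by exp(-2000), which underflows if exponentiated naively.
log_result_tiny = log_integral_line([-x - 2000.0 for x in x1], x1)
```

The second call shows the benefit of the shift: the result is simply the first result minus 2000, even though exp(-2000) is not representable as a float.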

exhbma.integrate.integrate_log_values_in_square(log_values: List[List[float]], x1: List[float], x2: List[float], weights: List[List[float]] | None = None, expect_positive: bool = True)

Integrate over a box region: \(\iint \mathrm{weight} \cdot \exp(\mathrm{log\_value}) \, dx_1 \, dx_2\).

Parameters:
  • log_values (2-dimensional array with shape (len(x1), len(x2))) – Log values to be integrated over the x1 and x2 axes.

  • x1 (List[float]) – 1st axis points.

  • x2 (List[float]) – 2nd axis points.

  • weights (Optional[List[List[float]]] (default: None)) – Weights for log_values.

  • expect_positive (bool (default: True)) – If True, ValueError is raised when the result is not positive.
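A two-dimensional version of the same idea can be sketched with a tensor-product trapezoidal rule: compute 1-D quadrature weights for each axis, then sum weight * exp(log_value) over the grid after shifting by the maximum. As before, the rule, the log-valued return, and the names are assumptions for illustration only.

```python
import math


def trapezoid_weights(x):
    """Trapezoidal quadrature weights for a possibly non-uniform 1-D grid."""
    w = [0.0] * len(x)
    for i in range(len(x) - 1):
        half = 0.5 * (x[i + 1] - x[i])
        w[i] += half
        w[i + 1] += half
    return w


def log_integral_square(log_values, x1, x2, weights=None):
    """Return log of the double integral of weight * exp(log_value)."""
    q1, q2 = trapezoid_weights(x1), trapezoid_weights(x2)
    shift = max(max(row) for row in log_values)
    total = 0.0
    for i, row in enumerate(log_values):
        for j, lv in enumerate(row):
            w = 1.0 if weights is None else weights[i][j]
            total += q1[i] * q2[j] * w * math.exp(lv - shift)
    if total <= 0:
        raise ValueError("integral is not positive")
    return shift + math.log(total)


x1 = [i / 100 for i in range(101)]
x2 = [j / 100 for j in range(101)]
# Integrand exp(-(a + b)) on the unit square; exact value is (1 - exp(-1))^2.
log_values = [[-(a + b) for b in x2] for a in x1]
log_result_2d = log_integral_square(log_values, x1, x2)
```

In the estimator's setting, the two axes would correspond to the sigma_noise and sigma_coef grids, with the prior densities playing the role of the weights; that mapping is an interpretation of the descriptions above rather than a documented detail.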