Module `areixio.utils.preprocessor`

Classes

class FirstValueScaler

StandardScaler: It scales the data to have a mean of 0 and a standard deviation of 1. It is useful when the data is normally distributed and the outliers are not significant.

MinMaxScaler: It scales the data to a given range, typically between 0 and 1. It is useful when the data is not normally distributed and the outliers are not significant.

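MaxAbsScaler: It scales each feature by its maximum absolute value, so the results lie in the range [-1, 1]. It does not shift or center the data, which makes it suitable for sparse data or data that is already centered at zero. Like MinMaxScaler, it is sensitive to outliers, because a single extreme value determines the scale.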

RobustScaler: It scales the data based on the median and the interquartile range (IQR) instead of the mean and the standard deviation. It is useful when the data has significant outliers and the distribution is not normal.

The choice of scaling technique depends on the distribution of the data and the presence of outliers. StandardScaler suits roughly normal data without significant outliers. MinMaxScaler suits data that must fall in a fixed range and has no significant outliers. MaxAbsScaler suits sparse or zero-centered data. RobustScaler is the one to use when outliers are significant, because the median and the IQR are barely affected by extreme values.
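The four techniques described above can be sketched on a single feature column using only the Python standard library. This is a minimal illustration of the textbook formulas, not the code of the classes in this module; the function names are hypothetical, and the sketch assumes the column is not constant (otherwise the divisors are zero).

```python
import statistics

def standard_scale(xs):
    """(x - mean) / population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

def min_max_scale(xs, lo=0.0, hi=1.0):
    """Map the column linearly onto [lo, hi]."""
    x_min, x_max = min(xs), max(xs)
    return [lo + (x - x_min) * (hi - lo) / (x_max - x_min) for x in xs]

def max_abs_scale(xs):
    """Divide by the largest absolute value; no shifting."""
    biggest = max(abs(x) for x in xs)
    return [x / biggest for x in xs]

def robust_scale(xs):
    """(x - median) / interquartile range."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]

# One outlier (100.0) in an otherwise small column.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled_standard = standard_scale(data)
scaled_min_max = min_max_scale(data)
scaled_max_abs = max_abs_scale(data)
scaled_robust = robust_scale(data)
```

With this column, the outlier squeezes the four ordinary values into a narrow band under min-max and max-abs scaling, while the robust version keeps them evenly spread around zero.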

Ancestors

Scaler

Methods

def fit(self, X)
def transform(self, X)

class GaussianHMM (n_components, n_features)

A Gaussian hidden Markov model (HMM) with methods for fitting the model parameters and for predicting the most likely sequence of hidden states for a given observation sequence. The fit method initializes the model parameters randomly and then runs the Baum-Welch algorithm to estimate them from the observation sequence. The _forward_backward method computes the forward and backward probabilities under the current model parameters, and the _compute_statistics method computes the expected sufficient statistics from those probabilities. The _update_parameters method updates the model parameters.
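To make the forward-backward step concrete, here is a minimal sketch for one-dimensional observations, written with the standard library only. It is not the implementation of this class: the function names, the scaling scheme, and the restriction to scalar observations with fixed variances are assumptions made for illustration. It computes the state posteriors and then re-estimates the state means from them, which is one part of a Baum-Welch iteration.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def forward_backward(obs, start, trans, means, variances):
    """Return (gamma, log_likelihood), where gamma[t][i] = P(state_t = i | obs)."""
    T, K = len(obs), len(start)
    # Emission likelihoods B[t][i] = p(obs[t] | state i).
    B = [[gaussian_pdf(x, means[i], variances[i]) for i in range(K)] for x in obs]

    # Scaled forward pass: each alpha[t] is normalized to sum to 1.
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [start[i] * B[0][i] for i in range(K)]
        else:
            row = [B[t][j] * sum(alpha[t - 1][i] * trans[i][j] for i in range(K))
                   for j in range(K)]
        scale.append(sum(row))
        alpha.append([a / scale[t] for a in row])

    # Scaled backward pass, reusing the forward scaling factors.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(trans[i][j] * B[t + 1][j] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]

    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    return gamma, sum(math.log(s) for s in scale)

def reestimate_means(obs, gamma):
    """M-step for the means: posterior-weighted average of the observations."""
    K = len(gamma[0])
    return [sum(g[i] * x for g, x in zip(gamma, obs)) / sum(g[i] for g in gamma)
            for i in range(K)]

# Two well-separated regimes, around 0 and around 5.
obs = [0.1, -0.2, 5.1, 4.9]
gamma, log_likelihood = forward_backward(
    obs, start=[0.5, 0.5], trans=[[0.9, 0.1], [0.1, 0.9]],
    means=[0.0, 5.0], variances=[1.0, 1.0])
states = [max(range(2), key=lambda i: g[i]) for g in gamma]
new_means = reestimate_means(obs, gamma)
```

Note that taking the per-step argmax of gamma, as done here, gives the most probable state at each time step individually; a most likely state sequence is usually decoded with the Viterbi algorithm instead.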

Methods

def fit(self, X, max_iter=100)
def predict(self, X)

class KalmanFilter

https://www.quantstart.com/articles/kalman-filter-based-pairs-trading-strategy-in-qstrader

Briefly, a Kalman filter is a state-space model applicable to linear dynamic systems – systems whose state is time-dependent and state variations are represented linearly. The model is used to estimate unknown states of a variable based on a series of past values. The procedure is two-fold: a prediction (estimate) is made by the filter of the current state of a variable and the uncertainty of the estimate itself. When new data is available, these estimates are updated. There is a lot of information available about Kalman filters, and the variety of their applications is pretty astounding, but for now, we're going to use a Kalman filter to estimate the hedge ratio between a pair of equities.

The idea behind the strategy is pretty straightforward: take two equities that are cointegrated and create a long-short portfolio. The premise of this is that the spread between the value of our two positions should be mean-reverting. Anytime the spread deviates from its expected value, one of the assets moved in an unexpected direction and is due to revert back. When the spread diverges, you can take advantage of this by going long or short on the spread.

To illustrate, imagine you have a long position in AAPL worth $2000 and a short position in IBM worth $2000. This gives you a net spread of $0. Since you expect AAPL and IBM to move together, if the spread rises significantly above $0, you would short the spread in the expectation that it will return to $0, its natural equilibrium. Similarly, if the spread drops significantly below $0, you would go long the spread and capture the profit as its value returns to $0. In our application, the Kalman filter tracks the hedge ratio between the two equities so that the portfolio value stays stationary, which means it continues to exhibit mean-reverting behavior.
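The predict-then-update cycle can be sketched with a one-state filter that tracks only the hedge ratio. This is a simplified illustration, not the KalmanFilter class documented here: the class name, the constructor parameters and their defaults, and the return value of update are all assumptions, and the sketch drops any intercept term by modelling price_two as ratio * price_one plus noise, with the ratio following a random walk.

```python
import math

class ScalarHedgeRatioFilter:
    """Hypothetical one-state Kalman filter for a hedge ratio (illustration only)."""

    def __init__(self, process_var=1e-5, obs_var=1.0, init_ratio=0.0, init_var=1.0):
        self.ratio = init_ratio          # current hedge ratio estimate
        self.var = init_var              # uncertainty (variance) of that estimate
        self.process_var = process_var   # how fast the true ratio may drift
        self.obs_var = obs_var           # measurement noise variance

    def update(self, price_one, price_two):
        # Predict: a random walk keeps the ratio, but its uncertainty grows.
        var_pred = self.var + self.process_var
        # Forecast price_two from price_one and measure the surprise.
        error = price_two - self.ratio * price_one
        error_var = price_one * var_pred * price_one + self.obs_var
        # Update: move the ratio toward the observation by the Kalman gain.
        gain = var_pred * price_one / error_var
        self.ratio += gain * error
        self.var = (1.0 - gain * price_one) * var_pred
        return error, math.sqrt(error_var)

# Synthetic prices in which price_two is exactly twice price_one.
kf = ScalarHedgeRatioFilter()
for price_one in range(10, 60):
    error, error_std = kf.update(float(price_one), 2.0 * price_one)
```

On this synthetic series the estimated ratio settles close to 2. In a strategy, the forecast error plays the role of the spread: comparing it with its standard deviation is one possible way to decide when the spread has diverged enough to trade.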

Methods

def update(self, price_one, price_two)

class MaxAbsScaler

MaxAbsScaler: It scales each feature by its maximum absolute value, so the results lie in the range [-1, 1]. It does not shift or center the data, which makes it suitable for sparse data or data that is already centered at zero. Like MinMaxScaler, it is sensitive to outliers, because a single extreme value determines the scale.

Ancestors

Scaler

Methods

def fit(self, X)
def transform(self, X)

class MinMaxScaler

MinMaxScaler: It scales the data to a given range, typically between 0 and 1. It is useful when the data is not normally distributed and the outliers are not significant.

Ancestors

Scaler

Methods

def fit(self, X)
def transform(self, X)

class Normalizer (norm='l2')

Methods

def transform(self, X)

class PCA (n_components)

Principal component analysis (PCA) with a single method for fitting and transforming the data. The fit_transform method computes the principal components of the input data and projects the data onto them. It first centers the data by subtracting the mean of each feature, then computes the covariance matrix of the centered data. It then computes the eigenvectors and eigenvalues of the covariance matrix, sorts them by eigenvalue in descending order, and selects the top n_components eigenvectors. Finally, it projects the centered data onto the feature space defined by those eigenvectors.
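The steps above can be sketched in plain Python. This is an illustration, not the code of this class: the function names are hypothetical, and instead of a full eigendecomposition followed by sorting, the sketch swaps in power iteration with deflation, which finds the leading eigenvectors one at a time in descending order of eigenvalue. It also assumes the sample covariance (dividing by n - 1), which the description above does not specify.

```python
import math

def center(X):
    """Subtract each feature's mean; X is a list of equal-length rows."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]

def covariance(Xc):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpairs(C, k, iters=200):
    """Leading k (eigenvalue, eigenvector) pairs of a symmetric PSD matrix."""
    d = len(C)
    A = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        # Arbitrary non-uniform start vector, normalized to unit length.
        v = [float(i + 1) for i in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _step in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in the deflated matrix
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * A[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next search.
        for i in range(d):
            for j in range(d):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

def pca_fit_transform(X, n_components):
    """Center, find leading components, and project the centered data."""
    Xc = center(X)
    pairs = top_eigenpairs(covariance(Xc), n_components)
    projected = [[sum(a * b for a, b in zip(row, vec)) for _, vec in pairs]
                 for row in Xc]
    return projected, [lam for lam, _ in pairs]

# Most of the variance lies along the first axis.
X = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
Z, eigenvalues = pca_fit_transform(X, 1)
```

Power iteration keeps the sketch dependency-free; for real data a library eigensolver or an SVD is the usual choice, since power iteration converges slowly when eigenvalues are close together.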

Methods

def fit_transform(self, X)

class RobustScaler

RobustScaler: It scales the data based on the median and the interquartile range (IQR) instead of the mean and the standard deviation. It is useful when the data has significant outliers and the distribution is not normal.

Ancestors

Scaler

Methods

def fit(self, X)
def transform(self, X)

class Scaler

StandardScaler: It scales the data to have a mean of 0 and a standard deviation of 1. It is useful when the data is normally distributed and the outliers are not significant.

MinMaxScaler: It scales the data to a given range, typically between 0 and 1. It is useful when the data is not normally distributed and the outliers are not significant.

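MaxAbsScaler: It scales each feature by its maximum absolute value, so the results lie in the range [-1, 1]. It does not shift or center the data, which makes it suitable for sparse data or data that is already centered at zero. Like MinMaxScaler, it is sensitive to outliers, because a single extreme value determines the scale.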

Methods

def fit(self, X)
def fit_transform(self, X)
def transform(self, X)

class StandardScaler

StandardScaler: It scales the data to have a mean of 0 and a standard deviation of 1. It is useful when the data is normally distributed and the outliers are not significant.

Ancestors

Scaler

Methods

def fit(self, X)
def transform(self, X)