Module areixio.utils.preprocessor
Classes
class FirstValueScaler
-
StandardScaler: scales each feature to a mean of 0 and a standard deviation of 1. It works best when the data is roughly normally distributed, because both the mean and the standard deviation are pulled around by outliers.
MinMaxScaler: scales each feature to a given range, typically [0, 1]. It makes no assumption about the distribution, but a single extreme value stretches the range and compresses all other values, so it is only suitable when outliers are not significant.
MaxAbsScaler: divides each feature by its maximum absolute value, mapping it into [-1, 1] without shifting or centering the data, so zeros (and therefore sparsity) are preserved. Like MinMaxScaler, it is sensitive to outliers.
RobustScaler: centers each feature on its median and scales it by the interquartile range (IQR) instead of the mean and the standard deviation. Because the median and IQR ignore extreme values, it is the appropriate choice when the data has significant outliers.
The choice of scaler therefore depends on the distribution of the data and the presence of outliers: StandardScaler for roughly normal data, MinMaxScaler when a bounded range is needed and outliers are not significant, MaxAbsScaler for data that is already centered on zero or sparse, and RobustScaler when outliers are significant.
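To make the outlier claims above concrete, the snippet below applies the four scaling formulas directly with NumPy to a single feature containing one outlier, and prints how wide an interval the four ordinary values occupy after each scaling. It uses plain NumPy expressions rather than the areixio classes, and assumes the population standard deviation (ddof=0) and NumPy's default linearly interpolated percentiles for the IQR.

```python
import numpy as np

# One feature: four ordinary values and one outlier.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

q25, q75 = np.percentile(x, [25, 75])
scaled = {
    "standard": (x - x.mean()) / x.std(),             # mean 0, std 1
    "minmax": (x - x.min()) / (x.max() - x.min()),    # range [0, 1]
    "maxabs": x / np.abs(x).max(),                    # range [-1, 1]
    "robust": (x - np.median(x)) / (q75 - q25),       # median 0, IQR 1
}

for name, values in scaled.items():
    inliers = values[:4]
    # Width of the interval occupied by the four inliers after scaling.
    print(f"{name:8s} inlier spread = {inliers.max() - inliers.min():.3f}")
# standard inlier spread = 0.077
# minmax   inlier spread = 0.030
# maxabs   inlier spread = 0.030
# robust   inlier spread = 1.500
```

The single outlier squeezes the inliers into a sliver of the output range for every method except the robust one, whose median and IQR are not affected by the extreme value.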
Ancestors
Scaler
Methods
def fit(self, X)
def transform(self, X)
class GaussianHMM (n_components, n_features)
-
GaussianHMM is a hidden Markov model with Gaussian emissions. The fit method initializes the model parameters randomly and then runs the Baum-Welch (expectation-maximization) algorithm to estimate them from the observation sequence; the predict method returns the most likely sequence of hidden states for a given observation sequence. Internally, the _forward_backward method computes the forward and backward probabilities under the current model parameters, the _compute_statistics method derives the expected sufficient statistics from those probabilities, and the _update_parameters method updates the model parameters from these statistics.
Methods
def fit(self, X, max_iter=100)
def predict(self, X)
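A minimal sketch of the Baum-Welch loop described above, not the areixio implementation. It assumes diagonal-covariance Gaussian emissions, uses Rabiner-style scaling in the forward-backward pass to avoid numerical underflow, and replaces the private methods with hypothetical helper functions `emission_probs`, `forward_backward` and `baum_welch`. State decoding for `predict` (typically the Viterbi algorithm) is omitted.

```python
import numpy as np

def emission_probs(X, means, variances):
    """Diagonal-Gaussian density of each observation under each state, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    log_p = -0.5 * (np.log(2 * np.pi * variances)[None] + diff ** 2 / variances[None]).sum(axis=2)
    return np.exp(log_p)

def forward_backward(B, startprob, transmat):
    """Scaled forward-backward pass: state posteriors, pair posteriors, log-likelihood."""
    T, K = B.shape
    alpha, beta, c = np.zeros((T, K)), np.ones((T, K)), np.zeros(T)
    alpha[0] = startprob * B[0]
    c[0] = alpha[0].sum()
    alpha[0] /= c[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transmat) * B[t]
        c[t] = alpha[t].sum()  # scaling factor keeps each alpha row summing to 1
        alpha[t] /= c[t]
    for t in range(T - 2, -1, -1):
        beta[t] = transmat @ (B[t + 1] * beta[t + 1]) / c[t + 1]
    gamma = alpha * beta  # P(state_t = i | X)
    # P(state_t = i, state_t+1 = j | X)
    xi = (alpha[:-1, :, None] * transmat[None]
          * (B[1:] * beta[1:])[:, None, :] / c[1:, None, None])
    return gamma, xi, np.log(c).sum()

def baum_welch(X, n_components, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    T, d = X.shape
    # Random initialisation: means drawn from the data, shared variance, uniform probabilities.
    means = X[rng.choice(T, n_components, replace=False)]
    variances = np.tile(X.var(axis=0), (n_components, 1))
    startprob = np.full(n_components, 1.0 / n_components)
    transmat = np.full((n_components, n_components), 1.0 / n_components)
    log_liks = []
    for _ in range(max_iter):
        # E-step: expected sufficient statistics under the current parameters.
        gamma, xi, ll = forward_backward(emission_probs(X, means, variances), startprob, transmat)
        log_liks.append(ll)
        # M-step: re-estimate every parameter from those statistics.
        weights = gamma.sum(axis=0)[:, None]
        startprob = gamma[0]
        transmat = xi.sum(axis=0)
        transmat /= transmat.sum(axis=1, keepdims=True)
        means = gamma.T @ X / weights
        variances = np.maximum(gamma.T @ X ** 2 / weights - means ** 2, 1e-6)  # variance floor
    return dict(startprob=startprob, transmat=transmat, means=means,
                variances=variances, log_liks=log_liks)


# Two well-separated 2-D clusters as a toy observation sequence.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
model = baum_welch(X, n_components=2, max_iter=20)
```

Because each M-step maximizes the expected complete-data log-likelihood exactly, the recorded log-likelihoods should never decrease from one iteration to the next, which is a useful sanity check on any Baum-Welch implementation.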
class KalmanFilter
-
https://www.quantstart.com/articles/kalman-filter-based-pairs-trading-strategy-in-qstrader
In brief, a Kalman filter is a state-space model for linear dynamic systems, that is, systems whose state changes over time and whose state transitions are linear. It estimates the unknown state of a variable from a series of past, noisy observations. Each step has two parts: the filter first predicts the current state together with the uncertainty of that prediction, and then, when new data arrives, it updates both the estimate and its uncertainty. Kalman filters have a remarkably wide range of applications; here, we use one to estimate the hedge ratio between a pair of equities.
The idea behind the strategy is straightforward: take two cointegrated equities and build a long-short portfolio from them. Because the equities are cointegrated, the spread between the values of the two positions should be mean-reverting. Whenever the spread deviates from its expected value, one of the assets has moved in an unexpected direction and is expected to revert. When the spread diverges, you can take advantage of this by going long or short the spread.
To illustrate, imagine you hold a long position in AAPL worth $2000 and a short position in IBM worth $2000, giving a net spread of $0. Because you expect AAPL and IBM to move together, if the spread rises significantly above $0 you would short the spread, expecting it to return to $0, its natural equilibrium. Similarly, if the spread falls significantly below $0, you would go long the spread and capture the profit as it returns to $0. In this application, the Kalman filter tracks the hedge ratio between the two equities so that the spread of the hedged portfolio stays stationary and therefore continues to exhibit mean-reverting behavior.
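A minimal sketch of the hedge-ratio filter described above, following the general formulation of the QuantStart article linked at the top of this section: the state is [hedge ratio, intercept], it follows a random walk, and each new pair of prices is one linear observation. The class name `KalmanHedgeRatio` and the noise parameters `delta` and `obs_var` (including their default values) are assumptions for illustration, not the areixio `KalmanFilter` API, although the `update(price_one, price_two)` signature mirrors the documented method.

```python
import numpy as np

class KalmanHedgeRatio:
    """Hypothetical sketch: track price_two ~ beta * price_one + alpha with a Kalman filter."""

    def __init__(self, delta=1e-4, obs_var=1e-3):
        self.state_noise = delta / (1 - delta) * np.eye(2)  # random-walk noise on [beta, alpha]
        self.obs_var = obs_var                              # observation noise variance (assumed)
        self.theta = np.zeros(2)                            # state estimate [beta, alpha]
        self.P = np.zeros((2, 2))                           # state covariance

    def update(self, price_one, price_two):
        F = np.array([price_one, 1.0])      # observation vector for this step
        # Predict: the state is a random walk, so only its uncertainty grows.
        R = self.P + self.state_noise
        forecast = F @ self.theta
        error = price_two - forecast        # forecast error, i.e. the current spread
        Q = F @ R @ F + self.obs_var        # forecast variance
        # Update: blend prediction and observation through the Kalman gain.
        K = R @ F / Q
        self.theta = self.theta + K * error
        self.P = R - np.outer(K, F @ R)
        return self.theta.copy(), error, np.sqrt(Q)


# Synthetic pair with a true hedge ratio of 2 and an intercept of 1.
t = np.arange(500)
price_one = 50 + 5 * np.sin(t / 10)
price_two = 2 * price_one + 1
kf = KalmanHedgeRatio()
for p1, p2 in zip(price_one, price_two):
    theta, error, std = kf.update(p1, p2)
```

In a pairs strategy the forecast error plays the role of the spread, and its standard deviation sqrt(Q) is a common entry threshold (for example, trade when the error exceeds it in absolute value); that trading rule is likewise an assumption here.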
Methods
def update(self, price_one, price_two)
class MaxAbsScaler
-
Ancestors
Scaler
Methods
def fit(self, X)
def transform(self, X)
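A minimal sketch of what fit and transform could do for max-abs scaling, assuming X is a 2-D array of shape (n_samples, n_features), statistics are computed per column, and fit returns self as in scikit-learn. `MaxAbsScalerSketch` is a hypothetical name, not the areixio class.

```python
import numpy as np

class MaxAbsScalerSketch:
    """Hypothetical re-implementation: divide each column by its maximum absolute value."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.max_abs_ = np.abs(X).max(axis=0)
        self.max_abs_[self.max_abs_ == 0] = 1.0  # leave all-zero columns unchanged
        return self

    def transform(self, X):
        # No centering: zeros stay zero, so sparsity and signs are preserved.
        return np.asarray(X, dtype=float) / self.max_abs_


X = [[1.0, -2.0], [3.0, 4.0], [-6.0, 0.0]]
scaled = MaxAbsScalerSketch().fit(X).transform(X)  # column 0 divided by 6, column 1 by 4
```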
class MinMaxScaler
-
Ancestors
Scaler
Methods
def fit(self, X)
def transform(self, X)
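A minimal sketch of min-max scaling to [0, 1], assuming column-wise statistics on a 2-D array and a scikit-learn-style fit that returns self. `MinMaxScalerSketch` is a hypothetical name; since the documented class takes no constructor arguments, the target range is fixed to [0, 1] here.

```python
import numpy as np

class MinMaxScalerSketch:
    """Hypothetical re-implementation: map each column's [min, max] onto [0, 1]."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.min_ = X.min(axis=0)
        self.range_ = X.max(axis=0) - self.min_
        self.range_[self.range_ == 0] = 1.0  # constant columns map to 0 instead of dividing by 0
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.min_) / self.range_


X = [[10.0, 5.0], [20.0, 5.0], [30.0, 5.0]]
scaled = MinMaxScalerSketch().fit(X).transform(X)
```

Values outside the range seen during fit land outside [0, 1] when transformed later, which matters when the scaler is fitted on one period and applied to another.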
class Normalizer (norm='l2')
-
Methods
def transform(self, X)
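The documentation does not describe Normalizer, so the sketch below assumes scikit-learn-style semantics: each row (sample) is rescaled to unit norm, with `norm` one of "l1", "l2" or "max". `NormalizerSketch` and the set of accepted norms are assumptions; only the `norm='l2'` default comes from the signature above.

```python
import numpy as np

class NormalizerSketch:
    """Hypothetical sketch: rescale each row of X to unit norm."""

    def __init__(self, norm="l2"):
        if norm not in ("l1", "l2", "max"):
            raise ValueError(f"unsupported norm: {norm}")
        self.norm = norm

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if self.norm == "l1":
            norms = np.abs(X).sum(axis=1)
        elif self.norm == "l2":
            norms = np.sqrt((X ** 2).sum(axis=1))
        else:
            norms = np.abs(X).max(axis=1)
        norms[norms == 0] = 1.0  # leave all-zero rows unchanged
        return X / norms[:, None]


l2_rows = NormalizerSketch().transform([[3.0, 4.0], [0.0, 0.0]])
l1_rows = NormalizerSketch("l1").transform([[1.0, 3.0]])
```

Unlike the scalers, this transform is stateless: each row is normalized independently, so no fit step is needed.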
class PCA (n_components)
-
PCA implements principal component analysis through a single fit_transform method, which computes the principal components of the input data and projects the data onto them. It first centers the data by subtracting the mean of each feature and computes the covariance matrix of the centered data. It then computes the eigenvalues and eigenvectors of the covariance matrix, sorts them by eigenvalue in descending order, and keeps the eigenvectors belonging to the n_components largest eigenvalues. Finally, it projects the data onto the new feature space spanned by these eigenvectors.
Methods
def fit_transform(self, X)
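The steps above can be sketched with NumPy as follows. `PCASketch` and its `components_` and `explained_variance_` attributes are hypothetical names, and the sample covariance (dividing by n - 1) is an assumption; eigenvector signs are arbitrary, so projected coordinates may differ in sign from other implementations.

```python
import numpy as np

class PCASketch:
    """Hypothetical re-implementation of eigendecomposition-based PCA."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        Xc = X - self.mean_                          # 1. center each feature
        cov = Xc.T @ Xc / (len(X) - 1)               # 2. sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)       # 3. eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1]            # 4. sort by eigenvalue, descending
        keep = order[: self.n_components]
        self.explained_variance_ = eigvals[keep]
        self.components_ = eigvecs[:, keep]          # columns are the principal axes
        return Xc @ self.components_                 # 5. project onto the top components


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base + 0.1 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])
pca = PCASketch(n_components=2)
Z = pca.fit_transform(X)
```

The variance of each projected column equals the corresponding eigenvalue, which makes a convenient correctness check.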
class RobustScaler
-
Ancestors
Scaler
Methods
def fit(self, X)
def transform(self, X)
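A minimal sketch of robust scaling, assuming column-wise statistics, NumPy's default linearly interpolated percentiles for the quartiles, and a scikit-learn-style fit that returns self. `RobustScalerSketch` is a hypothetical name.

```python
import numpy as np

class RobustScalerSketch:
    """Hypothetical re-implementation: center on the median, scale by the IQR."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.center_ = np.median(X, axis=0)
        q25, q75 = np.percentile(X, [25, 75], axis=0)
        self.scale_ = q75 - q25
        self.scale_[self.scale_ == 0] = 1.0  # avoid division by zero for flat columns
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.center_) / self.scale_


X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
scaled = RobustScalerSketch().fit(X).transform(X)
# scaled[:, 0] -> [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The outlier stays far from the rest, but it no longer distorts the scale applied to the other values.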
class Scaler
-
Subclasses
FirstValueScaler
MaxAbsScaler
MinMaxScaler
RobustScaler
StandardScaler
Methods
def fit(self, X)
def fit_transform(self, X)
def transform(self, X)
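A minimal sketch of how a base class with fit, transform and fit_transform is commonly structured, assuming fit returns self so that fit_transform can chain the two calls. `ScalerBase` and the subclass `MeanCenterer` are hypothetical names used only for illustration.

```python
import numpy as np

class ScalerBase:
    """Hypothetical base class: subclasses learn statistics in fit() and apply them in transform()."""

    def fit(self, X):
        raise NotImplementedError

    def transform(self, X):
        raise NotImplementedError

    def fit_transform(self, X):
        # Learn the statistics on X, then apply them to the same X.
        return self.fit(X).transform(X)


class MeanCenterer(ScalerBase):
    """Hypothetical subclass used only to exercise fit_transform."""

    def fit(self, X):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[4.0], [5.0]])
scaler = MeanCenterer()
train_scaled = scaler.fit_transform(train)  # statistics come from the training window only
test_scaled = scaler.transform(test)        # reuses the training mean
```

In a backtest, calling fit_transform on the full history leaks future statistics into past decisions; fitting on an in-sample window and calling transform on later data, as above, avoids that look-ahead bias.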
class StandardScaler
-
Ancestors
Scaler
Methods
def fit(self, X)
def transform(self, X)
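A minimal sketch of standardization, assuming column-wise statistics and the population standard deviation (ddof=0, the same convention scikit-learn's StandardScaler uses). `StandardScalerSketch` is a hypothetical name.

```python
import numpy as np

class StandardScalerSketch:
    """Hypothetical re-implementation: zero mean and unit standard deviation per column."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)  # population standard deviation (ddof=0)
        self.std_[self.std_ == 0] = 1.0  # constant columns map to 0
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_


X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = StandardScalerSketch().fit(X).transform(X)
```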