sklearn_classification


class SkLearnDecisionTreeVectorClassificationModel(min_samples_leaf=1, random_state=42, **model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)
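The weighting scheme described for use_balanced_class_weights can be sketched in a few lines. This is only an illustration of the arithmetic stated above (reciprocal class frequency, scaled so that the largest class gets weight 1); the helper names balanced_class_weights and sample_weights are hypothetical and are not part of this module's API.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Map each class label to a weight proportional to 1 / class frequency.

    The weights are scaled so that the most frequent class receives weight 1.
    (Hypothetical helper for illustration only.)
    """
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: largest / count for label, count in counts.items()}


def sample_weights(labels):
    """Return one weight per data point, looked up from its class weight."""
    weights = balanced_class_weights(labels)
    return [weights[label] for label in labels]


labels = ["a", "a", "a", "a", "b", "b", "c"]
class_weights = balanced_class_weights(labels)  # {"a": 1.0, "b": 2.0, "c": 4.0}
per_sample = sample_weights(labels)

# The summed weight is the same for every class (here 4.0 each).
totals = {}
for label, weight in zip(labels, per_sample):
    totals[label] = totals.get(label, 0.0) + weight
```

With this scaling, the total weight of each class equals the size of the largest class, so weighted counts remain on the same order of magnitude as data point counts.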

class SkLearnRandomForestVectorClassificationModel(n_estimators=100, min_samples_leaf=1, random_state=42, use_balanced_class_weights=False, **model_args)

Bases: AbstractSkLearnVectorClassificationModel, FeatureImportanceProviderSkLearnClassification

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)
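As a rough illustration of what use_label_encoding refers to, the sketch below replaces each label with its 0-based position in the sorted list of distinct labels and maps predictions back afterwards. The function names are hypothetical and only demonstrate the idea; they do not reflect how the library implements it internally.

```python
def fit_label_encoding(labels):
    """Return the sorted distinct labels and a label -> 0-based index mapping."""
    classes = sorted(set(labels))
    return classes, {label: index for index, label in enumerate(classes)}


def encode(labels, label_to_index):
    """Replace each original label with its index."""
    return [label_to_index[label] for label in labels]


def decode(indices, classes):
    """Map indices (e.g. model predictions) back to the original labels."""
    return [classes[index] for index in indices]


labels = ["spam", "ham", "spam", "eggs"]
classes, label_to_index = fit_label_encoding(labels)  # classes: ["eggs", "ham", "spam"]
encoded = encode(labels, label_to_index)              # [2, 1, 2, 0]
restored = decode(encoded, classes)
```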

class SkLearnMLPVectorClassificationModel(hidden_layer_sizes=(100,), activation: str = 'relu', solver: str = 'adam', batch_size: Union[int, str] = 'auto', random_state: Optional[int] = 42, max_iter: int = 200, early_stopping: bool = False, n_iter_no_change: int = 10, **model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • hidden_layer_sizes – the sequence of hidden layer sizes

  • activation – {“identity”, “logistic”, “tanh”, “relu”} the activation function to use for hidden layers (the output layer is not affected; MLPClassifier selects its activation itself, using logistic for binary and softmax for multi-class problems)

  • solver – {“adam”, “lbfgs”, “sgd”} the name of the solver to apply

  • batch_size – the batch size or “auto” for min(200, data set size)

  • random_state – the random seed for reproducibility; use None if no specific seed shall be set

  • max_iter – the maximum number of iterations (gradient steps for L-BFGS, epochs for other solvers)

  • early_stopping – whether to use early stopping (stop training after n_iter_no_change epochs without improvement)

  • n_iter_no_change – the number of consecutive epochs without improvement after which training stops early (if early_stopping is enabled)

  • model_args – additional arguments to pass on to MLPClassifier, see https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html
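The interplay of batch_size, max_iter, early_stopping and n_iter_no_change can be sketched as follows. This is a simplified illustration based only on the parameter descriptions above: it ignores the improvement tolerance that MLPClassifier applies, and the function names resolve_batch_size and epochs_until_stop are hypothetical.

```python
def resolve_batch_size(batch_size, n_samples):
    """Resolve "auto" to min(200, data set size); pass integers through."""
    if batch_size == "auto":
        return min(200, n_samples)
    return batch_size


def epochs_until_stop(validation_scores, n_iter_no_change=10, max_iter=200):
    """Return the number of epochs run before training stops.

    Training stops once the score has not improved for n_iter_no_change
    consecutive epochs, or when max_iter epochs (or the available scores)
    are exhausted. Simplified: any strictly higher score counts as an
    improvement, with no tolerance.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(validation_scores[:max_iter], start=1):
        if score > best:
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= n_iter_no_change:
            return epoch
    return min(len(validation_scores), max_iter)


scores = [0.5, 0.6, 0.6, 0.59, 0.58, 0.7]
stopped_early = epochs_until_stop(scores, n_iter_no_change=3)   # 5
ran_to_end = epochs_until_stop(scores, n_iter_no_change=10)     # 6
small_data_batch = resolve_batch_size("auto", 50)               # 50
large_data_batch = resolve_batch_size("auto", 1000)             # 200
```

In the example, the score last improves in epoch 2, so with n_iter_no_change=3 training stops after epoch 5 and never reaches the better score in epoch 6. A larger n_iter_no_change trades longer training for a lower risk of stopping too soon.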

class SkLearnMultinomialNBVectorClassificationModel(**model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)

class SkLearnSVCVectorClassificationModel(random_state=42, **model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)

class SkLearnLogisticRegressionVectorClassificationModel(random_state=42, **model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)

class SkLearnKNeighborsVectorClassificationModel(**model_args)

Bases: AbstractSkLearnVectorClassificationModel

Parameters:
  • model_constructor – the sklearn model constructor

  • model_args – arguments to be passed to the sklearn model constructor

  • use_balanced_class_weights – whether to compute class weights from the training data and apply the corresponding weight to each data point such that the sum of weights for all classes is equal. This is achieved by applying a weight proportional to the reciprocal frequency of the class in the (training) data. We scale weights such that the smallest weight (of the largest class) is 1, ensuring that weight counts still reasonably correspond to data point counts. Note that weighted data points may not be supported for all types of models.

  • use_label_encoding – whether to replace original class labels with 0-based index in sorted list of labels (a.k.a. label encoding), which is required by some sklearn-compatible implementations (particularly xgboost)