rfe


class RFEStep(metric_value: float, features: List[str])

Bases: object

metric_value: float
features: List[str]
class RFEResult(steps: List[RFEStep], metric_name: str, minimise: bool)

Bases: object

get_sorted_steps() → List[RFEStep]
Returns:

the elimination step results, sorted from best to worst

get_selected_features() → List[str]

Returns:

the selected features, i.e. the features of the elimination step that yielded the best metric value
get_num_features_array() → ndarray

Returns:

an array containing the number of features considered in each step

get_metric_values_array() → ndarray

Returns:

an array containing the metric value obtained in each step

plot_metric_values() → Figure

Plots the metric values vs. the number of features for each step of the elimination.

Returns:

the figure
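For illustration, a minimal sketch of how a result object might be inspected, assuming result is an RFEResult obtained from one of the elimination classes documented below; only the methods documented above are used:

    # Sketch: inspecting an RFEResult (here, `result` is assumed to have been
    # obtained from RecursiveFeatureEliminationCV.run, see below)
    best_step = result.get_sorted_steps()[0]      # steps are sorted from best to worst
    print(best_step.metric_value, best_step.features)

    print(result.get_selected_features())         # the selected feature set

    # raw arrays, e.g. for custom analysis or plotting
    num_features = result.get_num_features_array()
    metric_values = result.get_metric_values_array()

    fig = result.plot_metric_values()             # metric value vs. number of features
    fig.savefig("rfe_metric_values.png")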

class RecursiveFeatureEliminationCV(cross_validator_params: VectorModelCrossValidatorParams, min_features=1)

Bases: object

Recursive feature elimination, using cross-validation to select the best set of features: In each step, the model is first evaluated via cross-validation. The feature importance values are then aggregated across the models trained during cross-validation, and the least important feature is discarded; if the lowest feature importance is 0, all features with importance 0 are discarded at once. This process is repeated until only min_features (or fewer) remain. The selected set of features is the one from the step where cross-validation yielded the best evaluation metric value.

Feature importance is computed at the level of model input features, i.e. after feature generation and transformation.

NOTE: This implementation differs markedly from sklearn’s RFECV, which performs an independent RFE for each fold. RFECV determines the number of features to use by finding the elimination step that yielded the best metric value on average across the folds. Because the eliminations are independent, the actual features used in that step may have been completely different in each fold. Using the selected number of features n, RFECV then performs another RFE, eliminating features until n features remain, and returns these features as the result.
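To make the elimination procedure concrete, the following is an illustrative, self-contained sketch of the loop described above (not the actual implementation; cross_validate is a hypothetical stand-in for the cross-validation-based computation of the metric and the aggregated feature importances):

    import random

    def cross_validate(features):
        """Hypothetical stand-in: returns a metric value (to be minimised)
        and per-feature importance values for the given feature set."""
        return random.random(), {f: random.random() for f in features}

    all_features = ["f1", "f2", "f3", "f4", "f5"]
    min_features = 2

    features = list(all_features)
    steps = []  # one (metric_value, features) pair per elimination step
    while True:
        metric_value, importances = cross_validate(features)
        steps.append((metric_value, list(features)))
        if len(features) <= min_features:
            break
        if min(importances.values()) == 0:
            # if the lowest importance is 0, discard all zero-importance features
            features = [f for f in features if importances[f] > 0]
        else:
            # otherwise discard the single least important feature
            features.remove(min(features, key=importances.get))

    # the selected set is the one from the step with the best (here: lowest) metric value
    best_metric, selected_features = min(steps, key=lambda s: s[0])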

Parameters:
  • cross_validator_params – the parameters for cross-validation

  • min_features – the smallest number of features that shall be evaluated during feature elimination

run(model: Union[VectorModel, FeatureImportanceProvider], io_data: InputOutputData, metric_name: str, minimise: bool, remove_input_preprocessors=False) → RFEResult

Runs the optimisation for the given model and data.

Parameters:
  • model – the model

  • io_data – the data

  • metric_name – the metric to optimise

  • minimise – whether the metric shall be minimised; if False, maximise.

  • remove_input_preprocessors – whether to remove input preprocessors from the model and create the input data only once for the entire experiment; this is usually appropriate only if the input preprocessors are not learnt from the input data or if, for any given data split/fold, the learning outcome of the preprocessors is likely to be largely the same.

Returns:

a result object, which provides access to the selected features and data on all elimination steps
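A usage sketch follows; the import paths, the cross-validator parameter, the model and the metric name are assumptions that may need to be adapted. X is assumed to be a pandas DataFrame of candidate features and y a DataFrame containing the target, and model is assumed to be a VectorModel that also implements FeatureImportanceProvider (e.g. a random-forest-based sensAI model):

    from sensai.data import InputOutputData
    from sensai.evaluation.crossval import VectorModelCrossValidatorParams

    io_data = InputOutputData(X, y)
    cv_params = VectorModelCrossValidatorParams(folds=5)

    rfe = RecursiveFeatureEliminationCV(cross_validator_params=cv_params, min_features=1)
    result = rfe.run(model, io_data, metric_name="RMSE", minimise=True)

    print(result.get_selected_features())
    result.plot_metric_values()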

class RecursiveFeatureElimination(metric_computation: MetricComputation, min_features=1)

Bases: object

Recursive feature elimination, using a generic metric computation to determine which set of features is best.

Parameters:
  • metric_computation – the method to apply for metric computation in order to determine which feature set is best

  • min_features – the smallest number of features that shall be evaluated during feature elimination

run(model_factory: Callable[[], Union[VectorRegressionModel, VectorClassificationModel]], minimise: bool) → RFEResult

Runs the optimisation for models created via the given factory.

Parameters:
  • model_factory – factory for the model to be evaluated

  • minimise – whether the metric shall be minimised; if False, maximise.

Returns:

a result object, which provides access to the selected features and data on all elimination steps
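A usage sketch, under the assumption that metric_computation is a suitably configured MetricComputation instance (which encapsulates the data and the evaluation protocol) and that create_model is a hypothetical helper returning a fresh, unfitted model:

    # assumption: `metric_computation` is a configured MetricComputation instance
    rfe = RecursiveFeatureElimination(metric_computation=metric_computation, min_features=5)

    # a fresh model is needed for each evaluation, hence a factory rather than a model
    result = rfe.run(model_factory=lambda: create_model(), minimise=True)
    print(result.get_selected_features())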