feature_generator_registry


class FeatureGeneratorRegistry(use_singletons: bool = False)

Bases: object

Represents a registry for (named) feature generator factories.

Parameters:

use_singletons – if True, internally maintain feature generator singletons, such that there is at most one instance for each name/key

property available_features
register_factory(name: Hashable, factory: Callable[[], FeatureGenerator])

Registers a feature generator factory, which can subsequently be referenced by models via its name/hashable key.

Parameters:
  • name – the name/key (which can, in particular, be a string or an Enum item). For larger projects especially, using an Enum is recommended for optimal IDE support

  • factory – a parameterless callable that creates and returns a new FeatureGenerator instance
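As a minimal sketch of the Enum recommendation above, the snippet below uses Enum members as registry keys. It does not use sensAI itself: `FeatureName`, `DummyFeatureGenerator` and the module-level `register_factory` function are hypothetical stand-ins, kept only to show how hashable keys behave.

```python
from enum import Enum
from typing import Callable, Dict, Hashable


class FeatureName(Enum):
    """Hypothetical feature keys; an Enum gives IDE auto-completion and flags misspelt keys."""
    AGE = "age"
    INCOME = "income"


class DummyFeatureGenerator:
    """Hypothetical stand-in for a sensAI FeatureGenerator."""

    def __init__(self, column: str) -> None:
        self.column = column


# Simplified stand-in for the registry's internal key -> factory table
factories: Dict[Hashable, Callable[[], DummyFeatureGenerator]] = {}


def register_factory(name: Hashable, factory: Callable[[], DummyFeatureGenerator]) -> None:
    """Hypothetical helper mirroring register_factory: store a parameterless factory under a key."""
    factories[name] = factory


# Enum members and plain strings are both hashable, so either can serve as a key
register_factory(FeatureName.AGE, lambda: DummyFeatureGenerator("age"))
register_factory("income", lambda: DummyFeatureGenerator("income"))
```

With a plain Enum (no `str` mixin), a member and its value are different keys: `FeatureName.AGE` does not equal `"age"`, so lookups must use the same kind of key that was registered.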

get_feature_generator(name: str) → FeatureGenerator

Creates a feature generator from a name, which must have been registered previously. The name of the returned feature generator (as returned by get_name()) is set to name.

Parameters:

name – the name (which can, in particular, be a string or an Enum item)

Returns:

a new feature generator instance (or the existing instance if use_singletons is enabled)
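The singleton behaviour controlled by use_singletons can be illustrated with a simplified, self-contained registry. `SimpleRegistry` and `DummyFeatureGenerator` are hypothetical and are not sensAI's implementation; the sketch only mirrors the documented contract: a factory is looked up by key, the returned generator is named after the key, and with `use_singletons=True` the same instance is returned on every call.

```python
from typing import Callable, Dict, Hashable, Optional


class DummyFeatureGenerator:
    """Hypothetical stand-in for a FeatureGenerator that carries a name."""

    def __init__(self) -> None:
        self._name: Optional[str] = None

    def set_name(self, name: str) -> None:
        self._name = name

    def get_name(self) -> Optional[str]:
        return self._name


class SimpleRegistry:
    """Simplified sketch of the documented registry contract (not sensAI's code)."""

    def __init__(self, use_singletons: bool = False) -> None:
        self._factories: Dict[Hashable, Callable[[], DummyFeatureGenerator]] = {}
        self._singletons: Dict[Hashable, DummyFeatureGenerator] = {}
        self._use_singletons = use_singletons

    def register_factory(self, name: Hashable, factory: Callable[[], DummyFeatureGenerator]) -> None:
        self._factories[name] = factory

    def get_feature_generator(self, name: Hashable) -> DummyFeatureGenerator:
        if name not in self._factories:
            raise KeyError(f"No factory registered for {name!r}")
        if self._use_singletons and name in self._singletons:
            return self._singletons[name]  # at most one instance per key
        generator = self._factories[name]()
        generator.set_name(str(name))  # assumption of this sketch: keys are stringified
        if self._use_singletons:
            self._singletons[name] = generator
        return generator


# Usage: compare a registry with singletons to one without
shared = SimpleRegistry(use_singletons=True)
shared.register_factory("age", DummyFeatureGenerator)
fresh = SimpleRegistry()
fresh.register_factory("age", DummyFeatureGenerator)

print(shared.get_feature_generator("age") is shared.get_feature_generator("age"))  # True
print(fresh.get_feature_generator("age") is fresh.get_feature_generator("age"))  # False
```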

collect_features(*feature_generators_or_names: Union[Hashable, FeatureGenerator]) → FeatureCollector

Creates a feature collector for the given feature names/keys/instances, which can subsequently be added to a model.

Parameters:

feature_generators_or_names – feature names/keys known to this registry or feature generator instances
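How a mix of names/keys and ready-made instances might be resolved can be sketched as follows. `resolve_generators` and `DummyFeatureGenerator` are hypothetical; the real collect_features returns a FeatureCollector rather than a plain list, and names are looked up via the registry rather than a bare dictionary.

```python
from typing import Callable, Dict, Hashable, List, Union


class DummyFeatureGenerator:
    """Hypothetical stand-in for a FeatureGenerator."""

    def __init__(self, label: str) -> None:
        self.label = label


def resolve_generators(
    factories: Dict[Hashable, Callable[[], DummyFeatureGenerator]],
    *generators_or_names: Union[Hashable, DummyFeatureGenerator],
) -> List[DummyFeatureGenerator]:
    """Hypothetical helper: keep instances as they are, instantiate registered names/keys."""
    resolved: List[DummyFeatureGenerator] = []
    for item in generators_or_names:
        # check for instances first: plain objects are hashable too and could pass as keys
        if isinstance(item, DummyFeatureGenerator):
            resolved.append(item)
        else:
            resolved.append(factories[item]())  # KeyError if the name was never registered
    return resolved
```

Checking for generator instances first matters because ordinary Python objects are hashable by default, so an instance would otherwise be mistaken for a key.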

class FeatureCollector(*feature_generators_or_names: Union[Hashable, FeatureGenerator], registry: Optional[FeatureGeneratorRegistry] = None)

Bases: object

A feature collector that facilitates collecting the features to be used by a model as well as generating commonly used feature transformers that are informed by the features’ metadata.

Parameters:
  • feature_generators_or_names – generator names/keys (known to the registry) or generator instances

  • registry – the feature generator registry for the case where names/keys are passed

get_multi_feature_generator() → MultiFeatureGenerator

Gets the multi-feature generator that was created for this collector. To create a new, independent instance (e.g. when using this collector for multiple models), use create_multi_feature_generator() instead.

Returns:

the multi-feature generator that was created for this instance

get_normalisation_rules(include_generated_categorical_rules=True)
get_categorical_feature_name_regex() → str
Returns:

a regular expression that matches all known categorical feature names
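One plausible way to build such a pattern is an escaped alternation of the known names, intended for use with `re.fullmatch`. The function `categorical_name_regex` is a hypothetical sketch; the pattern actually returned by get_categorical_feature_name_regex() may be constructed differently.

```python
import re
from typing import Sequence


def categorical_name_regex(names: Sequence[str]) -> str:
    """Hypothetical sketch: a pattern that matches exactly one of the given column names."""
    if not names:
        return r"(?!)"  # negative lookahead on the empty string never succeeds: matches nothing
    # escape each name so that characters such as '.' are matched literally
    return "|".join(f"(?:{re.escape(name)})" for name in names)
```

Used with `re.fullmatch`, the pattern accepts only whole column names, so a column such as `colour_code` is not mistaken for the categorical column `colour`.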

create_multi_feature_generator()

Creates a new instance of the multi-feature generator that generates the features collected by this instance. If the feature collector instance is not used for multiple models, use get_multi_feature_generator() instead to obtain the instance that has already been created.

Returns:

a new multi-feature generator that generates the collected features
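The difference between get_multi_feature_generator() and create_multi_feature_generator() can be sketched as a cached instance versus a fresh one. `SketchCollector` and `DummyMultiFeatureGenerator` are hypothetical stand-ins, not sensAI classes. Since a feature generator may carry state from fitting, handing each model its own instance presumably keeps models from interfering with one another.

```python
from typing import List


class DummyMultiFeatureGenerator:
    """Hypothetical stand-in for MultiFeatureGenerator."""

    def __init__(self, generator_names: List[str]) -> None:
        self.generator_names = list(generator_names)


class SketchCollector:
    """Simplified sketch of the get_/create_ distinction (not sensAI's implementation)."""

    def __init__(self, *generator_names: str) -> None:
        self._names = list(generator_names)
        # one instance is created up front and handed out by get_multi_feature_generator()
        self._multi = self.create_multi_feature_generator()

    def get_multi_feature_generator(self) -> DummyMultiFeatureGenerator:
        return self._multi  # always the same, pre-created instance

    def create_multi_feature_generator(self) -> DummyMultiFeatureGenerator:
        return DummyMultiFeatureGenerator(self._names)  # a new, independent instance per call
```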

create_dft_normalisation(default_transformer_factory=None, require_all_handled=True, inplace=False) → DFTNormalisation

Creates a feature transformer that will apply normalisation to all supported (numeric) features.

Parameters:
  • default_transformer_factory – a factory for creating transformer instances (which implement the API used by sklearn.preprocessing, e.g. StandardScaler); it is used to create a transformer for every rule that does not specify a particular transformer. The default transformer is applied only to columns matched by such rules; unmatched columns are not transformed. Use SkLearnTransformerFactoryFactory to create such a factory conveniently.

  • require_all_handled – whether to raise an exception if not all columns are matched by a rule

  • inplace – whether to apply data frame transformations in-place

Returns:

the transformer
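The rule-matching logic described for default_transformer_factory and require_all_handled can be sketched without pandas or scikit-learn. Everything here is hypothetical: `Rule` is a simplified stand-in for sensAI's normalisation rules, `SimpleStandardScaler` imitates the fit/transform interface of `sklearn.preprocessing.StandardScaler` for a single column, and columns are plain lists in a dict rather than a DataFrame. Taking the first matching rule is an assumption of this sketch.

```python
import re
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


class SimpleStandardScaler:
    """Single-column imitation of StandardScaler's fit/transform interface."""

    def fit(self, values: List[float]) -> "SimpleStandardScaler":
        self.mean = statistics.fmean(values)
        # population standard deviation; constant columns get a scale of 1 to avoid division by zero
        self.scale = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.scale for v in values]


@dataclass
class Rule:
    """Hypothetical rule: a column-name regex and an optional explicit transformer factory."""
    regex: str
    transformer_factory: Optional[Callable[[], SimpleStandardScaler]] = None


def normalise(
    columns: Dict[str, List[float]],
    rules: List[Rule],
    default_transformer_factory: Optional[Callable[[], SimpleStandardScaler]] = None,
    require_all_handled: bool = True,
) -> Dict[str, List[float]]:
    result: Dict[str, List[float]] = {}
    for name, values in columns.items():
        # assumption of this sketch: the first rule whose regex fully matches the name applies
        rule = next((r for r in rules if re.fullmatch(r.regex, name)), None)
        if rule is None:
            if require_all_handled:
                raise ValueError(f"No rule matches column {name!r}")
            result[name] = list(values)  # unmatched columns are left untransformed
            continue
        # rules without their own transformer fall back to the default factory
        factory = rule.transformer_factory or default_transformer_factory
        if factory is None:
            result[name] = list(values)  # no transformer available for this rule
        else:
            result[name] = factory().fit(values).transform(values)
    return result
```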

create_dft_one_hot_encoder(ignore_unknown=False, inplace=False)

Creates a feature transformer that will apply one-hot encoding to all the features that are known to be categorical.

Parameters:
  • inplace – whether to perform the transformation in-place

  • ignore_unknown – if True and an unknown category is encountered during transformation, the resulting one-hot encoded columns for this feature will be all zeros; if False, an unknown category will raise an error.

Returns:

the transformer
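The effect of ignore_unknown can be shown with a minimal, self-contained one-hot sketch. The function `one_hot` and its `<column>_<category>` output naming are hypothetical; a real encoder learns the known categories during fitting, whereas here they are passed in explicitly.

```python
from typing import Dict, List, Sequence


def one_hot(values: Sequence[str], categories: Sequence[str], column: str,
            ignore_unknown: bool = False) -> Dict[str, List[int]]:
    """Hypothetical sketch: one indicator column per known category."""
    known = set(categories)
    if not ignore_unknown:
        unknown = [v for v in values if v not in known]
        if unknown:
            raise ValueError(f"Unknown categories for {column!r}: {sorted(set(unknown))}")
    # an unknown value matches no category, so its row is all zeros
    return {f"{column}_{c}": [int(v == c) for v in values] for c in categories}
```

With `ignore_unknown=True`, the unseen value `"green"` in `one_hot(["red", "blue", "green"], ["red", "blue"], "colour", ignore_unknown=True)` yields a row of zeros in both indicator columns; with `ignore_unknown=False`, the same call raises a `ValueError`.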

create_feature_transformer_normalisation(default_transformer_factory=None, require_all_handled=True, inplace=False) → DFTNormalisation

Creates a feature transformer that will apply normalisation to all supported (numeric) features. Alias of create_dft_normalisation.

Parameters:
  • default_transformer_factory – a factory for creating transformer instances (which implement the API used by sklearn.preprocessing, e.g. StandardScaler); it is used to create a transformer for every rule that does not specify a particular transformer. The default transformer is applied only to columns matched by such rules; unmatched columns are not transformed. Use SkLearnTransformerFactoryFactory to create such a factory conveniently.

  • require_all_handled – whether to raise an exception if not all columns are matched by a rule

  • inplace – whether to apply data frame transformations in-place

Returns:

the transformer

create_feature_transformer_one_hot_encoder(ignore_unknown=False, inplace=False)

Creates a feature transformer that will apply one-hot encoding to all the features that are known to be categorical. Alias of create_dft_one_hot_encoder.

Parameters:
  • inplace – whether to perform the transformation in-place

  • ignore_unknown – if True and an unknown category is encountered during transformation, the resulting one-hot encoded columns for this feature will be all zeros; if False, an unknown category will raise an error.

Returns:

the transformer