feature_generator_registry
Source code: sensai/featuregen/feature_generator_registry.py
- class FeatureGeneratorRegistry(use_singletons: bool = False)
Bases: object
Represents a registry for (named) feature generator factories
- Parameters:
use_singletons – if True, internally maintain feature generator singletons, such that there is at most one instance for each name/key
- property available_features
- register_factory(name: Hashable, factory: Callable[[], FeatureGenerator])
Registers a feature generator factory which can subsequently be referenced by models via their name/hashable key
- Parameters:
name – the name/key (which can, in particular, be a string or an Enum item). Especially for larger projects the use of an Enum is recommended (for optimal IDE support)
factory – the factory
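To illustrate why Enum keys are recommended, the following sketch uses a plain dict as a hypothetical stand-in for a name-to-factory mapping. FeatureName and the lambda factories are invented for this example and are not part of sensAI.

```python
from enum import Enum
from typing import Callable, Dict, Hashable


class FeatureName(Enum):
    """Hypothetical feature keys; any hashable value would work as a key."""
    AGE = "age"
    CITY = "city"


# Stand-in for a name -> factory mapping (assumption: a dict-like store).
factories: Dict[Hashable, Callable[[], str]] = {}
factories[FeatureName.AGE] = lambda: "age generator"
factories["city"] = lambda: "city generator"  # string keys are equally valid

# A mistyped Enum member fails immediately at attribute access ...
try:
    FeatureName.AEG
    enum_typo_detected = False
except AttributeError:
    enum_typo_detected = True

# ... whereas a mistyped string key is only noticed at lookup time.
string_typo_detected_at_lookup = "ctiy" not in factories
```

With Enum members, an IDE can also auto-complete and rename keys, which is presumably the "IDE support" the recommendation refers to.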
- get_feature_generator(name: str) → FeatureGenerator
Creates a feature generator from a name, which must have been previously registered. The name of the returned feature generator (as returned by get_name()) is set to name.
- Parameters:
name – the name (which can, in particular, be a string or an enum item)
- Returns:
a new feature generator instance (or the existing instance if use_singletons is enabled)
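The difference between the two modes can be sketched with a simplified model of the documented behaviour. SketchGenerator and SketchRegistry are hypothetical stand-ins written for this example; they are not sensAI's implementation and only mirror the semantics described above.

```python
from typing import Callable, Dict, Hashable, Optional


class SketchGenerator:
    """Hypothetical stand-in for a feature generator; it only carries a name."""

    def __init__(self) -> None:
        self.name: Optional[Hashable] = None


class SketchRegistry:
    """Simplified model of the documented registry semantics (an assumption)."""

    def __init__(self, use_singletons: bool = False) -> None:
        self.use_singletons = use_singletons
        self._factories: Dict[Hashable, Callable[[], SketchGenerator]] = {}
        self._instances: Dict[Hashable, SketchGenerator] = {}

    def register_factory(self, name: Hashable, factory: Callable[[], SketchGenerator]) -> None:
        self._factories[name] = factory

    def get_feature_generator(self, name: Hashable) -> SketchGenerator:
        # In singleton mode, reuse the instance created on the first request.
        if self.use_singletons and name in self._instances:
            return self._instances[name]
        generator = self._factories[name]()  # KeyError if the name was never registered
        generator.name = name  # the returned generator carries the registered name
        if self.use_singletons:
            self._instances[name] = generator
        return generator


fresh = SketchRegistry(use_singletons=False)
fresh.register_factory("age", SketchGenerator)
shared = SketchRegistry(use_singletons=True)
shared.register_factory("age", SketchGenerator)

distinct_instances = fresh.get_feature_generator("age") is not fresh.get_feature_generator("age")
same_instance = shared.get_feature_generator("age") is shared.get_feature_generator("age")
```

Singleton mode is mainly of interest when a generator holds state (for example a cache) that should be shared between all models referencing the same name.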
- collect_features(*feature_generators_or_names: Union[Hashable, FeatureGenerator]) → FeatureCollector
Creates a feature collector for the given feature names/keys/instances, which can subsequently be added to a model.
- Parameters:
feature_generators_or_names – feature names/keys known to this registry or feature generator instances
- class FeatureCollector(*feature_generators_or_names: Union[Hashable, FeatureGenerator], registry: Optional[FeatureGeneratorRegistry] = None)
Bases: object
A feature collector which facilitates the collection of features that shall be used by a model as well as the generation of commonly used feature transformers that are informed by the features’ meta-data.
- Parameters:
feature_generators_or_names – generator names/keys (known to the registry) or generator instances
registry – the feature generator registry for the case where names/keys are passed
- get_multi_feature_generator() → MultiFeatureGenerator
Gets the multi-feature generator that was created for this collector. To create a new, independent instance (e.g. when using this collector for multiple models), use create_multi_feature_generator() instead.
- Returns:
the multi-feature generator that was created for this instance
- get_categorical_feature_name_regex() → str
- Returns:
a regular expression that matches all known categorical feature names
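One plausible way to build such an expression is to join the escaped feature names with the alternation operator. This is a sketch under that assumption; the helper categorical_name_regex and the column names are hypothetical, and the library may construct its pattern differently.

```python
import re
from typing import Iterable


def categorical_name_regex(names: Iterable[str]) -> str:
    """Joins the escaped names into one alternation, e.g. 'city|colour'."""
    return "|".join(re.escape(name) for name in names)


regex = categorical_name_regex(["city", "colour"])
columns = ["age", "city", "colour", "city_code"]

# fullmatch ensures that 'city_code' is not mistaken for the feature 'city'.
matched = [column for column in columns if re.fullmatch(regex, column)]
```

A pattern of this kind can then be handed to any component that selects columns by regular expression.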
- create_multi_feature_generator()
Creates a new instance of the multi-feature generator that generates the features collected by this instance. If the feature collector instance is not used for multiple models, use get_multi_feature_generator() instead to obtain the instance that has already been created.
- Returns:
a new multi-feature generator that generates the collected features
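The relationship between the "get" and "create" methods can be sketched as follows. SketchCollector and SketchMultiGenerator are hypothetical stand-ins that only model the documented contract (one pre-built instance versus a fresh instance per call); they are not sensAI classes.

```python
from typing import List


class SketchMultiGenerator:
    """Hypothetical stand-in for a multi-feature generator."""

    def __init__(self, feature_names: List[str]) -> None:
        self.feature_names = list(feature_names)


class SketchCollector:
    """Models the documented get/create contract (an assumption, not the library code)."""

    def __init__(self, *feature_names: str) -> None:
        self._feature_names = list(feature_names)
        # One instance is built up front and handed out by the "get" method.
        self._multi_generator = self.create_multi_feature_generator()

    def get_multi_feature_generator(self) -> SketchMultiGenerator:
        return self._multi_generator

    def create_multi_feature_generator(self) -> SketchMultiGenerator:
        # Every call yields an independent instance covering the same features.
        return SketchMultiGenerator(self._feature_names)


collector = SketchCollector("age", "city")
first_model_generator = collector.get_multi_feature_generator()
second_model_generator = collector.create_multi_feature_generator()
```

Giving each model its own instance avoids state created while fitting one model leaking into another.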
- create_dft_normalisation(default_transformer_factory=None, require_all_handled=True, inplace=False) → DFTNormalisation
Creates a feature transformer that will apply normalisation to all supported (numeric) features
- Parameters:
default_transformer_factory – a factory for the creation of transformer instances (implementing the API used by sklearn.preprocessing, e.g. StandardScaler) that shall be used to create a transformer for all rules that do not specify a particular transformer. The default transformer is applied only to columns matched by such rules; unmatched columns are not transformed. Use SkLearnTransformerFactoryFactory to conveniently create a factory.
require_all_handled – whether to raise an exception if not all columns are matched by a rule
inplace – whether to apply data frame transformations in-place
- Returns:
the transformer
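The interplay of rules, the default transformer factory and require_all_handled can be sketched in plain Python. The function plan_transformers, the rule format (a regex paired with an optional factory) and the string "transformers" are assumptions made for this example; the library's rule objects and error types may differ.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Factory = Callable[[], str]  # stand-in: a "transformer" is just a label here
Rule = Tuple[str, Optional[Factory]]  # (column name regex, optional specific factory)


def plan_transformers(columns: List[str], rules: List[Rule], default_factory: Factory,
                      require_all_handled: bool = True) -> Dict[str, str]:
    """Assigns a transformer to every column matched by a rule."""
    plan: Dict[str, str] = {}
    for column in columns:
        for pattern, factory in rules:
            if re.fullmatch(pattern, column):
                # A rule without its own factory falls back to the default.
                plan[column] = (factory or default_factory)()
                break
        else:
            # No rule matched: either fail or leave the column untransformed.
            if require_all_handled:
                raise ValueError(f"no rule matches column {column!r}")
    return plan


rules: List[Rule] = [(r"age|income", None), (r"score_\d+", lambda: "min-max")]
plan = plan_transformers(["age", "score_1"], rules, lambda: "standard")
lenient_plan = plan_transformers(["age", "comment"], rules, lambda: "standard",
                                 require_all_handled=False)
```

In the lenient call, the unmatched column "comment" is simply absent from the plan, which corresponds to leaving it untransformed.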
- create_dft_one_hot_encoder(ignore_unknown=False, inplace=False)
Creates a feature transformer that will apply one-hot encoding to all the features that are known to be categorical
- Parameters:
inplace – whether to perform the transformation in-place
ignore_unknown – if True and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. If False, an unknown category will raise an error.
- Returns:
the transformer
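The effect of ignore_unknown on a single value can be sketched as below. The function one_hot is a hypothetical illustration of the documented semantics, not the transformer itself, and the use of ValueError is an assumption, since the error type is not stated above.

```python
from typing import List, Sequence


def one_hot(value: str, categories: Sequence[str], ignore_unknown: bool = False) -> List[int]:
    """Encodes a single value against the categories seen during fitting."""
    if value not in categories:
        if not ignore_unknown:
            # Assumption: the concrete error type is chosen for this sketch only.
            raise ValueError(f"unknown category: {value!r}")
        return [0] * len(categories)  # unknown category -> all-zero columns
    return [1 if value == category else 0 for category in categories]


known_categories = ["red", "green", "blue"]
encoded_known = one_hot("green", known_categories)
encoded_unknown = one_hot("purple", known_categories, ignore_unknown=True)
```

Setting ignore_unknown=True is the more forgiving choice at inference time, at the cost of silently mapping every unseen category to the same all-zero vector.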
- create_feature_transformer_normalisation(default_transformer_factory=None, require_all_handled=True, inplace=False) → DFTNormalisation
Creates a feature transformer that will apply normalisation to all supported (numeric) features. Alias of create_dft_normalisation.
- Parameters:
default_transformer_factory – a factory for the creation of transformer instances (implementing the API used by sklearn.preprocessing, e.g. StandardScaler) that shall be used to create a transformer for all rules that do not specify a particular transformer. The default transformer is applied only to columns matched by such rules; unmatched columns are not transformed. Use SkLearnTransformerFactoryFactory to conveniently create a factory.
require_all_handled – whether to raise an exception if not all columns are matched by a rule
inplace – whether to apply data frame transformations in-place
- Returns:
the transformer
- create_feature_transformer_one_hot_encoder(ignore_unknown=False, inplace=False)
Creates a feature transformer that will apply one-hot encoding to all the features that are known to be categorical. Alias of create_dft_one_hot_encoder.
- Parameters:
inplace – whether to perform the transformation in-place
ignore_unknown – if True and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. If False, an unknown category will raise an error.
- Returns:
the transformer