columngen


class ColumnGenerator(generated_column_name: str)

Bases: object

Generates a single column (pd.Series) from an input data frame. The generated column has the same index as the input.

Parameters:

generated_column_name – the name of the column being generated

generate_column(df: DataFrame) → Series

Generates a column from the input data frame

Parameters:

df – the input data frame

Returns:

the column as a named series, which has the same index as the input
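The contract of generate_column (one value per input row, returned as a series named after the generated column and indexed like the input) can be sketched with a self-contained stand-in. ColumnGeneratorSketch, its _compute_values hook and BmiColumnGenerator are hypothetical names for illustration only; they are not sensai's implementation.

```python
from abc import ABC, abstractmethod

import pandas as pd


class ColumnGeneratorSketch(ABC):
    """Hypothetical stand-in mirroring the documented ColumnGenerator contract."""

    def __init__(self, generated_column_name: str):
        self.generated_column_name = generated_column_name

    @abstractmethod
    def _compute_values(self, df: pd.DataFrame):
        """Hypothetical hook: return one value per row of df, in row order."""

    def generate_column(self, df: pd.DataFrame) -> pd.Series:
        # Wrap the raw values in a named series that shares the input's index
        values = self._compute_values(df)
        return pd.Series(list(values), index=df.index, name=self.generated_column_name)


class BmiColumnGenerator(ColumnGeneratorSketch):
    """Example generator: body-mass index from weight (kg) and height (m)."""

    def __init__(self):
        super().__init__("bmi")

    def _compute_values(self, df: pd.DataFrame):
        return df["weight"] / df["height"] ** 2


df = pd.DataFrame({"weight": [70.0, 80.0], "height": [2.0, 2.0]}, index=["a", "b"])
bmi = BmiColumnGenerator().generate_column(df)  # series "bmi" with index ["a", "b"], values 17.5 and 20.0
```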

to_feature_generator(take_input_column_if_present: bool = False, normalisation_rule_template: Optional[RuleTemplate] = None, is_categorical: bool = False)

Transforms this column generator into a feature generator that can be used as part of a VectorModel.

Parameters:
  • take_input_column_if_present – if True and the input data already contains a column whose name matches the column to generate, that column is simply copied to produce the output (the column generator is not used); if False, the column generator is always applied to generate the output

  • is_categorical – whether the resulting column is categorical

  • normalisation_rule_template – template for a DFTNormalisation for the resulting column. This should only be provided if is_categorical is False

Returns:

a feature generator which generates the column
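The behaviour selected by take_input_column_if_present can be illustrated with a small hypothetical helper, column_or_generated, which is not part of sensai. It covers only the copy-or-generate decision described above; the actual method additionally wraps the generator in a sensai feature generator, which is not reproduced here.

```python
from typing import Callable

import pandas as pd


def column_or_generated(df: pd.DataFrame, column_name: str,
                        generate: Callable[[pd.DataFrame], pd.Series],
                        take_input_column_if_present: bool = False) -> pd.Series:
    """Hypothetical helper: copy an existing input column or generate it.

    generate is any callable mapping a data frame to a series indexed like it,
    e.g. the generate_column method of a column generator.
    """
    if take_input_column_if_present and column_name in df.columns:
        # The input already carries the column: copy it and skip the generator
        return df[column_name].copy()
    # Otherwise the generator is always applied
    return generate(df)


df = pd.DataFrame({"x": [1, 2], "double_x": [0, 0]})
double_x = lambda d: (d["x"] * 2).rename("double_x")
copied = column_or_generated(df, "double_x", double_x, take_input_column_if_present=True)  # [0, 0]
generated = column_or_generated(df, "double_x", double_x)  # [2, 4]
```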

class IndexCachedColumnGenerator(column_generator: ColumnGenerator, cache: KeyValueCache)

Bases: ColumnGenerator

Decorator for a column generator which adds support for cached column generation where cache keys are given by the input data frame’s index. Entries not found in the cache are computed by the wrapped column generator.

The main use case for this class is adding caching to existing ColumnGenerators. To create a new caching ColumnGenerator, using ColumnGeneratorCachedByIndex is encouraged instead.

Parameters:
  • column_generator – the column generator with which to generate values for keys not found in the cache

  • cache – the cache in which to store key-value pairs

log = <Logger sensai.columngen.IndexCachedColumnGenerator (WARNING)>
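The decorator idea (cache keys taken from the input's index, with only uncached rows passed to the wrapped generator) can be sketched as follows. IndexCachedSketch is a hypothetical name, a plain dict stands in for KeyValueCache, and computing all missing rows in a single call to the wrapped generator is an assumption of this sketch rather than a documented property of sensai's class.

```python
import pandas as pd


class IndexCachedSketch:
    """Hypothetical decorator adding index-keyed caching to a column-generating callable.

    A plain dict stands in for sensai's KeyValueCache; the index is assumed to be unique.
    """

    def __init__(self, generate, generated_column_name, cache=None):
        self.generate = generate  # wrapped generator: DataFrame -> Series indexed like its input
        self.generated_column_name = generated_column_name
        self.cache = {} if cache is None else cache

    def generate_column(self, df: pd.DataFrame) -> pd.Series:
        missing = [key for key in df.index if key not in self.cache]
        if missing:
            # Only rows whose index keys are not cached yet go to the wrapped generator
            computed = self.generate(df.loc[missing])
            for key, value in computed.items():
                self.cache[key] = value
        # Assemble the result from the cache, in the input's row order
        return pd.Series([self.cache[key] for key in df.index], index=df.index,
                         name=self.generated_column_name)


calls = []


def square_x(d: pd.DataFrame) -> pd.Series:
    calls.append(list(d.index))  # record which keys were actually computed
    return d["x"] ** 2


gen = IndexCachedSketch(square_x, "x_squared")
df = pd.DataFrame({"x": [1, 2, 3]}, index=[10, 20, 30])
gen.generate_column(df.iloc[:2])  # computes keys 10 and 20
result = gen.generate_column(df)  # only key 30 is computed; 10 and 20 come from the cache
```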

class ColumnGeneratorCachedByIndex(generated_column_name: str, cache: Optional[KeyValueCache], persist_cache=False)

Bases: ColumnGenerator, ABC

Base class for column generators that support cached column generation, where each value is generated independently. Cache keys are given by the input data frame’s index.

Parameters:
  • generated_column_name – the name of the column being generated

  • cache – the cache in which to store key-value pairs. If None, caching will be disabled

  • persist_cache – whether to persist the cache when pickling

log = <Logger sensai.columngen.ColumnGeneratorCachedByIndex (WARNING)>
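A base class of this kind can be sketched as a per-row loop that consults an index-keyed cache before computing each value independently. RowCachedColumnGeneratorSketch, its _generate_value hook and NameLengthGenerator are hypothetical names, and a dict (or None) stands in for KeyValueCache. How persist_cache affects pickling is modelled here, as an assumption, by a __getstate__ that replaces the cache contents with an empty dict unless persist_cache is set.

```python
from abc import ABC, abstractmethod
from typing import Optional

import pandas as pd


class RowCachedColumnGeneratorSketch(ABC):
    """Hypothetical analogue of ColumnGeneratorCachedByIndex.

    A plain dict (or None) stands in for sensai's KeyValueCache.
    """

    def __init__(self, generated_column_name: str, cache: Optional[dict], persist_cache: bool = False):
        self.generated_column_name = generated_column_name
        self.cache = cache  # None disables caching
        self.persist_cache = persist_cache

    @abstractmethod
    def _generate_value(self, row: pd.Series):
        """Hypothetical hook: compute the value for one row, independently of all other rows."""

    def generate_column(self, df: pd.DataFrame) -> pd.Series:
        values = []
        for key, row in df.iterrows():
            if self.cache is not None and key in self.cache:
                value = self.cache[key]  # cache hit: reuse the stored value
            else:
                value = self._generate_value(row)
                if self.cache is not None:
                    self.cache[key] = value
            values.append(value)
        return pd.Series(values, index=df.index, name=self.generated_column_name)

    def __getstate__(self):
        state = self.__dict__.copy()
        if not self.persist_cache and self.cache is not None:
            # Assumption: without persist_cache, pickles carry an empty cache instead of the entries
            state["cache"] = {}
        return state


class NameLengthGenerator(RowCachedColumnGeneratorSketch):
    """Example: length of the 'name' column, counting how often a value is really computed."""

    def __init__(self, cache: Optional[dict], persist_cache: bool = False):
        super().__init__("name_length", cache, persist_cache)
        self.num_computed = 0

    def _generate_value(self, row: pd.Series):
        self.num_computed += 1
        return len(row["name"])


df = pd.DataFrame({"name": ["ada", "grace"]}, index=["p1", "p2"])
gen = NameLengthGenerator(cache={})
gen.generate_column(df)
lengths = gen.generate_column(df)  # second call is served entirely from the cache
```

Generating each value independently is what makes a per-key cache sufficient here: a row's value never depends on which other rows happen to be in the same input frame.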