eval_util#


This module contains methods and classes that facilitate the evaluation of different types of models. The suggested workflow is to use these higher-level functionalities rather than instantiating the evaluation classes directly.

create_vector_model_evaluator(data: InputOutputData, model: Optional[VectorModel] = None, is_regression: Optional[bool] = None, params: Optional[Union[RegressionEvaluatorParams, ClassificationEvaluatorParams]] = None, test_data: Optional[InputOutputData] = None) Union[VectorRegressionModelEvaluator, VectorClassificationModelEvaluator][source]#
create_vector_model_cross_validator(data: InputOutputData, model: Optional[VectorModel] = None, is_regression: Optional[bool] = None, params: Optional[Union[VectorModelCrossValidatorParams, Dict[str, Any]]] = None) Union[VectorClassificationModelCrossValidator, VectorRegressionModelCrossValidator][source]#
create_evaluation_util(data: InputOutputData, model: Optional[VectorModel] = None, is_regression: Optional[bool] = None, evaluator_params: Optional[Union[RegressionEvaluatorParams, ClassificationEvaluatorParams]] = None, cross_validator_params: Optional[Dict[str, Any]] = None, test_io_data: Optional[InputOutputData] = None) Union[ClassificationModelEvaluation, RegressionModelEvaluation][source]#
eval_model_via_evaluator(model: TModel, io_data: InputOutputData, test_fraction=0.2, plot_target_distribution=False, compute_probabilities=True, normalize_plots=True, random_seed=60) TEvalData[source]#

Evaluates the given model via a simple evaluation mechanism that uses a single split

Parameters:
  • model – the model to evaluate

  • io_data – data on which to evaluate

  • test_fraction – the fraction of the data to test on

  • plot_target_distribution – whether to plot the distribution of target values in the entire dataset

  • compute_probabilities – whether to compute class probabilities; only relevant if the model is a classifier

  • normalize_plots – whether to normalize plotted distributions such that they sum/integrate to 1

  • random_seed – the random seed to use for the train/test split

Returns:

the evaluation data
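
A minimal usage sketch for a regression model; the import paths, the InputOutputData constructor, the sklearn wrapper class name and the with_name call are assumptions (based on the sensAI package layout) rather than part of this module's documented API:

import numpy as np
import pandas as pd

from sensai import InputOutputData                      # assumed re-export
from sensai.evaluation import eval_model_via_evaluator  # assumed import path
from sensai.sklearn.sklearn_regression import SkLearnRandomForestVectorRegressionModel  # class name assumed

# synthetic regression data: two input columns, one target column
rng = np.random.default_rng(42)
inputs = pd.DataFrame({"a": rng.normal(size=200), "b": rng.normal(size=200)})
targets = pd.DataFrame({"y": 2 * inputs["a"] + inputs["b"] + rng.normal(scale=0.1, size=200)})
io_data = InputOutputData(inputs, targets)  # constructor signature assumed: (inputs, outputs)

model = SkLearnRandomForestVectorRegressionModel().with_name("RandomForest")  # with_name assumed

# single random split (80% training, 20% test); returns the evaluation data object
eval_data = eval_model_via_evaluator(model, io_data, test_fraction=0.2, plot_target_distribution=True)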

class EvaluationResultCollector(show_plots: bool = True, result_writer: Optional[ResultWriter] = None, tracking_context: Optional[TrackingContext] = None)[source]#

Bases: object

is_plot_creation_enabled() bool[source]#
add_figure(name: str, fig: Figure)[source]#
add_data_frame_csv_file(name: str, df: DataFrame)[source]#
child(added_filename_prefix)[source]#
class EvalStatsPlotCollector[source]#

Bases: Generic[TEvalStats, TEvalStatsPlot]

add_plot(name: str, plot: EvalStatsPlot)[source]#
get_enabled_plots() List[str][source]#
disable_plots(*names: str)[source]#
create_plots(eval_stats: EvalStats, subtitle: str, result_collector: EvaluationResultCollector)[source]#
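
For instance, individual plots can be disabled on the plot collector of a ModelEvaluation instance (see below) before running an evaluation; that the collector is accessible via an eval_stats_plot_collector attribute, and the plot name used here, are assumptions:

# ev: a RegressionModelEvaluation or ClassificationModelEvaluation instance (see below)
print(ev.eval_stats_plot_collector.get_enabled_plots())        # names of all registered plots
ev.eval_stats_plot_collector.disable_plots("heatmap-gt-pred")  # illustrative plot name
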
class RegressionEvalStatsPlotCollector[source]#

Bases: EvalStatsPlotCollector[RegressionEvalStats, RegressionEvalStatsPlot]

class ClassificationEvalStatsPlotCollector[source]#

Bases: EvalStatsPlotCollector[RegressionEvalStats, RegressionEvalStatsPlot]

class ModelEvaluation(io_data: InputOutputData, eval_stats_plot_collector: Union[RegressionEvalStatsPlotCollector, ClassificationEvalStatsPlotCollector], evaluator_params: Optional[Union[RegressionEvaluatorParams, ClassificationEvaluatorParams, Dict[str, Any]]] = None, cross_validator_params: Optional[Union[VectorModelCrossValidatorParams, Dict[str, Any]]] = None, test_io_data: Optional[InputOutputData] = None)[source]#

Bases: ABC, Generic[TModel, TEvaluator, TEvalData, TCrossValidator, TCrossValData, TEvalStats]

Utility class for the evaluation of models based on a dataset

Parameters:
  • io_data – the data set to use for evaluation. For evaluation purposes, this dataset usually will be split into training and test data according to the rules specified by evaluator_params. However, if test_io_data is specified, then this is taken to be the training data and test_io_data is taken to be the test data when creating evaluators for simple (single-split) evaluation.

  • eval_stats_plot_collector – a collector for plots generated from evaluation stats objects

  • evaluator_params – parameters with which to instantiate evaluators

  • cross_validator_params – parameters with which to instantiate cross-validators

  • test_io_data – optional test data (see io_data)
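
A sketch of the single-split workflow via the concrete subclass RegressionModelEvaluation (documented further below); the import path and the RegressionEvaluatorParams field name are assumptions:

from sensai.evaluation import RegressionModelEvaluation, RegressionEvaluatorParams  # import path assumed

ev = RegressionModelEvaluation(
    io_data,  # an InputOutputData instance, e.g. as constructed in the sketch near the top of this page
    evaluator_params=RegressionEvaluatorParams(fractional_split_test_fraction=0.2),  # field name assumed
)
eval_data = ev.perform_simple_evaluation(model, create_plots=True, show_plots=True)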

create_evaluator(model: Optional[TModel] = None, is_regression: Optional[bool] = None) TEvaluator[source]#

Creates an evaluator holding the current input-output data

Parameters:
  • model – the model for which to create an evaluator (used only to determine whether the task is regression or classification; the resulting evaluator will work with other models as well)

  • is_regression – whether to create a regression model evaluator. Either this or model must be specified

Returns:

an evaluator

create_cross_validator(model: Optional[TModel] = None, is_regression: Optional[bool] = None) TCrossValidator[source]#

Creates a cross-validator holding the current input-output data

Parameters:
  • model – the model for which to create a cross-validator (used only to determine whether the task is regression or classification; the resulting cross-validator will work with other models as well)

  • is_regression – whether to create a regression model cross-validator. Either this or model must be specified

Returns:

a cross-validator

perform_simple_evaluation(model: TModel, create_plots=True, show_plots=False, log_results=True, result_writer: Optional[ResultWriter] = None, additional_evaluation_on_training_data=False, fit_model=True, write_eval_stats=False, tracked_experiment: Optional[TrackedExperiment] = None, evaluator: Optional[TEvaluator] = None) TEvalData[source]#
perform_cross_validation(model: TModel, show_plots=False, log_results=True, result_writer: Optional[ResultWriter] = None, tracked_experiment: Optional[TrackedExperiment] = None, cross_validator: Optional[TCrossValidator] = None) TCrossValData[source]#

Evaluates the given model via cross-validation

Parameters:
  • model – the model to evaluate

  • show_plots – whether to show plots that visualise evaluation results (combining all folds)

  • log_results – whether to log evaluation results

  • result_writer – a writer with which to store text files and plots. The evaluated model’s name is added to each filename automatically

  • tracked_experiment – a tracked experiment with which results shall be associated

  • cross_validator – the cross-validator to apply; if None, a suitable cross-validator will be created

Returns:

cross-validation result data
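
A cross-validation sketch; passing a plain dict for cross_validator_params is supported per the constructor signature above, but the parameter name folds is an assumption:

ev = RegressionModelEvaluation(io_data, cross_validator_params=dict(folds=5))  # 'folds' assumed
cross_val_data = ev.perform_cross_validation(model, show_plots=True)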

compare_models(models: Sequence[TModel], result_writer: Optional[ResultWriter] = None, use_cross_validation=False, fit_models=True, write_individual_results=True, sort_column: Optional[str] = None, sort_ascending: bool = True, sort_column_move_to_left=True, also_include_unsorted_results: bool = False, also_include_cross_val_global_stats: bool = False, visitors: Optional[Iterable[ModelComparisonVisitor]] = None, write_visitor_results=False, write_csv=False, tracked_experiment: Optional[TrackedExperiment] = None) ModelComparisonData[source]#

Compares several models via simple evaluation or cross-validation

Parameters:
  • models – the models to compare

  • result_writer – a writer with which to store results of the comparison

  • use_cross_validation – whether to use cross-validation in order to evaluate models; if False, use a simple evaluation on test data (single split)

  • fit_models – whether to fit models before evaluating them; this can only be False if use_cross_validation=False

  • write_individual_results – whether to write results files on each individual model (in addition to the comparison summary)

  • sort_column – column/metric name by which to sort; note that column names change when cross-validation is used (aggregation function names are added), but this can be ignored: simply pass the unmodified metric name

  • sort_ascending – whether to sort by sort_column in ascending order

  • sort_column_move_to_left – whether to move the sort_column (if any) to the very left

  • also_include_unsorted_results – whether to also include the unsorted table of results in the results text (applies only if the results are sorted)

  • also_include_cross_val_global_stats – whether to also include, when using cross-validation, the evaluation metrics obtained when combining the predictions from all folds into a single collection. Note that for classification models, this may not always be possible (if the set of classes known to the model differs across folds)

  • visitors – visitors which may process individual results

  • write_visitor_results – whether to collect results from visitors (if any) after the comparison

  • write_csv – whether to write the metrics table to CSV files

  • tracked_experiment – a tracked experiment with which results shall be associated

Returns:

the comparison results
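
A comparison sketch for two regression models, with ev being the RegressionModelEvaluation instance from the earlier sketch; the wrapper class names and the metric column name "RRMSE" are assumptions:

from sensai.sklearn.sklearn_regression import (  # class names assumed
    SkLearnLinearRegressionVectorRegressionModel,
    SkLearnRandomForestVectorRegressionModel,
)

models = [
    SkLearnLinearRegressionVectorRegressionModel().with_name("Linear"),
    SkLearnRandomForestVectorRegressionModel().with_name("RandomForest"),
]
comparison = ev.compare_models(models, use_cross_validation=False,
                               sort_column="RRMSE", sort_ascending=True)  # metric name assumed
print(comparison.results_df)
best_model = comparison.get_best_model("RRMSE")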

compare_models_cross_validation(models: Sequence[TModel], result_writer: Optional[ResultWriter] = None) ModelComparisonData[source]#

Compares several models via cross-validation

Parameters:
  • models – the models to compare

  • result_writer – a writer with which to store results of the comparison

Returns:

the comparison results

create_plots(data: Union[TEvalData, TCrossValData], show_plots=True, result_writer: Optional[ResultWriter] = None, subtitle_prefix: str = '', tracking_context: Optional[TrackingContext] = None)[source]#

Creates default plots that visualise the results in the given evaluation data

Parameters:
  • data – the evaluation data for which to create the default plots

  • show_plots – whether to show plots

  • result_writer – if not None, plots will be written using this writer

  • subtitle_prefix – a prefix to add to the subtitle (which itself is the model name)

  • tracking_context – the experiment tracking context
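
Plots can also be re-created after the fact from previously obtained evaluation (or cross-validation) data, using only the API documented above:

# eval_data as returned by perform_simple_evaluation (or data returned by perform_cross_validation)
ev.create_plots(eval_data, show_plots=True, subtitle_prefix="Holdout evaluation: ")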

class RegressionModelEvaluation(io_data: InputOutputData, evaluator_params: Optional[Union[RegressionEvaluatorParams, Dict[str, Any]]] = None, cross_validator_params: Optional[Union[VectorModelCrossValidatorParams, Dict[str, Any]]] = None, test_io_data: Optional[InputOutputData] = None)[source]#

Bases: ModelEvaluation[VectorRegressionModel, VectorRegressionModelEvaluator, VectorRegressionModelEvaluationData, VectorRegressionModelCrossValidator, VectorRegressionModelCrossValidationData, RegressionEvalStats]

Parameters:
  • io_data – the data set to use for evaluation. For evaluation purposes, this dataset usually will be split into training and test data according to the rules specified by evaluator_params. However, if test_io_data is specified, then this is taken to be the training data and test_io_data is taken to be the test data when creating evaluators for simple (single-split) evaluation.

  • evaluator_params – parameters with which to instantiate evaluators

  • cross_validator_params – parameters with which to instantiate cross-validators

  • test_io_data – optional test data (see io_data)

class ClassificationModelEvaluation(io_data: InputOutputData, evaluator_params: Optional[Union[ClassificationEvaluatorParams, Dict[str, Any]]] = None, cross_validator_params: Optional[Union[VectorModelCrossValidatorParams, Dict[str, Any]]] = None, test_io_data: Optional[InputOutputData] = None)[source]#

Bases: ModelEvaluation[VectorClassificationModel, VectorClassificationModelEvaluator, VectorClassificationModelEvaluationData, VectorClassificationModelCrossValidator, VectorClassificationModelCrossValidationData, ClassificationEvalStats]

Parameters:
  • io_data – the data set to use for evaluation. For evaluation purposes, this dataset usually will be split into training and test data according to the rules specified by evaluator_params. However, if test_io_data is specified, then this is taken to be the training data and test_io_data is taken to be the test data when creating evaluators for simple (single-split) evaluation.

  • evaluator_params – parameters with which to instantiate evaluators

  • cross_validator_params – parameters with which to instantiate cross-validators

  • test_io_data – optional test data (see io_data)

class MultiDataModelEvaluation(io_data_dict: Dict[str, InputOutputData], key_name: str = 'dataset', meta_data_dict: Optional[Dict[str, Dict[str, Any]]] = None, evaluator_params: Optional[Union[RegressionEvaluatorParams, ClassificationEvaluatorParams, Dict[str, Any]]] = None, cross_validator_params: Optional[Union[VectorModelCrossValidatorParams, Dict[str, Any]]] = None, test_io_data_dict: Optional[Dict[str, Optional[InputOutputData]]] = None)[source]#

Bases: object

Parameters:
  • io_data_dict – a dictionary mapping from names to the data sets with which to evaluate models. For evaluation or cross-validation, these datasets will usually be split according to the rules specified by evaluator_params or cross_validator_params. An exception is the case where explicit test data sets are specified by passing test_io_data_dict. Then, for these data sets, the io_data will not be split for evaluation, but the test_io_data will be used instead.

  • key_name – a name for the key value used in io_data_dict, which will be used as a column name in result data frames

  • meta_data_dict – a dictionary mapping from a name (same keys as in io_data_dict) to a dictionary that maps column names to values, which is used to extend the result data frames containing per-dataset results

  • evaluator_params – parameters to use for the instantiation of evaluators (relevant if use_cross_validation=False)

  • cross_validator_params – parameters to use for the instantiation of cross-validators (relevant if use_cross_validation=True)

  • test_io_data_dict – a dictionary mapping from names to test data sets (or to None). Entries with non-None values will be used for the evaluation of models that were trained on the respective entry of io_data_dict. If passed, the keys must be a superset of io_data_dict’s keys (values may be None, e.g. if you want to use explicit test data sets for some entries and splitting of the io_data for others). If not None, cross-validation cannot be used when calling compare_models.
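
A construction sketch for two named data sets with static per-dataset metadata; io_data_a and io_data_b are assumed to be InputOutputData instances created as in the earlier sketches, and the import path and the cross-validator parameter name folds are assumptions:

from sensai.evaluation import MultiDataModelEvaluation  # import path assumed

multi_eval = MultiDataModelEvaluation(
    io_data_dict={"datasetA": io_data_a, "datasetB": io_data_b},
    key_name="dataset",
    meta_data_dict={"datasetA": {"domain": "synthetic"}, "datasetB": {"domain": "real"}},
    cross_validator_params=dict(folds=5),  # 'folds' assumed
)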

compare_models(model_factories: Sequence[Callable[[], Union[VectorRegressionModel, VectorClassificationModel]]], use_cross_validation=False, result_writer: Optional[ResultWriter] = None, write_per_dataset_results=False, write_csvs=False, column_name_for_model_ranking: Optional[str] = None, rank_max=True, add_combined_eval_stats=False, create_metric_distribution_plots=True, create_combined_eval_stats_plots=False, distribution_plots_cdf=True, distribution_plots_cdf_complementary=False, visitors: Optional[Iterable[ModelComparisonVisitor]] = None) Union[RegressionMultiDataModelComparisonData, ClassificationMultiDataModelComparisonData][source]#
Parameters:
  • model_factories – a sequence of factory functions for the creation of models to evaluate; every factory must result in a model with a fixed model name (otherwise results cannot be correctly aggregated)

  • use_cross_validation – whether to use cross-validation (rather than a single split) for model evaluation. This can only be used if the instance’s test_io_data_dict is None.

  • result_writer – a writer with which to store results; if None, results are not stored

  • write_per_dataset_results – whether to use result_writer (if not None) to generate detailed results for each dataset in a subdirectory named after the dataset

  • write_csvs – whether to write the metrics table to CSV files

  • column_name_for_model_ranking – column name to use for ranking models

  • rank_max – if True, the maximum value of the ranking column is considered best; otherwise the minimum

  • add_combined_eval_stats – whether to also report, for each model, evaluation metrics on the combined set of data points from all EvalStats objects. Note that for classification, this is only possible if all individual experiments use the same set of class labels.

  • create_metric_distribution_plots – whether to create, for each model, plots of the distribution of each metric across the datasets (applies only if result_writer is not None)

  • create_combined_eval_stats_plots – whether to combine, for each type of model, the EvalStats objects from the individual experiments into a single object that holds all results and to use it to create plots reflecting the overall result (applies only if result_writer is not None). Note that for classification, this is only possible if all individual experiments use the same set of class labels.

  • distribution_plots_cdf – whether to create CDF plots for the metric distributions. Applies only if create_metric_distribution_plots is True and result_writer is not None.

  • distribution_plots_cdf_complementary – whether to plot the complementary cdf instead of the regular cdf, provided that distribution_plots_cdf is True.

  • visitors – visitors which may process individual results. Plots generated by visitors are created/collected at the end of the comparison.

Returns:

an object containing the full comparison results
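
A sketch of comparing model factories across all data sets; each factory yields a model with a fixed name, as required. The wrapper class names and the metric column name are assumptions:

model_factories = [
    lambda: SkLearnLinearRegressionVectorRegressionModel().with_name("Linear"),
    lambda: SkLearnRandomForestVectorRegressionModel().with_name("RandomForest"),
]
comparison_data = multi_eval.compare_models(
    model_factories,
    use_cross_validation=True,
    column_name_for_model_ranking="RRMSE",  # metric column name assumed
    rank_max=False,  # lower RRMSE is better
    create_metric_distribution_plots=True,
)
print(comparison_data.get_model_names())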

class ModelComparisonData(results_df: DataFrame, results_by_model_name: Dict[str, Result], evaluator: Optional[VectorModelEvaluator] = None, cross_validator: Optional[VectorModelCrossValidator] = None)[source]#

Bases: object

class Result(eval_data: Union[VectorClassificationModelEvaluationData, VectorRegressionModelEvaluationData] = None, cross_validation_data: Union[VectorClassificationModelCrossValidationData, VectorRegressionModelCrossValidationData] = None)[source]#

Bases: object

eval_data: Union[VectorClassificationModelEvaluationData, VectorRegressionModelEvaluationData] = None#
cross_validation_data: Union[VectorClassificationModelCrossValidationData, VectorRegressionModelCrossValidationData] = None#
iter_evaluation_data() Iterator[Union[VectorClassificationModelEvaluationData, VectorRegressionModelEvaluationData]][source]#
get_best_model_name(metric_name: str) str[source]#
get_best_model(metric_name: str) Union[VectorClassificationModel, VectorRegressionModel, VectorModelBase][source]#
class ModelComparisonVisitor[source]#

Bases: ABC

abstract visit(model_name: str, result: Result)[source]#
abstract collect_results(result_collector: EvaluationResultCollector) None[source]#

Collects results (such as figures) at the end of the model comparison, based on the information gathered during the preceding visits

Parameters:

result_collector – the collector to which figures are to be added
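
A minimal custom visitor sketch that records the number of evaluation data objects per visited model and contributes a bar chart at the end of the comparison; the quantity plotted is purely illustrative, and the import path is an assumption:

import matplotlib.pyplot as plt

from sensai.evaluation.eval_util import EvaluationResultCollector, ModelComparisonVisitor  # path assumed

class EvaluationDataCountVisitor(ModelComparisonVisitor):
    """Illustrative visitor: counts the evaluation data objects produced for each model."""
    def __init__(self):
        self.counts = {}

    def visit(self, model_name: str, result):  # result: ModelComparisonData.Result
        self.counts[model_name] = sum(1 for _ in result.iter_evaluation_data())

    def collect_results(self, result_collector: EvaluationResultCollector) -> None:
        fig, ax = plt.subplots()
        ax.bar(list(self.counts), list(self.counts.values()))
        ax.set_ylabel("number of evaluation data objects")
        result_collector.add_figure("evaluation-data-counts", fig)

An instance of such a visitor can be passed to compare_models via the visitors parameter, with write_visitor_results=True to store the collected figures.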

class ModelComparisonVisitorAggregatedFeatureImportance(model_name: str, feature_agg_regex: Sequence[str] = (), write_figure=True, write_data_frame_csv=False)[source]#

Bases: ModelComparisonVisitor

During a model comparison, computes aggregated feature importance values for the model with the given name

Parameters:
  • model_name – the name of the model for which to compute the aggregated feature importance values

  • feature_agg_regex – a sequence of regular expressions describing which feature names to sum as one. Each regex must contain exactly one group. If a regex matches a feature name, the feature importance will be summed under the key of the matched group instead of the full feature name. For example, the regex r"(\w+)_\d+$" will cause "foo_1" and "foo_2" to be summed under "foo" and similarly "bar_1" and "bar_2" to be summed under "bar". A usage sketch follows at the end of this entry.

visit(model_name: str, result: Result)[source]#
plot_feature_importance() Figure[source]#
get_feature_importance() FeatureImportance[source]#
collect_results(result_collector: EvaluationResultCollector)[source]#

Collects results (such as figures) at the end of the model comparison, based on the information gathered during the preceding visits

Parameters:

result_collector – the collector to which figures are to be added
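
A usage sketch for this visitor within a model comparison; ev and models are assumed to be defined as in the earlier sketches, and the model name must match the name of one of the compared models:

visitor = ModelComparisonVisitorAggregatedFeatureImportance(
    model_name="RandomForest",
    feature_agg_regex=[r"(\w+)_\d+$"],  # e.g. sums lag_1, lag_2, ... under "lag"
)
comparison = ev.compare_models(models, visitors=[visitor], write_visitor_results=True)
fig = visitor.plot_feature_importance()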

class MultiDataModelComparisonData(all_results_df: DataFrame, mean_results_df: DataFrame, agg_results_df: DataFrame, eval_stats_by_model_name: Dict[str, List[TEvalStats]], results_by_model_name: Dict[str, List[Result]], dataset_names: List[str], model_name_to_string_repr: Dict[str, str])[source]#

Bases: Generic[TEvalStats, TEvalStatsCollection], ABC

get_model_names() List[str][source]#
get_model_description(model_name: str) str[source]#
get_eval_stats_list(model_name: str) List[TEvalStats][source]#
abstract get_eval_stats_collection(model_name: str) TEvalStatsCollection[source]#
iter_model_results(model_name: str) Iterator[Tuple[str, Result]][source]#
create_distribution_plots(result_writer: ResultWriter, cdf=True, cdf_complementary=False)[source]#

Creates, for each model, histogram plots of the distribution of each metric across the datasets, and additionally x-y plots (scatter plots and heat maps) for metrics that have associated paired metrics which were also computed

Parameters:
  • result_writer – the result writer

  • cdf – whether to additionally plot, for each distribution, the cumulative distribution function

  • cdf_complementary – whether to plot the complementary cdf instead of the regular cdf, provided that cdf is True

class ClassificationMultiDataModelComparisonData(all_results_df: DataFrame, mean_results_df: DataFrame, agg_results_df: DataFrame, eval_stats_by_model_name: Dict[str, List[TEvalStats]], results_by_model_name: Dict[str, List[Result]], dataset_names: List[str], model_name_to_string_repr: Dict[str, str])[source]#

Bases: MultiDataModelComparisonData[ClassificationEvalStats, ClassificationEvalStatsCollection]

get_eval_stats_collection(model_name: str)[source]#
class RegressionMultiDataModelComparisonData(all_results_df: DataFrame, mean_results_df: DataFrame, agg_results_df: DataFrame, eval_stats_by_model_name: Dict[str, List[TEvalStats]], results_by_model_name: Dict[str, List[Result]], dataset_names: List[str], model_name_to_string_repr: Dict[str, str])[source]#

Bases: MultiDataModelComparisonData[RegressionEvalStats, RegressionEvalStatsCollection]

get_eval_stats_collection(model_name: str)[source]#