greedy_clustering#


class GreedyAgglomerativeClustering(clusters: Sequence[Cluster], merge_candidate_determination_strategy: Optional[MergeCandidateDeterminationStrategy] = None)[source]#

Bases: object

An implementation of greedy agglomerative clustering which avoids unnecessary recomputations of merge costs through the management of a priority queue of potential merges.

Greedy agglomerative clustering works as follows. Starting with an initial set of clusters (where each cluster typically contains a single data point), the method successively merges the two clusters where the merge cost is lowest (greedy), until no further merges are admissible. The merge operation is a mutating operation, i.e. the initial clusters are modified.

To apply the method, the Cluster class must be subclassed, so as to define what the cost of a merge in your application shall be and how two clusters can be merged. For example, if data points are points in a Cartesian coordinate system, then the merge cost can be defined as the minimum or maximum distance among all pairs of points in the two clusters, admissibility being determined by a threshold that must not be exceeded; the merge operation can simply concatenate lists of data points.

Parameters:

clusters – the initial clusters, which are to be agglomerated into larger clusters

log = <Logger sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering (WARNING)>#
class Cluster[source]#

Bases: ABC

Base class for clusters that can be merged via GreedyAgglomerativeClustering

abstract merge_cost(other) float[source]#

Computes the cost of merging the given cluster with this cluster

Returns:

the (non-negative) merge cost or math.inf if a merge is inadmissible

abstract merge(other)[source]#

Merges the given cluster into this cluster”

Parameters:

other – the cluster that is to be merged into this cluster

apply_clustering() List[Cluster][source]#

Applies greedy agglomerative clustering to the clusters given at construction, merging clusters until no further merges are admissible

Returns:

the list of agglomerated clusters (subset of the original clusters, which may have had other clusters merged into them)

class WrappedCluster(cluster, idx, clusterer: GreedyAgglomerativeClustering)[source]#

Bases: object

Wrapper for clusters which stores additional data required for clustering (internal use only)

is_merged() bool[source]#
get_cluster_association() WrappedCluster[source]#

Gets the wrapped cluster that this cluster’s points have ultimately been merged into (which may be the cluster itself)

Returns:

the wrapped cluster this cluster’s points are associated with

remove_merges()[source]#
compute_merges(initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None)[source]#
class ClusterMerge(c1: WrappedCluster, c2: WrappedCluster, merge_cost)[source]#

Bases: object

Represents a potential merge

log = <Logger sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.ClusterMerge (WARNING)>#
apply()[source]#
class MergeCandidateDeterminationStrategy[source]#

Bases: ABC

set_clusterer(clusterer: GreedyAgglomerativeClustering)[source]#

Initialises the clusterer the strategy is applied to :param clusterer: the clusterer

abstract iter_candidate_indices(wc: WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, ClusterMerge]][source]#
Parameters:
  • wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge

  • initial – whether we are computing the initial candidates (at the start of the clustering algorithm)

  • merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc

Returns:

an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)

class MergeCandidateDeterminationStrategyDefault[source]#

Bases: MergeCandidateDeterminationStrategy

iter_candidate_indices(wc: WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, ClusterMerge]][source]#
Parameters:
  • wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge

  • initial – whether we are computing the initial candidates (at the start of the clustering algorithm)

  • merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc

Returns:

an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)