greedy_clustering#
Source code: sensai/clustering/greedy_clustering.py
- class GreedyAgglomerativeClustering(clusters: Sequence[Cluster], merge_candidate_determination_strategy: Optional[MergeCandidateDeterminationStrategy] = None)[source]#
Bases:
object
An implementation of greedy agglomerative clustering which avoids unnecessary recomputations of merge costs through the management of a priority queue of potential merges.
Greedy agglomerative clustering works as follows. Starting with an initial set of clusters (where each cluster typically contains a single data point), the method successively merges the two clusters where the merge cost is lowest (greedy), until no further merges are admissible. The merge operation is a mutating operation, i.e. the initial clusters are modified.
To apply the method, the Cluster class must be subclassed, so as to define what the cost of a merge in your application shall be and how two clusters can be merged. For example, if data points are points in a Cartesian coordinate system, then the merge cost can be defined as the minimum or maximum distance among all pairs of points in the two clusters, admissibility being determined by a threshold that must not be exceeded; the merge operation can simply concatenate lists of data points.
- Parameters:
clusters – the initial clusters, which are to be agglomerated into larger clusters
- log = <Logger sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering (WARNING)>#
- class Cluster[source]#
Bases:
ABC
Base class for clusters that can be merged via GreedyAgglomerativeClustering
- apply_clustering() List[Cluster] [source]#
Applies greedy agglomerative clustering to the clusters given at construction, merging clusters until no further merges are admissible
- Returns:
the list of agglomerated clusters (subset of the original clusters, which may have had other clusters merged into them)
- class WrappedCluster(cluster, idx, clusterer: GreedyAgglomerativeClustering)[source]#
Bases:
object
Wrapper for clusters which stores additional data required for clustering (internal use only)
- get_cluster_association() WrappedCluster [source]#
Gets the wrapped cluster that this cluster’s points have ultimately been merged into (which may be the cluster itself)
- Returns:
the wrapped cluster this cluster’s points are associated with
- class ClusterMerge(c1: WrappedCluster, c2: WrappedCluster, merge_cost)[source]#
Bases:
object
Represents a potential merge
- log = <Logger sensai.clustering.greedy_clustering.GreedyAgglomerativeClustering.ClusterMerge (WARNING)>#
- class MergeCandidateDeterminationStrategy[source]#
Bases:
ABC
- set_clusterer(clusterer: GreedyAgglomerativeClustering)[source]#
Initialises the clusterer the strategy is applied to :param clusterer: the clusterer
- abstract iter_candidate_indices(wc: WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, ClusterMerge]] [source]#
- Parameters:
wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge
initial – whether we are computing the initial candidates (at the start of the clustering algorithm)
merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc
- Returns:
an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)
- class MergeCandidateDeterminationStrategyDefault[source]#
Bases:
MergeCandidateDeterminationStrategy
- iter_candidate_indices(wc: WrappedCluster, initial: bool, merged_cluster_indices: Optional[Tuple[int, int]] = None) Iterator[Union[int, ClusterMerge]] [source]#
- Parameters:
wc – the wrapped cluster: the cluster for which to determine the cluster indices that are to be considered for a potential merge
initial – whether we are computing the initial candidates (at the start of the clustering algorithm)
merged_cluster_indices – [for initial=False] the pair of cluster indices that were just joined to form the updated cluster wc
- Returns:
an iterator of cluster indices that should be evaluated as potential merge partners for wc (it may contain the index of wc, which will be ignored)