cellcharter.tl.ClusterAutoK#

class cellcharter.tl.ClusterAutoK(n_clusters, max_runs=10, convergence_tol=0.01, model_class=None, model_params=None, similarity_function=None)#

Identify the best candidates for the number of clusters.

Parameters:
  • n_clusters (tuple[int, int] | list[int]) – Range for number of clusters (bounds included).

  • max_runs (int (default: 10)) – Maximum number of repetitions for each value of number of clusters.

  • convergence_tol (float (default: 0.01)) – Convergence tolerance for the clustering stability. If the mean absolute percentage error (MAPE) between consecutive iterations falls below convergence_tol, the algorithm stops before reaching max_runs.

  • model_class (Optional[type] (default: None)) – Class of the model to be used for clustering. It must accept random_state and n_clusters as initialization parameters.

  • model_params (Optional[dict] (default: None)) – Keyword arguments passed to model_class.

  • similarity_function (Optional[callable] (default: None)) – The similarity function used between clustering results. Defaults to sklearn.metrics.fowlkes_mallows_score().
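The early-stopping rule described for convergence_tol can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper names mape and has_converged are hypothetical, and it assumes that the compared quantity is the mean stability per K after consecutive runs and that the earlier iteration is the denominator of the percentage error.

```python
def mape(previous, current):
    """Mean absolute percentage error between two equally long curves.

    Assumption: the earlier curve is used as the reference (denominator),
    and none of its values is zero.
    """
    if len(previous) != len(current):
        raise ValueError("curves must have the same length")
    return sum(abs((p - c) / p) for p, c in zip(previous, current)) / len(previous)


def has_converged(previous, current, convergence_tol=0.01):
    """Return True when the change between consecutive iterations is below the tolerance."""
    return mape(previous, current) < convergence_tol


# Hypothetical mean stability per K after run 3 and after run 4.
after_run_3 = [0.80, 0.90, 0.70]
after_run_4 = [0.804, 0.895, 0.702]
print(has_converged(after_run_3, after_run_4))
```

With the default tolerance of 0.01, the curves above differ by less than 1% on average, so under these assumptions the loop would stop before reaching max_runs.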

Examples

>>> import anndata
>>> import squidpy as sq
>>> import cellcharter as cc
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> sq.gr.spatial_neighbors(adata, coord_type='generic', delaunay=True)
>>> cc.gr.remove_long_links(adata)
>>> cc.gr.aggregate_neighbors(adata, n_layers=3)
>>> model_params = {
...     'random_state': 42,
...     'trainer_params': {
...         'accelerator': 'cpu',
...         'enable_progress_bar': False,
...     },
... }
>>> models = cc.tl.ClusterAutoK(n_clusters=(2, 10), model_class=cc.tl.GaussianMixture, model_params=model_params, max_runs=5)

Attributes table#

best_k

The number of clusters with the highest stability.

peaks

Find the peaks in the stability curve.

persistent_attributes

Returns the list of fitted attributes that ought to be saved and loaded.

labels

The cluster assignments for each repetition and number of clusters.

stability

The stability values of all combinations of runs between K and K-1, and between K and K+1.

Methods table#

fit(adata[, use_rep])

Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.

get_params()

Returns the estimator's parameters as passed to the initializer.

load(path)

Loads the estimator and (if available) the fitted model.

predict(adata[, use_rep, k])

Predict the labels for the data in use_rep using the fitted model.

save(path[, best_k])

Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.

set_params(values)

Sets the provided values.

Attributes#

ClusterAutoK.best_k#

The number of clusters with the highest stability.

ClusterAutoK.peaks#

Find the peaks in the stability curve.
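As a rough illustration of how best_k and peaks relate, the sketch below picks the K with the highest mean stability and lists the local maxima of the curve. How the library actually detects peaks is not stated here, so the strict local-maximum rule, the helper name local_peaks, and the stability values are all assumptions.

```python
def local_peaks(ks, mean_stability):
    """Return the K values whose mean stability exceeds both neighbours.

    Assumption: a peak is a strict interior local maximum; the endpoints
    of the range are never reported as peaks.
    """
    peaks = []
    for i in range(1, len(ks) - 1):
        if mean_stability[i - 1] < mean_stability[i] > mean_stability[i + 1]:
            peaks.append(ks[i])
    return peaks


# Hypothetical stability curve for n_clusters=(2, 7).
ks = [2, 3, 4, 5, 6, 7]
mean_stability = [0.70, 0.85, 0.60, 0.65, 0.90, 0.75]

candidate_ks = local_peaks(ks, mean_stability)
# The single best K is the position of the global maximum.
best_k = ks[max(range(len(ks)), key=mean_stability.__getitem__)]
print(candidate_ks, best_k)
```

In this made-up curve, K=3 and K=6 are both candidates, and K=6 is the one with the highest stability.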

ClusterAutoK.persistent_attributes#

Returns the list of fitted attributes that ought to be saved and loaded. By default, this encompasses all annotations.

ClusterAutoK.labels: dict#

The cluster assignments for each repetition and number of clusters.

ClusterAutoK.stability: ndarray#

The stability values of all combinations of runs between K and K-1, and between K and K+1.
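To make "all combinations of runs" concrete, the sketch below scores every pairing of a run at K with a run at K+1. The default similarity is documented as sklearn.metrics.fowlkes_mallows_score(); the function here is a plain-Python pair-counting version of the same index, written only for illustration. The helper name pairwise_stability and the toy label lists are hypothetical, and the way the library stores the results in the stability array is not shown.

```python
from itertools import combinations, product
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Pair-counting Fowlkes-Mallows index between two labelings of the same cells."""
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1      # pair grouped together in both labelings
        elif same_a:
            only_a += 1    # grouped together only in labels_a
        elif same_b:
            only_b += 1    # grouped together only in labels_b
    if both == 0:
        return 0.0
    return both / sqrt((both + only_a) * (both + only_b))


def pairwise_stability(runs_k, runs_k_next, similarity=fowlkes_mallows):
    """Similarity of every run at K against every run at K+1."""
    return [similarity(a, b) for a, b in product(runs_k, runs_k_next)]


# Hypothetical labels: two runs with K=2 and one run with K=3, four cells each.
runs_k2 = [[0, 0, 1, 1], [1, 1, 0, 0]]
runs_k3 = [[0, 0, 1, 2]]
scores = pairwise_stability(runs_k2, runs_k3)
print(scores)
```

The index only depends on which cells are grouped together, so the two K=2 runs above, which differ only in label names, receive the same score against the K=3 run.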

Methods#

ClusterAutoK.fit(adata, use_rep='X_cellcharter')#

Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.

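Parameters:
  • adata (AnnData) – Annotated data object.

  • use_rep (str (default: 'X_cellcharter')) – Key in anndata.AnnData.obsm used as data to fit the clustering model.
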
ClusterAutoK.get_params()#

Returns the estimator’s parameters as passed to the initializer.

Parameters:

deep – Ignored. For Scikit-learn compatibility.

Return type:

Dict[str, Any]

Returns:

The mapping from init parameters to values.

classmethod ClusterAutoK.load(path)#

Loads the estimator and (if available) the fitted model.

This method is only expected to work for an estimator that was previously saved via save().

Parameters:

path (Path) – The directory from which to load the estimator.

Returns:

The loaded estimator, either fitted or not.

ClusterAutoK.predict(adata, use_rep=None, k=None)#

Predict the labels for the data in use_rep using the fitted model.

Parameters:
  • adata (AnnData) – Annotated data object.

  • use_rep (Optional[str] (default: None)) – Key in anndata.AnnData.obsm used as input data for the prediction. If None, uses anndata.AnnData.obsm['X_cellcharter'] if present, otherwise anndata.AnnData.X.

  • k (Optional[int] (default: None)) – Number of clusters to predict using the fitted model. If None, the number of clusters with the highest stability is selected. If max_runs > 1, the model with the largest marginal likelihood among those fitted with k clusters is used.

Return type:

Categorical
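The selection logic described for k can be sketched as follows. This is a simplified illustration under stated assumptions: the function select_model, the dictionaries, and the "log_likelihood" key are hypothetical stand-ins for whatever the fitted object stores internally.

```python
def select_model(models_by_k, stability_by_k, k=None):
    """Pick the number of clusters and the run used for prediction.

    models_by_k: hypothetical mapping K -> list of fitted runs, each a dict
        holding at least a "log_likelihood" entry.
    stability_by_k: hypothetical mapping K -> mean stability.
    """
    if k is None:
        # Fall back to the K with the highest stability.
        k = max(stability_by_k, key=stability_by_k.get)
    if k not in models_by_k:
        raise KeyError(f"no model was fitted with k={k}")
    # Among the repetitions fitted with k clusters, keep the most likely one.
    best_run = max(models_by_k[k], key=lambda run: run["log_likelihood"])
    return k, best_run


models_by_k = {
    2: [{"run": 0, "log_likelihood": -120.0}, {"run": 1, "log_likelihood": -118.5}],
    3: [{"run": 0, "log_likelihood": -101.2}, {"run": 1, "log_likelihood": -103.9}],
}
stability_by_k = {2: 0.81, 3: 0.92}

chosen_k, chosen_run = select_model(models_by_k, stability_by_k)
print(chosen_k, chosen_run["run"])
```

Passing k explicitly, for example select_model(models_by_k, stability_by_k, k=2), skips the stability lookup and only applies the likelihood comparison.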

ClusterAutoK.save(path, best_k=False)#

Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.

Parameters:
  • path (Union[str, PathLike[str]]) – The directory to which all files should be saved.

  • best_k (default: False) – Save only the best model across all numbers of clusters K. If False, save the best model for each value of K.

Return type:

None

Note

If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle, which is not necessarily backwards-compatible.
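The fallback described in the note can be illustrated with the standard library alone. The sketch below is not the library's save routine: the function save_params and the file names params.json and params.pickle are hypothetical, and it only shows the general pattern of trying JSON first and switching to pickle when serialization fails.

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_params(params, directory):
    """Write params as JSON when possible, otherwise fall back to pickle."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    try:
        payload = json.dumps(params)
    except TypeError:
        # Values such as classes or callables cannot be encoded as JSON.
        target = directory / "params.pickle"
        target.write_bytes(pickle.dumps(params))
    else:
        target = directory / "params.json"
        target.write_text(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Plain numbers serialize to JSON.
    json_path = save_params({"max_runs": 5, "convergence_tol": 0.01}, tmp)
    # A class object (here the built-in dict type) forces the pickle fallback.
    pickle_path = save_params({"model_class": dict}, tmp)
    print(json_path.name, pickle_path.name)
```

Parameters such as model_class or similarity_function hold a class or a callable, which is the kind of value that would trigger the pickle path in a pattern like this one.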

ClusterAutoK.set_params(values)#

Sets the provided values. The estimator is returned as well, but the estimator on which this function is called is also modified.

Parameters:

values (Dict[str, Any]) – The values to set.

Returns:

The estimator where the values have been set.
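The get_params() and set_params() contract follows the scikit-learn convention of returning the estimator itself after modifying it in place. The toy class below is a hypothetical stand-in, not ClusterAutoK, and only demonstrates that behaviour.

```python
class ToyEstimator:
    """Hypothetical estimator used only to illustrate the parameter contract."""

    def __init__(self, max_runs=10, convergence_tol=0.01):
        self.max_runs = max_runs
        self.convergence_tol = convergence_tol

    def get_params(self, deep=True):
        # `deep` is accepted for scikit-learn compatibility and ignored.
        return {"max_runs": self.max_runs, "convergence_tol": self.convergence_tol}

    def set_params(self, values):
        valid = self.get_params()
        for name, value in values.items():
            if name not in valid:
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        # Returning self allows chaining, but the original object is changed too.
        return self


estimator = ToyEstimator()
returned = estimator.set_params({"max_runs": 5})
print(returned is estimator, estimator.get_params())
```

Because the same object is returned, there is no need to reassign the result; the call is safe to chain or to use as a statement.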