cellcharter.tl.ClusterAutoK#

class cellcharter.tl.ClusterAutoK(n_clusters, max_runs=10, convergence_tol=0.01, model_class=None, model_params=None, similarity_function=None)#

Identify the best candidates for the number of clusters.

Parameters:

n_clusters (tuple[int, int] | list[int]) – Range for number of clusters (bounds included).
max_runs (int (default: 10)) – Maximum number of repetitions for each value of number of clusters.
convergence_tol (float (default: 0.01)) – Convergence tolerance for the clustering stability. If the Mean Absolute Percentage Error between consecutive iterations is below convergence_tol, the algorithm stops before reaching max_runs.
model_class (Optional[type] (default: None)) – Class of the model to be used for clustering. It must accept random_state and n_clusters as initialization parameters.
model_params (Optional[dict] (default: None)) – Keyword arguments for model_class.
similarity_function (Optional[callable] (default: None)) – The similarity function used between clustering results. Defaults to sklearn.metrics.fowlkes_mallows_score().
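To make the role of similarity_function more concrete, the following is a minimal pure-Python sketch of the Fowlkes-Mallows index, the quantity computed by the default sklearn.metrics.fowlkes_mallows_score(). The helper name pairwise_fowlkes_mallows and the toy label lists are illustrative assumptions and not part of cellcharter; a custom similarity function is assumed here to take two label arrays and return a float.

```python
from itertools import combinations
from math import sqrt


def pairwise_fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index computed by counting pairs of samples."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1  # pair grouped together in both clusterings
        elif same_b:
            fp += 1  # pair grouped together only in labels_b
        elif same_a:
            fn += 1  # pair grouped together only in labels_a
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Hypothetical cluster assignments from three repetitions.
run_1 = [0, 0, 1, 1, 2, 2]
run_2 = [1, 1, 0, 0, 2, 2]  # same partition as run_1, label names permuted
run_3 = [0, 0, 0, 1, 1, 1]  # a genuinely different partition

identical = pairwise_fowlkes_mallows(run_1, run_2)
different = pairwise_fowlkes_mallows(run_1, run_3)
```

Because the score depends only on which samples are grouped together, relabeling the clusters (run_1 versus run_2) does not lower the similarity, which is the property a stability measure between repeated clusterings needs.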

Examples

>>> import anndata
>>> import cellcharter as cc
>>> import squidpy as sq
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> sq.gr.spatial_neighbors(adata, coord_type='generic', delaunay=True)
>>> cc.gr.remove_long_links(adata)
>>> cc.gr.aggregate_neighbors(adata, n_layers=3)
>>> model_params = {
...     'random_state': 42,
...     'trainer_params': {
...         'accelerator': 'cpu',
...         'enable_progress_bar': False,
...     },
... }
>>> models = cc.tl.ClusterAutoK(n_clusters=(2, 10), model_class=cc.tl.GaussianMixture, model_params=model_params, max_runs=5)

Attributes table#

`best_k`	The number of clusters with the highest stability.
`peaks`	Find the peaks in the stability curve.
`persistent_attributes`	Returns the list of fitted attributes that ought to be saved and loaded.
`labels`	The cluster assignments for each repetition and number of clusters.
`stability`	The stability values of all combinations of runs between K and K-1, and between K and K+1.

Methods table#

`fit`(adata[, use_rep])	Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.
`get_params`()	Returns the estimator's parameters as passed to the initializer.
`load`(path)	Loads the estimator and (if available) the fitted model.
`predict`(adata[, use_rep, k])	Predict the labels for the data in `use_rep` using the fitted model.
`save`(path[, best_k])	Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.
`set_params`(values)	Sets the provided values.

Attributes#

ClusterAutoK.best_k#: The number of clusters with the highest stability.

ClusterAutoK.peaks#: Find the peaks in the stability curve.

ClusterAutoK.persistent_attributes#: Returns the list of fitted attributes that ought to be saved and loaded. By default, this encompasses all annotations.

ClusterAutoK.labels: dict#: The cluster assignments for each repetition and number of clusters.

ClusterAutoK.stability: ndarray#: The stability values of all combinations of runs between K and K-1, and between K and K+1.
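As an illustration of how best_k and peaks relate to the stability values, the sketch below averages hypothetical stability values per K, then picks the maximum and the local maxima of the resulting curve. The numbers, the dictionary layout and the helper local_peaks are assumptions made for this example; the library's own data layout and peak detection may differ.

```python
from statistics import mean

# Hypothetical stability values per K (one value per pair of compared runs).
stability_by_k = {
    2: [0.71, 0.69, 0.70],
    3: [0.90, 0.92, 0.91],
    4: [0.80, 0.78, 0.79],
    5: [0.85, 0.86, 0.84],
    6: [0.60, 0.62, 0.61],
}

# Average the values for each K to obtain a single stability curve.
mean_stability = {k: mean(values) for k, values in stability_by_k.items()}


def local_peaks(curve):
    """Return the K values whose mean stability exceeds both neighbours."""
    ks = sorted(curve)
    return [
        k
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if curve[k] > curve[prev] and curve[k] > curve[nxt]
    ]


best_k = max(mean_stability, key=mean_stability.get)  # most stable K overall
peaks = local_peaks(mean_stability)  # every K that is a local optimum
```

In this toy curve the global maximum is a single K, while the peaks list also contains a second, lower local maximum that could be worth inspecting as an alternative number of clusters.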

Methods#

ClusterAutoK.fit(adata, use_rep='X_cellcharter')#

Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.

Parameters:

adata (AnnData) – Annotated data object.
use_rep (str (default: 'X_cellcharter')) – Key in anndata.AnnData.obsm to use as data to fit the clustering model. If None, uses anndata.AnnData.X.
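The early-stopping rule controlled by max_runs and convergence_tol can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the stability curves are hypothetical precomputed values standing in for real clustering runs, the previous curve is taken as the reference in the percentage error, and the helper names mape and runs_until_converged are invented for this example.

```python
def mape(previous, current):
    """Mean Absolute Percentage Error between two equally long curves."""
    return sum(abs((p - c) / p) for p, c in zip(previous, current)) / len(previous)


def runs_until_converged(curves, max_runs=10, convergence_tol=0.01):
    """Return how many repetitions are used before the curve settles.

    `curves` holds one stability curve per repetition (assumed values).
    """
    previous = None
    for run, current in enumerate(curves[:max_runs], start=1):
        if previous is not None and mape(previous, current) < convergence_tol:
            return run  # change is below the tolerance: stop early
        previous = current
    return min(len(curves), max_runs)  # tolerance never reached


curves = [
    [0.70, 0.90, 0.80],     # stability curve after run 1
    [0.72, 0.88, 0.79],     # run 2: still changing by about 2%
    [0.721, 0.881, 0.791],  # run 3: almost unchanged
    [0.721, 0.881, 0.791],
]
n_runs = runs_until_converged(curves, max_runs=10, convergence_tol=0.01)
```

A smaller convergence_tol demands a flatter curve before stopping, so it tends to use more of the max_runs budget.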

ClusterAutoK.get_params()#

Returns the estimator’s parameters as passed to the initializer.

Parameters: deep – Ignored. For Scikit-learn compatibility.
Return type: Dict[str, Any]
Returns: The mapping from init parameters to values.

classmethod ClusterAutoK.load(path)#

Loads the estimator and (if available) the fitted model.

This method should only be expected to work to load an estimator that has previously been saved via save().

Parameters: path (Path) – The directory from which to load the estimator.
Returns: The loaded estimator, either fitted or not.

ClusterAutoK.predict(adata, use_rep=None, k=None)#

Predict the labels for the data in use_rep using the fitted model.

Parameters:

adata (AnnData) – Annotated data object.
use_rep (Optional[str] (default: None)) – Key in anndata.AnnData.obsm to use as input data for the prediction. If None, uses anndata.AnnData.obsm['X_cellcharter'] if present, otherwise anndata.AnnData.X.
k (Optional[int] (default: None)) – Number of clusters to predict using the fitted model. If None, the number of clusters with the highest stability will be selected. If max_runs > 1, the model with the largest marginal likelihood will be used among the ones fitted on k.

Return type:

Categorical
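A typical workflow chains fit, predict and best_k. The helper below is a minimal sketch of that pattern using only the calls documented on this page; the function name fit_and_label and the adata.obs column name cluster_cellcharter are hypothetical choices for illustration.

```python
def fit_and_label(autok, adata, use_rep="X_cellcharter", key_added="cluster_cellcharter"):
    """Fit a ClusterAutoK-like object and store its predicted labels.

    `autok` is expected to expose fit, predict and best_k as documented;
    `key_added` is an assumed column name for the labels in adata.obs.
    """
    autok.fit(adata, use_rep=use_rep)
    # With k=None, predict is documented to use the most stable number of clusters.
    adata.obs[key_added] = autok.predict(adata, use_rep=use_rep, k=None)
    return autok.best_k
```

Passing an explicit k to predict instead of None would label the data with a different number of clusters from the fitted range, for example one of the values reported by peaks.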

ClusterAutoK.save(path, best_k=False)#

Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.

Parameters:

path (Union[str, PathLike[str]]) – The directory to which all files should be saved.
best_k (bool (default: False)) – If True, save only the best model across all numbers of clusters K. If False, save the best model for each value of K.

Return type:

None

Note

If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle which is not necessarily backwards-compatible.
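Whether a parameter dictionary is JSON-serializable can be checked ahead of time with the standard library. The sketch below is an assumption about how such a check might look, not the code used by save(); the helper is_json_serializable and the example dictionaries are invented for illustration.

```python
import json


def is_json_serializable(params):
    """Return True if `params` can be encoded with the standard json module."""
    try:
        json.dumps(params)
    except (TypeError, ValueError):
        return False
    return True


# Plain numbers and lists encode without problems.
plain_params = {"n_clusters": [2, 10], "max_runs": 5, "convergence_tol": 0.01}
# A class object cannot be encoded as JSON; the built-in `dict` stands in
# here for a model class passed as `model_class`.
params_with_class = {**plain_params, "model_class": dict}

plain_ok = is_json_serializable(plain_params)
class_ok = is_json_serializable(params_with_class)
```

Parameters that hold classes or callables, such as a model class or a similarity function, are the usual reason a dictionary fails this check.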

ClusterAutoK.set_params(values)#

Sets the provided values. The estimator is returned as well, but the estimator on which this function is called is also modified.

Parameters: values (Dict[str, Any]) – The values to set.
Returns: The estimator where the values have been set.
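The in-place behaviour described above can be illustrated with a small stand-in class. ToyEstimator is hypothetical and only mimics the documented contract (a dictionary of values goes in, the same modified object comes back); it is not the cellcharter implementation.

```python
class ToyEstimator:
    """Hypothetical stand-in that mimics the documented get/set behaviour."""

    def __init__(self, max_runs=10, convergence_tol=0.01):
        self.max_runs = max_runs
        self.convergence_tol = convergence_tol

    def get_params(self):
        return {"max_runs": self.max_runs, "convergence_tol": self.convergence_tol}

    def set_params(self, values):
        for name, value in values.items():
            setattr(self, name, value)
        return self  # the same object, so calls can be chained


estimator = ToyEstimator()
returned = estimator.set_params({"max_runs": 5})
# `returned` and `estimator` refer to one object; no copy is made.
```

Because no copy is made, code that needs to keep the original settings should read them with get_params() before calling set_params().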
