cellcharter.tl.ClusterAutoK
- class cellcharter.tl.ClusterAutoK(n_clusters, max_runs=10, convergence_tol=0.01, model_class=None, model_params=None, similarity_function=None)
Identify the best candidates for the number of clusters.
- Parameters:
  - n_clusters (tuple[int, int] | list[int]) – Range for the number of clusters (bounds included).
  - max_runs (int (default: 10)) – Maximum number of repetitions for each number of clusters.
  - convergence_tol (float (default: 0.01)) – Convergence tolerance for the clustering stability. If the Mean Absolute Percentage Error between consecutive iterations is below convergence_tol, the algorithm stops without reaching max_runs.
  - model_class (Optional[type] (default: None)) – Class of the model to be used for clustering. It must accept random_state and n_clusters as initialization parameters.
  - model_params (Optional[dict] (default: None)) – Keyword arguments for model_class.
  - similarity_function (Optional[callable] (default: None)) – The similarity function used between clustering results. Defaults to sklearn.metrics.fowlkes_mallows_score().
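To illustrate what a similarity function between two clustering results computes, the sketch below implements the Fowlkes-Mallows index by pair counting in plain Python. The name `pair_fowlkes_mallows` is hypothetical, and the two-argument shape (two label sequences in, one float out) is an assumption modelled on `sklearn.metrics.fowlkes_mallows_score`; it is not taken from the library's code.

```python
from itertools import combinations
from math import sqrt


def pair_fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index between two clusterings, by counting sample pairs.

    Hypothetical helper for illustration; assumes both sequences label the
    same samples in the same order.
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1  # pair grouped together in both clusterings
        elif same_b:
            fp += 1  # pair grouped together only in labels_b
        elif same_a:
            fn += 1  # pair grouped together only in labels_a
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))
```

Because the score only looks at which samples share a cluster, it is unaffected by how the clusters are numbered, which is what makes it usable for comparing independent runs.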
Examples
>>> import anndata
>>> import squidpy as sq
>>> import cellcharter as cc
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> sq.gr.spatial_neighbors(adata, coord_type='generic', delaunay=True)
>>> cc.gr.remove_long_links(adata)
>>> cc.gr.aggregate_neighbors(adata, n_layers=3)
>>> model_params = {
...     'random_state': 42,
...     'trainer_params': {
...         'accelerator': 'cpu',
...         'enable_progress_bar': False,
...     },
... }
>>> models = cc.tl.ClusterAutoK(n_clusters=(2, 10), model_class=cc.tl.GaussianMixture, model_params=model_params, max_runs=5)
Attributes table
| Attribute | Description |
| --- | --- |
| best_k | The number of clusters with the highest stability. |
| peaks | Find the peaks in the stability curve. |
| persistent_attributes | Returns the list of fitted attributes that ought to be saved and loaded. |
|  | The cluster assignments for each repetition and number of clusters. |
|  | The stability values of all combinations of runs between K and K-1, and between K and K+1. |
Methods table
| Method | Description |
| --- | --- |
| fit() | Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them. |
| get_params() | Returns the estimator's parameters as passed to the initializer. |
| load() | Loads the estimator and (if available) the fitted model. |
| predict() | Predict the labels for the data in use_rep using the fitted model. |
| save() | Saves the ClusterAutoK object and the clustering models to the provided directory using pickle. |
|  | Sets the provided values. |
Attributes
- ClusterAutoK.best_k
The number of clusters with the highest stability.
- ClusterAutoK.peaks
Find the peaks in the stability curve.
- ClusterAutoK.persistent_attributes
Returns the list of fitted attributes that ought to be saved and loaded. By default, this encompasses all annotations.
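To make `best_k` and `peaks` concrete, here is a minimal sketch on a made-up stability curve. The values in `mean_stability` are illustrative only, `local_peaks` is a hypothetical helper, and the strict local-maximum rule is an assumption; the library's own peak detection may use different criteria.

```python
# Hypothetical mean stability per candidate K (illustrative values only).
mean_stability = {2: 0.71, 3: 0.80, 4: 0.76, 5: 0.92, 6: 0.85, 7: 0.83}


def local_peaks(curve):
    """Return the K values whose stability is strictly above both neighbours."""
    ks = sorted(curve)
    return [
        k
        for prev, k, nxt in zip(ks, ks[1:], ks[2:])
        if curve[k] > curve[prev] and curve[k] > curve[nxt]
    ]


# The single K with the highest stability, in the spirit of best_k.
best_k = max(mean_stability, key=mean_stability.get)

# Every local maximum of the curve, in the spirit of peaks.
peaks = local_peaks(mean_stability)
```

In this toy curve the global maximum is only one of several local maxima, which is why inspecting all peaks can surface alternative candidate values of K.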
Methods
- ClusterAutoK.fit(adata, use_rep='X_cellcharter')
Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.
- Parameters:
  - adata (AnnData) – Annotated data object.
  - use_rep (str (default: 'X_cellcharter')) – Key in anndata.AnnData.obsm to use as data to fit the clustering model. If None, uses anndata.AnnData.X.
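The repeat-until-stable behaviour implied by max_runs and convergence_tol can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `mape` and `runs_until_stable` are hypothetical helpers, `curves[i]` stands for the average stability curve available after `i + 1` runs, and the previous curve is assumed to be the denominator of the percentage error.

```python
def mape(previous, current):
    """Mean Absolute Percentage Error between two equally long curves.

    Assumes every value in `previous` is non-zero.
    """
    return sum(abs(c - p) / abs(p) for p, c in zip(previous, current)) / len(previous)


def runs_until_stable(curves, max_runs=10, convergence_tol=0.01):
    """Return how many runs are consumed before the stability curve settles.

    `curves[i]` is the (hypothetical) average stability curve after i + 1 runs.
    """
    previous = None
    for run, current in enumerate(curves[:max_runs], start=1):
        # Stop early once two consecutive curves differ by less than the tolerance.
        if previous is not None and mape(previous, current) < convergence_tol:
            return run
        previous = current
    # No early stop: every available run (up to max_runs) was used.
    return min(len(curves), max_runs)
```

A smaller convergence_tol makes the early stop harder to trigger, so more repetitions are spent per K before the stability estimate is accepted.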
- ClusterAutoK.get_params()
Returns the estimator’s parameters as passed to the initializer.
- classmethod ClusterAutoK.load(path)
Loads the estimator and (if available) the fitted model.
This method is only expected to work for an estimator that was previously saved via save().
- Parameters:
  - path (Path) – The directory from which to load the estimator.
- Returns:
The loaded estimator, either fitted or not.
- ClusterAutoK.predict(adata, use_rep=None, k=None)
Predict the labels for the data in use_rep using the fitted model.
- Parameters:
  - adata (AnnData) – Annotated data object.
  - use_rep (Optional[str] (default: None)) – Key in anndata.AnnData.obsm used as data to fit the clustering model. If None, uses anndata.AnnData.obsm['X_cellcharter'] if present, otherwise anndata.AnnData.X.
  - k (Optional[int] (default: None)) – Number of clusters to predict using the fitted model. If None, the number of clusters with the highest stability will be selected. If max_runs > 1, the model with the largest marginal likelihood will be used among the ones fitted on k.
- Return type:
Categorical
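The selection rule described for k (pick, among the models fitted on k, the one with the largest marginal likelihood) can be sketched like this. `FittedRun`, its fields, and `select_model` are hypothetical names introduced for illustration; how the library actually stores its fitted models and their likelihoods is not shown in this page.

```python
from dataclasses import dataclass


@dataclass
class FittedRun:
    """Hypothetical record of one fitted model for a given number of clusters."""

    k: int
    run: int
    log_marginal_likelihood: float


def select_model(runs, k):
    """Pick the run fitted on `k` with the largest (log) marginal likelihood."""
    candidates = [r for r in runs if r.k == k]
    if not candidates:
        raise ValueError(f"no model was fitted with k={k}")
    # Maximising the log-likelihood is equivalent to maximising the likelihood.
    return max(candidates, key=lambda r: r.log_marginal_likelihood)
```

With several runs per K, this keeps prediction deterministic: the same fitted model is chosen every time for a given k.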
- ClusterAutoK.save(path, best_k=False)
Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.
- Parameters:
- Return type:
Note
If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle, which is not necessarily backwards-compatible.
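The fallback mentioned in the note can be pictured with the following sketch, which tries JSON first and switches to pickle when a value cannot be encoded. `serialize_params` is a hypothetical helper written for illustration, and the exact check the library performs is an assumption here.

```python
import json
import pickle


def serialize_params(params):
    """Return (format, payload): JSON when possible, otherwise pickle."""
    try:
        return "json", json.dumps(params).encode("utf-8")
    except TypeError:
        # Raised for values such as classes or callables that JSON cannot encode.
        return "pickle", pickle.dumps(params)
```

Parameters such as a model class or a custom similarity function are typical values that JSON cannot represent, so estimators configured with them would be expected to take the pickle path.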