cellcharter.tl.ClusterAutoK
- class cellcharter.tl.ClusterAutoK(n_clusters, max_runs=10, convergence_tol=0.01, model_class=None, model_params=None, similarity_function=None)
Identify the best candidates for the number of clusters.
- Parameters:
  - n_clusters (tuple[int, int] | list[int]) – Range for the number of clusters (bounds included).
  - max_runs (int, default: 10) – Maximum number of repetitions for each value of the number of clusters.
  - convergence_tol (float, default: 0.01) – Convergence tolerance for the clustering stability. If the Mean Absolute Percentage Error between consecutive iterations is below convergence_tol, the algorithm stops without reaching max_runs.
  - model_class (Optional[type], default: None) – Class of the model to be used for clustering. It must accept random_state and n_clusters as initialization parameters.
  - model_params (Optional[dict], default: None) – Keyword arguments for model_class.
  - similarity_function (Optional[callable], default: None) – The similarity function used between clustering results. Defaults to sklearn.metrics.fowlkes_mallows_score().
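The stopping rule controlled by convergence_tol and max_runs can be illustrated with a minimal sketch. It assumes that the quantity compared between consecutive iterations is a stability curve with one value per number of clusters; the helper names `mape` and `runs_until_converged` and the sample numbers are hypothetical and not part of cellcharter.

```python
def mape(previous, current):
    """Mean Absolute Percentage Error of `current` relative to `previous`.

    Assumes no entry of `previous` is zero.
    """
    if len(previous) != len(current):
        raise ValueError("curves must have the same length")
    return sum(abs((p - c) / p) for p, c in zip(previous, current)) / len(previous)


def runs_until_converged(curves, max_runs=10, convergence_tol=0.01):
    """Return how many runs are used before the stopping rule triggers.

    `curves` is a sequence of stability curves, one per run (hypothetical input).
    """
    previous = None
    for run, current in enumerate(curves[:max_runs], start=1):
        if previous is not None and mape(previous, current) < convergence_tol:
            return run  # converged before reaching max_runs
        previous = current
    return min(len(curves), max_runs)


# Made-up stability curves for K = 2, 3, 4.
curves = [
    [0.80, 0.60, 0.90],
    [0.70, 0.65, 0.85],
    [0.702, 0.651, 0.851],  # almost identical to the previous curve
    [0.50, 0.50, 0.50],     # never evaluated with the default tolerance
]
n_runs = runs_until_converged(curves, max_runs=10, convergence_tol=0.01)
```

With these made-up curves the third run differs from the second by well under 1%, so the loop stops there; a tighter tolerance or a smaller max_runs changes where it stops.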
Examples
>>> import anndata
>>> import squidpy as sq
>>> import cellcharter as cc
>>> adata = anndata.read_h5ad(path_to_anndata)
>>> sq.gr.spatial_neighbors(adata, coord_type='generic', delaunay=True)
>>> cc.gr.remove_long_links(adata)
>>> cc.gr.aggregate_neighbors(adata, n_layers=3)
>>> model_params = {
...     'random_state': 42,
...     'trainer_params': {
...         'accelerator': 'cpu',
...         'enable_progress_bar': False,
...     },
... }
>>> models = cc.tl.ClusterAutoK(n_clusters=(2,10), model_class=cc.tl.GaussianMixture, model_params=model_params, max_runs=5)
Attributes table
| best_k | The number of clusters with the highest stability. |
| peaks | Find the peaks in the stability curve. |
| persistent_attributes | Returns the list of fitted attributes that ought to be saved and loaded. |
| | The cluster assignments for each repetition and number of clusters. |
| | The stability values of all combinations of runs between K and K-1, and between K and K+1. |
Methods table
| fit(adata, use_rep='X_cellcharter') | Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them. |
| get_params() | Returns the estimator's parameters as passed to the initializer. |
| load(path) | Loads the estimator and (if available) the fitted model. |
| predict(adata, use_rep=None, k=None) | Predict the labels for the data in use_rep using the fitted model. |
| save(path, best_k=False) | Saves the ClusterAutoK object and the clustering models to the provided directory using pickle. |
| | Sets the provided values. |
Attributes
- ClusterAutoK.best_k
The number of clusters with the highest stability.
- ClusterAutoK.peaks
Find the peaks in the stability curve.
- ClusterAutoK.persistent_attributes
Returns the list of fitted attributes that ought to be saved and loaded. By default, this encompasses all annotations.
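As an illustration of how best_k and peaks relate to a stability curve, here is a minimal sketch. It assumes that a peak is a value of K whose average stability is strictly higher than that of both neighbouring values of K; cellcharter's actual peak detection may differ. The function names and the sample values are hypothetical.

```python
def best_k(stability_by_k):
    """Return the K with the highest average stability."""
    return max(stability_by_k, key=stability_by_k.get)


def find_peaks(stability_by_k):
    """Return every K that is a strict local maximum of the stability curve."""
    ks = sorted(stability_by_k)
    peaks = []
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        value = stability_by_k[k]
        if value > stability_by_k[prev_k] and value > stability_by_k[next_k]:
            peaks.append(k)
    return peaks


# Made-up average stability per number of clusters.
stability_by_k = {2: 0.70, 3: 0.85, 4: 0.60, 5: 0.65, 6: 0.92, 7: 0.75}
top_k = best_k(stability_by_k)
peak_ks = find_peaks(stability_by_k)
```

In this toy curve the global maximum is one of two local maxima, which is why inspecting all peaks can suggest more than one plausible number of clusters.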
Methods
- ClusterAutoK.fit(adata, use_rep='X_cellcharter')
Cluster data multiple times for each number of clusters (K) in the selected range and compute the average stability for each of them.
- Parameters:
  - adata (AnnData) – Annotated data object.
  - use_rep (str, default: 'X_cellcharter') – Key in anndata.AnnData.obsm to use as data to fit the clustering model. If None, uses anndata.AnnData.X.
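The stability computed by fit compares clusterings obtained with K against those obtained with K-1 and K+1, using the similarity function (the Fowlkes-Mallows score by default). The sketch below is illustrative only: it re-implements the textbook Fowlkes-Mallows definition in plain Python instead of calling sklearn.metrics.fowlkes_mallows_score, and the helper `mean_similarity` and the label lists are hypothetical, not cellcharter internals.

```python
from itertools import combinations, product
from math import sqrt


def fowlkes_mallows(labels_a, labels_b):
    """Fowlkes-Mallows index: TP / sqrt((TP + FP) * (TP + FN)) over point pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            tp += 1   # pair grouped together in both clusterings
        elif same_b:
            fp += 1   # grouped together only in labels_b
        elif same_a:
            fn += 1   # grouped together only in labels_a
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


def mean_similarity(runs_k, runs_adjacent, similarity=fowlkes_mallows):
    """Average similarity over all combinations of runs between two values of K."""
    scores = [similarity(a, b) for a, b in product(runs_k, runs_adjacent)]
    return sum(scores) / len(scores)


# Two made-up runs with K=2 and two with K=3 on six observations.
runs_k2 = [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
runs_k3 = [[0, 0, 0, 1, 1, 2], [0, 0, 1, 2, 2, 2]]
stability_2_vs_3 = mean_similarity(runs_k2, runs_k3)
```

The index only looks at which observations are grouped together, so relabelling the clusters (as in the two K=2 runs) does not change the score.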
- ClusterAutoK.get_params()
Returns the estimator’s parameters as passed to the initializer.
- classmethod ClusterAutoK.load(path)
Loads the estimator and (if available) the fitted model.
This method is only expected to work for an estimator that has previously been saved via save().
- Parameters:
  - path (Path) – The directory from which to load the estimator.
- Returns:
  The loaded estimator, either fitted or not.
- ClusterAutoK.predict(adata, use_rep=None, k=None)
Predict the labels for the data in use_rep using the fitted model.
- Parameters:
  - adata (AnnData) – Annotated data object.
  - use_rep (Optional[str], default: None) – Key in anndata.AnnData.obsm used as data to fit the clustering model. If None, uses anndata.AnnData.obsm['X_cellcharter'] if present, otherwise anndata.AnnData.X.
  - k (Optional[int], default: None) – Number of clusters to predict using the fitted model. If None, the number of clusters with the highest stability will be selected. If max_runs > 1, the model with the largest marginal likelihood among those fitted with k clusters will be used.
- Return type:
  Categorical
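The selection rule described for k can be sketched as follows. The sketch assumes that each fitted run exposes a scalar log marginal likelihood; the `FittedRun` container, its fields, and the numbers are hypothetical stand-ins rather than cellcharter objects.

```python
from dataclasses import dataclass


@dataclass
class FittedRun:
    """Hypothetical record describing one fitted model."""
    n_clusters: int
    random_state: int
    log_likelihood: float


def select_model(runs, stability_by_k, k=None):
    """Pick the run that would be used for prediction.

    If `k` is None, fall back to the K with the highest stability. Among the
    runs fitted with that K, keep the one with the largest likelihood.
    """
    if k is None:
        k = max(stability_by_k, key=stability_by_k.get)
    candidates = [run for run in runs if run.n_clusters == k]
    if not candidates:
        raise ValueError(f"no fitted model for k={k}")
    return max(candidates, key=lambda run: run.log_likelihood)


# Made-up fitted runs and average stabilities.
runs = [
    FittedRun(n_clusters=3, random_state=0, log_likelihood=-120.5),
    FittedRun(n_clusters=3, random_state=1, log_likelihood=-118.2),
    FittedRun(n_clusters=4, random_state=0, log_likelihood=-110.0),
    FittedRun(n_clusters=4, random_state=1, log_likelihood=-112.7),
]
stability_by_k = {3: 0.91, 4: 0.84}
default_choice = select_model(runs, stability_by_k)
explicit_choice = select_model(runs, stability_by_k, k=4)
```

Note that the default choice follows stability first and likelihood second: the K=4 runs have higher likelihoods here, but K=3 is selected because it is the most stable.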
- ClusterAutoK.save(path, best_k=False)
Saves the ClusterAutoK object and the clustering models to the provided directory using pickle.
- Parameters:
- Return type:
Note
If the dictionary returned by get_params() is not JSON-serializable, this method uses pickle, which is not necessarily backwards-compatible.
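To make the note concrete, the following minimal sketch shows one way to test whether a parameter dictionary is JSON-serializable and to fall back to pickle otherwise. It is not the code used by save(); the function names and the sample dictionaries are hypothetical.

```python
import json
import pickle


def is_json_serializable(params):
    """Return True if `params` can be written with the json module."""
    try:
        json.dumps(params)
    except (TypeError, ValueError):
        return False
    return True


def serialize_params(params):
    """Serialize `params` as JSON when possible, otherwise fall back to pickle."""
    if is_json_serializable(params):
        return "json", json.dumps(params).encode("utf-8")
    return "pickle", pickle.dumps(params)


# Plain values serialize to JSON; a class object does not.
plain = {"n_clusters": [2, 10], "max_runs": 5}
# A built-in class stands in for a model class so the sketch is self-contained.
with_class = {"n_clusters": [2, 10], "model_class": dict}

plain_format, plain_bytes = serialize_params(plain)
class_format, class_bytes = serialize_params(with_class)
```

Parameters such as a model class or a callable similarity function are the kind of values that typically force the pickle path.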