In-situ clustering

class comseg.clustering.InSituClustering(anndata, selected_genes)

Bases: object

The in situ clustering class takes as an attribute an anndata object containing the community expression vectors \(V_c\) of RNA partitions/communities from one or several images. It identifies the single-cell transcriptomic clusters present in the dataset.

__init__(anndata, selected_genes)

Parameters:

anndata (anndata object) – anndata object containing the expression vectors of the communities. It can be the concatenation of several anndata objects from different ComSeg instances
selected_genes (list[str]) – list of genes to take into account for the clustering. The order of this list defines the order of the genes in the expression vector
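
The role of the gene order can be illustrated with a minimal sketch. It is not ComSeg code: `expression_vector` is a hypothetical helper, and the gene names are made up for the example. It only shows how a fixed `selected_genes` list determines which position each gene occupies in a community expression vector.

```python
from collections import Counter


def expression_vector(rna_genes, selected_genes):
    """Count RNAs per gene, ordered as in selected_genes (hypothetical helper)."""
    counts = Counter(rna_genes)
    # Genes outside selected_genes are ignored; genes with no RNA get a count of 0.
    return [counts[gene] for gene in selected_genes]


# Made-up example data: the gene name of each RNA found in one community.
selected_genes = ["Actb", "Gapdh", "Cd3e"]
community_rnas = ["Gapdh", "Actb", "Gapdh", "Xist"]

vector = expression_vector(community_rnas, selected_genes)
print(vector)  # [1, 2, 0]
```

Reordering `selected_genes` reorders the vector in the same way, which is why the same list should be used for every community being compared.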

compute_normalization_parameters(debug_path=None, sample_size=10000)

Compute the SCTransform normalization parameters from the class attribute anndata

Parameters:

debug_path

Returns:

cluster_rna_community(size_commu_min=3, norm_vector=True, n_pcs=15, n_comps=15, clustering_method='leiden', n_neighbors=20, resolution=1, n_clusters_kmeans=4, palette=None, plot_umap=False)

Cluster the RNA partition/community expression vector to identify the single cell transcriptomic cluster present in the dataset

Parameters:

size_commu_min (int) – minimum number of RNAs a community must contain to be considered for the clustering
norm_vector (bool) – if True, the expression vectors are normalized using the SCTransform normalization parameters
n_pcs (int) – number of principal components to compute for the clustering; set to 0 to skip PCA
n_comps (int) – number of components to compute for the clustering; set to 0 to skip PCA
clustering_method (str) – one of “leiden”, “kmeans”, “louvain”
n_neighbors (int) – number of neighbors used to build the similarity graph
resolution (float) – resolution parameter for the Leiden/Louvain clustering
n_clusters_kmeans (int) – number of clusters for the k-means clustering
palette (list[str]) – color palette for the clusters, given as a list of HEX colors
plot_umap – if True, plot the UMAP of the clusters

Returns:
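
The effect of `size_commu_min` can be pictured with the following sketch. It is an illustration only, not the library's implementation: `split_by_size` is a hypothetical helper, the community ids and sizes are invented, and the threshold is assumed to be inclusive (a community with exactly `size_commu_min` RNAs is kept).

```python
def split_by_size(community_sizes, size_commu_min=3):
    """Split community ids into (large, small) by RNA count (hypothetical helper)."""
    # Assumption: the threshold is inclusive, following the wording "minimum number".
    large = [c for c, n in community_sizes.items() if n >= size_commu_min]
    small = [c for c, n in community_sizes.items() if n < size_commu_min]
    return large, small


# Invented example: community id -> number of RNAs in that community.
community_sizes = {"c0": 12, "c1": 2, "c2": 3, "c3": 1}

large, small = split_by_size(community_sizes, size_commu_min=3)
print("clustered directly:", large)
print("left unclassified:", small)
```

Under this reading, the communities in `small` take no part in the clustering step and remain unclassified afterwards.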

merge_cluster(nb_min_cluster=0, min_merge_correlation=0.8, cluster_column_name='leiden', plot=True)

Merge clusters based on the correlation of their centroids

Parameters:

nb_min_cluster (int) – minimum number of clusters to merge
min_merge_correlation (float) – minimum correlation to merge clusters
cluster_column_name (str) – name of the column holding the cluster labels, i.e. the clustering method used (e.g. 'leiden')
plot

Returns:
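
The idea of merging clusters whose centroids are highly correlated can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the ComSeg implementation: the helpers `pearson`, `centroids`, and `merge_by_correlation` are hypothetical, the merge is done greedily on the most correlated pair, Pearson correlation is assumed, and `nb_min_cluster` is interpreted here as a floor on the number of clusters that remain.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if one is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def centroids(vectors, labels):
    """Mean expression vector of each cluster label."""
    groups = {}
    for vec, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(vec)
    return {lab: [sum(col) / len(vecs) for col in zip(*vecs)]
            for lab, vecs in groups.items()}


def merge_by_correlation(vectors, labels, min_merge_correlation=0.8, nb_min_cluster=0):
    """Greedily merge the most correlated pair of clusters until none qualifies."""
    labels = list(labels)
    while True:
        cents = centroids(vectors, labels)
        # Assumption: never go below nb_min_cluster clusters (and never below one).
        if len(cents) <= max(nb_min_cluster, 1):
            break
        (a, b), corr = max(
            (((a, b), pearson(cents[a], cents[b]))
             for a, b in combinations(sorted(cents), 2)),
            key=lambda item: item[1],
        )
        if corr < min_merge_correlation:
            break
        # Relabel cluster b as cluster a, then recompute the centroids.
        labels = [a if lab == b else lab for lab in labels]
    return labels


# Invented example: clusters "0" and "1" have near-identical expression profiles.
vectors = [[10, 1, 0], [9, 2, 0], [8, 1, 1], [0, 1, 9], [1, 0, 10]]
labels = ["0", "0", "1", "2", "2"]
merged = merge_by_correlation(vectors, labels, min_merge_correlation=0.8)
print(merged)
```

In this toy data the centroids of clusters "0" and "1" correlate at about 0.99, so they are merged, while cluster "2" is anticorrelated with both and stays separate.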

classify_small_community(key_pred='leiden_merged', classify_mode='pca', min_proba_small_commu=0)

Assign unclassified RNA community expression vectors to a cluster using a KNN classifier trained on the already classified communities

Parameters:

key_pred – leave default
unorm_vector_key – leave default
classify_mode – either ‘pca’ or ‘euclidien’; the classification is performed in PCA space or in Euclidean space, respectively
min_proba_small_commu – minimum probability, as given by the KNN classifier, required to classify a small community

Returns:
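
How a KNN classifier combined with a probability threshold might assign small communities can be sketched as follows. This is a minimal illustration, not the library's code: `knn_classify` and its parameter `k` are hypothetical, distances are plain Euclidean distances on the raw vectors (no PCA projection), the probability is taken as the fraction of the `k` nearest neighbors voting for the winning cluster, and the threshold is assumed to be inclusive.

```python
from collections import Counter
from math import dist


def knn_classify(train_vectors, train_labels, query, k=3, min_proba=0.0):
    """Return (label, proba); label is None when proba is below min_proba."""
    # Indices of the k classified communities closest to the query vector.
    nearest = sorted(range(len(train_vectors)),
                     key=lambda i: dist(train_vectors[i], query))[:k]
    votes = Counter(train_labels[i] for i in nearest)
    label, count = votes.most_common(1)[0]
    proba = count / len(nearest)
    # Assumption: a community is classified when proba >= min_proba.
    return (label if proba >= min_proba else None), proba


# Invented example: four already classified communities and one small community.
train_vectors = [[10, 0], [9, 1], [0, 10], [1, 9]]
train_labels = ["A", "A", "B", "B"]
small_community = [8, 1]

label, proba = knn_classify(train_vectors, train_labels, small_community, k=3)
print(label, round(proba, 2))
```

With the default threshold of 0 every small community receives a label; raising `min_proba` leaves ambiguous communities unclassified, which is the trade-off that `min_proba_small_commu` appears to control.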