In-situ clustering

class comseg.clustering.InSituClustering(anndata, selected_genes)

Bases: object

The InSituClustering class takes as an attribute an AnnData object containing the community expression vectors \(V_c\) of RNA partitions (communities) from one or more images. It is in charge of identifying the single-cell transcriptomic clusters present in the dataset.

__init__(anndata, selected_genes)
Parameters:
  • anndata (anndata object) – AnnData object containing the community expression vectors. It can be the concatenation of several AnnData objects from different ComSeg instances.

  • selected_genes (list[str]) – list of genes to take into account for the clustering; the order of this list defines the order of the genes in the expression vector
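To illustrate how `selected_genes` fixes the column order of the expression vectors, here is a minimal sketch that builds one count vector per community. It does not call ComSeg; the input format (a list of gene names per community) and the helper `community_expression_matrix` are hypothetical.

```python
from collections import Counter

import numpy as np


def community_expression_matrix(communities, selected_genes):
    """Build one count vector per community, columns in `selected_genes` order.

    Hypothetical input: `communities` is a list whose entries are the gene
    names of the RNA molecules assigned to one community.
    """
    gene_index = {gene: j for j, gene in enumerate(selected_genes)}
    matrix = np.zeros((len(communities), len(selected_genes)), dtype=np.int64)
    for i, rna_genes in enumerate(communities):
        for gene, count in Counter(rna_genes).items():
            j = gene_index.get(gene)
            if j is not None:  # genes outside the selected list are ignored
                matrix[i, j] = count
    return matrix


communities = [["GeneB", "GeneB", "GeneA"], ["GeneC", "GeneA", "GeneD"]]
selected_genes = ["GeneA", "GeneB", "GeneC"]
print(community_expression_matrix(communities, selected_genes))
# rows: [1 2 0] and [1 0 1]
```

Reordering `selected_genes` permutes the columns, which is why the same list should be used for every dataset that is clustered together.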

compute_normalization_parameters(debug_path=None, sample_size=10000)

Compute the SCTransform normalization parameters from the anndata class attribute.
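SCTransform fits a regularized negative binomial regression per gene, which is too involved to reproduce here. As a rough illustration of fitting normalization parameters once on a subsample and then applying them to every community, the sketch below swaps in a simpler technique: analytic Pearson residuals under a negative binomial model with a single fixed overdispersion `theta`. The function names, `theta`, and the row subsampling used to mimic `sample_size` are assumptions, not ComSeg's implementation.

```python
import numpy as np


def fit_pearson_residual_params(counts, theta=100.0, sample_size=10000, seed=0):
    """Fit per-gene expression fractions on a random subsample of communities.

    Simplified stand-in for SCTransform: one fixed overdispersion `theta`
    instead of a regularized per-gene negative binomial regression.
    """
    counts = np.asarray(counts, dtype=float)
    if counts.shape[0] > sample_size:
        rng = np.random.default_rng(seed)
        rows = rng.choice(counts.shape[0], size=sample_size, replace=False)
        counts = counts[rows]
    gene_fraction = counts.sum(axis=0) / counts.sum()  # p_j
    return {"gene_fraction": gene_fraction, "theta": theta}


def pearson_residuals(counts, params):
    """Apply fitted parameters: (x - mu) / sqrt(mu + mu^2 / theta), clipped."""
    counts = np.asarray(counts, dtype=float)
    depth = counts.sum(axis=1, keepdims=True)  # n_i, total RNA per community
    mu = depth * params["gene_fraction"]  # expected counts mu_ij = n_i * p_j
    var = mu + mu**2 / params["theta"]  # negative binomial variance
    with np.errstate(divide="ignore", invalid="ignore"):
        resid = np.where(var > 0, (counts - mu) / np.sqrt(var), 0.0)
    limit = np.sqrt(counts.shape[0])  # clip at sqrt(number of communities)
    return np.clip(resid, -limit, limit)


counts = np.array([[5, 0, 1], [4, 1, 0], [0, 6, 2], [1, 5, 3]])
params = fit_pearson_residual_params(counts)
print(pearson_residuals(counts, params).round(2))
```

Fitting on a subsample keeps parameter estimation bounded on large datasets, while the residuals can still be computed for every community.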

Parameters:

  • debug_path

Returns:

cluster_rna_community(size_commu_min=3, norm_vector=True, n_pcs=15, n_comps=15, clustering_method='leiden', n_neighbors=20, resolution=1, n_clusters_kmeans=4, palette=None, plot_umap=False)

Cluster the RNA partition (community) expression vectors to identify the single-cell transcriptomic clusters present in the dataset.
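A minimal sketch of what a k-means variant of this step might look like: drop communities with fewer than `size_commu_min` RNA, project onto `n_comps` principal components, then run k-means. It assumes raw counts whose row sums equal the community sizes, skips the `norm_vector` normalization, and uses plain NumPy (PCA via SVD, Lloyd's k-means with farthest-point initialization) rather than whatever libraries ComSeg relies on. Leiden and Louvain require a graph library and are not shown; `cluster_communities` is a hypothetical name.

```python
import numpy as np


def cluster_communities(counts, size_commu_min=3, n_comps=15, n_clusters=4,
                        n_iter=100, seed=0):
    """Hypothetical sketch: size filter -> PCA (via SVD) -> k-means.

    Assumes each row of `counts` holds all RNA of one community, so the row
    sum is the community size. Returns (kept_mask, labels_of_kept_rows).
    """
    counts = np.asarray(counts, dtype=float)
    kept = counts.sum(axis=1) >= size_commu_min  # drop small communities
    x = counts[kept] - counts[kept].mean(axis=0)  # center before PCA
    if n_comps > 0:  # n_comps == 0 skips PCA
        n_comps = min(n_comps, *x.shape)
        u, s, _ = np.linalg.svd(x, full_matrices=False)
        x = u[:, :n_comps] * s[:n_comps]  # principal component scores

    # Farthest-point initialization: each new center is the point farthest
    # from the centers chosen so far.
    rng = np.random.default_rng(seed)
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, n_clusters):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)

    # Lloyd iterations: assign to the nearest center, then recompute means.
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return kept, labels


counts = np.array([[10, 0, 0], [9, 1, 0], [11, 0, 1],
                   [0, 10, 0], [1, 9, 0], [0, 11, 1],
                   [1, 1, 0]])  # last community has only 2 RNA
kept, labels = cluster_communities(counts, size_commu_min=3, n_comps=2, n_clusters=2)
print(kept, labels)
```

Farthest-point initialization is used here only to make the toy example deterministic; k-means++ or several random restarts are common alternatives.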

Parameters:
  • size_commu_min (int) – minimum number of RNA molecules in a community for the community to be considered in the clustering

  • norm_vector (bool) – if True, the expression vectors are normalized using the SCTransform normalization parameters

  • n_pcs (int) – number of principal components to use for the clustering; set to 0 to skip PCA

  • n_comps (int) – number of components to compute for the clustering; set to 0 to skip PCA

  • clustering_method (str) – clustering method, one of “leiden”, “kmeans”, or “louvain”

  • n_neighbors (int) – number of neighbors used to build the similarity graph

  • resolution (float) – resolution parameter for the Leiden/Louvain clustering

  • n_clusters_kmeans (int) – number of clusters for the k-means clustering

  • palette (list[str]) – color palette for the clusters, as a list of HEX color strings

  • plot_umap (bool) – if True, plot the UMAP of the clusters

Returns:

merge_cluster(nb_min_cluster=0, min_merge_correlation=0.8, cluster_column_name='leiden', plot=True)

Merge clusters based on the correlation of their centroids.
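One possible way to merge clusters by centroid correlation is sketched below: compute each cluster's mean vector, take the Pearson correlation between centroids, and merge every pair at or above `min_merge_correlation`, using union-find so that merges chain transitively. This is an assumed reading of the method, not ComSeg's code; `merge_by_centroid_correlation` is a hypothetical name, and `nb_min_cluster` and plotting are not modelled.

```python
import numpy as np


def merge_by_centroid_correlation(x, labels, min_merge_correlation=0.8):
    """Merge clusters whose centroids have Pearson correlation >= threshold.

    Union-find makes merges transitive (a~b and b~c => a, b, c merged).
    Returns new integer labels 0..k-1.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([x[labels == c].mean(axis=0) for c in clusters])
    corr = np.corrcoef(centroids)  # cluster-by-cluster correlation matrix
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parents to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if corr[i, j] >= min_merge_correlation:
                parent[find(j)] = find(i)
    roots = np.array([find(i) for i in range(len(clusters))])
    _, new_ids = np.unique(roots, return_inverse=True)
    lookup = dict(zip(clusters.tolist(), new_ids.tolist()))
    return np.array([lookup[lab] for lab in labels.tolist()])


x = np.array([[10, 1, 0], [9, 1, 0],     # cluster "a"
              [20, 2, 0], [18, 2, 1],    # cluster "b", roughly 2 x "a"
              [0, 1, 10], [0, 0, 9]])    # cluster "c"
labels = np.array(["a", "a", "b", "b", "c", "c"])
print(merge_by_centroid_correlation(x, labels, min_merge_correlation=0.8))
# → [0 0 0 0 1 1]
```

Because correlation ignores scale, clusters "a" and "b" merge even though "b" has about twice as many RNA per community.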

Parameters:
  • nb_min_cluster (int) – minimum number of clusters to merge

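  • min_merge_correlation (float) – minimum correlation between two cluster centroids required to merge them

  • cluster_column_name (str) – clustering method used, i.e. the name of the column holding the cluster labels to merge

  • plot (bool)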

Returns:

classify_small_community(key_pred='leiden_merged', classify_mode='pca', min_proba_small_commu=0)

Assign unclassified RNA community expression vectors to a cluster using a KNN classifier fitted on the already classified communities.
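The KNN step can be sketched as a majority vote among the nearest already-labeled vectors, with the vote fraction standing in for the classifier probability that is compared against `min_proba_small_commu`. The vectors passed in determine the space (PCA scores for ‘pca’, raw expression vectors for ‘euclidien’). The function `knn_assign`, its `n_neighbors` default, and the use of `None` for unassigned communities are assumptions for illustration.

```python
import numpy as np


def knn_assign(labeled_x, labeled_y, unlabeled_x, n_neighbors=5, min_proba=0.0):
    """Give each unlabeled vector the majority label of its k nearest labeled ones.

    The "probability" is the fraction of the k neighbors voting for the
    winning label; vectors below `min_proba` are left unassigned (None).
    Ties go to the label that sorts first.
    """
    labeled_x = np.asarray(labeled_x, dtype=float)
    unlabeled_x = np.asarray(unlabeled_x, dtype=float)
    labeled_y = np.asarray(labeled_y)
    k = min(n_neighbors, len(labeled_x))
    # Squared Euclidean distances, shape (n_unlabeled, n_labeled).
    dist = ((unlabeled_x[:, None, :] - labeled_x[None, :, :]) ** 2).sum(axis=2)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    assigned, probas = [], []
    for row in neighbors:
        values, counts = np.unique(labeled_y[row], return_counts=True)
        best = counts.argmax()
        proba = counts[best] / k
        probas.append(proba)
        assigned.append(values[best].item() if proba >= min_proba else None)
    return assigned, np.array(probas)


labeled_x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labeled_y = ["A", "A", "A", "B", "B", "B"]
print(knn_assign(labeled_x, labeled_y, [[0.5, 0.5], [10.5, 10.5]], n_neighbors=3))
print(knn_assign(labeled_x, labeled_y, [[4, 4]], n_neighbors=5, min_proba=0.7))
```

In the second call three of the five neighbors vote "A" (probability 0.6), which falls below the 0.7 threshold, so the community stays unassigned.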

Parameters:
  • key_pred – leave at the default value

  • unorm_vector_key – leave at the default value

  • classify_mode – either ‘pca’ or ‘euclidien’: the classifier operates in PCA space or in the Euclidean expression space, respectively

  • min_proba_small_commu – minimum KNN-classifier probability required to classify a small community

Returns: