Graph model

class comseg.model.ComSegGraph(df_spots_label, selected_genes, dict_co_expression, dict_scale={'x': 0.103, 'y': 0.103, 'z': 0.3}, mean_cell_diameter=15, k_nearest_neighbors=10, edge_max_length=None, gene_column='gene', prior_name='in_nucleus', disable_tqdm=False)

Bases: object

Class that generates the graph of RNA spots from a CSV file/image. This class is in charge of:

create the graph
apply community detection / graph partitioning
compute the community expression vector \(V_c\)
add to the communities the labels/cell types computed by clustering \(V_c\) with the InSituClustering() class
add the centroid of the cells in the graph
associate RNAs to cell
compute the cell-by-gene matrix of the input sample

__init__(df_spots_label, selected_genes, dict_co_expression, dict_scale={'x': 0.103, 'y': 0.103, 'z': 0.3}, mean_cell_diameter=15, k_nearest_neighbors=10, edge_max_length=None, gene_column='gene', prior_name='in_nucleus', disable_tqdm=False)

Parameters:

df_spots_label (pd.DataFrame) – dataframe of the spots with columns x, y, z, gene and, optionally, the prior label column
selected_genes (list[str]) – list of genes to consider
dict_scale (dict) – dictionary containing the pixel/voxel size of the images in µm; default is {'x': 0.103, 'y': 0.103, 'z': 0.3}
mean_cell_diameter (float) – the expected mean cell diameter in µm; default is 15 µm
k_nearest_neighbors (int) – number of nearest neighbors to consider for the graph construction; default is 10
edge_max_length (float) – maximum edge length in the graph; if None (the default), it is set to mean_cell_diameter / 4
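
To make the relationship between dict_scale, mean_cell_diameter and edge_max_length concrete, here is a minimal sketch. The helpers to_micrometers and default_edge_max_length are hypothetical names used for illustration only; they are not part of comseg.

```python
# Illustrative only: these helpers are hypothetical, not comseg functions.
DICT_SCALE = {"x": 0.103, "y": 0.103, "z": 0.3}  # µm per pixel/voxel (documented default)


def to_micrometers(spot, dict_scale=DICT_SCALE):
    """Convert a spot given in pixel/voxel units to µm, axis by axis."""
    return {axis: spot[axis] * dict_scale[axis] for axis in ("x", "y", "z")}


def default_edge_max_length(mean_cell_diameter=15, edge_max_length=None):
    """Return edge_max_length, falling back to mean_cell_diameter / 4 when it is None."""
    if edge_max_length is None:
        return mean_cell_diameter / 4
    return edge_max_length


spot_um = to_micrometers({"x": 100, "y": 200, "z": 10})
print(spot_um)
print(default_edge_max_length())  # 3.75
```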

create_graph()

Create the graph of RNA nodes. All graph generation parameters are set in the __init__() function.

Returns:

a graph of the RNA spots

Return type:

nx.Graph
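
The idea of a k-nearest-neighbor graph with a maximum edge length can be sketched in plain Python. This is a brute-force illustration under stated assumptions, not the library's implementation: knn_edges is a hypothetical helper, coordinates are assumed to be already in µm, and the result is a plain dict of edges rather than an nx.Graph.

```python
import math


def knn_edges(coords, k_nearest_neighbors=10, edge_max_length=3.75):
    """Link each spot to its k nearest neighbors, dropping edges longer than edge_max_length.

    Returns a dict {(i, j): distance} with i < j, i.e. undirected edges.
    """
    edges = {}
    for i, p in enumerate(coords):
        # Distances from spot i to every other spot, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        for d, j in dists[:k_nearest_neighbors]:
            if d <= edge_max_length:
                edges[(min(i, j), max(i, j))] = d
    return edges


coords_um = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 10.0, 0.0)]
edges = knn_edges(coords_um, k_nearest_neighbors=2, edge_max_length=3.75)
print(sorted(edges))  # the isolated spot 3 has no edge
```

In this toy input the last spot is farther than edge_max_length from every other spot, so it stays disconnected.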

community_vector(clustering_method='with_prior', seed=None)

Partition the graph into communities (sets of RNAs), then compute the “community expression vector” of each community and store it in the community_anndata class attribute.

Parameters:

clustering_method (str) – one of [“with_prior”, “louvain”]. “with_prior” is our graph partitioning / community detection method, modified from Louvain to take into account prior knowledge from landmarks such as nuclei or cell masks.
seed (int) – (optional) seed for the graph partitioning initialization

Returns:

a graph with a new node attribute “community” storing the result of the community detection

Return type:

nx.Graph
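
As a rough illustration of what a community expression vector can look like, the sketch below counts the RNAs of each selected gene per community. It assumes \(V_c\) is a raw per-gene count; the source does not say whether the library normalizes it. community_expression_vectors is a hypothetical helper, not a comseg function.

```python
from collections import Counter


def community_expression_vectors(node_gene, node_community, selected_genes):
    """Count, for each community, how many of its RNAs belong to each selected gene."""
    counts = {}
    for node, community in node_community.items():
        counts.setdefault(community, Counter())[node_gene[node]] += 1
    # One vector per community, ordered like selected_genes.
    return {c: [cnt[g] for g in selected_genes] for c, cnt in counts.items()}


node_gene = {0: "A", 1: "A", 2: "B", 3: "B", 4: "C"}  # gene of each RNA node
node_community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}      # community of each RNA node
vectors = community_expression_vectors(node_gene, node_community, ["A", "B", "C"])
print(vectors)
```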

add_cluster_id_to_graph(dict_cluster_id, clustering_method='leiden_merged')

add transcriptional cluster id to each RNA molecule in the graph

Parameters:

dict_cluster_id (dict) – dict {index_commu : cluster_id}
clustering_method (str) – clustering method used to get the community

Returns:

Return type:

nx.Graph
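
The mapping {index_commu : cluster_id} can be pictured as a lookup applied to every RNA node. The sketch below uses plain dicts in place of graph node attributes and assumes the label is stored under an attribute named after clustering_method; add_cluster_id is a hypothetical helper.

```python
def add_cluster_id(node_attrs, dict_cluster_id, clustering_method="leiden_merged"):
    """Copy the cluster id of each node's community onto the node itself."""
    for attrs in node_attrs.values():
        community = attrs.get("community")
        if community in dict_cluster_id:
            # Assumption: the label lives under an attribute named after the method.
            attrs[clustering_method] = dict_cluster_id[community]
    return node_attrs


nodes = {0: {"community": 0}, 1: {"community": 0}, 2: {"community": 1}}
labelled = add_cluster_id(nodes, {0: "3", 1: "7"})
print(labelled)
```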

classify_centroid(dict_cell_centroid, n_neighbors=15, dict_in_pixel=True, max_dist_centroid=None, key_pred='leiden_merged', distance='ngb_distance_weights')

Classify cell centroids based on the labels of their neighboring RNAs, as assigned by add_cluster_id_to_graph()

Parameters:

dict_cell_centroid (dict) – dict of centroid coordinates {cell : {z:,y:,x:}}
n_neighbors (int) – number of neighbors to consider for the classification of the centroid (default 15)
dict_in_pixel (bool) – if True, the centroids are given in pixels and are rescaled; if False, the centroids are already in µm (default True)
max_dist_centroid (int) – maximum distance to consider for the centroid, if None it is set to mean_cell_diameter / 2
key_pred (str) – key-name of the node attribute containing the cluster id (default “leiden_merged”)
distance (str) – leave it to “ngb_distance_weights” (default “ngb_distance_weights”)

Returns:

self.G

Return type:

nx.Graph
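
A centroid classification of this kind can be sketched as a distance-weighted vote among nearby RNAs. The source does not specify the scheme behind “ngb_distance_weights”, so the inverse-distance weights below are only one plausible choice, and vote_centroid_label is a hypothetical helper working on Euclidean distances in µm.

```python
import math
from collections import defaultdict


def vote_centroid_label(centroid, rna_coords, rna_labels, n_neighbors=15, max_dist_centroid=7.5):
    """Return the label winning an inverse-distance-weighted vote, or None if no RNA is close."""
    # Keep RNAs within max_dist_centroid, nearest first, at most n_neighbors of them.
    nearby = sorted(
        (math.dist(centroid, xyz), label)
        for xyz, label in zip(rna_coords, rna_labels)
        if math.dist(centroid, xyz) <= max_dist_centroid
    )[:n_neighbors]
    votes = defaultdict(float)
    for d, label in nearby:
        votes[label] += 1.0 / (d + 1e-9)  # assumption: closer RNAs weigh more
    return max(votes, key=votes.get) if votes else None


rna_coords = [(1, 0, 0), (2, 0, 0), (0, 3, 0), (20, 0, 0)]
rna_labels = ["a", "b", "b", "a"]
label = vote_centroid_label((0, 0, 0), rna_coords, rna_labels)
print(label)
```

Here the single nearest “a” RNA outweighs the two more distant “b” RNAs, and the RNA 20 µm away is ignored because it exceeds max_dist_centroid (7.5, i.e. mean_cell_diameter / 2 for the default diameter).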

associate_rna2landmark(key_pred='leiden_merged', distance='distance', max_cell_radius=100)

Associate RNAs to cells based on both the transcriptomic landscape and the distance between the RNAs and the centroid of the cell

Parameters:

key_pred (str) – key of the node attribute containing the cluster id (default “leiden_merged”)
prior_name (str) – prior_name attribute used in community_vector
max_distance (float) – maximum distance between a cell centroid and an RNA to be associated (default 100)

Returns:

self.G

Return type:

nx.Graph
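
The combination of transcriptomic label and distance can be illustrated with a simplified rule: each RNA goes to the nearest centroid that carries the same cluster label, provided it lies within the maximum radius. This sketch uses Euclidean distance for brevity, which may differ from the distance the library computes, and associate_rnas is a hypothetical helper.

```python
import math


def associate_rnas(rna_coords, rna_labels, centroids, centroid_labels, max_cell_radius=100):
    """Assign each RNA to the nearest same-label centroid within max_cell_radius, else None."""
    assignment = []
    for xyz, label in zip(rna_coords, rna_labels):
        best_cell, best_dist = None, max_cell_radius
        for cell, c_xyz in centroids.items():
            d = math.dist(xyz, c_xyz)
            # Both conditions must hold: matching label and close enough.
            if centroid_labels[cell] == label and d <= best_dist:
                best_cell, best_dist = cell, d
        assignment.append(best_cell)
    return assignment


centroids = {1: (0, 0, 0), 2: (10, 0, 0)}
centroid_labels = {1: "a", 2: "b"}
rna_coords = [(1, 0, 0), (9, 0, 0), (6, 0, 0), (500, 0, 0)]
rna_labels = ["a", "b", "a", "a"]
assignment = associate_rnas(rna_coords, rna_labels, centroids, centroid_labels)
print(assignment)
```

The third RNA is closer to cell 2 but is assigned to cell 1 because only cell 1 shares its label; the last RNA is too far from any cell and stays unassigned.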

get_anndata_from_result(key_cell_pred='cell_index_pred', return_polygon=False, alpha=0.5, min_rna_per_cell=5, allow_disconnected_polygon=False)

Generate an AnnData object storing the estimated expression vectors and the coordinates of their spots

Parameters:

key_cell_pred (str) – key of the cell prediction in the graph (default cell_index_pred)
return_polygon (bool) – if True return the polygon of the cells
alpha (float) – alpha parameter used to compute the alpha shape polygon (see https://pypi.org/project/alphashape/). alpha is between 0 and 1; 1 corresponds to the convex hull of the cell
min_rna_per_cell (int) – minimum number of RNAs required to keep a cell
allow_disconnected_polygon (bool) – if True, allow disconnected polygons
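
The effect of min_rna_per_cell on the resulting cell-by-gene matrix can be sketched as follows. The helper cell_by_gene is hypothetical, it returns a plain dict rather than an AnnData object, and treating the threshold as inclusive is an assumption.

```python
from collections import Counter


def cell_by_gene(rna_cells, rna_genes, selected_genes, min_rna_per_cell=5):
    """Build {cell: [count per selected gene]}, dropping cells with too few RNAs."""
    counts = {}
    for cell, gene in zip(rna_cells, rna_genes):
        if cell is not None:  # skip RNAs not associated with any cell
            counts.setdefault(cell, Counter())[gene] += 1
    return {
        cell: [cnt[g] for g in selected_genes]
        for cell, cnt in counts.items()
        if sum(cnt.values()) >= min_rna_per_cell  # assumption: inclusive threshold
    }


rna_cells = [1, 1, 1, 2, None]          # predicted cell of each RNA
rna_genes = ["A", "A", "B", "B", "A"]   # gene of each RNA
matrix = cell_by_gene(rna_cells, rna_genes, ["A", "B"], min_rna_per_cell=2)
print(matrix)
```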

Returns: