Plot 2D scatter plots comparing gene expression of two genes

Sometimes you want to know how the expression of two genes relate to each other. Here is some code to plot a 2D scatter plot:

# Create a variable with your favorite gene
my_favorite_genes <- c("ENSMUSG00000038192","ENSMUSG00000047139")
my_favorite_celltype <- "Def. endoderm"

# Plot
logcounts(sce)[my_favorite_genes,sce$celltype %in% my_favorite_celltype] %>%
  t() %>%
  as.matrix() %>%
  as_tibble() %>%
  rename(gene1 = 1, gene2 = 2) %>%
  ggplot(aes(x = gene1, y = gene2)) +
  geom_point() +
  theme_bw() +
  xlab(paste(my_favorite_genes[1], "log(counts)")) +
  ylab(paste(my_favorite_genes[2], "log(counts)")) +
  ggtitle(my_favorite_celltype)

To plot a linear regression on top, you can add one with geom_smooth(method = “lm”):

# Plot with linear regression line
logcounts(sce)[my_favorite_genes,sce$celltype %in% my_favorite_celltype] %>%
  t() %>%
  as.matrix() %>%
  as_tibble() %>%
  rename(gene1 = 1, gene2 = 2) %>%
  ggplot(aes(x = gene1, y = gene2)) +
  geom_point() +
  geom_smooth(method = "lm", col = "firebrick") +
  theme_bw() +
  xlab(paste(my_favorite_genes[1], "log(counts)")) +
  ylab(paste(my_favorite_genes[2], "log(counts)")) +
  ggtitle(my_favorite_celltype)

Together with the correlatePairs() function, which I posted about recently (Finding coexpressed genes in specific cell types), you can use the p value and FDR to get a sense of whether your genes are significantly correlated, and then the rho value to see if they are positively or negatively correlated.

We can see from the above plots that these two genes are positively correlated. The output from correlatePairs() confirms this:

> cor.pairs <- correlatePairs(x = sce_subset, subset.row = of.interest)
> cor.pairs %>% as_tibble() %>% filter(gene1 %in% my_favorite_genes, gene2 %in% my_favorite_genes)
# A tibble: 1 × 5
  gene1              gene2                rho  p.value      FDR
  <chr>              <chr>              <dbl>    <dbl>    <dbl>
1 ENSMUSG00000038192 ENSMUSG00000047139 0.860 7.97e-30 1.59e-25

@yanitrevino

1 Like

Hey @mmccoy so I tried running it to find the rho for Wnt7B(ENSMUSG00000022382) and c-jun(ENSMUSG00000052684), however the two genes I looked for didn’t pop up. What should I do?


I’m having the same issue as Yani ^.

Hi @yanethp and @yanitrevino , your genes are in the cor.pairs object you made, you just have to search for them. I edited the code above to show you how to filter the cor.pairs object to only show your two genes of interest:

cor.pairs %>% as_tibble() %>% filter(gene1 %in% my_favorite_genes, gene2 %in% my_favorite_genes)
1 Like


Okay thank you I got it working!
@yanethp this is how I got it to work:
cor.pairs ← correlatePairs(x = sce_subset, subset.row = of.interest)
cor.pairs %>% as_tibble() %>% filter(gene1 %in% my_favorite_gene, gene2 %in% my_favorite_gene)

A tibble: 1 × 5

1 Like

How should I alter this code to show a violin plot of gene expression relative the embryonic stages, for all cell types, would this even be possible ?

Plotting the gastrulation across modalities using facet_wrap()

gene_name <- "Sox17"
gene_id <- rowData(sce)[which(rowData(sce)[,2] %in% gene_name),1]

# favorite plot to combine both stage and cell type
plotExpression(sce, features = gene_id, colour_by = "celltype", x = "celltype") + theme(axis.text.x = element_text(angle = 90), legend.position = "none") + facet_wrap( ~colData(sce)$stage ) + labs(subtitle = gene_name)

Whenever I run this code with multiple gene names, it would only provide the plots for the very first gene, How could I tweak this so that it would show every single gene with their own individual plots, or would I have to run the code 9 times with each gene

You have to plot each one individually.


I’m still ruining into errors.

Hi @yanethp , your variable is named “my_favorite_gene”, not “my_favorite_genes”. They need to match.