Sometimes you want to know how the expression of two genes relate to each other. Here is some code to plot a 2D scatter plot:
# Create a variable with your favorite gene
my_favorite_genes <- c("ENSMUSG00000038192","ENSMUSG00000047139")
my_favorite_celltype <- "Def. endoderm"
# Plot
logcounts(sce)[my_favorite_genes,sce$celltype %in% my_favorite_celltype] %>%
t() %>%
as.matrix() %>%
as_tibble() %>%
rename(gene1 = 1, gene2 = 2) %>%
ggplot(aes(x = gene1, y = gene2)) +
geom_point() +
theme_bw() +
xlab(paste(my_favorite_genes[1], "log(counts)")) +
ylab(paste(my_favorite_genes[2], "log(counts)")) +
ggtitle(my_favorite_celltype)
To plot a linear regression on top, you can add one with geom_smooth(method = “lm”):
# Plot with linear regression line
logcounts(sce)[my_favorite_genes,sce$celltype %in% my_favorite_celltype] %>%
t() %>%
as.matrix() %>%
as_tibble() %>%
rename(gene1 = 1, gene2 = 2) %>%
ggplot(aes(x = gene1, y = gene2)) +
geom_point() +
geom_smooth(method = "lm", col = "firebrick") +
theme_bw() +
xlab(paste(my_favorite_genes[1], "log(counts)")) +
ylab(paste(my_favorite_genes[2], "log(counts)")) +
ggtitle(my_favorite_celltype)
Together with the correlatePairs() function, which I posted about recently (Finding coexpressed genes in specific cell types), you can use the p value and FDR to get a sense of whether your genes are significantly correlated, and then the rho value to see if they are positively or negatively correlated.
We can see from the above plots that these two genes are positively correlated. The output from correlatePairs() confirms this:
> cor.pairs <- correlatePairs(x = sce_subset, subset.row = of.interest)
> cor.pairs %>% as_tibble() %>% filter(gene1 %in% my_favorite_genes, gene2 %in% my_favorite_genes)
# A tibble: 1 × 5
gene1 gene2 rho p.value FDR
<chr> <chr> <dbl> <dbl> <dbl>
1 ENSMUSG00000038192 ENSMUSG00000047139 0.860 7.97e-30 1.59e-25