Plotting proportional data as nested circles in R
A guide demonstrating how to make a static nested circle plot using the R packages packcircles and ggplot2
This article is part of our technical series, designed to provide the bioscience community with in-depth knowledge and insight from experts working at the Earlham Institute.
For better or worse, you see proportional data represented with nested circles a fair bit in the media.
Plotting proportional data is almost always best done with some kind of bar chart but, occasionally, I do come across a case where I think nested circles convey a message well (as well as being visually attractive).
In this article, I’ll demonstrate how to make a static nested circle plot using the R packages packcircles and ggplot2. If you want interactivity, you can also check out circlepackeR, which creates snazzy HTML widgets.
Dr Rowena Hill is a Postdoctoral Research Scientist at the Earlham Institute, studying the genome of the wheat root pathogen take-all, as part of the Delivering Sustainable Wheat (DSW) programme.
Rowena gained a PhD at the Royal Botanic Gardens, Kew, and Queen Mary University of London, focusing on the diversity and evolution of fungi and plant-fungal interactions.
First, you need a dataframe with the total and subset values you want to plot.
Here are some data I scraped from MycoCosm showing the number of genome assemblies available for different fungal lifestyles; these totals will be my larger circles. I then want the area of each nested circle to represent the subset of that total that has already been published. This is actually similar to a figure I created for my PhD thesis introduction!
head(mycocosm.lifestyles.df)
##       lifestyle num num.pub  colour
## 1     endophyte 142      85 #009E73
## 2    lichenised  16       8 #E69F00
## 3  mycoparasite 105      32 #F0E442
## 4   mycorrhizal 199     123 #56B4E9
## 5 phytopathogen 263     155 dimgrey
## 6    saprotroph 258     178 #0072B2
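To follow along without scraping MycoCosm, the six rows printed above can be typed in by hand. This is only a sketch: the values are copied from the head() output, and the original dataframe may contain more rows than head() displays.

```r
# Rebuild the six rows shown in the head() output above.
# Assumption: these rows are enough to follow the walkthrough; the
# original dataframe may contain additional lifestyles.
mycocosm.lifestyles.df <- data.frame(
  lifestyle = c("endophyte", "lichenised", "mycoparasite",
                "mycorrhizal", "phytopathogen", "saprotroph"),
  num       = c(142, 16, 105, 199, 263, 258),
  num.pub   = c(85, 8, 32, 123, 155, 178),
  colour    = c("#009E73", "#E69F00", "#F0E442",
                "#56B4E9", "dimgrey", "#0072B2"),
  stringsAsFactors = FALSE
)

# Sanity check: a subset can never be larger than its total
stopifnot(all(mycocosm.lifestyles.df$num.pub <= mycocosm.lifestyles.df$num))
```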
packcircles handles the creation of circles with area proportional to the numbers we give it.
This involves first generating a dataframe with the central point and radius of each circle.
library(packcircles)
#Get radius and x and y coordinates for centre of larger circles
circle.layout <- circleProgressiveLayout(mycocosm.lifestyles.df$num,
                                         sizetype="area")
#Optionally add a small gap between circles so they're not touching
circle.layout$radius <- circle.layout$radius * 0.95
head(circle.layout)
##           x          y   radius
## 1 -6.723095   0.000000 6.386940
## 2  2.256758   0.000000 2.143920
## 3  2.875424  -8.014137 5.492162
## 4  3.958529  10.072890 7.560930
## 5 16.394459  -1.676497 8.692137
## 6 -9.972836 -15.447186 8.609115
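The radius column can be checked by hand: a circle of area A has radius sqrt(A/pi), and the 0.95 scaling is then applied on top. The sketch below uses base R only and the counts from the table at the start. The variable names are illustrative and not part of packcircles.

```r
# Counts of genome assemblies per lifestyle (from the table above)
num <- c(142, 16, 105, 199, 263, 258)

# Radius of a circle whose area equals each count, then shrunk by 5%
# to leave a gap (the same scaling applied to circle.layout$radius)
radius.check <- sqrt(num / pi) * 0.95
round(radius.check, 6)

# Shrinking every radius by the same factor scales every area by
# 0.95^2, so the areas stay proportional to the original counts
area.check <- pi * radius.check^2
all.equal(area.check / sum(area.check), num / sum(num))
```

The rounded values agree with the radius column printed above, and the final comparison confirms that the gap does not distort the proportions.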
We can then generate a dataframe with enough vertices to plot a polygon that looks like a circle.
#Create a dataframe of vertices to draw each 'circle'
circle.vertices <- circleLayoutVertices(circle.layout, npoints=50)
head(circle.vertices)
##            x         y id
## 1 -0.3361547 0.0000000  1
## 2 -0.3865177 0.8004959  1
## 3 -0.5368122 1.5883674  1
## 4 -0.7846681 2.3511895  1
## 5 -1.1261766 3.0769318  1
## 6 -1.5559518 3.7541492  1
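To see where these vertices come from, the polygon for one circle can be approximated by stepping an angle around the centre. The helper below, circle_vertices_sketch, is a hypothetical base R illustration and not the packcircles implementation. It assumes the vertices start at angle zero and that the first point is repeated at the end to close the polygon.

```r
# Hypothetical helper: vertices of a polygon approximating a circle
circle_vertices_sketch <- function(x, y, radius, npoints = 50) {
  # npoints + 1 angles from 0 to 2*pi, so the last vertex repeats the first
  theta <- seq(0, 2 * pi, length.out = npoints + 1)
  data.frame(x = x + radius * cos(theta),
             y = y + radius * sin(theta))
}

# First circle from the layout printed earlier (centre and scaled radius)
verts <- circle_vertices_sketch(x = -6.723095, y = 0, radius = 6.386940)
head(verts)
```

With these inputs, the first rows closely match the circleLayoutVertices output above, which suggests the real function uses the same parameterisation.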
Now we can plot the larger circles. We’ll use alpha to make them translucent so that they are distinguishable from the nested circles we add later.
library(ggplot2)
library(tgutil)
#Plot circles
gg.circles <- ggplot() +
  geom_polygon(data=circle.vertices,
               aes(x, y, group=id, fill=as.factor(id)),
               colour=NA,
               alpha=0.3) +
  scale_fill_manual(values=mycocosm.lifestyles.df$colour) +
  coord_equal() +
  theme_void() +
  theme(legend.position="none") +
  ggpreview(width=4, height=3, unit="in")
Now we want to create the polygons for the nested circles, which essentially means repeating the above steps with the nested data.
#Get radius and x and y coordinates for centre of nested circles
circle.layout.pub <-
  circleProgressiveLayout(mycocosm.lifestyles.df$num.pub,
                          sizetype="area")
#If you previously added a small gap between circles, make sure to
#do so again
circle.layout.pub$radius <- circle.layout.pub$radius * 0.95
However, before creating the polygon vertices for these nested circles, we first need to replace their central points with those of the larger circles so that the nested circles overlay correctly.
#Replace x and y with that of the larger circles, but keep the
#same radius
circle.layout.pub <- data.frame(x=circle.layout$x,
                                y=circle.layout$y,
                                radius=circle.layout.pub$radius)
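Because only the radii of the second layout are kept, the same values could also be computed directly from the counts. The base R sketch below is an alternative shown for illustration, not the approach used in this article. It also checks that each nested circle fits inside its parent and that the ratio of areas equals the published proportion.

```r
# Totals and published subsets (from the table at the start)
num     <- c(142, 16, 105, 199, 263, 258)
num.pub <- c(85, 8, 32, 123, 155, 178)

# Area-proportional radii, with the same 5% shrink applied to both
radius.outer <- sqrt(num / pi) * 0.95
radius.inner <- sqrt(num.pub / pi) * 0.95

# Every nested circle must be no larger than the circle containing it
stopifnot(all(radius.inner <= radius.outer))

# Ratio of areas equals the proportion of genomes published
all.equal((radius.inner / radius.outer)^2, num.pub / num)
```

The key point is that both sets of radii must be scaled by the same factor; otherwise the nested areas no longer represent the true proportions.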
Now we can generate the vertices and add the nested circles to the plot.
#Create a dataframe of vertices to draw each nested 'circle'
circle.vertices.pub <- circleLayoutVertices(circle.layout.pub,
                                            npoints=50)
#Add to plot
gg.circles.nested <- gg.circles +
  geom_polygon(data=circle.vertices.pub,
               aes(x, y, group=id, fill=as.factor(id)),
               colour=NA) +
  ggpreview(width=4, height=3, unit="in")
Finally, we can make another dataframe with the information needed to label the circles.
#Combine original dataframe with the layout dataframe
circle.labels <- cbind(mycocosm.lifestyles.df, circle.layout)
#Add lifestyle labels to centre of circles
gg.circles.nested +
  geom_text(data=circle.labels,
            aes(x, y, size=num, label=lifestyle),
            fontface="bold") +
  scale_size_continuous(range=c(1.5, 3.5))
Alternatively, we could label the circles with the original values or the percentage published, and add a colour legend for the lifestyles.
#Add new column with percentage of published genomes
circle.labels$percent <- round(
  circle.labels$num.pub/circle.labels$num * 100
)
#Add percentage labels
gg.circles.nested +
  geom_text(data=circle.labels,
            aes(x, y, size=num, label=paste0(percent, "%")),
            fontface="bold",
            show.legend=FALSE) +
  scale_size_continuous(range=c(2, 6)) +
  scale_fill_manual(values=circle.labels$colour,
                    labels=circle.labels$lifestyle) +
  guides(fill=guide_legend(
    nrow=3,
    direction="horizontal",
    title=NULL,
    label.theme=element_text(size=7, margin=margin(l=-3)),
    keywidth=unit(7, "pt"),
    keyheight=unit(7, "pt"))
  ) +
  theme(legend.position=c(0.7, 0.15))
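The label text is built with plain vector arithmetic, so it can be checked before plotting. This base R sketch uses the counts from the table at the start. The "published/total" format in the second label is just one illustrative way of labelling with the original values.

```r
# Totals and published subsets (from the table at the start)
num     <- c(142, 16, 105, 199, 263, 258)
num.pub <- c(85, 8, 32, 123, 155, 178)

# Percentage of genomes published, rounded to whole numbers
percent <- round(num.pub / num * 100)
paste0(percent, "%")
## [1] "60%" "50%" "30%" "62%" "59%" "69%"

# Alternative label showing the raw counts as published/total
paste0(num.pub, "/", num)
```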
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tgutil_0.1.14 ggplot2_3.4.2 packcircles_0.3.5
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.9 highr_0.10 pillar_1.9.0 compiler_4.2.2
## [5] tools_4.2.2 digest_0.6.31 evaluate_0.21 lifecycle_1.0.3
## [9] tibble_3.2.1 gtable_0.3.3 png_0.1-8 pkgconfig_2.0.3
## [13] rlang_1.1.1 cli_3.6.0 rstudioapi_0.14 yaml_2.3.6
## [17] xfun_0.36 fastmap_1.1.0 withr_2.5.0 dplyr_1.1.2
## [21] knitr_1.42 generics_0.1.3 vctrs_0.6.2 systemfonts_1.0.4
## [25] grid_4.2.2 tidyselect_1.2.0 glue_1.6.2 R6_2.5.1
## [29] textshaping_0.3.6 fansi_1.0.3 rmarkdown_2.21 farver_2.1.1
## [33] magrittr_2.0.3 scales_1.2.1 htmltools_0.5.4 colorspace_2.0-3
## [37] labeling_0.4.2 ragg_1.2.5 utf8_1.2.2 munsell_0.5.0