R

Histogram

December 15, 2021
visualization
ggplot2, R, data visualization

Overview # A histogram is a visualization that shows the spread of a numerical variable. A histogram is assembled by taking a numerical variable and chunking the data into bins. Each bin holds a count of the observations that fall within its range. For example, if we have a super simple numerical variable made up of 1, 3, 5, 7, 11, 13, and we define two bins, where one bin includes everything from 0 to 10 and the other includes everything above 10 up to 20, then the bin from 0 to 10 contains four observations, and the bin from above 10 to 20 contains two observations. ...
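The two-bin example in this excerpt can be checked with a few lines of base R. This is a minimal sketch, not code from the post: the variable names `x`, `bins`, and `counts` are made up for illustration, and it assumes each bin includes its upper edge.

```r
# The six example values from the excerpt
x <- c(1, 3, 5, 7, 11, 13)

# Two bins: (0, 10] and (10, 20].
# right = TRUE means each bin includes its upper edge (assumption).
bins <- cut(x, breaks = c(0, 10, 20), right = TRUE)

# Count how many observations fall in each bin
counts <- table(bins)
print(counts)

# hist() can compute the same counts without drawing anything
hist_counts <- hist(x, breaks = c(0, 10, 20), plot = FALSE)$counts
print(hist_counts)
```

Both approaches should report four observations in the first bin and two in the second, matching the counts described in the text.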

Stream Graph

December 1, 2021
visualization
R, ggplot2

Overview # A stream graph is a variation of a stacked area chart. Stream graphs show the proportions of different categories along some other piece of sequenced data. If you look at the data as vertical slices, the thickness of each band represents how much of something there is at that particular slice. You could call it a “stream plot” I suppose, but “stream graph” seems to be much more common. ...

Beeswarm Plot

November 30, 2021
visualization
R, ggplot2

Overview # A beeswarm plot is a way to display the spread of a numerical data field. Think of it as a histogram, but rather than having heights of bars corresponding to the number of points within particular ranges, a beeswarm shows individual data points. The data points in a beeswarm plot are spread out along a cross section to minimize point overlap so it’s visually easy to identify clusters of data points along a single continuous variable. ...

Line Plot

November 29, 2021
visualization
ggplot2, R

Overview # A line plot (or line graph, whichever you prefer) is a way to show how something changes as something else changes. Visually, a line plot connects point to point with lines. In many instances, the points themselves are not shown prominently, and the plot itself relies more on conveying data through the sequence of lines. Line plots are frequently used to show how some value changes over time. ...

Bubble Plot

November 19, 2021
visualization
R

Overview # A bubble plot is simply an enhanced version of a scatter plot. A simple, flat scatter plot typically uses only two numerical fields and conveys data through the position of points. A bubble plot, on the other hand, uses a third numerical field. In a bubble plot, the size of each point changes based on the value of the third numerical field. Data # A bubble plot requires at least three numerical fields: ...

Heatmap Plot

November 18, 2021
visualization
R

Overview # Heatmaps are used to display variations in numbers across different observations that also have other categorical attributes. These are also referred to as tile plots. Data # At a minimum, a heatmap needs one numerical field and one categorical field. A more common and information-rich heatmap uses one numerical field and two categorical fields. R # A heatmap can be generated in R using the geom_tile() function in the ggplot2 package. ...

Violin Plot

November 4, 2021
visualization
R

Overview # A violin plot is used to display the distribution of numerical variables. The width along the cross section of any part of the plotted violin represents how many data points there are within that given section. A single violin plot can display the distribution of numerical variables for multiple categories. Data # A violin plot requires at least one categorical variable and one numerical variable. R # Let’s make a violin plot in R with the ggplot2 package. ...