[Data analysis and visualization] Key points of data plotting 4: problems with pie charts

Posted by chomps on Wed, 01 Dec 2021 14:36:56 +0100

This article looks at what is probably the most criticized chart type of all: the pie chart.

Bad by definition

A pie chart is a circle divided into sectors, each representing a part of the whole. It is usually used to display percentages, where the sectors sum to 100%. The problem is that humans are very bad at reading angles. In the adjacent pie chart, try to find the largest group, then try to sort the groups by value. You will probably find this difficult, which is why pie charts are best avoided.

Now let's compare three pie charts. Try to find which group has the highest value in each of the three charts, and try to work out how the values evolve from one group to the next.

# Libraries
library(tidyverse)
library(hrbrthemes)
library(viridis)
library(patchwork)

# Create 3 data frames
data1 <- data.frame( name=letters[1:5], value=c(17,18,20,22,24) )
data2 <- data.frame( name=letters[1:5], value=c(20,18,21,20,20) )
data3 <- data.frame( name=letters[1:5], value=c(24,23,21,19,18) )
# View data
data1
data2
data3
A data.frame: 5 × 2
name   value
<fct>  <dbl>
a      17
b      18
c      20
d      22
e      24

A data.frame: 5 × 2
name   value
<fct>  <dbl>
a      20
b      18
c      21
d      20
e      20

A data.frame: 5 × 2
name   value
<fct>  <dbl>
a      24
b      23
c      21
d      19
e      18

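# Define the pie chart drawing function
#   data: data frame with the columns name and value
#   vec:  position of each label along the stacked value axis
plot_pie <- function(data, vec){

  ggplot(data, aes(x="name", y=value, fill=name)) +
    # A pie chart is a stacked bar chart drawn in polar coordinates
    geom_bar(width = 1, stat = "identity") +
    # Switch to the polar coordinate system
    coord_polar("y", start=0, direction = -1) +
    # Set the fill colors
    scale_fill_viridis(discrete = TRUE, direction=-1) +
    # Add the group labels. The names are reversed because the bars are
    # stacked starting from the last group. The first label sits on the
    # darkest slice, so it is drawn in white (this assumes 5 groups).
    geom_text(aes(y = vec, label = rev(name), color=c("white", rep("black", 4))), size=4) +
    scale_color_manual(values=c("black", "white")) +
    theme(
      legend.position="none",
      plot.title = element_text(size=14),
      panel.grid = element_blank(),
      axis.text = element_blank()
    ) +
    xlab("") +
    ylab("")
}

# Draw the three pie charts side by side (patchwork syntax)
a <- plot_pie(data1, c(10,35,55,75,93))
b <- plot_pie(data2, c(10,35,53,75,93))
c <- plot_pie(data3, c(10,29,50,75,93))
a + b + c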
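
One way to see why the pie charts above are hard to read is to convert each value into the angle of its slice. The following is a minimal sketch in base R that reuses the values of data1; slice_angles is a hypothetical helper introduced only for this illustration and is not part of the original code.

```r
# Values of data1, repeated here so the snippet runs on its own
data1 <- data.frame(name = letters[1:5], value = c(17, 18, 20, 22, 24))

# Hypothetical helper: angle (in degrees) of each pie slice
slice_angles <- function(value) 360 * value / sum(value)

angles <- slice_angles(data1$value)
names(angles) <- data1$name

# Angle of each slice
round(angles, 1)

# Difference in angle between neighbouring groups
round(diff(angles), 1)
```

For data1, neighbouring slices differ by roughly 4 to 7 degrees, which is a very small difference to judge by eye.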

Now let's represent exactly the same data with bar charts:

# Define the bar chart drawing function
plot_bar <- function(data){

  ggplot(data, aes(x=name, y=value, fill=name)) +
    # Draw the bars
    geom_bar(stat = "identity") +
    # Set the fill colors
    scale_fill_viridis(discrete = TRUE, direction=-1) +
    theme(
      legend.position="none",
      plot.title = element_text(size=14),
      panel.grid = element_blank()
    ) +
    # Use the same y-axis range in all three charts
    ylim(0,25) +
    xlab("") +
    ylab("")

}

# Draw the three bar charts side by side (patchwork syntax)
a <- plot_bar(data1)
b <- plot_bar(data2)
c <- plot_bar(data3)
a + b + c
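
The questions asked earlier (which group is the highest in each data set, and how much the values vary) can also be answered directly from the numbers, which gives a reference for checking what each chart type lets you read. This is a small sketch in base R; the names datasets and top_group are introduced here for illustration only.

```r
# The three data frames, repeated here so the snippet runs on its own
data1 <- data.frame(name = letters[1:5], value = c(17, 18, 20, 22, 24))
data2 <- data.frame(name = letters[1:5], value = c(20, 18, 21, 20, 20))
data3 <- data.frame(name = letters[1:5], value = c(24, 23, 21, 19, 18))
datasets <- list(data1 = data1, data2 = data2, data3 = data3)

# Group with the highest value in each data set
top_group <- sapply(datasets, function(d) as.character(d$name)[which.max(d$value)])
top_group

# Spread between the smallest and largest value in each data set
value_spread <- sapply(datasets, function(d) diff(range(d$value)))
value_spread
```

The highest group is e, c and a respectively, and the spread is only 7, 3 and 6 units, which the bar charts make visible and the pie charts largely hide.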

Let's recall why we use charts in the first place.

  • Charts are a way to present information so that it is easier to understand.
  • In general, the purpose of a chart is to make it easier to compare different sets of data.
  • A good chart conveys as much information as possible without adding complexity.

Comparing the two sets of figures, the pie charts make it hard to see the differences between values, while the bar charts show them clearly. Pie charts make values difficult to compare, and they convey no additional information in return.

Solution

The bar chart, vertical or horizontal, is the best alternative to the pie chart. If you have many values to show, you can also consider a lollipop chart, which is a little more elegant in my opinion. The following example is based on the number of items sold by a few countries in the world:

# Load data from GitHub
data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/7_OneCatOneNum.csv", header=TRUE, sep=",")
# Remove rows with missing values
data <- filter(data, !is.na(Value))
nrow(data)
head(data)
# Sort the rows by Value
data <- arrange(data, Value)
# Convert Country into a factor whose levels follow the sorted order,
# so that the plot keeps this order
data <- mutate(data, Country=factor(Country, levels=Country))
# Plot
ggplot(data, aes(x=Country, y=Value)) +
  # Draw the stem of each lollipop
  geom_segment(aes(x=Country, xend=Country, y=0, yend=Value), color="grey") +
  # Draw the point
  geom_point(size=3, color="#69b3a2") +
  # Swap the x and y axes
  coord_flip() +
  # Set up the theme
  theme(
    # Hide the horizontal grid lines
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position="none"
  ) +
  # Remove the title of the original x-axis (the vertical axis after the flip)
  xlab("")

38

A data.frame: 6 × 2
   Country          Value
   <fct>            <int>
1  United States    12394
2  Russia            6148
3  Germany (FRG)     1653
4  France            2162
5  United Kingdom    1214
6  China             1131
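
The step that converts Country into a factor is what keeps the lollipops sorted: without it, the categories are ordered alphabetically rather than by Value. The sketch below illustrates this in base R with the six rows printed by head(data) above, typed in by hand; the data frame name top6 is hypothetical.

```r
# The six rows shown by head(data), entered manually for illustration
top6 <- data.frame(
  Country = c("United States", "Russia", "Germany (FRG)",
              "France", "United Kingdom", "China"),
  Value = c(12394, 6148, 1653, 2162, 1214, 1131),
  stringsAsFactors = FALSE
)

# Default factor levels are alphabetical and ignore Value
default_levels <- levels(factor(top6$Country))
default_levels

# Sort by Value, then fix the level order to the sorted row order
top6 <- top6[order(top6$Value), ]
top6$Country <- factor(top6$Country, levels = top6$Country)
levels(top6$Country)
```

After the conversion, the levels run from the smallest to the largest Value, which is the order the plot uses along the Country axis.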

If your goal is to describe the composition of the whole, another possibility is to create a treemap.

# Load the treemap package
library(treemap)

# Draw the treemap
treemap(data,
        # Data: one rectangle per Country, with area given by Value
        index="Country",
        vSize="Value",
        type="index",

        # Title and color palette
        title="",
        palette="Dark2",

        # Border color
        border.col=c("black"),
        # Border line width
        border.lwds=3,

        # Label color
        fontcolor.labels="white",
        # Label font face (2 = bold)
        fontface.labels=2,
        # Label position inside each rectangle
        align.labels=c("left", "top"),
        # Inflate the labels so that larger rectangles get larger labels
        inflate.labels=TRUE,
        # Label font size; when inflate.labels is TRUE the labels are
        # scaled to their rectangle instead
        fontsize.labels=5
)
