Convert Dataframe to make Waterfall Chart in ggplot2
Question
I want to transform my dataframe into a format that would be suitable for a waterfall chart.
My dataframe is as follows:
employee <- c('A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F',
'A','B','C','D','E','F')
revenue <- c(10, 20, 30, 40, 10, 40,
8, 10, 20, 50, 20, 10,
2, 5, 70, 30, 10, 50,
40, 8, 30, 40, 10, 40)
date <- as.Date(c('2017-03-01','2017-03-01','2017-03-01',
'2017-03-01','2017-03-01','2017-03-01',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-02','2017-03-02','2017-03-02',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-03','2017-03-03','2017-03-03',
'2017-03-04','2017-03-04','2017-03-04',
'2017-03-04','2017-03-04','2017-03-04'))
df <- data.frame(date,employee,revenue)
date employee revenue
1 2017-03-01 A 10
2 2017-03-01 B 20
3 2017-03-01 C 30
4 2017-03-01 D 40
5 2017-03-01 E 10
6 2017-03-01 F 40
7 2017-03-02 A 8
8 2017-03-02 B 10
9 2017-03-02 C 20
10 2017-03-02 D 50
11 2017-03-02 E 20
12 2017-03-02 F 10
13 2017-03-03 A 2
14 2017-03-03 B 5
15 2017-03-03 C 70
16 2017-03-03 D 30
17 2017-03-03 E 10
18 2017-03-03 F 50
19 2017-03-04 A 40
20 2017-03-04 B 8
21 2017-03-04 C 30
22 2017-03-04 D 40
23 2017-03-04 E 10
24 2017-03-04 F 40
How do I transform this dataframe into a form suitable for a waterfall chart in ggplot2?
The amount column is each employee's change in revenue from the previous day.
The end column is the start column plus the amount column (for example, 150 + (-2) = 148 for employee A on 2017-03-02).
The start column is the Total end value from the previous day.
The final dataframe should look like this:
date employee start end amount total_for_day
1 2017-03-01 A 0 10 10 10
2 2017-03-01 B 0 20 20 20
3 2017-03-01 C 0 30 30 30
4 2017-03-01 D 0 40 40 40
5 2017-03-01 E 0 10 10 10
6 2017-03-01 F 0 40 40 40
7 2017-03-01 Total 0 150 150 150
8 2017-03-02 A 150 148 -2 8
9 2017-03-02 B 150 140 -10 10
10 2017-03-02 C 150 140 -10 20
11 2017-03-02 D 150 160 10 50
12 2017-03-02 E 150 160 10 20
13 2017-03-02 F 150 120 -30 10
14 2017-03-02 Total 150 118 -32 98
15 2017-03-03 A 118 112 -6 2
16 2017-03-03 B 118 113 -5 5
17 2017-03-03 C 118 168 50 70
18 2017-03-03 D 118 98 -20 30
19 2017-03-03 E 118 108 -10 10
20 2017-03-03 F 118 158 40 50
21 2017-03-03 Total 118 167 49 170
22 2017-03-04 A 167 205 38 40
23 2017-03-04 B 167 170 3 8
24 2017-03-04 C 167 127 -40 30
25 2017-03-04 D 167 177 10 40
26 2017-03-04 E 167 167 0 10
27 2017-03-04 F 167 157 -10 40
28 2017-03-04 Total 167 168 1 168
Recommended answer
There are a few steps to get you there, and I think the dplyr package will help (it is used heavily below).
My understanding is that revenue gives the cumulative total revenue, rather than the daily change. If that is wrong, you would need to reverse some of these calculations.
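To illustrate what "reversing" might look like: if the column actually held daily changes, the running level could be rebuilt with cumsum() per employee. This is only a sketch on made-up data; the `daily` data frame and its `change` column are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical input: `change` holds daily changes, not running totals
daily <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:2, times = 2),
  employee = rep(c("A", "B"), each = 3),
  change = c(10, -2, -6, 20, -10, -5)
)

levels <- daily %>%
  group_by(employee) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    revenue = cumsum(change),          # running level per employee
    previousDay = revenue - change     # level before that day's change
  ) %>%
  ungroup()
```

With the level and the previous level in hand, the plotting code further down would apply unchanged.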
The first step is to create a new data.frame that calculates the daily totals, then bind that back to the original data.frame. Then, you can group_by the employees (including "Total") and add columns that are computed separately for each employee: the value on the previous day, the change, and whether it was an increase or a decrease.
library(dplyr)
library(ggplot2)

toPlot <-
  bind_rows(
    df
    # Append one "Total" row per date
    , df %>%
      group_by(date) %>%
      summarise(revenue = sum(revenue)) %>%
      mutate(employee = "Total")
  ) %>%
  group_by(employee) %>%
  mutate(
    # Value on the previous day within each employee (0 before the first day)
    previousDay = lag(revenue, default = 0)
    , change = revenue - previousDay
    , direction = ifelse(change > 0
                         , "Positive"
                         , "Negative"))
This returns:
date employee revenue previousDay change direction
<date> <chr> <dbl> <dbl> <dbl> <chr>
1 2017-03-01 A 10 0 10 Positive
2 2017-03-01 B 20 0 20 Positive
3 2017-03-01 C 30 0 30 Positive
4 2017-03-01 D 40 0 40 Positive
5 2017-03-01 E 10 0 10 Positive
6 2017-03-01 F 40 0 40 Positive
7 2017-03-02 A 8 10 -2 Negative
8 2017-03-02 B 10 20 -10 Negative
9 2017-03-02 C 20 30 -10 Negative
10 2017-03-02 D 50 40 10 Positive
# ... with 18 more rows
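If the exact columns from the question (start, end, amount, total_for_day) are wanted, something along the following lines may work. It is a sketch, not the answer's own method: it assumes start is the previous day's Total (0 before the first day) and end is start plus amount, as the question's table suggests. The names `totals` and `waterfall` are introduced here for illustration.

```r
library(dplyr)

# Rebuild the question's data in a compact form
df <- data.frame(
  date = rep(as.Date("2017-03-01") + 0:3, each = 6),
  employee = rep(LETTERS[1:6], times = 4),
  revenue = c(10, 20, 30, 40, 10, 40,
              8, 10, 20, 50, 20, 10,
              2, 5, 70, 30, 10, 50,
              40, 8, 30, 40, 10, 40),
  stringsAsFactors = FALSE
)

# Daily totals, plus the previous day's total (0 before the first day)
totals <- df %>%
  group_by(date) %>%
  summarise(revenue = sum(revenue)) %>%
  arrange(date) %>%
  mutate(employee = "Total", start = lag(revenue, default = 0))

waterfall <- bind_rows(df, totals %>% select(date, employee, revenue)) %>%
  group_by(employee) %>%
  arrange(date, .by_group = TRUE) %>%
  # Day-over-day change within each employee (and within "Total")
  mutate(amount = revenue - lag(revenue, default = 0)) %>%
  ungroup() %>%
  # Every row on a given date starts from the previous day's total
  left_join(totals %>% select(date, start), by = "date") %>%
  mutate(end = start + amount) %>%
  arrange(date, employee) %>%
  select(date, employee, start, end, amount, total_for_day = revenue)
```

For the Total rows this yields total_for_day values of 150, 118, 167 and 168. The question's table shows 98 and 170 for the second and third day, which look like arithmetic slips, though that reading is an assumption.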
Then, we can plot that using:
toPlot %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  # "Total" is much larger than any employee, so free the y scale per panel
  facet_wrap(~employee
             , scales = "free_y") +
  scale_fill_brewer(palette = "Set1")
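One small detail of the direction column: a change of exactly 0 (employee E on 2017-03-04 in this data) is labelled "Negative" by the ifelse() test. The zero-height rectangle is barely visible, so it matters little here, but if the label is used elsewhere a three-way split may be preferable. A minimal sketch using dplyr's case_when(), where the "No change" label is an arbitrary choice made for this example:

```r
library(dplyr)

change <- c(10, -2, 0)

direction <- case_when(
  change > 0 ~ "Positive",
  change < 0 ~ "Negative",
  TRUE ~ "No change"   # hypothetical label for a zero change
)
```

Note that a third level changes how scale_fill_brewer() assigns colors, so the fill values may then need to be set by hand.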
Note that including "Total" throws off the scale (which is why free scales are needed), so I would prefer to omit it:
toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  facet_wrap(~employee) +
  scale_fill_brewer(palette = "Set1")
With a shared y scale, this allows direct comparison between employees.
And here is the overall total on its own:
toPlot %>%
  filter(employee == "Total") %>%
  ggplot(aes(xmin = date - 0.5
             , xmax = date + 0.5
             , ymin = previousDay
             , ymax = revenue
             , fill = direction)) +
  geom_rect(col = "black"
            , show.legend = FALSE) +
  scale_fill_brewer(palette = "Set1")
Though I still find line graphs easier to interpret (especially when comparing employees):
toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = revenue
             , col = employee)) +
  geom_line() +
  # The lines map employee to colour, so the colour scale (not fill) applies
  scale_color_brewer(palette = "Dark2")
If you want to plot the changes themselves by day, you can do:
toPlot %>%
  filter(employee != "Total") %>%
  ggplot(aes(x = date
             , y = change
             , fill = employee)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Dark2")
But now you are getting rather far from "waterfall" plot output. If you really, really want a waterfall that is comparable across plots, you can make one, but it is going to be rather ugly (I'd strongly recommend the line plot above instead).
Here, you need to position the boxes manually, which will require some tinkering if you change the output aspect ratio (or size) or the number of employees. You also need to include colors for both the employee and the direction of the change, which starts to look rough. This falls into the category of "can, but probably shouldn't": there is likely a better way to display these data.
toPlot %>%
  filter(employee != "Total") %>%
  ungroup() %>%
  # Give each employee a numeric x position within the date panel
  mutate(empNumber = as.numeric(as.factor(employee))) %>%
  ggplot(aes(xmin = (empNumber) - 0.4
             , xmax = (empNumber) + 0.4
             , ymin = previousDay
             , ymax = revenue
             , col = direction
             , fill = employee)) +
  # In ggplot2 >= 3.4.0, linewidth replaces size for the border thickness
  geom_rect(size = 1.5) +
  facet_grid(~date) +
  scale_fill_brewer(palette = "Dark2") +
  theme(axis.text.x = element_blank()
        , axis.ticks.x = element_blank())