Insert rows in dataframe based on condition - the Tidyverse way
Question
Here is a data frame:
library(tibble)
# 5 companies observed each day for 10 days
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(sample(100, 5), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df
Now something happens to the data and some of the company E rows are removed.
df_error <- df[-c(5, 10, 15, 20), ]
df_error
What is the simplest Tidyverse way to add back the E rows? The value doesn't matter. The date of each E row is the same as that of the D row above it.
I started with the following and wasn't sure how to proceed:
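# Find all D occurrences
e_idx <- which(df_error$company == "D")
e_idx
# If there is not an E in the next row, get the index. These need E rows below each index value.
# Compare against the company column as a vector; df_error[e_idx + 1, 1] would return a one-column tibble.
next_company <- df_error$company[e_idx + 1]
rows_need_e_below <- ifelse(next_company != "E", e_idx, NA)
rows_need_e_below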
Recommended answer
If you know that your data should have companies "A" to "E", you can use complete:
tidyr::complete(df_error, date, company = LETTERS[1:5])
Or, more generally:
unique_company <- c('A', 'B', 'C', 'D', 'E')
tidyr::complete(df_error, date, company = unique_company)
# A tibble: 50 x 3
# date company value
# <date> <chr> <int>
# 1 2020-01-01 A 87
# 2 2020-01-01 B 5
# 3 2020-01-01 C 40
# 4 2020-01-01 D 67
# 5 2020-01-01 E NA
# 6 2020-01-02 A 87
# 7 2020-01-02 B 5
# 8 2020-01-02 C 40
# 9 2020-01-02 D 67
#10 2020-01-02 E NA
# … with 40 more rows
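If every company still appears at least once somewhere in the data (as E does here), it may not be necessary to list the companies at all. The following is a minimal sketch under that assumption; the fixed values in `df` are illustrative stand-ins for the random ones in the question, and `df_fixed` is a name introduced only for this example.

```r
library(tibble)
library(tidyr)

# Rebuild the example data; the five values are arbitrary placeholders
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(c(87L, 5L, 40L, 67L, 12L), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df_error <- df[-c(5, 10, 15, 20), ]

# With bare column names, complete() expands over the values observed in each column,
# so the missing date/company combinations are added with NA in value
df_fixed <- complete(df_error, date, company)
nrow(df_fixed)
```

This only works when each company occurs on at least one date; if a company were missing from the data entirely, the explicit vector above would still be needed.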
By default, the value column is given NA in the added rows. If you want to fill it with a specific value, you can use the fill parameter of complete. For example, to fill with 0s you can do:
tidyr::complete(df_error, date, company = unique_company, fill = list(value = 0))
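Note that complete moves the columns it expands over (date and company) to the front. If the original column order matters, one option is to reselect the columns afterwards. This is a sketch, assuming dplyr is available; the fixed values and the name `df_filled` are illustrative and not part of the original question.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Rebuild the example data; the five values are arbitrary placeholders
df <- tibble(
  company = rep(LETTERS[1:5], 10),
  value = rep(c(87L, 5L, 40L, 67L, 12L), 10),
  date = rep(seq(as.Date("2020-01-01"), as.Date("2020-01-10"), 1), each = 5)
)
df_error <- df[-c(5, 10, 15, 20), ]

df_filled <- df_error %>%
  # add the missing date/company rows, filling value with 0 instead of NA
  complete(date, company = LETTERS[1:5], fill = list(value = 0)) %>%
  # put the columns back in the order used by the original data frame
  select(company, value, date)

df_filled
```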