如何pivot_wider 并通过复制值来填充缺失值 [英] How to pivot_wider and fill in missing values by duplicating values
问题描述
我正在尝试从长格式到宽格式获取数据框,但由于其结构,每当我使用 pivot_wider() 时,我都会得到两列包含数据向量的列.
I am trying to get a data frame from a long to a wide format, but due to its structure, whenever I use pivot_wider() I get two columns that contain vectors of data.
这是原始数据:
structure(list(type = c("radio", "radio", "radio", "television",
"television", "television", "television", "television", "television",
"television", "television", "television", "television", "television",
"television", "television", "television", "television", "television"
), Resource = c("samsung", "samsung", "samsung", "samsung", "samsung",
"samsung", "samsung", "samsung", "samsung", "samsung", "samsung",
"sony", "sony", "sony", "sony", "sony", "sony", "sony", "sony"
), Property = c("lot_number", "lot_number", "manufacturer", "lot_number",
"lot_number", "lot_number", "lot_number", "lot_number", "manufacturer",
"other_PN", "part_number", "lot_number", "lot_number", "lot_number",
"lot_number", "lot_number", "manufacturer", "other_PN", "part_number"
), value = c("12345", "54321", "John", "9876", "12345", "54321",
"56789", "67890", "Walt", "5g6h3f", "6789", "9876", "12345",
"54321", "56789", "67890", "John", "2a3b4c", "3461")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -19L), spec = structure(list(
cols = list(type = structure(list(), class = c("collector_character",
"collector")), Resource = structure(list(), class = c("collector_character",
"collector")), Property = structure(list(), class = c("collector_character",
"collector")), value = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
我使用的命令:
df_wide <- df %>% pivot_wider(names_from = Property, values_from = value)
但是多个值被保存为lot_number"中的一个向量.列:
but multiple values are saved as a vector in the "lot_number" column:
structure(list(type = c("radio", "television", "television"),
Resource = c("samsung", "samsung", "sony"), lot_number = list(
c("12345", "54321"), c("9876", "12345", "54321", "56789",
"67890"), c("9876", "12345", "54321", "56789", "67890"
)), manufacturer = list("John", "Walt", "John"), other_PN = list(
NULL, "5g6h3f", "2a3b4c"), part_number = list(NULL, "6789",
"3461")), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
这是我最后想要得到的数据框.请注意,在所需的输出收音机"中,有一些缺失值.列manufacturer"、other_PN"中的值也包括在内.和part_number"必须跨多行复制.
This is the data frame I would like to get at the end. Note that in the desired output "radio" has some missing values. Also values in columns "manufacturer", "other_PN" and "part_number" have to be duplicated across several rows.
structure(list(type = c("radio", "radio", "television", "television",
"television", "television", "television", "television", "television",
"television", "television", "television"), Resource = c("samsung",
"samsung", "samsung", "samsung", "samsung", "samsung", "samsung",
"sony", "sony", "sony", "sony", "sony"), manufacturer = c("John",
"John", "Walt", "Walt", "Walt", "Walt", "Walt", "John", "John",
"John", "John", "John"), other_PN = c(NA, NA, "5g6h3f", "5g6h3f",
"5g6h3f", "5g6h3f", "5g6h3f", "2a3b4c", "2a3b4c", "2a3b4c", "2a3b4c",
"2a3b4c"), part_number = c(NA, NA, 6789, 6789, 6789, 6789, 6789,
3461, 3461, 3461, 3461, 3461), lot_number = c(12345, 54321, 9876,
12345, 54321, 56789, 67890, 9876, 12345, 54321, 56789, 67890)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(type = structure(list(), class = c("collector_character",
"collector")), Resource = structure(list(), class = c("collector_character",
"collector")), manufacturer = structure(list(), class = c("collector_character",
"collector")), other_PN = structure(list(), class = c("collector_character",
"collector")), part_number = structure(list(), class = c("collector_double",
"collector")), lot_number = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))
感谢您的帮助!!
推荐答案
我建议这种方法使用 dput()
数据作为 df
.您可以使用 group_by()
创建一个 id 变量以识别行,然后您可以使用 pivot_wider()
进行整形.由于需要填充某些值,您可以使用 tidyr
中的 fill()
,一个 tidyverse
包:
I would suggest this approach using your dput()
data as df
. You can create an id variable with group_by()
in order to identify rows and then you can reshape with pivot_wider()
. As some values need to be filled you can use fill()
from tidyr
, a tidyverse
package:
library(tidyverse)
#Data
df %>% group_by(type,Resource,Property) %>% mutate(id=1:n()) %>%
pivot_wider(names_from = Property,values_from=value) %>%
fill(everything()) %>% select(-id)
输出:
# A tibble: 12 x 6
# Groups: type, Resource [3]
type Resource lot_number manufacturer other_PN part_number
<chr> <chr> <chr> <chr> <chr> <chr>
1 radio samsung 12345 John NA NA
2 radio samsung 54321 John NA NA
3 television samsung 9876 Walt 5g6h3f 6789
4 television samsung 12345 Walt 5g6h3f 6789
5 television samsung 54321 Walt 5g6h3f 6789
6 television samsung 56789 Walt 5g6h3f 6789
7 television samsung 67890 Walt 5g6h3f 6789
8 television sony 9876 John 2a3b4c 3461
9 television sony 12345 John 2a3b4c 3461
10 television sony 54321 John 2a3b4c 3461
11 television sony 56789 John 2a3b4c 3461
12 television sony 67890 John 2a3b4c 3461
这篇关于如何pivot_wider 并通过复制值来填充缺失值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!