Reorganize data frame elements depending on the content of the rows in R
Question
I have this dataset:
df <- structure(list(V1 = c("B1D01", "B1D01", "B1D01", "B1D01", "B1D01",
"B1D01", "U0155"), V2 = c("U0155", "U0155", "U0155", "U0155",
"U0155", "U0155", "U3003"), V3 = c("U3003", "U3003", "C1B00",
"U3003", "U3003", "U3003", "C1B00"), V4 = c("C1B00", "C1B00",
"U0073", "C1B00", "C1B00", "C1B00", "P037D"), V5 = c("P037D",
"P037D", NA, "P037D", "P037D", "P037D", "P0616"), V6 = c("P0616",
"P0616", NA, "P0616", "P0616", "P0616", "P0562"), V7 = c("P0562",
"P0562", NA, "P0562", "P0562", "P0562", "U0073"), V8 = c("U0073",
"U0073", NA, "U0073", "U0073", "U0073", NA)), .Names = c("V1",
"V2", "V3", "V4", "V5", "V6", "V7", "V8"), row.names = 1719:1725, class = "data.frame")
When I print(df):
V1 V2 V3 V4 V5 V6 V7 V8
1719 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1720 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1721 B1D01 U0155 C1B00 U0073 <NA> <NA> <NA> <NA>
1722 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1723 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1724 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
1725 U0155 U3003 C1B00 P037D P0616 P0562 U0073 <NA>
As you can see, the codes are mixed across columns. For instance, U3003 is mostly in V3, but it also appears in V2 (last row).
I would like to reorganize this data frame under these conditions:
- Each code should be placed in its own column.
- The name of each column should be the name of its code.
- If there are more than 8 distinct codes, the number of columns should match the number of codes.
- The cell values should keep the name of the code.
- If a code is not present in a row, NA must appear.
Note that my original data frame contains many more rows than this small example extracted from it.
Recommended answer
The best way I found is to 'massage' the data frame: pivot it to a longer form, then pivot it back to a wide form with one column per code:
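library(tidyverse)

df %>%
  rownames_to_column() %>%                           # keep the row names as an id column
  pivot_longer(-rowname, values_drop_na = TRUE) %>%  # one row per (rowname, code), NAs dropped
  pivot_wider(id_cols = rowname,                     # one column per distinct code
              names_from = value, values_from = value)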
#> # A tibble: 7 x 9
#> rowname B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1719 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 2 1720 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 3 1721 B1D01 U0155 <NA> C1B00 <NA> <NA> <NA> U0073
#> 4 1722 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 5 1723 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 6 1724 B1D01 U0155 U3003 C1B00 P037D P0616 P0562 U0073
#> 7 1725 <NA> U0155 U3003 C1B00 P037D P0616 P0562 U0073
Created on 2020-04-03 by the reprex package (v0.3.0)
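If loading the tidyverse is not an option, the same reshaping can be sketched in base R. This is only a sketch and not part of the original answer: the names full, rows, codes, wide and result are illustrative, and df is rebuilt in a compact form so that the snippet runs on its own. The idea is to collect the distinct codes in order of first appearance, create an all-NA character matrix with one column per code, and fill in each code wherever it occurs in a row.

```r
# Compact reconstruction of the example data from the question.
full <- c("B1D01", "U0155", "U3003", "C1B00", "P037D", "P0616", "P0562", "U0073")
rows <- list(full, full, c("B1D01", "U0155", "C1B00", "U0073"),
             full, full, full, full[-1])
df <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) c(r, rep(NA, 8 - length(r))))),
  stringsAsFactors = FALSE
)
rownames(df) <- 1719:1725

m <- as.matrix(df)

# Distinct codes in order of first appearance, reading row by row.
v <- as.vector(t(m))
codes <- unique(v[!is.na(v)])

# Start from an all-NA matrix: one row per original row, one column per code.
wide <- matrix(NA_character_, nrow = nrow(m), ncol = length(codes),
               dimnames = list(rownames(df), codes))

# Write each code into its own column for the rows that contain it.
for (code in codes) {
  present <- rowSums(m == code, na.rm = TRUE) > 0
  wide[present, code] <- code
}

result <- as.data.frame(wide, stringsAsFactors = FALSE)
print(result)
```

Because the columns are derived from the distinct codes rather than from the original eight positions, the number of columns follows the number of codes, which is what the third condition in the question asks for. On a much larger data frame the loop runs once per distinct code, not once per row.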