Spread with duplicate identifiers for rows

Question

There have been questions on this topic here before, but I am still struggling to spread this data. I would like each state to have its own column of temperature values.

Here is a dput() of my data. I'll call it df:

structure(list(date = c("2018-01-21", "2018-01-21", "2018-01-20", 
"2018-01-20", "2018-01-19", "2018-01-19", "2018-01-18", "2018-01-18", 
"2018-01-17", "2018-01-17", "2018-01-16", "2018-01-16", "2018-01-15", 
"2018-01-15", "2018-01-14", "2018-01-14", "2018-01-12", "2018-01-12", 
"2018-01-11", "2018-01-11", "2018-01-10", "2018-01-10", "2018-01-09", 
"2018-01-09", "2018-01-08", "2018-01-08", "2018-01-07", "2018-01-07", 
"2018-01-06", "2018-01-06", "2018-01-05", "2018-01-05", "2018-01-04", 
"2018-01-04", "2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03", 
"2018-01-02", "2018-01-02"), tmin = c(24, 31, 31, 29, 44, 17, 
32, 7, 31, 7, 31, 6, 30, 13, 30, 1, 43, 20, 33, 52, 42, 29, 30, 
29, 26, 32, 33, -2, 29, 0, 23, 3, 19, 11, NA, -3, 22, -3, 24, 
-4), state = c("UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", 
"OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", 
"UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH", "UT", "OH")), class = "data.frame", row.names = c(NA, 
-40L), .Names = c("date", "tmin", "state"))

The code I ran is

df %>% spread(state,tmin)

I expected it to give me the following format

date UT  OH
... ... ...

But I get the error message

Error: Duplicate identifiers for rows (36, 38), (35, 37)

I have tried a few different things. One was grouping by date, since I thought rows sharing the same date were causing the problem for spread. I also tried adding a row-identifier column with add_rownames() and then calling spread(state, tmin), but that did not solve the issue either.

Recommended answer

For spread to work as intended, every cell of the resulting data frame must be uniquely identified by its row and column. In your data, date is the only column left to identify rows after spreading, so each date/state pair may appear only once. However, rows 36 and 38 are identical:

         date tmin state
36 2018-01-03   -3    OH
38 2018-01-03   -3    OH

This puts tidyr in the impossible position of trying to place two data points in the same row and column. In addition, rows 35 and 37 share the same date and state but have different tmin values, which again asks tidyr to put two values into a single cell of the new data frame:

         date tmin state
35 2018-01-03   NA    UT
37 2018-01-03   22    UT
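
The conflicting rows above can also be located without reading the error message, by counting how often each date/state pair occurs. The following is only a minimal sketch: df_small is a hypothetical six-row subset of the question's data, built here so the example runs on its own, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical subset of the question's df, just enough to reproduce the conflict
df_small <- data.frame(
  date  = c("2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
            "2018-01-02", "2018-01-02"),
  tmin  = c(NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH"),
  stringsAsFactors = FALSE
)

# Count rows per date/state pair; any pair with n > 1 would collide in one cell
dupes <- df_small %>%
  count(date, state) %>%
  filter(n > 1)

print(dupes)
```

Any pair this returns has to be reduced to a single row before spread can succeed.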

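The following data cleanup makes spreading possible. Dropping the NA leaves a single UT value for 2018-01-03, and removing exact duplicates leaves a single OH value for that date: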

library(dplyr)
library(tidyr)

df %>% 
  filter(!is.na(tmin)) %>% # remove NA values
  unique() %>% # remove duplicated rows
  spread(state, tmin) # one column per state

         date OH UT
1  2018-01-02 -4 24
2  2018-01-03 -3 22
3  2018-01-04 11 19
4  2018-01-05  3 23
5  2018-01-06  0 29
...
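
The cleanup above works because the remaining duplicates are exact copies. If a date/state pair could instead hold several different non-missing readings, one possible alternative is to collapse each pair to a single value before spreading. The sketch below averages the readings, which is purely an assumption about what the duplicates mean and not something the question specifies; df_small is again a hypothetical subset of the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the question's df
df_small <- data.frame(
  date  = c("2018-01-03", "2018-01-03", "2018-01-03", "2018-01-03",
            "2018-01-02", "2018-01-02"),
  tmin  = c(NA, -3, 22, -3, 24, -4),
  state = c("UT", "OH", "UT", "OH", "UT", "OH"),
  stringsAsFactors = FALSE
)

wide <- df_small %>%
  group_by(date, state) %>%
  # Assumption: duplicates are averaged, ignoring NA.
  # A pair whose readings are all NA would become NaN.
  summarise(tmin = mean(tmin, na.rm = TRUE)) %>%
  ungroup() %>%
  spread(state, tmin) # each date/state pair is now unique

print(wide)
```

Other summaries such as min() or first() would fit the same pattern. In tidyr 1.0.0 and later, spread is superseded by pivot_wider, which can be used for the same reshaping.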
