Spread with data.frame/tibble with duplicate identifiers

Question

The documentation for tidyr suggests that gather and spread are transitive (that is, that one undoes the other), but the following example with the "iris" data shows that they are not, and it is not clear why. Any clarification would be greatly appreciated.

library(tidyr)
iris.df = as.data.frame(iris)
long.iris.df = iris.df %>% gather(key = feature.measure, value = size, -Species)
w.iris.df = long.iris.df %>% spread(key = feature.measure, value = size, -Species)

I expected the data frame "w.iris.df" to be the same as "iris.df", but received the following error instead:

"Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9..."

My general question is how to reverse an application of "gather" on this sort of dataset.

Recommended answer

Hadley's intervention was unsurprisingly perfect, but I ended up tweaking the syntax a bit after that. So, for what it's worth, here is the fully working code (sorry, my syntax is a bit different from the question's):

library(tidyr)
library(dplyr)

wide <- 
  iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row) %>%
  spread(vars, val)

head(wide)
#   Species row Petal.Length Petal.Width Sepal.Length Sepal.Width
# 1  setosa   1          1.4         0.2          5.1         3.5
# 2  setosa   2          1.4         0.2          4.9         3.0
# 3  setosa   3          1.3         0.2          4.7         3.2
# 4  setosa   4          1.5         0.2          4.6         3.1
# 5  setosa   5          1.4         0.2          5.0         3.6
# 6  setosa   6          1.7         0.4          5.4         3.9

head(iris)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa
# 6          5.4         3.9          1.7         0.4  setosa

They are the same; you just need to reorder the columns if you feel like it:

wide <- wide[, names(iris)]  ## Restore the original column order; this also drops the "row" column
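
For anyone wondering why the extra "row" column is needed, here is a minimal sketch, assuming tidyr and dplyr are installed. The names `long`, `dup_counts` and `long_id` are only for illustration and do not appear in the question. After gather(), every Species/feature pair is shared by many rows, so spread() cannot tell which values belong together on one output row; a per-row id removes that ambiguity.

```r
library(tidyr)
library(dplyr)

# Long format without any row identifier, as in the question.
long <- iris %>% gather(vars, val, -Species)

# Count how many values share each Species/vars combination.
dup_counts <- long %>% count(Species, vars)

# 3 species x 4 measurements = 12 combinations, each holding 50 values,
# so Species alone cannot identify a single output row for spread().
range(dup_counts$n)

# With a per-row id, every (Species, row, vars) triple is unique.
long_id <- iris %>%
  mutate(row = row_number()) %>%
  gather(vars, val, -Species, -row)

# anyDuplicated() returns 0 when no identifier triple is repeated.
anyDuplicated(long_id[, c("Species", "row", "vars")])
```
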

Done.
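
As a side note, tidyr 1.0.0 and later mark gather() and spread() as superseded in favour of pivot_longer() and pivot_wider(). The following is a rough sketch of the same round trip with those functions, not part of the original answer; the names `wide2`, `feature` and `size` are arbitrary choices for illustration, and it assumes tidyr >= 1.0.0 and dplyr >= 1.0.0.

```r
library(tidyr)
library(dplyr)

wide2 <- iris %>%
  # Same trick as above: give each observation its own identifier.
  mutate(row = row_number()) %>%
  # Stack the four measurement columns into name/value pairs.
  pivot_longer(-c(Species, row), names_to = "feature", values_to = "size") %>%
  # Species and row together identify each output row uniquely.
  pivot_wider(names_from = feature, values_from = size) %>%
  # Restore the original column order and drop the helper column.
  select(all_of(names(iris)))

head(wide2)
```

The result may come back as a tibble rather than a plain data.frame; wrapping it in as.data.frame() should give something directly comparable to iris.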
