R - Wrong error message - Error: Duplicate identifiers for rows


Problem description

I have a problem with a data frame that I need to reshape.

I have this command:

library(tidyverse)
df1 = df1 %>% gather(Day, value, Day01:Day31) %>% spread(Station, value)

I get this error:

Error: Duplicate identifiers for rows (130933, 131029), (389113, 389209), (647293, 647389), (905473, 905569), (1163653, 1163749), (1421833, 1421929), (1680013, 1680109), (1938193, 1938289), (2196373, 2196469), (2454553, 2454649), (2712733, 2712829), (2970913, 2971009), (3229093, 3229189), (3487273, 3487369), (3745453, 3745549), (4003633, 4003729), (4261813, 4261909), (4519993, 4520089), (4778173, 4778269), (5036353, 5036449), (5294533, 5294629), (5552713, 5552809), (5810893, 5810989), (6069073, 6069169), (6327253, 6327349), (6585433, 6585529), (6843613, 6843709), (7101793, 7101889), (7359973, 7360069), (7618153, 7618249), (7876333, 7876429), (130934, 131030), (389114, 389210), (647294, 647390), (905474, 905570), (1163654, 1163750), (1421834, 1421930), (1680014, 1680110), (1938194, 1938290), (2196374, 2196470), (2454554, 2454650), (2712734, 2712830), (2970914, 2971010), (3229094, 3229190), (3487274, 3487370), (3745454, 3745550), (4003634, 4003730), (4261814, 4261910), (4519994, 4520090

The strange thing is that I also get these results:

library(dplyr)
test = rownames_to_column(df1, "VALUE")
length(unique(test$VALUE)) ### Result 258180 = Same as number of rows
length(unique(test$VALUE)) == nrow(test) #### Result TRUE

As you can see, the error message also contains row numbers that do not even exist in my data frame.

The command works fine on all my other data frames, which have exactly the same structure. They just have fewer rows.

I don't know how to provide the data frame here since it is so large. I uploaded it to my university's server so you can download it.

Here is the link (I hope it's allowed to post it like that):

https://megastore.uni-augsburg.de/get/pmAS15z6TN/

Recommended answer

This ought to work. As a comment noted, the error occurs because spread tries to combine rows that are no longer uniquely identified after the gather. rowid_to_column is a simple function that adds the row ids as a column. The row numbers in the error message are larger than the size of the original dataset because after gathering you have a data frame with 8003580 rows (258180 original rows × 31 Day columns).

data2 <- data %>%
    gather(Day, value, Day01:Day31) %>%
    tibble::rowid_to_column() %>%
    spread(Station, value)
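
To make the mechanism concrete, here is a minimal sketch on made-up data. The toy data frame and its Year and Month columns are assumptions for illustration only; the real data set's key columns are not shown in the question. It shows that the row numbers in the error refer to the gathered (long) frame, and that adding a row id lets spread run but leaves each value on its own row instead of combining rows.

```r
library(tidyr)
library(tibble)

# Hypothetical toy data: rows 1 and 2 share the same Year, Month and Station,
# mimicking a duplicated record in the real data set.
toy <- tibble(
  Year    = c(2000, 2000, 2000),
  Month   = c(1, 1, 1),
  Station = c("A", "A", "B"),
  Day01   = c(1.0, 1.5, 2.0),
  Day02   = c(3.0, 3.5, 4.0)
)

# gather() stacks the Day columns: 3 rows x 2 Day columns = 6 rows.
# Row numbers reported by spread() refer to this long frame, not to toy.
long <- gather(toy, Day, value, Day01:Day02)

# spread() fails because two rows per Day share Year, Month, Station and Day.
# The exact wording of the message depends on the installed tidyr version.
res <- tryCatch(
  spread(long, Station, value),
  error = function(e) conditionMessage(e)
)

# With a row id every row is unique, so spread() succeeds. Nothing is
# combined, though: each output row holds one value and NA in the other
# Station column.
wide <- spread(rowid_to_column(long), Station, value)
```

Note that the result keeps one row per gathered row, so if the goal is one row per date with all stations side by side, the duplicated records probably need to be resolved first.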

I ran into memory issues when I tried to actually run this on my laptop, though.
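
Because the full reshape is memory-hungry, it may be cheaper to look for the offending rows in the original wide data frame before gathering. The following is only a sketch in base R: the toy data and the key column names (Year, Month, Station) are hypothetical and would need to be replaced by whatever columns are supposed to identify a row in the real data.

```r
# Hypothetical toy data standing in for the wide data frame before gather().
toy <- data.frame(
  Year    = c(2000, 2000, 2000),
  Month   = c(1, 1, 1),
  Station = c("A", "A", "B"),
  Day01   = c(1.0, 1.5, 2.0),
  Day02   = c(3.0, 3.5, 4.0)
)

# Columns assumed to identify a row uniquely (an assumption, adjust as needed).
key_cols <- c("Year", "Month", "Station")

# Flag every row whose key combination occurs more than once.
# duplicated() alone marks only the second and later occurrences, so the
# fromLast = TRUE pass is needed to catch the first occurrence as well.
is_dupe <- duplicated(toy[key_cols]) | duplicated(toy[key_cols], fromLast = TRUE)
dupes <- toy[is_dupe, ]
```

Inspecting dupes shows which records collide, which helps decide whether to drop, average, or otherwise merge them before calling gather and spread.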
