Having trouble keeping all variables after removing duplicates from a dataset


Problem description


So, I imported a dataset with 178 observations and 8 variables. The end goal was to eliminate all observations that were the same across three of those variables (2, 5, and 6). This proved quite easy using the unique command.

mav2 <- unique(mav[,c(2,5,6)])

The resulting mav2 dataframe produced 55 observations, getting rid of all the duplicates! Unfortunately, it also got rid of the other five variables that I did not use in the unique command (1, 3, 4, 7, and 8). I initially tried adding the two dataframes, but of course this did not work since they were of unequal size. I have also tried merging the two, but this fails and just gives an output of the first dataset with all 178 observations.

The second dataset (mav2) did produce a new column (row.names) which is the row number for each observation from the initial dataset.

If anyone could help me out on getting all 8 initial variables into a dataset with only the 55 unique observations, I would be very appreciative. Thanks in advance.

Solution

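I think what you want is duplicated, a function similar to unique. Instead of returning the unique rows themselves, it returns a logical vector that flags each row repeating an earlier one. Negating that vector and using it as a row index on the full data frame keeps the first occurrence of each combination together with all 8 columns.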

So

mav2 <- mav[!duplicated(mav[,c(2,5,6)]),]
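To see what that line does, here is a small sketch on made-up data. The toy `mav` below (6 rows, 6 columns) and its column names are assumptions for illustration only, not the asker's dataset; the key columns sit at positions 2, 5 and 6 as in the question.

```r
# Toy stand-in for `mav` (hypothetical values, not the real data)
mav <- data.frame(
  id = 1:6,
  a  = c("x", "x", "y", "y", "z", "x"),
  v3 = c(10, 20, 30, 40, 50, 60),
  v4 = letters[1:6],
  b  = c(1, 1, 2, 2, 3, 1),
  c  = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

key_cols <- c(2, 5, 6)

# TRUE marks a row whose key columns repeat an earlier row
is_dup <- duplicated(mav[, key_cols])

# Keep the non-duplicated rows; the empty column index keeps every column
mav2 <- mav[!is_dup, ]

print(is_dup)
print(mav2)
```

Only the first row of each key combination survives, and `mav2` has the same number of rows as `unique(mav[, key_cols])` while still carrying every original column.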

EDIT: inverted sense of duplicated
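As a side note, the row.names mentioned in the question could probably be used to get the same result, since `unique` on a data frame keeps the row names of the rows it retains. The snippet below is only a sketch on made-up data (the toy `mav`, `keys_only` and `recovered` names are hypothetical), and the `duplicated` version above is the more direct route.

```r
# Toy stand-in for `mav` (hypothetical values)
mav <- data.frame(
  id    = 1:5,
  k1    = c("a", "a", "b", "b", "c"),
  k2    = c(1, 1, 2, 2, 3),
  other = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# unique() on the key columns drops the other columns,
# but the surviving rows keep their original row names
keys_only <- unique(mav[, c("k1", "k2")])

# Index the full data frame by those row names to recover all columns
recovered <- mav[rownames(keys_only), ]

print(rownames(keys_only))
print(recovered)
```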
