Select rows from a data frame with a unique combination of values from multiple columns

Problem description

I have a data.frame in R that is a catalog of results from baseball games for every team for a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that because the data.frame is a combination of logs for every team, each row essentially has another row somewhere else in the data.frame that is a mirror image of that row.

For example,

team  opponent_team  date           result team_runs opponent_runs
BAL   BOS            2010-04-05      W      5         4

and somewhere else in the data.frame there is another row

team  opponent_team  date           result team_runs opponent_runs
BOS   BAL            2010-04-05      L      4         5

I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter, I am just trying to get rid of the rows that are mirror images.

Thanks

Recommended answer

Have you tried the distinct() function from dplyr? For your case, it could be something like:

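library(dplyr)
# .keep_all = TRUE keeps the other columns; without it only these three are returned
df %>% distinct(team, opponent_team, date, .keep_all = TRUE)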
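
The distinct() call above compares team, opponent_team and date as an ordered triple, so the BAL/BOS row and its BOS/BAL mirror on the same date count as different combinations and both are kept. One way to make the comparison order-insensitive is to build a sorted pair key with pmin()/pmax() before calling distinct(). A minimal sketch, assuming team and opponent_team are character (not factor) columns; the data frame `games` and the helper columns `team_a`/`team_b` are hypothetical names used only for illustration:

```r
library(dplyr)

# Hypothetical toy data: the mirrored pair from the question plus one unrelated game
games <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TBR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-05")),
  result        = c("W", "L", "W"),
  team_runs     = c(5, 4, 3),
  opponent_runs = c(4, 5, 2),
  stringsAsFactors = FALSE
)

deduped <- games %>%
  # Order-insensitive key: alphabetically smaller and larger team of each game
  mutate(team_a = pmin(team, opponent_team),
         team_b = pmax(team, opponent_team)) %>%
  # Keep the first row per (team pair, date); .keep_all retains every column
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  # Drop the temporary key columns again
  select(-team_a, -team_b)

print(deduped)
```

distinct(.keep_all = TRUE) keeps the first row of each key, so which of the two mirrored rows survives depends on row order. One caveat: a doubleheader (two games between the same teams on the same date) would also collapse to one row, so if the data has something like a game-number column, it should be added to the key.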

Another alternative is to use the base R duplicated() function inside dplyr's filter(), like below:

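# duplicated() takes a single object, so bundle the key columns into one data frame
df %>% filter(!duplicated(data.frame(team, opponent_team, date)))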
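
The same sorted-key idea also works with the duplicated() approach and needs no dplyr at all. A minimal base-R sketch, again on hypothetical toy data (`games`, `key`, `team_a`, `team_b` are illustrative names) and assuming character team columns:

```r
# Hypothetical toy data: the mirrored pair from the question plus one unrelated game
games <- data.frame(
  team          = c("BAL", "BOS", "NYY"),
  opponent_team = c("BOS", "BAL", "TBR"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-05")),
  result        = c("W", "L", "W"),
  stringsAsFactors = FALSE
)

# Order-insensitive key: sorted team pair plus the date
key <- data.frame(
  team_a = pmin(games$team, games$opponent_team),
  team_b = pmax(games$team, games$opponent_team),
  date   = games$date
)

# duplicated() on a data frame flags repeated rows; keep only first occurrences
deduped <- games[!duplicated(key), ]
print(deduped)
```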
