Select rows from dataframe with unique combination of values from multiple columns
Question
I have a data.frame in R that is a catalog of baseball game results for every team over a number of seasons. Some of the columns are team, opponent_team, date, result, team_runs, opponent_runs, etc. My problem is that, because the data.frame is a combination of logs for every team, each row has another row somewhere else in the data.frame that is its mirror image.
For example:
team opponent_team date result team_runs opponent_runs
BAL BOS 2010-04-05 W 5 4
and somewhere else there is another row:
team opponent_team date result team_runs opponent_runs
BOS BAL 2010-04-05 L 4 5
I would like to write some code in dplyr or something similar that selects rows that have a unique combination of the team, opponent_team and date columns. I stress the word combination here because order doesn't matter; I am just trying to get rid of the rows that are mirror images.
Thanks.
Recommended answer
Have you tried the distinct function from dplyr? For your case, it could be something like:
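library(dplyr)
# Keep the first row for each (team, opponent_team, date) combination.
# .keep_all = TRUE retains the remaining columns (result, team_runs, ...).
# Column order matters here: (BAL, BOS) and (BOS, BAL) are different combinations.
df %>% distinct(team, opponent_team, date, .keep_all = TRUE)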
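Because distinct compares the columns position by position, the call above by itself still treats a row and its mirror image as two different combinations. One possible way to make the comparison order-insensitive is to build a pair of sorted team columns with pmin and pmax and deduplicate on those instead. The following is only a sketch under assumptions: the sample data and the helper column names team_a and team_b are hypothetical, and two games between the same teams on the same date (a doubleheader) would collapse into one row unless another distinguishing column is added to the key.

```r
library(dplyr)

# Hypothetical sample in the layout described in the question:
# two games, each logged once per team
df <- data.frame(
  team          = c("BAL", "BOS", "NYY", "TBR"),
  opponent_team = c("BOS", "BAL", "TBR", "NYY"),
  date          = as.Date(c("2010-04-05", "2010-04-05",
                            "2010-04-06", "2010-04-06")),
  result        = c("W", "L", "L", "W"),
  team_runs     = c(5, 4, 2, 7),
  opponent_runs = c(4, 5, 7, 2),
  stringsAsFactors = FALSE
)

games <- df %>%
  mutate(
    # team_a / team_b are hypothetical helper columns: the two team
    # names in alphabetical order, so a row and its mirror share a key
    team_a = pmin(team, opponent_team),
    team_b = pmax(team, opponent_team)
  ) %>%
  # keep the first row seen for each unordered pair and date
  distinct(team_a, team_b, date, .keep_all = TRUE) %>%
  # drop the helper columns again
  select(-team_a, -team_b)
```

Which of the two mirrored rows survives depends only on row order, since distinct keeps the first occurrence.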
Another option is to use the base R duplicated function inside dplyr's filter function, as shown below.
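# duplicated() checks a single object, so the key columns are wrapped in a data.frame
df %>% filter(!duplicated(data.frame(team, opponent_team, date)))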
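The duplicated approach can also be written without dplyr and made order-insensitive by checking a single pasted key. This is a sketch, assuming the team columns are character vectors (not factors) and that no two games share the same pair of teams and date; the sample data and the variable name key are hypothetical.

```r
# Hypothetical sample: one game logged from both teams' perspectives,
# plus a second game on another date
df <- data.frame(
  team          = c("BAL", "BOS", "BAL"),
  opponent_team = c("BOS", "BAL", "BOS"),
  date          = as.Date(c("2010-04-05", "2010-04-05", "2010-04-06")),
  result        = c("W", "L", "L"),
  stringsAsFactors = FALSE
)

# One string per row: the two team names in alphabetical order plus the date
key <- paste(pmin(df$team, df$opponent_team),
             pmax(df$team, df$opponent_team),
             df$date)

# Keep only the first row for each key
games <- df[!duplicated(key), ]
```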