How to create unique rows in a data frame


Problem description

I have a data frame where rows are duplicated. I need to create unique rows from it. I tried a couple of options, but they don't seem to work:

  l1 <- summarise(group_by(l, bowler, wickets), economyRate, d = unique(date))

This works for some rows but also gives the error "Expecting a single value". The data frame 'l' looks like this:

     bowler overs maidens  runs wickets economyRate       date opposition
     (fctr) (int)   (int) (dbl)   (dbl)       (dbl)     (date)      (chr)
1  MA Starc     9       0    51       0        5.67 2010-10-20      India
2  MA Starc     9       0    27       4        3.00 2010-11-07  Sri Lanka
3  MA Starc     9       0    27       4        3.00 2010-11-07  Sri Lanka
4  MA Starc     9       0    27       4        3.00 2010-11-07  Sri Lanka
5  MA Starc     9       0    27       4        3.00 2010-11-07  Sri Lanka
6  MA Starc     6       0    33       2        5.50 2012-02-05      India
7  MA Starc     6       0    33       2        5.50 2012-02-05      India
8  MA Starc    10       0    50       2        5.00 2012-02-10  Sri Lanka
9  MA Starc    10       0    50       2        5.00 2012-02-10  Sri Lanka
10 MA Starc     8       0    49       0        6.12 2012-02-12      India   

The date is unique for each match, so it can be used to decide which rows to keep. Please let me know how this can be done.
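To make the answers below reproducible, the sample data can be rebuilt from the printout. This is a sketch based only on the ten printed rows; the column types follow the type line of the printout (factor, integer, double, Date, character), and the object name `l` matches the question.

```r
# Rebuild the sample data frame 'l' from the rows printed in the question.
l <- data.frame(
  bowler      = factor(rep("MA Starc", 10)),
  overs       = c(9L, 9L, 9L, 9L, 9L, 6L, 6L, 10L, 10L, 8L),
  maidens     = rep(0L, 10),
  runs        = c(51, 27, 27, 27, 27, 33, 33, 50, 50, 49),
  wickets     = c(0, 4, 4, 4, 4, 2, 2, 2, 2, 0),
  economyRate = c(5.67, 3.00, 3.00, 3.00, 3.00, 5.50, 5.50, 5.00, 5.00, 6.12),
  date        = as.Date(c("2010-10-20", rep("2010-11-07", 4),
                          rep("2012-02-05", 2), rep("2012-02-10", 2),
                          "2012-02-12")),
  opposition  = c("India", rep("Sri Lanka", 4), rep("India", 2),
                  rep("Sri Lanka", 2), "India"),
  stringsAsFactors = FALSE
)

# Inspect the column types to compare with the question's printout.
str(l)
```

Five of the ten rows are exact copies of an earlier row, which is what the question calls "duplicated".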

Solution

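In the example dataset, there is more than one unique 'date' for each 'bowler'/'wickets' combination, so `unique(date)` returns several values for some groups, while `summarise()` expects a single value per group. One option is to paste the unique dates together into one string: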

library(dplyr)

l %>%
    group_by(bowler, wickets) %>% 
    summarise(economyRate= mean(economyRate), d = toString(unique(date)))
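The reason `summarise()` fails can be checked directly by counting the distinct dates in each 'bowler'/'wickets' group. This is a minimal base-R sketch (it uses `tapply()` instead of dplyr, and a trimmed copy of the sample data with only the columns it needs):

```r
# Trimmed copy of the question's sample data (only the columns used here).
l <- data.frame(
  bowler  = factor(rep("MA Starc", 10)),
  wickets = c(0, 4, 4, 4, 4, 2, 2, 2, 2, 0),
  date    = as.Date(c("2010-10-20", rep("2010-11-07", 4),
                      rep("2012-02-05", 2), rep("2012-02-10", 2),
                      "2012-02-12"))
)

# Number of distinct dates per bowler/wickets combination.
# A count above 1 means unique(date) yields more than one value for that group,
# which a one-value-per-group summary cannot hold.
n_dates <- tapply(l$date, list(l$bowler, l$wickets),
                  function(d) length(unique(d)))
n_dates
```

With this data the groups for 0 and 2 wickets each contain two dates, and only the 4-wicket group has a single date.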

Or create 'd' as a list column:

l %>%
    group_by(bowler, wickets) %>% 
    summarise(economyRate= mean(economyRate), d = list(unique(date)))

For 'economyRate', I am guessing the OP needs the mean.

If we need to add a column of the unique dates to the original dataset (keeping all rows), use `mutate`:

l %>% 
    group_by(bowler, wickets) %>%
    mutate(d = list(unique(date)))

As the OP didn't provide the expected output, the following could also be the intended result:

l %>%
     group_by(bowler, wickets) %>% 
     distinct(date)

Or, as @Frank mentioned:

l %>%
  group_by(bowler, wickets, date) %>%
  slice(1L)
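For comparison, the same de-duplication can be sketched in base R. This is a swapped-in technique, not part of the answer: `unique()` drops rows that are exact duplicates across all columns, while `duplicated()` on a subset of columns keeps the first row of each 'bowler'/'wickets'/'date' combination, which corresponds to `slice(1L)` per group (row order may differ from the dplyr result). The data below is a copy of the question's sample with the columns needed to show both calls.

```r
# Copy of the question's sample data (overs, maidens, runs omitted for brevity).
l <- data.frame(
  bowler      = factor(rep("MA Starc", 10)),
  wickets     = c(0, 4, 4, 4, 4, 2, 2, 2, 2, 0),
  economyRate = c(5.67, 3.00, 3.00, 3.00, 3.00, 5.50, 5.50, 5.00, 5.00, 6.12),
  date        = as.Date(c("2010-10-20", rep("2010-11-07", 4),
                          rep("2012-02-05", 2), rep("2012-02-10", 2),
                          "2012-02-12")),
  opposition  = c("India", rep("Sri Lanka", 4), rep("India", 2),
                  rep("Sri Lanka", 2), "India"),
  stringsAsFactors = FALSE
)

# 1) Drop rows that are duplicated across every column.
u1 <- unique(l)

# 2) Keep only the first row for each bowler/wickets/date combination.
key <- c("bowler", "wickets", "date")
u2  <- l[!duplicated(l[key]), ]

u2
```

Because the repeated rows here are identical in every column, both approaches keep the same five rows (original rows 1, 2, 6, 8 and 10). They would differ if two rows shared a date but disagreed in another column.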
