Filtering a data.frame based on row_number()

Question

Update: dplyr has been updated since this question was asked, and it now behaves as the OP wanted.

I'm trying to get the second through seventh rows of a data.frame using dplyr.

This is what I'm doing:

require(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)

But this throws an error:

Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

I know I could easily do this instead:

df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)

But I would like to understand why my first attempt does not work.

Recommended answer

The row_number() function does not simply return the row number of each element, so it can't be used the way you want. From its documentation:

• 'row_number': equivalent to 'rank(ties.method = "first")'
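To make the documented equivalence concrete, here is a minimal sketch on a hypothetical vector x (not part of the original question). It assumes dplyr is installed, and it shows that row_number(x) returns ranks rather than positions:

```r
library(dplyr)

# Hypothetical unsorted vector with one tie (illustration only)
x <- c(30, 10, 20, 10)

row_number(x)                   # ranks, ties broken by order of appearance: 4 1 3 2
rank(x, ties.method = "first")  # base R equivalent: 4 1 3 2
seq_along(x)                    # positions: 1 2 3 4
```

The ranks match the positions only when the input is already sorted in ascending order.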

You're not actually saying what you want the row_number of. In your case:

df %>% filter(row_number(id) <= 7, row_number(id) >= 2)

works because id is sorted, so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when it is called a second time dplyr has run out of things to feed it, and you get the equivalent of:

> row_number()
Error in rank(x, ties.method = "first") : 
  argument "x" is missing, with no default

That is the error you are seeing.
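The dependence on id being sorted can be checked with a small sketch. The shuffled data frame below is hypothetical (it is not in the original question), and dplyr is assumed to be installed. Filtering on row_number(id) selects rows by the rank of id, not by their position in the data frame:

```r
library(dplyr)

# Hypothetical data: same shape as the question's df, but with id shuffled
set.seed(1)
shuffled <- data.frame(id = sample(1:10), var = runif(10))

# Keeps the rows whose id ranks 2..7, wherever they sit in the data frame
by_rank <- shuffled %>% filter(row_number(id) <= 7, row_number(id) >= 2)

# Keeps the physical rows 2..7
by_position <- shuffled[2:7, ]

sort(by_rank$id)   # the ids with ranks 2 through 7
by_position$id     # whatever ids happen to sit in rows 2 through 7
```

Because id is a permutation of 1:10 here, its rank equals its value, so by_rank always contains ids 2 through 7 regardless of row order.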

Anyway, that's not the way to select rows.

You simply need to subscript with df[2:7,], or, if you insist on using pipes everywhere:

> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950
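For selecting rows by position inside a pipeline, dplyr also provides slice(). This is not part of the original answer; the sketch below assumes dplyr is installed and uses a fixed seed only to make the example reproducible:

```r
library(dplyr)

set.seed(42)  # seed chosen arbitrarily for reproducibility
df <- data.frame(id = 1:10, var = runif(10))

# slice() selects rows by integer position, like df[2:7, ]
rows_2_to_7 <- df %>% slice(2:7)
rows_2_to_7$id
```

Unlike df[2:7, ], slice() does not keep the original row names, so the result is numbered 1 through 6.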
