In tidyr, what criteria does the function `gather` use to map a dataframe from wide to long?


Question

I'm trying to figure out the arguments for gather in the tidyr package.

I looked at the documentation, and the syntax looks like:

gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)

There is an example:

library(tidyr)

stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

gather(stocks, stock, price, -time)

I'm curious about the last line:

gather(stocks, stock, price, -time)

Here, stocks is clearly the data we want to modify, which is fine.

I can see that stock and price are the names given to the key-value pair, but how does this function decide which columns to use to create that pair? The original dataframe looks like this:

time                  X           Y           Z
2009-01-01   1.10177950  -1.1926213  -7.4149618
2009-01-02   0.75578151  -4.3705737  -0.3117843
2009-01-03  -0.23823356  -1.3497319   3.8742654
2009-01-04   0.98744470  -4.2381224   0.7397038
2009-01-05   0.74139013  -2.5303960  -5.5197743

I don't see any indication that we should use any combination of X, Y or Z. When I'm using this function, I feel like I'm just choosing names for what I want the columns in my long formatted dataframe to be, and praying that gather magically works. Come to think of it, I feel the same way when I use melt.

Does gather look at the column's type? How does it map from wide to long?

EDIT: Great answer and great discussion below. Anyone else wanting more info on the philosophy and use of the tidyr package should definitely read this paper, although the vignette doesn't explain the syntax.

Recommended answer

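In tidyr, you specify the measure variables for gather in the ... argument. This is conceptually a little different from melt, where many examples (including many of the answers here on SO) show the use of the id.vars argument, on the assumption that anything not specified as an ID is a measure.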

The ... argument can also take a column name prefixed with -, as in the example you have shown. This basically says "gather all of the columns except for this one". Another shorthand in gather is to specify a range of columns with the colon, for example, gather(stocks, stock, price, X:Z).
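To make the selection rules above concrete, here is a minimal sketch that rebuilds the stocks data frame from the question and gathers it three ways. The set.seed call and the variable names by_exclusion, by_range and by_name are introduced here for illustration only. The point is that the columns are chosen by the names you put in ..., not by their type.

```r
library(tidyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# Three ways of naming the same measure columns in `...`
by_exclusion <- gather(stocks, stock, price, -time)    # everything except time
by_range     <- gather(stocks, stock, price, X:Z)      # columns X through Z
by_name      <- gather(stocks, stock, price, X, Y, Z)  # each column listed

# All three selections describe the same long data frame
stopifnot(isTRUE(all.equal(by_exclusion, by_range)))
stopifnot(isTRUE(all.equal(by_exclusion, by_name)))

names(by_exclusion)         # "time" "stock" "price"
nrow(by_exclusion)          # 30: 10 dates times 3 gathered columns
unique(by_exclusion$stock)  # "X" "Y" "Z"
```

Here stock and price are only the names of the two new columns. The old column names X, Y and Z become the values of stock, and the cell values go into price.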

You can compare gather with melt by looking at the code for the function. Here are the first few lines:

> tidyr:::gather_.data.frame
function (data, key_col, value_col, gather_cols, na.rm = FALSE,
    convert = FALSE)
{
    data2 <- reshape2::melt(data, measure.vars = gather_cols,
        variable.name = key_col, value.name = value_col, na.rm = na.rm)
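The correspondence shown in that snippet can be checked by hand. The following is a rough sketch, assuming the reshape2 package is installed. The names long_gather and long_melt are illustrative, and the internals quoted above reflect the tidyr version at the time of the answer, so other releases may be implemented differently.

```r
library(tidyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
stocks <- data.frame(
  time = as.Date("2009-01-01") + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# gather: the measure columns are whatever `...` selects (all but time)
long_gather <- gather(stocks, stock, price, -time)

# melt: the same measure columns are passed as measure.vars,
# and the key/value names map to variable.name/value.name
if (requireNamespace("reshape2", quietly = TRUE)) {
  long_melt <- reshape2::melt(
    stocks,
    measure.vars  = c("X", "Y", "Z"),
    variable.name = "stock",
    value.name    = "price"
  )
  # melt returns the key column as a factor, so convert before comparing
  long_melt$stock <- as.character(long_melt$stock)
  print(all.equal(long_gather, long_melt))
}
```

So the mapping is gather's ... to melt's measure.vars, key to variable.name, and value to value.name. Any column not named as a measure is treated as an ID and repeated for each gathered column.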
