Must one `melt` a dataframe before having it `cast`?

Question

Must one melt a data frame prior to having it cast? From ?melt:

data    molten data frame, see melt.

In other words, is it absolutely necessary to have a data frame molten prior to any acast or dcast operation?

Consider the following:

library("reshape2")
library("MASS")

xb <- dcast(Cars93, Manufacturer ~ Type, mean, value.var="Price")
m.Cars93 <- melt(Cars93, id.vars=c("Manufacturer", "Type"), measure.vars="Price")
xc <- dcast(m.Cars93, Manufacturer ~ Type, mean, value.var="value")

Then:

> identical(xb, xc)
[1] TRUE

So in this case the melt operation seems to have been redundant.

What are the general guiding rules in these cases? How do you decide when a data frame needs to be molten prior to a *cast operation?

Recommended answer

Whether or not you need to melt your dataset depends on what form you want the final data to be in and how that relates to what you currently have.

The way I usually think about it is:

  1. For the LHS of the formula, I should have one or more columns that will become my "id" rows. These will remain as separate columns in the final output.
  2. For the RHS of the formula, I should have one or more columns that combine to form the new columns across which I will be "spreading" my values. When this is more than one column, dcast will create new columns based on the combination of the values.
  3. I must have just one column that feeds the values to fill in the resulting "grid" created by these rows and columns.

To illustrate with a small example, consider this tiny dataset:

mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

Imagine that our possible value variables are columns "D" or "E", but we are only interested in the values from "E". Imagine also that our primary "id" is column "A", and we want to spread the values out according to column "B". Column "C" is irrelevant at this point.

With that scenario, we would not need to melt the data first. We could simply do:

library(reshape2)
dcast(mydf, A ~ B, value.var = "E")
#   A a b  c
# 1 A 6 7 NA
# 2 B 8 9 10

Compare what happens when you do the following, keeping in mind my three points above:

dcast(mydf, A ~ C, value.var = "E")
dcast(mydf, A ~ B + C, value.var = "E")
dcast(mydf, A + B ~ C, value.var = "E")
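One thing worth checking before running calls like these is whether the variables in the formula identify each cell of the result uniquely. If they do not, and no aggregation function is supplied, dcast falls back to counting the values with length (and prints a message saying so) instead of filling in the values themselves. Below is a minimal sketch of such a check in base R; has_duplicate_cells is a hypothetical helper written for this illustration, not part of reshape2.

```r
mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

# Hypothetical helper (not a reshape2 function): TRUE when some combination
# of the formula variables occurs in more than one row, meaning at least one
# cell of the cast result would receive several values.
has_duplicate_cells <- function(df, vars) {
  anyDuplicated(df[, vars, drop = FALSE]) > 0
}

has_duplicate_cells(mydf, c("A", "C"))       # variables of A ~ C
# [1] TRUE
has_duplicate_cells(mydf, c("A", "B", "C"))  # variables of A ~ B + C and A + B ~ C
# [1] FALSE
```

So the first of the three calls has to aggregate (rows 1 and 2 both land in the cell A = "A", C = 1), while the other two can place each value of "E" in its own cell.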






When melting is required

Now, let's make one small adjustment to the scenario: We want to spread out the values from both columns "D" and "E" with no actual aggregation taking place. With this change, we need to melt the data first so that the relevant values that need to be spread out are in a single column (point 3 above).

dfL <- melt(mydf, measure.vars = c("D", "E"))
dcast(dfL, A ~ B + variable, value.var = "value")
#   A a_D a_E b_D b_E c_D c_E
# 1 A   1   6   2   7  NA  NA
# 2 B   3   8   4   9   5  10
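To see why the melt step satisfies point 3, it can help to look at the shape of the molten data. The sketch below builds a rough base-R approximation of the melt call above. It is only an illustration, not how reshape2 implements melt, and the name dfL_base is made up for this example. Here variable is a plain string column, whereas melt returns it as a factor.

```r
mydf <- data.frame(
  A = c("A", "A", "B", "B", "B"),
  B = c("a", "b", "a", "b", "c"),
  C = c(1, 1, 2, 2, 3),
  D = c(1, 2, 3, 4, 5),
  E = c(6, 7, 8, 9, 10)
)

# Approximation of melt(mydf, measure.vars = c("D", "E")): repeat the id
# columns once per measured column and stack the measurements into a single
# "value" column, recording their origin in "variable".
id_cols <- mydf[, c("A", "B", "C")]
dfL_base <- rbind(
  data.frame(id_cols, variable = "D", value = mydf$D),
  data.frame(id_cols, variable = "E", value = mydf$E)
)

dim(dfL_base)
# [1] 10  5
names(dfL_base)
# [1] "A"        "B"        "C"        "variable" "value"
```

All ten measurements now sit in the single value column, which is what lets variable take part in the RHS of the formula alongside "B".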
