dcast changes content of dataframe

Question

I tried using the reshape package to reshape a data frame, but in the result some numbers have changed that should not have changed.

The data frame contains several variables, each measured multiple times: for each person there are 6 rows, one for each of the 6 times that person was measured. I want to reshape the data frame so that there is only one row per person instead of 6, which means every variable should appear 6 times (once per measurement). This should be easy to do with the following code:

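library(reshape2)  # dcast() is provided by reshape2; the older reshape package provides cast()

melteddata <- melt(daten, id.vars = c("IDParticipant", "looporder"))

datenrestrukturiert <- dcast(melteddata, IDParticipant ~ looporder + variable)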

Here "daten" is the original data frame and "looporder" is the variable that reflects the time of measurement (1-6). Here is an example (unfortunately I could not figure out how to post tables):

https://www.dropbox.com/s/8c9dm4rttedbzw1/daten.jpg?dl=0

Maybe this works better (dput output for a sample of the data):

structure(list(IDParticipant = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 3L, 3L, 3L), looporder = c(1L, 2L, 3L, 5L, 6L, 2L, 3L, 
5L, 6L, 1L, 2L, 3L), pc_mean_1 = c(NA, 3.22222222222222, NA, 
3.22222222222222, 3.22222222222222, 3.66666666666667, 3.66666666666667, 
3.66666666666667, 3.66666666666667, 3.25, NA, 3.25), bd_mean_1 = c(NA, 
2.88888888888889, NA, 2.88888888888889, 2.88888888888889, 2.75, 
2.75, 2.75, 2.75, 4.08333333333333, NA, 4.08333333333333), sm = c(999, 
4, 999, 3.66666666666667, 1, 4, 4, 5, 5, 5, 999, 5), cm = c(999, 
1.33333333333333, 999, 2.33333333333333, 1, 2, 2, 2.33333333333333, 
1, 3, 999, 1.66666666666667)), .Names = c("IDParticipant", "looporder", 
"pc_mean_1", "bd_mean_1", "sm", "cm"), row.names = c(NA, 12L), class = "data.frame")

"datenrestrukturiert" looks like this:

https://www.dropbox.com/s/al93lnj76y1j266/datenrestrukturiert.jpg?dl=0

I do not want to aggregate anything, which is why I tried adding fun.aggregate = NULL, but that changed nothing. The following message also always appears:


"Aggregation function missing: defaulting to length"

So far everything runs, but there is one problem: when using dcast (as well as cast), some of the values are changed, mostly to "0" or "1". The cells should contain other numbers such as "3.44" or "4.77", but after the cast most of them are "0".
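The symptom described here can be reproduced on a small scale. The following is a minimal sketch, assuming reshape2 is installed; the data frame "toy" and its values are hypothetical and not taken from the original data. It illustrates that when one combination of IDParticipant and looporder occurs more than once, dcast falls back to length() and fills the cells with counts rather than measurements.

```r
library(reshape2)

# Hypothetical toy data: participant 1 was recorded twice for looporder 1.
toy <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L),
  looporder     = c(1L, 1L, 2L, 1L),
  sm            = c(3.44, 4.77, 2.50, 4.00)
)

molten <- melt(toy, id.vars = c("IDParticipant", "looporder"))

# One cell (participant 1, looporder 1) would have to hold two values,
# so dcast() falls back to length() and reports counts for every cell.
# suppressMessages() hides the "defaulting to length" message.
wide <- suppressMessages(dcast(molten, IDParticipant ~ looporder + variable))
print(wide)
```

In this sketch the cells of "wide" are small whole numbers (how many rows fell into each cell), which matches the "mostly 0 or 1" pattern in the question.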

Does anybody have a hint as to why this happens?

Some more information that might help: when I import the dataset via read.csv2, the first variable always gets a strange name, with extra symbols in front of the variable name that Excel does not show: "ï..IDParticipant". I rename it to "IDParticipant". Could that have anything to do with it?
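A name like "ï..IDParticipant" is what typically results when a file starts with a UTF-8 byte order mark and is read without declaring that encoding. It affects only the column name, so it is unlikely to be related to the changed values. The sketch below assumes this is the cause; the temporary file and the name "daten_bom" are made up for illustration. It shows that passing fileEncoding = "UTF-8-BOM" to read.csv2 drops the mark while reading.

```r
# Write a small semicolon-separated file that starts with a UTF-8 byte order mark.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("IDParticipant;looporder\n1;2\n"), con)
close(con)

# "UTF-8-BOM" tells R to drop the byte order mark while reading,
# so the first column name comes through without extra characters.
daten_bom <- read.csv2(path, fileEncoding = "UTF-8-BOM")
names(daten_bom)
```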

Another side note: when I run the code on the sample data frame I provided, everything is fine. The original data frame consists of 1404 rows and 353 variables. Could it be too big for R?

Recommended answer

Here is my solution, based on Ananda's suggestions (thank you very much for that).

The data frame is "daten", containing many variables, e.g. "IDParticipant", "looporder" and "sm".

First we create an object holding the id variables, for later use in the melt and cast functions:

idvars <- c("IDParticipant", "looporder")

As it turns out, there were duplicates in the data frame: rows with the same values in both "IDParticipant" and "looporder". So we need to add another id variable to the data frame when melting it, which can be done with "getanID" from the splitstackshape package:

melteddata <- melt(getanID(daten, idvars), c(".id", idvars))
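To see which rows are affected, and roughly what kind of column the extra id is, a base R sketch may help. It uses hypothetical toy data and does not call getanID itself; the ave() counter is only assumed to be comparable in spirit to the ".id" column, not a reimplementation of it.

```r
# Hypothetical toy data with one repeated (IDParticipant, looporder) pair.
toy <- data.frame(
  IDParticipant = c(1L, 1L, 1L, 2L),
  looporder     = c(1L, 1L, 2L, 1L),
  sm            = c(3.44, 4.77, 2.50, 4.00)
)
idvars <- c("IDParticipant", "looporder")

# Flag rows whose key combination occurs more than once (all copies are flagged).
keys <- toy[idvars]
is_dupe <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
toy[is_dupe, ]

# Running counter within each key combination: 1 for the first row of a pair,
# 2 for the second, and so on.
toy$.id <- ave(seq_len(nrow(toy)), toy$IDParticipant, toy$looporder,
               FUN = seq_along)
toy
```

With the counter added, the combination of ".id", "IDParticipant" and "looporder" identifies each row uniquely, which is what the cast needs.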

After adding the extra id variable, we can finally cast the data frame we need, using the extra id variable together with the other variables:

datenrestrukturiert <- dcast(melteddata, .id + IDParticipant ~ variable + looporder)
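The effect of the extra id on the cast can be checked on a small example. This is a sketch under assumptions: reshape2 is installed, the data frame "toy" is hypothetical, and its ".id" column is written out by hand instead of being produced by getanID.

```r
library(reshape2)

# Hypothetical toy data; ".id" is written out by hand here and numbers the
# repeated rows within each (IDParticipant, looporder) pair.
toy <- data.frame(
  .id           = c(1L, 2L, 1L, 1L),
  IDParticipant = c(1L, 1L, 1L, 2L),
  looporder     = c(1L, 1L, 2L, 1L),
  sm            = c(3.44, 4.77, 2.50, 4.00)
)

molten <- melt(toy, id.vars = c(".id", "IDParticipant", "looporder"))

# Every (.id, IDParticipant) x (variable, looporder) cell now holds at most
# one value, so no aggregation function is needed and values are kept as-is.
wide <- dcast(molten, .id + IDParticipant ~ variable + looporder)
wide
```

Note that in this sketch a participant with duplicated measurements ends up with one row per ".id" value, so the result is not strictly one row per person; the duplicates are kept apart rather than aggregated.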
