Melting an R data.table with a factor column
Problem description
I have the following R data.table (though the approach should also work with a data.frame). The goal is to reshape this data.table so it can be plotted as a scatterplot in ggplot2. I therefore need one "factor" column to color the points:
> library(data.table)
> dt
ID x_A y_A x_B y_B
1: 05AC 0.81 3 0.92 2.05
2: 01BA 0.41 5 0.63 1.8
3: Z1AC 0.41 5 0.58 1.8
4: B2BA 0.21 6.5 1.00 1.8
....
I believe the correct output needs to be of the form:
ID type x y
05AC A 0.81 3
05AC B 0.92 2.05
01BA A 0.41 5
01BA B 0.63 1.8
Z1AC A 0.41 5
Z1AC B 0.58 1.8
B2BA A 0.21 6.5
B2BA B 1.00 1.8
Is there a standard way to "unfold" a data.table in this fashion? I'm happy to use dplyr in this case, but I suspect there should be a data.table method.
melt() would work if I could figure out how to create the type column, e.g.
melt(dt, id.vars=c("ID"))
only melts on the single column ID.
I'm especially confused about how one "scrapes" the A and B types from columns 2-3 and columns 4-5 respectively...
Recommended answer
Staying within data.table, after your suggested approach of using melt, you can use tstrsplit to split the variable column on the "_" character.
## melt to long form first, as suggested in the question
dt <- melt(dt, id.vars = "ID")
## use tstrsplit to split a column on a regular expression
dt[, c("xy", "type") := tstrsplit(variable, "_")]
dt
# ID variable value xy type
# 1: 05AC x_A 0.81 x A
# 2: 01BA x_A 0.41 x A
# 3: Z1AC x_A 0.41 x A
# 4: B2BA x_A 0.21 x A
# 5: 05AC y_A 3.00 y A
# 6: 01BA y_A 5.00 y A
# 7: Z1AC y_A 5.00 y A
# 8: B2BA y_A 6.50 y A
# 9: 05AC x_B 0.92 x B
# 10: 01BA x_B 0.63 x B
# 11: Z1AC x_B 0.58 x B
# 12: B2BA x_B 1.00 x B
# 13: 05AC y_B 2.05 y B
# 14: 01BA y_B 1.80 y B
# 15: Z1AC y_B 1.80 y B
# 16: B2BA y_B 1.80 y B
This gives you the long form of your required solution. You can then use dcast to widen it:
dcast(dt, formula = ID + type ~ xy, value.var = "value")
# ID type x y
# 1: 01BA A 0.41 5.00
# 2: 01BA B 0.63 1.80
# 3: 05AC A 0.81 3.00
# 4: 05AC B 0.92 2.05
# 5: B2BA A 0.21 6.50
# 6: B2BA B 1.00 1.80
# 7: Z1AC A 0.41 5.00
# 8: Z1AC B 0.58 1.80
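The steps above can be put together into one runnable sketch. The sample data is reconstructed from the four rows printed in the question, and the names long and res are chosen here purely for illustration. The final factor conversion and the ggplot2 call are assumptions about how the result would be used for the coloured scatterplot; the plot is only built if ggplot2 happens to be installed.

```r
library(data.table)

# Sample data reconstructed from the four rows printed in the question
dt <- data.table(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8)
)

# 1. Melt to long form: one row per (ID, variable) pair
long <- melt(dt, id.vars = "ID")

# 2. Split names such as "x_A" into the coordinate ("x") and the type ("A")
long[, c("xy", "type") := tstrsplit(variable, "_", fixed = TRUE)]

# 3. Cast back so that x and y become columns again
res <- dcast(long, ID + type ~ xy, value.var = "value")

# 4. Turn type into a factor so it can drive the point colour
res[, type := factor(type)]

# Optional: build the scatterplot (assumes ggplot2 is available)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(res, ggplot2::aes(x = x, y = y, colour = type)) +
    ggplot2::geom_point()
  # print(p) draws the plot
}
```

Keeping the melted table in a separate object (long) rather than overwriting dt means the original wide table stays available for inspection.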
The logic of this answer is the same as the suggested dplyr-style approach of gather %>% separate %>% spread (these verbs come from tidyr), but using data.table.
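For comparison, a minimal sketch of what that gather %>% separate %>% spread pipeline might look like. It assumes the tidyr package is installed (tidyr re-exports the %>% pipe), and the names df and res_tidy are made up for this example. Note that gather and spread still work but have since been superseded in tidyr by pivot_longer and pivot_wider.

```r
library(tidyr)

# Same sample data as in the question, as a plain data.frame
df <- data.frame(
  ID  = c("05AC", "01BA", "Z1AC", "B2BA"),
  x_A = c(0.81, 0.41, 0.41, 0.21),
  y_A = c(3, 5, 5, 6.5),
  x_B = c(0.92, 0.63, 0.58, 1.00),
  y_B = c(2.05, 1.8, 1.8, 1.8),
  stringsAsFactors = FALSE
)

res_tidy <- df %>%
  # long form: one row per (ID, variable) pair
  gather(key = "variable", value = "value", -ID) %>%
  # split "x_A" into xy = "x" and type = "A"
  separate(variable, into = c("xy", "type"), sep = "_") %>%
  # widen again so that x and y become columns
  spread(key = xy, value = value)
```

Each verb corresponds to one data.table step: gather to melt, separate to tstrsplit, and spread to dcast.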