R: NA and dcast
Question
I am working with the R programming language.
Suppose I have the following data:
my_data <- data.frame(
"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)
#view original data format
my_data
id name points gender country
1 1 john 150 male usa
2 1 jason 165 male usa
3 1 jack 183 male usa
4 1 jim 191 male usa
5 2 john 151 male usa
6 2 jason 166 male usa
7 2 jack 184 male usa
8 2 jim 192 male usa
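Assume that, for the data above, gender and country always have the same value. The four names also always appear together, and each time they appear together they share the same id. The only value that changes between iterations (i.e. between values of "id") is the "points" each name has.
Here is what I am trying to produce: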
my_data_1 <- data.frame(
"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)
#view desired data format
my_data_1
id john_points jason_points jack_points jim_points gender country
1 1 150 165 183 191 male usa
2 2 151 166 184 192 male usa
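I found an earlier Stack Overflow post, How to reshape data from long to wide format, which suggests that the "data.table" library and its "dcast" function can solve this kind of problem.
I tried several variations of the "dcast" call, but none of them gave the result I want: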
library(data.table)
#attempt 1 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country", "id"))
name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1: jack <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA>
2: jason <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa
4: john male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA> <NA> <NA>
country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1: <NA> <NA> <NA> <NA> <NA> 1 2 <NA> <NA>
2: <NA> <NA> <NA> 1 2 <NA> <NA> <NA> <NA>
3: usa <NA> <NA> <NA> <NA> <NA> <NA> 1 2
4: <NA> 1 2 <NA> <NA> <NA> <NA> <NA> <NA>
#attempt 2 : not correct
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("gender", "country"))
name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1: jack <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA>
2: jason <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> male male <NA> <NA> <NA> <NA> <NA> <NA> usa
4: john male male <NA> <NA> <NA> <NA> <NA> <NA> usa usa <NA> <NA> <NA> <NA> <NA>
country_192
1: <NA>
2: <NA>
3: usa
4: <NA>
#attempt 3 - not correct:
setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
name 150 151 165 166 183 184 191 192
1: jack <NA> <NA> <NA> <NA> 1 2 <NA> <NA>
2: jason <NA> <NA> 1 2 <NA> <NA> <NA> <NA>
3: jim <NA> <NA> <NA> <NA> <NA> <NA> 1 2
4: john 1 2 <NA> <NA> <NA> <NA> <NA> <NA>
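Can someone show me how to solve this? Why are there so many NAs? Is it possible to get the final table I showed (i.e. my_data_1)? And is it possible to rename the variables in the name_points format (e.g. john_points)?
Thanks
Recommended answer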
I would use tidyr, because it makes going from long to wide format very simple.
library(tidyr)
# spread() turns each distinct value of `name` into a column holding its `points`
wide <- my_data %>%
  tidyr::spread(name, points)
Result
id gender country jack jason jim john
1 1 male usa 183 165 191 150
2 2 male usa 184 166 192 151
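For completeness, the same reshape can also be done with dcast itself, which is what the question asked about. The NAs in the three attempts come from the formula name ~ points: it creates one row per name and one column per distinct points value, and each name only has two of the eight points values, so every other cell has no matching row and is filled with NA. The sketch below is one possible fix, not the answerer's method. It assumes gender and country really are constant within each id, as the question states, and the names wide and name_cols are introduced here purely for illustration.

```r
library(data.table)

# Same data as in the question, built directly as a data.table.
my_data <- data.table(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = "male",
  country = "usa"
)

# Left-hand side: the columns that identify an output row.
# Right-hand side: the column whose values become new column names.
# value.var: the column whose values fill the new cells.
wide <- dcast(my_data, id + gender + country ~ name, value.var = "points")

# Append "_points" to the columns created from `name` only.
name_cols <- unique(my_data$name)
setnames(wide, name_cols, paste0(name_cols, "_points"))

# Reorder the columns to match the layout requested in the question.
setcolorder(wide, c("id", paste0(name_cols, "_points"), "gender", "country"))

print(wide)
```

Because id, gender and country together identify exactly one row per iteration, every name has a points value in every row and no NA appears.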