R: NA and dcast


Question

I am working with the R programming language.

Suppose I have the following data:

my_data <- data.frame(

"id" = c("1", "1", "1", "1", "2", "2", "2", "2" ),
"name" = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim" ),
"points" = c("150", "165", "183", "191", "151", "166", "184", "192"),
"gender" = c("male", "male", "male", "male", "male", "male", "male", "male"),
"country" = c("usa", "usa", "usa", "usa", "usa", "usa", "usa", "usa")
)

#view original data format
 my_data

  id  name points gender country
1  1  john    150   male     usa
2  1 jason    165   male     usa
3  1  jack    183   male     usa
4  1   jim    191   male     usa
5  2  john    151   male     usa
6  2 jason    166   male     usa
7  2  jack    184   male     usa
8  2   jim    192   male     usa
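
Let's assume that for the above data, gender and country always have the same values. In addition, the four names always appear together, and each time they appear together they share the same id. The only values that change are the "points" they have in different iterations (i.e. for each "id").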
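
These assumptions matter for the reshape: a long-to-wide conversion needs each id/name pair to occur exactly once. A minimal sketch of how they could be checked in base R, assuming the my_data shown above (the helper names one_gender, one_country, no_duplicates and all_names are illustrative only):

```r
my_data <- data.frame(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = "male",
  country = "usa"
)

# Each id should carry exactly one gender and one country.
one_gender  <- tapply(my_data$gender,  my_data$id, function(x) length(unique(x)) == 1)
one_country <- tapply(my_data$country, my_data$id, function(x) length(unique(x)) == 1)

# Each (id, name) pair should occur only once ...
no_duplicates <- !any(duplicated(my_data[, c("id", "name")]))

# ... and every id should contain the full set of names.
all_names <- tapply(my_data$name, my_data$id,
                    function(x) setequal(x, unique(my_data$name)))

# Stops with an error if any assumption is violated.
stopifnot(all(one_gender), all(one_country), no_duplicates, all(all_names))
```

If the duplicate check fails, dcast has more than one value per cell and has to aggregate them, so the points would no longer be carried over one-to-one.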

Here is what I am trying to do:

my_data_1 <- data.frame(

"id" = c("1", "2"),
"john_points" = c("150", "151"),
"jason_points" = c("165", "166"),
"jack_points" = c("183", "184"),
"jim_points" = c("191", "192"),
"gender" = c("male", "male"),
"country" = c("usa", "usa")
)

#view desired data format

  my_data_1
  id john_points jason_points jack_points jim_points gender country
1  1         150          165         183        191   male     usa
2  2         151          166         184        192   male     usa

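I found a previous Stack Overflow post, How to reshape data from long to wide format, in which the "data.table" library and the "dcast" function can be used to solve this kind of problem.

I tried different combinations of the "dcast" function, but could not get the desired final result: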

 library(data.table)
 
#attempt 1 : not correct
 setDT(my_data)
 dcast(my_data, name ~ points, value.var = c("gender", "country", "id"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192 id_150 id_151 id_165 id_166 id_183 id_184 id_191 id_192
1:        <NA>   <NA>   <NA>   <NA>   <NA>      1      2   <NA>   <NA>
2:        <NA>   <NA>   <NA>      1      2   <NA>   <NA>   <NA>   <NA>
3:         usa   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>      1      2
4:        <NA>      1      2   <NA>   <NA>   <NA>   <NA>   <NA>   <NA>

#attempt 2 : not correct

 setDT(my_data)
 dcast(my_data, name ~ points, value.var = c("gender", "country"))
    name gender_150 gender_151 gender_165 gender_166 gender_183 gender_184 gender_191 gender_192 country_150 country_151 country_165 country_166 country_183 country_184 country_191
1:  jack       <NA>       <NA>       <NA>       <NA>       male       male       <NA>       <NA>        <NA>        <NA>        <NA>        <NA>         usa         usa        <NA>
2: jason       <NA>       <NA>       male       male       <NA>       <NA>       <NA>       <NA>        <NA>        <NA>         usa         usa        <NA>        <NA>        <NA>
3:   jim       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       male       male        <NA>        <NA>        <NA>        <NA>        <NA>        <NA>         usa
4:  john       male       male       <NA>       <NA>       <NA>       <NA>       <NA>       <NA>         usa         usa        <NA>        <NA>        <NA>        <NA>        <NA>
   country_192
1:        <NA>
2:        <NA>
3:         usa
4:        <NA>

#attempt 3 - not correct:

 setDT(my_data)
dcast(my_data, name ~ points, value.var = c("id"))
    name  150  151  165  166  183  184  191  192
1:  jack <NA> <NA> <NA> <NA>    1    2 <NA> <NA>
2: jason <NA> <NA>    1    2 <NA> <NA> <NA> <NA>
3:   jim <NA> <NA> <NA> <NA> <NA> <NA>    1    2
4:  john    1    2 <NA> <NA> <NA> <NA> <NA> <NA>

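Can someone please show me how to fix this problem? Why are there so many NAs? Is it possible to get the final table I have shown (i.e. my_data_1)? Is it possible to rename the variables in the name_points format (e.g. john_points)?

Thanks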

Recommended answer

I would use tidyr, because changing from long format to wide format is very simple with it.

library(tidyr)
wide = my_data %>% 
  tidyr::spread(name, points)

Result

  id gender country jack jason jim john
1  1   male     usa  183   165 191  150
2  2   male     usa  184   166 192  151
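
The same table can also be produced with dcast, which is what the question asked about. The NAs in the attempts come from the formula name ~ points: it builds one row per name and one column per distinct points value, and each name only has two of the eight points values, so every other cell has no observation and is filled with NA. Putting the columns that identify a row (id, gender, country) on the left of the formula and name on the right avoids this. A minimal sketch, assuming the my_data from the question; the names dt, wide_dt and name_cols are illustrative:

```r
library(data.table)

my_data <- data.frame(
  id      = c("1", "1", "1", "1", "2", "2", "2", "2"),
  name    = c("john", "jason", "jack", "jim", "john", "jason", "jack", "jim"),
  points  = c("150", "165", "183", "191", "151", "166", "184", "192"),
  gender  = "male",
  country = "usa"
)

dt <- as.data.table(my_data)

# One output row per id/gender/country, one column per name, filled with points.
wide_dt <- dcast(dt, id + gender + country ~ name, value.var = "points")

# Append "_points" to the columns that were created from `name`.
name_cols <- unique(dt$name)
setnames(wide_dt, name_cols, paste0(name_cols, "_points"))

# Reorder the columns to match my_data_1.
setcolorder(wide_dt, c("id", paste0(name_cols, "_points"), "gender", "country"))

print(wide_dt)
```

With tidyr, pivot_wider(names_from = name, values_from = points, names_glue = "{name}_points") should give the same column names in one step; spread has no option for renaming the new columns.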
