R: convert integers in a character vector (JSON) to multiple boolean columns

Question

I have a data frame with 2000 rows (one per day). Each row contains a character "vector" holding binary information on 30 different skills: if a skill has been used, its number appears in the vector. To simplify:
Suppose I have a data frame with 3 observations (3 days) of 10 different skills, in a column named "S_total":
S_total = [1,3,7,8,9,10], [5,9], [], and a variable Day = 1, 2, 3. I'd like to construct a data frame with 3 rows and 12 columns.
The columns would be Day, S_total, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, where the numbered variables could be TRUE/FALSE.

I have thought in the direction of as.numeric(read.csv) followed by a for loop containing cbind.
But surely there is a better way? The tidyverse? I was hoping someone could demonstrate a solution using regular expressions and Map.

Recommended answer

You can add a new column with either dataFrame$newColumn or dataFrame[, "newColumn"]. You can then use grepl to test whether a skill occurs in the vector dataFrame$S_total, e.g.

dataFrame[, "1"] <- grepl("\\b1\\b", dataFrame$S_total)  # \\b stops "1" from also matching inside "11" or "12"
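
One caveat with grepl here: a bare pattern such as "1" is matched as a substring, so it is also found inside "11" or "12". A minimal sketch of the difference, using made-up sample strings (s_total, plain and bounded are illustrative names, not part of the original answer):

```r
# Hypothetical sample values, chosen to expose the substring problem.
s_total <- c("1, 3, 7", "5, 12", "")

# A bare "1" also matches the "1" inside "12".
plain <- grepl("1", s_total)

# \\b anchors the match at word boundaries, so only a standalone 1 matches.
bounded <- grepl("\\b1\\b", s_total)

print(plain)
print(bounded)
```

With these inputs, the second string ("5, 12") is flagged by the plain pattern but not by the bounded one.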

To get all the different skills that occur in the dataset, split the character vectors into single numbers and then apply unique. You can then loop over the distinct skills and create one new column per skill:

    > dataFrame <- data.frame(S_total = c(toString(c(1,3,7,8,11,20)), toString(c(5,12)), ""),
    +                         Day = c(1,2,3),
    +                         stringsAsFactors = FALSE)
    >
    > dataFrame
                 S_total Day
    1 1, 3, 7, 8, 11, 20   1
    2              5, 12   2
    3                      3
    >
    > # Skill labels are strings here, so sort() orders them as text, not numerically.
    > allSkill <- sort(unique(unlist(strsplit(dataFrame$S_total, ", "))))
    > for(i in allSkill){
    +   # \\b word boundaries keep "1" from matching inside "11" or "12"
    +   dataFrame[, i] <- grepl(paste0("\\b", i, "\\b"), dataFrame$S_total)
    + }
    > dataFrame
                 S_total Day     1    11    12    20     3     5     7     8
    1 1, 3, 7, 8, 11, 20   1  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
    2              5, 12   2 FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
    3                      3 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
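
For the bracketed input shown in the question, one possible variant is to pull the integers out with a regular expression and compare them numerically, which also gives the s1 to s10 column names the question asks for. This is a sketch under assumptions: the sample values, the names df, n_skills and used, and the fixed skill count of 10 are illustrative.

```r
# Sample data in the bracketed form from the question (illustrative values).
df <- data.frame(Day = 1:3,
                 S_total = c("[1,3,7,8,9,10]", "[5,9]", "[]"),
                 stringsAsFactors = FALSE)

n_skills <- 10  # assumption: the number of skills is known in advance

# Extract every run of digits per row and convert it to integers.
# An empty "[]" yields integer(0).
used <- lapply(regmatches(df$S_total, gregexpr("[0-9]+", df$S_total)),
               as.integer)

# One logical column per skill: TRUE where the skill number occurs in that row.
for (k in seq_len(n_skills)) {
  df[[paste0("s", k)]] <- vapply(used, function(u) k %in% u, logical(1))
}

print(df)
```

Because the comparison is done on integers with %in%, skill 1 cannot be confused with skill 10, and the columns come out in numeric order.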

If your dataset is not that large, this will do. If you have a very large dataset and performance matters, you can first create the empty columns and then loop through them to fill them in, which is faster.
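
The preallocation idea could look roughly like this: fill a logical matrix created up front, then attach it to the data frame in one step. This is only a sketch; df, skills, flags and result are illustrative names, the "s" column prefix is borrowed from the question, and whether it is noticeably faster depends on the size of the data.

```r
df <- data.frame(S_total = c("1, 3, 7, 8, 11, 20", "5, 12", ""),
                 Day = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Distinct skill numbers, sorted numerically rather than as text.
skills <- sort(as.integer(unique(unlist(strsplit(df$S_total, ", ")))))

# Preallocate one logical matrix: rows are observations, columns are skills.
flags <- matrix(FALSE, nrow = nrow(df), ncol = length(skills),
                dimnames = list(NULL, paste0("s", skills)))

# Fill one column per skill; \\b prevents "1" from matching inside "11".
for (j in seq_along(skills)) {
  flags[, j] <- grepl(paste0("\\b", skills[j], "\\b"), df$S_total)
}

# Attach all new columns in a single step.
result <- cbind(df, flags)
print(result)
```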

In my opinion, there is no need to use Map or any of the tidyverse packages.
