Splitting space delimited entries into new columns in R


Question

I am coding a survey that outputs a .csv file. Within this csv I have some entries that are space delimited, which represent multi-select questions (i.e. questions with more than one response). In the end I want to parse these space delimited entries into their own columns and create headers for them so I know where they came from.

For example I may start with this (note that the multiselect columns have an _M after them):

Q1, Q2_M, Q3, Q4_M
6, 1 2 88, 3, 3 5 99
6, , 3, 1 2

And I would like to end up with this:

Q1, Q2_M_1, Q2_M_2, Q2_M_88, Q3, Q4_M_1, Q4_M_2, Q4_M_3, Q4_M_5, Q4_M_99
6, 1, 1, 1, 3, 0, 0, 1, 1, 1
6, , , , 3, 1, 1, 0, 0, 0

I imagine this is a relatively common issue to deal with, but I have not been able to find it in the R section. Any ideas how to do this in R after importing the .csv? My general thoughts (which often lead to inefficient programs) are that I can:
(1) pull the column numbers that have the special suffix with grep()
(2) loop through (or use an apply on) each of the entries in these columns, determine the levels of the responses, and create columns accordingly
(3) loop through (or use an apply) and place indicators in the appropriate columns to indicate the presence of a selection
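As a rough illustration of the three steps above, here is a minimal base R sketch. It is one possible approach under stated assumptions, not a definitive implementation: the data frame `survey` is built inline instead of being read from the .csv, the helper `expand_multiselect` is a hypothetical name, and a blank entry is assumed to mean the question was skipped, so it becomes NA rather than a row of zeros.

```r
# Example data from the question, built inline (assumption: after import the
# multi-select columns are character vectors and blank entries are "").
survey <- data.frame(
  Q1   = c(6, 6),
  Q2_M = c("1 2 88", ""),
  Q3   = c(3, 3),
  Q4_M = c("3 5 99", "1 2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one multi-select column into 0/1 indicator columns.
expand_multiselect <- function(x, prefix) {
  # Split each entry on runs of whitespace; trimws() guards against padding.
  choices <- strsplit(trimws(x), "[[:space:]]+")
  # Step 2: the response levels are the distinct codes seen, in numeric order.
  levels_seen <- sort(unique(as.numeric(unlist(choices))))
  # Step 3: one row per respondent, one column per level.
  ind <- matrix(0L, nrow = length(x), ncol = length(levels_seen))
  for (i in seq_along(choices)) {
    ind[i, ] <- as.integer(levels_seen %in% as.numeric(choices[[i]]))
  }
  # Assumption: a blank or missing entry means "not answered", so use NA.
  blank <- is.na(x) | !nzchar(trimws(x))
  ind[blank, ] <- NA
  colnames(ind) <- paste(prefix, levels_seen, sep = "_")
  as.data.frame(ind)
}

# Step 1: find the multi-select columns by their "_M" suffix.
multi_cols <- grep("_M$", names(survey), value = TRUE)

# Replace each multi-select column with its indicators, keeping column order.
pieces <- lapply(names(survey), function(nm) {
  if (nm %in% multi_cols) expand_multiselect(survey[[nm]], nm) else survey[nm]
})
result <- do.call(cbind, pieces)
print(result)
```

When reading the real file, `read.csv(..., strip.white = TRUE, stringsAsFactors = FALSE)` is probably a sensible starting point, since the sample rows have spaces after the commas.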

I appreciate any help and please let me know if this is not clear.

Recommended answer

I agree with ran2 and aL3Xa that you probably want to change the format of your data to have a different column for each possible response. However, if munging your dataset into a better format proves problematic, it is possible to do what you asked.

# Split each space-delimited entry and convert the pieces to numbers
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)

q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)
processed_q2
# [[1]]
# [1]  1  2  3 NA  4
#
# [[2]]
# [1] 2 5

The reason different columns for different responses are suggested is that it is still quite unpleasant to retrieve any statistics from the data in this list form. That said, you can do things like:

# Number of responses given (NA entries are counted too)
sapply(processed_q2, length)

# Frequency of each response
table(unlist(processed_q2), useNA = "ifany")
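For comparison, here is a small sketch of how the same summaries look once each response has its own indicator column. The names `q2_levels` and `q2_wide` are illustrative only, and dropping the NA entry as a level is an assumption about how missing codes should be treated.

```r
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)
q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)

# Distinct non-missing response codes across all respondents (sort() drops NA).
q2_levels <- sort(unique(unlist(processed_q2)))

# One row per respondent, one 0/1 column per response code.
q2_wide <- do.call(
  rbind,
  lapply(processed_q2, function(r) as.integer(q2_levels %in% r))
)
colnames(q2_wide) <- paste0("Q2_", q2_levels)

print(colSums(q2_wide))  # how often each response was chosen
print(rowSums(q2_wide))  # non-missing responses per respondent
```

With the wide layout, both summaries reduce to plain column and row sums, which is the main practical argument for one column per response.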

One more piece of advice. Keep the code that processes your data separate from the code that analyses it. If you create any graphs, keep the code for creating them separate again. I've been down the road of mixing things together, and it isn't pretty. (Especially when you come back to the code six months later.)
