Matching strings: loop over multiple columns

Problem description

I have data from an open-ended survey. I have a comments table and a codes table. The codes table is a set of themes, each a column of words or strings.

What I am trying to do: check whether any word or string from the relevant column in the codes table appears in an open-ended comment. Then add a new column to the comments table for that theme, with a binary 1 or 0 to denote which records have been tagged.

There are quite a number of columns in the codes table. They are live and ever changing, so the column order and the number of columns are subject to change.

I am currently doing this in a rather convoluted way: I check each column individually with multiple lines of code, and I reckon there is likely a much better way of doing it.

I can't figure out how to get lapply to work with the stringi function.

Any help is much appreciated.

Here is an example set of code so you can see what I am trying to do:

#Two tables codes and comments
#codes table
codes <- structure(
  list(
    Support = structure(
      c(2L, 3L, NA),
      .Label = c("",
                 "help", "questions"),
      class = "factor"
    ),
    Online = structure(
      c(1L,
        3L, 2L),
      .Label = c("activities", "discussion board", "quiz"),
      class = "factor"
    ),
    Resources = structure(
      c(3L, 2L, NA),
      .Label = c("", "pdf",
                 "textbook"),
      class = "factor"
    )
  ),
  row.names = c(NA,-3L),
  class = "data.frame"
)
#comments table
comments <- structure(
  list(
    SurveyID = structure(
      1:5,
      .Label = c("ID_1", "ID_2",
                 "ID_3", "ID_4", "ID_5"),
      class = "factor"
    ),
    Open_comments = structure(
      c(2L,
        4L, 3L, 5L, 1L),
      .Label = c(
        "I could never get the pdf to download",
        "I didn’t get the help I needed on time",
        "my questions went unanswered",
        "staying motivated to get through the textbook",
        "there wasn’t enough engagement in the discussion board"
      ),
      class = "factor"
    )
  ),
  class = "data.frame",
  row.names = c(NA,-5L)
)

# load stringi for stri_detect_regex()
library(stringi)

# check if any words from the columns in the codes table match the comments

# here I am looking for a match column by column, but I am looking for a better way - lapply?

support = paste(codes$Support, collapse = "|")
supp_stringi = stri_detect_regex(comments$Open_comments, support)
supp_grepl = grepl(pattern = support, x = comments$Open_comments)
identical(supp_stringi, supp_grepl)
comments$Support = ifelse(supp_grepl == TRUE, 1, 0)

# What I would like to do is loop through all columns in codes rather than repeating the above code for each column in codes

Recommended answer

Here is an approach that uses stringi::stri_detect_regex() with lapply() to create vectors of 1 (TRUE) and 0 (FALSE), depending on whether any of the words in the Support, Online, or Resources columns appear in each comment, and then merges these vectors back with the comments.

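# assumes the codes and comments data frames from the question already exist
library(stringi)

resultsList <- lapply(seq_along(codes), function(x) {
  # drop NA entries so paste() does not add a literal "NA" alternative to the pattern
  pattern <- paste(na.omit(codes[[x]]), collapse = "|")
  y <- stri_detect_regex(comments$Open_comments, pattern)
  ifelse(y, 1, 0)
})

# bind the per-theme vectors into columns named after the themes
results <- as.data.frame(do.call(cbind, resultsList))
colnames(results) <- colnames(codes)
mergedData <- cbind(comments, results)
mergedData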

...and the results.

> mergedData
  SurveyID                                          Open_comments Support Online Resources
1     ID_1                 I didn’t get the help I needed on time       1      0         0
2     ID_2          staying motivated to get through the textbook       0      0         1
3     ID_3                           my questions went unanswered       1      0         0
4     ID_4 there wasn’t enough engagement in the discussion board       0      1         0
5     ID_5                  I could never get the pdf to download       0      0         1
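
Because the codes table changes often, it may help to wrap the per-column logic in a small function and assign the new columns by name. The following is only a sketch, not part of the original answer: flag_theme() is a hypothetical helper, the two tables are simplified character-column stand-ins for the question's data, and it assumes the terms contain no regex metacharacters (otherwise they would need escaping).

```r
library(stringi)

# Simplified stand-ins for the question's tables (character columns, not factors)
codes <- data.frame(
  Support   = c("help", "questions", NA),
  Online    = c("activities", "quiz", "discussion board"),
  Resources = c("textbook", "pdf", NA),
  stringsAsFactors = FALSE
)
comments <- data.frame(
  SurveyID = paste0("ID_", 1:5),
  Open_comments = c(
    "I didn’t get the help I needed on time",
    "staying motivated to get through the textbook",
    "my questions went unanswered",
    "there wasn’t enough engagement in the discussion board",
    "I could never get the pdf to download"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1 if any term of one theme occurs in a comment, else 0
flag_theme <- function(terms, text) {
  terms <- as.character(terms)
  # drop NA and empty entries; an empty alternative would match every comment
  terms <- terms[!is.na(terms) & nzchar(terms)]
  if (length(terms) == 0) return(integer(length(text)))
  pattern <- paste(terms, collapse = "|")
  as.integer(stri_detect_regex(as.character(text), pattern))
}

# lapply() over a data frame visits each column and keeps the column names,
# so one new column is added per theme regardless of column count or order
tagged <- comments
tagged[names(codes)] <- lapply(codes, flag_theme, text = comments$Open_comments)
tagged
```

Assigning through names(codes) means a theme added to or removed from the codes table is picked up on the next run without any other code changes.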
