Identifying a specific pattern in several adjacent rows of a single column - R


Question

I'm back with my survey data.

This time, I need to remove a specific set of rows from data when they occur. In our survey, an automated telephone survey, the survey tool will attempt three times during that call to prompt the respondent to enter a response. After three timeouts of the question the survey tool hangs up. This mostly happens when the call goes to someone's voicemail.

I would like to identify that pattern when it happens so I can remove it from calculating call time.

The pattern I am looking for looks like this in the Interactions column:

It doesn't HAVE to be Intro. It can be any part of the survey where the tool prompts the respondent for a response THREE times but no response is provided, so the call fails. But it does have to be sandwiched in between "Answer" (the phone picks up) and "Timeout. Call failed." (a failure).

I did try to apply what I learned from yesterday's solution (about run length encoding) to my other indexing question but I couldn't make it work in the slightest. So, here I am.

Here's a sample data set:

This is 15 respondents and every interaction between the survey tool and the respondent (or their phone, essentially).

Here's the code for the dataframe (the link goes to a Google Drive text editor with the code).

Recommended answer

If I understand the question correctly, the function below removes all rows between a row with "Answer" and a failure value (there are 3 such values in the question).
The name of the column to look for defaults to "Interactions", and the first answer and failure values also have defaults assigned.
Note that all match instructions are case sensitive.
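
To make the case-sensitivity note concrete, here is a small sketch of how the two kinds of matching behave. The vector `x` is made up for illustration and is not taken from the survey data. `grep()` does a case-sensitive regular-expression (substring) match, while `%in%` only matches whole strings exactly.

```r
# Hypothetical values, invented for this illustration only
x <- c("Answer", "answer", "Answered by machine",
       "Timeout. Call failed.", "timeout. call failed.")

# grep() is case sensitive and matches substrings,
# so "Answered by machine" is found but "answer" is not
ans_rows <- grep("Answer", x)

# %in% needs an exact, whole-string match
fail_rows <- which(x %in% "Timeout. Call failed.")

# Possible case-insensitive alternatives, if the data mixes cases
ans_rows_ci  <- grep("Answer", x, ignore.case = TRUE)
fail_rows_ci <- which(tolower(x) %in% tolower("Timeout. Call failed."))
```

If the real column can contain values that merely start with "Answer", an exact comparison such as `which(x == "Answer")` may be the safer choice.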

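removeRows <- function(X, col = "Interactions", 
                       ans = "Answer", 
                       fail = c("Timeout. Call failed.", "Partial", "Enqueueing call"))
{  
  a <- grep(ans, X[[col]])         # rows where the phone picks up
  f <- which(X[[col]] %in% fail)   # rows holding a failure value
  i <- findInterval(f, a)          # index of the last "Answer" at or before each failure
  ok <- i > 0                      # skip failures that have no preceding "Answer"
  a <- a[i[ok]]
  f <- f[ok]

  drop <- rep(FALSE, nrow(X))
  for(k in seq_along(a)){
    drop[a[k]:f[k]] <- TRUE        # mark the whole Answer-to-failure block
  }
  # Index by the marker instead of complete.cases(), so rows that
  # have NA in some other column are not removed by accident
  X[!drop, , drop = FALSE]
}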

removeRows(survey_data)
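
The function above drops everything between an "Answer" and the next failure value, however many rows lie in between. If only the exact pattern from the question should be removed (one "Answer", the same prompt three times, then "Timeout. Call failed."), a stricter variant could look like the sketch below. The name `removeTimedOutCalls`, the `n_prompts` argument, the toy data frame and the "Completed" value are all hypothetical, and the sketch assumes the three prompt rows carry an identical value in the Interactions column.

```r
# Hypothetical helper: drop only blocks of the form
#   Answer, <prompt>, <prompt>, <prompt>, Timeout. Call failed.
removeTimedOutCalls <- function(X, col = "Interactions",
                                ans = "Answer",
                                fail = "Timeout. Call failed.",
                                n_prompts = 3) {
  v <- as.character(X[[col]])
  drop <- rep(FALSE, length(v))
  for (f in which(v == fail)) {
    start <- f - n_prompts - 1           # row where "Answer" should sit
    if (start < 1) next                  # not enough rows before the failure
    prompts <- v[(start + 1):(f - 1)]    # the n_prompts rows in between
    if (!anyNA(c(v[start], prompts)) &&
        v[start] == ans &&
        length(unique(prompts)) == 1) {  # same prompt repeated each time
      drop[start:f] <- TRUE
    }
  }
  X[!drop, , drop = FALSE]
}

# Toy data (assumed layout): respondent 1 times out, respondent 2 does not
toy <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  Interactions = c("Answer", "Intro", "Intro", "Intro", "Timeout. Call failed.",
                   "Answer", "Intro", "Q1", "Completed"),
  stringsAsFactors = FALSE
)
res <- removeTimedOutCalls(toy)
```

On the toy data only the five rows of respondent 1 are removed. A block whose three middle rows differ from one another is left untouched, which may or may not be what the real data needs.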
