R - How to search for a string in one column in other columns of a data frame


Problem description


I have a table, call it df, with 3 columns, the 1st is the title of a product, the 2nd is the description of a product, and the third is a one word string. What I need to do is run an operation on the entire table, creating 2 new columns (call them 'exists_in_title' and 'exists_in_description') that have either a 1 or 0 indicating if the 3rd column exists in either the 1st or 2nd column. I need it to simply be a 1:1 operation, so for example, calling row 1 'A', I need to check if the cell A3, exists in A1, and use that data to create column exists_in_title, and then check if A3 exists in A2, and use that data to create the column exists_in_description. Then move on to row B and go through the same operation. I have thousands of rows of data so it's not realistic to do these in a 1 at a time fashion, writing individual functions for each row, definitely need a function or method that will run through every row in the table in one shot.

I've played around with grepl, pmatch, str_count but none seem to really do what I need. I think grepl is probably the closest to what I need, here's an example of 2 lines of code I wrote that logically do what I would want them to, but didn't seem to work:

df$exists_in_title <- grepl(df$A3, df$A1)

df$exists_in_description <- grepl(df$A3, df$A2)

However when I run those I get the following message, which leads me to believe it did not work properly: "argument 'pattern' has length > 1 and only the first element will be used"

Any help on how to do this would be greatly appreciated. Thanks!

Solution

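grepl is not vectorized over its pattern argument, which is why only the first keyword was used. It will work row by row with mapply: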

Sample data frame:

# Three products: title, description, and the single keyword to look for in each row
title <- c('eggs and bacon', 'sausage biscuit', 'pancakes')
description <- c('scrambled eggs and thickcut bacon', 'homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon', 'sausage', 'sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors = FALSE)

Searching for matches using grepl:

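# mapply pairs the i-th keyword with the i-th title (or description) and calls grepl once per row
df$exists_in_title <- mapply(grepl, pattern = df$keyword, x = df$title)
df$exists_in_description <- mapply(grepl, pattern = df$keyword, x = df$description)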

And the results:

            title                            description   keyword exists_in_title exists_in_description
1  eggs and bacon      scrambled eggs and thickcut bacon     bacon            TRUE                  TRUE
2 sausage biscuit homemade biscuit with breakfast pattie   sausage            TRUE                 FALSE
3        pancakes            stack of sourdough pancakes sourdough           FALSE                  TRUE
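
The question asked for 1 or 0 rather than TRUE or FALSE, and the keywords are plain words rather than regular expressions. A minimal sketch of one way to cover both, assuming literal (non-regex) matching is what is wanted: the helper name keyword_in is hypothetical and not part of the original answer, while fixed, MoreArgs and USE.NAMES are standard arguments of grepl and mapply.

```r
# Sample data mirroring the answer above
df <- data.frame(
  title = c('eggs and bacon', 'sausage biscuit', 'pancakes'),
  description = c('scrambled eggs and thickcut bacon',
                  'homemade biscuit with breakfast pattie',
                  'stack of sourdough pancakes'),
  keyword = c('bacon', 'sausage', 'sourdough'),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): row-wise substring
# test that returns 1/0 instead of TRUE/FALSE.
keyword_in <- function(keyword, text) {
  hits <- mapply(grepl, pattern = keyword, x = text,
                 MoreArgs = list(fixed = TRUE),  # treat each keyword literally, not as a regex
                 USE.NAMES = FALSE)              # drop the names mapply would otherwise attach
  as.integer(hits)
}

df$exists_in_title <- keyword_in(df$keyword, df$title)
df$exists_in_description <- keyword_in(df$keyword, df$description)
print(df)
```

With fixed = TRUE, a keyword containing characters such as "." or "(" is matched as-is; drop that argument if the keywords are meant to be regular expressions.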
