R - How to search for a string in one column in other columns of a data frame
Problem description
I have a table, call it df, with 3 columns: the 1st is the title of a product, the 2nd is the description of a product, and the 3rd is a one-word string. I need to run an operation on the entire table that creates 2 new columns (call them 'exists_in_title' and 'exists_in_description'), each holding a 1 or 0 to indicate whether the string in the 3rd column occurs in the 1st or 2nd column of the same row.
It needs to be a strict row-by-row comparison. For example, calling row 1 'A', I need to check whether cell A3 occurs in A1 and use the result to fill exists_in_title, then check whether A3 occurs in A2 and use the result to fill exists_in_description, then move on to row B and repeat. I have thousands of rows, so handling them one at a time or writing a separate call for each row is not realistic. I need a function or method that runs through every row of the table in one shot.
I've played around with grepl, pmatch and str_count, but none of them seems to do what I need. I think grepl is the closest. Here are the 2 lines of code I wrote that logically do what I want but did not seem to work:
df$exists_in_title <- grepl(df$A3, df$A1)
df$exists_in_description <- grepl(df$A3, df$A2)
However when I run those I get the following message, which leads me to believe it did not work properly: "argument 'pattern' has length > 1 and only the first element will be used"
Any help on how to do this would be greatly appreciated. Thanks!
grepl will work with mapply:
Sample data frame:
title <- c('eggs and bacon','sausage biscuit','pancakes')
description <- c('scrambled eggs and thickcut bacon','homemade biscuit with breakfast pattie', 'stack of sourdough pancakes')
keyword <- c('bacon','sausage','sourdough')
df <- data.frame(title, description, keyword, stringsAsFactors=FALSE)
Searching for matches using grepl:
df$exists_in_title <- mapply(grepl, pattern=df$keyword, x=df$title)
df$exists_in_description <- mapply(grepl, pattern=df$keyword, x=df$description)
And the results:
            title                            description   keyword exists_in_title exists_in_description
1  eggs and bacon      scrambled eggs and thickcut bacon     bacon            TRUE                  TRUE
2 sausage biscuit homemade biscuit with breakfast pattie   sausage            TRUE                 FALSE
3        pancakes            stack of sourdough pancakes sourdough           FALSE                  TRUE
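The question asks for 1/0 values rather than TRUE/FALSE, and the keywords are plain words rather than regular expressions. One possible variation on the mapply approach, as a sketch under those assumptions, wraps the call in a small helper. The name keyword_in is hypothetical and not part of the original answer. Passing fixed = TRUE makes grepl treat each keyword as a literal string, and USE.NAMES = FALSE stops mapply from attaching the keywords as names to the result.

```r
# Same sample data as in the answer above.
df <- data.frame(
  title = c("eggs and bacon", "sausage biscuit", "pancakes"),
  description = c("scrambled eggs and thickcut bacon",
                  "homemade biscuit with breakfast pattie",
                  "stack of sourdough pancakes"),
  keyword = c("bacon", "sausage", "sourdough"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: compares keyword[i] against text[i] for every row i
# and returns an integer vector of 1 (found) or 0 (not found).
keyword_in <- function(keyword, text) {
  hits <- mapply(grepl, pattern = keyword, x = text,
                 MoreArgs = list(fixed = TRUE),  # literal match, no regex
                 USE.NAMES = FALSE)              # plain, unnamed vector
  as.integer(hits)
}

df$exists_in_title <- keyword_in(df$keyword, df$title)
df$exists_in_description <- keyword_in(df$keyword, df$description)
```

If the keywords are meant to be regular expressions, dropping the MoreArgs argument restores the default grepl behaviour used in the answer.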