Filling a column in a dataframe based on a column in another dataframe in R

Problem description

I have a dataframe of comments (df1) that looks like this:

Comments
Apple laptops are really good for work,we should buy them
Apple Iphones are too costly,we can resort to some other brands
Google search is the best search engine 
Android phones are great these days
I lost my visa card today

I have another dataframe of merchant names (df2) that looks like this:

Merchant_Name
Google
Android
Geoni
Visa
Apple
MC
WallMart

If a Merchant_Name from df2 appears in a comment in df1, I want to append that merchant name to a second column of df1 in R. The match need not be exact; an approximate match is what is required. Also, df1 contains around 500K rows! My final output df may look like this:
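If "approximate" should also tolerate misspellings (for example WallMart in df2 versus Walmart in a comment), base R's agrepl() performs approximate substring matching based on edit distance. The sketch below is only an illustration: the two comments are made up, and max.distance = 1 (one insertion, deletion or substitution) is an assumed tolerance, not something stated in the question.

```r
# agrepl() is base R's approximate (edit-distance) substring matcher.
# max.distance = 1 allows one insertion, deletion or substitution.
# These two comments are hypothetical examples, not the question's data.
comments = c('I only shop at Walmart', 'I lost my visa card today')

# "Walmart" is one deletion away from "WallMart", so only the first comment matches
agrepl('WallMart', comments, max.distance = 1, ignore.case = TRUE)
# returns c(TRUE, FALSE)
```

Use this with care for short names: with one edit allowed, a two-letter name such as MC would likely match almost any comment containing an "m" or a "c", so a tolerance tied to name length (or exact matching for short names) is probably needed.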

Comments                                                        Merchant
Apple laptops are really good for work,we should buy them       Apple
Apple Iphones are too costly,we can resort to some other brands Apple
Google search is the best search engine                         Google
Android phones are great these days                             Android
I lost my visa card today                                       Visa

How can I do this efficiently in R? Thanks.

Recommended answer

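This is a job for regular expressions. Check out the grepl call inside the lapply: for each brand, it flags the comments that contain that brand name, comparing lower-cased text so that case is ignored, and do.call(rbind, ...) then stacks the per-brand results into a single data frame.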

comments = c(
   'Apple laptops are really good for work,we should buy them',
   'Apple Iphones are too costly,we can resort to some other brands',
   'Google search is the best search engine ',
   'Android phones are great these days',
   'I lost my visa card today'
)

brands = c(
   'Google',
   'Android',
   'Geoni',
   'Visa',
   'Apple',
   'MC',
   'WallMart'
)

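# Lower-case the comments once, outside the loop, instead of once per brand
lowercomments = tolower(comments)

brandinpattern = lapply(
   brands,
   function(brand) {
      # TRUE for every comment whose text contains the (lower-cased) brand name
      commentswithbrand = grepl(x = lowercomments, pattern = tolower(brand))
      if (any(commentswithbrand)) {
         # One row per matching comment, tagged with the brand
         data.frame(
            comment = comments[commentswithbrand],
            brand = brand
         )
      } else {
         # No comment mentions this brand: contribute no rows
         data.frame()
      }
   }
)

# Stack the per-brand data frames into one comment/brand table
brandinpattern = do.call(rbind, brandinpattern)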


> brandinpattern
                                                          comment   brand
1                        Google search is the best search engine   Google
2                             Android phones are great these days Android
3                                       I lost my visa card today    Visa
4       Apple laptops are really good for work,we should buy them   Apple
5 Apple Iphones are too costly,we can resort to some other brands   Apple
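The table above has one row per comment/brand pair, so comments with no brand are dropped and a comment mentioning several brands would appear more than once. To fill a Merchant column in df1 directly, as in the desired output, one option (a sketch, not the answer's method) is to join all merchant names into a single alternation regex and take the first match in each comment with base R's regexpr() and regmatches(). Assumptions: only the first merchant mentioned in a comment is kept, names are matched as whole words regardless of case, and the names contain no regex metacharacters (they would need escaping otherwise). This scans each comment with one pattern rather than making one grepl pass over all ~500K comments per merchant.

```r
# Example data from the question
df1 = data.frame(
   Comments = c(
      'Apple laptops are really good for work,we should buy them',
      'Apple Iphones are too costly,we can resort to some other brands',
      'Google search is the best search engine ',
      'Android phones are great these days',
      'I lost my visa card today'
   ),
   stringsAsFactors = FALSE
)
df2 = data.frame(
   Merchant_Name = c('Google', 'Android', 'Geoni', 'Visa', 'Apple', 'MC', 'WallMart'),
   stringsAsFactors = FALSE
)

# One pattern of the form \b(Google|Android|...)\b; the word boundaries keep
# short names such as "MC" from matching inside longer words.
# Assumption: merchant names contain no regex metacharacters.
pattern = paste0('\\b(', paste(df2$Merchant_Name, collapse = '|'), ')\\b')

# Position of the first (leftmost) merchant match in each comment, -1 if none
pos = regexpr(pattern, df1$Comments, ignore.case = TRUE, perl = TRUE)

# Matched text for comments with a match; NA for the rest
found = rep(NA_character_, nrow(df1))
found[pos != -1] = regmatches(df1$Comments, pos)

# Map the matched text back to df2's spelling (e.g. "visa" -> "Visa")
df1$Merchant = df2$Merchant_Name[match(tolower(found), tolower(df2$Merchant_Name))]

df1
```

Comments that mention no merchant get NA in the Merchant column instead of being dropped, so df1 keeps all of its rows.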
