R: apply a user-defined function to data frame columns

Question

In R, I have defined a function to calculate the intersection between two strings:

containedin <- function(t1,t2){
  return length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
}

I want to apply this function to a data frame that contains two string columns, data.selected[c('keywords','title')]:

keywords                                                                             title
1  Samsung UN48H6350 48" Samsung UN48H6350 48" Full 1080p Smart HDTV 120Hz with Wi-Fi +$50 Visa Gift Card
2  Samsung UN48H6350 48"     Samsung UN48H6350 48" Full HD Smart LED TV -Bundle- (See Below for Contents)
3  Samsung UN48H6350 48"      Samsung UN48H6350 48" Class Full HD Smart LED TV -BUNDLE- See below Details
4  Samsung UN48H6350 48"     Samsung UN48H6350 48" Full HD Smart LED TV With BD-H5100 Blu-ray Disc Player
5  Samsung UN48H6350 48"                 Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV
6  Samsung UN48H6350 48"            Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi
7  Samsung UN48H6350 48"               Samsung 6350 Series UN48H6350 48" 1080p HD LED LCD Internet TV NEW
8  Samsung UN48H6350 48"  Samsung Un48h6350af 75" 1080p Led-lcd Tv - 16:9 - Hdtv 1080p - (un75h6350afxza)
9  Samsung UN48H6350 48"                         Samsung UN48H6350 - 48" HD 1080p Smart HDTV 120Hz Bundle
10 Samsung UN48H6350 48"   Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi, (R#416)

How do I use the apply function on these two columns so that it returns a new column with the result?

Recommended answer

First of all, your return statement should give you a syntax error: in R, return is a function and has to be called with parentheses, as in return(x). You probably meant

containedin <- function(t1,t2){
  length(Reduce(intersect, strsplit(c(t1,t2), "\\s+"))) 
}

Anyway, you can use mapply to solve your problem.

mapply(containedin, 
       as.character(data.selected[, 'keywords']), 
       as.character(data.selected[, 'title']))

as.character is only needed if class(data.selected[, 'keywords']) is factor (rather than character).
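
Since the question asks for the result as a new column, the mapply call can be assigned straight back into the data frame. The following is a minimal sketch, not the original poster's code: the two-row data frame is a hypothetical sample built from two of the titles above, and the column name overlap is an assumption chosen for illustration.

```r
# Corrected helper from the answer: counts the whitespace-separated
# tokens that t1 and t2 have in common.
containedin <- function(t1, t2) {
  length(Reduce(intersect, strsplit(c(t1, t2), "\\s+")))
}

# Hypothetical two-row sample standing in for data.selected.
data.selected <- data.frame(
  keywords = c('Samsung UN48H6350 48"',
               'Samsung UN48H6350 48"'),
  title = c('Samsung UN48H6350 48" Smart 1080p Clear Motion Rate 240 LED HDTV',
            'Samsung UN48H6350 - 48-Inch Full HD 1080p Smart HDTV 120Hz with Wi-Fi'),
  stringsAsFactors = FALSE  # keep the columns as character, not factor
)

# mapply walks both columns in parallel, one pair of strings per row.
# USE.NAMES = FALSE stops it from naming the result after the input strings.
# "overlap" is a hypothetical column name.
data.selected$overlap <- mapply(containedin,
                                data.selected$keywords,
                                data.selected$title,
                                USE.NAMES = FALSE)

print(data.selected$overlap)
```

With stringsAsFactors = FALSE the columns are already character vectors, so the as.character conversion is not needed here. In the second sample row the title contains 48-Inch rather than 48", so that token does not count as a match.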
