Extract text after "/" in a data frame column

Question

I have a data frame with two columns, Link and Value. The Link column holds URLs such as "abcd.com/efgh/ijkl/mnop". The frame has 10,000 rows, which I took from a sample of 100,000 rows.

Now I want to extract the text after the last "/" reading left to right (equivalently, after the first "/" reading right to left). For example, from the sample above I want to extract "mnop".

I want to do this for all 10,000 rows in the Link column, while the Value column should not be affected.

I was able to do it with

a = sapply(webdatatest, substring, 36)

but this is not a dynamic method, because the position of the last "/" changes from row to row. It also affected the second column.
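For comparison, the fixed start position 36 can be replaced by the position of the last "/" in each row, found with gregexpr(), and the result assigned back to the Link column only, so Value is never touched. This is a minimal sketch, not part of the original post; the sample data frame is hypothetical and only its column names (Link, Value) come from the question.

```r
# Hypothetical sample data; only the column names Link and Value come from the question
webdatatest <- data.frame(
  Link  = c("abcd.com/efgh/ijkl/mnop", "abcd.com/qr/st", "nopath"),
  Value = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# gregexpr() returns all match positions per string (-1 when there is no match);
# max() picks the position of the last "/"
last_slash <- vapply(
  gregexpr("/", webdatatest$Link, fixed = TRUE),
  max,
  integer(1)
)

# Start one character after the last "/". With no "/" the start is 0,
# which substring() treats like 1, so the whole string is kept
webdatatest$Link <- substring(webdatatest$Link, last_slash + 1)

print(webdatatest)
```

Assigning to webdatatest$Link rather than running sapply() over the whole data frame is what keeps the Value column unchanged.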

So I need some help with this.

Answer

Try basename(). From its documentation, it

removes all of the path up to and including the last path separator (if any).



basename("abcd.com/efgh/ijkl/mnop")
# [1] "mnop"

It is vectorized, so you can just stick the whole column in there.

basename(rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"

So, to apply this to the link column of a data frame webdata, you can simply do

webdata$link <- basename(webdata$link)
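A self-contained version of that one-liner, using a small hypothetical webdata data frame (the URLs and values are made up), shows that only the assigned column changes and Value is left as-is:

```r
# Hypothetical data frame: "link" matches the answer's code, "Value" the question
webdata <- data.frame(
  link  = c("abcd.com/efgh/ijkl/mnop", "abcd.com/xyz/uvw"),
  Value = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

# basename() is vectorized, so the whole column is processed in one call;
# only the link column is overwritten
webdata$link <- basename(webdata$link)

print(webdata)
```

If you want to keep the original URLs, assign to a new column instead, e.g. webdata$last_part <- basename(webdata$link); the name last_part is only an illustration.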

The other obvious choice would be sub(), but I think basename() does the trick and is easier.

sub(".*/", "", rep("abcd.com/efgh/ijkl/mnop", 3))
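The sub() pattern works because ".*" is greedy: ".*/" matches everything up to and including the last "/", and replacing that match with "" leaves only the final segment. The two approaches can disagree on edge cases. basename() is meant for file paths and strips trailing separators before extracting the last component, while sub() does not. The sketch below compares them on made-up inputs, including one with a trailing slash.

```r
# Made-up inputs: a normal URL, one with a trailing "/", and one with no "/"
urls <- c("abcd.com/efgh/ijkl/mnop", "abcd.com/efgh/", "mnop")

# Greedy ".*/" removes everything through the last "/"
via_sub <- sub(".*/", "", urls)

# basename() drops the trailing "/" first, then takes the last component
via_basename <- basename(urls)

print(via_sub)
print(via_basename)
```

With a trailing slash, sub() returns an empty string while basename() returns "efgh"; which one you want depends on how such URLs should be treated.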
