Extract text after "/" in a data frame column
Question
I have a data frame with two columns, Link and Value. The Link column holds URLs such as "abcd.com/efgh/ijkl/mnop". The frame has 10,000 rows, which I sampled from 100,000 rows.
Now I want to extract the text after the last "/" (reading left to right), which is the first "/" when reading right to left. For example, from the sample above I want to extract "mnop".
I want to do this for all 10,000 rows in the Link column, while the Value column should not be affected.
I was able to do
a = sapply(webdatatest, substring, 36)
but this is not a dynamic method, since the position of the last "/" varies from row to row. It also affected the second column.
So I need some help with this.
Recommended answer
Try basename(). It removes all of the path up to and including the last path separator (if any).
basename("abcd.com/efgh/ijkl/mnop")
# [1] "mnop"
It is vectorized, so you can just pass the whole column to it.
basename(rep("abcd.com/efgh/ijkl/mnop", 3))
# [1] "mnop" "mnop" "mnop"
So, to apply this to one column link of a data frame webdata, you can simply do
webdata$link <- basename(webdata$link)
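To make that concrete, here is a minimal sketch on a made-up two-row data frame. The sample values are hypothetical; the column names Link and Value follow the question. R column names are case-sensitive, so use the name exactly as it appears in your own data frame. The as.character() call is an added precaution on my part, in case the column was read in as a factor rather than as character.

```r
# Hypothetical sample data; column names Link and Value follow the question.
webdata <- data.frame(
  Link  = c("abcd.com/efgh/ijkl/mnop", "abcd.com/qrst/uvwx"),
  Value = c(10, 20),
  stringsAsFactors = FALSE
)

# Overwrite only the Link column. as.character() leaves character data
# unchanged and guards against the column being a factor.
webdata$Link <- basename(as.character(webdata$Link))

print(webdata)
```

Because only webdata$Link is assigned to, the Value column stays exactly as it was, unlike sapply() over the whole data frame, which runs the function on every column.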
The other obvious function would be sub(), but I think basename() will do the trick and it's easier.
sub(".*/", "", rep("abcd.com/efgh/ijkl/mnop", 3))
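In the sub() call, ".*" is greedy, so ".*/" matches everything up to and including the last "/", and that match is replaced with an empty string. The two approaches agree on the example from the question, but they can differ on edge cases. A small sketch with made-up inputs (the urls vector is hypothetical):

```r
# Hypothetical inputs: a normal URL, one with a trailing "/", one with no "/".
urls <- c("abcd.com/efgh/ijkl/mnop", "abcd.com/efgh/", "abcd.com")

# sub() deletes everything up to and including the last "/".
res_sub <- sub(".*/", "", urls)
# [1] "mnop"     ""         "abcd.com"

# basename() drops trailing separators before taking the last component.
res_base <- basename(urls)
# [1] "mnop"     "efgh"     "abcd.com"
```

So if some links end in "/", it is worth deciding which of the two results you actually want before picking a function.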