Remove Part of String Before the Last Forward Slash
Problem description
The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.
So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html", I would want to remove everything before "interpreter.html".
Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.
These are two of my several attempts:
for link in link_list:
    file_names.append(link.replace('/[^/]*$',''))
print(file_names)

and

for link in link_list:
    file_names.append(link.rpartition('//')[-1])
print(file_names)
Have a look at str.rsplit.
>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'
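Applied to the whole list, the same call fits in a list comprehension. This is only a sketch: the contents of link_list below are made up for illustration, and the real list would come from your scraper.

```python
# Hypothetical sample data; the real list comes from the website.
link_list = [
    "https://docs.python.org/3.4/tutorial/interpreter.html",
    "https://docs.python.org/3.4/tutorial/index.html",
]

# Split each URL once, from the right, and keep the piece after the last '/'.
file_names = [link.rsplit("/", 1)[-1] for link in link_list]
print(file_names)  # ['interpreter.html', 'index.html']
```

Indexing with [-1] instead of [1] avoids an IndexError for a string that contains no slash at all. A URL that ends in a slash yields an empty string.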
And to use a regex:
>>> import re
>>> re.search(r'(.*)/(.*)',s).group(2)
'interpreter.html'
The second group is the text between the last / and the end of the string. This works because the first (.*) is greedy: it consumes as much as it can, so the / in the pattern ends up matching the last slash.
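To see why greediness matters here, it may help to compare the pattern with a non-greedy variant. The lazy pattern below is not part of the answer; it is added only for contrast.

```python
import re

s = "https://docs.python.org/3.4/tutorial/interpreter.html"

# Greedy first group: '/' in the pattern lands on the LAST slash.
greedy = re.search(r"(.*)/(.*)", s)
# Lazy first group (contrast only): '/' lands on the FIRST slash.
lazy = re.search(r"(.*?)/(.*)", s)

print(greedy.group(2))  # interpreter.html
print(lazy.group(2))    # /docs.python.org/3.4/tutorial/interpreter.html
```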
Small note: the problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra /, as in link.rpartition('/')[-1].
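A short sketch of what the two attempts from the question actually return, next to corrected versions. The re.sub line is an addition of mine rather than something from the answer. It relies on the fact that str.replace treats its first argument as literal text, not as a regular expression.

```python
import re

link = "https://docs.python.org/3.4/tutorial/interpreter.html"

# rpartition splits at the last occurrence of the separator,
# so '//' cuts right after the scheme instead of at the last '/'.
print(link.rpartition("//")[-1])  # docs.python.org/3.4/tutorial/interpreter.html
print(link.rpartition("/")[-1])   # interpreter.html

# str.replace looks for the literal text '/[^/]*$', which never occurs,
# so the string comes back unchanged.
print(link.replace("/[^/]*$", "") == link)  # True

# With re.sub, a greedy '.*/' removes everything up to the last slash.
print(re.sub(r".*/", "", link))  # interpreter.html
```

Even passed to re.sub, the original pattern '/[^/]*$' would strip the file name and keep the rest of the URL, which is the opposite of what the question asks for.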