Remove Part of a String Before the Last Forward Slash

Problem description

The program I am currently working on retrieves URLs from a website and puts them into a list. What I want to get is the last section of the URL.

So, if the first element in my list of URLs is "https://docs.python.org/3.4/tutorial/interpreter.html" I would want to remove everything before "interpreter.html".

Is there a function, library, or regex I could use to make this happen? I've looked at other Stack Overflow posts but the solutions don't seem to work.

These are two of my several attempts:

for link in link_list:
    file_names.append(link.replace('/[^/]*$',''))
print(file_names)

&

for link in link_list:
    file_names.append(link.rpartition('//')[-1])
print(file_names)

Solution

Have a look at str.rsplit.

>>> s = 'https://docs.python.org/3.4/tutorial/interpreter.html'
>>> s.rsplit('/',1)
['https://docs.python.org/3.4/tutorial', 'interpreter.html']
>>> s.rsplit('/',1)[1]
'interpreter.html'

And to use a regex:

>>> import re
>>> re.search(r'(.*)/(.*)', s).group(2)
'interpreter.html'

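The second group then captures everything between the last / and the end of the string. This works because the first (.*) is greedy: it consumes as much as it can and gives characters back only until a / can match, so that / is the last one in the string.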
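For comparison, here is a small sketch of a few equivalent regex spellings, plus what the question's first pattern actually matches when it is used as a real regex. Only the standard re module is used; the variable names are illustrative.

```python
import re

s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# Greedy '.*' runs to the end of the string and backtracks only as far
# as the LAST '/', so group(2) holds everything after that slash.
via_groups = re.search(r'(.*)/(.*)', s).group(2)

# Same result: a run of non-slash characters anchored at the end.
via_search = re.search(r'[^/]*$', s).group()

# Same result: delete everything up to and including the last '/'.
via_sub = re.sub(r'.*/', '', s)

# The question's pattern matches the final '/' plus the file name, so even
# with re.sub (str.replace treats it as literal text) it removes the part
# we want to keep and returns what comes before the last slash.
question_pattern = re.sub(r'/[^/]*$', '', s)

print(via_groups)        # interpreter.html
print(via_search)        # interpreter.html
print(via_sub)           # interpreter.html
print(question_pattern)  # https://docs.python.org/3.4/tutorial
```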

Small Note - The problem with link.rpartition('//')[-1] in your code is that you are trying to match // and not /. So remove the extra / as in link.rpartition('/')[-1].
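A quick sketch of the difference between the two separators, using the URL from the question (variable names are illustrative):

```python
s = 'https://docs.python.org/3.4/tutorial/interpreter.html'

# rpartition splits at the LAST occurrence of the separator and returns
# (head, separator, tail). In this URL the only '//' is right after
# 'https:', so the tail still contains the whole path.
wrong = s.rpartition('//')[-1]

# Splitting at the last single '/' leaves just the file name as the tail.
right = s.rpartition('/')[-1]

# With no separator present, rpartition returns ('', '', s),
# so [-1] is simply the original string.
no_slash = 'interpreter.html'.rpartition('/')[-1]

print(wrong)     # docs.python.org/3.4/tutorial/interpreter.html
print(right)     # interpreter.html
print(no_slash)  # interpreter.html
```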
