How can I split a URL string up into separate parts in Python?


Question

I decided that I'll learn Python tonight :) I know C pretty well (I wrote an OS in it), so I'm not a noob at programming and everything in Python seems pretty easy, but I don't know how to solve this problem. Let's say I have this address:

http://example.com/random/folder/path.html

How can I create two strings from this: one containing the "base" name of the server, which in this example would be http://example.com/, and another containing everything except the last filename, which in this example would be http://example.com/random/folder/? I know I could just find the third and the last slash respectively, but maybe you know a better way :] It would also be nice to have the trailing slash in both cases, but I don't care much since it can be added easily. Does anyone have a good, fast, effective solution for this? Or is there only "my" solution of finding the slashes?

Thanks!

Recommended answer

The urlparse module in Python 2.x (or urllib.parse in Python 3.x) is the way to do it.

>>> from urllib.parse import urlparse
>>> url = 'http://example.com/random/folder/path.html'
>>> parse_object = urlparse(url)
>>> parse_object.netloc
'example.com'
>>> parse_object.path
'/random/folder/path.html'
>>> parse_object.scheme
'http'
>>>

If you want to do more work on the path of the file under the URL, you can use the posixpath module:

>>> from posixpath import basename, dirname
>>> basename(parse_object.path)
'path.html'
>>> dirname(parse_object.path)
'/random/folder'

After that, you can use posixpath.join to glue the parts together.
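To tie this back to the two strings the question asks for, here is a minimal sketch, assuming Python 3. The helper name split_url is hypothetical, and it uses urllib.parse.urlunparse (rather than posixpath.join) to reattach the scheme and host, since posixpath only knows about the path portion.

```python
from posixpath import dirname
from urllib.parse import urlparse, urlunparse


def split_url(url):
    """Return (server_base, folder_base), each ending in a slash.

    split_url is a hypothetical helper name used only for illustration.
    """
    parts = urlparse(url)
    # Keep only the scheme and host; the path is reduced to the root "/".
    server_base = urlunparse((parts.scheme, parts.netloc, "/", "", "", ""))
    # Drop the last path component, then make sure a trailing slash remains.
    folder = dirname(parts.path)
    if not folder.endswith("/"):
        folder += "/"
    folder_base = urlunparse((parts.scheme, parts.netloc, folder, "", "", ""))
    return server_base, folder_base


print(split_url("http://example.com/random/folder/path.html"))
```

Query strings and fragments are deliberately dropped here; if they matter, pass parts.query and parts.fragment through to urlunparse instead of the empty strings.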

I totally forgot that Windows users will choke on the path separator in os.path. I read the posixpath module docs, and they specifically mention URL manipulation, so all's good.
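The separator problem can be seen without a Windows machine, because the standard library exposes the Windows flavour of os.path as ntpath. This is only an illustrative sketch; the path segments are made up for the demonstration.

```python
import ntpath      # what os.path resolves to on Windows
import posixpath   # always uses "/" regardless of the host OS

# Joining URL path segments with the Windows rules inserts a backslash.
windows_style = ntpath.join("/random", "folder")
# posixpath keeps the forward slash that URLs expect.
url_style = posixpath.join("/random", "folder")

print(repr(windows_style))
print(repr(url_style))
```

Importing posixpath explicitly therefore gives the same result on every platform, which is why it is the safer choice for URL paths.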
