Parsing a URL in Python with a changing part in it


Question

I'm parsing a URL in Python; a sample URL and my code are below. I want to split the part number (74743) out of the URL and write a for loop that substitutes each value from a list of parts in its place. I tried urlparse but couldn't finish, mostly because other parts of the URL keep changing. I just want the easiest and fastest way to do this.

Sample URL:

http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is=

(http://example.com/wps/portal) Always fixed

(lYuxDoIwGAYf6f9aqKSjMNQ) Always changing

(74743) Will be taken from a list named Parts

(IntNumberOf=&is=) Also changes, depending on the section of the website

Here is the code:

from lxml import html
import requests
import urlparse


Parts = [74743, 85731, 93021]

url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='

parsing = urlparse.urlsplit(url)

print parsing

Recommended answer

>>> import urlparse

>>> url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='

>>> split_url = urlparse.urlsplit(url)
>>> split_url.path
'/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/'

You can split the path into a list of strings on '/', slice the list, and re-join it:

>>> path = split_url.path
>>> path.split('/')
['', 'wps', 'portal', 'lYuxDoIwGAYf6f9aqKSjMNQ', '']

Slice off the last two items (the changing segment and the empty string left by the trailing slash):

>>> path.split('/')[:-2]
['', 'wps', 'portal']

Then re-join:

>>> '/'.join(path.split('/')[:-2])
'/wps/portal'

To parse the query, use parse_qs:

>>> urlparse.parse_qs(split_url.query)
{'PartNo': ['74743']}

By default parse_qs drops parameters with empty values. To keep them, pass keep_blank_values=True:

>>> query = urlparse.parse_qs(split_url.query, keep_blank_values=True)
>>> query
{'PartNo': ['74743'], 'is': [''], 'IntNumberOf': ['']}
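
For comparison, here is a minimal Python 3 sketch of the same parsing step, assuming `urllib.parse` in place of the Python 2 `urlparse` module. It also shows `parse_qsl`, which returns a list of pairs in their original order and may be convenient when parameter order matters; the variable names are illustrative only.

```python
from urllib.parse import urlsplit, parse_qs, parse_qsl

url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='
query_string = urlsplit(url).query

# Dictionary form: each key maps to a list of values.
as_dict = parse_qs(query_string, keep_blank_values=True)

# List-of-pairs form: keeps the parameters in the order they appear.
as_pairs = parse_qsl(query_string, keep_blank_values=True)

print(as_dict)   # {'PartNo': ['74743'], 'IntNumberOf': [''], 'is': ['']}
print(as_pairs)  # [('PartNo', '74743'), ('IntNumberOf', ''), ('is', '')]
```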

You can then modify the query dictionary:

>>> query['PartNo'] = 85731

And update the original split_url with the new path segment and query:

>>> import urllib
>>> updated = split_url._replace(path='/'.join(path.split('/')[:-2] +
...                                            ['ASDFZXCVQWER', '']),
...                              query=urllib.urlencode(query, doseq=True))

>>> urlparse.urlunsplit(updated)
'http://example.com/wps/portal/ASDFZXCVQWER/?PartNo=85731&IntNumberOf=&is='
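
The question also asks for a loop over the Parts list. One way to combine the steps above is sketched below, again assuming Python 3's `urllib.parse`; `url_for_part` is a hypothetical helper name, and the sketch leaves the path untouched (the changing segment could be swapped in with `_replace(path=...)` as shown above).

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode


def url_for_part(url, part_no):
    """Return `url` with its PartNo query parameter set to `part_no`."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves empty parameters such as IntNumberOf=
    query = parse_qs(parts.query, keep_blank_values=True)
    query['PartNo'] = [str(part_no)]
    # doseq=True encodes each list element as its own key=value pair
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


Parts = [74743, 85731, 93021]
url = 'http://example.com/wps/portal/lYuxDoIwGAYf6f9aqKSjMNQ/?PartNo=74743&IntNumberOf=&is='

urls = [url_for_part(url, part_no) for part_no in Parts]
for u in urls:
    print(u)
```

Each resulting URL can then be passed to `requests.get` as in the question's code. Because dictionaries preserve insertion order in Python 3.7 and later, the parameters come out in the same order as in the original URL.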
