Python: extract ID value from href source


Problem description

I've managed to extract the href URIs from the page source using BeautifulSoup, but I now want to extract the UID value from multiple instances of the example below:

e.g.

<a href="test.html?uid=5444974">
<a href="test.html?uid=5444972">
<a href="test.html?uid=54444972">

Help would be greatly appreciated!

Solution

>>> import re
>>> from bs4 import BeautifulSoup
>>> html = '<a href="test.html?uid=5444974">\n<a href="test.html?uid=5444972">\n<a href="test.html?uid=54444972">'
>>> soup = BeautifulSoup(html, 'html.parser')
>>> anchors = soup.find_all('a')
>>> r = re.compile(r'uid=(\d+)')
>>> uids = []
>>> for a in anchors:
...     uids.append(r.search(a['href']).group(1))
...
>>> uids
['5444974', '5444972', '54444972']
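
As a possible alternative to the regular expression, the query string can be parsed with the standard library's urllib.parse. This is only a sketch: the helper name extract_uid and the extra "about.html" link are hypothetical, added to show what happens when a link carries no uid (with the regex approach, r.search(...) returns None for such a link and .group(1) raises AttributeError).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_uid(href: str) -> Optional[str]:
    """Return the first `uid` query parameter of href, or None if absent."""
    query = urlparse(href).query
    values = parse_qs(query).get("uid")
    return values[0] if values else None


# The hrefs from the question; in practice they would come from
# something like [a.get("href", "") for a in soup.find_all("a")].
hrefs = [
    "test.html?uid=5444974",
    "test.html?uid=5444972",
    "test.html?uid=54444972",
    "about.html",  # hypothetical link without a uid
]

# Skip links that have no uid instead of raising an exception.
uids = [uid for uid in map(extract_uid, hrefs) if uid is not None]
print(uids)
```

Parsing the query string this way does not depend on where uid appears among other parameters, and it will not pick up a parameter whose name merely ends in "uid" (for example "guid=..."), which the pattern uid=(\d+) would also match.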
