Python: extract the uid value from href attributes
Question
I've managed to extract the href URIs from the page source using BeautifulSoup, but I now want to extract the uid value from each of several links like these:
<a href="test.html?uid=5444974">
<a href="test.html?uid=5444972">
<a href="test.html?uid=54444972">
Help would be greatly appreciated!
Solution
>>> import re
>>> from bs4 import BeautifulSoup
>>> html
'<a href="test.html?uid=5444974">\n<a href="test.html?uid=5444972">\n<a href="test.html?uid=54444972">'
>>> soup = BeautifulSoup(html, 'html.parser')
>>> anchors = soup.find_all('a')
>>> r = re.compile(r'uid=(\d+)')  # capture the digits after "uid="
>>> uids = []
>>> for a in anchors:
...     uids.append(r.search(a['href']).group(1))
...
>>> uids
['5444974', '5444972', '54444972']
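The regex above matches "uid=" anywhere in the href and raises an AttributeError when a link has no uid. As an alternative (not the answerer's method), here is a minimal sketch that parses the query string instead, using only the standard library so it runs without BeautifulSoup installed. The names UidCollector and extract_uids are hypothetical, introduced only for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class UidCollector(HTMLParser):
    """Collects the uid query parameter from every <a href=...> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.uids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # parse_qs maps each parameter name to a list of its values;
        # links without a uid parameter contribute nothing.
        self.uids.extend(parse_qs(urlparse(href).query).get("uid", []))


def extract_uids(html):
    """Return the uid values found in anchor hrefs, in document order."""
    collector = UidCollector()
    collector.feed(html)
    collector.close()
    return collector.uids


html = (
    '<a href="test.html?uid=5444974">\n'
    '<a href="test.html?uid=5444972">\n'
    '<a href="test.html?uid=54444972">'
)
print(extract_uids(html))
```

With BeautifulSoup already in use, the same parse_qs(urlparse(a['href']).query) expression could be applied to each anchor returned by find_all('a'); links without a uid are then skipped rather than raising.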