Getting an attribute's value using BeautifulSoup
Question
I'm writing a Python script that parses a web page and extracts the locations of its scripts. Say there are two scenarios:
<script type="text/javascript" src="http://example.com/something.js"></script>
and
<script>some JS</script>
I'm able to get the JS in the second scenario, that is, when the JS is written inside the tags.
But is there any way to get the value of src in the first scenario (i.e. extract the src attribute values of all script tags, such as http://example.com/something.js)?
Here is my code:
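#!/usr/bin/python
import requests
from bs4 import BeautifulSoup

r = requests.get("http://rediff.com/")
data = r.text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(data, "html.parser")
# Print every <script> tag found on the page
for n in soup.find_all('script'):
    print(n)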
Output: some JS
Desired output: http://example.com/something.js
Recommended answer
This gets all the src values, but only where they are present. Otherwise it skips that <script> tag.
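from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 replacement for urllib2

url = "http://rediff.com/"
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
# {"src": True} matches only <script> tags that have a src attribute
sources = soup.find_all('script', {"src": True})
for source in sources:
    print(source['src'])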
I get the following two src values as a result:
http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
http://im.rediff.com/uim/common/realmedia_banner_1_5.js
I guess this is what you want. Hope this is useful.
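For anyone who wants to try this without hitting a live site, here is a minimal offline sketch covering both scenarios from the question. The sample HTML and the helper name `split_scripts` are made up for illustration and are not part of the original answer. It relies on `tag.get("src")`, which returns None when the attribute is missing, so one loop can tell external scripts from inline ones.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, invented for this example.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>var x = 1;</script>
</head><body></body></html>
"""


def split_scripts(html):
    """Return (external_srcs, inline_bodies) for the <script> tags in html."""
    soup = BeautifulSoup(html, "html.parser")
    external, inline = [], []
    for tag in soup.find_all("script"):
        src = tag.get("src")  # None when the tag has no src attribute
        if src:
            external.append(src)
        else:
            # .string is the tag's single text child, or None if empty
            inline.append(tag.string or "")
    return external, inline


external, inline = split_scripts(SAMPLE_HTML)
print(external)
print(inline)
```

Using `tag.get("src")` rather than `tag["src"]` avoids a KeyError on inline scripts. If only the external URLs matter, the `{"src": True}` filter shown in the answer is the shorter route.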