Getting an attribute's value using BeautifulSoup


Question

I'm writing a Python script that parses a web page and extracts the locations of its scripts. Let's say there are two scenarios:

<script type="text/javascript" src="http://example.com/something.js"></script>

<script>some JS</script>

I'm able to get the JS in the second scenario, that is, when the JS is written inline between the tags.

But is there any way to get the value of src in the first scenario (i.e. extract the src attribute values of all script tags, such as http://example.com/something.js)?

Here is my code:

#!/usr/bin/env python3

import requests
from bs4 import BeautifulSoup

r = requests.get("http://rediff.com/")
data = r.text
soup = BeautifulSoup(data, "html.parser")
# Prints every <script> tag in full, not just its src value
for n in soup.find_all("script"):
    print(n)

Output: some JS

Desired output: http://example.com/something.js

Recommended answer

This gets the src values of all <script> tags that have one, and skips any <script> tag without a src attribute.

from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "http://rediff.com/"
page = urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")
# src=True matches only <script> tags that have a src attribute
sources = soup.find_all("script", src=True)
for source in sources:
    print(source["src"])

I got the following two src values as a result:

http://imworld.rediff.com/worldrediff/js_2_5/ws-global_hm_1.js
http://im.rediff.com/uim/common/realmedia_banner_1_5.js

I guess this is what you want. Hope this is useful.
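
To try the idea without fetching a live page, here is a minimal offline sketch that covers both scenarios from the question. The SAMPLE_HTML string and the helper names script_sources and inline_scripts are hypothetical, made up for illustration. It assumes a bs4 release recent enough to accept the src=True keyword filter, and it uses Python's built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup covering both scenarios from the question.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript" src="http://example.com/something.js"></script>
<script>var x = 1;</script>
</head><body></body></html>
"""


def script_sources(html):
    """Return the src value of every <script> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True matches only tags on which the src attribute is present.
    return [tag["src"] for tag in soup.find_all("script", src=True)]


def inline_scripts(html):
    """Return the text of every <script> tag that has no src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # tag.get("src") returns None when the attribute is missing,
    # whereas tag["src"] would raise KeyError.
    return [tag.string for tag in soup.find_all("script")
            if tag.get("src") is None]


if __name__ == "__main__":
    print(script_sources(SAMPLE_HTML))
    print(inline_scripts(SAMPLE_HTML))
```

Filtering with src=True keeps the loop free of missing-attribute checks. If you need to visit every script tag anyway, tag.get("src") is the safer accessor because it does not raise when the attribute is absent.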
