Clean URL with BeautifulSoup
Question
My script
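import sys
from BeautifulSoup import BeautifulSoup

# Path to the HTML file containing the links, passed as the first argument
url_list = sys.argv[1]
# Collect the href attribute of every <a> tag in that file
urls = [tag['href'] for tag in
        BeautifulSoup(open(url_list)).findAll('a')]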
returns
[u'http://www.youtube.com/watch?v=Gg81zi0pheg', u'http://www.youtube.com/watch?v=pP9VjGmmhfo', u'http://www.youtube.com/watch?v=yTA1u6D1fyE', u'http://www.youtube.com/watch?v=4v8HvQf4fgE', u'http://www.youtube.com/watch?v=e9zG20wQQ1U', u'http://www.youtube.com/watch?v=khL4s2bvn-8', u'http://www.youtube.com/watch?v=XTndQ7bYV0A', u'http://www.youtube.com/watch?v=xTT2MqgWRRc', u'http://www.youtube.com/watch?v=J2ZYQngwSUw', u'http://www.youtube.com/watch?v=9RZwvg7unrU', u'http://www.youtube.com/watch?v=vz3qOYWwm10', u'http://www.youtube.com/watch?v=yarv52QX_Yw', u'http://www.youtube.com/watch?v=LRREY1H3GCI']
I would like it to return this:
http://www.youtube.com/watch?v=Gg81zi0pheg
http://www.youtube.com/watch?v=pP9VjGmmhfo
http://www.youtube.com/watch?v=yTA1u6D1fyE
http://www.youtube.com/watch?v=4v8HvQf4fgE
http://www.youtube.com/watch?v=e9zG20wQQ1U
http://www.youtube.com/watch?v=khL4s2bvn-8
http://www.youtube.com/watch?v=XTndQ7bYV0A
http://www.youtube.com/watch?v=xTT2MqgWRRc
http://www.youtube.com/watch?v=J2ZYQngwSUw
http://www.youtube.com/watch?v=9RZwvg7unrU
http://www.youtube.com/watch?v=vz3qOYWwm10
http://www.youtube.com/watch?v=yarv52QX_Yw
http://www.youtube.com/watch?v=LRREY1H3GCI
I am having a really hard time wrapping my head around BeautifulSoup. Anything would help. Thank you for your time.
Recommended answer
But this is completely basic Python. You're getting a list, and you want to output it one URL per line.
for url in urls:
    print url
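For readers on Python 3, where print is a function, the same idea can be sketched as below. This is only an illustration, not part of the original answer: the sample list reuses the first three URLs from the question, and the name output is hypothetical. The u'...' prefix in the question's output is just how Python 2 displays unicode strings inside a list; printing each string on its own shows the plain URL.

```python
# Sample data: the first three URLs from the output shown in the question.
urls = [
    u'http://www.youtube.com/watch?v=Gg81zi0pheg',
    u'http://www.youtube.com/watch?v=pP9VjGmmhfo',
    u'http://www.youtube.com/watch?v=yTA1u6D1fyE',
]

# Join the list items with newlines so each URL lands on its own line.
# "output" is a hypothetical name used only for this sketch.
output = "\n".join(urls)
print(output)
```

Joining first and printing once is equivalent to the loop above for this purpose; the loop is usually easier to extend if each URL needs further processing.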