Beautiful Soup doesn't 'get' full webpage


Question

I am using BeautifulSoup to parse a bunch of links from this page, but it isn't extracting all the links I want. To figure out why, I downloaded the HTML to "web_page.html" and ran:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("web_page.html"))  # no parser specified
print(soup.get_text())

I noticed that it doesn't print the whole web page. The output ends at Brackley. I looked at the HTML to see if something weird was happening at 'Brackley', but I couldn't find anything. Also, if I move another link into Brackley's place, it prints that link and not Brackley. Does it only read HTML files up to a certain size?

Recommended answer

Try a different parser. You are not specifying one, so Beautiful Soup picks the best parser it finds installed, which is probably the built-in html.parser. Try lxml or html5lib instead.

For more information: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
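
To see whether the parser is what cuts the page short, one option is to parse the same markup with each installed parser and compare how many links each one finds. The following is only a minimal sketch under stated assumptions: the helper name count_links and the sample markup are hypothetical and not part of the original question, and lxml and html5lib are separate packages that may not be installed, so the sketch skips any parser that is missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_links(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <a> tags found} for each installed parser.

    count_links is a hypothetical helper written for this illustration.
    """
    counts = {}
    for name in parsers:
        try:
            # Name the parser explicitly instead of letting Beautiful Soup choose.
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # This parser is not installed; skip it.
            continue
        counts[name] = len(soup.find_all("a"))
    return counts


# Hypothetical stand-in for the downloaded page (note the unclosed <li> tags).
sample = '<ul><li><a href="/a">Abingdon</a><li><a href="/b">Brackley</a></ul>'
print(count_links(sample))
```

For the page in the question, the same helper could be called on the contents of "web_page.html". If the counts differ between parsers, the parser reporting fewer links is probably stopping early or recovering differently from malformed markup, and passing the other parser's name as the second argument to BeautifulSoup is the likely fix.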
