beautifulsoup 4: Segmentation fault (core dumped)

Problem description

I crawled the following page:

http://www.nasa.gov/topics/earth/features/plains-tornadoes-20120417.html

But I got Segmentation fault (core dumped) when calling BeautifulSoup(page_html), where page_html is the content returned by the requests library. Is this a bug in BeautifulSoup? Is there any way to get around it? Even an approach like try...except would help me get my code running. Thanks in advance.

The code is as follows:

import requests
from bs4 import BeautifulSoup

toy_url = 'http://www.nasa.gov/topics/earth/features/plains-tornadoes-20120417.html'
res = requests.get(toy_url, headers={"User-Agent": "Firefox/12.0"})
page = res.content
soup = BeautifulSoup(page)

Solution

This problem is caused by a bug in lxml, which was fixed in lxml 2.3.5. You can upgrade lxml, or use Beautiful Soup with the html5lib parser or Python's built-in html.parser instead.
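
For reference, here is a minimal sketch of the parser-selection workaround, reusing the URL and header from the question. The choice of "html.parser" (Python's built-in parser) is an assumption about your environment; "html5lib" works the same way if that package is installed. The point is simply that naming a parser keeps Beautiful Soup from handing the document to the buggy lxml version:

import requests
from bs4 import BeautifulSoup

toy_url = 'http://www.nasa.gov/topics/earth/features/plains-tornadoes-20120417.html'
res = requests.get(toy_url, headers={"User-Agent": "Firefox/12.0"})

# Name the parser explicitly instead of letting Beautiful Soup pick lxml.
# "html.parser" ships with Python; swap in "html5lib" after `pip install html5lib`.
soup = BeautifulSoup(res.content, "html.parser")
print(soup.title)

If you would rather keep using lxml, you can check which release is installed with lxml.etree.LXML_VERSION (a tuple such as (2, 3, 5, 0)) and upgrade if it is older than 2.3.5.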
