Python Beautifulsoup get_text() not getting all text

Views: 826 Published: 2018/6/22 19:36:39 Tags: python html python-2.7 beautifulsoup urllib2

Problem description

I'm trying to get all the text from an HTML tag using BeautifulSoup's get_text() method. I use Python 2.7 and BeautifulSoup 4.4.0. It works most of the time. However, sometimes this method returns only the first paragraph of a tag, and I can't figure out why. Please see the following example.

from bs4 import BeautifulSoup
import urllib2

job_url = "http://www.indeed.com/viewjob?jk=0f5592c8191a21af"
site = urllib2.urlopen(job_url).read()
soup = BeautifulSoup(site, "html.parser")
text = soup.find("span", {"class": "summary"}).get_text()
print text

I want to get all the content from this Indeed job description. Basically, I want to get all the text in the tag matched by the code above. However, using that code, I only get "Please note that this is a 1 year contract assignment. Candidates cannot start an assignment until background check and drug test is completed". Why am I losing the rest of the text? How can I get all the text from this tag without specifying sub-tags?

Thanks a lot.

Solution

Try a different parser, such as lxml, instead of html.parser:

Replace:

soup = BeautifulSoup(site, "html.parser")

with:

soup = BeautifulSoup(site, "lxml")

Make sure you have the lxml parser installed first: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
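
To check whether the parser is what cuts the text short, one option is to run the same lookup under each installed parser and compare how much text comes back. The following is only a minimal sketch: `SAMPLE_HTML` and `summary_text_by_parser` are hypothetical names that do not appear in the original answer, and the sample markup is a stand-in for the real page, which is not reproduced here.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup; it stands in for the downloaded page.
SAMPLE_HTML = """
<span class="summary">
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</span>
"""


def summary_text_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: extracted text} for every parser that is installed.

    The value is None when the span with class "summary" is not found.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The parser library is not installed, so skip it.
            continue
        tag = soup.find("span", {"class": "summary"})
        results[name] = tag.get_text() if tag is not None else None
    return results


if __name__ == "__main__":
    for name, text in summary_text_by_parser(SAMPLE_HTML).items():
        length = len(text) if text is not None else 0
        print("%s: %d characters" % (name, length))
```

On a real page, `SAMPLE_HTML` would be replaced by the downloaded HTML (the `site` variable in the question). If one parser returns noticeably less text than the others, the markup is probably malformed in a way that parser handles differently.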
