Beautiful Soup only extracts the header of a table

Views: 188 Published: 2016/8/5 19:15:24 Tags: python python-3.x beautifulsoup bs4


Question

I want to extract the information from the table on the following website using Beautiful Soup in Python 3.5.

http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT

I have to save the web page first, since my program needs to work offline.

I saved the web page on my computer and used the following code to extract the table information. The problem is that the code extracts only the header of the table.

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Saved copy of the page, opened through a file:// URL
url = "file:///Users/MD/Desktop/ZoloftPage01.html"

home_page = urlopen(url)
soup = BeautifulSoup(home_page, "html.parser")
# Find the ratings table and collect the text of every <td> in it
table = soup.find("table", attrs={"class": "ratingsTable"})
comments = [td.get_text() for td in table.find_all("td")]
print(comments)

This is the output of the code:

['RATING', '\xa0 REASON', 'SIDE EFFECTS FOR ZOLOFT', 'COMMENTS', 'SEX', 'AGE', 'DURATION/DOSAGE', 'DATE ADDED ', '\xa0']

I need the information in all of the table's rows. Thanks for your help!

Recommended answer

This happens because the page's HTML is broken. You need to switch to a more lenient parser such as html5lib (a separate package, installed for example with pip install html5lib). Here is what works for me:

from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT"
# Send a browser-like User-Agent header with the request
response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'})

# HTML parsing part: html5lib tolerates broken markup better than html.parser
soup = BeautifulSoup(response.content, "html5lib")
table = soup.find("table", attrs={"class": "ratingsTable"})
# One inner list of cell texts per <tr> row
comments = [[td.get_text() for td in row.find_all("td")]
            for row in table.find_all("tr")]
pprint(comments)
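The answer fetches the page over the network, while the question needs the program to work offline. The following is a minimal sketch, assuming the page has already been saved to disk and that html5lib is installed; it applies the same row-by-row parsing to the saved file and also normalizes the non-breaking spaces ('\xa0') that appear in the question's output. The helper name extract_rows is hypothetical, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def extract_rows(markup, parser="html5lib"):
    """Return the text of every row of the ratingsTable as a list of lists.

    extract_rows is a hypothetical helper; `parser` defaults to html5lib,
    the lenient parser suggested in the answer.
    """
    soup = BeautifulSoup(markup, parser)
    table = soup.find("table", attrs={"class": "ratingsTable"})
    if table is None:
        return []  # no ratings table in this document
    rows = []
    for tr in table.find_all("tr"):
        # str.split() with no argument also splits on '\xa0', so this
        # drops non-breaking spaces and collapses runs of whitespace
        cells = [" ".join(td.get_text().split()) for td in tr.find_all("td")]
        if any(cells):  # skip rows whose cells are all empty
            rows.append(cells)
    return rows
```

For example, with the path from the question: with open("/Users/MD/Desktop/ZoloftPage01.html", "rb") as f: rows = extract_rows(f.read()). Passing the raw bytes lets Beautiful Soup detect the page's encoding itself.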

