Html-table scraping and exporting to csv: attribute error


Problem description




I'm trying to scrape this html table with BeautifulSoup on Python 3.6 in order to export it to csv, as in the script below. I adapted an earlier example, trying to fit my case.

from urllib.request import urlopen
from bs4 import BeautifulSoup
import csv

url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/CCOU/tipo_modello/U/cod_quadro/03'
html = urlopen(url).read()
soup = BeautifulSoup(html, "lxml")
table = soup.select_one("table.tabfin")
headers = [th.text.encode("iso-8859-1") for th in table.select("tr th")]

but I receive an AttributeError.

AttributeError: 'NoneType' object has no attribute 'select'

Then I would try to export to csv with

with open("abano_spese.csv", "w") as f:
    wr = csv.writer(f)
    wr.writerow(headers)
    wr.writerows([[td.text.encode("iso-8859-1") for td in row.find_all("td")] for row in table.select("tr + tr")])

What's wrong with this? I'm sorry if there's some stupid error, I'm an absolute beginner with python.

Thank you all

Solution

There is a problem with scraping the website of the Ministero dell'Interno. Let's try this code:

url = 'http://finanzalocale.interno.it/apps/floc.php/certificati/index/codice_ente/2050540010/cod/4/anno/2015/md/0/cod_modello/CCOU/tipo_modello/U/cod_quadro/03'

html = urlopen(url).read()
soup = BeautifulSoup(html, "lxml")
print(soup.prettify())

You get:

La sua richiesta è stata bloccata dai sistemi posti a protezione del sito web.
Si prega di assicurarsi dell'integrità della postazione utilizzata e riprovare.

(Roughly: "Your request has been blocked by the systems protecting the website. Please make sure the machine you are using is clean and try again.")

Scraping seems unwelcome, or they think there is something nasty in your request. That's why table is None in your code, and why you get an AttributeError.
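Since select_one returns None when nothing matches, a small guard makes that failure mode explicit instead of crashing later. A minimal, offline sketch (the stand-in HTML below is invented for illustration, not taken from the real page):

```python
from bs4 import BeautifulSoup

# Stand-in for a blocked response: a page without the expected table.
html = "<html><body><p>Richiesta bloccata</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# select_one returns None when nothing matches, so check before using it.
table = soup.select_one("table.tabfin")
if table is None:
    print("No table.tabfin found - the request was probably blocked.")
else:
    headers = [th.get_text(strip=True) for th in table.select("tr th")]
```

This turns the cryptic "'NoneType' object has no attribute 'select'" into a message that points at the actual cause.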

Possible solution:

**Before starting anything else, please check whether the Ministero dell'Interno's data policy allows a script to consume their data; otherwise this is not the way to get what you need.**

Step 2: you can try to pass custom headers with your request so that it looks like it comes from a browser. E.g.,

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)"}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

Now you have your soup. Note that there are 3 different <table class="tabfin"> elements in the page. I guess you need the second one:

table = soup.select("table.tabfin")[1]
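Putting the pieces together, the parsing and CSV-writing steps can be sketched as a self-contained example. The helper name tabfin_to_rows and the tiny demo HTML are illustrative assumptions, not part of the original answer; on the real page you would feed it r.text from the requests call above:

```python
import csv
from bs4 import BeautifulSoup

def tabfin_to_rows(html_text):
    """Return the rows of the second table.tabfin as lists of cell text."""
    soup = BeautifulSoup(html_text, "html.parser")
    tables = soup.select("table.tabfin")
    if len(tables) < 2:
        raise ValueError("expected table not found - request may have been blocked")
    return [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            for tr in tables[1].select("tr")]

# Tiny stand-in page with two table.tabfin elements, mimicking the layout.
demo = """
<table class="tabfin"><tr><th>ignored</th></tr></table>
<table class="tabfin">
  <tr><th>Voce</th><th>Importo</th></tr>
  <tr><td>Spese correnti</td><td>1.234</td></tr>
</table>
"""

rows = tabfin_to_rows(demo)

# Open in text mode with newline="" (a csv-module requirement) and an
# explicit encoding, instead of encoding each cell to bytes as in the question.
with open("abano_spese.csv", "w", newline="", encoding="iso-8859-1") as f:
    csv.writer(f).writerows(rows)
```

Passing encoding to open() keeps the csv writer working with ordinary strings; calling .encode() on each cell, as in the question's snippet, would write byte-literal reprs like b'...' into the file.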

In that way, it works. Excuse me if I sound a bit pedantic, but I'm afraid such an approach may not be compliant with their data license. Please check it before scraping.

