BeautifulSoup - AttributeError: 'NavigableString' object has no attribute 'find_all'

Problem description

I'm trying to get this script to iterate through the HTML file and print out the desired results, but it keeps giving me the error below. It works fine with only one "game" in the table, but it breaks if there is more than one. I'm trying to fix it so it can iterate over more than one game/parking ticket, but I can't get past this error.

Traceback (most recent call last):
  File "C:/Users/desktop/Desktop/tabletest.py", line 11, in <module>
    for rows in table.find_all('tr'):
  File "C:\Program Files\Python36\lib\site-packages\bs4\element.py", line 737, in __getattr__
    self.__class__.__name__, attr))
AttributeError: 'NavigableString' object has no attribute 'find_all'

Here is my code:

import pandas as pd
from bs4 import BeautifulSoup
import requests
import lxml.html as lh


with open("htmltabletest.html", encoding="utf-8") as f:
    data = f.read()
    soup = BeautifulSoup(data, 'lxml')
    for table in soup.find('table', attrs={'id': 'eventSearchTable'}):
        for rows in table.find_all('tr'):
            cols = table.find_all('td')

            empty = cols[0].get_text()
            eventdate = cols[1].get_text()
            eventname = cols[2].get_text()
            tickslisted = cols[3].get_text()
            pricerange = cols[4].get_text()

            entry = (empty, eventdate, eventname, tickslisted, pricerange)

            print(entry)

This is what's in the HTML file:

<table class="dataTable st-alternateRows" id="eventSearchTable">
<thead>
<tr>
<th id="th-es-rb"><div class="dt-th"> </div></th>
<th id="th-es-ed"><div class="dt-th"><span class="th-divider"> </span>Event date<br/>Time (local)</div></th>
<th id="th-es-en"><div class="dt-th"><span class="th-divider"> </span>Event name<br/>Venue</div></th>
<th id="th-es-ti"><div class="dt-th"><span class="th-divider"> </span>Tickets<br/>listed</div></th>
<th id="th-es-pr"><div class="dt-th es-lastCell"><span class="th-divider"> </span>Price<br/>range</div></th>
</tr>
</thead>
<tbody class="" id="eventSearchTbody"><tr class="even" id="r-se-103577924">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577924-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577924-eventDateTime">Thu, 10/11/2018<br/>8:20 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577924&amp;sectionId=0" id="se-103577924-eventName" target="_blank">Philadelphia Eagles at New York Giants</a></div><div id="se-103577924-venue">MetLife Stadium, East Rutherford, NJ</div></td>
<td id="se-103577924-nrTickets">6655</td>
<td class="es-lastCell nowrap" id="se-103577924-priceRange"><span id="se-103577924-minPrice">$134.50</span>  to<br/><span id="se-103577924-maxPrice">$2,222.50</span></td>
</tr><tr class="odd" id="r-se-103577925">
<td class="nowrap"><input class="es-selectedEvent" id="se-103577925-check" name="selectEvent" type="radio"/></td>
<td class="nowrap" id="se-103577925-eventDateTime">Thu, 10/11/2018<br/>8:21 p.m.</td>
<td><div><a class="ellip" href="services/priceanalysis?eventId=103577925&amp;sectionId=0" id="se-103577925-eventName" target="_blank">PARKING PASSES ONLY Philadelphia Eagles at New York Giants</a></div><div id="se-103577925-venue">MetLife Stadium Parking Lots, East Rutherford, NJ</div></td>
<td id="se-103577925-nrTickets">929</td>
<td class="es-lastCell nowrap" id="se-103577925-priceRange"><span id="se-103577925-minPrice">$20.39</span>  to<br/><span id="se-103577925-maxPrice">$3,602.50</span></td>
</tr></tbody>
</table>

Recommended answer

The error lies in the way you iterate over the table, more specifically at this line:

for table in soup.find('table', attrs={'id': 'eventSearchTable'}):

You should use find_all if you want to iterate. Indeed, look at the type of the value returned by each of the two methods:

print(type(soup.find('table', attrs={'id': 'eventSearchTable'})))
# <class 'bs4.element.Tag'>
print(type(soup.find_all('table', attrs={'id': 'eventSearchTable'})))
# <class 'bs4.element.ResultSet'>

In the first case you get a single table (a Tag); in the second you get a set of tables (just one in your case), each of type bs4.element.Tag. Iterating directly over a Tag yields its child nodes, and those include the NavigableString whitespace between the tags, which is why find_all fails inside your loop.
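To see where the NavigableString comes from, here is a minimal sketch. The HTML is a hypothetical, trimmed-down stand-in for the table in the question, and it uses the built-in "html.parser" rather than lxml only to avoid an extra dependency; both are assumptions made for illustration.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Hypothetical, trimmed-down stand-in for the table in the question.
html = """<table id="eventSearchTable">
<thead><tr><th>Event</th></tr></thead>
<tbody><tr><td>Game</td></tr></tbody>
</table>"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"id": "eventSearchTable"})

# Iterating over a Tag walks its direct children, not a list of tables.
child_types = [type(child).__name__ for child in table]
print(child_types)

# The newline right after the opening <table> tag is a NavigableString child,
# so calling find_all on it raises the AttributeError from the question.
first_child = next(iter(table))
print(isinstance(first_child, NavigableString))
```

The children alternate between whitespace strings and real tags (thead, tbody), so the loop in the question fails as soon as it reaches a whitespace node.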

Thus, you have two options. Either use find and drop the outer for loop:

table = soup.find('table', attrs={'id': 'eventSearchTable'})

or keep the loop and iterate over the result of find_all:

for table in soup.find_all("table", {"id":"eventSearchTable"}):
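Putting the first option together, a possible version of the full loop might look like the sketch below. It goes beyond the answer in two ways, both of which are assumptions on my part: the cells are looked up on each row (row.find_all) instead of on the whole table, since otherwise every row would print the first game's cells, and rows with fewer than five td cells (such as the header row) are skipped. The helper name parse_events and the inline, trimmed HTML are hypothetical and used only to keep the example self-contained.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the two data rows from the question (attributes shortened).
html = """
<table class="dataTable st-alternateRows" id="eventSearchTable">
<thead><tr><th> </th><th>Event date</th><th>Event name</th><th>Tickets</th><th>Price</th></tr></thead>
<tbody id="eventSearchTbody"><tr class="even">
<td><input name="selectEvent" type="radio"/></td>
<td>Thu, 10/11/2018<br/>8:20 p.m.</td>
<td><div><a>Philadelphia Eagles at New York Giants</a></div><div>MetLife Stadium, East Rutherford, NJ</div></td>
<td>6655</td>
<td><span>$134.50</span>  to<br/><span>$2,222.50</span></td>
</tr><tr class="odd">
<td><input name="selectEvent" type="radio"/></td>
<td>Thu, 10/11/2018<br/>8:21 p.m.</td>
<td><div><a>PARKING PASSES ONLY Philadelphia Eagles at New York Giants</a></div><div>MetLife Stadium Parking Lots, East Rutherford, NJ</div></td>
<td>929</td>
<td><span>$20.39</span>  to<br/><span>$3,602.50</span></td>
</tr></tbody>
</table>
"""


def parse_events(markup):
    """Return one tuple of five cell texts per data row (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # find returns a single Tag (or None), so there is no outer loop here.
    table = soup.find("table", attrs={"id": "eventSearchTable"})
    entries = []
    if table is None:
        return entries
    for row in table.find_all("tr"):
        # Search within the current row, not the whole table.
        cols = row.find_all("td")
        if len(cols) < 5:
            continue  # header row uses <th>, so it has no <td> cells
        entries.append(tuple(col.get_text(" ", strip=True) for col in cols[:5]))
    return entries


entries = parse_events(html)
for entry in entries:
    print(entry)
```

With the file from the question, the same function could be called on the contents read from htmltabletest.html, passing 'lxml' as the parser if it is installed.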
