BeautifulSoup tag is type bs4.element.NavigableString and bs4.element.Tag


Problem description

I'm trying to scrape a table in a Wikipedia article, and the elements I get when iterating over the table alternate between <class 'bs4.element.Tag'> and <class 'bs4.element.NavigableString'>.

import requests
import bs4
import lxml


resp = requests.get('https://en.wikipedia.org/wiki/List_of_municipalities_in_Massachusetts')

soup = bs4.BeautifulSoup(resp.text, 'lxml')

munis = soup.find(id='mw-content-text')('table')[1]

for muni in munis:
    print type(muni)
    print '============'

This produces the following output:

<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
...

When I try to retrieve muni.contents, I get the error AttributeError: 'NavigableString' object has no attribute 'contents'.

What am I doing wrong? How do I get a bs4.element.Tag object for each muni?

(Using Python 2.7.)

Recommended answer

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

import requests
import bs4
from bs4 import BeautifulSoup

html = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = BeautifulSoup(html.text, 'lxml')

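# Siblings of the first row: a mix of <tr> tags and the whitespace strings between them
symbolslist = soup.find('table').tr.next_siblings
for sec in symbolslist:
    # Keep only real tags; the whitespace between rows is a NavigableString
    if isinstance(sec, bs4.element.Tag):
        print(sec.get_text())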
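
The filter in the loop above is needed because iterating over a Tag (or over its next_siblings) yields every child node, including the whitespace text between rows, which BeautifulSoup represents as NavigableString. Below is a minimal sketch of that behavior. It uses a small made-up table and the built-in html.parser instead of the Wikipedia page and lxml; both are assumptions for illustration and not part of the original answer.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the Wikipedia table (not the real page content).
HTML = """<table>
<tr><td>Abington</td><td>Town</td></tr>
<tr><td>Acton</td><td>Town</td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# Iterating over a Tag walks its direct children, which include the
# newline text between the rows as NavigableString objects.
child_types = [type(child).__name__ for child in table]

# Option 1: keep only the children that are real tags.
rows_by_type = [child for child in table.children if isinstance(child, Tag)]

# Option 2: ask for the rows directly, so no text nodes are returned.
rows_by_search = table.find_all("tr")

# Each row is now a Tag, so .contents, .td, get_text() etc. are available.
names = [row.td.get_text(strip=True) for row in rows_by_search]
print(child_types)
print(names)
```

Applied to the question's code, either munis.find_all('tr') or an isinstance check inside the loop should leave only Tag objects, so .contents is available on each one.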
