BeautifulSoup tag is type bs4.element.NavigableString and bs4.element.Tag
Question
I'm trying to scrape a table in a Wikipedia article, and the elements I get when iterating over the table alternate between <class 'bs4.element.Tag'> and <class 'bs4.element.NavigableString'>.
import requests
import bs4
import lxml
resp = requests.get('https://en.wikipedia.org/wiki/List_of_municipalities_in_Massachusetts')
soup = bs4.BeautifulSoup(resp.text, 'lxml')
munis = soup.find(id='mw-content-text')('table')[1]
for muni in munis:
    print type(muni)
    print '============'
This produces the following output:
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
============
<class 'bs4.element.Tag'>
============
<class 'bs4.element.NavigableString'>
...
When I try to retrieve muni.contents, I get the error AttributeError: 'NavigableString' object has no attribute 'contents'.
What am I doing wrong? How do I get the bs4.element.Tag object for each muni?
(Using Python 2.7).
Recommended answer
#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''
import requests
import bs4
from bs4 import BeautifulSoup

html = requests.get('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
soup = BeautifulSoup(html.text, 'lxml')
# The siblings after the header row are a mix of <tr> tags and the
# whitespace strings that sit between them in the page source.
symbolslist = soup.find('table').tr.next_siblings
for sec in symbolslist:
    # Keep only real tags; NavigableString nodes have no .contents.
    if isinstance(sec, bs4.element.Tag):
        print(sec.get_text())
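The type check in the answer is needed because iterating over a Tag yields all of its direct children, and the newlines between rows in the page source are parsed as NavigableString nodes. The following is a minimal offline sketch of that behaviour, not the answerer's code. The inline HTML and the names HTML, rows_by_type, rows_by_search and row_cells are made up for illustration, and 'html.parser' is used instead of 'lxml' only so that the example has no extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical stand-in for the Wikipedia table. The newlines between
# the rows matter: each one becomes a NavigableString child of <table>.
HTML = """<table>
<tr><td>Abington</td><td>Town</td></tr>
<tr><td>Acton</td><td>Town</td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# Iterating a Tag walks its direct children, text nodes included.
child_types = [type(child).__name__ for child in table]

# Option 1: filter the children, keeping only real tags.
rows_by_type = [child for child in table if isinstance(child, Tag)]

# Option 2: search for the row tags directly; find_all only returns tags.
rows_by_search = table.find_all("tr")


def row_cells(row):
    """Return the stripped text of every <td> cell in one row."""
    return [cell.get_text(strip=True) for cell in row.find_all("td")]


cells = [row_cells(row) for row in rows_by_search]

print(child_types)
print(cells)
```

For the code in the question, the second option is probably the simpler fix: looping over munis.find_all('tr') instead of over munis itself should give only Tag objects, so muni.contents is always available.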