BeautifulSoup4: Ampersand in text

Question

I have a problem using BeautifulSoup4... (I'm quite a Python/BeautifulSoup newbie, so forgive me if I'm being dumb.)

Why does the following code:

from bs4 import BeautifulSoup

# Same markup twice: the first has a bare "&" in the second <option>,
# the second spells it out as "and".
soup_ko = BeautifulSoup('<select><option>foo</option><option>bar & baz</option><option>qux</option></select>')
soup_ok = BeautifulSoup('<select><option>foo</option><option>bar and baz</option><option>qux</option></select>')

print(soup_ko.find_all('option'))
print(soup_ok.find_all('option'))

produce the following output:

[<option>foo</option>, <option>bar &amp; baz</option>]
[<option>foo</option>, <option>bar and baz</option>, <option>qux</option>]

I was expecting the same result, a list of my three options, but BeautifulSoup seems to dislike the ampersand in the text. How can I work around this and get the correct list without editing my HTML (or transforming/converting it first)?

Thanks,

Edit: It seems to be a bug in 4.2.0. I downloaded both the 4.2.0 and 4.2.1 releases (from http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.0.tar.gz and http://www.crummy.com/software/BeautifulSoup/bs4/download/4.2/beautifulsoup4-4.2.1.tar.gz), unpacked them in my script folder, and changed my code to:

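import sys

# Usage: python stack.py 4.2.0
# Puts ./beautifulsoup4-<version>/ (an unpacked release tarball, relative to
# the current directory) first on the import path, so that copy of bs4 is used.
sys.path.insert(0, "beautifulsoup4-" + sys.argv[1])
from bs4 import BeautifulSoup, __version__

print("Beautiful Soup %s" % __version__)
soup_ko = BeautifulSoup('<select><option>foo</option><option>bar & baz</option><option>qux</option></select>')
print(soup_ko.find_all('option'))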

and got these results:

15:24:38 pataluc ~ % python stack.py 4.2.0
Beautiful Soup 4.2.0
[<option>foo</option>, <option>bar &amp; baz</option>]
15:24:41 pataluc ~ % python stack.py 4.2.1
Beautiful Soup 4.2.1
[<option>foo</option>, <option>bar &amp; baz</option>, <option>qux</option>]
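To check an installed copy without juggling unpacked tarballs, the same markup can be turned into a small regression check. This is a minimal sketch; `SAMPLE_HTML`, `EXPECTED`, `option_texts`, and `has_ampersand_bug` are hypothetical names. The `parser` argument is passed straight through as Beautiful Soup's `features` argument; leaving it as `None` mirrors the question's code by letting Beautiful Soup choose a parser itself (recent releases emit a warning when you do that).

```python
# Markup from the question: the second <option> contains a bare "&".
SAMPLE_HTML = ('<select><option>foo</option><option>bar & baz</option>'
               '<option>qux</option></select>')

# Text of the three options a correct parse should produce.
EXPECTED = ["foo", "bar & baz", "qux"]


def option_texts(html, parser=None):
    """Return the text of every <option> Beautiful Soup finds in `html`.

    `parser` is passed as BeautifulSoup's `features` argument
    (e.g. "html.parser", "lxml", "html5lib"); None lets bs4 pick one.
    """
    # Imported here so merely loading this module does not require bs4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, parser)
    # The stored text is "bar & baz"; "&amp;" only appears when a tag
    # is printed back out as HTML, as in the output above.
    return [option.get_text() for option in soup.find_all("option")]


def has_ampersand_bug(parser=None):
    """True if this bs4 install does not return all three option texts."""
    return option_texts(SAMPLE_HTML, parser) != EXPECTED
```

On a fixed release `has_ampersand_bug()` should return `False`; judging by the 4.2.0 run above, which stopped after the second option, that release would have returned `True`. Passing each parser name in turn shows whether a problem is tied to one particular tree builder.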

So I guess my question is closed. Thanks to everyone whose comments made me realize it was a version issue.

Recommended answer

As I said in the edited question, it was a bug in BeautifulSoup 4.2.0; I downloaded 4.2.1 and the bug is gone.
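If a script must never run on the release that misbehaved, a small version guard can fail fast instead. This is a minimal sketch that assumes only what the thread reports: 4.2.1 parsed the sample correctly and 4.2.0 did not. Releases before 4.2.0 were not tested here, so it simply requires 4.2.1 or later. `MIN_BS4_VERSION`, `version_tuple`, `is_new_enough`, and `require_min_bs4` are hypothetical names; `bs4.__version__` is the same attribute the question's script prints.

```python
import re

# 4.2.1 is the release this thread reports as parsing the sample correctly;
# versions before 4.2.0 were not tested here, so require 4.2.1 or newer.
MIN_BS4_VERSION = (4, 2, 1)


def version_tuple(version):
    """Convert '4.2.1' or '4.13.0b2' into a comparable tuple like (4, 2, 1).

    Only the first three dot-separated components are used, and any
    non-numeric suffix on a component (such as 'b2') is dropped.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions such as '4.0'
    return tuple(parts)


def is_new_enough(version, minimum=MIN_BS4_VERSION):
    """True if `version` is at least `minimum`."""
    return version_tuple(version) >= minimum


def require_min_bs4():
    """Raise RuntimeError if the installed Beautiful Soup is too old."""
    import bs4  # imported lazily so the helpers above work without bs4

    if not is_new_enough(bs4.__version__):
        raise RuntimeError(
            "Beautiful Soup %s is older than %s; please upgrade."
            % (bs4.__version__, ".".join(map(str, MIN_BS4_VERSION)))
        )
```

The versions are compared as integer tuples on purpose: compared as plain strings, "4.12.3" sorts before "4.2.1", so a naive `__version__ >= "4.2.1"` check would reject newer releases.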
