How to get BeautifulSoup 4 to respect a self-closing tag?

Question

This question is specific to BeautifulSoup4, which makes it different from the previous questions:

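Why is BeautifulSoup modifying my self-closing elements? (http://stackoverflow.com/questions/1567402/why-is-beautifulsoup-modifying-my-self-closing-elements)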

selfClosingTags in BeautifulSoup

Since BeautifulStoneSoup (the previous XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example:

import bs4   
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])

print soup.prettify()

This does not self-close the bar tag, but it does give a hint in the form of a warning. What is the tree builder that bs4 is referring to, and how do I self-close the tag?

/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
  "BS4 does not respect the selfClosingTags argument to the "
<html>
 <body>
  <foo>
   <bar a="3">
   </bar>
  </foo>
 </body>
</html>

Solution

To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor. This selects an XML tree builder instead of the default HTML one.

soup = bs4.BeautifulSoup(S, 'xml')

You'll need to have lxml installed, because the "xml" feature is provided by lxml's XML parser.
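As a minimal sketch of the lxml requirement, the snippet below uses Python 3 syntax (the snippets in this thread use the Python 2 print statement) and catches bs4.FeatureNotFound, which bs4 raises when no installed parser provides the requested feature. The helper name parse_xml is hypothetical and used only for illustration.

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''


def parse_xml(markup):
    """Parse markup with the "xml" feature; return None if lxml is unavailable.

    parse_xml is an illustrative helper, not part of bs4.
    """
    try:
        return bs4.BeautifulSoup(markup, "xml")
    except bs4.FeatureNotFound:
        # Raised when no installed parser provides the requested feature.
        return None


soup = parse_xml(S)
if soup is None:
    print("lxml is not installed; try: pip install lxml")
else:
    print(soup.prettify())
```

Returning None is just one way to surface the missing dependency; letting the exception propagate is equally reasonable in a script.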

You don't need to pass selfClosingTags anymore:

In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print soup.prettify()
<?xml version="1.0" encoding="utf-8"?>
<foo>
 <bar a="3"/>
</foo>
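To illustrate that the parser (the tree builder) decides how an empty tag is written back out, here is a small Python 3 sketch that serializes the same string with the standard-library "html.parser" and with "xml". The helper name render is hypothetical; the "xml" result is None when lxml is not installed.

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''


def render(markup, feature):
    """Return markup re-serialized by the given parser, or None if unavailable.

    render is an illustrative helper, not part of bs4.
    """
    try:
        return str(bs4.BeautifulSoup(markup, feature))
    except bs4.FeatureNotFound:
        return None


html_out = render(S, "html.parser")  # stdlib-backed HTML parser
xml_out = render(S, "xml")           # requires lxml

# The HTML builder does not know "bar" as a void element, so it writes
# an explicit closing tag; the XML builder keeps the empty-element form.
print("html.parser:", html_out)
print("xml:        ", xml_out)
```

With the HTML parser the bar element comes back with a separate closing tag, while the XML parser preserves the self-closing form shown in the session above.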
