Is it possible to use a bs4 soup object with lxml?

Views: 61 | Posted: 2021/4/15 19:17:53 | Tags: beautifulsoup, lxml


Question

I am trying to use both BS4 and lxml. Instead of parsing the HTML page twice, is there any way to use the soup object in lxml, or vice versa?

self.soup = BeautifulSoup(open(path), "html.parser")

I tried using this object with lxml like this:

 doc = html.fromstring(self.soup)

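This raises TypeError: expected string or bytes-like object, because lxml's fromstring() expects markup as str or bytes, not a BeautifulSoup object.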

Is there any way to make this kind of usage work?

Recommended answer

I don't think there is a way to do this without going through a string object.

from bs4 import BeautifulSoup
import lxml.html

html = """
<html><body>
<div>
<p>test</p>
</div>
</body></html>
"""
soup = BeautifulSoup(html, 'html.parser')
# Soup to lxml.html: serialize the soup, then re-parse the string with lxml
doc = lxml.html.fromstring(soup.prettify())
print(type(doc))
print(lxml.html.tostring(doc))
# lxml.html to soup: serialize the lxml tree to bytes, then parse it with bs4
soup = BeautifulSoup(lxml.html.tostring(doc), 'html.parser')
print(type(soup))
print(soup.prettify())

Output:

<class 'lxml.html.HtmlElement'>
b'<html>\n <body>\n  <div>\n   <p>\n    test\n   </p>\n  </div>\n </body>\n</html>'
<class 'bs4.BeautifulSoup'>
<html>
 <body>
  <div>
   <p>
    test
   </p>
  </div>
 </body>
</html>
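The round trip above can be wrapped in two small helpers. This is a minimal sketch, assuming bs4 and lxml are installed; soup_to_lxml and lxml_to_soup are hypothetical names, not part of either library. It serializes with str(soup) rather than prettify(), because prettify() adds indentation inside text nodes (the lxml output above shows the paragraph text as '\n    test\n   ' instead of 'test'), which then affects XPath text matches on the lxml side.

```python
from bs4 import BeautifulSoup
import lxml.html


# Hypothetical helper: not part of bs4 or lxml.
def soup_to_lxml(soup):
    """Re-parse a BeautifulSoup tree with lxml.html via its string form."""
    # str(soup) keeps text nodes as parsed; prettify() would add indentation
    return lxml.html.fromstring(str(soup))


# Hypothetical helper: not part of bs4 or lxml.
def lxml_to_soup(element, parser="html.parser"):
    """Serialize an lxml element to bytes and parse it again with bs4."""
    return BeautifulSoup(lxml.html.tostring(element), parser)


html = "<html><body><div><p>test</p></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# lxml side: run an XPath query on the re-parsed tree
doc = soup_to_lxml(soup)
print(doc.xpath("//p/text()"))

# bs4 side: convert back and use the usual soup API
soup2 = lxml_to_soup(doc)
print(soup2.find("p").get_text())
```

The document is still parsed once per library; the helpers only make the string hand-off explicit and keep the text content unchanged across the conversion.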

Updated in response to a comment:

You can use lxml.etree to iterate through the doc object:

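from lxml import etree

# Soup to lxml.etree (the XML parser requires well-formed markup)
doc = etree.fromstring(soup.prettify())
# iter() replaces the deprecated getiterator()
for element in doc.iter():
    # element.text is None when an element has no leading text
    print("%s - %s" % (element.tag, (element.text or "").strip()))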
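etree.fromstring() uses lxml's XML parser, so it only accepts serialized soup that happens to be well-formed XML with a single root element, which the sample document satisfies. The sketch below, assuming bs4 and lxml are installed and using a made-up two-paragraph fragment, shows the XML parser rejecting HTML that has two top-level elements, while lxml.html.fromstring() accepts the same string (wrapping the fragment in a container element) and supports the same iter() traversal.

```python
from bs4 import BeautifulSoup
import lxml.html
from lxml import etree

# Made-up fragment with two top-level elements: valid HTML, not well-formed XML
soup = BeautifulSoup("<p>a</p><p>b</p>", "html.parser")

xml_ok = True
try:
    etree.fromstring(str(soup))
except etree.XMLSyntaxError as exc:
    # libxml2 rejects the content after the first root element
    xml_ok = False
    print("etree (XML parser) rejected it:", exc)

# lxml.html parses the same markup and wraps the fragment in a container
doc = lxml.html.fromstring(str(soup))
for element in doc.iter("p"):
    print("%s - %s" % (element.tag, (element.text or "").strip()))
```

For real-world pages, lxml.html is usually the safer target for the soup-to-lxml direction; lxml.etree is mainly useful when the markup is known to be XML.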

