Extract all <script> tags in an HTML page and append to the bottom of the document

Views: 325 · Published: 2016/8/5 18:56:01 · Tags: python, beautifulsoup


Question

Could someone tell me how I can extract and remove all the <script> tags in an HTML document and add them to the end of the document, right before the </body></html>? I'd like to avoid using lxml, please.

Thanks.

Recommended answer

This answer is simple and may miss many nuances. However, it should give you a general idea of how to go about it and how to improve on it. I am sure it can be improved, but you should be able to do that quickly with the help of the documentation.

Reference documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html

# Note: this import is the BeautifulSoup 3 package (Python 2 era)
from BeautifulSoup import BeautifulSoup

doc = ['<html><script type="text/javascript">document.write("Hello World!")',
       '</script><head><title>Page title</title></head>',
       '<body><p id="firstpara" align="center">This is paragraph <b>one</b>.',
       '<p id="secondpara" align="blah">This is paragraph <b>two</b>.',
       '</html>']
soup = BeautifulSoup(''.join(doc))

# findAll returns a list, so moving tags while looping over it is safe
for tag in soup.findAll('script'):
    # extract() removes the tag from the tree and returns it
    tag.extract()
    # insert at index len(contents), i.e. as the last child of <body>
    soup.body.insert(len(soup.body.contents), tag)

print(soup.prettify())

Output:

<html>
 <head>
  <title>
   Page title
  </title>
 </head>
 <body>
  <p id="firstpara" align="center">
   This is paragraph
   <b>
    one
   </b>
   .
  </p>
  <p id="secondpara" align="blah">
   This is paragraph
   <b>
    two
   </b>
   .
  </p>
  <script type="text/javascript">
   document.write("Hello World!")
  </script>
 </body>
</html>
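
The code above uses the BeautifulSoup 3 import. As a rough sketch only, the same idea on Python 3 with the bs4 package (Beautiful Soup 4) might look like the following. It assumes bs4 is installed, that the document has a <body> element, and it uses the "html.parser" builder so that lxml is not required. The function name move_scripts_to_body_end, the sample markup, and the app.js file name are made up for illustration.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4

# Sample markup for illustration; "app.js" is a hypothetical file name.
html = (
    '<html><head><title>Page title</title>'
    '<script type="text/javascript">document.write("Hello World!")</script>'
    '</head><body>'
    '<script src="app.js"></script>'
    '<p id="firstpara">This is paragraph <b>one</b>.</p>'
    '<p id="secondpara">This is paragraph <b>two</b>.</p>'
    '</body></html>'
)


def move_scripts_to_body_end(markup):
    """Return a soup in which every <script> is moved to the end of <body>.

    Hypothetical helper; assumes the markup contains a <body> element.
    """
    # "html.parser" is backed by Python's standard library, so no lxml needed
    soup = BeautifulSoup(markup, "html.parser")
    # find_all returns a list-like snapshot in document order, so the tree
    # can be modified while iterating over it
    for tag in soup.find_all("script"):
        tag.extract()          # detach the tag from its current position
        soup.body.append(tag)  # re-attach it as the last child of <body>
    return soup


soup = move_scripts_to_body_end(html)
print(soup.prettify())
```

Because the tags are moved in document order, the scripts keep their relative order at the end of <body>, which matters when one script depends on another.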
