Only get scripts out of HTML file

Views: 27 | Posted: 2021/9/23 20:37:35 | Tags: python, html, split

Question

I have a large HTML file that contains the full source of a website. I only care about the code inside <script>...</script>. Is there an easy way to pull just those lines out of the HTML file, or will I have to split the file on each <script>? I want to ignore everything before the first <script> (such as the head), the tags at the end of the file, and the tags in between, for example where the file switches from <head> to <body>.

Solution

If you want to remove all script tags:

from bs4 import BeautifulSoup
pagehtml = '''
<li> Text 1 </li>
<script>
<li> Text 2 </li>
<li> Text 3 </li>
</script>
<li> Text 4 </li>
<script>
<li> Text 5 </li>
</script>
'''
soup = BeautifulSoup(pagehtml, 'html.parser')
# extract() detaches each <script> tag from the tree and returns it
removed = [s.extract() for s in soup.find_all('script')]

>>> soup

<li> Text 1 </li>

<li> Text 4 </li>

>>>
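
The snippet above leaves the HTML with the scripts stripped out, while the question asks for the scripts themselves. A minimal sketch of that direction follows, assuming the same BeautifulSoup setup with 'html.parser'. The sample HTML and the names inline_scripts and external_sources are illustrative and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input: two inline scripts and one external script.
pagehtml = '''
<li> Text 1 </li>
<script>
var a = 1;
</script>
<li> Text 4 </li>
<script src="app.js"></script>
<script>
var b = 2;
</script>
'''

soup = BeautifulSoup(pagehtml, 'html.parser')

# Collect the source text of every inline <script>. External scripts
# (those with a src attribute) carry no inline body and are skipped.
# tag.string is None for an empty tag, hence the "or ''" fallback.
inline_scripts = [
    (tag.string or '').strip()
    for tag in soup.find_all('script')
    if not tag.has_attr('src')
]

# Collect the URLs of external scripts separately.
external_sources = [tag['src'] for tag in soup.find_all('script', src=True)]

print(inline_scripts)
print(external_sources)
```

Because find_all walks the whole document, the position of each script (head, body, or end of file) does not matter, so there is no need to split the file by hand.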
