How to use Beautiful Soup to extract a string in a <script> tag?

Question

In a given .html page, I have a script tag like so:

<script>jQuery(window).load(function () {
  setTimeout(function(){
    jQuery("input[name=Email]").val("name@email.com");
  }, 1000);
});</script>

How can I use Beautiful Soup to extract the email address?

Recommended answer

To add a bit more to @Bob's answer: this assumes you also need to locate the right script tag in HTML that may contain other script tags.

The idea is to define a single regular expression that is used both for locating the element with BeautifulSoup and for extracting the email value:

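import re

from bs4 import BeautifulSoup


data = """
<body>
    <script>jQuery(window).load(function () {
      setTimeout(function(){
        jQuery("input[name=Email]").val("name@email.com");
      }, 1000);
    });</script>
</body>
"""

# Escape the literal ".", "(" and ")" so the pattern matches the JavaScript
# call .val("...") and captures the email address in group 1.
pattern = re.compile(r'\.val\("([^@]+@[^@]+\.[^@]+)"\);', re.MULTILINE | re.DOTALL)
soup = BeautifulSoup(data, "html.parser")

# Find the first <script> whose contents match the pattern; script tags that
# do not match are skipped. (Older bs4 releases name this argument "text".)
script = soup.find("script", string=pattern)
if script:
    # Run the same pattern again to pull out the captured address.
    match = pattern.search(script.string)
    if match:
        email = match.group(1)
        print(email)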

Prints: name@email.com

Here we use a simple regular expression for the email address. It could be made stricter, but I doubt that would be practically necessary for this problem.
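For readers who do want a stricter match, here is a minimal sketch of what that might look like. The pattern `STRICT_EMAIL_VAL` and the helper `extract_email` are hypothetical names that do not appear in the original answer, and the character classes are only a rough approximation of common addresses, not a full email validator. The sketch works on the script text alone, so it uses only the standard `re` module.

```python
import re

# Hypothetical stricter pattern (an assumption, not from the original answer):
# the local part and domain are limited to common address characters, and the
# top-level domain must be at least two letters.
STRICT_EMAIL_VAL = re.compile(
    r'\.val\("([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"\);'
)


def extract_email(script_text):
    """Return the email passed to .val("..."), or None if nothing matches."""
    match = STRICT_EMAIL_VAL.search(script_text)
    return match.group(1) if match else None


sample = 'jQuery("input[name=Email]").val("name@email.com");'
print(extract_email(sample))

# A value that merely contains "@" and "." is rejected by the stricter pattern.
print(extract_email('jQuery("input[name=Note]").val("a@b, see c.d");'))
```

The first call returns the address, while the second returns None. The simpler `[^@]+@[^@]+\.[^@]+` pattern would accept the second value as well, which is the kind of false positive a stricter expression is meant to avoid.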
