Capture JavaScript alert text using BeautifulSoup


Question

I am using this JavaScript to validate a form:

<script type="text/javascript">
    function validateForm()
    {
        var a = document.forms["orderform"]["Name"].value;
        var b = document.forms["orderform"]["Street"].value;
        var c = document.forms["orderform"]["ZIP"].value;
        var d = document.forms["orderform"]["City"].value;
        var e = document.forms["orderform"]["PhoneNumber"].value;
        if (
            a == null || a == "" ||
            b == null || b == "" ||
            c == null || c == "" ||
            d == null || d == "" ||
            e == null || e == ""
        )
        {
            alert("Please fill all the required fields.");
            return false;
        }
    }
</script>

I am trying to capture the alert text using BeautifulSoup:

import re
from bs4 import BeautifulSoup

with open("index.html") as fp:
  soup = BeautifulSoup(fp, "lxml")

for script in soup.find_all(re.compile("(?<=alert\(\").+(?=\")")):
  print(script)

This does not return anything. It is based on the example in the BeautifulSoup documentation, under 'A regular expression', that finds tags whose names start with 'b':

import re
for tag in soup.find_all(re.compile("^b")):
    print(tag.name)
# body
# b

However, I cannot find the equivalent of 'print(tag.name)' that would print the alert text. Or am I completely on the wrong track? Any help is much appreciated.

I have also tried:

pattern = re.compile("(?<=alert\(\").+(?=\")")
for script in soup.find_all ('script'):
  print(script.pattern)

This returns None.

Recommended answer

Running the regular expression over the whole HTML document through find_all will not work, because find_all applies a compiled pattern to tag names rather than to the text inside tags. First extract the script element, then search its text for the alert message.
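
To illustrate why both attempts in the question come back empty, here is a minimal sketch. It assumes a small inline page in place of index.html and uses the built-in "html.parser" instead of lxml so that it runs without extra dependencies; the variable names are only illustrative.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical inline page standing in for index.html
html = """
<html><body>
<script type="text/javascript">
function validateForm() {
    alert("Please fill all the required fields.");
    return false;
}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
pattern = re.compile(r'(?<=alert\(").+(?=")')

# A regex passed as the first argument of find_all is tested against
# tag NAMES ("html", "body", "script"), never against the text in a tag.
by_name = soup.find_all(pattern)

# Attribute access on a Tag looks for a child tag with that name, so
# script.pattern searches for a <pattern> element and finds none.
script = soup.find("script")
child_lookup = script.pattern

# For comparison: a name regex that does match one of the tag names
names = [tag.name for tag in soup.find_all(re.compile("^s"))]

print(by_name)       # []
print(child_lookup)  # None
print(names)         # ['script']
```

Neither attempt ever looks at the JavaScript source itself, which is why the answer's code searches the text of the script element directly.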

import re
from bs4 import BeautifulSoup

with open("index.html") as fp:
  soup = BeautifulSoup(fp, "lxml")

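# Grab the first <script> element; its text is the JavaScript source.
# Calling .extract() is not needed, since the tag does not have to be
# removed from the tree in order to read it.
script = soup.find("script")

# Find every double-quoted alert("...") message in the script source
alert = re.findall(r'(?<=alert\(\").+(?=\")', script.text)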
print(alert)

Output:

['Please fill all the required fields.']
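
The answer above reads only the first script element and expects a double-quoted message. As a hedged extension, the sketch below scans every script element and accepts either quote style. The helper name extract_alerts, the sample page, and the use of "html.parser" are assumptions made for illustration. It does not handle escaped quotes or concatenated strings inside the alert call.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical page with an external script and two inline scripts
html = """
<html><head>
<script src="lib.js"></script>
<script>
function validateForm() {
    alert("Please fill all the required fields.");
    return false;
}
</script>
<script>
function warn() { alert('ZIP must be numeric.'); }
</script>
</head><body></body></html>
"""

# Matches alert("...") or alert('...'); the backreference \1 requires
# the closing quote to be the same character as the opening quote.
ALERT_RE = re.compile(r"""alert\(\s*(["'])(.*?)\1\s*\)""")


def extract_alerts(markup):
    """Return the alert messages found in all inline scripts, in order."""
    soup = BeautifulSoup(markup, "html.parser")
    messages = []
    for script in soup.find_all("script"):
        # .string is None for scripts without inline content (e.g. src=...)
        source = script.string or ""
        messages.extend(m.group(2) for m in ALERT_RE.finditer(source))
    return messages


print(extract_alerts(html))
# ['Please fill all the required fields.', 'ZIP must be numeric.']
```

Regular expressions only see alert calls whose message is a literal string. Capturing an alert that is built at runtime would require actually executing the page in a browser, which BeautifulSoup does not do.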
