How to use Beautiful Soup to extract a string in a <script> tag?
Problem description
In a given .html page, I have a script tag like so:
<script>jQuery(window).load(function () {
    setTimeout(function(){
        jQuery("input[name=Email]").val("name@email.com");
    }, 1000);
});</script>
How can I use Beautiful Soup to extract the email address?
Recommended answer
To add a bit more to @Bob's answer: suppose you also need to locate the right script tag in HTML that may contain other script tags.
The idea is to define a single regular expression that is used both for locating the element with BeautifulSoup and for extracting the email value:
import re

from bs4 import BeautifulSoup

data = """
<body>
<script>jQuery(window).load(function () {
setTimeout(function(){
jQuery("input[name=Email]").val("name@email.com");
}, 1000);
});</script>
</body>
"""

# Matches .val("<email>"); and captures the email address in group 1.
pattern = re.compile(r'\.val\("([^@]+@[^@]+\.[^@]+)"\);', re.MULTILINE | re.DOTALL)

soup = BeautifulSoup(data, "html.parser")
# Find the first <script> whose contents match the pattern.
# string= is the current name of the older text= argument (bs4 4.4.0+).
script = soup.find("script", string=pattern)
if script:
    match = pattern.search(script.text)
    if match:
        email = match.group(1)
        print(email)
Prints name@email.com.
Here we use a simple regular expression for the email address. It could be made stricter, but I doubt that is necessary in practice for this problem.
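For readers who do want a tighter match, here is a minimal sketch of one way the pattern could be made stricter. The names `STRICT_VAL_EMAIL` and `extract_emails` are hypothetical, and the rules it applies are assumptions chosen for illustration, not full email validation: either quote style is accepted, the value may contain no whitespace, quotes, or a second `@`, and the domain must have at least one dot. It uses only the standard `re` module and works on the script text once it has been located.

```python
import re

# Hypothetical stricter pattern (an assumption, not full email validation):
#   group 1 = the opening quote, which must be matched by the closing quote
#   group 2 = local@domain.tld with no whitespace, quotes, or extra "@"
STRICT_VAL_EMAIL = re.compile(
    r'''\.val\(\s*(["'])([^\s"'@]+@[^\s"'@.]+(?:\.[^\s"'@.]+)+)\1\s*\)'''
)


def extract_emails(script_text):
    """Return every email-like string passed to a .val(...) call."""
    return [m.group(2) for m in STRICT_VAL_EMAIL.finditer(script_text)]


script_text = '''jQuery(window).load(function () {
setTimeout(function(){
jQuery("input[name=Email]").val("name@email.com");
}, 1000);
});'''

print(extract_emails(script_text))
```

Because the compiled pattern is an ordinary `re` object, it could presumably be swapped in for `pattern` in the answer's code, with `match.group(2)` in place of `match.group(1)`.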