Using Python requests and Beautiful Soup to pull text
Question
Thanks for taking a look at my problem. I would like to know if there is any way to pull the data-sitekey from this text. Here is the URL of the page: https://e-com.secure.force.com/adidasUSContact/
<div class="g-recaptcha" data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX" data-type="image" id="ncaptchaRecaptchaId"><div style="width: 304px; height: 78px;"><div><iframe src="https://www.google.com/recaptcha/api2/anchor?k=6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX&co=aHR0cHM6Ly9lLWNvbS5zZWN1cmUuZm9yY2UuY29tOjQ0Mw..&hl=en&type=image&v=r20160921114513&size=normal&cb=ei2ddcb6rl03" title="recaptcha widget" width="304" height="78" role="presentation" frameborder="0" scrolling="no" name="undefined"></iframe></div><textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none; display: none; "></t
Here is my current code:
import requests
from bs4 import BeautifulSoup
headers = {
    'Host': 'e-com.secure.force.com',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64)',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8'
}
url = 'https://e-com.secure.force.com/adidasUSContact/'
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r, 'html.parser')
c = soup.find_all('div', attrs={"class": "data-sitekey"})
print c
Recommended answer
OK, now that we have the code, it is as simple as:
import requests
from bs4 import BeautifulSoup
soup = BeautifulSoup(requests.get("https://e-com.secure.force.com/adidasUSContact/").content, "html.parser")
key = soup.select_one("#ncaptchaRecaptchaId")["data-sitekey"]
data-sitekey is an attribute, not a CSS class, so you just need to extract it from the element. You can find the element by its id, as above.
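To make the difference concrete, here is a minimal offline sketch that parses a trimmed copy of the markup quoted in the question instead of fetching the live page (the inner widget markup is left out, and the variable names `html`, `by_class`, `element` and `key` are only illustrative). It shows that searching for a class named data-sitekey finds nothing, while looking the element up by id and indexing it returns the attribute value.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (inner widget markup omitted).
html = (
    '<div class="g-recaptcha" '
    'data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX" '
    'data-type="image" id="ncaptchaRecaptchaId"></div>'
)
soup = BeautifulSoup(html, "html.parser")

# The question's approach: no div has a CSS class called "data-sitekey",
# so this search matches nothing.
by_class = soup.find_all("div", attrs={"class": "data-sitekey"})

# The answer's approach: find the element by id, then read the attribute
# with dictionary-style indexing.
element = soup.select_one("#ncaptchaRecaptchaId")
key = element["data-sitekey"]

print(by_class)          # empty result
print(element["class"])  # the actual CSS class of the div
print(key)               # the site key string
```

The same indexing works for any other attribute on the tag, for example `element["data-type"]`.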
You could also use the class name:
# css selector
key = soup.select_one("div.g-recaptcha")["data-sitekey"]
# regular find using class name
key = soup.find("div", class_="g-recaptcha")["data-sitekey"]
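All of the lookups above raise an error if the element is missing, because indexing `None` fails. As a hedged sketch of one way to guard against that, the hypothetical helper `extract_sitekey` below (not part of the original answer) selects by the attribute itself with a CSS attribute selector and returns `None` when nothing matches. It runs on inline sample strings, so it does not depend on the live page still serving the same markup.

```python
from bs4 import BeautifulSoup


def extract_sitekey(html):
    """Return the first data-sitekey value found in html, or None.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "[data-sitekey]" matches any element that carries the attribute,
    # regardless of its tag name, id, or class.
    element = soup.select_one("[data-sitekey]")
    if element is None:
        return None
    # .get() returns None instead of raising if the attribute is absent.
    return element.get("data-sitekey")


sample = (
    '<div class="g-recaptcha" '
    'data-sitekey="6LfI8hoTAAAAAMax5_MTl3N-5bDxVNdQ6Gx6BcKX"></div>'
)
print(extract_sitekey(sample))
print(extract_sitekey("<p>no captcha here</p>"))
```

With the live page, the same helper could be called on the `.content` of the response, assuming the server still returns the widget in the initial HTML.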