How do I extract a long string of text from some JavaScript on a web page using BeautifulSoup?
Problem description
I'm trying to write a script that logs into a website, but to do that I need to present the captcha. The only way to get the direct URL of the captcha image is to extract the long string named 'challenge', but for some reason I have not been able to do that with BeautifulSoup. What is the best way to extract this string?
var RecaptchaState = {
site : '4LfjPgEA56AABAJExraAeYXdMbVhPcG__Hyv-URXF',
challenge : '03AHJ_VusE_PgNB0vfBpD2h53o8uGMt1MeKi9bzhOTsjt0ze7SKmHVNe8uADceoU3JLPjpp8cJCVDGiYKo1ho-r1JcV19tm26doUHqevixJjH8SZ26i4EWbUOQLEuODf0Kt6JI0ZhtfiIaIXDg9MhUyDCEt_qxFWbSHA',
is_incorrect : false,
programming_error : '',
error_message : '',
server : 'http://www.google.com/recaptcha/api/',
timeout : 18000
};
document.write('
<scr>
');
</scr>
I'd just use a regular expression. I'm not certain, but I don't think BeautifulSoup parses JavaScript, only (X)HTML:
import re  # x below is the page's HTML source as a string
challenge = re.search(r"challenge *: *'(\S+)'", x).group(1)
Gives:
'03AHJ_VusE_PgNB0vfBpD2h53o8uGMt1MeKi9bzhOTsjt0ze7SKmHVNe8uADceoU3JLPjpp8cJCVDGiYKo1ho-r1JcV19tm26doUHqevixJjH8SZ26i4EWbUOQLEuODf0Kt6JI0ZhtfiIaIXDg9MhUyDCEt_qxFWbSHA'
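The one-liner above raises an AttributeError if the pattern is not found, since re.search returns None in that case. As a minimal sketch of a more defensive version, the snippet below wraps the same idea in a function that returns None when there is no match. The names page_source, CHALLENGE_RE and extract_challenge, and the placeholder values in the sample page, are assumptions made for illustration and are not part of the original answer.

```python
import re

# Hypothetical page fragment standing in for the downloaded HTML source.
# The key and token values are placeholders, not real reCAPTCHA data.
page_source = """
<script type="text/javascript">
var RecaptchaState = {
    site : 'SITE_KEY_PLACEHOLDER',
    challenge : 'CHALLENGE_TOKEN_PLACEHOLDER',
    is_incorrect : false,
    timeout : 18000
};
</script>
"""

# Match the key name, optional whitespace around the colon, then capture
# everything between the single quotes.
CHALLENGE_RE = re.compile(r"challenge\s*:\s*'([^']+)'")


def extract_challenge(html):
    """Return the challenge token, or None if the pattern is absent."""
    match = CHALLENGE_RE.search(html)
    return match.group(1) if match else None


challenge = extract_challenge(page_source)
print(challenge)
```

If the page has already been parsed with BeautifulSoup, the same function could probably be applied to the text of each script tag (for example the .string of each element from soup.find_all("script")) rather than to the whole document, though running it on the raw source is usually enough.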