dryscrape: "No route found for....."
Problem description
Context:
I am trying to code my own money aggregator because most of the available tools on the market do not cover all financial websites yet. I am using Python 2.7.9 on a Raspberry Pi.
So far I have managed to connect to 2 of my accounts (one crowdlending website and one for my pension) thanks to the requests library. The third website I am trying to aggregate, https://www.amundi-ee.com, has been giving me a hard time for two weeks now.
I figured out that the website actually uses JavaScript, and after much research I ended up using dryscrape (I cannot use selenium because ARM is no longer supported).
Problem:
When running this code:
import dryscrape
url='https://www.amundi-ee.com'
extensionInit='/psf/#login'
extensionConnect='/psf/authenticate'
extensionResult='/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult
s = dryscrape.Session()
s.visit(urlInit)
print s.body()
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)
print s.body()
s.visit(urlResult)
The problem occurs when the code visits urlConnect (the second s.visit call); the print s.body() that follows returns the following:
{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}
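This body is a plain HTTP 405 response: s.visit() issues a GET request, but the /authenticate route only accepts POST. The payload itself is ordinary JSON and can be inspected directly; a minimal sketch (the string below is the exact body printed above):

```python
import json

# Exact error body printed above (a raw string keeps the \u0022 escapes intact)
body = r'{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}'

error = json.loads(body)
print(error["code"])     # 405
print(error["message"])  # the server explicitly says only POST is allowed
```

So the endpoint is reachable; it simply rejects the request method that a plain page visit produces.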
Question
Why do I get this error message, and how can I log in to the website properly to retrieve the data I am looking for?
PS: My code was inspired by this question: Python dryscrape scrape page with cookies
Recommended answer
OK, so after more than one month of trying to tackle this, I am very delighted to say that I finally managed to get what I want.
What was wrong?
Basically two major things (maybe more, but I might have forgotten some in between):
- The password has to be entered by clicking on-screen buttons, and the digits are randomly placed, so a new digit-to-button mapping is needed on every visit.
- login.form().submit() was interfering with access to the page holding the needed data; clicking the validate button instead was good enough.
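The virtual num-pad shuffles the digit positions on every page load, so the scraper must first read the button labels and build a digit-to-position map. A minimal sketch of that mapping logic, assuming the labels have already been scraped into a list (on the real page they come from soup.findAll('button'), as in the full code below):

```python
# Hypothetical scraped button labels, in on-screen order
labels = ['7', '2', '9', '0', '4', '1', '8', '5', '3', '6']

# digit -> index of the button currently showing that digit
button_number = [0] * 10
for position, label in enumerate(labels):
    button_number[int(label.strip())] = position

# XPath indices are 1-based, so add 1 when building the selector
digit = 5
xpath = '//*[@id="num-pad"]/button[' + str(button_number[digit] + 1) + ']'
print(xpath)
```

With the sample labels above, digit 5 sits at index 7, so the selector targets the eighth button.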
Here is the final code; do not hesitate to comment if you spot bad usage, as I am a Python novice and a sporadic coder.
import dryscrape
from bs4 import BeautifulSoup
from lxml import html
from time import sleep
from webkit_server import InvalidResponseError
from decimal import Decimal
import re
import sys

def getAmundi(seconds=0):
    url = 'https://www.amundi-ee.com/psf'
    extensionInit = '/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry = 1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..."
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')

    # retry until JavaScript has rendered the login field
    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds += timeoutRetry
            print "Failed, retrying to get the login field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    # get password button mapping
    print "logging in ..."
    soup = BeautifulSoup(s.body())
    button_number = range(10)
    for x in range(0, 10):
        button_number[int(soup.findAll('button')[x].text.strip())] = x

    # needed buttons (XPath indices are 1-based, hence the + 1)
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    # push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) + ']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) + ']')
    button.click()
    ..............

    # push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    # retry until the data page has loaded
    while True:
        try:
            soup = BeautifulSoup(s.body())
            total_lended = soup.findAll('span')[8].text.strip()
            total_lended = Decimal(total_lended.encode('ascii', 'ignore').replace(',', '.').replace(' ', ''))
            print total_lended
        except Exception as ex:
            seconds += 1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break

    s.reset()
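The scraped balance arrives as a French-formatted string, so the last step above strips non-ASCII characters and swaps the decimal comma before building a Decimal. The same normalisation in isolation, on a made-up sample value (the euro sign and thousands space are assumptions about the page's formatting):

```python
from decimal import Decimal

raw = u'1 234,56 \u20ac'  # hypothetical scraped text with a trailing euro sign

# drop non-ASCII characters (the euro sign), then fix the separators
cleaned = raw.encode('ascii', 'ignore').decode('ascii')
value = Decimal(cleaned.replace(',', '.').replace(' ', ''))
print(value)  # 1234.56
```

Using Decimal rather than float keeps the monetary amount exact.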