dryscrape: "No route found for....."


Problem Description

Context:

I am trying to code my own money aggregator, because most of the tools available on the market do not yet cover all financial websites. I am using Python 2.7.9 on a Raspberry Pi.

So far I have managed to connect to two of my accounts (one crowdlending website and one for my pension) thanks to the requests library. The third website I am trying to aggregate, https://www.amundi-ee.com, has been giving me a hard time for two weeks now.

I figured out that the website actually uses JavaScript, and after much research I ended up using dryscrape (I cannot use Selenium because ARM is no longer supported).

Problem:

When running this code:

import dryscrape

url = 'https://www.amundi-ee.com'
extensionInit = '/psf/#login'
extensionConnect = '/psf/authenticate'
extensionResult = '/psf/#'
urlInit = url + extensionInit
urlConnect = url + extensionConnect
urlResult = url + extensionResult

s = dryscrape.Session()
s.visit(urlInit)
print s.body()
# Fill in the credentials
login = s.at_xpath('//*[@id="identifiant"]')
login.set("XXXXXXXX")
pwd = s.at_xpath('//*[@name="password"]')
pwd.set("YYYYYYY")
# Push the button
login.form().submit()
s.visit(urlConnect)
print s.body()
s.visit(urlResult)

The problem occurs when the code visits urlConnect: the print s.body() call that follows returns the following:

{"code":405,"message":"No route found for \u0022GET \/authenticate\u0022: Method Not Allowed (Allow: POST)","errors":[]}

Question

Why do I get this error message, and how can I log in to the website properly to retrieve the data I am looking for?

PS: My code was inspired by this question: Python dryscrape scrape page with cookies

Recommended Answer

OK, so after more than a month of trying to tackle this, I am delighted to say that I finally got what I wanted.

What went wrong?

Basically two major things (maybe more, but I might have forgotten some along the way):

  1. The password has to be entered by clicking on-screen number-pad buttons, and the digits are randomly assigned to those buttons, so a new digit-to-button mapping has to be built on every visit (see the sketch after this list).
  2. login.form().submit() was interfering with access to the page holding the needed data; clicking the validate button was good enough.
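
To make the first point concrete, here is a minimal sketch of the keypad-mapping idea on a made-up snippet of HTML (the real page's markup may differ):

from bs4 import BeautifulSoup

# Made-up keypad: the digits land on different buttons on every visit.
html_doc = """
<div id="num-pad">
  <button>3</button><button>0</button><button>7</button><button>1</button>
  <button>9</button><button>2</button><button>5</button><button>8</button>
  <button>4</button><button>6</button>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
# Map each digit to the (0-based) position of the button that shows it.
button_number = [None] * 10
for x, button in enumerate(soup.findAll('button')):
    button_number[int(button.text.strip())] = x

print button_number[1]  # -> 3: to type the digit 1, click the 4th button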

Here is the final code. Do not hesitate to comment if you spot bad usage, as I am a Python novice and a sporadic coder.

import dryscrape
from bs4 import BeautifulSoup
from lxml import html
from time import sleep
from webkit_server import InvalidResponseError
from decimal import Decimal
import re
import sys 


def getAmundi(seconds=0):

    url = 'https://www.amundi-ee.com/psf'
    extensionInit='/#login'
    urlInit = url + extensionInit
    urlResult = url + '/#'
    timeoutRetry=1

    if 'linux' in sys.platform:
        # start xvfb in case no X is running. Make sure xvfb 
        # is installed, otherwise this won't work!
        dryscrape.start_xvfb()

    print "connecting to " + url + " with " + str(seconds) + "s of loading wait..." 
    s = dryscrape.Session()
    s.visit(urlInit)
    sleep(seconds)
    s.set_attribute('auto_load_images', False)
    s.set_header('User-agent', 'Google Chrome')
    while True:
        try:
            q = s.at_xpath('//*[@id="identifiant"]')
            q.set("XXXXXXXX")
        except Exception as ex:
            seconds+=timeoutRetry
            print "Failed, retrying to get the loggin field in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    #get password button mapping
    print "logging in ..."
    soup = BeautifulSoup(s.body())
    button_number = range(10)  # Python 2: a plain list used as digit -> button-position map
    for x in range(0, 10):
        button_number[int(soup.findAll('button')[x].text.strip())] = x

    #needed buttons (XPath child indices are 1-based, hence the +1)
    button_1 = button_number[1] + 1
    button_2 = button_number[2] + 1
    button_3 = button_number[3] + 1
    button_5 = button_number[5] + 1

    #push buttons for password
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_2) +']')
    button.click()
    button = s.at_xpath('//*[@id="num-pad"]/button[' + str(button_1) +']')
    button.click()
    # .............. (clicks for the remaining password digits, elided in the original answer)

    # Push the validate button
    button = s.at_xpath('//*[@id="content"]/router-view/div/form/div[3]/input')
    button.click()
    print "accessing ..."
    sleep(seconds)

    while True:
        try:
            soup = BeautifulSoup(s.body())
            total_lended = soup.findAll('span')[8].text.strip()
            total_lended = Decimal(total_lended.encode('ascii','ignore').replace(',','.').replace(' ',''))
            print total_lended

        except Exception as ex:
            seconds+=1
            print "Failed, retrying to get the data in " + str(seconds) + "s"
            sleep(seconds)
            continue
        break 

    s.reset()
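
A minimal invocation, assuming the code above is saved as a script (the 5-second wait is an arbitrary starting value giving the JavaScript app time to render before scraping):

if __name__ == '__main__':
    getAmundi(5)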
