Downloading a file from a webpage using a Python script without a URL, calling an onClick function


Question

There is a webpage with a "Click to Download" link; clicking it downloads a file. I can download this file manually by going to the webpage and clicking the link, but I need to download it via a Python script.

If I look at the page source, I can see that the anchor tag runs a JS function:

<a class="download-data-link1" onclick=" document.forms['dataform'].submit()" style="cursor:pointer; vertical-align: middle;">Download in csv</a>

But I don't know the URL of the CSV file, and I am looking for a way to download it via Python.

I know how to download a file with httplib when I have its URL, but I can't figure out how to get a file without one.

I tried a few things, such as adding {'Content-Disposition': 'attachment;filename="data.csv"'} to the request headers, but it doesn't seem to work. Any ideas?

Solution

Two basic options can be applied here:

  • mimic the logic behind the onclick() call - in your case, submit the dataform form yourself using requests or mechanize
  • high-level approach - automate a real browser, headless (PhantomJS) or not, using selenium - find the link and click it:

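    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    # PhantomJS is deprecated and missing from current Selenium releases
    driver = webdriver.PhantomJS()
    driver.get('url here')
    
    # find_element(By...) replaces the removed find_element_by_class_name()
    driver.find_element(By.CLASS_NAME, 'download-data-link1').click()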
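
The first option, replaying the dataform submission yourself, could look roughly like the sketch below. It is a minimal sketch under several assumptions: it uses only the standard library (html.parser and urllib) instead of requests or mechanize so it runs without extra packages; FormParser, build_form_request and download_csv are hypothetical helper names; and it assumes the form's values are plain <input> fields in the page HTML rather than being filled in by JavaScript (select and textarea fields are not handled). It reads the form's action, method and fields, then rebuilds the request that document.forms['dataform'].submit() would send, reusing one cookie jar in case the site ties the export to a session.

```python
import http.cookiejar
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit
from urllib.request import HTTPCookieProcessor, Request, build_opener

# Input types that a script-driven form.submit() does not send.
SKIPPED_TYPES = {"submit", "button", "image", "reset", "file"}


class FormParser(HTMLParser):
    """Collect action, method and <input> values of one form, matched by name or id.

    Sketch only: <select> and <textarea> fields are ignored.
    """

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.action = None  # stays None if the form is never found
        self.method = "get"
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.form_name in (attrs.get("name"), attrs.get("id")):
            self.in_form = True
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif self.in_form and tag == "input" and attrs.get("name"):
            kind = (attrs.get("type") or "text").lower()
            if kind in SKIPPED_TYPES:
                return
            if kind in ("checkbox", "radio") and "checked" not in attrs:
                return  # unchecked boxes are not submitted
            default = "on" if kind in ("checkbox", "radio") else ""
            value = attrs.get("value")
            self.fields[attrs["name"]] = default if value is None else value

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_form_request(page_url, html, form_name="dataform"):
    """Build the request that document.forms[form_name].submit() would send."""
    parser = FormParser(form_name)
    parser.feed(html)
    if parser.action is None:
        raise ValueError(f"form {form_name!r} not found in page")
    url = urljoin(page_url, parser.action)  # an empty action means the page URL itself
    data = urlencode(parser.fields)
    if parser.method == "post":
        # urllib adds Content-Type: application/x-www-form-urlencoded when data is set
        return Request(url, data=data.encode("ascii"), method="POST")
    # For GET, the browser replaces the action URL's query string with the form data
    parts = urlsplit(url)
    return Request(urlunsplit(parts._replace(query=data, fragment="")), method="GET")


def download_csv(page_url, out_path="data.csv"):
    """Fetch the page, replay the dataform submission, and save the response body."""
    # One opener with a cookie jar, so a session cookie set by the page is sent back.
    opener = build_opener(HTTPCookieProcessor(http.cookiejar.CookieJar()))
    with opener.open(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    with opener.open(build_form_request(page_url, html)) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
    return out_path
```

Usage would be download_csv('url here') with the page's real address. Because this replays what the browser itself sends, no Content-Disposition header is needed on the request; that header is set by the server in its response. If the site checks things this sketch does not reproduce (JavaScript-generated tokens, specific headers), the browser-automation route is the fallback.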
    

Though, as far as I understand, clicking the link would trigger a "Download" browser dialog to appear - then PhantomJS is not an option since it doesn't support downloads. In case of Chrome or Firefox you would need to tweak browser capabilities to automatically download files without opening the popup, see:
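
For Chrome, one common way to skip the download dialog is to pass download preferences through ChromeOptions, as sketched below. This assumes Selenium 4 (or 3.8+), Chrome with a matching driver available, and an initially empty download directory; chrome_download_prefs, wait_for_download and download_with_chrome are hypothetical helpers. Polling for Chrome's .crdownload partial files is just one simple way to tell when the file has finished downloading. Selenium is imported inside download_with_chrome so the two helpers can be used on their own.

```python
import os
import time


def chrome_download_prefs(download_dir):
    """Chrome profile preferences that save downloads to download_dir without a dialog."""
    return {
        "download.default_directory": os.path.abspath(download_dir),  # must be absolute
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
    }


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the first finished file once no *.crdownload partial files remain.

    Assumes download_dir was empty before the download started.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        partial = [n for n in names if n.endswith(".crdownload")]
        done = [n for n in names if not n.endswith(".crdownload")]
        if done and not partial:
            return os.path.join(download_dir, sorted(done)[0])
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {download_dir!r} after {timeout}s")


def download_with_chrome(page_url, download_dir="downloads"):
    """Open the page in Chrome, click the CSV link, and return the saved file's path."""
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    os.makedirs(download_dir, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(page_url)
        driver.find_element(By.CLASS_NAME, "download-data-link1").click()
        return wait_for_download(download_dir)
    finally:
        driver.quit()  # always close the browser, even if the click or wait fails
```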
