Selenium scraping JavaScript


Question

I'm planning to build a website that scrapes a large number of daily-updated URLs (generated by JavaScript) from many websites. After some research I found Selenium, and I have already written some code that extracts a URL from one website:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chrome_path = r"C:\Users\hessien\Desktop\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_path))
driver.get("http://example.com")
# Open the search field, type the query, and submit the form
driver.find_element(By.XPATH, '//*[@id="header"]/div/div[2]/div[3]/ul/li/label/a').click()
element = driver.find_element(By.XPATH, '//*[@id="s"]')
element.send_keys("example")
driver.find_element(By.XPATH, '//*[@id="searchform"]/button/span').click()
# Follow a link on the results page, then click the player element
driver.find_element(By.XPATH, '//*[@id="contenedor"]/div/div[2]/div[1]/div[2]/article/div[2]/div[1]/a').click()
driver.find_element(By.XPATH, '//*[@id="playex"]/div[1]').click()
# Read the video URL from the src attribute of the <video> element
elem = driver.find_element(By.XPATH, '//*[@id="mediaplayer_media"]/video').get_attribute("src")
print(elem)
driver.quit()

After some more searching, though, I found that Selenium is mainly used as a testing framework, not for scraping and crawling. Can Selenium do this job? If so, how can I run the Python code from an HTML button? I'm also using Django. If not, could you recommend something that can do the task?

Answer

If you really want to build a scraper, I recommend Beautiful Soup, a Python library for pulling data out of HTML and XML files. You can integrate the Python script with Django so that it is triggered by a click. The package is available here:

https://pypi.python.org/pypi/beautifulsoup4
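
As a rough illustration of the Beautiful Soup approach, here is a minimal sketch. Note that Beautiful Soup only parses HTML it is given and does not execute JavaScript, so the sketch assumes the markup has already been obtained some other way (for instance from Selenium's driver.page_source once the page has rendered). The SAMPLE_HTML fragment and the extract_video_sources helper are hypothetical names made up for this example; only the mediaplayer_media id is taken from the question's code.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment standing in for HTML fetched elsewhere,
# e.g. from Selenium's driver.page_source after the page has rendered.
SAMPLE_HTML = """
<div id="mediaplayer_media">
  <video src="http://example.com/media/clip.mp4"></video>
</div>
"""


def extract_video_sources(html):
    """Return the src of every <video> element inside #mediaplayer_media."""
    # "html.parser" is the parser bundled with Python, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    videos = soup.select("#mediaplayer_media video")
    # Skip <video> tags that carry no src attribute.
    return [video["src"] for video in videos if video.get("src")]


if __name__ == "__main__":
    print(extract_video_sources(SAMPLE_HTML))
```

A Django view that handles the button's form submission could call a function like this and return the result, though how that is wired up depends on the project.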
