Web scraping a JavaScript page in Python
Problem description
Hello World,
I am new to Python and am trying to scrape a JavaScript page: https://search.gleif.org/#/search/
Below is the result returned by my code (using requests):
<!DOCTYPE html>
<html>
<head><meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<title>LEI Search 2.0</title>
<link href="/static/icons/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:200,300,400,600,700,900&subset=cyrillic,cyrillic-ext,greek,greek-ext,latin-ext,vietnamese" rel="stylesheet"/>
<link href="/static/css/main.045139db483277222eb714c1ff8c54f2.css" rel="stylesheet"/></head>
<body>
<div id="app"></div>
<script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
<script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
<script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
</body>
</html>
The question:
Instead of retrieving the script references shown above, such as:
"src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript""
I would like to retrieve the content of the table so that I can store it.
(Image: the table that I want to scrape.)
Solution
The following code uses Selenium for Python, which drives a real browser so that the page's JavaScript runs before the table is read.
import time

from selenium import webdriver
from selenium.webdriver.common.by import By  # the find_elements_by_* helpers were removed in Selenium 4.3

country = []
legal_name = []
lei = []

driver = webdriver.Chrome()
driver.implicitly_wait(5)

for page in range(1, 30395):
    driver.get('https://search.gleif.org/#/search/fulltextFilterId=LEIREC_FULLTEXT&currentPage=' + str(page) + '&perPage=50&expertMode=false#results-section')
    time.sleep(5)  # give the JavaScript time to render the table
    # Each cell holds a link; its innerHTML is the cell text.
    country += [a.get_attribute('innerHTML') for a in driver.find_elements(By.XPATH, '//*[@class="table-cell country"]/a')]
    legal_name += [a.get_attribute('innerHTML') for a in driver.find_elements(By.XPATH, '//*[@class="table-cell legal-name"]/a')]
    lei += [a.get_attribute('innerHTML') for a in driver.find_elements(By.XPATH, '//*[@class="table-cell lei"]/a')]
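The question asks to store the table, which the loop above does not do yet. A minimal sketch of one way to write the three collected lists to a CSV file with the standard library follows. The function name `save_rows`, the file name used in the usage note, and the column headers are assumptions made for illustration; they are not part of the original answer.

```python
import csv


def save_rows(path, country, legal_name, lei):
    """Write three parallel lists to a CSV file, one table row per line.

    zip() stops at the shortest list, so unequal lengths do not raise,
    but the rows may then be misaligned and should be checked.
    """
    rows = list(zip(country, legal_name, lei))
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["country", "legal_name", "lei"])  # assumed header names
        writer.writerows(rows)
    return len(rows)  # number of data rows written
```

It would be called once after the loop has finished, for example `save_rows("lei_results.csv", country, legal_name, lei)`.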
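Logging in (replace the locators below with those of the actual page elements):
from selenium.webdriver.common.by import By  # needed for the By locators if not already imported
driver.find_element(By.ID, "UserName").send_keys("xxxx")
driver.find_element(By.NAME, "Password").send_keys("yyyy")
driver.find_element(By.CLASS_NAME, "loginButton").click()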
Get page content
print(driver.page_source)
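Once the browser has rendered the page, `driver.page_source` is an ordinary HTML string and can be parsed without further Selenium calls. The sketch below uses the standard library's `html.parser` to collect the text of links inside elements that carry a given set of classes. The `CellCollector` class, the `extract_cells` helper, and the sample markup are hypothetical; the markup is modelled on the `table-cell country` class used in the XPath expressions above, and the real page structure may differ. Unlike those XPath expressions, this sketch matches any nested link, not only direct children.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CellCollector(HTMLParser):
    """Collect the text of <a> elements nested in elements with the target classes."""

    def __init__(self, class_names):
        super().__init__()
        self.target = set(class_names.split())
        self.depth = 0        # nesting depth inside a matching element, 0 = outside
        self.in_link = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            if tag == "a":
                self.in_link = True
                self.cells.append("")
        else:
            classes = set((dict(attrs).get("class") or "").split())
            if self.target <= classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.cells[-1] += data


def extract_cells(html, class_names):
    """Return the stripped link texts found inside elements with class_names."""
    parser = CellCollector(class_names)
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]


# Hypothetical markup standing in for driver.page_source.
SAMPLE_HTML = (
    '<div class="table-row">'
    '<div class="table-cell country"><a href="#">DE</a></div>'
    '<div class="table-cell legal-name"><a href="#">Example GmbH</a></div>'
    '</div>'
)
```

With a live session, `extract_cells(driver.page_source, "table-cell country")` would be the corresponding call; on `SAMPLE_HTML` the same call yields the single country cell.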