Web scraping a JavaScript page in Python

Views: 39. Published: 2022/1/5 15:53:03. Tags: javascript, python, selenium, beautifulsoup, request


Problem description

Hello World,

I am new to Python and am trying to scrape a JavaScript-rendered page: https://search.gleif.org/#/search/

Below is the result returned by my code (using requests):

<!DOCTYPE html>
<html>
<head><meta charset="utf-8"/>
<meta content="width=device-width,initial-scale=1" name="viewport"/>
<title>LEI Search 2.0</title>
<link href="/static/icons/favicon.ico" rel="shortcut icon" type="image/x-icon"/>
<link href="https://fonts.googleapis.com/css?family=Open+Sans:200,300,400,600,700,900&amp;subset=cyrillic,cyrillic-ext,greek,greek-ext,latin-ext,vietnamese" rel="stylesheet"/>
<link href="/static/css/main.045139db483277222eb714c1ff8c54f2.css" rel="stylesheet"/></head>
<body>
<div id="app"></div>
<script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
<script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
<script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
</body>
</html>

The question: instead of retrieving the markup above, which contains only script references such as
src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"

I would like to retrieve the content of the table so that I can store it.

Table that I want to scrape
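
The response shown above can be checked programmatically: it contains only an empty <div id="app"> mount point plus script tags, which suggests the table is built in the browser by JavaScript after the page loads. A minimal sketch using only the standard library; STATIC_HTML is an abridged copy of the response above, and MountPointChecker is a hypothetical helper name, not part of the original post:

```python
from html.parser import HTMLParser

# Abridged copy of the response body shown in the question.
STATIC_HTML = """<!DOCTYPE html>
<html>
<head><title>LEI Search 2.0</title></head>
<body>
<div id="app"></div>
<script src="/static/js/manifest.2ae2e69a05c33dfc65f8.js" type="text/javascript"></script>
<script src="/static/js/vendor.6bd9028998d5ca3bb72f.js" type="text/javascript"></script>
<script src="/static/js/main.5da23c5198041f0ec5af.js" type="text/javascript"></script>
</body>
</html>"""


class MountPointChecker(HTMLParser):
    """Records what is inside <div id="app"> and counts external scripts."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0      # > 0 while the parser is inside <div id="app">
        self.child_tags = 0     # elements found inside the mount point
        self.app_text = []      # text found inside the mount point
        self.script_count = 0   # <script src=...> tags anywhere in the page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.script_count += 1
        if self.div_depth:
            self.child_tags += 1
            if tag == "div":
                self.div_depth += 1
        elif tag == "div" and attrs.get("id") == "app":
            self.div_depth = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        if self.div_depth:
            self.app_text.append(data)


checker = MountPointChecker()
checker.feed(STATIC_HTML)
app_is_empty = checker.child_tags == 0 and not "".join(checker.app_text).strip()
print("app mount point empty:", app_is_empty)
print("external scripts:", checker.script_count)
```

If the mount point is empty in the raw response, a plain HTTP client has nothing to parse, so a tool that executes the page's JavaScript is needed.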

Solution

The following code uses Selenium's Python bindings to load each results page in a real browser and read the rendered table cells.

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

country = []
legal_name = []
lei = []

driver = webdriver.Chrome()
driver.implicitly_wait(5)

# Visit every results page; the page number is part of the URL fragment.
for page in range(1, 30395):
    driver.get('https://search.gleif.org/#/search/fulltextFilterId=LEIREC_FULLTEXT&currentPage=' + str(page) + '&perPage=50&expertMode=false#results-section')

    # Give the page's JavaScript time to render the table.
    time.sleep(5)

    # find_elements(By.XPATH, ...) is used because find_elements_by_xpath was removed in Selenium 4.
    country += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell country"]/a')]
    legal_name += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell legal-name"]/a')]
    lei += [cell.get_attribute('innerHTML') for cell in driver.find_elements(By.XPATH, '//*[@class="table-cell lei"]/a')]
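
Since the goal is to store the table, the three lists collected by the loop above could then be written to a CSV file. A minimal sketch using only the standard library; save_rows, the column names, and the sample values are hypothetical and not part of the original answer:

```python
import csv
import os
import tempfile


def save_rows(path, country, legal_name, lei):
    """Write three parallel lists to a CSV file, one record per row."""
    # The lists must line up; a mismatch may mean a page was read
    # before its table had fully rendered.
    if not (len(country) == len(legal_name) == len(lei)):
        raise ValueError("column lists have different lengths")
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["country", "legal_name", "lei"])
        writer.writerows(zip(country, legal_name, lei))
    return len(country)


# Placeholder values purely for illustration; real values come from the loop.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "lei_records.csv")
    written = save_rows(
        demo_path,
        ["XX", "YY"],
        ["Example Corp", "Sample Ltd"],
        ["LEI-PLACEHOLDER-1", "LEI-PLACEHOLDER-2"],
    )
    with open(demo_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))

print(written, rows[0])
```

Checking the list lengths before writing is a design choice: zip silently truncates to the shortest list, so an unnoticed mismatch would otherwise drop records.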

Logging in (replace these locators with the ones used on your page):

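from selenium.webdriver.common.by import By

# find_element_by_class is not a Selenium method; By.CLASS_NAME is the class locator.
driver.find_element(By.ID, "UserName").send_keys("xxxx")
driver.find_element(By.NAME, "Password").send_keys("yyyy")
driver.find_element(By.CLASS_NAME, "loginButton").click()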

Get page content

print(driver.page_source)
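
Once driver.page_source holds the rendered HTML, the cells could also be extracted offline, without further browser calls. A minimal sketch using the standard library's html.parser; SAMPLE_SOURCE is a made-up stand-in for the rendered page (only the class names are taken from the XPath expressions above, the real markup may differ), and CellLinkParser and extract are hypothetical helpers:

```python
from html.parser import HTMLParser

# Made-up stand-in for driver.page_source; the real markup may differ.
SAMPLE_SOURCE = """
<div class="table-row">
  <div class="table-cell lei"><a href="#">LEI-PLACEHOLDER-1</a></div>
  <div class="table-cell legal-name"><a href="#">Example Corp</a></div>
  <div class="table-cell country"><a href="#">XX</a></div>
</div>
"""

# Elements that never have a closing tag, so they must not be tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class CellLinkParser(HTMLParser):
    """Collects the text of <a> elements directly inside a cell with a given class."""

    def __init__(self, cell_class):
        super().__init__()
        self.cell_class = cell_class
        self.stack = []         # class attribute of every currently open element
        self.capturing = False  # True while inside a matching <a>
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        parent_class = self.stack[-1] if self.stack else None
        if tag == "a" and parent_class == self.cell_class:
            self.capturing = True
            self.values.append("")
        self.stack.append(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "a":
            self.capturing = False
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.capturing:
            self.values[-1] += data


def extract(source, cell_class):
    """Return the link texts found in cells whose class attribute equals cell_class."""
    parser = CellLinkParser(cell_class)
    parser.feed(source)
    return [value.strip() for value in parser.values]


countries = extract(SAMPLE_SOURCE, "table-cell country")
print(countries)
```

The class attribute is compared as an exact string, mirroring the @class="..." test in the XPath expressions; a page that adds extra classes to the cells would need a looser match.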
