How do I web-scrape a JSP with Python, Selenium and BeautifulSoup?
Problem description
I'm an absolute beginner experimenting with web scraping in Python. I'm trying to extract the locations of ATMs from this URL:
https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))
using the following code:
#Script to scrape locations and addresses from VISA's ATM locator
# import the necessary libraries (to be installed if not available):
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
#ChromeDriver
#(see https://chromedriver.chromium.org/getting-started as reference)
driver = webdriver.Chrome("C:/Users/DefaultUser/Local Settings/Application Data/Google/Chrome/Application/chromedriver.exe")
offices = []    # list for branch/ATM names
addresses = []  # list for branch/ATM locations
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))")
content = driver.page_source
soup = BeautifulSoup(content, features = "lxml")
#the following code extracts all the content inside the tags displaying the information requested
for a in soup.findAll('li', attrs={'class': 'visaATMResultListItem'}):
    name = a.find('li', attrs={'class': 'data-label'})
    address = a.find('li', attrs={'class': 'data-label'})
    offices.append(name.text)
    addresses.append(address.text)
#next row defines the dataframe with the results of the extraction
df = pd.DataFrame({'Office':offices,'Address':addresses})
#next row displays dataframe content
print(df)
#export data to .CSV file named 'branches.csv'
with open('branches.csv', 'a') as f:
    df.to_csv(f, header=True)
At first the script seems to work correctly, since ChromeDriver starts and displays the results in the browser as required, but no results are returned:
Empty DataFrame
Columns: [Office, Address]
Index: []
Process finished with exit code 0
Maybe I made a mistake in choosing the selectors?
Thanks a lot for your help.
Recommended answer
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
from bs4 import BeautifulSoup
import csv

options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20JAPAN'))")
# give the page's JavaScript time to render the results before reading the source
time.sleep(2)

soup = BeautifulSoup(driver.page_source, 'html.parser')

na = []
addr = []

for name in soup.findAll("a", {'class': 'visaATMPlaceLink'}):
    na.append(name.text)

for add in soup.findAll("p", {'class': 'visaATMAddress'}):
    addr.append(add.get_text(strip=True, separator=" "))

with open('out.csv', 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Address'])
    for _na, _addr in zip(na, addr):
        writer.writerow([_na, _addr])

driver.quit()
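The answer differs from the question's code in two ways: it waits (`time.sleep(2)`) for the page's JavaScript to render the results before reading `driver.page_source`, and it uses different selectors. The question looked for `<li>` tags with class `data-label`, which apparently do not exist in the rendered markup; the answer pulls names from `a.visaATMPlaceLink` anchors and addresses from `p.visaATMAddress` paragraphs. A minimal offline sketch (with simplified, assumed markup; only the class names come from the answer above, the rest is illustrative) shows why one set of selectors matches and the other returns nothing:

```python
from bs4 import BeautifulSoup

# Simplified markup assumed to resemble one rendered result item
html = """
<li class="visaATMResultListItem">
  <a class="visaATMPlaceLink">Sample Bank ATM</a>
  <p class="visaATMAddress">1-2-3 Chiyoda<span> Tokyo</span></p>
</li>
"""
soup = BeautifulSoup(html, "html.parser")

# The question's selector finds nothing: there is no <li class="data-label">
wrong = soup.find_all("li", {"class": "data-label"})

# The answer's selectors match the name and the address
names = [a.text for a in soup.find_all("a", {"class": "visaATMPlaceLink"})]
addresses = [p.get_text(strip=True, separator=" ")
             for p in soup.find_all("p", {"class": "visaATMAddress"})]

print(wrong)      # []
print(names)      # ['Sample Bank ATM']
print(addresses)  # ['1-2-3 Chiyoda Tokyo']
```

Note also that a fixed `time.sleep(2)` is fragile on slow connections; Selenium's `WebDriverWait` with an expected condition is a more robust way to wait for the results to appear.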