Feed dataframe with webscraping
Question
I'm trying to append some scraped values to a dataframe. I have this code:
import time
import requests
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import json
# Grab content from URL
url = "https://www.remax.pt/comprar?searchQueryState={%22regionName%22:%22%22,%22businessType%22:1,%22listingClass%22:1,%22page%22:1,%22sort%22:{%22fieldToSort%22:%22ContractDate%22,%22order%22:1},%22mapIsOpen%22:false,%22listingTypes%22:[],%22prn%22:%22%22}"
PATH = r'C:\DRIVERS\chromedriver.exe'
driver = webdriver.Chrome(PATH)
option = Options()
option.headless = False
#chromedriver =
#driver = webdriver.Chrome(chromedriver)
#driver = webdriver.Firefox() #(options=option)
#driver.get(url)
#driver.implicitly_wait(10) # in seconds
time.sleep(1)
wait = WebDriverWait(driver, 10)
driver.get(url)
rows = driver.find_elements_by_xpath("//div[@class='row results-list ']/div")
data=[]
for row in rows:
    price=row.find_element_by_xpath(".//p[@class='listing-price']").text
    print(price)
    address=row.find_element_by_xpath(".//p[@class='listing-address']").text
    print(address)
    Tipo=row.find_element_by_xpath(".//p[@class='listing-type']").text
    print(Tipo)
    Area=row.find_element_by_xpath(".//p[@class='listing-area']").text
    print(Area)
    Quartos=row.find_element_by_xpath(".//p[@class='icon-bedroom-full']").text
    print(Quartos)
    data.append([price],[address],[Tipo],[Area],[Quartos])
#driver.quit()
The problem is that it returns the following error:
NoSuchElementException Traceback (most recent call last)
<ipython-input-16-9e4d01985cda> in <module>
49 price=row.find_element_by_xpath(".//p[@class='listing-price']").text
50 print(price)
---> 51 address=row.find_element_by_xpath(".//p[@class='listing-address']").text
52 print(address)
53 Tipo=row.find_element_by_xpath(".//p[@class='listing-type']").text
~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in find_element_by_xpath(self, xpath)
349 element = element.find_element_by_xpath('//div/td[1]')
350 """
--> 351 return self.find_element(by=By.XPATH, value=xpath)
352
353 def find_elements_by_xpath(self, xpath):
~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in find_element(self, by, value)
656 value = '[name="%s"]' % value
657
--> 658 return self._execute(Command.FIND_CHILD_ELEMENT,
659 {"using": by, "value": value})['value']
660
~\anaconda3\lib\site-packages\selenium\webdriver\remote\webelement.py in _execute(self, command, params)
631 params = {}
632 params['id'] = self._id
--> 633 return self._parent.execute(command, params)
634
635 def find_element(self, by=By.ID, value=None):
~\anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))
~\anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//p[@class='listing-address']"}
(Session info: chrome=90.0.4430.72)
But when I try it with only the first element, it returns a list of prices. What difference does it make that I'm storing the values in different places in the dataframe, when I use the same type of path for each?
Answer
The main problem you have is the locators.
1. First, compare the locators I use with the ones in your code.
2. Second, add explicit waits: from selenium.webdriver.support import expected_conditions as EC
3. Third, remove unnecessary code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Chrome(executable_path='/snap/bin/chromium.chromedriver')
url = "https://www.remax.pt/comprar?searchQueryState={%22regionName%22:%22%22,%22businessType%22:1,%22listingClass%22:1,%22page%22:1,%22sort%22:{%22fieldToSort%22:%22ContractDate%22,%22order%22:1},%22mapIsOpen%22:false,%22listingTypes%22:[],%22prn%22:%22%22}"
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_all_elements_located((By.XPATH, "//div[@class='row results-list ']/div")))
rows = driver.find_elements_by_xpath("//div[@class='row results-list ']/div")
data = []
for row in rows:
    price = row.find_element_by_xpath(".//p[@class='listing-price']").text
    address = row.find_element_by_xpath(".//h2[@class='listing-address']").text
    listing_type = row.find_element_by_xpath(".//li[@class='listing-type']").text  # renamed so it doesn't shadow the built-in type()
    area = row.find_element_by_xpath(".//li[@class='listing-area']").text
    quartos = row.find_element_by_xpath(".//li[@class='listing-bedroom']").text
    data.append([price, address, listing_type, area, quartos])
driver.close()
driver.quit()
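Since not every result card is guaranteed to contain every field (which is exactly what produced the NoSuchElementException in the question), one defensive option is a small helper that returns a default when a locator matches nothing. This is a sketch of that idea; `safe_find_text` is a hypothetical helper name, not part of Selenium's API:

```python
def safe_find_text(element, xpath, default=""):
    """Return the .text of the first child matching `xpath`,
    or `default` when the locator matches nothing in this card."""
    try:
        return element.find_element_by_xpath(xpath).text
    except Exception:  # Selenium raises NoSuchElementException here
        return default
```

Inside the loop, `price = safe_find_text(row, ".//p[@class='listing-price']")` then keeps a single incomplete listing from aborting the whole scrape.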
Please note that I did this on Linux, so your Chrome driver location is different. Also, to print the list use:
for p in data:
    print(p)
You can modify it as you like. I received the following output:
['240 000 €', 'Lisboa - Lisboa, Carnide', 'Apartamento', '54 m\n2', '1']
['280 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['285 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['290 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['280 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '80 m\n2', '1']
['290 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '85 m\n2', '1']
['285 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '83 m\n2', '1']
['80 000 €', 'Santarém - Cartaxo, Ereira e Lapa', 'Terreno', '12440 m\n2', '1']
['260 000 €', 'Lisboa - Sintra, Queluz e Belas', 'Prédio', '454 m\n2', '1']
['37 500 €', 'Santarém - Torres Novas, Torres Novas (Santa Maria, Salvador e Santiago)', 'Prédio', '92 m\n2', '1']
['505 000 €', 'Lisboa - Sintra, Algueirão-Mem Martins', 'Duplex', '357 m\n2', '1']
['135 700 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['132 800 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['133 440 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['179 000 €', 'Lisboa - Mafra, Milharado', 'Terreno', '310 m\n2', '1']
['75 000 €', 'Lisboa - Vila Franca de Xira, Vila Franca de Xira', 'Apartamento', '52 m\n2', '1']
['575 000 €', 'Porto - Matosinhos, Matosinhos e Leça da Palmeira', 'Apartamento', '140 m\n2', '1']
['35 000 €', 'Setúbal - Almada, Caparica e Trafaria', 'Outros - Habitação', '93 m\n2', '1']
['550 000 €', 'Leiria - Alcobaça, Évora de Alcobaça', 'Moradia', '160 m\n2', '1']
['550 000 €', 'Lisboa - Loures, Santa Iria de Azoia, São João da Talha e Bobadela', 'Moradia', '476 m\n2', '1']
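To actually feed the scraped rows into a dataframe (the original goal of the question), you can hand the list of lists straight to the pandas constructor. A minimal sketch, using two of the rows above as sample data; the column names are my own choice, and the area cleanup assumes the `'54 m\n2'` shape shown in the output (the m² superscript ends up on its own line):

```python
import pandas as pd

# two sample rows in the same shape as the `data` list built in the loop
rows = [
    ['240 000 €', 'Lisboa - Lisboa, Carnide', 'Apartamento', '54 m\n2', '1'],
    ['280 000 €', 'Lisboa - Lisboa, Beato', 'Apartamento', '80 m\n2', '1'],
]

df = pd.DataFrame(rows, columns=['price', 'address', 'type', 'area', 'bedrooms'])

# keep just the leading number of '54 m\n2' as a float, for sorting/filtering
df['area_m2'] = df['area'].str.split().str[0].astype(float)

print(df[['price', 'area_m2']])
```

From there `df.to_csv(...)` or any other pandas operation works on the scraped listings as usual.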