If condition for adding information into a dataframe
Question
I need to create a dataframe with the following columns:
WEB | Country | Organisation
I'm extracting this information from a website. However, some sites have no information on that website, which causes problems when I update the dataframe. Unfortunately, the code can only handle one site at a time, otherwise a captcha appears. The code below gives an idea of the output for a single site:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

element = []
country = []
organisation = []
websites = ['stackoverflow.com']  # ['livevsfox.ca'] I would suggest trying the first one, then the other one
frame_dict = {}

for x in websites:  # one site per run for now; I'd like to loop over several in future
    element.append(x)
    driver = webdriver.Chrome('path')
    driver.get('website/' + x)  # x is stackoverflow.com first, then the other site
    try:
        wait = WebDriverWait(driver, 30)
        driver.execute_script("window.scrollTo(0, 1000)")
        try:
            error = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section.selection div.container h2")))  # updated after answer from another post and comment below
        except Exception:
            continue  # skips this site without appending to country/organisation
        # Country
        c = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Country']/../following-sibling::div"))).text
        country.append(c)
        # Organisation
        try:
            org = wait.until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Organisation']/../following-sibling::div"))).text
            organisation.append(org)
        except Exception:
            organisation.append("Data not available")
    except Exception:
        break
    finally:
        driver.quit()  # always close the browser, even on continue/break

frame_dict.update({'WEB': element, 'Organisation': organisation, 'Country': country})
df = pd.DataFrame.from_dict(frame_dict)
The code should do the following:
- for x = stackoverflow.com (just an example of a working URL): open Chrome; if there is information, extract the organisation and country; if there is not, add 'Missing' to the dataframe; exit Chrome;
- for x = livevsfox.ca: open Chrome; if there is information, extract the organisation and country; if there is not, add 'Missing' in the Organisation and Country columns; exit Chrome.
The expected output would then be:
WEB | Country | Organisation
stackoverflow.com | US | Stack Exchange, Inc.
livevsfox.ca | Missing | Missing
livevsfox.ca in fact returns the following message:
Sorry, livevsfox.ca could not be found or reached (error code 404)
This message does not appear when I look up stackoverflow.com. Since stackoverflow.com has a Country and an Organisation, I can add that information to the dataframe, but I can't do the same for livevsfox.ca. I think a possible solution could be the following:
- check whether the h2 element contains the message above ("Sorry, x could not be found or reached (error code 404)"): this would mean that no information was found for the site;
- if the site has no information, add Missing (or NA, up to you) to the dataframe;
- otherwise, the site has information (Organisation and Country) to be added to the dataframe.
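The three steps above can be sketched as a small pure-Python helper, kept separate from the Selenium calls so it is easy to check on its own. This is only an illustration: the names build_row, MISSING and NOT_FOUND_MARKER are made up here, and the empty heading text used for stackoverflow.com is a stand-in, since the real heading text on a successful lookup is not shown in the question.

```python
MISSING = "Missing"
# Assumption: this fragment of the 404 message is enough to recognise it.
NOT_FOUND_MARKER = "could not be found or reached"


def build_row(web, message, country=None, organisation=None):
    """Return one dataframe row (as a dict) for a looked-up site."""
    if NOT_FOUND_MARKER in message:
        # Error page: there is nothing to extract for this site.
        return {"WEB": web, "Country": MISSING, "Organisation": MISSING}
    # Normal page: use the extracted values, falling back to "Missing".
    return {
        "WEB": web,
        "Country": country or MISSING,
        "Organisation": organisation or MISSING,
    }


rows = [
    build_row("stackoverflow.com", "", "US", "Stack Exchange, Inc."),
    build_row(
        "livevsfox.ca",
        "Sorry, livevsfox.ca could not be found or reached (error code 404)",
    ),
]
```

A list of such dicts can be passed to pd.DataFrame(rows), which yields one row per site with the columns WEB, Country and Organisation.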
I hope you can provide some help.
Recommended answer
I have found a solution to this problem.
First, I detect the h2 element as follows:
message = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR,"section.section div.container h2"))).text
Then, I check whether message contains specific text, for example:
if 'Sorry,' in message:
If it does, I append 'Missing' to the lists that will go into the dataframe:
organisation.append('Missing')
country.append('Missing')
Code:
# This runs inside the loop over websites (continue needs an enclosing loop).
try:
    message = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "section.section div.container h2"))).text
    if 'Sorry,' in message:
        organisation.append('Missing')
        country.append('Missing')
except Exception:
    continue
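The snippet above only appends values in the 'Sorry,' branch. A complementary way to keep the three lists the same length is to route every lookup through a wrapper that returns a default when the lookup fails. The sketch below is a suggestion rather than part of the original answer: text_or_default is a hypothetical helper, and the lambdas and timed_out stand in for the real wait.until(...).text calls, which raise Selenium's TimeoutException when the element never becomes visible.

```python
def text_or_default(lookup, default="Missing"):
    """Call lookup() and return its result; return default if it raises."""
    try:
        return lookup()
    except Exception:  # with Selenium this would typically be TimeoutException
        return default


def timed_out():
    # Stand-in for a wait.until(...) call whose element never appears.
    raise RuntimeError("element never became visible")


# Stand-ins for the per-site Country and Organisation lookups.
lookups = {
    "stackoverflow.com": (lambda: "US", lambda: "Stack Exchange, Inc."),
    "livevsfox.ca": (timed_out, timed_out),
}

element, country, organisation = [], [], []
for web, (get_country, get_org) in lookups.items():
    element.append(web)
    # Exactly one append per list per site, so the columns stay aligned.
    country.append(text_or_default(get_country))
    organisation.append(text_or_default(get_org))
```

With a real driver, each failed lookup would wait for the full WebDriverWait timeout, so checking the h2 message first, as in the answer, remains the faster path for sites that return the 404 page.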