No data collected when I extract info from a website using xpath

Views: 16. Posted: 2021/9/22 20:42:57. Tags: python, selenium, selenium-webdriver, web-scraping, webdriver


Problem description

I need to extract information from a website. The information sits inside the following markup:

<div class="accordion-block__question">
<div class="accordion-block__text">Server</div></div>
...
<div class="block__col"><b>Country</b></div>

Running

try:
    # Country
    c = driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]").get_attribute('textContent')
    country.append(c)
except:
    country.append("Error")

The result is a DataFrame in which every value is "Error". I'm interested in all the fields (though fixing just one would already be great), including the Trustscore (a number), but I don't know whether it is possible to get it. I'm using Selenium with the Chrome WebDriver. The website is https://www.scamadviser.com/check-website.

CODE

This is the entire code:

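import pandas as pd
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

def scam(df):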
    chrome_options = webdriver.ChromeOptions()

    trust=[]
    country = [] 
    isp_country = [] 
        
    query=df['URL'].unique().tolist() 
    driver=webdriver.Chrome('mypath',chrome_options=chrome_options)
    
    for x in query:
        
        wait = WebDriverWait(driver, 10)
        response=driver.get('https://www.scamadviser.com/check-website/'+x)
        
        try: 
            wait = WebDriverWait(driver, 30)
            # missing trustscore

            # Country
            c=driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]")).get_attribute('innerText')
            country.append(c)  

            # ISP country
            ic=driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'ISP')]").get_attribute('innerText')
            isp_country.append(ic)
        
        except: 
            # missing trustscore
            country.append("Error")
            isp_country.append("Error")
            

    # Create dataframe
    dict = {'URL': query, 'Trustscore':trust, 'Country': country, 'ISP': isp_country} 
    df=pd.DataFrame(dict)

    driver.quit()
    
    return df

You can try it, for example, with df['URL'] equal to:

stackoverflow.com
gitHub.com

Solution

You are looking for innerText, not textContent.

Code:

try:
    # Country
    c = driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]").get_attribute('innerText')
    print(c)
    country.append(c)
except:
    country.append("Error")

Update 1:

If the locator you are already using is correct, scroll the element first:

driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", driver.find_element_by_xpath("//div[contains(@class,'block__col') and contains(text(),'Country')]"))

Or try both options (innerText and the scroll) with this XPath:

//div[contains(@class,'block__col')]/b[text()='Country']
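A possible reason this XPath behaves differently from the one in the question: in the markup shown there, the word Country sits inside a <b> child, so the div itself has no direct text node for contains(text(), 'Country') to match. The sketch below illustrates that with Python's standard-library ElementTree rather than Selenium. The snippet is a hand-written stand-in for the page, and ElementTree supports only a subset of XPath, so treat it as an illustration of the idea, not as a check of the real site.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the markup quoted in the question
# (an assumption, not the live page).
snippet = '<root><div class="block__col"><b>Country</b></div></root>'
root = ET.fromstring(snippet)

div = root.find(".//div[@class='block__col']")

# Text that belongs directly to the div, before its first child element.
own_text = div.text

# Text held by the <b> child, which is what the second XPath targets.
bold_text = div.find("b").text

# All descendant text joined together, roughly what innerText exposes.
all_text = "".join(div.itertext())

print(repr(own_text), repr(bold_text), repr(all_text))
```

The div's own text is empty while the <b> child carries the label, which is why targeting b[text()='Country'] can succeed where a text() test on the div does not.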

Update 2:

try:
    wait = WebDriverWait(driver, 30)
    # missing trustscore

    # Country
    time.sleep(2)
    ele = driver.find_element_by_xpath("//div[contains(@class,'block__col')]/b[text()='Country']")
    driver.execute_script("arguments[0].scrollIntoView(true);", ele)
    country.append(ele.get_attribute('innerText'))

    time.sleep(2)
    # ISP country
    ic = driver.find_element_by_xpath("//div[contains(@class,'block__col')]/b[text()='ISP']")
    # Scroll the ISP element into view before reading it
    driver.execute_script("arguments[0].scrollIntoView(true);", ic)
    isp_country.append(ic.get_attribute('innerText'))
except:
    country.append("Error")
    isp_country.append("Error")
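The try block in the question appends "Error" to every list when any single lookup fails, so a country that was already appended gets a second entry and the lists can drift out of step. One way to keep the columns aligned is to build one dictionary per URL and guard each field separately. This is a minimal sketch: collect_row, the lookups mapping, and the placeholder values are hypothetical names introduced for illustration, and the lambdas stand in for the Selenium calls.

```python
def collect_row(url, lookups, default="Error"):
    """Return one result row, falling back to `default` per failed field."""
    row = {"URL": url}
    for field, lookup in lookups.items():
        try:
            row[field] = lookup()
        except Exception:
            # Only this field is marked as failed; the others are untouched.
            row[field] = default
    return row


def failing_lookup():
    # Stand-in for a Selenium call that cannot find its element.
    raise RuntimeError("element not found")


row = collect_row(
    "stackoverflow.com",
    {
        "Country": lambda: "placeholder-country",  # stand-in for a real lookup
        "ISP": failing_lookup,
    },
)
print(row)
```

A list of such rows can be passed directly to pd.DataFrame, which gives every column the same length by construction.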

Update 3:

To get the country name from the Company data section, use this XPath:

//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Country']/../following-sibling::div

Also, check a few things before using this XPath:

Launch the browser in full-screen (maximized) mode.
Scroll using JavaScript, and then use scrollIntoView or ActionChains.

Code:

driver.maximize_window()
time.sleep(2)
driver.execute_script("window.scrollTo(0, 1000)")
time.sleep(2)
driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']"))))
# now use the mentioned xpath.

company_data_country_name = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[text()='Company data']/../following-sibling::div/descendant::b[text()='Country']/../following-sibling::div")))
print(company_data_country_name.text)
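The same pattern can be wrapped in a small helper so that other label and section pairs reuse the Update 3 XPath. The sketch below is an assumption-laden outline, not a tested scraper for the site: sibling_value_xpath and read_value are hypothetical names, and the page structure is taken from the XPath above. It uses driver-independent By.XPATH locators with an explicit wait, since the find_element_by_xpath shortcut was removed in recent Selenium 4 releases.

```python
def sibling_value_xpath(section: str, label: str) -> str:
    """Build the Update 3 XPath for the value cell next to a bold label."""
    return (
        f"//div[text()='{section}']/../following-sibling::div"
        f"/descendant::b[text()='{label}']/../following-sibling::div"
    )


def read_value(driver, section: str, label: str, timeout: int = 20) -> str:
    """Wait for the value cell, scroll it into view, and return its text."""
    # Imported lazily so the XPath helper works without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, sibling_value_xpath(section, label))
    cell = WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located(locator)
    )
    driver.execute_script("arguments[0].scrollIntoView(true);", cell)
    return cell.text.strip()


print(sibling_value_xpath("Company data", "Country"))
```

With a live driver already on the report page, read_value(driver, "Company data", "Country") would be the call that replaces the last two lines of the snippet above.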
