Selenium-based scraping code fails with the error NoSuchElementException


Problem description


I have Python code that scrapes different data. For example, it scrapes the Website link from this HTML:

<a data-ix="show-popup-on-click" target="_blank" rel="nofollow" href="https://mylink.org/" class="button full w-button" style="transition: all 0.4s ease 0s;">Website</a>

It was working properly, but now it fails with the error:

NoSuchElementException: Message: {"errorMessage":"Unable to find element with link text 'Website'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"95","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:40581","User-Agent":"Python http auth"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"link text\", \"sessionId\": \"a7a441f0-0f6a-11e8-ad3a-6121f74a30f4\", \"value\": \"Website\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/a7a441f0-0f6a-11e8-ad3a-6121f74a30f4/element"}} Screenshot: available via screen

This is my code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.set_window_size(1120, 550)
driver.get(link)
driver.implicitly_wait(10)

website = driver.find_element_by_link_text("Website").get_attribute("href")

What am I doing wrong?

UPDATE:

<div class="column-space w-col w-col-4">
   <a data-ix="show-popup-on-click" target="_blank" 
      rel="nofollow" href="https://example.com/" 
      class="button full w-button" 
      style="transition: all 0.4s ease 0s;">Website</a>

   <div class="space big"></div>
   <a target="_blank" rel="nofollow" 
      href="https://example.com/storage/b/2/0/2/WhitepaperLive.pdf" 
      class="button-2 w-button">Whitepaper</a>
   <div class="space big"></div>
   <a class="button-2 w-condition-invisible w-button">Program</a>
   <div class="space big w-condition-invisible"></div>
   <div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Token:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">UTC</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Price:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">1 LUC=0,05 USD</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Buy with:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USD, EUR</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Platform:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">MyPlatform</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix w-condition-invisible">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">No</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">KYC:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Yes</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Location:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">Malta</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Can't join:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">USA</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">January 25, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 5, 2018</div>
         </div>
      </div>
      <div class="space big"></div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">Start2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">February 12, 2018</div>
         </div>
      </div>
      <div class="div-block-4 w-clearfix">
         <div class="div-block-2">End2:</div>
         <div class="div-block-5 w-clearfix">
            <div class="text-block-12">March 5, 2018</div>
         </div>
      </div>
      <div>
         <div class="div-block-33">
            <div class="space big"></div>
            <div>
               <a target="_blank" rel="nofollow" 
               class="button green full w-condition-invisible w-button">JOIN WHITELIST NOW »</a>
               <div class="div-block-34">
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-slack.com" 
                     class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/903_slack-symbol.png" alt="ICO Slack link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://twitter.com/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/f4000142b091_twitter%20(1).png" width="16" alt="ICO Twitter link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://t.me/live" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/790001798dfe_telegram.png" alt="ICO Telegram link">
                  </a>
                  <a target="_blank" rel="nofollow" href="http://we-do-not-have-GitHub.com" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b26a_github-logo.png" alt="ICO GitHun link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://www.facebook.com/Play2Live-504880049864038/" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/59cf77c1fb0edc0001b4b117/59d510290116ac0001964c8e_facebook.png" alt="Facebook link">
                  </a>
                  <a target="_blank" rel="nofollow" href="https://talk.org/index.php?topic=2381679.0" class="link-block-2 w-inline-block">
                     <img src="https://global-uploads.webflow.com/0011f8c3c_talk.jpg" alt="Talk link">
                  </a>
               </div>
            </div>
         </div>
      </div>
   </div>
</div>

Solution

There is no problem with the code. When I inspect the Website link on the web page, I can see its text as "Website", but if I use the same text to find the element by link text, as below, I get a NoSuchElementException:

website = driver.find_element_by_link_text("Website").get_attribute("href")
print(website)

I tried adding waits and also tried partial_link_text, but with no luck.

Then I tried fetching all elements with the tag name "a" and printing their text with the code below:

elements = driver.find_elements_by_tag_name("a")
for element in elements:
    print(element.text)

It turned out that the link text Selenium sees is not "Website" but "WEBSITE". I am not sure why it behaves like this.
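One plausible explanation, which the page itself would have to confirm, is a CSS rule such as text-transform: uppercase on the button class: Selenium's element.text and its link-text locators generally work on the text as rendered, not as written in the HTML source. The sketch below is a variation of the loop above that puts the rendered text next to the DOM text so that such a mismatch becomes visible. The function names link_text_report and print_mismatches are hypothetical, and the code assumes only that each element offers .text and .get_attribute(), as Selenium's WebElement does.

```python
def link_text_report(elements):
    """Return (rendered_text, dom_text, href) for each anchor element.

    `elements` is assumed to be the list returned by
    driver.find_elements_by_tag_name("a"); only `.text` and
    `.get_attribute()` are used, so any object with that interface works.
    """
    report = []
    for element in elements:
        # Text as Selenium reports it, i.e. after CSS has been applied.
        rendered = element.text
        # Text as written in the HTML source (may be None for odd elements).
        dom_text = (element.get_attribute("textContent") or "").strip()
        report.append((rendered, dom_text, element.get_attribute("href")))
    return report


def print_mismatches(elements):
    """Print every visible link whose rendered text differs from its DOM text."""
    for rendered, dom_text, href in link_text_report(elements):
        if rendered and rendered != dom_text:
            print(f"{rendered!r} is rendered from {dom_text!r} -> {href}")
```

Calling print_mismatches(driver.find_elements_by_tag_name("a")) on the page in question should list the Website link if a CSS transform really is the cause; if nothing is printed, the cause lies elsewhere.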

After changing all the characters of "Website" to uppercase, I was able to locate the element and fetch its href:

driver.get("https://topicolist.com/ico/adhive")
website = driver.find_element_by_link_text("WEBSITE").get_attribute("href")
print(website)

Hope this solves your problem.
