Web scraping using Selenium and BeautifulSoup: trouble parsing and selecting a button


Question

I am trying to web scrape the following website: url = 'https://angel.co/life-sciences'. The site lists more than 8000 entries. From this page I need information such as the company name and link, the joined date, and the number of followers. Before that I need to sort the Followers column by clicking its header, and then load more entries by clicking the "more" (hidden) button. That button can be clicked at most 20 times; after that the page does not load any more entries. Sorting first means I can at least capture the entries with the most followers. I have implemented the click() event here, but it is showing an error:

Unable to locate element: {"method":"xpath","selector":"//div[@class="column followers sortable sortable"]"}  # before the edit this was my problem: I was using the wrong class name

So do I need to give more sleep time here? (I tried increasing it, but got the same error.)
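Alternatively, would an explicit wait be more reliable than a fixed sleep? A minimal sketch of what I mean, assuming the same XPath as in my code below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get('https://angel.co/life-sciences')

# Wait up to 20 seconds for the sortable Followers header to appear,
# rather than sleeping for a fixed time.
wait = WebDriverWait(driver, 20)
followers_header = wait.until(
    EC.presence_of_element_located(
        (By.XPATH, '//div[@class="column followers sortable"]')
    )
)
followers_header.click()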

I need to parse all of the above information, then visit the individual link of each company to scrape only the content div of that HTML page.
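As a sketch of that second step, assuming the links have already been collected into a list and that the target pages have a div with class "content" (both are assumptions):

# Hypothetical sketch: `links` is assumed to already hold the company URLs,
# and 'content' is an assumed class name for the description div.
for link in links:
    driver.get(link)
    sleep(5)  # wait for the page to load
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    content_div = soup.find('div', class_='content')
    if content_div is not None:
        print(content_div.get_text(strip=True))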

Please suggest an approach for this.

Here is my current code; I have not yet added the HTML parsing part using BeautifulSoup.

from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
url = 'https://angel.co/life-sciences'
driver.get(url)
sleep(10)  # wait for the initial page to load

# Sort by the Followers column (edited: I was using the wrong class name before)
driver.find_element_by_xpath('//div[@class="column followers sortable"]').click()
sleep(5)

# Click the "more" (hidden) button to load additional rows
for i in range(2):
    driver.find_element_by_xpath('//div[@class="more hidden"]').click()
    sleep(8)

sleep(8)
element = driver.find_element_by_id("root").get_attribute('innerHTML')

page_source = driver.page_source

html = BeautifulSoup(element, 'html.parser')
#for link in html.findAll('a', {'class': 'startup-link'}):
#    print(link)

divs = html.find_all("div", class_=" dts27 frw44 _a _jm")
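For the parsing part that is still missing, something like the following sketch could pull each company's name and link out of the page source. The 'startup-link' class comes from my earlier commented-out attempt and may not match the live page, so treat it as an assumption:

# Sketch of the missing parsing step; 'startup-link' is an assumed class name.
html = BeautifulSoup(page_source, 'html.parser')
companies = []
for anchor in html.find_all('a', class_='startup-link'):
    companies.append({
        'name': anchor.get_text(strip=True),
        'link': anchor.get('href'),
    })
print(companies[:5])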

The above code was working and gave me the HTML source before I added the Followers click event.

My final goal is to export all five pieces of information (the company name, its link, the joined date, the number of followers, and the company description, which is obtained after visiting each individual link) into a CSV or XLS file.
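Once those five fields are collected (say, into a list of dicts), the CSV part is straightforward with the standard csv module. A minimal sketch, assuming a hypothetical `rows` list with those keys:

import csv

# `rows` is assumed to be a list of dicts holding the five scraped fields.
fieldnames = ['name', 'link', 'joined', 'followers', 'description']
with open('angel_life_sciences.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)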

Help and comments are appreciated. This is my first Python and Selenium project, so I am a little confused and need guidance.

Thanks :-)

Answer

The click method is intended to emulate a mouse click; it is for use on elements that can be clicked, such as buttons, drop-down lists, and check boxes. You have applied this method to a div element, which is not clickable. Elements like div, span, and frame are used to organise the HTML and provide font decoration, etc.

To make this code work you will need to identify the elements in the page that are actually clickable.
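One way to do that is to wait until Selenium reports the target as clickable before clicking it. A minimal sketch, reusing the XPath from the question (it may need to point at an inner anchor or button rather than the div itself):

# Wait until the element is reported as clickable, then click it.
# The XPath is the one from the question and is an assumption about the page.
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 20)
sort_button = wait.until(
    EC.element_to_be_clickable(
        (By.XPATH, '//div[@class="column followers sortable"]')
    )
)
sort_button.click()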
