I am unable to scrape each link content for a specific time period from Indeed


Problem Description

I am new to Python and web scraping, and new to programming in general; I am still practicing. I am using Python and Selenium for web scraping. Your help will be appreciated.

I am trying to scrape data from Indeed. The goal is to find all jobs posted in the last 24 hours and, from each job detail page, scrape the external link behind the "Apply on company site" button, along with the heading, company name, location, and job description.

I wrote the following code. It fetches all the links on the page correctly, but when I try to open each link, it only opens the first one. How can I open all the links I fetched, one by one? Thanks in advance; here is my code sample:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

Path = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not treated as escapes
driver = webdriver.Chrome(Path)

driver.get("https://indeed.ae/")
print(driver.title)
search = driver.find_element_by_name("l")
search.send_keys("Dubai")
search.send_keys(Keys.RETURN)

try:
    td = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.ID, "resultsCol"))
    )
    divs = td.find_elements_by_tag_name("div")

    for div in divs:
        try:
            title = div.find_element_by_class_name("title")
            anchors = title.find_elements_by_tag_name('a')
            links = []
            for anchor in anchors:
                link = anchor.get_attribute('href')
                links.append(link)
                print(links)
                for link in links:
                    url = driver.get(link)
        except:
            continue

finally:
    driver.quit()

Recommended Answer

The problem is that you get an href, navigate to that page, scrape it, and then ask for the next href — but now you can no longer find it, because you are on a different page.

Solution: scrape all the URLs first and put them in a list. Then iterate over that list, visiting and scraping each URL one by one before moving on to the next element of the list.
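The collect-first-then-navigate fix could be sketched as follows. This is a minimal sketch, not a verified implementation: the CSS selector `div.title a` is inferred from the question's `title`-class/anchor structure, the one-second pause is an arbitrary politeness delay, and the Selenium 3 `find_element_by_*` API from the question is kept. The link-collecting helper is deliberately driver-agnostic so it can be exercised without a live browser.

```python
def collect_job_links(results_col):
    """Gather every job-title href from the results column BEFORE navigating away.

    `results_col` is any object exposing find_elements_by_css_selector(),
    so this helper can be tested without a browser or driver installed.
    """
    links = []
    for anchor in results_col.find_elements_by_css_selector("div.title a"):
        href = anchor.get_attribute("href")
        if href:  # skip anchors without an href
            links.append(href)
    return links


def main():
    # Selenium imports are local to main() so collect_job_links() above
    # stays importable without Selenium or a chromedriver present.
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome(r"C:\Program Files (x86)\chromedriver.exe")
    try:
        driver.get("https://indeed.ae/")
        search = driver.find_element_by_name("l")
        search.send_keys("Dubai")
        search.send_keys(Keys.RETURN)

        results_col = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.ID, "resultsCol"))
        )
        # Step 1: collect every link while still on the results page.
        links = collect_job_links(results_col)

        # Step 2: only now navigate, one link at a time. The plain list of
        # strings survives even though the results-page elements go stale.
        for link in links:
            driver.get(link)
            # ...scrape heading, company, location, and the
            # "Apply on company site" href from the detail page here...
            time.sleep(1)
    finally:
        driver.quit()


if __name__ == "__main__":
    main()
```

The key difference from the question's code is that `driver.get()` is never called while iterating over live page elements: once the hrefs are copied into an ordinary Python list, leaving the results page cannot invalidate them.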
