Extracting links with scrapy and selenium

Views: 59 | Published: 2019/6/15 22:39:21 | Python

Question

I am using selenium and scrapy to navigate to a table of data, and I would like to extract the links (the href values) to a CSV file. So far nothing I have tried seems to work, and I'm unsure what to try next or how to go about getting the links.

Here's the important part of the table I am trying to extract the links from:

<tr class="even">
<td class="paddingColumnValue"> </td>
<td class="nameColumnValue"><a href="/m/app?service=external/sdata_details&sp=12812" class="sdata" title="Click here for additional details.">click</a></td>
<td class="amountColumnValue">$600,000.00</td>
<td class="myListColumnValue"><a href="" onclick="doMyListButton(this.firstChild.getAttribute('src'),this.name);myListHandler(this.name);return false;" onmouseover="return true" name="12812"><img src="/m/images/add.gif" border="0" title="Click to add this to your list" name="A12812"></a></td>
</tr>

The closest I've gotten to actually getting data is with this code (note that the table id is search_results):

import time
from scrapy.item import Item, Field
from selenium import webdriver
from scrapy.spider import BaseSpider
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector

class ElyseAvenueItem(Item):
    link = Field()

class ElyseAvenueSpider(BaseSpider):
    name = "elyse"
    allowed_domains = ["domain.com"]
    start_urls = [
    'http://www.domain.com']
    
    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        el1 = self.driver.find_element_by_xpath("//*[@id='headerRelatedLinks']/ul/li[5]/a")
        el1.click()
        time.sleep(2)
        el2 = self.driver.find_element_by_xpath("/html/body/form/table/tbody/tr[2]/td[2]/table/tbody/tr/td[3]/p[3]/a[1]")
        if el2:
            el2.click()
            time.sleep(2)
        el3 = self.driver.find_element_by_xpath("/html/body/form/table/tbody/tr[2]/td[2]/table[1]/tbody/tr/td[3]/a")
        if el3:
            el3.click()
            time.sleep(20)
            
        
            titles = self.driver.find_elements_by_class_name("sdata")
            items = []
            for titles in titles:
                item = ElyseAvenueItem()
                item ["link"] = titles.find_element_by_xpath("//*[@id='search_results']/tbody/tr[2]/td[2]/a")
                items.append(item)
                return item

Output to CSV:
<selenium.webdriver.remote.webelement.WebElement object at 0x03f16e90>

Thank you for the help. I can post more of my attempts and their output if that would help. Like I said, what I need is the href.

Recommended answer
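
The CSV contains a WebElement repr because item["link"] is assigned the element itself rather than one of its attributes. Selenium's WebElement has a get_attribute method, and get_attribute("href") returns the link as a string. What follows is a minimal sketch of that idea, not a drop-in replacement for the spider. The helper names extract_hrefs and write_links_csv and the FakeAnchor class are hypothetical, introduced only so the example runs without a browser.

```python
import csv
import io
from urllib.parse import urljoin


def extract_hrefs(elements, base_url):
    """Return absolute href strings for WebElement-like objects.

    Each element only needs a get_attribute(name) method, which is
    what Selenium's WebElement provides.
    """
    hrefs = []
    for element in elements:
        # get_attribute returns a string (or None), not a WebElement
        href = element.get_attribute("href")
        if href:  # skip anchors whose href is empty or missing
            # urljoin leaves an already-absolute URL unchanged
            hrefs.append(urljoin(base_url, href))
    return hrefs


def write_links_csv(hrefs, out_file):
    """Write one link per row under a 'link' header."""
    writer = csv.writer(out_file)
    writer.writerow(["link"])
    for href in hrefs:
        writer.writerow([href])


class FakeAnchor:
    """Hypothetical stand-in for a Selenium WebElement (assumption for the demo)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# Demo data modelled on the two anchors in the table row from the question.
anchors = [
    FakeAnchor("/m/app?service=external/sdata_details&sp=12812"),
    FakeAnchor(""),  # the "add to list" anchor has an empty href
]
links = extract_hrefs(anchors, "http://www.domain.com")

buffer = io.StringIO()  # stands in for an open CSV file
write_links_csv(links, buffer)
print(buffer.getvalue())
```

In the spider from the question, the same helper could presumably be called as extract_hrefs(self.driver.find_elements_by_class_name("sdata"), self.driver.current_url). Two other details in the original loop are worth checking. An XPath that starts with // is evaluated against the whole document even when it is called on an element, so every iteration finds the same anchor. Also, return item inside the loop ends parse after the first item, so yielding each item is likely what was intended.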

