Trouble running my class crawler

Question

While writing a class-based crawler in Python, I got stuck halfway. I can't figure out how to pass the newly produced links (generated by the app_crawler class) to the App class so that I can do the rest of the work there. If anyone could point me in the right direction by showing how to run it, I would be very grateful. Thanks in advance. By the way, it does run, but only for a single link.

from lxml import html
import requests

class app_crawler:

    starturl = "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8"

    def crawler(self):
        self.get_app(self.starturl)


    def get_app(self, link):
        page = requests.get(link)
        tree = html.fromstring(page.text)
        links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')
        for link in links:
            return link # I wish to make this link penetrate through the App class but can't get any idea


class App(app_crawler):

    def __init__(self, link):
        self.links = [link]

    def process_links(self):
        for link in self.links:
            self.get_item(link)

    def get_item(self, url):
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]        
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)

if __name__ == '__main__':

    parse = App(app_crawler.starturl)
    parse.crawler()
    parse.process_links()

I've created another one that works fine, but I wanted to give the above crawler a different design. Here is the link to the working one: https://www.dropbox.com/s/galjorcdynueequ/Working%20one.txt?dl=0

Recommended answer

Your code has several issues:

  • App inherits from app_crawler, yet you provide an app_crawler instance to App.__init__.

  • App.__init__ calls app_crawler.__init__ instead of super().__init__().

  • Not only does app_crawler.get_app not actually return anything, it also creates a brand new App object. This results in your code passing an app_crawler object to requests.get instead of a URL string.

  • There is too much encapsulation in your code.

Consider the following code, which is shorter and cleaner than your non-working code and does not pass objects around needlessly:

from lxml import html
import requests

class App:
    def __init__(self, starturl):
        self.starturl = starturl
        self.links = []

    def get_links(self):
        page = requests.get(self.starturl)
        tree = html.fromstring(page.text)
        self.links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')

    def process_links(self):
        for link in self.links:
            self.get_docs(link)

    def get_docs(self, url):
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)

if __name__ == '__main__':
    parse = App("https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8")
    parse.get_links()
    parse.process_links()

Output

Cookie Jam By Jam City, Inc. Free
Zombie Tsunami By Mobigame Free
Flow Free By Big Duck Games LLC Free
Bejeweled Blitz By PopCap Free
Juice Jam By Jam City, Inc. Free
Candy Crush Soda Saga By King Free
Bubble Witch 3 Saga By King Free
Candy Crush Jelly Saga By King Free
Farm Heroes Saga By King Free
Pet Rescue Saga By King Free
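
If you would still prefer the two-class layout from the question, one possible way to hand the links over is sketched below. In the question's get_app, the `return` inside the `for` loop exits on the first iteration, which is why only a single link is processed; yielding the links avoids that. This is only an illustration under stated assumptions: the names AppCrawler, find_links, crawl and fake_find_links are hypothetical, and the network and XPath work is replaced by an injected function so that the sketch runs offline. In a real crawler, find_links would wrap requests.get and the lockup-info XPath query, and get_item would do the actual scraping.

```python
class AppCrawler:
    """Base class: knows the start URL and how to discover links."""

    def __init__(self, starturl, find_links):
        self.starturl = starturl
        # find_links is a hypothetical callable: url -> iterable of hrefs.
        self.find_links = find_links

    def get_app(self):
        # Yield every link; a `return` here would stop after the first one.
        yield from self.find_links(self.starturl)


class App(AppCrawler):
    """Subclass: collects the discovered links and processes each one."""

    def __init__(self, starturl, find_links):
        # Let the base class set up its own state.
        super().__init__(starturl, find_links)
        self.links = []

    def crawl(self):
        # Store all links produced by the inherited generator.
        self.links = list(self.get_app())

    def process_links(self):
        return [self.get_item(link) for link in self.links]

    def get_item(self, url):
        # Stand-in for the real scraping step (name, developer, price).
        return "would scrape " + url


def fake_find_links(url):
    # Hypothetical offline replacement for the HTTP request and XPath query.
    return [url + "/app-1", url + "/app-2", url + "/app-3"]


if __name__ == "__main__":
    parse = App("https://example.com/start", fake_find_links)
    parse.crawl()
    for line in parse.process_links():
        print(line)
```

Because App inherits get_app, there is no need to pass a crawler object into App at all; the subclass calls the inherited method on itself and keeps the result in self.links.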
