Learning Python and threading. I think my code runs infinitely. Help me find bugs?


Question

I've just started learning Python, and I'm absolutely in love with it.

I'm building a small-scale Facebook data scraper. Basically, it uses the Graph API to scrape the first names of a specified number of users. It works fine in a single thread (or with no threads, I guess).

I used online tutorials to come up with the following multithreaded version (updated code):

import requests
import json
import time
import threading
import Queue

GraphURL = 'http://graph.facebook.com/'
first_names = {} # will store first names and their counts
queue = Queue.Queue()

def getOneUser(url):
    http_response = requests.get(url) # open the request URL
    if http_response.status_code == 200:
        data = http_response.text.encode('utf-8', 'ignore') # Get the text of response, and encode it
        json_obj = json.loads(data) # load it as a json object
        # name = json_obj['name']
        return json_obj['first_name']
        # last = json_obj['last_name']
    return None

class ThreadGet(threading.Thread):
    """ Threaded name scraper """
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #print 'thread started\n'
            url = GraphURL + str(self.queue.get())
            first = getOneUser(url) # get one user's first name
            if first is not None:
                if first_names.has_key(first): # if name has been encountered before
                    first_names[first] = first_names[first] + 1 # increment the count
                else:
                    first_names[first] = 1 # add the new name
            self.queue.task_done()
            #print 'thread ended\n'

def main():
    start = time.time()
    for i in range(6):
        t = ThreadGet(queue)
        t.setDaemon(True)
        t.start()

    for i in range(100):
        queue.put(i)

    queue.join()

    for name in first_names.keys():
        print name + ': ' + str(first_names[name])

    print '----------------------------------------------------------------'
    print '================================================================'
    # Print top first names
    for key in first_names.keys():
        if first_names[key] > 2:
            print key + ': ' + str(first_names[key])

    print 'It took ' + str(time.time()-start) + 's'

main()

To be honest, I don't understand some parts of the code, but I get the main idea. There is no output at all: the shell stays empty, so I believe the program keeps running.

So what I'm doing is filling queue with integers that are user IDs on Facebook. Each ID is then used to build the API call URL, and getOneUser returns the first name of one user at a time. That task (ID) is then marked as done and the thread moves on to the next one.

What is wrong with the code above?

Recommended answer

Your original run function processed only one item from the queue, so in all you removed only 5 items from it. queue.join() blocks until task_done() has been called for every item that was put on the queue, so with items left over it never returns.
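A minimal sketch of that failure mode, written for Python 3 (where the `Queue` module is named `queue`). The names `run_once` and `demo` are made up for illustration, and no network calls are involved. Each worker takes a single item and exits, so most of the queue is never touched:

```python
import queue
import threading

def run_once(q, processed):
    # Mirrors a run() without a loop: take one item, handle it, exit.
    item = q.get()
    processed.append(item)  # list.append is safe to call from several threads in CPython
    q.task_done()

def demo(num_workers=5, num_items=100):
    q = queue.Queue()
    for i in range(num_items):
        q.put(i)
    processed = []
    workers = [threading.Thread(target=run_once, args=(q, processed))
               for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # waits for the threads themselves, not for the queue
    # Calling q.join() here would block forever: items remain that
    # no thread will ever mark with task_done().
    return len(processed), q.qsize()

if __name__ == "__main__":
    done, left = demo()
    print(done, "processed,", left, "still queued")
```

Checking `q.qsize()` after the threads have exited is a quick way to confirm that work was left behind.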

Typically, run functions look like this:

def run(self):
    while True:
        doUsefulWork()

i.e. they have a loop that keeps doing the recurring work.
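As a rough Python 3 sketch of such a looping worker (not the OP's exact code): `fake_lookup` is a hypothetical stand-in for `getOneUser`, and the lock around the shared counter is an addition of mine, since incrementing a shared count from several threads is a read-modify-write operation.

```python
import queue
import threading
from collections import Counter

def fake_lookup(user_id):
    # Hypothetical stand-in for getOneUser(url); no network access.
    return "name%d" % (user_id % 3)

class Worker(threading.Thread):
    def __init__(self, q, counts, lock):
        super().__init__(daemon=True)  # daemon: does not keep the process alive
        self.q = q
        self.counts = counts
        self.lock = lock

    def run(self):
        while True:                    # keep pulling items until the program exits
            user_id = self.q.get()     # blocks while the queue is empty
            try:
                name = fake_lookup(user_id)
                if name is not None:
                    with self.lock:    # guard the shared counter
                        self.counts[name] += 1
            finally:
                self.q.task_done()     # always mark the item done, even on error

def count_names(num_workers=6, num_items=100):
    q = queue.Queue()
    counts = Counter()
    lock = threading.Lock()
    for _ in range(num_workers):
        Worker(q, counts, lock).start()
    for i in range(num_items):
        q.put(i)
    q.join()                           # returns once every item has been task_done()
    return counts

if __name__ == "__main__":
    print(count_names())
```

Because the workers never leave the loop, they are started as daemon threads so the process can exit once `q.join()` returns.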

The OP has since edited the code to include this change.

Some other useful things to try:

  • Add a print statement to the run function: you'll find that it is only called 5 times.
  • Remove the queue.join() call. This is what causes the module to block; without it you will be able to probe the state of the queue.
  • Put the entire body of run into a function. Verify that you can use that function in a single-threaded manner to get the desired results, then
  • try it with just a single worker thread, then finally go for
  • multiple worker threads.
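The last three steps could be sketched roughly as follows, in Python 3. The names `lookup`, `process_one`, `run_single_threaded`, and `run_threaded` are hypothetical, and `lookup` fakes the HTTP call so the comparison is repeatable:

```python
import queue
import threading

def lookup(user_id):
    # Hypothetical stand-in for the real HTTP call; some IDs yield no name.
    return None if user_id % 10 == 0 else "user%d" % (user_id % 4)

def process_one(user_id, counts, lock):
    """The whole body of run(), testable without any threads."""
    name = lookup(user_id)
    if name is not None:
        with lock:
            counts[name] = counts.get(name, 0) + 1

def run_single_threaded(ids):
    counts, lock = {}, threading.Lock()
    for user_id in ids:
        process_one(user_id, counts, lock)
    return counts

def run_threaded(ids, num_workers):
    counts, lock, q = {}, threading.Lock(), queue.Queue()

    def worker():
        while True:
            user_id = q.get()
            try:
                process_one(user_id, counts, lock)
            finally:
                q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    for user_id in ids:
        q.put(user_id)
    q.join()
    return counts

if __name__ == "__main__":
    ids = range(100)
    expected = run_single_threaded(ids)       # step 1: no threads at all
    assert run_threaded(ids, 1) == expected   # step 2: a single worker
    assert run_threaded(ids, 6) == expected   # step 3: several workers
    print(expected)
```

Keeping the per-item logic in one plain function means a wrong result can be traced to either the logic or the threading, not both at once.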
