Python multiprocessing: TypeError: expected string or Unicode object, NoneType found


Question



I am attempting to download a whole ftp directory in parallel.

#!/usr/bin/python
import sys
import datetime
import os
from multiprocessing import Process, Pool
from ftplib import FTP
curYear=""
remotePath =""
localPath = ""

def downloadFiles (remotePath,localPath):
        splitted = remotePath.split('/');
        host= splitted[2]
        path='/'+'/'.join(splitted[3:])
        ftp = FTP(host)
        ftp.login()
        ftp.cwd(path)
        filenames =  ftp.nlst()
        total=len(filenames)
        i=0
        pool = Pool()
        for filename in filenames:
                        local_filename = os.path.join(localPath,filename)
                        pool.apply_async(downloadFile, (filename,local_filename,ftp))
                        #downloadFile(filename,local_filename,ftp);
                        i=i+1

        pool.close()
        pool.join()
        ftp.close()

def downloadFile(filename,local_filename,ftp):
        file = open(local_filename, 'wb')
        ftp.retrbinary('RETR '+ filename, file.write)
        file.close()

def getYearFromArgs():
        if len(sys.argv) >= 2 and sys.argv[1] == "Y":
                year = sys.argv[2]
                del sys.argv[1:2]
        else:
                year = str(datetime.datetime.now().year)
        return year

def assignGlobals():
        global p
        global remotePath
        global localPath
        global URL
        global host
        global user
        global password
        global sqldb
        remotePath = 'ftp://ftp3.ncdc.noaa.gov/pub/data/noaa/isd-lite/%s/' % (curYear)
        localPath = '/home/isd-lite/%s/' % (curYear)

def main():
        global curYear
        curYear=getYearFromArgs()
        assignGlobals()
        downloadFiles(remotePath,localPath)

if __name__ == "__main__":
        main()

But I get this exception:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 225, in _handle_tasks
    put(task)
TypeError: expected string or Unicode object, NoneType found

If I comment out this line:

pool.apply_async(downloadFile, (filename,local_filename,ftp))

and remove the comment on this line:

downloadFile(filename,local_filename,ftp);

Then it works just fine but it is slow and not multithreaded.

Solution

Update, May 9, 2014:

I have determined the precise limitation. It is possible to send objects across process boundaries to worker processes as long as the objects can be pickled by Python's pickle facility. The problem which I described in my original answer occurred because I was trying to send a file handle to the workers. A quick experiment demonstrates why this doesn't work:

>>> f = open("/dev/null")
>>> import pickle
>>> pickle.dumps(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 306, in save
    rv = reduce(self.proto)
  File "/usr/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects

Thus, if you're encountering the Python error which led you to find this Stack Overflow question, make sure all the things you're sending across process boundaries can be pickled.
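
One way to check this up front is a small helper (my own, not part of multiprocessing) that tries to pickle each argument before it is handed to the pool, so the failure names the offending value instead of surfacing later as an opaque error in the pool's task-handler thread:

```python
import os
import pickle

def assert_picklable(*args):
    # Attempt to pickle each argument; on failure, raise a TypeError
    # that identifies which argument cannot cross a process boundary.
    for i, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except Exception as exc:
            raise TypeError("argument %d (%s) is not picklable: %s"
                            % (i, type(arg).__name__, exc))

# Plain strings pickle fine:
assert_picklable("station-1234.gz", "/tmp/station-1234.gz")

# An open file object does not:
f = open(os.devnull, "rb")
try:
    assert_picklable(f)
except TypeError as exc:
    print(exc)
finally:
    f.close()
```

Calling this on the argument tuple right before apply_async turns a confusing pool-internal traceback into an immediate, specific error.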

Original answer:

I'm a bit late to answering. However, I ran into the same error message as the original poster while trying to use Python's multiprocessing module. I'll record my findings so that anyone else who stumbles upon this thread has something to try.

In my case, the error occurred because of what I was trying to send to the pool of workers: I was trying to pass an array of file objects for the pool workers to chew on. That's apparently too much to send across process boundaries in Python. I solved the problem by sending the pool workers dictionaries which specified input and output filename strings.

So it seems that the arguments you supply to a function such as apply_async (I used map() and imap_unordered()) can be numbers or strings, or even a detailed dictionary data structure, as long as every value in it can be pickled.

In your case:

pool.apply_async(downloadFile, (filename,local_filename,ftp))

ftp is an object, which might be causing the problem. As a workaround, I would recommend sending plain string parameters to the worker (host and path, in this case) and letting the worker instantiate the FTP object and deal with the cleanup.
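
A sketch of that workaround applied to the code above, assuming the FTP server tolerates several concurrent anonymous connections (the host/path split mirrors the original downloadFiles):

```python
import os
from ftplib import FTP
from multiprocessing import Pool

def downloadFile(args):
    # Only strings cross the process boundary; each call opens
    # and closes its own FTP connection inside the worker.
    host, path, filename, local_filename = args
    ftp = FTP(host)
    ftp.login()
    ftp.cwd(path)
    with open(local_filename, 'wb') as f:
        ftp.retrbinary('RETR ' + filename, f.write)
    ftp.quit()

def downloadFiles(remotePath, localPath):
    splitted = remotePath.split('/')
    host = splitted[2]
    path = '/' + '/'.join(splitted[3:])
    # One connection in the parent, just to list the directory.
    ftp = FTP(host)
    ftp.login()
    ftp.cwd(path)
    filenames = ftp.nlst()
    ftp.quit()
    # Tuples of plain strings pickle cleanly.
    tasks = [(host, path, name, os.path.join(localPath, name))
             for name in filenames]
    pool = Pool()
    pool.map(downloadFile, tasks)
    pool.close()
    pool.join()
```

Opening one connection per file is wasteful; a refinement would be one connection per worker process (for example via a Pool initializer), but the per-call version keeps the sketch simple.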
