"UnicodeDecodeError:'utf-8'编解码器无法解码字节0x80".在Google colaboratory上使用pydrive加载泡菜文件时 [英] "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80" while loading pickle file using pydrive on google colaboratory


Problem description

I am new to Google Colaboratory (Colab) and to using PyDrive with it. In Colab, I wrote the data in 'CAS_num_strings' to a pickle file in a specific directory on my Google Drive as follows:

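import pickle

# Serialize the data to a local file in binary mode.
with open('CAS_num_strings.p', 'wb') as f:
    pickle.dump(CAS_num_strings, f)

# Upload the local file into the Drive folder given by the parent id.
dump_meta = {'title': 'CAS.pkl', 'parents': [{'id':'1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj'}]}
pkl_dump = drive.CreateFile(dump_meta)
pkl_dump.SetContentFile('CAS_num_strings.p')
pkl_dump.Upload()
print(pkl_dump.get('id'))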

Here 'id':'1UEqIADV_tHic1Le0zlT25iYB7T6dBpBj' ensures that the file is created inside the specific parent folder identified by that id. The last print command gives me the output:

'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'

Hence, I am able to create and upload the pickle file, whose id is '1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'. Now I want to load this pickle file in another Colab script for a different purpose. To load it, I use the following commands:

cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
print('Downloaded content "{}"'.format(cas_strings.GetContentString()))

This gives me the output:

title: CAS.pkl, mimeType: text/x-pascal

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-9-a80d9de0fecf> in <module>()
     30 cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
     31 print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
---> 32 print('Downloaded content "{}"'.format(cas_strings.GetContentString()))
     33 
     34 

/usr/local/lib/python3.6/dist-packages/pydrive/files.py in GetContentString(self, mimetype, encoding, remove_bom)
    192                     self.has_bom == remove_bom:
    193       self.FetchContent(mimetype, remove_bom)
--> 194     return self.content.getvalue().decode(encoding)
    195 
    196   def GetContentFile(self, filename, mimetype=None, remove_bom=False):

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

As you can see, it finds the file CAS.pkl but cannot decode the data. I would like to resolve this error. I understand that encoding and decoding work smoothly during normal pickle dumping and loading with the 'wb' and 'rb' options. In the present case, however, I cannot load the data back from the pickle file on Google Drive that was created in the previous step. The problem seems to be that I cannot specify how the data should be decoded at "return self.content.getvalue().decode(encoding)". I cannot tell from the Drive API reference (https://developers.google.com/drive/v2/reference/files#resource-representations) which keywords or metadata tags to modify. Any help is appreciated. Thanks.

Recommended answer

Actually, I found an elegant answer with a little help from my friends. Instead of GetContentString, I use GetContentFile, which is the counterpart of SetContentFile. As the traceback shows, GetContentString decodes the downloaded bytes as text ('utf-8' here), which fails on binary pickle data. GetContentFile instead downloads the file into the current working directory, from which I can read it like any other pickle file. The data then loads into cas_nums without problems.

import pickle

cas_strings = drive.CreateFile({'id':'1ZgZfEaKgqGnuBD40CY8zg0MCiqKmi1vH'})
print('title: %s, mimeType: %s' % (cas_strings['title'], cas_strings['mimeType']))
# Download the raw bytes to a local file instead of decoding them as text.
cas_strings.GetContentFile(cas_strings['title'])
with open(cas_strings['title'], 'rb') as f:
    cas_nums = pickle.load(f)
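
To see why decoding as text fails in the first place, here is a minimal sketch that needs no Drive access. It assumes the pickle was written with protocol 2 or later (the default in Python 3), in which case the serialized bytes begin with the byte 0x80. The sample list is a hypothetical stand-in for CAS_num_strings.

```python
import pickle

# Hypothetical sample data standing in for CAS_num_strings.
sample = ["50-00-0", "64-17-5"]

# pickle.dumps returns raw bytes, not text.
payload = pickle.dumps(sample)
first_byte = payload[0]

# Treating the pickle bytes as UTF-8 text reproduces the error from the question.
try:
    payload.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    error_position = exc.start  # index of the offending byte

# Reading the same bytes as binary data works as expected.
restored = pickle.loads(payload)
```

The byte 0x80 is not a valid first byte of a UTF-8 sequence, so any text decoding of such a pickle fails at position 0, regardless of how the file was uploaded.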

More details can be found in the "Download file content" section of the PyDrive documentation: http://pythonhosted.org/PyDrive/filemanagement.html#download-file-content
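
If writing a temporary file is undesirable, a possible alternative is to unpickle straight from memory. This is only a sketch: it relies on the FetchContent method and the content buffer that appear in the traceback above, and the helper name load_pickle_from_drive_file is hypothetical, not part of PyDrive.

```python
import pickle


def load_pickle_from_drive_file(drive_file):
    """Unpickle a PyDrive file object from memory, skipping text decoding.

    Assumes drive_file.FetchContent() fills drive_file.content with a
    bytes buffer exposing getvalue(), as the traceback in the question suggests.
    """
    drive_file.FetchContent()
    raw_bytes = drive_file.content.getvalue()
    return pickle.loads(raw_bytes)
```

Under these assumptions, usage would look like cas_nums = load_pickle_from_drive_file(cas_strings), with cas_strings created by drive.CreateFile as in the answer. As with any pickle, only load files you created yourself or otherwise trust.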
