time.strptime() - argument 0 must be str, not bytes


Problem description

I'm aware that strftime and strptime don't accept byte strings as arguments, but I'm in a pickle: I need to read a file whose content was saved in several different character encodings, handle all of them, and pass the time portion of each line to strptime().

A quick fix would be to split the string, making sure the time portion contains only digits and dashes, but is it possible to pass the bytes object to strptime() without trying to figure out the encoding?

import time

with open('file.txt', 'rb') as fh:
    for line in fh:
        # line is bytes in binary mode, so this call raises TypeError
        time.strptime(line, '%Y-%m-%d ...')

This obviously fails. I thought of using repr(line), but that makes the string look like b'2014-01-07 ...', which I would then have to strip.

Recommended answer

line is a bytestring because you opened the file in binary mode. You'll need to decode it; if it is a date string matching the pattern, you can simply use ASCII:

 time.strptime(line.decode('ascii'), '%Y-%m-%d ...')

You can add an 'ignore' argument to skip anything non-ASCII, but chances are the line won't fit your date format then anyway.
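As a rough illustration of decoding with and without 'ignore': the question elides the full format string, so the '%Y-%m-%d %H:%M:%S' pattern and the sample bytes below are assumptions made only for this sketch.

```python
import time

# Assumed format; the original pattern is elided after '%Y-%m-%d'.
FMT = "%Y-%m-%d %H:%M:%S"

# Hypothetical sample line containing only ASCII bytes.
raw = b"2014-01-07 12:30:00"
parsed = time.strptime(raw.decode("ascii"), FMT)

# errors="ignore" silently drops bytes outside ASCII instead of
# raising UnicodeDecodeError.
noisy = b"2014-01-07\xff 12:30:00"
cleaned = noisy.decode("ascii", errors="ignore")
parsed_noisy = time.strptime(cleaned, FMT)
```

Dropping bytes only helps when the stray bytes are not part of the timestamp itself; otherwise the cleaned text no longer matches the pattern.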

Note that you cannot pass a value that contains more than the format covers; a line with other text on it that is not explicitly covered by the strptime() pattern will not work, whichever codec you use.
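One way to work within that restriction is to slice the timestamp off the raw bytes before decoding, so the rest of the line never reaches strptime(). This is only a sketch: it assumes the timestamp is a fixed-width ASCII prefix, and the format, the WIDTH constant and the parse_prefix helper are hypothetical names, not part of the original question.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"   # assumed format, elided in the original
WIDTH = 19                  # len("2014-01-07 12:30:00")

def parse_prefix(line: bytes) -> time.struct_time:
    """Parse only the leading timestamp; the rest of the line is ignored."""
    stamp = line[:WIDTH].decode("ascii")
    return time.strptime(stamp, FMT)

# Hypothetical line whose tail is Latin-1 text that is not valid ASCII.
line = b"2014-01-07 12:30:00 caf\xe9 opened\n"
result = parse_prefix(line)
```

Because only the prefix is decoded, the encoding of the remaining text does not matter here. This would not hold for UTF-16 or UTF-32 input, where each digit occupies more than one byte.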

And if your input really varies that widely in codecs, you'll need to catch exceptions one way or another anyway.
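A minimal sketch of such exception handling follows. It relies on the fact that UnicodeDecodeError is a subclass of ValueError, which is also what strptime() raises on a mismatch, so a single except clause covers both. The try_parse helper and the format string are assumptions for illustration.

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"   # assumed format, elided in the original

def try_parse(line: bytes):
    """Return a struct_time, or None if the line cannot be decoded or parsed."""
    try:
        return time.strptime(line.strip().decode("ascii"), FMT)
    except ValueError:
        # Covers UnicodeDecodeError (bad bytes) and format mismatches alike.
        return None

good = try_parse(b"2014-01-07 12:30:00\n")
bad_bytes = try_parse(b"\xff\xfe not ascii\n")
bad_format = try_parse(b"not a date\n")
```

Returning None keeps the loop over the file going; whether to skip, log, or retry with another codec is a choice the caller would have to make.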

Aside from UTF-16 or UTF-32, I would not expect you to encounter any codecs that use different bytes for the Arabic numerals. If your input really mixes multi-byte and single-byte codecs in one file, you have a bigger problem on your hands, not least because newline handling will be badly messed up.
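The claim about digits can be checked quickly. The codecs listed below are just a handful of common ASCII-compatible ones chosen for this sketch, not an exhaustive survey.

```python
stamp = "2014-01-07"

# ASCII-compatible codecs encode digits and dashes to identical bytes.
encoded = {codec: stamp.encode(codec)
           for codec in ("ascii", "utf-8", "latin-1", "cp1252")}
all_same = all(value == b"2014-01-07" for value in encoded.values())

# UTF-16 uses two bytes per character, so the byte sequence differs.
utf16 = stamp.encode("utf-16-le")
```

Here utf16 is twice as long as the ASCII form because every digit is followed by a zero byte, which is why a plain ASCII decode of such data does not yield a string matching the date pattern.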
