Reading UTF-8 escape sequences from a file

Problem description

I have a UTF-8 encoded file that contains multiple lines like:

\x02I don't like \x0307bananas\x03.\x02
Hey, how are you doing?
You called?

How do I read the lines of that file into a list, decoding all the escape sequences? I tried the code below:

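import codecs
import random

# Opens the file as UTF-8 text; sequences like \x02 stay as literal backslash text
with codecs.open(file, 'r', encoding='utf-8') as q:
    quotes = q.readlines()

print(str(random.choice(quotes)))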

But it prints the line without decoding the escape sequences:

\x02I don't like \x0307bananas\x03\x02

(Note: the escape sequences are IRC formatting codes; \x02 is the character for bold text, and \x03 is the prefix for color codes. Also, this code comes from my IRC bot, with the MSG function replaced by print().)
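The line prints unchanged because the file stores each escape as four ordinary characters: a backslash, x, and two hex digits. The IRC bot, however, needs a single control character. The minimal check below illustrates this. It assumes a hard-coded copy of the first sample line in place of the file contents, and the variable names are only illustrative.

```python
# What readlines() returns: backslash, 'x' and the hex digits are ordinary characters
raw = r"\x02I don't like \x0307bananas\x03.\x02"

# What the IRC bot needs: \x02 (bold) and \x03 (color prefix) as single
# control characters. A \x escape takes exactly two hex digits, so the "07"
# after \x03 stays plain text (the IRC color number).
wanted = "\x02I don't like \x0307bananas\x03.\x02"

print(len(r"\x02"), len("\x02"))  # 4 1
print(raw == wanted)              # False

# unicode_escape decodes bytes, so turn the (pure ASCII) str into bytes first
decoded = raw.encode("ascii").decode("unicode_escape")
print(decoded == wanted)          # True
```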

Solution

According to this answer, the following change should give the expected result.

In Python 3, change

codecs.open(file, 'r', encoding='utf-8')

to

codecs.open(file, 'r', encoding='unicode_escape')

In Python 2, use:

codecs.open(file, 'r', encoding='string_escape')
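Below is a minimal sketch of the Python 3 approach, with one caveat. The unicode_escape codec decodes every byte that is not part of an escape sequence as Latin-1. As a result, genuine non-ASCII UTF-8 text in the file (for example "é", stored as two bytes) comes out garbled. If the file may contain such characters, a commonly used workaround is to read the file as UTF-8 first. You then decode the escapes line by line via encode('latin-1', 'backslashreplace'). The sample file and the helper names read_quotes_escape_codec, decode_escapes and read_quotes are hypothetical.

```python
import codecs
import os
import random
import tempfile


def read_quotes_escape_codec(path):
    """The Python 3 variant above: unicode_escape decodes the whole file."""
    with codecs.open(path, "r", encoding="unicode_escape") as q:
        return q.readlines()


def decode_escapes(line):
    """Decode backslash escapes in one line without mangling non-ASCII text.

    Characters outside Latin-1 are first turned into \\uXXXX escapes by
    'backslashreplace'; unicode_escape then decodes those back, and decodes
    Latin-1 bytes (such as 'é') to the same characters they came from.
    """
    return line.encode("latin-1", "backslashreplace").decode("unicode_escape")


def read_quotes(path):
    """Read the file as UTF-8 first, then decode the escapes line by line."""
    with open(path, "r", encoding="utf-8") as q:
        return [decode_escapes(line) for line in q]


# Hypothetical sample: the first line holds literal backslash escapes,
# the second a real non-ASCII character stored as UTF-8.
sample = "\\x02I don't like \\x0307bananas\\x03.\\x02\nCafé?\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quotes.txt")
    with open(path, "wb") as f:
        f.write(sample.encode("utf-8"))
    quotes = read_quotes(path)
    quotes_codec = read_quotes_escape_codec(path)

print(quotes[0] == "\x02I don't like \x0307bananas\x03.\x02\n")  # True
print(quotes[1] == "Café?\n")         # True: non-ASCII text survives
print(quotes_codec[1] == "CafÃ©?\n")  # True: UTF-8 bytes were read as Latin-1
print(ascii(random.choice(quotes)))   # a randomly chosen decoded line
```

With either variant, a backslash that is not meant as an escape (for example in a Windows path) may also be interpreted. This approach therefore suits files where backslashes always start an escape sequence.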
