Read text as UTF-8 encoding


Problem description

Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the default encoding there):

readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing

However, I also want it to work when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:

readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))

But this actually garbles the output. Why is this? How should I call textConnection to make sure the stream is read properly on any platform or locale?

Recommended answer

The suggestion by @flodel did indeed do the trick:

readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")

However, it never became clear to me why this is needed.
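
One plausible reading of the R documentation is that the two encoding arguments do different jobs. The one on textConnection controls how the input strings are converted to bytes when the connection is created, while the one on readLines only marks the strings it returns as UTF-8 and does no conversion. Without that mark, the UTF-8 bytes would be interpreted in the native encoding, which would explain the garbled output. As a minimal sketch under that assumption, a small wrapper can keep the two arguments together (read_utf8_lines is a hypothetical name used for illustration, not part of R):

```r
# Hypothetical helper: read a character vector through a text connection
# so that the result is UTF-8 regardless of the locale encoding.
read_utf8_lines <- function(x) {
  # Store the input in the connection as UTF-8 bytes.
  con <- textConnection(x, encoding = "UTF-8")
  # Always release the connection, even if readLines fails.
  on.exit(close(con))
  # Mark the returned strings as UTF-8 (this does not re-encode them).
  readLines(con, encoding = "UTF-8")
}

lines <- read_utf8_lines("Z\u00FCrich")
print(lines)
print(Encoding(lines))
```

Inspecting Encoding(lines) shows how R has marked the result, which can help when checking the behavior under a non-UTF-8 locale such as the one rApache uses.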
