Read text as UTF-8 encoding
Question
Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the default encoding there):
readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing
However, I want to make sure it also works when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:
readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))
But this actually garbles the output. Why is that? How should I call textConnection to make sure the stream is read properly on any platform or locale?
Recommended answer
The suggestion by @flodel did indeed do the trick:
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")
However, it never became clear to me why this is needed.
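One possible reading, offered as a sketch rather than a confirmed explanation: the readLines documentation describes its encoding argument as a way to mark the returned strings as UTF-8 (or Latin-1, or bytes), not to re-encode them. Under that reading, textConnection(..., encoding="UTF-8") controls the bytes placed on the connection, and readLines(..., encoding="UTF-8") tells R how to interpret those bytes afterwards. The helper name read_utf8_lines below is hypothetical and not part of the original post; it wraps the working call and closes the connection explicitly.

```r
# Hypothetical helper (not from the original post): pass a character vector
# through a text connection and mark the lines read back as UTF-8.
read_utf8_lines <- function(x) {
  con <- textConnection(x, encoding = "UTF-8")
  on.exit(close(con))                 # avoid leaving the connection open
  readLines(con, encoding = "UTF-8")  # marks the result; does not re-encode
}

out <- read_utf8_lines("Z\u00FCrich")
Encoding(out)    # declared encoding of the result
charToRaw(out)   # underlying bytes, useful for spotting garbled output
```

Inspecting Encoding() and charToRaw() on the result is a quick way to check, in a given locale, whether the bytes and the declared encoding agree.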