Read text as UTF-8 encoding

Published: 2020/5/3 3:59:58 · Tags: r, utf-8, locale


Question

Suppose I write a function that parses an input stream containing German text. Below is a toy example. The following works on my machine (because UTF-8 is the default encoding there):

readLines(textConnection("Zürich"))
readLines(textConnection("Z\u00FCrich")) #same thing

However, I want to make sure it also works when UTF-8 is not the current locale encoding. For example, inside rApache the default is ASCII. Hence I pass the encoding parameter:

readLines(textConnection("Zürich", encoding="UTF-8"))
readLines(textConnection("Z\u00FCrich", encoding="UTF-8"))

But this actually garbles the output. Why is that? How should I call textConnection to make sure the stream is read properly on any platform or locale?
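One way to see what is going on is to compare the declared encoding of the input with that of the result, and to look at the raw bytes. The snippet below is only a diagnostic sketch: the variable names `x`, `con` and `y` are illustrative, and the values it prints depend on the locale of the session, so none are shown here.

```r
# Diagnostic sketch: inspect declared encoding and raw bytes.
x <- "Z\u00FCrich"                        # a \u escape yields a UTF-8 marked string

con <- textConnection(x, encoding = "UTF-8")
y <- readLines(con)                        # no encoding argument passed to readLines
close(con)

Encoding(x)     # declared encoding of the input string
Encoding(y)     # declared encoding of what readLines returned
charToRaw(y)    # the bytes actually stored in the result
```

If the bytes are valid UTF-8 but the declared encoding is "unknown", the string will be interpreted in the native encoding when printed, which would explain garbled output in a non-UTF-8 locale.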

Recommended answer

The suggestion by @flodel did indeed do the trick:

readLines(textConnection("Z\u00FCrich", encoding="UTF-8"), encoding="UTF-8")

However, it never became clear to me why this is needed.
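A possible explanation, going by the R help pages rather than anything stated in this thread: the `encoding` argument of `textConnection` controls how marked input strings are handed to the connection (here, translated to UTF-8), while the `encoding` argument of `readLines` only marks the strings it returns and does not re-encode them. Under that reading, both arguments are needed so that the bytes are UTF-8 and are also labelled as UTF-8. A minimal sketch, where `read_utf8_lines` is a hypothetical helper name and not part of R or of the original answer:

```r
# Hypothetical helper (not from the original post): read lines from an
# in-memory character vector and mark the result as UTF-8.
read_utf8_lines <- function(x) {
  # Ask the text connection to supply the input as UTF-8 bytes.
  con <- textConnection(x, encoding = "UTF-8")
  on.exit(close(con))
  # Mark the returned strings as UTF-8; this does not convert them.
  readLines(con, encoding = "UTF-8")
}

out <- read_utf8_lines("Z\u00FCrich")
Encoding(out)    # declared encoding of the result
charToRaw(out)   # bytes stored in the result
```

Wrapping the two calls in one function also closes the connection, which the one-liner above leaves to the garbage collector.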
