R encoding UTF-8: U+0080-U+009F

Views: 164 Published: 2016/11/19 17:03:42 Tags: r utf-8 character-encoding


Question

I am struggling with some encoding issues. I have many text files that contain rows in the following format:

https://dl.dropboxusercontent.com/u/94114397/example.txt

According to Notepad++, these are all encoded in UTF-8, and most non-ASCII characters are displayed correctly, as you can see in lines 1 and 2. However, I have problems with some characters that seem to be wrongly interpreted(?). In my example file, this is the case in line 3 in the word "Lakuic", where there should be an "š" between the "u" and the "i". There actually is a character between those two letters, which can be seen by copy-pasting the word into the Google Chrome address bar.

Now when I read the data in R, it displays "Laku<U+009A>ic". How can I resolve this?

Recommended answer

Try converting from UTF-8 to latin1:

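    # Read the tab-separated file and mark its strings as UTF-8
    df <- read.table("http://dl.dropboxusercontent.com/u/94114397/example.txt", sep = "\t", row.names = 1, stringsAsFactors = FALSE, encoding = "UTF-8")
    # Re-encode the first column from UTF-8 to latin1
    iconv(df[, 1], from = "UTF-8", to = "latin1")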
    # [1] "Trichocentrum<->longifolium<-><->(Lindl.) R.Jiménez, Acta Bot. Mex. 97: 54 (2011)." 
    # [2] "Salvia<->× hegelmaieri<->nothosubsp. accidentalis<->(Sánchez-Gómez & R.Morales)."   
    # [3] "Edraianthus<->tarae<-><->Lakušic, Bilten Drustva Ekologa BiH, Ser. A 4: 108 (1987)."

My sessionInfo():

    # Platform: x86_64-w64-mingw32/x64 (64-bit)
    # Running under: Windows 7 x64 (build 7601) Service Pack 1
    #
    # locale:
    #   [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                    LC_TIME=German_Germany.1252
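
Why the conversion helps: U+0080-U+009F are C1 control characters in Unicode, but in Windows-1252 the same byte values are printable characters (0x9A is "š"). Converting to latin1 turns U+009A back into the single byte 0x9A, which a CP1252 locale like the one listed above then displays as "š". On a system with a different locale the latin1 result may not display correctly. The following is a minimal sketch of a locale-independent variant, assuming the stray characters really are CP1252 bytes that were mis-decoded as Latin-1: it reinterprets those bytes as CP1252 and converts them back to UTF-8. The helper name `fix_c1` is hypothetical, and the encoding names accepted by `iconv` can differ between platforms.

```r
# Hypothetical helper (not from the original answer): repair text in which
# CP1252 bytes 0x80-0x9F were mis-decoded as Latin-1 and therefore ended up
# as the C1 control characters U+0080-U+009F.
fix_c1 <- function(x) {
  # Step 1: UTF-8 -> latin1 maps U+009A back to the single byte 0x9A.
  bytes <- iconv(x, from = "UTF-8", to = "latin1")
  # Step 2: read those bytes as CP1252 (0x9A is s-caron, U+0161)
  # and return the result as UTF-8.
  iconv(bytes, from = "CP1252", to = "UTF-8")
}

broken <- "Laku\u009aic"  # the string R prints as Laku<U+009A>ic
fixed  <- fix_c1(broken)
print(fixed)
```

Note that `iconv` returns `NA` for any string containing characters that cannot be represented in latin1, so a helper like this should only be applied to the affected entries.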
