fread (data.table in R) with specification of encoding


Question

I could not find a proper answer to my problem in previous questions and answers. I have a 2.3 GB CSV file containing 2.4 million rows of Hebrew text, currently encoded in ASCII. Since this is a large file, fread would be preferable, but what about the encoding? Any idea how to read a CSV file encoded in ASCII while avoiding the famous "embedded nul in string" error?

Thanks

Recommended answer

As of August 25th, the issue linked by David Arenburg is closed, and the functionality is included in the currently available version of data.table. The encoding parameter can now be used when calling fread:

library(data.table)
text <- fread(file, encoding = 'UTF-8')

ASCII is not an explicit option for the encoding argument (which accepts "unknown", "UTF-8", and "Latin-1"), but any ASCII file is also valid UTF-8, so you can specify UTF-8 when you want to read your Hebrew text.
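
As a rough illustration of the call above, the following sketch writes a tiny UTF-8 CSV with Hebrew text to a temporary file and reads it back with fread. The file contents, the column names (id, word), and the variable names path and dt are made up for this example and are not taken from the original question. It assumes a data.table version that supports the encoding argument.

```r
library(data.table)

# Hypothetical sample data: a two-row CSV with Hebrew words, written as raw UTF-8 bytes.
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(
  enc2utf8(c("id,word", "1,\u05e9\u05dc\u05d5\u05dd", "2,\u05ea\u05d5\u05d3\u05d4")),
  con,
  useBytes = TRUE  # write the strings as-is, without re-encoding to the native locale
)
close(con)

# Read the file and mark the character data as UTF-8.
dt <- fread(path, encoding = "UTF-8")

# Inspect the encoding marks R attaches to the Hebrew strings.
print(Encoding(dt$word))

unlink(path)
```

Note that the encoding argument only tells R how to mark the strings it reads; it does not convert the file from one encoding to another. If the file is actually stored in some other encoding, it would need to be converted before reading.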
