R H2O package: importing a csv file with Chinese characters
Problem description
I have a large dataset in csv format that I want to use to build a prediction model. Because of its size, I planned to use the h2o package in R to build the model. However, several columns of the data.frame contain Simplified Chinese characters, and h2o has difficulty ingesting the data.
I've tried two different approaches. The first was to read the file directly with the h2o.importFile() function. However, this ends up converting the Chinese characters into garbled text.
The second approach was to first bring the data into R using readr's read_csv or base R's read.csv. After the data was loaded correctly into R, I tried to convert the data.frame into an h2o frame using the as.h2o function. However, this approach also produced garbled characters.
To illustrate, I've written the following code as an example:
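library(h2o)
# Two Simplified Chinese city names, 50 rows each, plus a random numeric column
dat <- data.frame(x = rep(c("北京", "上海"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3))
h2o.init(nthreads = -1)  # start a local H2O cluster using all available cores
h2o.dat <- as.h2o(dat)   # convert to an H2OFrame; the Chinese strings come out garbled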
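Since the data looks correct inside R, it may help to confirm how R itself stores the strings before they are handed to as.h2o. The following is a minimal base R sketch, not part of the original question. It assumes that the session locale and the declared string encoding are relevant to the garbling, which the question does not confirm. The variable names x and x_utf8 are illustrative.

```r
# Same strings as in the example, written as Unicode escapes so that this
# script does not depend on the encoding of the source file itself.
x <- rep(c("\u5317\u4eac", "\u4e0a\u6d77"), 50)

# Declared encoding of the strings as R stores them.
print(table(Encoding(x)))

# Locale of the current session. Whether this affects as.h2o is an
# assumption to be checked, not a documented fact.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info()[["UTF-8"]])  # TRUE when the native encoding is UTF-8

# Force a UTF-8 representation and look at the raw bytes of one value.
x_utf8 <- enc2utf8(x)
print(charToRaw(x_utf8[1]))
```

If the session reports a non-UTF-8 native encoding, that would be one plausible place to look, but it does not by itself prove where the characters are lost.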
Recommended answer
I would consider this a bug, since R's data.frame can display the characters while the R H2OFrame cannot. I checked that this works for H2OFrames in Python, so it is an R-only issue. I have filed a bug report for it.
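To see whether a particular installation shows the behavior, one option is to send the example data through H2O and pull it back into R for a side-by-side look. This is only a sketch: check_h2o_roundtrip and its col argument are hypothetical names, not part of the h2o package, and the function needs a working h2o and Java installation when it is actually called. It uses h2o.init, as.h2o and as.data.frame on the resulting H2OFrame.

```r
# Example data from the question, with the strings as Unicode escapes.
dat <- data.frame(x = rep(c("\u5317\u4eac", "\u4e0a\u6d77"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3),
                  stringsAsFactors = FALSE)

# Hypothetical helper (not part of h2o): round-trip a data frame through
# H2O and return the distinct values of one column before and after.
check_h2o_roundtrip <- function(df, col = "x") {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    stop("the h2o package is not installed")
  }
  h2o::h2o.init(nthreads = -1)   # start or connect to a local H2O cluster
  hf <- h2o::as.h2o(df)          # R data.frame -> H2OFrame
  back <- as.data.frame(hf)      # H2OFrame -> R data.frame
  list(sent = unique(as.character(df[[col]])),
       received = unique(as.character(back[[col]])))
}

# Uncomment on a machine with h2o and Java available:
# print(check_h2o_roundtrip(dat))
```

If sent and received print differently, the session is affected. If they match, the problem may be limited to how the H2OFrame is displayed, or may not occur with that version.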