R H2O package: importing a csv file with Chinese characters


Problem description

I have a large dataset in csv format that I want to use to build a prediction model. Because of its size, I planned to use the h2o package in R to build the model. However, several columns of the data.frame contain Simplified Chinese characters, and h2o has difficulty receiving the data.

I've tried two different approaches. The first was to read the file directly with the h2o.importFile() function. However, this approach ends up converting the Chinese characters into garbled text.
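The question does not say how the csv file is encoded. Assuming H2O's parser expects UTF-8 input (an assumption here, not something stated in the post), one thing worth ruling out before import is a file saved in another encoding such as GBK. Below is a minimal base-R sketch of that check. The helper `is_utf8_file` and the two temporary files are hypothetical and exist only for illustration.

```r
# Hypothetical helper: TRUE if every line of the file is valid UTF-8.
is_utf8_file <- function(path) {
  # readLines() does not re-encode by default, so the original bytes are kept.
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

utf8_path <- tempfile(fileext = ".csv")
gbk_path  <- tempfile(fileext = ".csv")

# A header "x" and one row containing the two characters of "Beijing",
# written as raw bytes so the example does not depend on the session locale.
# First file: UTF-8 bytes. Second file: the same text in GBK bytes.
writeBin(as.raw(c(0x78, 0x0A, 0xE5, 0x8C, 0x97, 0xE4, 0xBA, 0xAC, 0x0A)), utf8_path)
writeBin(as.raw(c(0x78, 0x0A, 0xB1, 0xB1, 0xBE, 0xA9, 0x0A)), gbk_path)

utf8_ok <- is_utf8_file(utf8_path)  # the UTF-8 file passes the check
gbk_ok  <- is_utf8_file(gbk_path)   # the GBK file does not
```

If the real file fails this check, re-saving it as UTF-8 before calling h2o.importFile() is worth trying. The original post does not confirm that this resolves the problem.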

The second approach was to first bring the data into R using readr's read_csv or base R's read.csv. After the data was loaded correctly into R, I tried to convert the data.frame into an H2O frame using the as.h2o function. However, this approach also produced garbled characters.

To illustrate, I've written the following piece of code as an example:

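library(h2o)

# Small reproducible example: x holds Simplified Chinese city names
dat <- data.frame(x = rep(c("北京", "上海"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3))

h2o.init(nthreads = -1)   # start or connect to a local H2O cluster using all cores
h2o.dat <- as.h2o(dat)    # the Chinese strings in x come out garbled in the H2OFrame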
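When the data.frame prints correctly in R but not in H2O, it can help to see how R itself has marked the strings before they are handed to the H2O cluster. The sketch below uses only base R (`Sys.getlocale()`, `Encoding()`, `enc2utf8()`). The helper `to_utf8_df` is hypothetical, and whether converting to UTF-8 first avoids the garbling is an assumption that the original post does not confirm.

```r
# Hypothetical helper: convert every character or factor column to UTF-8.
to_utf8_df <- function(df) {
  for (col in names(df)) {
    if (is.factor(df[[col]])) {
      levels(df[[col]]) <- enc2utf8(levels(df[[col]]))
    } else if (is.character(df[[col]])) {
      df[[col]] <- enc2utf8(df[[col]])
    }
  }
  df
}

# Same data as the example above; \u escapes keep the script independent
# of the encoding the script file itself is saved in.
dat <- data.frame(x = rep(c("\u5317\u4eac", "\u4e0a\u6d77"), 50),
                  y = rnorm(n = 100, mean = 10, sd = 3),
                  stringsAsFactors = FALSE)

Sys.getlocale("LC_CTYPE")   # the session's native character encoding
table(Encoding(dat$x))      # how R has marked the strings in x

dat_utf8 <- to_utf8_df(dat)
```

`dat_utf8` can then be passed to as.h2o() in place of `dat` to see whether the result changes.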

Recommended answer

I would consider this a bug: R's data.frame displays the characters correctly, but the R H2OFrame does not. I checked that the same data works for H2OFrames in Python, so this appears to be an R-only issue. I filed a bug here.
