fread from data.table package when column names include spaces and special characters?

Question

I have a csv file where column names include spaces and special characters.

fread imports them with quotes, but how can I change this behaviour? One reason I ask is that some of my column names start with a space, and I don't know how to handle them.

Any pointers would be helpful.

Edit: an example.

> packageVersion("data.table")
[1] ‘1.8.8’

p2p <- fread("p2p.csv", header = TRUE, stringsAsFactors=FALSE)

> head(p2p[,list(Principal remaining)])
Error: unexpected symbol in "head(p2p[,list(Principal remaining"

> head(p2p[,list("Principal remaining")])
                    V1
1: Principal remaining

> head(p2p[,list(c("Principal remaining"))])
                    V1
1: Principal remaining

What I expect (and want) is, of course, what a column name without spaces yields:

> head(p2p[,list(Principal)])
   Principal
1:      1000
2:      1000
3:      1000
4:      2000
5:      1000
6:      4130


Answer

It should be rather difficult to get a leading space in a column name; it should not happen through "casual coding". On the other hand, I don't see much error checking in the fread code, so until this undesirable behavior is fixed (or the feature request is refused), you can do something like this:

setnames(DT, make.names(colnames(DT))) 
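To illustrate what the setnames/make.names call above does, here is a minimal sketch on a small in-memory table. The table `DT` and its column names are hypothetical stand-ins for the asker's p2p.csv. Passing `unique = TRUE` is an optional extra, not part of the original answer. Newer data.table releases also document a `check.names` argument to fread that applies make.names at import time; check the help page of your installed version before relying on it.

```r
library(data.table)

# Hypothetical table with non-syntactic names (a space, and a leading space)
DT <- data.table(a = c(1000, 2000), b = c(900, 1500), c = 1:2)
setnames(DT, c("Principal", "Principal remaining", " Loan id"))

# make.names() prepends "X" when a name does not start with a letter (or a
# dot not followed by a digit) and turns other invalid characters into ".";
# unique = TRUE appends suffixes if two names collapse to the same result
setnames(DT, make.names(colnames(DT), unique = TRUE))
colnames(DT)   # "Principal" "Principal.remaining" "X.Loan.id"

# The cleaned names now work as ordinary symbols inside j
head(DT[, list(Principal.remaining)])
```

The trade-off is that the names no longer match the CSV header exactly, so any code or documentation that refers to the original header text has to use the dotted form.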

If, on the other hand, you are bothered by the fact that colnames(DT) displays the column names with quotes, then just "get over it." That is how the interactive console displays any character value.
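A minimal sketch of the point above, assuming the names are kept as they are. The printed quotes are only console decoration. To use such a name inside j, it has to be written as a backtick-quoted symbol, because a plain string inside list() is evaluated as a constant (which is why the question got a V1 column containing "Principal remaining"). The table and column names here are hypothetical. `with = FALSE` and `[[ ]]` are standard data.table/base ways to select by string.

```r
library(data.table)

# Hypothetical table whose names contain a space and a leading space
DT <- data.table(a = c(1000, 2000), b = c(900, 1500), c = 1:2)
setnames(DT, c("Principal", "Principal remaining", " Loan id"))

# The quotes printed by colnames() are display only, not part of the names
print(colnames(DT))
print(colnames(DT), quote = FALSE)
nchar(colnames(DT)[2])   # 19: no quote characters are stored in the name

# Backticks turn a non-syntactic name into a symbol that j can evaluate
head(DT[, list(`Principal remaining`)])
head(DT[, list(` Loan id`)])   # a leading space must be written out exactly

# Alternatively select by string; with = FALSE makes j a vector of column names
head(DT[, "Principal remaining", with = FALSE])
DT[["Principal remaining"]]     # the column as a plain vector
```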

If an item in the original file looks like " ttt" (with a leading space), it will still have that leading space after import. For column names, you can strip it with colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm)), or use one of the several trim functions available in various packages (such as 'gdata').
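A minimal base-R sketch of the trimming step, assuming a data.frame named `dfrm` whose header and values carry stray spaces (the data are hypothetical). It contrasts the answer's sub() call, which removes leading whitespace only, with base R's trimws() (available since R 3.2.0), which is swapped in here as an alternative to the package trim functions and strips both ends. For a data.table, the equivalent rename would be setnames(DT, trimws(names(DT))).

```r
# Hypothetical data.frame whose header and values carry stray spaces
dfrm <- data.frame(a = c(" ttt", "uuu "), b = 1:2, stringsAsFactors = FALSE)
colnames(dfrm) <- c(" name", "  Principal remaining ")

# The sub() call from the answer strips leading whitespace only
colnames(dfrm) <- sub("^\\s+", "", colnames(dfrm))
colnames(dfrm)   # "name" "Principal remaining " (trailing space kept)

# trimws() strips both ends and works on column values as well as names
colnames(dfrm) <- trimws(colnames(dfrm))
dfrm$name <- trimws(dfrm$name)
```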
