How to skip reading certain columns in readr

Question

I have a simple csv file called "test.csv" with the following content:

colA,colB,colC
1,"x",12
2,"y",34
3,"z",56

Let's say I want to skip reading in colA and just read in colB and colC. I want a general way to do this because I have lots of files to read in and sometimes colA is called something else altogether but colB and colC are always the same.

According to the read_csv documentation, one way to accomplish this is to pass a named list for col_types and only name the columns you want to keep:

read_csv('test.csv', col_types = list(colB = col_character(), colC = col_numeric()))

By not mentioning colA it should get dropped from the output. However, the resulting data frame is:

Source: local data frame [3 x 3]

      colA colB colC
    1    1    x   12
    2    2    y   34
    3    3    z   56

Am I doing something wrong or is the read_csv documentation not correct? According to the help file:

If a list, it must contain one "collector" for each column. If you only want to read a subset of the columns, you can use a named list (where the names give the column names). If a column is not mentioned by name, it will not be included in the output.

Recommended answer

There is an answer out there, I just didn't search hard enough: https://github.com/hadley/readr/issues/132

Apparently this was a documentation error that has since been corrected. The functionality may eventually be added, but Hadley thought it more useful to be able to override a single column's type without dropping the others.
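To make that behaviour concrete, here is a minimal sketch, assuming a reasonably recent readr in which cols(), col_skip() and the compact string form of col_types are available. The temporary file only stands in for the question's test.csv, and the variable names are illustrative. The positional variant assumes the unwanted column always comes first, which the question does not state.

```r
library(readr)

# Recreate the question's test.csv in a temporary file (illustrative only).
path <- tempfile(fileext = ".csv")
writeLines(c('colA,colB,colC', '1,"x",12', '2,"y",34', '3,"z",56'), path)

# Naming colB and colC only sets their types; colA is still read,
# with its type guessed from the data.
kept <- read_csv(path, col_types = cols(colB = col_character(), colC = col_double()))
names(kept)

# To drop a column, mark it as skipped. By name:
by_name <- read_csv(path, col_types = cols(colA = col_skip()))
names(by_name)

# Or by position with the compact string form, where "_" skips a column.
# This does not depend on what the first column is called, but it does
# assume the column order is the same in every file.
by_position <- read_csv(path, col_types = "_cd")
names(by_position)
```

The by-name form breaks as soon as colA is renamed, so for files where only colB and colC are stable, the positional form or an explicit keep-list is probably the better fit.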

Update: the functionality has been added

The following code is from the readr documentation:

read_csv("iris.csv", col_types = cols_only( Species = col_factor(c("setosa", "versicolor", "virginica"))))

This reads only the Species column of the iris data set. To read only specific columns with cols_only(), you must also pass a column specification for each one you keep, e.g. col_factor(), col_double(), etc.
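Applied to the test.csv from the question, the same approach might look like the sketch below. The temporary file and variable names are illustrative. The second call assumes readr 2.0.0 or later, where read_csv() accepts col_select and show_col_types; on older versions only the cols_only() form applies.

```r
library(readr)

# Recreate the question's test.csv in a temporary file (illustrative only).
path <- tempfile(fileext = ".csv")
writeLines(c('colA,colB,colC', '1,"x",12', '2,"y",34', '3,"z",56'), path)

# cols_only() reads just the named columns; every other column is dropped,
# whatever it happens to be called in a given file.
subset_a <- read_csv(
  path,
  col_types = cols_only(colB = col_character(), colC = col_double())
)
names(subset_a)

# Assumes readr >= 2.0.0: col_select picks columns by name and leaves
# the types to be guessed, so no collectors are needed.
subset_b <- read_csv(path, col_select = c(colB, colC), show_col_types = FALSE)
names(subset_b)
```

Because both calls name only the columns to keep, they should work unchanged across files where the first column has a different name.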
