Override column types when importing data using readr::read_csv() when there are many columns


Question

I am trying to read a CSV file using readr::read_csv in R. The file has about 150 columns; I am including only the first few in the example. I want to override the type of the second column, which read_csv parses as a date by default, to character or another date format.

GIS Join Match Code  Data File Year  State Name  State Code  County Name     County Code  Area Name                Persons: Total
G0100010             2008-2012       Alabama     1           Autauga County  1            Autauga County, Alabama  54590

df <- data.frame("GIS Join Match Code"="G0100010", "Data File" = "2008-2012", "State" = "Alabama", "County" = "Autauga County", "Population" = 54590)



The issue is that when I use readr::read_csv, it seems I have to list every variable when overriding col_types (see the error below); that is, I would need to specify all 150 columns individually. Is there a way to override the col_type of only specific columns, or to pass a named list of objects? In my case, it would just be the column "Data File Year".

I understand that any omitted columns will be parsed automatically, which is fine for my analysis. I think it gets more complex because the column names in the file I downloaded contain spaces (e.g., "Data File Year", "State Code").

tempdata <- read_csv(df, col_types = "cc")
Error: You have 135 column names, but 2 columns

The other option, I guess, if possible, is to skip reading the second column altogether?

Recommended answer

Here is a more generic answer, in case someone stumbles upon this question in the future. Using "skip" to jump over columns is less advisable, because it stops working if the structure of the imported data source changes.
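If skipping is still what you want, one way to reduce that fragility is to skip the column by name rather than by position. The following is only a minimal sketch: the temporary file and its three columns are made up for illustration, and it assumes the readr package is installed.

```r
library(readr)

# Hypothetical sample file; only the column names mirror the question.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "GIS Join Match Code,Data File Year,State Name",
  "G0100010,2008-2012,Alabama"
), path)

# Skip one column by name; all other columns are still guessed.
# Backticks are needed because the column name contains spaces.
skipped <- read_csv(path, col_types = cols(`Data File Year` = col_skip()))

names(skipped)
```

Because the column is identified by its header, this keeps working if columns are reordered, though it still depends on the header text staying the same.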

In your example, it could be easier to simply set a default column type and then define only the columns that differ from that default.

E.g., if the columns are typically "d" (double) but the date column should be "D" (date), load the data as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D"))

Or, e.g., if column "date" should be "D" and column "xxx" should be "i" (integer), do it as follows:

  read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))

Using ".default" as above is powerful when you have many columns and only a few specific exceptions (such as "date" and "xxx").
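Applied to the question's data, the same idea might look like the sketch below. The temporary file and its single row are made up for illustration, and `.default = col_guess()` is chosen here (rather than "d") as an assumption, so that every column other than "Data File Year" is still parsed automatically. The sketch assumes the readr package is installed.

```r
library(readr)

# Hypothetical sample file mirroring a few of the question's columns.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "GIS Join Match Code,Data File Year,State Name,Persons: Total",
  "G0100010,2008-2012,Alabama,54590"
), path)

# Guess every column except the one named explicitly.
# Backticks are needed because the column name contains spaces.
tempdata <- read_csv(
  path,
  col_types = cols(
    .default = col_guess(),
    `Data File Year` = col_character()
  )
)

str(tempdata$`Data File Year`)
```

Only the exception is spelled out, so the call stays the same length whether the file has 4 columns or 150.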
