Override column types when importing data using readr::read_csv() when there are many columns

Question

I am trying to read a CSV file using readr::read_csv in R. The file I am importing has about 150 columns; I am including only the first few here as an example. I want to override the type of the second column from its default (read_csv parses it as a date) to character, or to a different date format.

  GIS Join Match Code | Data File Year | State Name | State Code | County Name    | County Code | Area Name               | Persons: Total
  G0100010            | 2008-2012      | Alabama    | 1          | Autauga County | 1           | Autauga County, Alabama | 54590

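  # check.names = FALSE keeps the spaces in the column names, as in the CSV file
  df <- data.frame("GIS Join Match Code" = "G0100010", "Data File" = "2008-2012", "State" = "Alabama", "County" = "Autauga County", "Population" = 54590, check.names = FALSE)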

The issue is that when I use readr::read_csv, it seems I have to list every variable when overriding col_types (see the error below), which would mean specifying all 150 columns individually(?). Is there a way to override the col_type of just specific columns, or to pass a named list of objects? In my case, I only need to override the column "Data File Year".

I understand that any omitted columns will be parsed automatically, which is fine for my analysis. I think it gets more complicated because the column names in the file I downloaded contain spaces (e.g., "Data File Year", "State Code").

  tempdata <- read_csv(df, col_types = "cc")
  Error: You have 135 column names, but 2 columns

The other option, I guess, if possible, is to skip reading the second column altogether?

Answer

Here is a more general answer, in case someone stumbles upon this question in the future. Using "skip" to jump over columns is less advisable, because it will stop working if the structure of the imported data source changes.

In your example, it may be easier to set a default column type and then define only the columns that differ from that default.
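
Applied to the question's own columns, a minimal sketch could look like the following. It assumes the readr package is installed and writes a hypothetical two-line sample (the header and row from the question) to a temporary file in place of the real ~150-column download. Every column is guessed by default, and only "Data File Year" is forced to character; the backticks are what allow a column name containing spaces inside cols().

```r
library(readr)

# Hypothetical sample standing in for the real file: the question's header
# and first data row, written to a temporary CSV.
path <- tempfile(fileext = ".csv")
writeLines(c(
  'GIS Join Match Code,Data File Year,State Name,State Code,County Name,County Code,Area Name,Persons: Total',
  'G0100010,2008-2012,Alabama,1,Autauga County,1,"Autauga County, Alabama",54590'
), path)

# Guess every column by default; override only "Data File Year".
# Backticks let the space-containing name be used as an argument name.
tempdata <- read_csv(
  path,
  col_types = cols(
    .default = col_guess(),
    `Data File Year` = col_character()
  )
)

str(tempdata)
```

Columns not named in cols() are still guessed, so the real file would need the same single override no matter how many other columns it has.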

For example, if all columns are usually "d" (double) but the date column should be "D" (date), load the data as follows:

  read_csv("your_file.csv", col_types = cols(.default = "d", date = "D"))
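
A self-contained version of that call, useful for checking what the one-letter codes produce, is sketched below. The file contents and the column names a, b, c and date are hypothetical; only the cols() specification comes from the answer.

```r
library(readr)

# Hypothetical sample: every column is numeric except one named "date".
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,c,date",
  "1.5,2,3,2020-01-31",
  "4,5.25,6,2021-12-01"
), path)

# .default = "d" reads every column not named below as double;
# date = "D" reads the column called "date" as a Date.
x <- read_csv(path, col_types = cols(.default = "d", date = "D"))

str(x)
```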

Or if, for example, the column date should be "D" and the column "xxx" should be "i" (integer), do this:

  read_csv("your_file.csv", col_types = cols(.default = "d", date = "D", xxx = "i"))
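
The one-letter codes are shorthand for readr's collector functions ("d" for col_double(), "D" for col_date(), "i" for col_integer()). The sketch below, using a hypothetical sample file, writes the same specification both ways and compares the results column by column. The long form can be easier to read, and it is the form to use when a collector needs arguments such as a date format.

```r
library(readr)

# Hypothetical sample with a date column and an integer column "xxx".
path <- tempfile(fileext = ".csv")
writeLines(c(
  "a,b,date,xxx",
  "1.5,2,2020-01-31,7",
  "4,5.25,2021-12-01,8"
), path)

# Long form, using the col_*() constructors.
y <- read_csv(
  path,
  col_types = cols(
    .default = col_double(),
    date = col_date(),
    xxx = col_integer()
  )
)

# Compact form, using the one-letter codes.
z <- read_csv(path, col_types = cols(.default = "d", date = "D", xxx = "i"))

# Compare the two results column by column.
all(mapply(identical, y, z))
```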

Using .default like this is especially useful when you have many columns and only a few exceptions (such as "date" and "xxx").
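
The same pattern also covers the question's other option of not reading the second column at all. A possible sketch, using a hypothetical shortened sample file, marks "Data File Year" with col_skip() while guessing everything else. Because named entries in cols() are matched by column name rather than by position, this does not rely on the column staying in second place.

```r
library(readr)

# Hypothetical shortened sample with space-containing column names.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "GIS Join Match Code,Data File Year,State Name,Persons: Total",
  "G0100010,2008-2012,Alabama,54590"
), path)

# Skip "Data File Year" by name; all other columns are guessed.
tempdata <- read_csv(
  path,
  col_types = cols(.default = col_guess(), `Data File Year` = col_skip())
)

names(tempdata)
```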
