How can you read a CSV file in R with different number of columns

Question

I have a sparse data set in CSV format in which the number of columns varies from row to row. Here is a sample of the file text.

12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco

When I use

read.csv("data.txt", header = F)

R interprets the data set as having 3 columns, because the number of columns is determined from the first 5 rows. Is there any way to force R to put the data in more columns?
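A minimal sketch that reproduces the behaviour, assuming the sample rows above are written to a temporary file that stands in for data.txt (the names txt, path, n_fields and df are illustrative, not from the question):

```r
# Sample rows from the question, one string per line
txt <- c("12223, University",
         "12227, bridge, Sky",
         "12828, Sunset",
         "13801, Ground",
         "14853, Tranceamerica",
         "14854, San Francisco",
         "15595, shibuya, Shrine",
         "16126, fog, San Francisco",
         "16520, California, ocean, summer, golden gate, beach, San Francisco")

# Write them to a temporary file that stands in for data.txt
path <- tempfile(fileext = ".txt")
writeLines(txt, path)

# Number of fields on each line: the widest of the first five lines has 3,
# while the widest line in the whole file has 7
n_fields <- count.fields(path, sep = ",")
print(max(n_fields[1:5]))
print(max(n_fields))

# read.csv() sizes the table from the first five lines, so it builds
# only 3 columns even though the last line has 7 fields
df <- read.csv(path, header = FALSE)
print(ncol(df))
```

Because read.csv() defaults to fill = TRUE, the surplus fields on the last line do not raise an error; they are wrapped onto extra rows instead of getting their own columns, so checking nrow(df) as well as ncol(df) is a quick way to spot the problem.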

Recommended answer

From the ?read.table documentation:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).

Therefore, let's define col.names to be of length X (where X is the maximum number of fields in your data set), and set fill = TRUE:

dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")

# col.names of length 7 forces 7 columns; fill = TRUE pads the shorter lines
read.table(dat, header = FALSE, sep = ",",
  col.names = paste0("V", seq_len(7)), fill = TRUE)

     V1             V2             V3      V4           V5     V6             V7
1 12223     University                                                          
2 12227         bridge            Sky                                           
3 12828         Sunset                                                          
4 13801         Ground                                                          
5 14853  Tranceamerica                                                          
6 14854  San Francisco                                                          
7 15595        shibuya         Shrine                                           
8 16126            fog  San Francisco                                           
9 16520     California          ocean  summer  golden gate  beach  San Francisco

If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code):

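# dat has already been read to the end by read.table() above, so count the
# fields in the file itself (data.txt from the question)
count.fields("data.txt", sep = ",")
# [1] 2 3 2 2 2 2 3 3 7
max(count.fields("data.txt", sep = ","))
# [1] 7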
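Putting the two steps together, a minimal sketch of a helper might look like the following. The function name read_ragged_csv is hypothetical, and strip.white = TRUE is an extra option not used in the answer above; it trims the space that follows each comma in the sample data.

```r
# Hypothetical helper: read a file whose lines have different numbers of fields
read_ragged_csv <- function(path, sep = ",") {
  # count.fields() returns the number of fields on each line of the file
  n_max <- max(count.fields(path, sep = sep))

  # A col.names vector longer than the widest of the first five lines makes
  # read.table() allocate n_max columns; fill = TRUE pads shorter lines with
  # blank fields. strip.white = TRUE (not in the answer above) trims the
  # white space around unquoted fields.
  read.table(path, header = FALSE, sep = sep,
             col.names = paste0("V", seq_len(n_max)),
             fill = TRUE, strip.white = TRUE)
}

# Usage with the sample from the question, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("12223, University",
             "12227, bridge, Sky",
             "12828, Sunset",
             "13801, Ground",
             "14853, Tranceamerica",
             "14854, San Francisco",
             "15595, shibuya, Shrine",
             "16126, fog, San Francisco",
             "16520, California, ocean, summer, golden gate, beach, San Francisco"),
           path)

df <- read_ragged_csv(path)
print(dim(df))  # 9 rows, 7 columns
```

Counting fields first costs one extra pass over the file, which is usually negligible next to the read itself and avoids hard-coding the column count.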


Possibly useful related reading: Only read limited number of columns in R