How can you read a CSV file in R with different numbers of columns
Problem description
I have a sparse data set in CSV format whose rows vary in their number of columns. Here is a sample of the file text.
12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco
When I use
read.csv("data.txt", header = F)
R will interpret the data set as having 3 columns because the size is determined from the first 5 rows. Is there any way to force R to put the data in more columns?
Recommended answer
From the ?read.table documentation:
The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the 'Examples').
Therefore, let's define col.names to be length X (where X is the max number of fields in your dataset), and set fill = TRUE:
dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")
read.table(dat, header = FALSE, sep = ",",
col.names = paste0("V",seq_len(7)), fill = TRUE)
     V1             V2             V3      V4           V5     V6             V7
1 12223     University
2 12227         bridge            Sky
3 12828         Sunset
4 13801         Ground
5 14853  Tranceamerica
6 14854  San Francisco
7 15595        shibuya         Shrine
8 16126            fog  San Francisco
9 16520     California          ocean  summer  golden gate  beach  San Francisco
If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code):
count.fields(dat, sep = ',')
# [1] 2 3 2 2 2 2 3 3 7
max(count.fields(dat, sep = ','))
# [1] 7
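The two steps can be combined so the column count never has to be hard-coded. The following is only a minimal sketch under a few assumptions: the sample rows are written to a temporary file (the `path` variable and the shortened data are illustrative, not part of the original question), a file path is used rather than a connection so that each call reads the file from the start, and `strip.white = TRUE` is added to drop the space that follows each comma.

```r
# Hypothetical sample file, written to a temporary path so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "12223, University",
  "12227, bridge, Sky",
  "12828, Sunset",
  "16520, California, ocean, summer, golden gate, beach, San Francisco"
), path)

# Count the fields on every line, then take the widest row.
n_fields <- count.fields(path, sep = ",")
max_fields <- max(n_fields)

# Supply enough column names for the widest row; fill = TRUE pads shorter rows.
df <- read.table(path, header = FALSE, sep = ",",
                 col.names = paste0("V", seq_len(max_fields)),
                 fill = TRUE, strip.white = TRUE,
                 stringsAsFactors = FALSE)

print(dim(df))
```

Scanning the file twice costs an extra pass, which may matter for very large files; in that case a generous fixed upper bound for col.names is a possible alternative.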
Related reading that may be useful: Only read limited number of columns in R