read.table用逗号分隔的值以及每个元素内的逗号 [英] read.table with comma separated values and also commas inside each element

查看:714
本文介绍了read.table用逗号分隔的值以及每个元素内的逗号的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图从csv文件中创建一个逗号分隔的表。我知道不是所有的行有相同数量的元素,所以我会写一些代码来消除这些行。问题是,有行包括数字(以千为单位),其中包括另一个逗号。我不能正确分割这些行,这里是我的代码:

I'm trying to create a table from a csv file comma separated. I'm aware that not all the rows have the same number of elements so I would write some code to eliminate those rows. The problem is that there are rows that include numbers (in thousands) which include another comma as well. I'm not capable of splitting those rows properly, here's my code:

pURL <- "http://financials.morningstar.com/ajax/exportKR2CSV.html?&callback=?&t=EI&region=FRA&order=asc"
res <- read.table(pURL, header=T, sep='\t', dec = '.', stringsAsFactors=F)
x <- unlist( lapply(keyRatios, function(u) strsplit(u,split='\n')) [[1]] )


推荐答案

您需要使用 quote = 参数 read.table read.delim

res <- read.delim( pURL, header=F, sep=',', dec = '.', stringsAsFactors=F , quote = "\"" ,   fill = TRUE , skip = 2 )

分隔符为\t。 在此文件中引用,因此您可以使用 quote 参数使R忽略引号内的逗号, quote =\ ,并且要跳过前两行,并使用 fill = TRUE 在不均匀的行上填充空格。

The seperator is "," not "\t". Numbers written as thousands of millions are always quoted in this file so you can use the quote argument to make R ignore the comma inside the quotes with quote = "\"", and you want to skip the first two lines, and use fill = TRUE to fill in blanks on uneven lines.

head( res )

#                           2003-12 2004-12 2005-12 2006-12 2007-12 2008-12 2009-12 2010-12 2011-12 2012-12   TTM
#2          Revenue EUR Mil   2,116   2,260   2,424   2,690   2,908   3,074   3,268   3,892   4,190   4,989 5,034
#3           Gross Margin %    60.6    60.3    57.3    58.2    57.6    56.9    56.1    55.5    55.4    55.8  56.1
#4 Operating Income EUR Mil     365     404     394     460     505     515     555     618     683     832   841
#5       Operating Margin %    17.2    17.9    16.2    17.1    17.4    16.7    17.0    15.9    16.3    16.7  16.7
#6       Net Income EUR Mil     200     227     289     331     371     389     402     472     518     584   594
#7   Earnings Per Share EUR    3.90    4.30    5.44    6.22    3.48    3.62    3.78    4.36    4.82    2.77  2.80

我之后设置 res 的列名...

I set the column names of res afterwards like this...

names( res ) <- res[1,]; res <- res[-1,]

它提供了更好的格式。

这篇关于read.table用逗号分隔的值以及每个元素内的逗号的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆