How to format data in (a) CSV file(s) so that it can easily be imported in R?

Question

Edit:

So, this format would work:

featureID    charge    xcoordinate    ycoordinate
1            2         5105.9217      336.125209180674
1            2         5108.7642      336.124751115092
2            0         2434.9217      145.893331325278

But what if I have two columns with multiple values that are linked? Say the column quality has a machine and a quality linked, and the column looks like this:

 MachineQuality
 [[{1:1224}, {2:3453}], [{1:2242}, {2:4142}]]

Now if I want to split that up like I did with the coordinates of the convexhull, I would need 2 rows instead of 1. But wouldn't I then need 2 rows for every row that is already there (so 4, because there are already 2 rows for the coordinates), like this:

featureID    charge    xcoordinate    ycoordinate         quality1    quality2
1            2         5105.9217      336.125209180674    1224        3453
1            2         5105.9217      336.125209180674    2242        4142
1            2         5108.7642      336.124751115092    1224        3453
1            2         5108.7642      336.124751115092    2242        4142
[...]

Would it have to be like this?


I'm very new to R; my knowledge doesn't go much further than knowing how to make a vector and some simple plots. I'm going to use R for an internship project over the next couple of months, and during this time I will (hopefully) learn some of the ins and outs of R. However, before I start I need to produce the data that I'm going to do the statistics on. I need to know beforehand how I should format my output CSV data so that I can easily read it in once I start my R analysis.

One thing that I've been asked to do is make a CSV file out of the data so that it can be read in by R. The example CSV files for importing with R that I've seen all look like this:

featureID    Charge    value
1            2         10
2            0         9

However, my data mostly consists of columns whose values themselves contain multiple values. To clarify: as an example, my data consists of "features" that, among other information, have a "convexhull". This convexhull consists of paired x and y coordinates. So what I could have for data is (only showing two coordinates, but there can be many):

featureID    Charge    Convexhull
1            2         [[{'y': '336.125209180674'}, {'x': '5105.9217'}], [{'y': '336.124751115092'}, {'x': '5108.7642'}]]

Is it possible to get this into one CSV file and still read it into R correctly (so that the paired x and y coordinates are preserved)? If so, what should the CSV file look like? For example, I've seen CSV files with multiple values that look like this:

featureID    charge    xcoordinate    ycoordinate
1            2         5105.9217      336.125209180674
                       5108.7642      336.124751115092
2            0         2434.9217      145.893331325278

But I can't find out whether this is easily imported by R.

If this is not doable in one CSV file, can separate CSV files easily be imported independently and linked through something like a primary key, as in a database?

Solution

The only critical things are that you have a unique character separating your data columns and that each column is the same length. As long as the second row in your last example is filled in (that is, featureID and charge are repeated on it), that will import fine.
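
As a rough illustration of "filling in" that second row before export, here is a minimal Python sketch. It assumes a comma as the separator and that the blank cells should repeat the value from the row above; the `fill_down` helper is a made-up name for this example, not something from R or from the question.

```python
import csv
import io

# The sparse layout from the question, written here as comma-separated text.
sparse = """featureID,charge,xcoordinate,ycoordinate
1,2,5105.9217,336.125209180674
,,5108.7642,336.124751115092
2,0,2434.9217,145.893331325278
"""

def fill_down(rows, keys):
    """Copy the last seen value of each key column into rows where it is blank."""
    last = {}
    filled = []
    for row in rows:
        row = dict(row)
        for key in keys:
            if row[key] == "":
                row[key] = last[key]  # KeyError if the very first row is blank
            else:
                last[key] = row[key]
        filled.append(row)
    return filled

rows = fill_down(csv.DictReader(io.StringIO(sparse)), ["featureID", "charge"])

# Write the now rectangular table back out as CSV text.
out = io.StringIO()
writer = csv.DictWriter(
    out,
    fieldnames=["featureID", "charge", "xcoordinate", "ycoordinate"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

After this step every row carries its own featureID and charge, so the file is a plain rectangle with one hull point per row.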

You need to consider what you want to do with the data after it's in R to decide how you might want any other special formatting beforehand. But, as long as the column separator is a unique character and the columns are of equal length then it will import.
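
One possible layout, if the hull points and the machine qualities are analysed separately, is to write each multi-valued attribute to its own file and repeat featureID in both as the linking key. That avoids the 2 x 2 row blow-up from the question. The sketch below is only one reading of the nested structures shown in the question: the dictionary keys, the `merge_pairs` and `to_csv` helpers, and the quality1/quality2 column meaning are assumptions. In R the two tables could later be combined on featureID, for example with `merge`, if that is what the analysis needs.

```python
import csv
import io

# One feature, nested the way the question shows it (Python-style literals).
features = [
    {
        "featureID": 1,
        "charge": 2,
        "convexhull": [
            [{"y": "336.125209180674"}, {"x": "5105.9217"}],
            [{"y": "336.124751115092"}, {"x": "5108.7642"}],
        ],
        "machinequality": [
            [{1: 1224}, {2: 3453}],
            [{1: 2242}, {2: 4142}],
        ],
    },
]

def merge_pairs(pairs):
    """Collapse a list of single-key dicts into one dict."""
    merged = {}
    for d in pairs:
        merged.update(d)
    return merged

def to_csv(header, rows):
    """Render a header and rows as comma-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

hull_rows = []
quality_rows = []
for f in features:
    for point in f["convexhull"]:
        p = merge_pairs(point)
        hull_rows.append([f["featureID"], f["charge"], p["x"], p["y"]])
    for entry in f["machinequality"]:
        q = merge_pairs(entry)
        quality_rows.append([f["featureID"], q[1], q[2]])

# Two rectangular tables that share featureID as the key.
hull_csv = to_csv(["featureID", "charge", "xcoordinate", "ycoordinate"], hull_rows)
quality_csv = to_csv(["featureID", "quality1", "quality2"], quality_rows)
print(hull_csv)
print(quality_csv)
```

If every combination of hull point and quality really is needed in one table, the four-row layout from the question is still a valid rectangle; the two-file version just postpones that decision until the data is in R.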

(You can violate the unique separator requirement if your entries are wrapped in quotes. And if you want to get really fancy you could "import" almost anything. But if someone's asking you to format the data, then they probably want a rectangular, data.frame-compatible layout. They probably want a single value in each cell (no columns of points). But that's between you and them.)
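
To show what the quoting remark means in practice, here is a small Python sketch using the standard `csv` module. The row is hypothetical: it packs an x,y pair into a single field purely to demonstrate quoting, which is exactly the "column of points" layout that is probably not wanted for analysis.

```python
import csv
import io

# A hypothetical row whose last field contains the separator itself.
row = ["1", "2", "5105.9217,336.125209180674"]

# The writer wraps only fields that need it in double quotes.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)
line = buf.getvalue()
print(line)

# Reading the line back yields the original three fields, not four.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)
```

The comma inside the quoted field is treated as data rather than as a column break, so the row still has three fields.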
