How to put datasets into an R package


Problem description



I am creating my own R package and I was wondering what are the possible methods that I can use to add (time-series) datasets to my package. Here are the specifics:

I have created a package subdirectory called data and I am aware that this is the location where I should save the datasets that I want to add to my package. I am also cognizant of the fact that the files containing the data may be .rda, .txt, or .csv files.

Each series of data that I want to add to the package consists of a single column of numbers (e.g. of the form 340 or 4.5), and the series differ in length.

So far, I have saved all of the datasets into a single .txt file, and I have successfully loaded the data using the data() function. That does not solve the problem, however.

The problem is that each series of data loads as a factor except for the series greatest in length. The series that load as factors contain missing values (of the form '.'). I had to add these missing values in order to make each column of data the same in length. I tried saving the data as unequal columns, but I received an error message after calling data().

A consequence of padding with missing values to get the data to load is that, once the data is loaded, I need to remove the NAs before I can get on with my analysis. So this clearly is not a good way of doing things.

Ideally (I suppose), I would like the data to load as numeric vectors or as a list. In this way, I wouldn't need the NA's appended to the end of each series.

How do I solve this problem? Should I save all of the data into one single file? If so, in what format? Or should I save the datasets into several files, and again, in which format? What is the best practice here? Any tips would be greatly appreciated.

Solution

I'm not sure I understood your question correctly. But if you get your data into the form you want inside R and save it with

save(myediteddata, file="data.rda")

the data should load exactly the way you saw it in R.
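As a rough sketch of that idea applied to the question: the padded text table can be read with "." treated as missing, turned into a list of numeric vectors of unequal length, and saved as an .rda file. The object name `series`, the column names and the values are made up for illustration, and a temporary directory stands in for the package's data directory so the snippet runs anywhere.

```r
# Hypothetical padded table in the style described in the question:
# shorter series are filled with "." so that all columns have equal length.
txt <- "a b c
340 1.2 7
4.5 3.4 8
12 . 9
. . 10
. . 11"

# Treat "." as NA and force numeric columns, so nothing is read as a factor.
padded <- read.table(text = txt, header = TRUE,
                     na.strings = ".", colClasses = "numeric")

# Drop the padding: one numeric vector per series, each with its own length.
series <- lapply(padded, function(x) x[!is.na(x)])

# In a package source tree the file would be data/series.rda;
# tempdir() is used here only to keep the sketch self-contained.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
rda_path <- file.path(data_dir, "series.rda")
save(series, file = rda_path)

# load() restores the object under the name it was saved with.
restored <- new.env()
load(rda_path, envir = restored)
str(restored$series)
```

A list keeps each series at its natural length, so no NAs need to be appended or stripped later. Saving one object per file, with the file named after the object, is an alternative if the series should be loadable individually.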

To make every dataset in the data directory available without an explicit data() call, add

LazyData: true

to the DESCRIPTION file of your package.
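To show where that field sits, here is a minimal sketch that writes a bare-bones DESCRIPTION file to a temporary location and reads the field back with `read.dcf()`. The package name `mypkg`, the version and the title are placeholders, not values from the question, and a real DESCRIPTION needs further fields.

```r
# Write a hypothetical, minimal DESCRIPTION file to a temporary location.
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(c(
  "Package: mypkg",
  "Version: 0.0.1",
  "Title: Example Package With Bundled Data",
  "LazyData: true"
), desc_path)

# DESCRIPTION uses the Debian control file format, which read.dcf() parses
# into a character matrix with one column per field.
desc <- read.dcf(desc_path)
lazy <- unname(desc[1, "LazyData"])
print(lazy)
```

With lazy loading enabled, an installed package's datasets can be referred to directly, for example as `mypkg::series`, instead of being loaded first with data().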

If this doesn't help, you could post one of your files along with a printout of the format you want; that will help us to help you ;)
