Are there any example data sets for Python?


Question

For quick testing, debugging, creating portable examples, and benchmarking, R has a large number of data sets available (in the base R datasets package). The command library(help="datasets") at the R prompt describes nearly 100 historical datasets, each of which has an associated description and metadata.

Is there anything like this for Python?

Recommended answer

You can use the rpy2 package to access all R datasets from Python.

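Set up the interface (pandas2ri.ri2py is the rpy2 2.x name; rpy2 3.x renamed its conversion functions, so newer versions need the updated call):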

>>> from rpy2.robjects import r, pandas2ri
>>> def data(name): 
...    return pandas2ri.ri2py(r[name])

Then call data() with the name of any available dataset (just like in R):

>>> df = data('iris')
>>> df.describe()
       Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

To see a list of the available datasets with a description of each:

>>> print(r.data())



Note: rpy2 requires an R installation with the R_HOME environment variable set, and pandas must be installed as well.
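
Before trying the rpy2 route, it can help to confirm that those prerequisites are in place. The following is a minimal sketch using only the standard library; the function name check_rpy2_prerequisites is made up for illustration, and looking for an R executable on PATH is only a rough hint that R is installed, not a guarantee that rpy2 will find it.

```python
import importlib.util
import os
import shutil


def check_rpy2_prerequisites():
    """Report which of the prerequisites from the note appear to be met.

    Hypothetical helper for illustration; it is not part of rpy2.
    """
    return {
        # R_HOME should point at the R installation directory.
        "R_HOME set": bool(os.environ.get("R_HOME")),
        # An R executable on PATH is a rough hint that R is installed.
        "R on PATH": shutil.which("R") is not None,
        # find_spec returns None when a top-level module cannot be found.
        "pandas importable": importlib.util.find_spec("pandas") is not None,
        "rpy2 importable": importlib.util.find_spec("rpy2") is not None,
    }


for item, ok in check_rpy2_prerequisites().items():
    print(f"{item}: {'yes' if ok else 'no'}")
```

If any line prints "no", fixing that item first is likely to save time compared with debugging an import error from rpy2 itself.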

I just created PyDataset, a simple module that makes loading a dataset from Python as easy as it is in R (it does not require an R installation, only pandas).

To start using it, install the module:


$ pip install pydataset

Then just load any dataset you wish (currently around 757 datasets are available):

from pydataset import data

titanic = data('titanic')
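
To browse what is available before loading anything, a sketch along these lines may help. It assumes, based on the module's docstring, that data() with no arguments returns a DataFrame listing the available datasets and that show_doc=True prints a dataset's documentation. The try/except guard is only there so the snippet still runs when pydataset is not installed.

```python
try:
    from pydataset import data  # third-party: pip install pydataset
except ImportError:
    data = None  # pydataset is not installed; skip the demo below

if data is not None:
    # With no arguments, data() is assumed to return a catalogue DataFrame.
    catalogue = data()
    print(catalogue.head())

    # show_doc=True is assumed to print the documentation for one dataset.
    data('iris', show_doc=True)
```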
