Sample datasets in Pandas

Question

When using R, sample datasets can be loaded with

data(iris)

data(mtcars)

Is there something similar for Pandas? I know I can load using any other method, just curious if there's anything builtin.

Recommended answer

Since I originally wrote this answer, I have updated it with the many ways that are now available for accessing sample data sets in Python. Personally, I tend to stick with whatever package I am already using (usually seaborn or pandas). If you need offline access, installing the data set with Quilt seems to be the only option.

The brilliant plotting package seaborn has several built-in sample data sets.

import seaborn as sns

iris = sns.load_dataset('iris')
iris.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Pandas

If you do not want to import seaborn, but still want to access its sample data sets, you can use @andrewwowens's approach for the seaborn sample data:

import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

Note that sns.load_dataset() modifies the column types of sample data sets that contain categorical columns, so the result may differ from reading the CSV directly from the URL. The iris and tips sample data sets are also available in the pandas GitHub repo.
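As a rough illustration of the dtype difference mentioned above, the sketch below reads a small inline CSV and converts one text column to a categorical by hand. The rows and the column names (total_bill, tip, day) are made up for the example and are not an excerpt of the real tips file; which columns seaborn actually converts is not stated here.

```python
import io

import pandas as pd

# Hypothetical rows shaped like a tips-style CSV (illustrative values only).
csv_text = """total_bill,tip,day
12.50,2.00,Sun
8.75,1.50,Sun
20.00,3.00,Sat
"""

# A plain read leaves text columns as object/string dtype,
# depending on the pandas version.
tips = pd.read_csv(io.StringIO(csv_text))
dtype_before = tips["day"].dtype

# Convert explicitly when a categorical column is wanted.
tips["day"] = tips["day"].astype("category")
dtype_after = tips["day"].dtype
```

Replacing io.StringIO(csv_text) with a URL should work the same way, since pd.read_csv() accepts both.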

Since any dataset can be read via pd.read_csv(), it is possible to access all R's sample data sets by copying the URLs from this R data set repository.
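One detail worth knowing when reading CSV exports of R data sets: they often store R's row names in a first column. The sketch below assumes that layout, with an unnamed first column (some exports give it a header instead), and uses three rows hand-copied from mtcars as inline text so it runs without network access.

```python
import io

import pandas as pd

# Small inline excerpt in the assumed layout: first column holds R row names.
csv_text = """,mpg,cyl,disp,hp
Mazda RX4,21.0,6,160.0,110
Mazda RX4 Wag,21.0,6,160.0,110
Datsun 710,22.8,4,108.0,93
"""

# index_col=0 turns the row-name column into the DataFrame index
# instead of leaving it as an "Unnamed: 0" data column.
mtcars = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(mtcars)
```

With a real URL copied from the repository, the io.StringIO(...) argument would simply be replaced by the URL string.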

Additional ways of loading the R sample data sets include statsmodels

import statsmodels.api as sm

iris = sm.datasets.get_rdataset('iris').data

PyDataset

from pydataset import data

iris = data('iris')

scikit-learn

scikit-learn returns sample data as numpy arrays rather than a pandas data frame.

from sklearn.datasets import load_iris

iris = load_iris()
# `iris.data` holds the numerical values
# `iris.feature_names` holds the numerical column names
# `iris.target` holds the categorical (species) values (as ints)
# `iris.target_names` holds the unique categorical names
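If a pandas data frame is wanted anyway, the pieces listed above can be assembled by hand. This is a minimal sketch rather than anything prescribed by scikit-learn; the column name 'species' is a choice made here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Numerical values become the columns, labelled with the feature names.
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer targets back to their names as a categorical column.
# 'species' is a hypothetical column name chosen for this example.
iris_df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(iris_df.head())
```

Recent scikit-learn releases also accept load_iris(as_frame=True), which returns pandas objects directly; the manual route above avoids depending on that parameter.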

Quilt

Quilt is a dataset manager created to facilitate dataset management. It includes many common sample datasets, such as several from the uciml sample repository. The quick start page shows how to install and import the iris data set:

# In your terminal
$ pip install quilt
$ quilt install uciml/iris

After installing a dataset, it is accessible locally, so this is the best option if you want to work with the data offline.

import quilt.data.uciml.iris as ir

iris = ir.tables.iris()

   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa

Quilt also supports dataset versioning and includes a short description of each dataset.
