Sample datasets in Pandas
Question
In R, you can load a sample dataset using
data(iris)
or
data(mtcars)
Is there something similar for Pandas? I know I can load data using any other method; I'm just curious whether there's anything built in.
Recommended answer
Since I originally wrote this answer, I have updated it with the many ways that are now available for accessing sample data sets in Python. Personally, I tend to stick with whatever package I am already using (usually seaborn or pandas). If you need offline access, installing the data set with Quilt seems to be the only option.
The brilliant plotting package seaborn has several built-in sample data sets.
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Pandas
If you do not want to import seaborn but still want to access its sample data sets, you can use @andrewwowens's approach for the seaborn sample data:
import pandas as pd
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
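Reading from a URL needs a network connection every time. As a rough sketch of one way around that with pandas alone, the helper below keeps a local copy after the first read. The function name `load_csv_cached` and the `data/iris.csv` path are made up for illustration; they are not part of pandas or seaborn.

```python
from pathlib import Path

import pandas as pd


def load_csv_cached(source, cache_path):
    """Read `source` with pandas, keeping a local CSV copy at `cache_path`.

    Hypothetical helper: if the cached file already exists it is read
    instead of `source`, so later calls work without network access.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        return pd.read_csv(cache_path)
    df = pd.read_csv(source)
    # Create the cache directory if needed, then store the copy.
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache_path, index=False)
    return df


# Example usage (needs network access on the first call only):
# iris = load_csv_cached(
#     "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv",
#     "data/iris.csv",
# )
```

Note that the cached copy is never refreshed, which is usually fine for fixed sample data but would need revisiting for data that changes.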
Note that sns.load_dataset() modifies the column types of sample data sets that contain categorical columns, so the result might not be the same as when reading the file directly from the URL. The iris and tips sample data sets are also available in the pandas GitHub repo.
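To make the difference in column types concrete, here is a small sketch of converting text columns to categoricals by hand after pd.read_csv(). The inline CSV is a made-up stand-in with tips-like column names, not the real file, and the category order chosen for `day` is an assumption for illustration rather than a statement of what seaborn does internally.

```python
import io

import pandas as pd

# Tiny inline stand-in for a CSV with categorical-looking columns
# (hypothetical rows, used so the example runs without network access).
csv_text = """total_bill,tip,day,time
16.99,1.01,Sun,Dinner
10.34,1.66,Sat,Dinner
21.01,3.50,Thur,Lunch
"""

# Read as-is: `day` and `time` come back as plain text columns.
raw = pd.read_csv(io.StringIO(csv_text))

converted = raw.copy()
# Categorical with an explicit (assumed) category order.
converted["day"] = pd.Categorical(
    converted["day"], categories=["Thur", "Fri", "Sat", "Sun"]
)
# Categorical with categories inferred from the values present.
converted["time"] = converted["time"].astype("category")

print(raw.dtypes)
print(converted.dtypes)
```

Comparing the two dtype listings shows which columns changed, which is a quick way to check whether two loading routes gave you the same frame.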
Since any dataset can be read via pd.read_csv(), it is possible to access all of R's sample data sets by copying the URLs from this R data set repository.
Additional ways of loading the R sample data sets include statsmodels
import statsmodels.api as sm
iris = sm.datasets.get_rdataset('iris').data
and PyDataset
from pydataset import data
iris = data('iris')
scikit-learn
scikit-learn returns sample data as numpy arrays rather than a pandas data frame.
from sklearn.datasets import load_iris
iris = load_iris()
# `iris.data` holds the numerical values
# `iris.feature_names` holds the numerical column names
# `iris.target` holds the categorical (species) values (as ints)
# `iris.target_names` holds the unique categorical names
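If you want a data frame anyway, the attributes listed above can be assembled into one. This is a minimal sketch; the column name `species` is my own choice and is not something scikit-learn provides.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Numerical measurements become the columns of the frame.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer targets back to their names as a categorical column.
# `species` is an illustrative column name chosen for this example.
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

print(df.head())
```

Recent scikit-learn releases (0.23 and later) can also do this directly with load_iris(as_frame=True), in which case the manual assembly is unnecessary.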
Quilt
Quilt is a dataset manager created to facilitate dataset management. It includes many common sample datasets, such as several from the uciml sample repository. The quick start page shows how to install and import the iris data set:
# In your terminal
$ pip install quilt
$ quilt install uciml/iris
After installing a dataset, it is accessible locally, so this is the best option if you want to work with the data offline.
import quilt.data.uciml.iris as ir
iris = ir.tables.iris()
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
Quilt also supports dataset versioning and includes a short description of each dataset.