Python Pandas, create empty DataFrame specifying column dtypes
Question
There is one thing that I find myself having to do quite often, and it surprises me how difficult it is to achieve in Pandas. Suppose I need to create an empty DataFrame with a specified index type and name, and specified column types and names. (I might want to fill it later, in a loop for example.) The easiest way I have found is to create an empty pandas.Series object for each column, specifying its dtype, put the Series into a dictionary keyed by column name, and pass the dictionary into the DataFrame constructor. Something like the following.
import pandas

def create_empty_dataframe():
    index = pandas.Index([], name="id", dtype=int)
    column_names = ["name", "score", "height", "weight"]
    # One empty Series per column, each carrying the desired dtype.
    series = [pandas.Series(dtype=str), pandas.Series(dtype=int), pandas.Series(dtype=float), pandas.Series(dtype=float)]
    columns = dict(zip(column_names, series))
    # The columns=column_names is required because the dictionary will in general put the columns in arbitrary order.
    return pandas.DataFrame(columns, index=index, columns=column_names)
First question. Is the above really the simplest way of doing this? There are so many things that are convoluted about this. What I really want to do, and what I'm pretty sure a lot of people really want to do, is something like the following.
df = pandas.DataFrame(columns=["id", "name", "score", "height", "weight"], dtypes=[int, str, int, float, float], index_column="id")
Second question. Is this sort of syntax at all possible in Pandas? If not, are the devs considering supporting something like this at all? It feels to me that it really ought to be as simple as this (the above syntax).
Recommended answer
Unfortunately the DataFrame constructor accepts only a single dtype descriptor, however you can cheat a little by using read_csv:
In [143]:
import pandas as pd
import io
cols=["id", "name", "score", "height", "weight"]
df = pd.read_csv(io.StringIO(""), names=cols, dtype=dict(zip(cols,[int, str, int, float, float])), index_col=['id'])
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 4 columns):
name      0 non-null object
score     0 non-null int32
height    0 non-null float64
weight    0 non-null float64
dtypes: float64(2), int32(1), object(1)
memory usage: 0.0+ bytes
So you can see that the dtypes are as desired and that the index is set as desired:
In [145]:
df.index
Out[145]:
Int64Index([], dtype='int64', name='id')
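If you would rather not go through read_csv, the dictionary-of-empty-Series idea from the question can be wrapped in a small helper that comes close to the syntax asked for. This is only a sketch: the name empty_frame and its index_col parameter are made up for illustration and are not part of pandas. It assumes a Python and pandas combination in which dictionaries keep insertion order, so the columns come out in the order given. Dtypes are passed as strings ("int64", "object", "float64") to avoid platform-dependent defaults.

```python
import pandas as pd


def empty_frame(dtypes, index_col=None):
    """Build an empty DataFrame from a {column name: dtype} mapping.

    Hypothetical helper, not a pandas API.
    """
    # One empty Series per column, each created with its own dtype.
    df = pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )
    # Optionally move one of the columns into the index.
    if index_col is not None:
        df = df.set_index(index_col)
    return df


df = empty_frame(
    {
        "id": "int64",
        "name": "object",
        "score": "int64",
        "height": "float64",
        "weight": "float64",
    },
    index_col="id",
)
print(df.dtypes)
print(df.index.name)
```

Bear in mind that filling such a frame row by row in a loop can still change the dtypes and tends to be slow, so it may be worth collecting the rows in a list first and building the DataFrame once at the end.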