Python Pandas, create empty DataFrame specifying column dtypes

Question

There is one thing that I find myself having to do quite often, and it surprises me how difficult it is to achieve in Pandas. Suppose I need to create an empty DataFrame with a specified index type and name, and specified column types and names. (I might want to fill it later, in a loop for example.) The easiest way I have found is to create an empty pandas.Series object for each column, specifying its dtype, put the Series into a dictionary keyed by column name, and pass the dictionary to the DataFrame constructor. Something like the following.

import pandas

def create_empty_dataframe():
    # Empty index with the desired name and dtype.
    index = pandas.Index([], name="id", dtype=int)
    column_names = ["name", "score", "height", "weight"]
    # One empty Series per column, each carrying that column's dtype.
    series = [pandas.Series(dtype=str), pandas.Series(dtype=int), pandas.Series(dtype=float), pandas.Series(dtype=float)]
    columns = dict(zip(column_names, series))
    # The columns=column_names is required because the dictionary will in general put the columns in arbitrary order.
    return pandas.DataFrame(columns, index=index, columns=column_names)

First question. Is the above really the simplest way of doing this? There are so many things that are convoluted about this. What I really want to do, and what I'm pretty sure a lot of people really want to do, is something like the following.

df = pandas.DataFrame(columns=["id", "name", "score", "height", "weight"], dtypes=[int, str, int, float, float], index_column="id") 

Second question. Is this sort of syntax at all possible in Pandas? If not, are the devs considering supporting something like this at all? It feels to me that it really ought to be as simple as this (the above syntax).

Recommended answer

Unfortunately, the DataFrame constructor accepts only a single dtype descriptor, which applies to every column. However, you can cheat a little by using read_csv:

In [143]:
import pandas as pd
import io
cols=["id", "name", "score", "height", "weight"]
df = pd.read_csv(io.StringIO(""), names=cols, dtype=dict(zip(cols,[int, str, int, float, float])), index_col=['id']) 
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 0 entries
Data columns (total 4 columns):
name      0 non-null object
score     0 non-null int32
height    0 non-null float64
weight    0 non-null float64
dtypes: float64(2), int32(1), object(1)
memory usage: 0.0+ bytes

So you can see that the column dtypes are as requested, and that the index is set as desired:

In [145]:

df.index
Out[145]:
Int64Index([], dtype='int64', name='id')
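
As an alternative to the read_csv trick, the per-column Series approach from the question can be wrapped in a small helper that comes close to the syntax the question asks for. This is only a sketch: empty_frame and its index_col parameter are hypothetical names, not part of pandas, and the dtypes are given as explicit strings ("int64", "object", "float64") on the assumption that this avoids platform- or version-dependent defaults.

```python
import pandas as pd

def empty_frame(dtypes, index_col=None):
    """Build an empty DataFrame from a {column name: dtype} mapping.

    Hypothetical helper for illustration; not a pandas API.
    """
    # One empty Series per column carries the dtype; the dict's insertion
    # order (guaranteed since Python 3.7) determines the column order.
    df = pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )
    if index_col is not None:
        # Move the chosen column into the index, keeping its name.
        df = df.set_index(index_col)
    return df

df = empty_frame(
    {
        "id": "int64",
        "name": "object",
        "score": "int64",
        "height": "float64",
        "weight": "float64",
    },
    index_col="id",
)
print(df.dtypes)
print(df.index.name, df.index.dtype)
```

The helper only automates what the question already does by hand; its one convenience is that names and dtypes live together in a single mapping.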
