Python - pandas - Append Series into Blank DataFrame


Question


Say I have two pandas Series in Python:

import pandas as pd
h = pd.Series(['g',4,2,1,1])
g = pd.Series([1,6,5,4,"abc"])

I can create a DataFrame with just h and then append g to it:

df = pd.DataFrame([h])
df1 = df.append(g, ignore_index=True)

I get:

>>> df1
   0  1  2  3    4
0  g  4  2  1    1
1  1  6  5  4  abc

But now suppose that I have an empty DataFrame and I try to append h to it:

df2 = pd.DataFrame([])
df3 = df2.append(h, ignore_index=True)

This does not work. I think the problem is in the second-to-last line of code. I need to somehow define the blank DataFrame to have the proper number of columns.

By the way, the reason I am trying to do this is that I am scraping text from the internet using requests+BeautifulSoup and I am processing it and trying to write it to a DataFrame one row at a time.

Solution

So if you don't pass an empty list to the DataFrame constructor then it works:

In [16]:

df = pd.DataFrame()
h = pd.Series(['g',4,2,1,1])
df = df.append(h,ignore_index=True)
df
Out[16]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]
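
A note for readers on newer releases: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet above only runs on older versions. The following is a minimal sketch of the same step using `pd.concat` instead. It is not part of the original answer, and it assumes that turning the Series into a one-row frame with `to_frame().T` is an acceptable stand-in for appending it as a row.

```python
import pandas as pd

h = pd.Series(['g', 4, 2, 1, 1])

# Start from an empty frame, as in the question.
df = pd.DataFrame()

# to_frame() gives a 5x1 column; .T transposes it into a single 1x5 row.
row = h.to_frame().T

# concat stacks the row under the (empty) frame and renumbers the index.
df = pd.concat([df, row], ignore_index=True)
print(df)
```

Because the row holds mixed strings and integers, the resulting columns will most likely have `object` dtype.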

The difference between the two constructor approaches appears to be that the index dtypes are set differently: with an empty list the index dtype is int64, while with no argument it is object:

In [21]:

df = pd.DataFrame()
print(df.index.dtype)
df = pd.DataFrame([])
print(df.index.dtype)
object
int64

Unclear to me why the above should affect the behaviour (I'm guessing here).

UPDATE

After revisiting this, I can confirm that this looks to me like a bug in pandas version 0.12.0, as your original code works fine on the version I am running:

In [13]:

import pandas as pd
df = pd.DataFrame([])
h = pd.Series(['g',4,2,1,1])
df.append(h,ignore_index=True)

Out[13]:
   0  1  2  3  4
0  g  4  2  1  1

[1 rows x 5 columns]

I am running pandas 0.13.1 and numpy 1.8.1 (64-bit) on Python 3.3.5.0. I think the problem is in pandas, but I would upgrade both pandas and numpy to be safe. I don't think this is a 32-bit versus 64-bit Python issue.
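
For the use case mentioned in the question (building a table one row at a time while scraping), a common alternative is to collect the rows in a plain Python list and build the DataFrame once at the end. The sketch below is only an illustration and not part of the original answer. The `scraped_rows` data is a hypothetical stand-in for whatever requests + BeautifulSoup would return.

```python
import pandas as pd

# Hypothetical records standing in for rows parsed from scraped pages.
scraped_rows = [
    ['g', 4, 2, 1, 1],
    [1, 6, 5, 4, 'abc'],
]

rows = []
for record in scraped_rows:
    # Any per-row processing would happen here; appending to a list is cheap.
    rows.append(record)

# Build the DataFrame once, after all rows have been collected.
df = pd.DataFrame(rows)
print(df)
```

This avoids depending on how a particular pandas version handles appending to an empty DataFrame, and it avoids copying the whole frame on every new row.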
