How to append rows in a pandas dataframe in a for loop?

Problem description

I have the following for loop:

for i in links:
     data = urllib2.urlopen(str(i)).read()
     data = json.loads(data)
     data = pd.DataFrame(data.items())
     data = data.transpose()
     data.columns = data.iloc[0]
     data = data.drop(data.index[[0]])

Each dataframe created this way has most columns in common with the others, but not all of them. Moreover, each has just one row. What I need to do is build a single dataframe that contains all the distinct columns and every row from each dataframe produced by the for loop.

I tried pandas concatenate or similar but nothing seemed to work. Any idea? Thanks.

Solution

Suppose your data looks like this:

import pandas as pd
import numpy as np

np.random.seed(2015)
df = pd.DataFrame([])
for i in range(5):
    data = dict(zip(np.random.choice(10, replace=False, size=5),
                    np.random.randint(10, size=5)))
    data = pd.DataFrame(data.items())
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    # Note: DataFrame.append was removed in pandas 2.0; there, use pd.concat([df, data]).
    df = df.append(data)
print('{}\n'.format(df))
# 0   0   1   2   3   4   5   6   7   8   9
# 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
# 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
# 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
# 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
# 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN

Then it could be replaced with

np.random.seed(2015)
data = []
for i in range(5):
    data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                         np.random.randint(10, size=5))))
df = pd.DataFrame(data)
print(df)

In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.
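Applied to the loop in the question, the same idea might look like the sketch below. The `payloads` list is a hypothetical stand-in for the response bodies read from each link (the field names are invented for illustration), and it assumes every response is a flat JSON object.

```python
import json

import pandas as pd

# Hypothetical stand-ins for the response bodies read from each link.
payloads = [
    '{"id": 1, "name": "a", "price": 10.5}',
    '{"id": 2, "name": "b", "color": "red"}',
    '{"id": 3, "price": 7.0, "color": "blue"}',
]

rows = []
for body in payloads:
    # One flat dict per future row; no intermediate DataFrame is created.
    rows.append(json.loads(body))

# Build the DataFrame once, outside the loop.
# Keys missing from a given dict show up as NaN in that row.
df = pd.DataFrame(rows)
print(df)
```

The resulting frame has one row per payload and one column for every distinct key, which is what the transpose-and-drop steps in the question were working toward.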

Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient: the total time cost of copying grows quadratically with the number of rows. Not only is the call-DataFrame-once code easier to write, its performance will be much better, because the time cost of copying grows only linearly with the number of rows.
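The quadratic-versus-linear claim can be checked with a little arithmetic. The sketch below is a simplified cost model rather than a benchmark: it assumes the only cost is the number of existing rows copied, and the two function names are made up for this illustration.

```python
def rows_copied_by_repeated_append(n_rows):
    """Rows copied when a frame grows one row at a time.

    Growing from k to k + 1 rows copies the k existing rows,
    so the total is 0 + 1 + ... + (n_rows - 1) = n_rows * (n_rows - 1) / 2.
    """
    return sum(range(n_rows))


def rows_copied_by_single_build(n_rows):
    """Rows copied when all rows are collected first and the frame is built once."""
    return n_rows


for n in (10, 100, 1000):
    print(n, rows_copied_by_repeated_append(n), rows_copied_by_single_build(n))
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```

Under this model, a tenfold increase in rows costs roughly a hundred times more copying for the append-in-a-loop approach, but only ten times more for the build-once approach.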
