How to append rows in a pandas dataframe in a for loop?

Question

I have the following for loop:

for i in links:
     data = urllib2.urlopen(str(i)).read()
     data = json.loads(data)
     data = pd.DataFrame(data.items())
     data = data.transpose()
     data.columns = data.iloc[0]
     data = data.drop(data.index[[0]])

Each dataframe created this way shares most of its columns with the others, but not all of them. Moreover, each one has just one row. What I need to do is build a single dataframe that contains all the distinct columns and every row from the dataframes produced by the for loop.

I tried pandas concatenate or similar but nothing seemed to work. Any idea? Thanks.

Recommended answer

Suppose your data looks like this:

import pandas as pd
import numpy as np

np.random.seed(2015)
df = pd.DataFrame([])
for i in range(5):
    data = dict(zip(np.random.choice(10, replace=False, size=5),
                    np.random.randint(10, size=5)))
    data = pd.DataFrame(list(data.items()))
    data = data.transpose()
    data.columns = data.iloc[0]
    data = data.drop(data.index[[0]])
    # DataFrame.append exists only in pandas < 2.0; on 2.x use pd.concat([df, data])
    df = df.append(data)
print('{}\n'.format(df))
# 0   0   1   2   3   4   5   6   7   8   9
# 1   6 NaN NaN   8   5 NaN NaN   7   0 NaN
# 1 NaN   9   6 NaN   2 NaN   1 NaN NaN   2
# 1 NaN   2   2   1   2 NaN   1 NaN NaN NaN
# 1   6 NaN   6 NaN   4   4   0 NaN NaN NaN
# 1 NaN   9 NaN   9 NaN   7   1   9 NaN NaN

Then it could be replaced with:

np.random.seed(2015)
data = []
for i in range(5):
    data.append(dict(zip(np.random.choice(10, replace=False, size=5),
                         np.random.randint(10, size=5))))
df = pd.DataFrame(data)
print(df)

In other words, do not form a new DataFrame for each row. Instead, collect all the data in a list of dicts, and then call df = pd.DataFrame(data) once at the end, outside the loop.
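
Applied to the loop in the question, the same idea might look like the sketch below. It assumes each link returns a flat JSON object; the `responses` list is a hypothetical stand-in for the bodies that would be fetched from `links`, so the example runs without network access.

```python
import json

import pandas as pd

# Hypothetical stand-ins for the JSON text fetched from each link.
responses = [
    '{"id": 1, "name": "a", "price": 3.5}',
    '{"id": 2, "name": "b"}',
    '{"id": 3, "price": 1.0, "stock": 7}',
]

rows = []
for body in responses:
    # Each parsed JSON object becomes one row; no per-row DataFrame is built.
    rows.append(json.loads(body))

# Build the DataFrame once, outside the loop.
# Keys missing from a given dict show up as NaN in that row.
df = pd.DataFrame(rows)
print(df)
```

The resulting frame has one column per distinct key seen across all rows, which is what the question asks for.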

Each call to df.append requires allocating space for a new DataFrame with one extra row, copying all the data from the original DataFrame into the new DataFrame, and then copying data into the new row. All that allocation and copying makes calling df.append in a loop very inefficient: the time cost of copying grows quadratically with the number of rows. (df.append was removed in pandas 2.0; calling pd.concat inside the loop has the same copying cost.) Not only is the call-DataFrame-once code easier to write, it also performs much better, because its copying cost grows only linearly with the number of rows.
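
To make the quadratic-versus-linear claim concrete, here is a rough cost model rather than a benchmark. It assumes that growing a frame by one row copies every existing row, and it ignores the cost of writing the new row and of realigning columns; the function names are made up for illustration.

```python
def rows_copied_by_appending(n):
    """Existing rows copied while growing a frame to n rows, one row at a time."""
    # The k-th append copies the k - 1 rows already in the frame.
    return sum(k - 1 for k in range(1, n + 1))


def rows_copied_by_single_build(n):
    """Rows copied when all n rows are handed to the constructor at once."""
    return n


for n in (10, 100, 1000):
    print(n, rows_copied_by_appending(n), rows_copied_by_single_build(n))
```

Under this model the append loop copies n(n - 1)/2 rows in total, so a 100-fold increase in rows costs roughly 10,000 times more copying, while the single-build approach costs only 100 times more.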
