Splitting a dataframe into multiple dataframes

Question

I have a very large dataframe (around 1 million rows) with data from an experiment (60 respondents). I would like to split the dataframe into 60 dataframes (one dataframe for each participant).

In the dataframe (called data) there is a variable called 'name', which is the unique code for each participant.

I have tried the following, but nothing happens (or the script does not stop within an hour). What I intend to do is split the dataframe (data) into smaller dataframes and append these to a list (datalist):

import pandas as pd

def splitframe(data, name='name'):

    n = data[name][0]

    df = pd.DataFrame(columns=data.columns)

    datalist = []

    for i in range(len(data)):
        if data[name][i] == n:
            df = df.append(data.iloc[i])
        else:
            datalist.append(df)
            df = pd.DataFrame(columns=data.columns)
            n = data[name][i]
            df = df.append(data.iloc[i])

    return datalist

I do not get an error message; the script just seems to run forever!

Is there a smarter way to do it?

Recommended answer

Firstly, your approach is inefficient because appending on a row-by-row basis is slow: each df.append call builds a new dataframe and copies the rows collected so far, and a list likewise has to grow periodically when there is insufficient space for a new entry. List comprehensions tend to be better in this respect, as the result is built in a single pass.
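As a rough sketch of the list-comprehension idea (not code from the original answer), splitframe could be rewritten along these lines. It assumes the 'name' column holds the participant codes; the toy data frame and the codes p01 to p03 are made up for illustration.

```python
import pandas as pd


def splitframe(data, name="name"):
    """Return a list with one dataframe per unique value in column `name`."""
    # unique() keeps the order in which the values first appear.
    # Each boolean mask is evaluated over the whole column at once,
    # so there is no Python-level loop over individual rows.
    return [data[data[name] == key] for key in data[name].unique()]


# Toy data, made up for illustration only.
data = pd.DataFrame(
    {
        "name": ["p01", "p01", "p02", "p03", "p03"],
        "score": [1, 2, 3, 4, 5],
    }
)

datalist = splitframe(data)
```

With 60 participants this makes 60 vectorised passes over the column instead of about a million single-row appends.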

However, I think your approach is fundamentally a little wasteful: you already have a dataframe, so why create a new one for each of these users?

I would sort the dataframe by column 'name', set the index to be this column and, if required, not drop the column.

Then generate a list of all the unique entries. You can then perform a lookup using these entries and, crucially, if you are only querying the data, use the selection criteria to return a view on the dataframe where possible, rather than paying for a separate copy of the data for every participant up front. (Whether pandas hands back a view or a copy depends on the kind of selection.)

So:

# sort the dataframe by the 'name' column
df.sort_values(by=['name'], inplace=True)
# set the index to be this and don't drop
df.set_index(keys=['name'], drop=False, inplace=True)
# get a list of names
names = df['name'].unique().tolist()
# now we can perform a lookup on a 'view' of the dataframe
joe = df.loc[df.name == 'joe']
# now you can query all 'joes'
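If separate per-participant dataframes really are needed, one common alternative (not part of the answer above) is to let groupby do the splitting. The following is a minimal sketch under that assumption; the toy frame, the 'value' column and the names other than 'joe' are invented for illustration.

```python
import pandas as pd

# Toy data, made up for illustration only.
data = pd.DataFrame(
    {
        "name": ["joe", "ann", "joe", "bob", "ann"],
        "value": [10, 20, 30, 40, 50],
    }
)

# Iterating over a groupby object yields (key, sub-dataframe) pairs,
# so a dict comprehension gives one dataframe per participant.
frames = {key: group for key, group in data.groupby("name")}

joe = frames["joe"]
```

Here frames maps each participant code to its rows, so frames['joe'] plays the same role as the joe lookup above.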
