Create Pandas DataFrame from List of Generators

Views: 68 | Published: 2020/10/17 2:43:59 | Tags: python, pandas, dataframe, generator


Question

I have the following question. Is there a way to build a DataFrame from a list of Python generator objects? I used a list comprehension to create the list with data for the DataFrame:

data_list.append([record.Timestamp,record.Value, record.Name, record.desc] for record in records)

I did it this way because a normal list append in a for loop takes about 20 times longer:

for record in records:
    # append() takes exactly one argument, so the four fields are wrapped in a list
    data_list.append([record.Timestamp, record.Value, record.Name, record.desc])

I tried to create the dataframe but it doesn't work:

This:

dataframe = pd.DataFrame(data_list, columns=['timestamp', 'value', 'name', 'desc'])

Throws exception:

ValueError: 4 columns passed, passed data had 142538 columns.

I also tried to use itertools like this:

dataframe = pd.DataFrame(data=([list(elem) for elem in itt.chain.from_iterable(data_list)]), columns=['timestamp', 'value', 'name', 'desc'])

This results in an empty DataFrame:

Empty DataFrame
Columns: [timestamp, value, name, desc]
Index: []

data_list looks like this:

[<generator object St...51DB0>, <generator object St...56EB8>,<generator object St...51F10>, <generator object St...51F68>]

Code for generating the list looks like this:

for events in events_list:
    for record in events:
        data_list.append([record.Timestamp,record.Value, record.Name, record.desc] for record in records)

This is required because of the structure of the events list.

Is there a way for me to create a DataFrame out of a list of generators? If there is, will it be time-efficient? What I mean is that I save a lot of time by replacing the normal for loop with a list comprehension; however, if creating the DataFrame takes more time, the change will be pointless.

Solution

Just turn your data_list into a generator expression as well. For example:

from collections import namedtuple

import pandas as pd

MyData = namedtuple("MyData", ["a"])
# One flat generator of scalar values: each value becomes one row
data = (d.a for d in (MyData(i) for i in range(100)))
df = pd.DataFrame(data)

will work just fine. So what you should do is have:

data = ((record.Timestamp,record.Value, record.Name, record.desc) for record in records)
df = pd.DataFrame(data, columns=["Timestamp", "Value", "Name", "Desc"])

The actual reason your approach does not work is that your data_list contains a single entry, which is a generator over (I suppose) 142538 records. Pandas tries to cram that single entry into a single row (so all 142538 entries, each a list of four elements, become the columns of that row) and fails, because it expects 4 columns to be passed.
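The shape mismatch described above can be reproduced without pandas. The following is a minimal sketch, assuming a hypothetical `Record` namedtuple and five made-up records in place of the asker's data; it only mimics a consumer that treats each list entry as one row and is not pandas' actual implementation. It also shows two side effects of generators: appending one does no per-record work (which may be why the append looked so much faster than the explicit loop), and a generator is empty after it has been consumed once (which may be why the later itertools attempt produced an empty DataFrame).

```python
from collections import namedtuple

# Hypothetical stand-in for the asker's record objects.
Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])
records = [Record(i, i * 10, f"name{i}", f"desc{i}") for i in range(5)]

# append() receives a single generator object, so the list grows by ONE entry,
# and no record has been read yet (generators are lazy).
data_list = []
data_list.append([r.Timestamp, r.Value, r.Name, r.desc] for r in records)
entries = len(data_list)

# Treating each list entry as one row yields 1 row with 5 "columns".
nested_rows = [tuple(entry) for entry in data_list]
nested_shape = (len(nested_rows), len(nested_rows[0]))

# The generator is now exhausted; a second pass finds nothing.
second_pass = list(data_list[0])

# A single generator over the records yields one 4-tuple per record instead.
flat_rows = list((r.Timestamp, r.Value, r.Name, r.desc) for r in records)
flat_shape = (len(flat_rows), len(flat_rows[0]))

print(entries, nested_shape, second_pass, flat_shape)
```

With 142538 records instead of 5, the nested shape would be one row of 142538 columns, which matches the numbers in the ValueError from the question.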

Edit: you can of course make the generator expression more complex. Here's an example along the lines of your additional loop over events:

from collections import namedtuple

import pandas as pd

MyData = namedtuple("MyData", ["a", "b"])
# Outer loop over j, inner generator over i: one (a, b) tuple per (j, i) pair
data = ((d.a, d.b) for j in range(100) for d in (MyData(j, j+i) for i in range(100)))
pd.DataFrame(data, columns=["a", "b"])

Edit: here's also an example using data structures like the ones you are using:

from collections import namedtuple

import pandas as pd

Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])

event_list = [[Record(Timestamp=1, Value=1, Name=1, desc=1),
               Record(Timestamp=2, Value=2, Name=2, desc=2)],
              [Record(Timestamp=3, Value=3, Name=3, desc=3)]]

# Flatten the nested events in one generator expression: one tuple per record
data = ((r.Timestamp, r.Value, r.Name, r.desc) for events in event_list for r in events)
pd.DataFrame(data, columns=["timestamp", "value", "name", "desc"])

Output:

   timestamp  value  name  desc
0          1      1     1     1
1          2      2     2     2
2          3      3     3     3
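If a list of generators already exists, as in the question, it can be flattened into one stream of rows before it reaches pandas. The sketch below assumes the same hypothetical `Record` and `event_list` as the example above, builds one fresh generator per group of events, and uses `itertools.chain.from_iterable` from the standard library to join them. The generators must not have been consumed earlier, for example by a failed DataFrame call.

```python
from collections import namedtuple
from itertools import chain

# Hypothetical data mirroring the example above.
Record = namedtuple("Record", ["Timestamp", "Value", "Name", "desc"])
event_list = [[Record(1, 1, 1, 1), Record(2, 2, 2, 2)],
              [Record(3, 3, 3, 3)]]

# A list of generators, one per group of events (the situation in the question).
data_list = [
    ((r.Timestamp, r.Value, r.Name, r.desc) for r in events)
    for events in event_list
]

# chain.from_iterable walks each generator in turn and yields its rows,
# so the result is a flat list of 4-tuples rather than a list of generators.
rows = list(chain.from_iterable(data_list))
print(rows)
```

The resulting `rows` list has one 4-tuple per record and could then be passed to `pd.DataFrame(rows, columns=["timestamp", "value", "name", "desc"])` in the same way as the generator in the example above.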
