In pandas, how do I read CSV files with lists in a column?

Question

I have a CSV file in which some columns look like this:

df = pd.DataFrame({'a':[['ID1','ID2','ID3'],['ID1','ID4'],[]],'b':[[8.6,1.3,2.5],[7.5,1.2],[]],'c':[[12,23,79],[42,10],[]]})

Out[1]:     a               b                c
        0   [ID1, ID2, ID3] [8.6, 1.3, 2.5] [12, 23, 79]
        1   [ID1, ID4]      [7.5, 1.2]      [42, 10]
        2   []              []              []

The problem is that when I read the file with pandas.read_csv, those columns come back as strings. Is there an option to tell read_csv that these columns contain lists of numbers (maybe something like dtype=...)?
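A minimal sketch reproducing the behaviour described above, assuming the example frame is written with DataFrame.to_csv and read back with pd.read_csv; an in-memory io.StringIO buffer stands in for the real file. CSV has no list type, so to_csv stores each list as its string representation, and read_csv hands that string back unchanged.

```python
import io

import pandas as pd

# The example frame from the question: each cell holds a Python list.
df = pd.DataFrame({'a': [['ID1', 'ID2', 'ID3'], ['ID1', 'ID4'], []],
                   'b': [[8.6, 1.3, 2.5], [7.5, 1.2], []],
                   'c': [[12, 23, 79], [42, 10], []]})

# Round-trip through CSV text held in memory instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

# Each cell is now the string form of the original list, not a list.
print(repr(df2.loc[0, 'b']))  # '[8.6, 1.3, 2.5]'
print(type(df2.loc[0, 'b']))  # <class 'str'>
```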

PS: I can convert them afterwards with a list comprehension over ast.literal_eval, but that takes a while, so I'd rather have the lists parsed as soon as I read the CSV.
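For reference, the after-the-fact conversion mentioned in the PS might look like the sketch below. The CSV text is a hypothetical sample in the shape of the question's data, not the asker's real file; Series.map applies ast.literal_eval to each cell, which is the same per-cell parsing a list comprehension would do.

```python
import ast
import io

import pandas as pd

# Hypothetical sample in the same shape as the question's data.
csv_text = '''a,b,c
"['ID1', 'ID2', 'ID3']","[8.6, 1.3, 2.5]","[12, 23, 79]"
"['ID1', 'ID4']","[7.5, 1.2]","[42, 10]"
[],[],[]
'''

df = pd.read_csv(io.StringIO(csv_text))

# Parse each string cell back into a list after reading.
# ast.literal_eval runs once per cell in Python, which is why this
# becomes slow on a file with hundreds of thousands of rows.
for col in ['a', 'b', 'c']:
    df[col] = df[col].map(ast.literal_eval)

print(df.loc[0, 'c'])  # [12, 23, 79]
```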

PS2: the original CSV file is 600,000 lines long (which is why literal_eval takes some time). Its columns are:

Column                                      Type
'ID of the project'                         object
'postcode'                                  int
'city'                                      string
'len of the lists in the last 3 columns'    int
'ids of other projects'                     list of strings
'distance from initial project'             list of floats
'jetlag from initial project'               list of ints

Solution

To do this, you can use the converters argument of pd.read_csv (see the documentation for read_csv).

Using the columns from your example, it could be done like this:

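import ast

import pandas as pd

# Each converter receives the raw string of one cell and returns the parsed
# value, so these columns are turned into lists while the CSV is being read.
conv = {'ids of other projects': ast.literal_eval,
        'distance from initial project': ast.literal_eval,
        'jetlag from initial project': ast.literal_eval}

df = pd.read_csv('your_file.csv', converters=conv)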

You have to specify which columns the conversion applies to, but that should not be a problem in your case.
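A self-contained sketch of the same approach, assuming a tiny hypothetical CSV held in io.StringIO that contains only the three list columns from the question (the real file also has the ID, postcode, city and length columns). The converters dict is built with a comprehension rather than spelled out.

```python
import ast
import io

import pandas as pd

# Hypothetical sample using only the question's three list columns.
csv_text = '''ids of other projects,distance from initial project,jetlag from initial project
"['ID1', 'ID2', 'ID3']","[8.6, 1.3, 2.5]","[12, 23, 79]"
"['ID1', 'ID4']","[7.5, 1.2]","[42, 10]"
[],[],[]
'''

list_cols = ['ids of other projects',
             'distance from initial project',
             'jetlag from initial project']

# converters maps a column name to a function applied to each raw cell string.
conv = {col: ast.literal_eval for col in list_cols}

df = pd.read_csv(io.StringIO(csv_text), converters=conv)

print(df.loc[0, 'distance from initial project'])  # [8.6, 1.3, 2.5]
print(type(df.loc[0, 'jetlag from initial project'][0]))  # <class 'int'>
```

Note that this still calls ast.literal_eval once per cell in Python, so it should not be expected to be dramatically faster than converting after reading; the main gain is that parsing happens in the same pass as the import.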

The converter functions are applied while the CSV is being imported, and if your file gets too large, you can always read it in chunks.
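A sketch of chunked reading combined with converters, assuming a hypothetical in-memory CSV with an 'id' column and a list-valued 'jetlag' column in place of the real 600,000-line file. With chunksize set, read_csv returns an iterator of DataFrames, and the converters are applied to each chunk as it is read.

```python
import ast
import io

import pandas as pd

# Hypothetical CSV: 10 rows whose 'jetlag' cells look like "[i, i + 1]".
csv_text = 'id,jetlag\n' + ''.join(f'{i},"[{i}, {i + 1}]"\n' for i in range(10))

conv = {'jetlag': ast.literal_eval}

# chunksize=4 yields DataFrames of at most 4 rows; each is converted on read.
chunks = pd.read_csv(io.StringIO(csv_text), converters=conv, chunksize=4)
df = pd.concat(chunks, ignore_index=True)

print(len(df))              # 10
print(df.loc[9, 'jetlag'])  # [9, 10]
```

Concatenating every chunk rebuilds the whole frame in memory, so chunking mainly helps if each chunk is filtered or aggregated before it is kept.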
