In pandas, how do I read CSV files with lists in a column?
Question
I have a csv file in which some columns look like this:
df = pd.DataFrame({'a':[['ID1','ID2','ID3'],['ID1','ID4'],[]],'b':[[8.6,1.3,2.5],[7.5,1.2],[]],'c':[[12,23,79],[42,10],[]]})
Out[1]:
                 a                b             c
0  [ID1, ID2, ID3]  [8.6, 1.3, 2.5]  [12, 23, 79]
1       [ID1, ID4]       [7.5, 1.2]      [42, 10]
2               []               []            []
The thing is that when I read the file with pandas.read_csv, those columns come back as strings. Is there an option to tell read_csv that these columns contain lists of numbers (maybe some dtype=something)?
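The following minimal sketch reproduces the behaviour with the small example frame above. It uses an in-memory buffer (io.StringIO) in place of the real file, which is an assumption for illustration. to_csv writes each list as its text representation, so read_csv hands those cells back as plain strings.

```python
import io

import pandas as pd

# Example frame from the question, with list-valued cells.
df = pd.DataFrame({'a': [['ID1', 'ID2', 'ID3'], ['ID1', 'ID4'], []],
                   'b': [[8.6, 1.3, 2.5], [7.5, 1.2], []],
                   'c': [[12, 23, 79], [42, 10], []]})

# Round-trip through CSV text held in memory instead of a file on disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df_read = pd.read_csv(buf)

# Each cell now holds the list's text representation, not a list.
print(type(df_read.loc[0, 'b']).__name__)  # str
print(df_read.loc[0, 'b'])                 # [8.6, 1.3, 2.5]
```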
PS: I can do a list comprehension with ast.literal_eval afterwards, but it takes a while, so I'd rather have the lists as soon as I read the csv.
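For comparison, here is a sketch of the after-the-fact approach the PS describes: each string cell is parsed with ast.literal_eval once the frame is loaded. The tiny inline CSV and the column names a, b, c are illustrative assumptions. Series.map(ast.literal_eval) does the same work as a list comprehension over the column and keeps the results as list objects.

```python
import ast
import io

import pandas as pd

# Tiny inline CSV standing in for the real file; list cells are stored as text.
csv_text = '''a,b,c
"['ID1', 'ID2', 'ID3']","[8.6, 1.3, 2.5]","[12, 23, 79]"
"['ID1', 'ID4']","[7.5, 1.2]","[42, 10]"
[],[],[]
'''

df = pd.read_csv(io.StringIO(csv_text))

# Convert each list column after loading: one literal_eval call per cell.
for col in ['a', 'b', 'c']:
    df[col] = df[col].map(ast.literal_eval)

print(df.loc[0, 'b'])  # [8.6, 1.3, 2.5]
```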
PS2: the original csv file is 600 000 lines long (which is why literal_eval takes some time). Its columns and their types are:
'ID of the project': object
'postcode': int
'city': string
'len of the lists in the last 3 columns': int
'ids of other projects': list of strings
'distance from initial project': list of floats
'jetlag from initial project': list of ints
To do this, you can use the converters parameter of the pd.read_csv function (see the documentation for read_csv).
Using your example, where the last three columns ('ids of other projects', 'distance from initial project' and 'jetlag from initial project') hold lists of strings, floats and ints, it could be done this way:
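import ast

import pandas as pd

# ast.literal_eval safely turns text such as "[8.6, 1.3]" back into a Python list.
# Map each list-valued column to it; read_csv calls it on every cell of that column.
conv = {'ids of other projects': ast.literal_eval,
        'distance from initial project': ast.literal_eval,
        'jetlag from initial project': ast.literal_eval}

df = pd.read_csv('your_file.csv', converters=conv)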
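Here is a self-contained version of the same idea that can be run as-is. The two data rows are hypothetical; only the column names come from the question. An inline CSV read through io.StringIO stands in for your_file.csv. The list columns come back as real lists, and the other columns keep read_csv's normal type inference.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row sample using the column names from the question.
csv_text = '''ID of the project,postcode,city,len of the lists in the last 3 columns,ids of other projects,distance from initial project,jetlag from initial project
P1,75001,Paris,2,"['P2', 'P3']","[1.5, 2.25]","[0, 1]"
P2,69001,Lyon,0,[],[],[]
'''

list_cols = ['ids of other projects',
             'distance from initial project',
             'jetlag from initial project']

# Apply ast.literal_eval to each list column while read_csv parses the text.
conv = {col: ast.literal_eval for col in list_cols}
df = pd.read_csv(io.StringIO(csv_text), converters=conv)

print(df.loc[0, 'distance from initial project'])  # [1.5, 2.25]
print(df.loc[1, 'ids of other projects'])          # []
```

The converter still calls literal_eval once per cell. This moves the parsing into the read step rather than removing it, so on 600 000 rows the total work is similar to converting afterwards.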
You have to specify which columns the converter applies to, but that should not be a problem in your case.
The converter function is applied while the csv is imported, and if the file gets too large, you can always read it in chunks (the chunksize parameter of read_csv).
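Chunked reading can be combined with converters, as in the following sketch. The column names id and nums, the three-row sample and chunksize=2 are all illustrative assumptions. With chunksize set, read_csv returns an iterator of DataFrames, and the converters are applied inside each chunk.

```python
import ast
import io

import pandas as pd

# Hypothetical sample; in practice pass the path of the large file instead.
csv_text = '''id,nums
A,"[1, 2]"
B,[]
C,"[3]"
'''

# With chunksize, read_csv yields DataFrames of at most 2 rows each,
# and the converter has already parsed the list column in every chunk.
chunks = pd.read_csv(io.StringIO(csv_text),
                     converters={'nums': ast.literal_eval},
                     chunksize=2)

# Process or filter each chunk here if memory is tight; this sketch simply
# concatenates them back into one frame.
df = pd.concat(chunks, ignore_index=True)
print(df['nums'].tolist())  # [[1, 2], [], [3]]
```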