dict objects converted to strings when reading a CSV into a pandas DataFrame (Python)


Question

I have a CSV file with many columns. One column contains a mix of dict objects and plain strings.

For example, the column contains data like: {"a":5,"b":6,"c":8},"usa","india",{"a":9,"b":10,"c":11}

I read this CSV into a DataFrame using:

df = pd.read_csv(path)

When I ran df.applymap(type) to check the type of each element stored in this column, every value was reported as a string.

The data has no quotes around it, either in the CSV or in the DataFrame, yet the dict objects are still converted to strings and stored that way in the DataFrame.

The dtype of the column turns out to be object.

Please suggest how to read the CSV into a DataFrame so that, in this particular column, dict objects are recognised as dicts and strings as strings.

Recommended answer

You can convert the strings that should be dicts (or other Python literals) using ast.literal_eval:

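from ast import literal_eval

def try_literal_eval(s):
    """Parse s as a Python literal; return it unchanged if it is not one."""
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        # Plain text such as "usa" (ValueError) or "new york" (SyntaxError)
        return s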

Now you can apply this to your DataFrame:

In [11]: df = pd.DataFrame({'A': ["hello","world",'{"a":5,"b":6,"c":8}',"usa","india",'{"d":9,"e":10,"f":11}']})

In [12]: df.loc[2, "A"]
Out[12]: '{"a":5,"b":6,"c":8}'

In [13]: df
Out[13]:
                       A
0                  hello
1                  world
2    {"a":5,"b":6,"c":8}
3                    usa
4                  india
5  {"d":9,"e":10,"f":11}


In [14]: df.applymap(try_literal_eval)
Out[14]:
                            A
0                       hello
1                       world
2    {'a': 5, 'b': 6, 'c': 8}
3                         usa
4                       india
5  {'d': 9, 'e': 10, 'f': 11}

In [15]: df.applymap(try_literal_eval).loc[2, "A"]
Out[15]: {'a': 5, 'b': 6, 'c': 8}
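Since the question starts from a CSV file, the same helper can also be applied at read time. The following is a minimal sketch, not part of the original answer: it passes the helper through the converters argument of pd.read_csv so that only the mixed column is parsed. The column names (id, info) and the inline CSV text are made up for illustration, and the helper is repeated here (also catching SyntaxError) so that the block runs on its own.

```python
import io
from ast import literal_eval

import pandas as pd


def try_literal_eval(s):
    """Parse s as a Python literal; return it unchanged if it is not one."""
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        return s


# Hypothetical CSV content. Dict cells are quoted because they contain commas,
# and the inner double quotes are escaped by doubling them.
csv_text = '''id,info
1,"{""a"":5,""b"":6,""c"":8}"
2,usa
3,india
4,"{""a"":9,""b"":10,""c"":11}"
'''

# converters applies the function to every cell of the named column while parsing
df = pd.read_csv(io.StringIO(csv_text), converters={"info": try_literal_eval})

# Collect the Python type of each element in the mixed column
types = df["info"].map(type).tolist()
print(types)
```

Restricting the conversion to one column avoids running literal_eval over every cell of a wide DataFrame, which matters given the cost noted below.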

Note: This is fairly expensive time-wise compared with other calls. However, when you store dictionaries in a DataFrame/Series you necessarily fall back to Python objects, so things will be relatively slow anyway. It's probably a good idea to denormalize, i.e. expand the data into columns, e.g. using json_normalize.
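As a rough illustration of the denormalizing idea, the sketch below expands the dict entries of a mixed column into separate columns with pd.json_normalize. The sample data and the names is_dict, flat and result are assumptions for this example, and how to treat the non-dict rows (left as NaN here) is a design choice that depends on your data.

```python
import pandas as pd

# Hypothetical column that already holds real dicts alongside plain strings
df = pd.DataFrame(
    {"A": ["hello", {"a": 5, "b": 6, "c": 8}, "usa", {"a": 9, "b": 10, "c": 11}]}
)

# Boolean mask marking the rows whose value is a dict
is_dict = df["A"].map(lambda v: isinstance(v, dict))

# Expand the dicts into one column per key
flat = pd.json_normalize(df.loc[is_dict, "A"].tolist())

# Re-attach the original row labels so the new columns line up with df
flat.index = df.loc[is_dict].index

# Left join: rows that held plain strings get NaN in the new columns
result = df.join(flat)
print(result)
```

Because the string rows have no values for the new columns, those columns contain NaN and are therefore stored as floats rather than integers.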
