How to completely ignore whitespace in a CSV with Pandas

Question

I am trying to make a .csv file in a format that is both reasonably human-readable and easy for pandas to read. That means the columns should be neatly aligned so you can easily tell which column each value belongs to. The problem is that padding the file with whitespace breaks some pandas functionality. So far, what I've got is:

work    ,roughness  ,unstab ,corr_c_w   ,u_star ,c_star
us      ,True       ,True   ,-0.39      ,0.35   ,-.99
wang    ,False      ,       ,-0.5       ,       ,
cheng   ,           ,True   ,           ,       ,
watanabe,           ,       ,           ,0.15   ,-.80

If I remove all the whitespace from the above .csv and read it directly with pd.read_csv, it works perfectly: the first two columns are booleans and the others are floats. However, without the whitespace the file is not human-readable at all. When I read the above .csv with

pd.read_csv('bibrev.csv', index_col=0)

it doesn't work, because all the columns are treated as strings that, obviously, include the whitespace. When I use

pd.read_csv('bibrev.csv', index_col=0, skipinitialspace=True)

then it kind of works, because floats are read as floats and missing values are read as NaNs, which is a big improvement. However, the column names and the boolean columns are still strings with whitespace.
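The behaviour described above can be reproduced without a file on disk. The sketch below is only an illustration: it assumes an in-memory copy of the sample data (the `raw` string and the `padded_columns` name are introduced here, not taken from the question). It relies on the documented meaning of `skipinitialspace`, which skips spaces after a delimiter, so padding before the next comma is left in place.

```python
import io
import pandas as pd

# In-memory copy of the padded CSV from the question (assumption: same layout).
raw = (
    "work    ,roughness  ,unstab ,corr_c_w   ,u_star ,c_star\n"
    "us      ,True       ,True   ,-0.39      ,0.35   ,-.99\n"
    "wang    ,False      ,       ,-0.5       ,       ,\n"
    "cheng   ,           ,True   ,           ,       ,\n"
    "watanabe,           ,       ,           ,0.15   ,-.80\n"
)

df = pd.read_csv(io.StringIO(raw), index_col=0, skipinitialspace=True)

# repr() makes the trailing padding in the column names visible.
padded_columns = [repr(name) for name in df.columns if name != name.strip()]
print(padded_columns)
print(df.dtypes)
```

Any column name that still differs from its stripped form shows up in `padded_columns`, which is why a lookup such as `df['roughness']` fails on this frame.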

Is there any way to read that .csv directly with pandas? Or could I change the csv format a bit and still get a clean read from a human-readable .csv?

PS: I am trying to avoid reading everything into Python as a string, replacing the whitespace, and then feeding it to pandas. I am also trying to avoid defining functions and passing them to pandas through the converters keyword.

Recommended answer

Try this:

import pandas as pd

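def booleator(col):
    # 'true' or 'yes' (any case) becomes True; everything else,
    # including empty fields, becomes False so the column stays bool.
    return str(col).lower() in ['true', 'yes']

# The regex separator swallows the whitespace around each comma.
# A raw string avoids invalid-escape warnings; regex separators
# need the python engine.
df = pd.read_csv('data.csv', sep=r'\s*,\s*', index_col=0,
                 converters={'roughness': booleator, 'unstab': booleator},
                 engine='python')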
print(df)
print(df.dtypes)

Output:

         roughness unstab  corr_c_w  u_star  c_star
work
us            True   True     -0.39    0.35   -0.99
wang         False  False     -0.50     NaN     NaN
cheng        False   True       NaN     NaN     NaN
watanabe     False  False       NaN    0.15   -0.80
roughness       bool
unstab          bool
corr_c_w     float64
u_star       float64
c_star       float64
dtype: object
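To see what the separator pattern `\s*,\s*` does to a single line, it can be tried on its own with Python's `re.split`. This is only an illustration of the pattern: pandas' python engine may differ in detail from a plain `re.split`, and the variable names below are introduced here for the example.

```python
import re

# Same pattern as the sep argument: optional whitespace, a comma, optional whitespace.
separator = re.compile(r"\s*,\s*")

header = "work    ,roughness  ,unstab ,corr_c_w   ,u_star ,c_star"
row = "wang    ,False      ,       ,-0.5       ,       ,"

header_fields = separator.split(header)
row_fields = separator.split(row)

print(header_fields)  # ['work', 'roughness', 'unstab', 'corr_c_w', 'u_star', 'c_star']
print(row_fields)     # ['wang', 'False', '', '-0.5', '', '']
```

The padding on both sides of each comma is consumed by the match, so the names come out clean and the whitespace-only fields collapse to empty strings.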

The booleator converter also takes care of the booleans: every missing value is converted to False, because otherwise Pandas would promote the dtype of those columns to object (see the details in my comment)...
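The dtype promotion mentioned here is easy to check on a small Series. The sketch below is not part of the original answer; the variable names are illustrative. It also shows pandas' nullable `"boolean"` extension dtype as one possible alternative when missing values should stay missing instead of becoming False, assuming a pandas version that provides it (1.0 or later).

```python
import numpy as np
import pandas as pd

complete = pd.Series([True, False])
with_gap = pd.Series([True, np.nan])

print(complete.dtype)  # bool
print(with_gap.dtype)  # object: a NaN cannot live in a plain bool column

# Assumption: pandas >= 1.0, where the nullable "boolean" dtype exists.
nullable = pd.Series([True, None], dtype="boolean")
print(nullable.dtype)         # boolean
print(nullable.isna().sum())  # 1
```

Converting missing values to False, as the answer does, keeps the plain bool dtype but loses the distinction between "False" and "not reported"; the nullable dtype keeps that distinction at the cost of using an extension type.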
