How to completely ignore whitespace in a CSV with Pandas
Question
I am trying to make a .csv file in a format that is both minimally human-readable and easily readable by pandas. That means the columns should be neatly aligned, so you can easily see which column each value belongs to. The problem is that padding the file with whitespace costs some pandas functionality. So far, what I've got is
work    ,roughness ,unstab ,corr_c_w ,u_star ,c_star
us      ,True      ,True   ,-0.39    ,0.35   ,-.99
wang    ,False     ,       ,-0.5     ,       ,
cheng   ,          ,True   ,         ,       ,
watanabe,          ,       ,         ,0.15   ,-.80
If I take out all the whitespace in the above .csv and read it directly with pd.read_csv, it works perfectly. The first two columns are booleans and the others are floats. However, without the whitespace it is not human-readable at all. When I read the above .csv with
pd.read_csv('bibrev.csv', index_col=0)
it doesn't work, because all the columns are treated as strings that, obviously, include the whitespace. When I use
pd.read_csv('bibrev.csv', index_col=0, skipinitialspace=True)
then it kind of works, because floats are read as floats and missing values are read as NaNs, which is a big improvement. However, the column names and the boolean columns are still strings with whitespace.
Is there any way to read that .csv directly with pandas? Or could I change the csv format a bit and still get a clean read from a human-readable .csv?
PS: I am trying to avoid reading everything into Python as a string, replacing the whitespace and then feeding it to pandas. I am also trying to avoid defining functions and passing them to pandas through the converters keyword.
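The skipinitialspace behavior described in the question can be reproduced with a small in-memory sample. This is only a sketch: csv_text is a hypothetical stand-in for bibrev.csv, assuming the file is padded as shown above. It prints the header names with repr so that any padding left in them is visible.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the padded file from the question.
csv_text = (
    "work    ,roughness ,unstab ,corr_c_w ,u_star ,c_star\n"
    "us      ,True      ,True   ,-0.39    ,0.35   ,-.99\n"
    "wang    ,False     ,       ,-0.5     ,       ,\n"
    "cheng   ,          ,True   ,         ,       ,\n"
    "watanabe,          ,       ,         ,0.15   ,-.80\n"
)

df = pd.read_csv(io.StringIO(csv_text), index_col=0, skipinitialspace=True)

# skipinitialspace only drops whitespace that follows a delimiter, so the
# padding that precedes each comma stays attached to the header names.
print([repr(name) for name in df.columns])
print(df.dtypes)
```

Because only the whitespace after each comma is skipped, the padding in front of the next comma survives, which is why the header names still need cleaning.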
Recommended answer
Try this:
import pandas as pd

def booleator(col):
    # 'true' or 'yes' (case-insensitive) becomes True; anything else,
    # including an empty field, becomes False.
    return str(col).lower() in ['true', 'yes']

# The regex separator consumes the whitespace around each comma.
# Regex separators require the Python parsing engine.
df = pd.read_csv('data.csv', sep=r'\s*,\s*', index_col=0,
                 converters={'roughness': booleator, 'unstab': booleator},
                 engine='python')
print(df)
print(df.dtypes)
Output:
          roughness  unstab  corr_c_w  u_star  c_star
work
us             True    True     -0.39    0.35   -0.99
wang          False   False     -0.50     NaN     NaN
cheng         False    True       NaN     NaN     NaN
watanabe      False   False       NaN    0.15   -0.80
roughness       bool
unstab          bool
corr_c_w     float64
u_star       float64
c_star       float64
dtype: object
This version also takes care of the booleans: all NaNs are converted to False, because otherwise pandas would promote the dtype to object (see details in my comment).
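If turning missing flags into False loses information you need, one possible alternative is to skip the converters and cast the two columns to pandas' nullable "boolean" dtype afterwards. The sketch below is not part of the original answer. It assumes pandas 1.0 or later (where the "boolean" dtype exists) and uses a hypothetical in-memory csv_text in place of the real file.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the padded file from the question.
csv_text = (
    "work    ,roughness ,unstab ,corr_c_w ,u_star ,c_star\n"
    "us      ,True      ,True   ,-0.39    ,0.35   ,-.99\n"
    "wang    ,False     ,       ,-0.5     ,       ,\n"
    "cheng   ,          ,True   ,         ,       ,\n"
    "watanabe,          ,       ,         ,0.15   ,-.80\n"
)

# The regex separator swallows the padding around every comma; regex
# separators are only supported by the Python parsing engine.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s*,\s*", index_col=0,
                 engine="python")

# Without converters the flag columns mix True/False with NaN. Casting to
# the nullable "boolean" dtype (assumes pandas >= 1.0) keeps the gaps as
# <NA> instead of forcing them to False.
df = df.astype({"roughness": "boolean", "unstab": "boolean"})

print(df)
print(df.dtypes)
```

The trade-off is that <NA> propagates through logical operations, so downstream code that expects plain True/False may need an explicit fillna(False) at the point where a default is actually wanted.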