Formatting data that increases monotonically in Python


Question

I have formatted my data as needed. However, the final DataFrame is not monotonically increasing, even though the input data increases monotonically in its first column (freq). Here is the link to Data_input_truncated.txt. My Python code is below:

import pandas as pd

# read the whitespace-separated file into a DataFrame with columns freq and v
df = pd.read_csv('Data_input.txt', sep=r"\s+", names=['freq','v'])

# boolean mask marking the rows that name the columns of the new DataFrame
m = df['v'].str.endswith(')')
# new column g: keep the masked values and forward-fill the NaNs
df['g'] = df['v'].where(m).ffill()
# remember the original order of the new columns
cols = df['g'].unique()
# remove rows where v and g hold the same value
df = df[df['v'] != df['g']]
# reshape by pivoting, then restore the column order with reindex
df = df.pivot(index='freq', columns='g', values='v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()

df.columns = [x.replace('(','').replace(')','').replace(',',':') for x in df.columns]
df.to_csv('target.txt', index=False, sep='\t')

The resulting target.txt is not monotonic. Here is the link to target.txt. How can I make it monotonic before saving it to a file?

I am using Spyder 3.2.6 (Anaconda) with the bundled Python 3.6.4 (64-bit).

Recommended answer

The problem is that the values in your freq column are strings (str), not floats, so pivoting reorders the rows alphabetically rather than numerically. One option is to convert the freq column to float; then, if the scientific-notation formatting matters, you can set the float_format parameter in to_csv:

### same code as before, up to this point
# remove rows where v and g hold the same value (copy so the assignment below is safe)
df = df[df['v'] != df['g']].copy()
# convert freq from str to float so it sorts numerically
df['freq'] = df['freq'].astype(float)

# reshape by pivoting, then restore the column order with reindex
df = df.pivot(index='freq', columns='g', values='v').rename_axis(None, axis=1).reindex(columns=cols).reset_index()

df.columns = [x.replace('(','').replace(')','').replace(',',':') for x in df.columns]
df.to_csv('target.txt', index=False, sep='\t', float_format='%.17E')  # float_format='%.17E' added

Note that float_format='%.17E' means scientific notation with 17 digits after the decimal point, as in your input; you can change this number to whatever you want if that precision is not important.
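To see the effect in isolation, here is a minimal sketch on made-up data (the frame long_df, its values, and the shorter '%.3E' format are assumptions for illustration, not taken from the linked files). It pivots once with freq as text and once with freq as float, then writes the float version to an in-memory buffer instead of target.txt.

```python
import io

import pandas as pd

# Hypothetical long-format data; freq is text, as in the question.
long_df = pd.DataFrame({
    "freq": ["0.0E+00", "1.0E+06", "2.0E+06", "1.0E+07"],
    "g": ["R1:1"] * 4,
    "v": ["a", "b", "c", "d"],
})

# Pivoting on text sorts the index as strings, so "1.0E+07"
# lands before "2.0E+06" and the numeric order is lost.
as_text = long_df.pivot(index="freq", columns="g", values="v")
print(list(as_text.index))

# After converting to float, the pivot sorts the index numerically.
long_df["freq"] = long_df["freq"].astype(float)
as_float = long_df.pivot(index="freq", columns="g", values="v")
print(as_float.index.is_monotonic_increasing)

# float_format controls how the float column is written out.
buf = io.StringIO()
as_float.reset_index().to_csv(buf, index=False, sep="\t", float_format="%.3E")
print(buf.getvalue())
```

Only float columns are affected by float_format; the v values stay exactly as they were read, because they are still strings.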

I get this result in target.txt (first 5 rows and 3 columns):

freq    R1:1    R1:2
0.00000000000000000E+00 4.07868642871600962E0   3.12094533520232087E-13
1.00000000000000000E+06 4.43516799439728793E0   4.58503433913467795E-3
2.00000000000000000E+06 4.54224931058591253E0   1.21517855438593236E-2
3.00000000000000000E+06 4.63952376349496909E0   2.10017318391844077E-2
4.00000000000000000E+06 4.74002677709486608E0   3.05258806632440871E-2
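If the table has already been pivoted with freq as text, another possible route is to convert and sort afterwards, then check the order before saving. The helper sort_by_numeric_freq and the small frame wide below are hypothetical names and data used only for this sketch.

```python
import pandas as pd


def sort_by_numeric_freq(df: pd.DataFrame, col: str = "freq") -> pd.DataFrame:
    """Return a copy of df with `col` converted to numbers and sorted ascending."""
    out = df.copy()
    out[col] = pd.to_numeric(out[col])
    return out.sort_values(col, kind="stable").reset_index(drop=True)


# Hypothetical pivoted table whose rows are in string (alphabetical) order.
wide = pd.DataFrame({
    "freq": ["0.0E+00", "1.0E+06", "1.0E+07", "2.0E+06"],
    "R1:1": ["a", "b", "d", "c"],
})

fixed = sort_by_numeric_freq(wide)

# Check the order before writing the file.
print(fixed["freq"].is_monotonic_increasing)
```

Series.is_monotonic_increasing gives a cheap sanity check that can be run right before to_csv.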
