Sorting and arranging a list using pandas

Problem description

I have an input file, shown below, that needs to be arranged so that the keys appear in ascending order and any keys that are missing from a line are printed last. I can get the data into the required format, but the ordering is lost.

I tried using the sort() method, but it fails with "list has no attribute sort". Please suggest a solution, and point out any modifications that are needed.

Input file:

3=1388|4=1388|5=IBM|8=157.75|9=88929|1021=1500|854=n|388=157.75|394=157.75|474=157.75|1584=88929|444=20160713|459=93000546718000|461=7|55=93000552181000|22=89020|400=157.75|361=0.73|981=0|16=1468416600.6006|18=1468416600.6006|362=0.46
3=1388|4=1388|5=IBM|8=157.73|9=100|1021=0|854=p|394=157.73|474=157.749977558|1584=89029|444=20160713|459=93001362639104|461=26142|55=93001362849000|22=89120|361=0.71|981=0|16=1468416601.372|18=1468416601.372|362=0.45
3=1388|4=1388|5=IBM|8=157.69|9=100|1021=600|854=p|394=157.69|474=157.749910415|1584=89129|444=20160713|459=93004178882560|461=27052|55=93004179085000|22=89328|361=0.67|981=1|16=1468416604.1916|18=1468416604.1916|362=0.43

Code I tried:

import pandas as pd
import numpy as np
df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
ds = [dict(w.split('=', 1) for w in x) for x in s]
p = pd.DataFrame.from_records(ds)
p1 = p.replace(np.nan,'n/a', regex=True)
st = p1.stack(level=0,dropna=False)
dfs = [g for i,g in st.groupby(level=0)]
#print st
i = 0
while i < len(dfs):    
    #index of each column
    print ('\nindex[%d]'%i)
    for (_,k),v in dfs[i].items():
        print(k, '\t', v)
    i = i + 1

Output obtained:

index[0]
1021  1500
1584  88929
16    1468416600.6006
18    1468416600.6006
22    89020
3     1388
361   0.73
362   0.46
388   157.75
394   157.75
4     1388
400   157.75
444   20160713
459   93000546718000
461   7
474   157.75
5     IBM
55    93000552181000
8     157.75
854   n
9     88929
981   0

index[1]
1021  0
1584  89029
16    1468416601.372
18    1468416601.372
22    89120
3     1388
361   0.71
362   0.45
388   n/a
394   157.73
4     1388
400   n/a
444   20160713
459   93001362639104
461   26142
474   157.749977558
5     IBM
55    93001362849000
8     157.73
854   p
9     100
981   0

Expected output:

index[0]
3     1388
4     1388
5     IBM
8     157.75
9     88929
16    1468416600.6006
18    1468416600.6006
22    89020
55    93000552181000
361   0.73
362   0.46
388   157.75
394   157.75
400   157.75
444   20160713
459   93000546718000
461   7
474   157.75
854   n
981   0
1021  1500
1584  88929

index[1]
3     1388
4     1388
5     IBM
8     157.75
9     88929
16    1468416600.6006
18    1468416600.6006
22    89020
55    93000552181000
361   0.73
362   0.46
394   157.75
444   20160713
459   93000546718000
461   7
474   157.75
854   n
981   0
1021  1500
1584  88929
388   n/a
400   n/a

Recommended answer

ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]

This converts each key to an integer so that the keys sort numerically rather than as strings.
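The difference between the two orderings can be seen without pandas at all. The snippet below is a minimal sketch using a made-up subset of the tags from the input file; it only illustrates why string keys such as "1021" end up ahead of "3".

```python
# Illustrative subset of the tags, kept as strings the way split('=') returns them.
keys = ["3", "1021", "16", "8", "361"]

# Default string sorting compares character by character, so "1021" < "16" < "3".
lexicographic = sorted(keys)

# Sorting with key=int compares the numeric values instead.
numeric = sorted(keys, key=int)

print(lexicographic)  # ['1021', '16', '3', '361', '8']
print(numeric)        # ['3', '8', '16', '361', '1021']
```

The first list matches the order seen in the question's output, and the second matches the expected order.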

To output the n/a values at the end, you can use pandas selection to print the non-null values first and then the null values, e.g.:

for (ix, series) in p.iterrows():
    print('\nindex[%d]' % ix)
    output_series(ix, series[pd.notnull])
    output_series(ix, series[pd.isnull].fillna('n/a'))
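To see what the two selections do to a single row, here is a small sketch. The row is hypothetical (a handful of tags with 388 and 400 missing, as in the second input line), and it uses the explicit boolean masks `row.notnull()` and `row.isnull()`, which should select the same elements as the callable form above.

```python
import numpy as np
import pandas as pd

# Hypothetical row: tags 388 and 400 are absent, so they hold NaN.
row = pd.Series({3: "1388", 388: np.nan, 394: "157.73", 400: np.nan, 1021: "0"})

present = row[row.notnull()]                # tags that have a value, original order kept
missing = row[row.isnull()].fillna("n/a")   # absent tags, relabelled as 'n/a'

# Present tags first, missing tags last.
ordered = pd.concat([present, missing])
print(ordered)
```

Because the selection keeps the existing index order, the present tags stay ascending as long as the row was already sorted by tag.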

By the way, you can also simplify your stack, groupby, and print steps to:

for (ix, series) in p1.iterrows():
    print('\nindex[%d]' % ix)
    for tag, value in series.items():
        print(tag, '\t', value)

So the whole script becomes:

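import pandas as pd

def output_series(ix, series):
    # Print each tag/value pair of one row, tab-separated
    for tag, value in series.items():
        print(tag, '\t', value)

df = pd.read_csv('inputfile', index_col=None, names=['text'])
s = df.text.str.split('|')
# Integer keys so that tags compare numerically, not as strings
ds = [{int(pair[0]): pair[1] for pair in [w.split('=', 1) for w in x]} for x in s]
p = pd.DataFrame.from_records(ds)
# Recent pandas versions keep dict insertion order, so sort the columns explicitly
p = p.sort_index(axis=1)
for (ix, series) in p.iterrows():
    print('\nindex[%d]' % ix)
    output_series(ix, series[pd.notnull])
    output_series(ix, series[pd.isnull].fillna('n/a'))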
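For trying the approach without an input file, the following is a self-contained sketch under a few assumptions: the sample lines are shortened and made up, `ordered_row` is a hypothetical helper name, and the columns are sorted explicitly with `sort_index(axis=1)` because recent pandas versions appear to keep the insertion order of dict keys rather than sorting them.

```python
import pandas as pd

# Shortened, made-up sample lines in the same "tag=value|tag=value" format.
lines = [
    "3=1388|5=IBM|1021=1500|388=157.75|22=89020",
    "3=1388|5=IBM|1021=0|22=89120",
]

# One dict per line, with integer tags so they compare numerically.
records = [
    {int(tag): value for tag, value in (pair.split("=", 1) for pair in line.split("|"))}
    for line in lines
]

# Sort the columns explicitly instead of relying on construction order.
p = pd.DataFrame.from_records(records).sort_index(axis=1)

def ordered_row(series):
    """Return present tags first (ascending), then missing tags as 'n/a'."""
    present = series[series.notnull()]
    missing = series[series.isnull()].fillna("n/a")
    return pd.concat([present, missing])

result = [ordered_row(row) for _, row in p.iterrows()]

for ix, row in enumerate(result):
    print("\nindex[%d]" % ix)
    for tag, value in row.items():
        print(tag, "\t", value)
```

In the second sample line, tag 388 is absent, so it is printed last with the value n/a while the remaining tags stay in ascending order.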
