Removing leading zeros from pandas.core.series.Series


Question

I have a pandas.core.series.Series with the following data:

0    [00115840, 00110005, 001000033, 00116000...
1    [00267285, 00263627, 00267010, 0026513...
2                             [00335595, 00350750]

I want to remove the leading zeros from the series. I tried:

x.astype('int64')

but I got this error message:

ValueError: setting an array element with a sequence.

Can you suggest how to do this in Python 3.x?

Recommended answer

If you want to convert the lists of strings to lists of integers, use a list comprehension:

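# Option 1: build a new Series from a nested list comprehension, keeping the original index
s = pd.Series([[int(y) for y in x] for x in s], index=s.index)

# Option 2: the same conversion with Series.apply
s = s.apply(lambda x: [int(y) for y in x])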

Sample:

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
print (s)
0    [00115840, 00110005, 001000033, 00116000]
1      [00267285, 00263627, 00267010, 0026513]
2                         [00335595, 00350750]
dtype: object

s = s.apply(lambda x: [int(y) for y in x])
print (s)
0    [115840, 110005, 1000033, 116000]
1      [267285, 263627, 267010, 26513]
2                     [335595, 350750]
dtype: object
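
If the values have to stay strings (for example, because they are identifiers rather than numbers), a minimal sketch of the same per-list pattern with `str.lstrip` could look like the following. The helper name `strip_zeros` is hypothetical and is not part of the original answer.

```python
import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.Series(a)


def strip_zeros(value):
    # Hypothetical helper: drop leading zeros but keep the value a string.
    # The "or '0'" guard stops an all-zero string such as '000' from becoming empty.
    return value.lstrip('0') or '0'


# Same shape as the int() version above: one list comprehension per row
stripped = s.apply(lambda x: [strip_zeros(y) for y in x])
print(stripped)
```

Unlike `int(y)`, this does not fail on non-numeric strings; it only removes the leading `'0'` characters.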

If you only need the integers, you can flatten the values and cast them to int:

s = pd.Series([item for sublist in s for item in sublist]).astype(int)

Alternative solution:

import itertools
s = pd.Series(list(itertools.chain(*s))).astype(int)

print (s)
0     115840
1     110005
2    1000033
3     116000
4     267285
5     263627
6     267010
7      26513
8     335595
9     350750
dtype: int32
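
On pandas 0.25 or later, `Series.explode` is another way to flatten a Series of lists. The sketch below is not part of the original answer and was not included in its timings. It assumes every row holds a non-empty list of numeric strings.

```python
import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]
s = pd.Series(a)

# explode() gives every list item its own row and repeats the original row label
exploded = s.explode().astype(int)

# Flat result with a fresh 0..n-1 index
flat = exploded.reset_index(drop=True)
print(flat)

# Optional: rebuild one list of integers per original row
regrouped = exploded.groupby(level=0).agg(list)
print(regrouped)
```

Because `explode` keeps the original row labels, the flat values can still be traced back to their source row, which the plain list comprehension does not offer.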

Timings:

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

s = pd.Series(a)
s = pd.concat([s]*1000).reset_index(drop=True)

In [203]: %timeit pd.Series([[int(y) for y in x] for x in s], index=s.index)
100 loops, best of 3: 4.66 ms per loop

In [204]: %timeit s.apply(lambda x: [int(y) for y in x])
100 loops, best of 3: 5.13 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [205]: %%timeit
     ...: v = pd.Series(np.concatenate(s.values.tolist()))
     ...: v.astype(int).groupby(s.index.repeat(s.str.len())).agg(pd.Series.tolist)
     ...: 
1 loop, best of 3: 226 ms per loop

#Wen solution
In [211]: %timeit pd.Series(s.apply(pd.Series).stack().astype(int).groupby(level=0).apply(list))
1 loop, best of 3: 1.12 s per loop

Solutions with flattening (idea of @cᴏʟᴅsᴘᴇᴇᴅ):

In [208]: %timeit pd.Series([item for sublist in s for item in sublist]).astype(int)
100 loops, best of 3: 2.55 ms per loop

In [209]: %timeit pd.Series(list(itertools.chain(*s))).astype(int)
100 loops, best of 3: 2.2 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ sol
In [210]: %timeit pd.Series(np.concatenate(s.values.tolist()))
100 loops, best of 3: 7.71 ms per loop
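
The `%timeit` lines above only work in IPython. To repeat the comparison in a plain Python script, a rough sketch with the standard-library `timeit` module might look like this. The dictionary names `candidates` and `timings` are illustrative, and the measured numbers will differ by machine and pandas version.

```python
import itertools
import timeit

import pandas as pd

a = [['00115840', '00110005', '001000033', '00116000'],
     ['00267285', '00263627', '00267010', '0026513'],
     ['00335595', '00350750']]

# Same setup as the timings above: repeat the three rows 1000 times
s = pd.concat([pd.Series(a)] * 1000).reset_index(drop=True)

# The two flattening solutions from the answer, wrapped as zero-argument callables
candidates = {
    'list comprehension': lambda: pd.Series(
        [item for sublist in s for item in sublist]).astype(int),
    'itertools.chain': lambda: pd.Series(
        list(itertools.chain(*s))).astype(int),
}

timings = {}
for name, func in candidates.items():
    # Best of 3 repeats, 10 calls per repeat, reported as seconds per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    timings[name] = best
    print(f'{name}: {best * 1000:.2f} ms per call')
```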
