Vectorized format function for Pandas series


Problem description

Say I start with a Series of unformatted phone numbers (as strings), and I would like to format them as (XXX) YYY-ZZZZ.

I can get the sub-components of my input using regular expressions and str.match or str.extract. And I can perform the formatting using the result of either:

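import pandas as pd

ser = pd.Series(data=['1234567890', '2345678901', '3456789012'])

# Note: older pandas releases returned the captured groups from str.match;
# recent releases return booleans, so this step relies on the old behaviour.
matched = ser.str.match(r'(\d{3})(\d{3})(\d{4})')

extracted = ser.astype(str).str.extract(r'(?P<first>\d{3})(?P<second>\d{3})(?P<third>\d{4})')

# Format each tuple of matched groups, one element at a time.
formatmatched = matched.apply(lambda x: '({0}) {1}-{2}'.format(*x))
print('formatmatched')
print(formatmatched)

# Format each row of named groups, one row at a time.
formatextracted = extracted.apply(lambda x: '({first}) {second}-{third}'.format(**x.to_dict()), axis=1)
print('formatextracted')
print(formatextracted)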

Results:

formatmatched
0    (123) 456-7890
1    (234) 567-8901
2    (345) 678-9012
dtype: object
formatextracted
0    (123) 456-7890
1    (234) 567-8901
2    (345) 678-9012
dtype: object

Is there a vectorized way to apply that formatting command in either context?

Solution

You can do this directly with Series.str.replace():

In [47]: s = pandas.Series(["1234567890", "5552348866", "13434"])

In [49]: s
Out[49]: 
0    1234567890
1    5552348866
2         13434
dtype: object

In [50]: s.str.replace(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", regex=True)
Out[50]: 
0    (123) 456-7890
1    (555) 234-8866
2             13434
dtype: object
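
For the str.extract context from the question, one possible vectorized alternative (not part of the original answer) is to concatenate the extracted columns directly instead of calling apply with a lambda. This is only a sketch, reusing the question's sample data and group names:

```python
import pandas as pd

ser = pd.Series(["1234567890", "2345678901", "3456789012"])

# Each named group becomes a column of the resulting DataFrame.
parts = ser.str.extract(r"(?P<first>\d{3})(?P<second>\d{3})(?P<third>\d{4})")

# Element-wise string concatenation over whole columns, no per-row lambda.
formatted = "(" + parts["first"] + ") " + parts["second"] + "-" + parts["third"]
print(formatted)
```

Rows that do not match the pattern come back as missing values here, whereas str.replace leaves them unchanged.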

You could also imagine doing another transformation first to remove any non-digit characters.
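
A minimal sketch of that two-step idea, assuming a recent pandas release (where str.replace needs regex=True to treat the pattern as a regular expression) and using made-up sample inputs that are not from the original post:

```python
import pandas as pd

# Hypothetical inputs with mixed punctuation, for illustration only.
raw = pd.Series(["123-456-7890", "(555) 234 8866", "13434"])

# Step 1: remove every non-digit character.
digits = raw.str.replace(r"\D", "", regex=True)

# Step 2: reformat values containing a 10-digit run; others pass through unchanged.
formatted = digits.str.replace(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", regex=True)
print(formatted)
```

Because the pattern is not anchored, a value with more than ten digits would be only partly reformatted; adding ^ and $ to the pattern would restrict the rewrite to exact 10-digit strings.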
