Vectorized format function for Pandas series

Views: 149 · Posted: 2018/2/4 11:48:50 · Tags: python string formatting pandas


Question

Say I start with a Series of unformatted phone numbers (as strings), and I would like to format them as (XXX) YYY-ZZZZ.

I can get the sub-components of my input using regular expressions with str.match or str.extract, and I can then do the formatting on the result of either:
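    import pandas as pd

    ser = pd.Series(data=['1234567890', '2345678901', '3456789012'])

    # Older pandas returned the captured groups from str.match; current pandas
    # returns booleans, so findall(...).str[0] is used to get the same tuples
    matched = ser.str.findall(r'(\d{3})(\d{3})(\d{4})').str[0]
    extracted = ser.astype(str).str.extract(r'(?P<first>\d{3})(?P<second>\d{3})(?P<third>\d{4})')

    # Row-by-row formatting of the group tuples
    formatmatched = matched.apply(lambda x: '({0}) {1}-{2}'.format(*x))
    print('formatmatched')
    print(formatmatched)

    # Row-by-row formatting of the named-group columns
    formatextracted = extracted.apply(lambda x: '({first}) {second}-{third}'.format(**x.to_dict()), axis=1)
    print('formatextracted')
    print(formatextracted)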
Results:
    formatmatched
    0    (123) 456-7890
    1    (234) 567-8901
    2    (345) 678-9012
    dtype: object
    formatextracted
    0    (123) 456-7890
    1    (234) 567-8901
    2    (345) 678-9012
    dtype: object
Is there a vectorized way to apply that formatting command in either context?
Solution
You can do this directly with Series.str.replace():
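    In [47]: s = pandas.Series(["1234567890", "5552348866", "13434"])

    In [49]: s
    Out[49]:
    0    1234567890
    1    5552348866
    2         13434
    dtype: object

    In [50]: s.str.replace(r"(\d{3})(\d{3})(\d{4})", r"(\1) \2-\3", regex=True)
    Out[50]:
    0    (123) 456-7890
    1    (555) 234-8866
    2             13434
    dtype: object

Strings that do not match the pattern (such as 13434) are passed through unchanged. Since pandas 2.0, str.replace treats the pattern as a literal string by default, so regex=True is needed for the backreferences to work.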
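If you start from the str.extract result in the question instead, the row-wise apply can also be replaced by element-wise string concatenation of the extracted columns. This is a sketch rather than part of the original answer, and it assumes the named groups first, second and third from the question's pattern:

```python
import pandas as pd

ser = pd.Series(["1234567890", "2345678901", "3456789012"])

# One column per named group: first, second, third
extracted = ser.str.extract(r"(?P<first>\d{3})(?P<second>\d{3})(?P<third>\d{4})")

# "+" between a str and a string Series concatenates element-wise,
# so no per-row lambda is needed
formatted = "(" + extracted["first"] + ") " + extracted["second"] + "-" + extracted["third"]
print(formatted)
```

Unlike str.replace, rows that do not match the pattern end up as NaN here rather than being passed through unchanged.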
You could also imagine doing another transformation first to remove any non-digit characters.
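A minimal sketch of that idea, assuming the raw input may contain separators such as dashes, dots, spaces or parentheses: first strip every non-digit with \D, then format. The ^ and $ anchors are an addition that is not in the answer above; without them an 11-digit string such as 12345678901 would be partially rewritten to (123) 456-78901.

```python
import pandas as pd

raw = pd.Series(["123-456-7890", "555.234.8866", "(345) 678 9012", "13434", "12345678901"])

# Remove everything that is not a digit (\D matches any non-digit character)
digits = raw.str.replace(r"\D", "", regex=True)

# Anchored pattern: only strings of exactly ten digits are reformatted;
# everything else is left unchanged by str.replace
formatted = digits.str.replace(r"^(\d{3})(\d{3})(\d{4})$", r"(\1) \2-\3", regex=True)
print(formatted)
```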
