Parse string into columns in Python pandas: \xa0 instead of white space

Question

How do I quickly make new columns that hold the three chunks contained in the column 'File'?

I've received messy data like this:

import pandas as pd

d = {   'File' : pd.Series(['firstname lastname                   05/31/1996                     9999999999  ', 'FN SometimesMiddileInitial. LN                    05/31/1996                 9999999999  ']), 
    'Status' : pd.Series([0., 0.]), 
    'Error' : pd.Series([2., 2.])}
df = pd.DataFrame(d)

UPDATE: In reality, I'm starting from a very messy Excel file, and my data has '\xa0 \xa0' between string characters. So my first attempt looks like this:

import pandas as pd
from pandas import ExcelFile

location = r'c:/users/meinzerc/Desktop/table.xlsx'
xls = ExcelFile(location)
table = xls.parse('Sheet1')  # read the worksheet into a DataFrame
# split on \s* (note: this pattern can also match the empty string)
splitdf = table['File'].str.split(r'\s*')

My attempt doesn't work at all. Why?
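A small illustration, using only Python's standard re module, of why a bare \s* pattern misbehaves and why the '\xa0' characters are not themselves the obstacle. The sample string is made up to mimic the question's data, and the behaviour shown assumes Python 3.7 or later (earlier versions treated empty matches in re.split differently).

```python
import re

# A made-up cell fragment shaped like the question's data, with '\xa0 \xa0' runs
cell = "firstname lastname\xa0 \xa005/31/1996"

# In Python 3, \s in a str pattern is Unicode-aware, so it also matches '\xa0'
print(re.findall(r"\s", cell))

# \s* can match the empty string, so (Python 3.7+) re.split also splits
# between ordinary characters, breaking the text into single letters
print(re.split(r"\s*", "ab cd"))

# \s\s+ requires at least two whitespace characters, so the single space
# inside "firstname lastname" is left alone
print(re.split(r"\s\s+", cell))
```

In other words, the problem is the "zero or more" quantifier rather than the non-breaking spaces: a pattern that demands two or more whitespace characters separates the fields and keeps single spaces inside names.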

Answer

You could use a regex that matches two or more consecutive whitespace characters:

In [11]: df.File.str.split(r'\s\s+')
Out[11]: 
0       [firstname lastname, 05/31/1996, 9999999999, ]
1    [FN SometimesMiddileInitial. LN, 05/31/1996, 9...
Name: File, dtype: object
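The question asks for new columns rather than a column of lists. A minimal sketch, assuming a recent pandas where a multi-character pattern passed to str.split is treated as a regular expression: strip the outer whitespace, split with expand=True, and join the result back onto the frame. The sample rows are adapted from the question with '\xa0 \xa0' as the separator, and the column names name/date/number are my own choice.

```python
import pandas as pd

# Sample rows adapted from the question, with '\xa0 \xa0' between the fields
df = pd.DataFrame({
    "File": [
        "firstname lastname\xa0 \xa005/31/1996\xa0 \xa09999999999  ",
        "FN SometimesMiddileInitial. LN\xa0 \xa005/31/1996\xa0 \xa09999999999  ",
    ],
    "Status": [0.0, 0.0],
    "Error": [2.0, 2.0],
})

# str.strip() removes leading/trailing whitespace (including '\xa0');
# expand=True returns one column per piece instead of a list per row
parts = df["File"].str.strip().str.split(r"\s\s+", expand=True)
parts.columns = ["name", "date", "number"]  # hypothetical column names

# Attach the new columns next to the original ones (aligned on the index)
df = df.join(parts)
print(df[["name", "date", "number"]])
```

Stripping first matters because the raw strings end in spaces, which otherwise produces an empty trailing piece, as seen in the first row of Out[11].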

Perhaps a better option is to use str.extract (and there may well be a neater regex):

In [12]: df.File.str.extract(r'\s*(?P<name>.*?)\s+(?P<date>\d+/\d+/\d+)\s+(?P<number>\w+)\s*')
Out[12]: 
                             name        date      number
0              firstname lastname  05/31/1996  9999999999
1  FN SometimesMiddileInitial. LN  05/31/1996  9999999999

[2 rows x 3 columns]
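To keep the extracted fields alongside Status and Error, the result of str.extract can be joined back on the index. This is a sketch assuming the same made-up '\xa0'-separated sample rows; the named groups in the pattern become the column names of the extracted frame.

```python
import pandas as pd

# Same two messy rows as in the question, with '\xa0 \xa0' separating fields
df = pd.DataFrame({
    "File": [
        "firstname lastname\xa0 \xa005/31/1996\xa0 \xa09999999999  ",
        "FN SometimesMiddileInitial. LN\xa0 \xa005/31/1996\xa0 \xa09999999999  ",
    ],
    "Status": [0.0, 0.0],
    "Error": [2.0, 2.0],
})

# Named groups (?P<...>) become the column names of the extracted DataFrame;
# \s matches '\xa0' as well as ordinary spaces
pattern = r"\s*(?P<name>.*?)\s+(?P<date>\d+/\d+/\d+)\s+(?P<number>\w+)\s*"
extracted = df["File"].str.extract(pattern, expand=True)

# Attach the new columns to the original frame, aligned on the index
df = df.join(extracted)
print(df)
```

The lazy .*? in the name group stops at the first whitespace run that is followed by a date, so names containing single spaces stay in one column.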
