Python parse dataframe element

Views: 119 | Published: 2017/3/26 4:08:31 | Tags: python, regex, parsing, pandas, dataframe


Question

I have a pandas DataFrame column ('Data Type') that I want to split into three columns:

target_table_df = LoadS_A[['Attribute Name',
                           'Data Type',
                           'Primary Key Indicator']]

Example input (target_table_df)

                 Attribute Name      Data Type Primary Key Indicator
0                       ACC_LIM  DECIMAL(18,4)                 False
1                        ACC_NO   NUMBER(11,0)                 False
2                   ACC_OPEN_DT           DATE                 False
3                          ACCB  DECIMAL(18,4)                 False
4                          ACDB  DECIMAL(18,4)                 False
5                     AGRMNT_ID   NUMBER(11,0)                  True
6                     BRNCH_NUM   NUMBER(11,0)                 False
7                      CLRD_BAL  DECIMAL(18,4)                 False
8              CR_INT_ACRD_GRSS  DECIMAL(18,4)                 False
9               CR_INT_ACRD_NET  DECIMAL(18,4)                 False

I aim to:

Reassign 'Data Type' to the text preceding the parenthesis.

If 'Data Type' contains parentheses:

Create a new column 'Precision' and assign it the first comma-separated value.
Create a new column 'Scale' and assign it the second comma-separated value.

The intended output would therefore be:

  Data Type Precision Scale
0   decimal        18     4
1    number        11     0
2      date
3   decimal        18     4
4   decimal        18     4
5    number        11     0

I have tried hard to achieve this, but I'm new to DataFrames. I can't work out whether I need to iterate over all rows or whether there is a way to apply this to every value in the column at once.

Any help is much appreciated.

Answer

Use target_table_df['Data Type'].str.extract(pattern)

You'll need to assign pattern to be a regular expression that captures each of the components you're looking for.

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'

([^\(]+) says to grab as many characters that are not an open parenthesis as you can, stopping at the first open parenthesis.

\(([^,]*), says to grab the first run of non-comma characters after an open parenthesis and stop at the comma.

,(.*)\) says to grab the rest of the characters between the comma and the close parenthesis.

(\(([^,]*),(.*)\))? says the whole parenthesized part is optional: grab it if it is there.
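To see how the four capture groups line up, here is a minimal sketch that applies the same pattern to single strings with the standard `re` module instead of pandas. The helper name `split_type` is hypothetical and only for illustration; it assumes the input is a non-empty string.

```python
import re

# Same pattern as in the answer above.
pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'

def split_type(text):
    """Return (name, precision, scale); the last two are None without parentheses."""
    m = re.match(pattern, text)
    # Group 1 is the type name, group 2 the whole "(p,s)" part,
    # groups 3 and 4 are the values inside the parentheses.
    return m.group(1), m.group(3), m.group(4)

print(split_type('DECIMAL(18,4)'))  # ('DECIMAL', '18', '4')
print(split_type('DATE'))           # ('DATE', None, None)
```

Group 2 is the one that `.str.extract` returns as an extra column, which is why the answer skips it.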

Solution

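Everything together looks like this:

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'
# Assumption: s is the column being parsed, lowercased to match the output shown below
s = target_table_df['Data Type'].str.lower()
df = s.str.extract(pattern, expand=True).iloc[:, [0, 2, 3]]

# Formatting to get it how you wanted
df.columns = ['Data Type', 'Precision', 'Scale']
df.index.name = None
print(df)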

I put a .iloc[:, [0, 2, 3]] at the end because the pattern I used grabs the whole parenthesis in column 1 and I wanted to skip it. Leave it off and see.
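As a possible alternative to dropping column 1 with `.iloc`, the outer group can be made non-capturing with `(?:...)`, so `.str.extract` returns only three columns. This is a sketch of a variation, not the pattern from the answer; the three-element series `s` is a made-up stand-in for `target_table_df['Data Type']`.

```python
import pandas as pd

# Hypothetical sample standing in for target_table_df['Data Type'].
s = pd.Series(['DECIMAL(18,4)', 'NUMBER(11,0)', 'DATE'])

# (?: ... ) groups the optional "(p,s)" part without capturing it,
# so only the type name, precision and scale become columns.
pattern = r'([^(]+)(?:\(([^,]*),(.*)\))?'

df = s.str.extract(pattern, expand=True)
df.columns = ['Data Type', 'Precision', 'Scale']
print(df)
```

Rows without parentheses, such as DATE, still get missing values in 'Precision' and 'Scale'.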

  Data Type Precision Scale
0   decimal        18     4
1    number        11     0
2      date       NaN   NaN
3   decimal        18     4
4   decimal        18     4
5    number        11     0
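The result above is a separate frame. To reassign 'Data Type' and add the two new columns on the original frame, as the question asks, something like the following sketch could work. The three-row `target_table_df` here is a made-up stand-in for the real data, and converting with `pd.to_numeric` is an assumption about what is wanted, since the extracted values are strings.

```python
import pandas as pd

# Hypothetical three-row stand-in for target_table_df.
target_table_df = pd.DataFrame({
    'Attribute Name': ['ACC_LIM', 'ACC_NO', 'ACC_OPEN_DT'],
    'Data Type': ['DECIMAL(18,4)', 'NUMBER(11,0)', 'DATE'],
    'Primary Key Indicator': [False, False, False],
})

pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'
parts = target_table_df['Data Type'].str.extract(pattern, expand=True)

# Column 0 is the type name; columns 2 and 3 are precision and scale.
target_table_df['Data Type'] = parts[0].str.lower()
# to_numeric turns the captured strings into numbers; missing values stay NaN.
target_table_df['Precision'] = pd.to_numeric(parts[2])
target_table_df['Scale'] = pd.to_numeric(parts[3])

print(target_table_df)
```

No row-by-row iteration is needed: each assignment works on the whole column at once.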
