Python parse dataframe element
I have a pandas dataframe column ('Data Type') that I want to split into three columns.
target_table_df = LoadS_A[['Attribute Name',
                           'Data Type',
                           'Primary Key Indicator']]
Example input (target_table_df):
   Attribute Name     Data Type      Primary Key Indicator
0  ACC_LIM            DECIMAL(18,4)  False
1  ACC_NO             NUMBER(11,0)   False
2  ACC_OPEN_DT        DATE           False
3  ACCB               DECIMAL(18,4)  False
4  ACDB               DECIMAL(18,4)  False
5  AGRMNT_ID          NUMBER(11,0)   True
6  BRNCH_NUM          NUMBER(11,0)   False
7  CLRD_BAL           DECIMAL(18,4)  False
8  CR_INT_ACRD_GRSS   DECIMAL(18,4)  False
9  CR_INT_ACRD_NET    DECIMAL(18,4)  False
I aim to:
- Reassign 'Data Type' to the text preceding the parenthesis
- If a parenthesis exists in 'Data Type':
  - Create a new column 'Precision' and assign it the first comma-separated value
  - Create a new column 'Scale' and assign it the second comma-separated value
Intended output would therefore become:
   Data Type  Precision  Scale
0  decimal    18         4
1  number     11         0
2  date
3  decimal    18         4
4  decimal    18         4
5  number     11         0
I have tried hard to achieve this, but I'm new to dataframes. I can't work out whether I need to iterate over all rows or whether there is a way to apply this to all values in the column at once.
Any help is much appreciated.
Use target_table_df['Data Type'].str.extract(pattern). You'll need to assign pattern to be a regular expression that captures each of the components you're looking for.
pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'
([^\(]+)
says to grab as many non-open-parenthesis characters as you can, up to the first open parenthesis.
\(([^,]*),
says to grab the first set of non-comma characters after an open parenthesis and stop at the comma.
,(.*)\)
says to grab the rest of the characters between the comma and the close parenthesis.
(\(([^,]*),(.*)\))?
says the whole parenthesis thing may not even happen, grab it if you can.
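To see what each group captures, here is a minimal sketch that runs the same pattern through Python's standard re module on two of the sample values. The names sample_values and results are only for illustration and are not part of the original answer.

```python
import re

# Same pattern as above; group 2 (the whole parenthesised part) is optional
pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'

# Illustrative inputs taken from the question's sample data
sample_values = ['DECIMAL(18,4)', 'DATE']

# groups() returns None for any group that did not take part in the match
results = {text: re.match(pattern, text).groups() for text in sample_values}

for text, groups in results.items():
    print(text, '->', groups)
# DECIMAL(18,4) -> ('DECIMAL', '(18,4)', '18', '4')
# DATE -> ('DATE', None, None, None)
```

Groups 1, 3 and 4 are the type name, precision and scale. Group 2 is the whole parenthesised part, which is why it gets dropped later.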
Solution
Everything together looks like this:
# s is the column to parse
s = target_table_df['Data Type']
pattern = r'([^\(]+)(\(([^,]*),(.*)\))?'
df = s.str.extract(pattern, expand=True).iloc[:, [0, 2, 3]]
# Formatting to get it how you wanted
df.columns = ['Data Type', 'Precision', 'Scale']
df.index.name = None
print(df)
I put a .iloc[:, [0, 2, 3]] at the end because the pattern I used grabs the whole parenthesis in column 1 and I wanted to skip it. Leave it off and see.
   Data Type  Precision  Scale
0  decimal    18         4
1  number     11         0
2  date       NaN        NaN
3  decimal    18         4
4  decimal    18         4
5  number     11         0
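As a variation, and not what the answer above does, the outer group can be made non-capturing with (?:...), so that extract returns only the three wanted columns and the .iloc step is unnecessary. The sketch below is a self-contained example under a few assumptions: the small target_table_df is a hypothetical stand-in built from three rows of the question's sample, and the .str.lower() call is there only because the intended output shows lowercase type names.

```python
import pandas as pd

# Hypothetical stand-in for target_table_df, using three rows from the question
target_table_df = pd.DataFrame({
    'Attribute Name': ['ACC_LIM', 'ACC_NO', 'ACC_OPEN_DT'],
    'Data Type': ['DECIMAL(18,4)', 'NUMBER(11,0)', 'DATE'],
    'Primary Key Indicator': [False, False, False],
})

# (?:...) makes the outer parenthesised group non-capturing,
# so extract() returns exactly three columns
pattern = r'([^\(]+)(?:\(([^,]*),(.*)\))?'
parts = target_table_df['Data Type'].str.extract(pattern, expand=True)
parts.columns = ['Data Type', 'Precision', 'Scale']

# Assumption: lowercase the type name to match the intended output
parts['Data Type'] = parts['Data Type'].str.lower()

# Overwrite 'Data Type' and add the two new columns to the original frame
for column in parts.columns:
    target_table_df[column] = parts[column]

print(target_table_df)
```

Precision and Scale come back as strings, with missing values where there was no parenthesis. If numbers are needed, pd.to_numeric can convert them afterwards.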