pandas wide_to_long suffix parameter

Problem description

I have a question about a parameter of wide_to_long in pandas. There is a parameter called suffix that I do not understand.

The documentation says:

suffix : str, default ‘\d+’

A regular expression capturing the wanted suffixes. ‘\d+’ captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class ‘\D+’. You can also further disambiguate suffixes, for example, if your wide variables are of the form Aone, Btwo,.., and you have an unrelated column Arating, you can ignore the last one by specifying suffix=’(!?one|two)’

New in version 0.20.0.
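
The documentation's own example can be tried directly. Below is a minimal sketch with made-up data (the values and the j name "key" are hypothetical). Note that in the pattern as printed, "!?" is just an optional literal "!", so it behaves like "(one|two)" here; a negative lookahead in Python regex syntax would be written "(?!...)".

```python
import pandas as pd

# Hypothetical data shaped like the docs' example: Aone and Btwo are the
# wide variables, Arating is an unrelated column that should stay as-is.
df = pd.DataFrame({
    "id": [1, 2],
    "Aone": [10, 20],
    "Btwo": [30, 40],
    "Arating": [4.5, 3.0],
})

# Only the suffixes "one" or "two" are accepted, so Arating is not melted.
# The "!?" in the documented pattern is an optional literal "!", so this
# selects the same columns as suffix="(one|two)".
long_df = pd.wide_to_long(
    df, stubnames=["A", "B"], i="id", j="key", suffix="(!?one|two)"
)
print(long_df)
```

Arating ends up as an ordinary column, while A and B each get values only on the rows whose key matches their own suffix.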

Question: What can be used for suffix?

I also found someone calling wide_to_long with suffix='.'. What does it do?

Recommended answer

TLDR: suffix takes a regular expression, so anything from a simple character class to a group of alternatives such as '(one|two|three)' can be used.

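The suffix parameter tells pandas.wide_to_long which columns to include in the transformation, based on the part of the column name that follows the stub (and sep, if given). Columns that don't match are not reshaped; they are carried along unchanged.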
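
To make the selection rule concrete, here is a hypothetical helper, matching_columns, that mimics how current pandas versions appear to choose the columns: stub, sep and suffix are joined into one pattern anchored at both ends, and only fully matching names are reshaped. It is an approximation for illustration, not part of the pandas API. Because the suffix is spliced into the pattern as-is, alternations are best kept inside parentheses, as in the examples below.

```python
import re


def matching_columns(columns, stub, sep="", suffix=r"\d+"):
    """Hypothetical helper: return the column names that would be melted
    for one stub, assuming the anchored pattern ^stub + sep + suffix$."""
    pattern = re.compile("^" + re.escape(stub) + re.escape(sep) + suffix + "$")
    return [col for col in columns if pattern.match(col)]


cols = ["id", "A1", "A2", "A100", "Aone", "A_person"]
print(matching_columns(cols, "A"))                 # ['A1', 'A2', 'A100']
print(matching_columns(cols, "A", suffix=r"\D+"))  # ['Aone', 'A_person']
```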

The default suffix, '\d+', assumes that your columns are labeled with numbers. For instance, columns A1, A2, A3, A4 work without specifying the suffix parameter, while Aone, Atwo, Athree, Afour are not picked up at all.
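
A short sketch of both cases with made-up values. The numeric columns work with the default, while the word suffixes need an explicit pattern; here the negated class r'\D+' from the documentation is used.

```python
import pandas as pd

numeric = pd.DataFrame({"id": [1, 2], "A1": [1, 2], "A2": [3, 4],
                        "A3": [5, 6], "A4": [7, 8]})
words = pd.DataFrame({"id": [1, 2], "Aone": [1, 2], "Atwo": [3, 4],
                      "Athree": [5, 6], "Afour": [7, 8]})

# Default suffix r"\d+": all four numeric columns are melted.
long_numeric = pd.wide_to_long(numeric, stubnames="A", i="id", j="num")

# Word suffixes need an explicit pattern; r"\D+" accepts any run of non-digits.
long_words = pd.wide_to_long(words, stubnames="A", i="id", j="num",
                             suffix=r"\D+")
print(long_numeric)
print(long_words)
```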

As the documentation explains, suffix is also useful in the rarer case where your columns are, say, A1, A2, A3, A4, A100, and you don't want to include A100 because it isn't related to the other A# columns.
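
One way to handle that case, sketched with hypothetical data and assuming a recent pandas version where the suffix has to match the whole remainder of the column name: accept exactly one digit from 1 to 4, so A100 is left alone.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2],
                   "A1": [1, 2], "A2": [3, 4], "A3": [5, 6], "A4": [7, 8],
                   "A100": [99, 98]})  # hypothetical unrelated column

# "[1-4]" accepts exactly one digit from 1 to 4, so A100 is not melted
# and is carried along as an ordinary column instead.
long_df = pd.wide_to_long(df, stubnames="A", i="id", j="num", suffix="[1-4]")
print(long_df)
```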

Here are some illustrative examples.

import pandas as pd
df = pd.DataFrame({'id': [1,2], 'A_1': ['a', 'b'],
                  'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
                  'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#       A_person   A_
#id num              
#1  1       Mike    a
#2  1        Amy    b
#1  2       Mike   aa
#2  2        Amy   bb
#1  3       Mike  aaa
#2  3        Amy  bbb

Because the default suffix only matches numbers, 'A_person' was not reshaped; it was kept as an ordinary column. To include it in the conversion, use the suffix parameter and tell it to accept either numbers or words.

pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'(\d+|\w+)')  # raw string avoids invalid-escape warnings
#             A_
#id suffix         
#1  1          a
#2  1          b
#1  2         aa
#2  2         bb
#1  3        aaa
#2  3        bbb
#1  person  Mike
#2  person   Amy
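
Since \w already matches digits (it covers letters, digits and the underscore), the \d+ alternative in that pattern is redundant. A minimal check, reusing the same example data, that suffix=r'\w+' on its own selects the same columns:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'A_1': ['a', 'b'],
                   'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
                   'A_person': ['Mike', 'Amy']})

both = pd.wide_to_long(df, stubnames='A_', i='id', j='suffix',
                       suffix=r'(\d+|\w+)')
words_only = pd.wide_to_long(df, stubnames='A_', i='id', j='suffix',
                             suffix=r'\w+')

# \w matches digits as well as letters, so both calls melt the same columns.
print(both.equals(words_only))
```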

Now, if your df has no numeric suffixes at all, the suffix parameter takes care of that too. The default call matches no columns and returns an empty DataFrame, but telling it to look for words gives you what you want.

df = pd.DataFrame({'id': [1,2], 'A_one': ['a', 'b'],
                  'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
                  'A_person': ['Mike', 'Amy']})

pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#Empty DataFrame
#Columns: [A_three, A_person, A_one, A_two, A_]
#Index: []

pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'\w+')  # raw string avoids invalid-escape warnings
#             A_
#id suffix         
#1  one        a
#2  one        b
#1  person  Mike
#2  person   Amy
#1  three    aaa
#2  three    bbb
#1  two       aa
#2  two       bb

And if you don't want to include A_person, you can restrict suffix to the specific suffixes you want.

pd.wide_to_long(df, stubnames='A_', i='id', j='num', suffix='(one|two|three)')
#         A_person   A_
#id num                
#1  one       Mike    a
#2  one        Amy    b
#1  three     Mike  aaa
#2  three      Amy  bbb
#1  two       Mike   aa
#2  two        Amy   bb
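
Listing the wanted suffixes works when they are known in advance. If you would rather exclude one specific suffix, a negative lookahead "(?!...)" is one option. This is plain Python regex syntax, not a pandas feature. A sketch with the same data:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'A_one': ['a', 'b'],
                   'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
                   'A_person': ['Mike', 'Amy']})

# (?!person) rejects any suffix that starts with "person"; every other
# word suffix is still accepted by \w+.
excluded = pd.wide_to_long(df, stubnames='A_', i='id', j='num',
                           suffix=r'(?!person)\w+')
print(excluded)
```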

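Basically, if you can describe the suffix with a regex, you can pass it to suffix so that only the columns you want are reshaped. The suffix='.' from the question is one such regex: '.' matches any single character, so in current pandas versions it selects only columns with exactly one character after the stub (and sep).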
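
A small check of suffix='.' with hypothetical columns (A1, A2, Ax and A10 are made up), assuming a recent pandas version where the suffix has to match the whole remainder of the column name:

```python
import pandas as pd

# A1, A2 and Ax have one-character suffixes; A10 has two.
df = pd.DataFrame({"id": [1, 2], "A1": [1, 2], "A2": [3, 4],
                   "Ax": [5, 6], "A10": [7, 8]})

# "." matches exactly one character (any character except a newline),
# so A10 is not melted and stays as an ordinary column.
long_df = pd.wide_to_long(df, stubnames="A", i="id", j="key", suffix=".")
print(long_df)
```

Because one of the suffixes ("x") is not numeric, the key level keeps string values such as "1" and "x" rather than being converted to integers.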
