pandas wide_to_long suffix parameter
Question
I have a question about a parameter of wide_to_long in pandas. There is a parameter called suffix that I do not understand.
The documentation says:
suffix : str, default '\d+'
A regular expression capturing the wanted suffixes. '\d+' captures numeric suffixes. Suffixes with no numbers could be specified with the negated character class '\D+'. You can also further disambiguate suffixes, for example, if your wide variables are of the form Aone, Btwo,.., and you have an unrelated column Arating, you can ignore the last one by specifying suffix='(!?one|two)'
New in version 0.20.0.
Question: What can be used for suffix?
I also found someone passing suffix='.' when calling wide_to_long. What does it do?
Recommended answer
TLDR: The suffix parameter takes a regular expression, so regex constructs such as capturing groups can be used.
The suffix parameter tells pandas.wide_to_long which columns to include in the transformation, based on the suffix that follows the stub.
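To make the matching rule concrete, here is a minimal sketch in plain Python. It assumes that the selection behaves like an anchored pattern of the form stub + sep + suffix; the helper columns_matching_suffix is hypothetical and is not part of pandas.

```python
import re


def columns_matching_suffix(columns, stub, suffix=r"\d+", sep=""):
    """Return the column names of the form stub + sep + suffix.

    Hypothetical helper for illustration only; not a pandas function.
    """
    # Anchor the pattern so the suffix must cover everything after the stub.
    pattern = re.compile(rf"^{re.escape(stub)}{re.escape(sep)}{suffix}$")
    return [col for col in columns if pattern.match(col)]


columns = ["id", "A_1", "A_2", "A_3", "A_person"]

numeric_only = columns_matching_suffix(columns, "A_")
words_too = columns_matching_suffix(columns, "A_", suffix=r"\w+")

print(numeric_only)  # ['A_1', 'A_2', 'A_3']
print(words_too)     # ['A_1', 'A_2', 'A_3', 'A_person']
```

With the default suffix only the numbered columns are selected, while a broader pattern such as \w+ also picks up A_person.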
By default, wide_to_long assumes that your columns are labeled with numbers. Columns such as A1, A2, A3, A4 therefore work without specifying the suffix parameter, while Aone, Atwo, Athree, Afour will fail to match.
As the documentation explains, suffix is also useful in the rarer case where an unrelated column happens to match the default pattern. For example, your columns may be A1, A2, A3, A4, A100, and you don't want to include A100 because it isn't actually related to the other A# columns.
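As a rough sketch of that case, the frame below is made up for illustration (the values and the unrelated A100 column are hypothetical). Restricting the suffix to a single digit with \d keeps A100 out of the reshape.

```python
import pandas as pd

# Hypothetical data: A1..A4 belong together, A100 is unrelated.
df = pd.DataFrame({
    "id": [1, 2],
    "A1": ["a", "b"],
    "A2": ["aa", "bb"],
    "A3": ["aaa", "bbb"],
    "A4": ["aaaa", "bbbb"],
    "A100": ["x", "y"],
})

# r"\d" allows exactly one digit after the stub, so A100 is not reshaped.
long_df = pd.wide_to_long(df, stubnames="A", i="id", j="num", suffix=r"\d")
print(long_df)
```

A100 should remain as an ordinary column in the result, repeated for each (id, num) pair, in the same way A_person is carried along in the examples in this answer.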
Here are some illustrative examples.
import pandas as pd
df = pd.DataFrame({'id': [1, 2], 'A_1': ['a', 'b'],
                   'A_2': ['aa', 'bb'], 'A_3': ['aaa', 'bbb'],
                   'A_person': ['Mike', 'Amy']})
pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#        A_person   A_
# id num
# 1  1       Mike    a
# 2  1        Amy    b
# 1  2       Mike   aa
# 2  2        Amy   bb
# 1  3       Mike  aaa
# 2  3        Amy  bbb
Because the default behavior is to consider only numeric suffixes, 'A_person' was not reshaped and stayed as a regular column. If you want to include it in the conversion, use the suffix parameter. Let's tell it we want either numbers or words.
# Raw string so that \d and \w are passed to the regex engine unchanged
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'(\d+|\w+)')
#              A_
# id suffix
# 1  1          a
# 2  1          b
# 1  2         aa
# 2  2         bb
# 1  3        aaa
# 2  3        bbb
# 1  person  Mike
# 2  person   Amy
If your df has no numeric suffixes to begin with, the suffix parameter handles that too. The default call comes back empty because it expects numbers, but telling it to look for words gives you what you want.
df = pd.DataFrame({'id': [1, 2], 'A_one': ['a', 'b'],
                   'A_two': ['aa', 'bb'], 'A_three': ['aaa', 'bbb'],
                   'A_person': ['Mike', 'Amy']})
pd.wide_to_long(df, stubnames='A_', i='id', j='num')
#Empty DataFrame
#Columns: [A_three, A_person, A_one, A_two, A_]
#Index: []
# Raw string so that \w is passed to the regex engine unchanged
pd.wide_to_long(df, stubnames='A_', i='id', j='suffix', suffix=r'\w+')
#              A_
# id suffix
# 1  one        a
# 2  one        b
# 1  person  Mike
# 2  person   Amy
# 1  three    aaa
# 2  three    bbb
# 1  two       aa
# 2  two       bb
And if you don't want to include A_person, you can restrict the suffix parameter to specific suffixes.
pd.wide_to_long(df, stubnames='A_', i='id', j='num', suffix='(one|two|three)')
#          A_person   A_
# id num
# 1  one       Mike    a
# 2  one        Amy    b
# 1  three     Mike  aaa
# 2  three      Amy  bbb
# 1  two       Mike   aa
# 2  two        Amy   bb
Basically, if you can match it with a regular expression, you can pass that pattern to suffix to select only the columns you want.
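As for suffix='.' from the question: in a regular expression, '.' matches any single character, so that call should select only columns whose suffix is exactly one character long. The sketch below uses a made-up frame (the column names Ab and A10 are hypothetical) to show the effect.

```python
import pandas as pd

# Hypothetical data: A1 and Ab have one-character suffixes, A10 has two.
df = pd.DataFrame({
    "id": [1, 2],
    "A1": ["a", "b"],
    "Ab": ["c", "d"],
    "A10": ["e", "f"],
})

# "." matches exactly one character of any kind after the stub.
long_df = pd.wide_to_long(df, stubnames="A", i="id", j="key", suffix=".")
print(long_df)
```

Here A1 and Ab are reshaped while A10 is left as a regular column. A pattern such as '.+' would instead accept a suffix of any length.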