How to standardize strings between rows in a pandas DataFrame?

Views: 40 Posted: 2020/10/17 1:11:31 Tags: python python-3.x pandas dataframe

Question

I have the following pandas DataFrame in Python3.x:

import pandas as pd

dict1 = {
    'ID':['first', 'second', 'third', 'fourth', 'fifth'], 
    'pattern':['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE']
}

df = pd.DataFrame(dict1)

>>> df
       ID   pattern
0   first  AAABCDEE
1  second    ABBBBD
2   third     CCCDE
3  fourth        AA
4   fifth     ABCDE

There are two columns, ID and pattern. The string in pattern with the longest length is in the first row, len('AAABCDEE'), which is length 8.

My goal is to standardize the strings such that these are the same length, with the trailing spaces as ?.

The output would look like this:

>>> df
       ID   pattern
0   first  AAABCDEE
1  second  ABBBBD?? 
2   third  CCCDE???
3  fourth  AA??????
4   fifth  ABCDE???

If I was able to make the trailing spaces NaN, then I could try something like:

df = df.applymap(lambda x: int(x) if pd.notnull(x) else str("?"))

But I'm not sure how one would efficiently (1) find the longest string in pattern and (2) then add NaN at the end of the strings up to this length. This may be a convoluted approach...

Recommended answer

You can use Series.str.ljust for this, after acquiring the max string length in the column.

df.pattern.str.ljust(df.pattern.str.len().max(), '?')

# 0    AAABCDEE
# 1    ABBBBD??
# 2    CCCDE???
# 3    AA??????
# 4    ABCDE???
# Name: pattern, dtype: object
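
To get the output shown in the question, the padded Series can be assigned back to the column. The following is a minimal sketch that reuses the question's data; the name max_len is only illustrative, and the int() cast is a precaution for columns that contain missing values, where str.len() yields floats.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['first', 'second', 'third', 'fourth', 'fifth'],
    'pattern': ['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'],
})

# Length of the longest string in the column (8 for this data).
max_len = int(df['pattern'].str.len().max())

# Right-pad every string with '?' up to max_len and write it back.
df['pattern'] = df['pattern'].str.ljust(max_len, '?')

print(df)
```

Strings that already have the maximum length are left unchanged, since ljust only pads and never truncates.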

In the pandas 0.22.0 source, ljust is entirely equivalent to pad with side='right', so pick whichever you find clearer.
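
As a quick check of that equivalence, the sketch below pads the same Series both ways and compares the results. The variable names are illustrative, and the comparison only shows agreement on this sample data, not a guarantee for every pandas version.

```python
import pandas as pd

s = pd.Series(['AAABCDEE', 'ABBBBD', 'CCCDE', 'AA', 'ABCDE'], name='pattern')
width = int(s.str.len().max())

# Two spellings of the same right-padding operation.
left_justified = s.str.ljust(width, '?')
padded = s.str.pad(width, side='right', fillchar='?')

# True when both Series hold identical values.
same = left_justified.equals(padded)
print(same)
```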
