Replacing NaN value with a word when NaN is not repeated in two consecutive rows

Views: 83 Published: 2020/5/24 4:06:43 Tags: python pandas


Problem description

For the following dataframe:

index   Sent    col_1   col_2   col_3
    1   AB       NaN      DD     CC
    1             0       1       0
    2   SA        FA      FB      NaN
    2             1       1       NaN
    3   FF       Sha      NaN     PA
    3             1        0       1

I need to replace a NaN value in col_1, col_2, or col_3 with "F" when the NaN is not repeated in two consecutive rows. The output should look like this:

     index   Sent   col_1   col_2   col_3
        1   AB        F       DD     CC
        1             0       1       0
        2   SA        FA      FB      NaN
        2             1       1       NaN
        3   FF       Sha      F       PA
        3             1       0       1

This is my code:
for col in ['col_1', 'col_2', 'col_3']:
    data = np.reshape(df[col].values, (-1, 2))
    need_fill = np.logical_and(data[:, 0] == '', data[:, 1] != '')
    data[np.where(need_fill),1] = 'F'

But it replaces the 0 under the NaN value with F. How can I fix the code so that it replaces the NaN with F instead?

Recommended answer

There may be a better way, but one approach is to use shift to look at the row above and the row below. The first and last rows are a problem, though, because each of them has only one neighbour. So, if adding extra rows and removing them later is acceptable, you can try the following:

# Appending row to the top: https://stackoverflow.com/a/24284680/5916727
df.loc[-1] = [0 for n in range(len(df.columns))]
df.index = df.index + 1  # shifting index
df = df.sort_index()  # sorting by index

# Append row to below it
df.loc[df.shape[0]] = [0 for n in range(len(df.columns))]
print(df)

   index Sent col_1 col_2 col_3
0      0    0     0     0     0
1      1   AB   NaN    DD    CC
2      1          0     1     0
3      2   SA    FA    FB   NaN
4      2          1     1   NaN
5      3   FF   Sha   NaN    PA
6      3          1     0     1
7      0    0     0     0     0

Now check consecutive rows by building a mask with shift(-1) and shift(1):

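columns = ["col_1", "col_2", "col_3"]
for column in columns:
    # NaN in this row, but not in the row below (shift(-1)) or the row above (shift(1))
    isolated = df[column].isnull() & df[column].shift(-1).notnull() & df[column].shift(1).notnull()
    df.loc[isolated, column] = "F"
df = df[1:-1]  # remove the extra first and last rows
print(df)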

Output:

   index Sent col_1 col_2 col_3
1      1   AB     F    DD    CC
2      1          0     1     0
3      2   SA    FA    FB   NaN
4      2          1     1   NaN
5      3   FF   Sha     F    PA
6      3          1     0     1

If you want, you can also remove the extra index column, which seems to contain duplicates.
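As a possible alternative that avoids adding and removing padding rows, the same check can be sketched by shifting a boolean NaN mask with fill_value=False, so the missing neighbours of the first and last rows count as "not NaN". This is only a sketch, not part of the original answer: the function name fill_isolated_nan and the sample dataframe built here are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def fill_isolated_nan(df, columns, word="F"):
    """Return a copy of df in which a NaN with no NaN directly above or below is replaced by word.

    Hypothetical helper for illustration; not part of the original answer.
    """
    out = df.copy()
    is_nan = out[columns].isna()
    # fill_value=False: the neighbour that does not exist for the first/last row counts as "not NaN"
    nan_above = is_nan.shift(1, fill_value=False).astype(bool)
    nan_below = is_nan.shift(-1, fill_value=False).astype(bool)
    isolated = is_nan & ~nan_above & ~nan_below
    out[columns] = out[columns].mask(isolated, word)
    return out


# Sample data rebuilt by hand from the question (assumed layout)
sample = pd.DataFrame({
    "index": [1, 1, 2, 2, 3, 3],
    "Sent": ["AB", "", "SA", "", "FF", ""],
    "col_1": [np.nan, 0, "FA", 1, "Sha", 1],
    "col_2": ["DD", 1, "FB", 1, np.nan, 0],
    "col_3": ["CC", 0, np.nan, np.nan, "PA", 1],
})
result = fill_isolated_nan(sample, ["col_1", "col_2", "col_3"])
print(result)
```

Working on a boolean mask keeps the original row index untouched, so no re-sorting or slicing is needed afterwards.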

I had the following in the test CSV file.

index,Sent,col_1,col_2,col_3
1,AB,,DD,CC
1, ,0,1,0
2,SA,FA,FB,NA
2, ,1,1,NA
3,FF,Sha,,PA
3, ,1,0,1

Then you can use the following to create the input dataframe:

import pandas as pd
df = pd.read_csv("FILENAME.csv")
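To try this without creating a file, the same CSV text can be read from an in-memory buffer. The sketch below assumes the default read_csv behaviour, where empty fields and the literal NA are both parsed as NaN; the variable name csv_text is made up for illustration.

```python
import io

import pandas as pd

# Same content as the test CSV file, held in a string (illustrative variable name)
csv_text = """index,Sent,col_1,col_2,col_3
1,AB,,DD,CC
1, ,0,1,0
2,SA,FA,FB,NA
2, ,1,1,NA
3,FF,Sha,,PA
3, ,1,0,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Show which cells were parsed as NaN
print(df[["col_1", "col_2", "col_3"]].isna())
```

Note that the single space in the Sent column is kept as a string rather than being treated as missing.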
