Get the count of matching and non-matching column data in a DataFrame

Problem description

I have two dataframes. This is the input CSV data:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2000
0            16      2200
0            23      2200
0            30      2200
1            0       2100
1            5       2100
1            7       2100

Now I also have the output data:

Document_ID  OFFSET  PredictedFeature
0            0       2000
0            8       2100
0            16      2100
0            23      2100
0            30      2200
1            0       2000
1            5       2000
1            7       2100

What I am trying to do here is check whether the results match or not.

So I did this:

df1_inputPredictedFeature_column['new'] = df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']

This adds a column that tells whether each row matches the PredictedFeature column of the output or not.
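
For reference, here is a minimal sketch of what that row-by-row comparison yields on the sample data above. The names `input_features` and `output_features` are hypothetical stand-ins for the two PredictedFeature columns, and the values are copied from the two tables.

```python
import pandas as pd

# PredictedFeature values copied from the input and output tables above.
# The variable names are illustrative, not from the original post.
input_features = pd.Series([2000, 2000, 2200, 2200, 2200, 2100, 2100, 2100])
output_features = pd.Series([2000, 2100, 2100, 2100, 2200, 2000, 2000, 2100])

# Element-wise comparison: row i of the input is compared with row i of the output
matches = input_features == output_features
print(matches.tolist())
# [True, False, False, False, True, False, False, True]
```

Only three of the eight rows agree: the first 2000, the last 2200 and the last 2100.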

What I am trying to get now is the following. The feature 2000 occurs 2 times in the PredictedFeature column of the input CSV, but in the output CSV it matches only in the first position and not in the second.

So I am trying to get data like this:

predictedFeatureClass  inputCsvOccured  outputcsvmatched
2000                   2                1
2200                   3                1

So, how can I get this data? Any help would be great.

Recommended answer

You can do it using groupby, as shown below:

import pandas as pd

# Input CSV data
df1_inputPredictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2000'], ['0', '16', '2200'], ['0', '23', '2200'], ['0', '30', '2200'], ['1', '0', '2100'], ['1', '5', '2100'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
# Output CSV data
df1_predictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2100'], ['0', '16', '2100'], ['0', '23', '2100'], ['0', '30', '2200'], ['1', '0', '2000'], ['1', '5', '2000'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))

# 1 where the input and output features agree on the same row, 0 otherwise
df1_inputPredictedFeature_column['new'] = (df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']).astype(int)

# For each input feature: how many rows it has, and how many of those rows matched
result = df1_inputPredictedFeature_column.groupby("PredictedFeature").agg(inputCsvOccured=("new", "count"), outputcsvmatched=("new", "sum"))

result.index.name = "predictedFeatureClass"

result.reset_index(inplace=True)
print(result)

Result

predictedFeatureClass  inputCsvOccured  outputcsvmatched
0                  2000                2                 1
1                  2100                3                 1
2                  2200                3                 1
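
The comparison above pairs rows by position, so it relies on both dataframes having the same rows in the same order. If that is not guaranteed, one possible variation is to pair rows on Document_ID and OFFSET with a merge first. The sketch below assumes those two columns uniquely identify a row. The names `input_df`, `output_df`, `merged`, `summary` and the extra `outputcsvnotmatched` column are illustrative and not part of the original answer.

```python
import pandas as pd

keys = ["Document_ID", "OFFSET"]
columns = keys + ["PredictedFeature"]

# Sample data copied from the question (names are hypothetical)
input_df = pd.DataFrame(
    [[0, 0, 2000], [0, 8, 2000], [0, 16, 2200], [0, 23, 2200],
     [0, 30, 2200], [1, 0, 2100], [1, 5, 2100], [1, 7, 2100]],
    columns=columns,
)
output_df = pd.DataFrame(
    [[0, 0, 2000], [0, 8, 2100], [0, 16, 2100], [0, 23, 2100],
     [0, 30, 2200], [1, 0, 2000], [1, 5, 2000], [1, 7, 2100]],
    columns=columns,
)

# Pair rows by (Document_ID, OFFSET) rather than by row position.
# A left join keeps every input row; input rows with no output partner
# get NaN and therefore count as not matched.
merged = input_df.merge(output_df, on=keys, how="left",
                        suffixes=("_input", "_output"))
merged["matched"] = (
    merged["PredictedFeature_input"] == merged["PredictedFeature_output"]
)

# Per input feature: total rows, matched rows, and the remainder
summary = (
    merged.groupby("PredictedFeature_input")
    .agg(inputCsvOccured=("matched", "count"),
         outputcsvmatched=("matched", "sum"))
    .rename_axis("predictedFeatureClass")
    .reset_index()
)
summary["outputcsvnotmatched"] = (
    summary["inputCsvOccured"] - summary["outputcsvmatched"]
)
print(summary)
```

On the sample data this gives the same matched counts as the result above, plus 1, 2 and 2 non-matching rows for 2000, 2100 and 2200 respectively.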
