Get the count of matching and non-matching column data in a dataframe
Question
I have two dataframes. This is the input CSV data:
Document_ID OFFSET PredictedFeature
0 0 2000
0 8 2000
0 16 2200
0 23 2200
0 30 2200
1 0 2100
1 5 2100
1 7 2100
I also have the output data:
Document_ID OFFSET PredictedFeature
0 0 2000
0 8 2100
0 16 2100
0 23 2100
0 30 2200
1 0 2000
1 5 2000
1 7 2100
What I am trying to do is check whether the results match or not.
So I did this:
df1_inputPredictedFeature_column['new'] = df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']
This adds a column that tells whether each row matches the PredictedFeature column of the output or not.
Now, what I am trying to get is this: 2000 occurs twice in the PredictedFeature column of the input CSV, but in the output CSV it matches only in the first position, not in the second.
So I am trying to get data like this:
predictedFeatureClass inputCsvOccured outputcsvmatched
2000 2 1
2200 3 1
So, how can I get this data? Any help would be great.
Recommended answer
You can do it using groupby, like below:
import pandas as pd

df1_inputPredictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2000'], ['0', '16', '2200'], ['0', '23', '2200'], ['0', '30', '2200'], ['1', '0', '2100'], ['1', '5', '2100'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
df1_predictedFeature_column = pd.DataFrame([['0', '0', '2000'], ['0', '8', '2100'], ['0', '16', '2100'], ['0', '23', '2100'], ['0', '30', '2200'], ['1', '0', '2000'], ['1', '5', '2000'], ['1', '7', '2100']], columns=('Document_ID', 'OFFSET', 'PredictedFeature'))
# 1 where the input and output predictions agree on the same row, else 0
# (the built-in int replaces np.int, which recent NumPy releases no longer provide)
df1_inputPredictedFeature_column['new'] = (df1_inputPredictedFeature_column['PredictedFeature'] == df1_predictedFeature_column['PredictedFeature']).astype(int)
# Per input class: how many rows it has, and how many of those rows matched
result = df1_inputPredictedFeature_column.groupby("PredictedFeature")["new"].agg(inputCsvOccured="size", outputcsvmatched="sum")
result.index.name = "predictedFeatureClass"
result.reset_index(inplace=True)
print(result)
Result
predictedFeatureClass inputCsvOccured outputcsvmatched
0 2000 2 1
1 2100 3 1
2 2200 3 1
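The comparison above pairs rows purely by position, so it assumes both dataframes list the same (Document_ID, OFFSET) pairs in the same order. If that cannot be guaranteed, one possible variant is to align the rows with a merge on those two columns first. The following is only a sketch under that assumption: the names input_df, output_df, merged, matched, the _in/_out suffixes, and the extra outputcsvnotmatched column are made up for illustration and are not part of the original answer.

```python
import pandas as pd

cols = ["Document_ID", "OFFSET", "PredictedFeature"]
input_df = pd.DataFrame(
    [[0, 0, 2000], [0, 8, 2000], [0, 16, 2200], [0, 23, 2200],
     [0, 30, 2200], [1, 0, 2100], [1, 5, 2100], [1, 7, 2100]],
    columns=cols,
)
output_df = pd.DataFrame(
    [[0, 0, 2000], [0, 8, 2100], [0, 16, 2100], [0, 23, 2100],
     [0, 30, 2200], [1, 0, 2000], [1, 5, 2000], [1, 7, 2100]],
    columns=cols,
)

# Pair rows by key instead of relying on both frames sharing a row order.
merged = input_df.merge(
    output_df, on=["Document_ID", "OFFSET"], how="left", suffixes=("_in", "_out")
)
# True where the output prediction equals the input prediction for that key.
merged["matched"] = merged["PredictedFeature_in"] == merged["PredictedFeature_out"]

# Per input class: total rows and rows that matched.
summary = (
    merged.groupby("PredictedFeature_in")["matched"]
    .agg(inputCsvOccured="size", outputcsvmatched="sum")
    .rename_axis("predictedFeatureClass")
    .reset_index()
)
# Hypothetical extra column: rows whose output prediction differs.
summary["outputcsvnotmatched"] = summary["inputCsvOccured"] - summary["outputcsvmatched"]
print(summary)
```

With how="left", an input row that has no counterpart in the output is kept and simply counted as not matched, which may or may not be what you want for your data.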