How to save a synthetic dataset in a CSV file using SMOTE
Problem description
I am using credit card data for oversampling with SMOTE, following the code published on geeksforgeeks.org.
Running the following code prints the output shown after it:
print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))
# import SMOTE module from imblearn library
# pip install imblearn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state = 2)
# fit_resample is the current name; older imblearn releases called it fit_sample
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())
print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))
print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))
Output:
Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019
After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,)
After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019
I am totally new to this area and can't figure out how to write this data out in CSV format. I would be very glad if anyone could help me with this.
Alternatively, if there is a reference that shows how to generate synthetic data from a dataset using SMOTE and save the updated dataset in a CSV file, please mention it.
Something like the following image (not reproduced here):
Thanks in advance.
Recommended answer
From what I can see in your code, X_train_res and the other variables are NumPy arrays. You can do something like this:
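import numpy as np
# Reshape the labels from (398038,) to (398038, 1) so they can be joined column-wise
y_train_res = y_train_res.reshape(-1, 1)
# Append the labels as the last column of the feature matrix
data_res = np.concatenate((X_train_res, y_train_res), axis=1)
# savetxt is a NumPy function, so it is called on np
np.savetxt('sample_smote.csv', data_res, delimiter=",")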
I cannot run and check this, so let me know if you face any issues.
Note: You will need an extra step to add column labels. Let me know once you have this working and need help with that.
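One possible way to handle the column labels mentioned in the note is to go through a pandas DataFrame instead of np.savetxt. The following is only a sketch under some assumptions: the helper name save_resampled_csv, the placeholder names feature_0, feature_1, ..., the label column name "Class", and the output file name are all hypothetical and not taken from the question. The small demo arrays stand in for X_train_res and y_train_res so the snippet runs without the credit card data.

```python
import numpy as np
import pandas as pd


def save_resampled_csv(X_res, y_res, path, feature_names=None, label_name="Class"):
    """Write resampled features and labels to a CSV file with a header row.

    `feature_names` and `label_name` are illustrative; pass the real column
    names of your dataset if you have them.
    """
    X_res = np.asarray(X_res)
    y_res = np.asarray(y_res).ravel()  # accept labels shaped (n,) or (n, 1)
    if feature_names is None:
        # Placeholder names, used only when no real names are supplied.
        feature_names = ["feature_{}".format(i) for i in range(X_res.shape[1])]
    df = pd.DataFrame(X_res, columns=feature_names)
    df[label_name] = y_res  # labels become the last column
    df.to_csv(path, index=False)  # index=False avoids an extra row-number column
    return df


# Small stand-in arrays so the sketch runs on its own.
X_demo = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
y_demo = np.array([0, 1, 1])
df_demo = save_resampled_csv(X_demo, y_demo, "sample_smote_demo.csv")
```

With the real data, the call would look like save_resampled_csv(X_train_res, y_train_res, 'sample_smote.csv', feature_names=list(X.columns)), assuming the original features were loaded into a DataFrame named X.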