How to save a synthetic dataset in a CSV file using SMOTE


Problem description

I am oversampling a credit card dataset with SMOTE, using the code from an article on geeksforgeeks.org.

After running the following code, I get the output below:

print("Before OverSampling, counts of label '1': {}".format(sum(y_train == 1)))
print("Before OverSampling, counts of label '0': {} \n".format(sum(y_train == 0)))

# import SMOTE module from imblearn library
# pip install imbalanced-learn (if you don't have imblearn in your system)
from imblearn.over_sampling import SMOTE
sm = SMOTE(random_state=2)
# fit_resample replaces fit_sample, which recent imbalanced-learn releases no longer provide
X_train_res, y_train_res = sm.fit_resample(X_train, y_train.ravel())

print('After OverSampling, the shape of train_X: {}'.format(X_train_res.shape))
print('After OverSampling, the shape of train_y: {} \n'.format(y_train_res.shape))

print("After OverSampling, counts of label '1': {}".format(sum(y_train_res == 1)))
print("After OverSampling, counts of label '0': {}".format(sum(y_train_res == 0)))

Output:

Before OverSampling, counts of label '1': 345
Before OverSampling, counts of label '0': 199019 

After OverSampling, the shape of train_X: (398038, 29)
After OverSampling, the shape of train_y: (398038,) 

After OverSampling, counts of label '1': 199019
After OverSampling, counts of label '0': 199019
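The printed counts fit together as follows, assuming SMOTE's default `sampling_strategy='auto'`, which for over-samplers resamples every class except the majority class up to the majority count. The short calculation below uses only the numbers shown in the output above; it also gives the shape the CSV file should have once the label is appended as a final column.

```python
# Numbers copied from the output above
minority_before = 345     # label '1' before oversampling
majority = 199019         # label '0', left unchanged by SMOTE
n_features = 29           # second dimension of X_train_res

# SMOTE adds synthetic minority rows until both classes have `majority` rows
synthetic_rows = majority - minority_before
# The resampled set keeps the original rows and appends the synthetic ones
total_rows = minority_before + synthetic_rows + majority

# Expected CSV layout: one row per sample, all features plus one label column
csv_shape = (total_rows, n_features + 1)

print(synthetic_rows)  # 198674
print(csv_shape)       # (398038, 30)
```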

As I am totally new to this area, I can't figure out how to save this data in CSV format. I would be very glad if anyone could help me with this.

Alternatively, if there is a reference showing how to generate synthetic data from a dataset using SMOTE and save the updated dataset to a CSV file, please mention it.

Something like the following image:

Thanks in advance.
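On the part of the question about how SMOTE makes synthetic data: in the standard description of the algorithm, each new minority sample is placed at a random point on the line segment between an existing minority sample and one of its k nearest minority-class neighbours. The sketch below illustrates that idea in plain NumPy. It is a simplified, hypothetical helper (`smote_sketch` is not part of imblearn and is not its implementation); it computes all pairwise distances, which is fine for a few hundred minority rows such as the 345 here but not for large classes, and its default `k=5` is chosen only because it matches imblearn's default `k_neighbors`.

```python
import numpy as np


def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows X_min (simplified SMOTE)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    k = min(k, n - 1)  # a row cannot have more neighbours than there are other rows

    # Pairwise Euclidean distances between minority rows
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour

    # Indices of the k nearest minority neighbours of each row
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                     # pick a random minority row x_i
        x_nn = X_min[rng.choice(neighbours[i])]  # pick one of its k neighbours
        lam = rng.random()                      # lambda drawn uniformly from [0, 1)
        # New point on the segment from x_i towards x_nn
        synthetic[j] = X_min[i] + lam * (x_nn - X_min[i])
    return synthetic


if __name__ == "__main__":
    # Four minority points at the corners of the unit square
    X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    new_rows = smote_sketch(X_min, n_new=6, k=2)
    print(new_rows.shape)  # (6, 2)
```

Because every synthetic row is an interpolation between two real minority rows, the new points stay inside the region spanned by the minority class; in the example they all fall within the unit square.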

Solution

From what I can see in your code, X_train_res and y_train_res are NumPy arrays, so you can stack them column-wise and write the result with np.savetxt:

import numpy as np

# reshape y_train_res from (398038,) to (398038, 1) so it can be stacked as a column
y_train_res = y_train_res.reshape(-1, 1)
# features first, label in the last column
data_res = np.concatenate((X_train_res, y_train_res), axis=1)
np.savetxt('sample_smote.csv', data_res, delimiter=",")
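Here is a self-contained variant of the answer's approach that can be tried without running SMOTE first. The helper name `save_resampled_numpy` and the small stand-in arrays are hypothetical. `fmt="%.17g"` is an optional choice that is not in the answer: it writes the labels as `0`/`1` instead of NumPy's default scientific notation (`fmt='%.18e'`) while still round-tripping float64 values exactly.

```python
import numpy as np


def save_resampled_numpy(X_res, y_res, path):
    """Write features and labels side by side to a CSV file without a header."""
    X_res = np.asarray(X_res, dtype=float)
    # Reshape the 1-D labels, shape (n,), into a single column, shape (n, 1)
    y_col = np.asarray(y_res, dtype=float).reshape(-1, 1)
    # Features first, label in the last column
    data_res = np.concatenate((X_res, y_col), axis=1)
    # "%.17g" keeps full float64 precision and prints 0/1 labels without exponents
    np.savetxt(path, data_res, delimiter=",", fmt="%.17g")
    return data_res.shape


if __name__ == "__main__":
    # Stand-ins for X_train_res and y_train_res from the question
    X_demo = np.array([[0.5, -1.2], [2.0, 3.25], [0.1, 0.0]])
    y_demo = np.array([0, 1, 1])
    print(save_resampled_numpy(X_demo, y_demo, "sample_smote_demo.csv"))  # (3, 3)
    # Reading the file back gives the same numbers
    print(np.loadtxt("sample_smote_demo.csv", delimiter=","))
```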

I can't run this to check it, but let me know if you run into any issues.

Note: adding column labels (a header row) takes an extra step. Once you have this working, let me know if you need help with that.
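For the column labels mentioned in the note, one common option is to build a pandas DataFrame and let `DataFrame.to_csv` write the header row. In this sketch the helper name, the placeholder feature names (`feature_0`, `feature_1` and so on) and the label name `"Class"` are assumptions; if the features originally came from a DataFrame, its column names could be passed as `feature_names` instead.

```python
import numpy as np
import pandas as pd


def save_resampled_with_header(X_res, y_res, path, feature_names=None, label_name="Class"):
    """Save resampled features plus the label as a CSV file with a header row."""
    X_res = np.asarray(X_res)
    if feature_names is None:
        # Hypothetical placeholder names; pass the real ones if you have them
        feature_names = [f"feature_{i}" for i in range(X_res.shape[1])]
    df = pd.DataFrame(X_res, columns=feature_names)
    # Label goes in the last column, as in the NumPy version of the answer
    df[label_name] = np.asarray(y_res).ravel()
    # index=False stops pandas from writing its row index as an extra, unnamed column
    df.to_csv(path, index=False)
    return df


if __name__ == "__main__":
    X_demo = np.array([[0.5, -1.2], [2.0, 3.25]])
    y_demo = np.array([0, 1])
    out = save_resampled_with_header(X_demo, y_demo, "sample_smote_labeled.csv")
    print(list(out.columns))  # ['feature_0', 'feature_1', 'Class']
```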
