How to balance a dataset using fit_generator() in Keras?
Question
I am trying to use Keras to fit a CNN model that classifies two classes of data. My dataset is imbalanced and I want to balance it. I don't know whether I can use class_weight with model.fit_generator, and I wonder what would happen if I passed class_weight="balanced" to model.fit_generator.
The main code:
def generate_arrays_for_training(indexPat, paths, start=0, end=100):
    while True:
        # Select the slice of files between start% and end% of the list.
        from_ = int(len(paths) / 100 * start)
        to_ = int(len(paths) / 100 * end)
        for i in range(from_, to_):
            f = paths[i]
            x = np.load(PathSpectogramFolder + f)
            x = np.expand_dims(x, axis=0)
            # Files whose name contains 'P' belong to the positive class.
            if 'P' in f:
                y = np.repeat([[0, 1]], x.shape[0], axis=0)
            else:
                y = np.repeat([[1, 0]], x.shape[0], axis=0)
            yield (x, y)
history = model.fit_generator(generate_arrays_for_training(indexPat, filesPath, end=75),
                              validation_data=generate_arrays_for_training(indexPat, filesPath, start=75),
                              steps_per_epoch=int((len(filesPath) - int(len(filesPath) / 100 * 25))),
                              validation_steps=int((len(filesPath) - int(len(filesPath) / 100 * 75))),
                              verbose=2,
                              epochs=15, max_queue_size=2, shuffle=True, callbacks=[callback])
Solution
If you don't want to change your data creation process, you can use class_weight with your fit generator. You set class_weight with a dictionary and then fine-tune the values. For instance, suppose class_weight is not used and you have 50 examples for class 0 and 100 examples for class 1. Then the loss function weights every sample uniformly, so the underrepresented class 0 tends to be neglected. But when you set:
class_weight = {0: 2, 1: 1}
the loss function now gives class 0 twice the weight, so misclassifying the underrepresented class is penalized twice as heavily as before. This way the model can cope with the imbalanced data.
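As a quick sanity check of what this weighting does, here is a minimal NumPy sketch (not Keras code; the labels and predicted probabilities below are made up for illustration): each sample's cross-entropy term is scaled by the weight of its true class, which is exactly the effect of class_weight = {0: 2, 1: 1}.

```python
import numpy as np

class_weight = {0: 2.0, 1: 1.0}

y_true = np.array([0, 0, 1, 1])            # true class indices (hypothetical)
p_class1 = np.array([0.9, 0.1, 0.8, 0.7])  # predicted P(class 1) (hypothetical)

# Per-sample binary cross-entropy.
ce = -np.where(y_true == 1, np.log(p_class1), np.log(1 - p_class1))

# Scale each sample's loss by the weight of its true class.
weights = np.array([class_weight[c] for c in y_true])
weighted_loss = np.mean(weights * ce)
unweighted_loss = np.mean(ce)
```

Because the class-0 samples here are poorly predicted, their doubled weight pulls the average loss up, which is what pushes the optimizer to pay more attention to that class.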
Note that, unlike scikit-learn estimators, Keras expects class_weight to be a dictionary; fit_generator does not accept the string 'balanced'. You can compute equivalent balanced weights yourself (or with sklearn's compute_class_weight). My suggestion is to create a dictionary like class_weight = {0: a1, 1: a2} and try different values for a1 and a2, so you can see the difference.
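For reference, scikit-learn's 'balanced' heuristic weights each class by n_samples / (n_classes * n_samples_in_class). Here is a small sketch, assuming the 50/100 split from the example above, that builds such a dictionary by hand (sklearn.utils.class_weight.compute_class_weight applies the same formula):

```python
import numpy as np

# Hypothetical label array: 50 examples of class 0, 100 of class 1.
labels = np.array([0] * 50 + [1] * 100)

classes, counts = np.unique(labels, return_counts=True)
n = len(labels)

# 'balanced' heuristic: n_samples / (n_classes * count_c)
class_weight = {int(c): n / (len(classes) * cnt)
                for c, cnt in zip(classes, counts)}
# e.g. {0: 1.5, 1: 0.75} -- the minority class gets the larger weight
```

The resulting dictionary is what you would then pass as the class_weight argument of fit_generator.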
Alternatively, you can use undersampling to handle the imbalance instead of class_weight. Check bootstrapping/resampling methods for that purpose.
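A minimal undersampling sketch in that spirit, using hypothetical file names that follow the question's convention (names containing 'P' are the positive class): the majority class is randomly subsampled down to the minority class size before the path list is handed to the generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical file lists; names containing 'P' are the positive class,
# as in the question's generator.
pos_paths = [f"P_{i}.npy" for i in range(100)]  # majority class
neg_paths = [f"N_{i}.npy" for i in range(50)]   # minority class

# Undersample the majority class to the minority class size.
n = min(len(pos_paths), len(neg_paths))
pos_sample = list(rng.choice(pos_paths, size=n, replace=False))

# Combine and shuffle before feeding the paths to the generator.
balanced = pos_sample + neg_paths
order = rng.permutation(len(balanced))
balanced = [balanced[i] for i in order]
```

The downside of undersampling is that you discard data from the majority class, so class_weight is usually preferable when the dataset is small.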