How to remove duplicate items during training CNN?
Question
I'm working on an image classification problem using a CNN. My image data set contains duplicated images, and when I train the CNN on this data it overfits. Therefore, I need to remove those duplicates.
Answer
What we loosely refer to as duplicates can be difficult for an algorithm to discern. Your duplicates can be:
- exact duplicates
- near-exact duplicates (e.g. minor edits of the image)
- perceptual duplicates (same content, but a different view, camera, etc.)
Nos. 1 and 2 are easier to solve. No. 3 is very subjective and still a research topic. I can offer a solution for Nos. 1 and 2. Both solutions use the excellent image hashing library: https://github.com/JohannesBuchner/imagehash
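To make the hash-distance idea concrete, here is a minimal pure-Python sketch of the average-hash principle on toy grayscale grids. This is illustrative only; the imagehash library does the real resizing and hashing. The `average_hash` and `hamming` helpers below are hypothetical stand-ins, not the library's API:

```python
# Toy sketch of average hashing: assumes images are already reduced to
# small grayscale grids of equal size (imagehash handles this for real images).

def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is >= the grid mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p >= mean else 0 for p in flat)

def hamming(h1, h2):
    """Number of differing bits -- a small distance means similar images."""
    return sum(b1 != b2 for b1, b2 in zip(h1, h2))

img_a = [[10, 200], [220, 30]]   # toy 2x2 "image"
img_b = [[12, 198], [225, 28]]   # slightly altered copy
img_c = [[200, 10], [30, 220]]   # very different layout

ha, hb, hc = average_hash(img_a), average_hash(img_b), average_hash(img_c)
print(hamming(ha, hb))  # 0 -> near duplicate
print(hamming(ha, hc))  # 4 -> clearly different
```

Small pixel changes rarely flip a bit relative to the mean, so near-identical images land on identical or nearby hashes.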
- Exact duplicates can be found using a perceptual hashing method. The phash library is very good at this; I regularly use it to clean training data. Usage (from the GitHub site) is as simple as:
from PIL import Image
import imagehash

# image_fns : list of training image files
img_hashes = {}
for img_fn in sorted(image_fns):
    hash = imagehash.average_hash(Image.open(img_fn))
    if hash in img_hashes:
        print('{} duplicate of {}'.format(img_fn, img_hashes[hash]))
    else:
        img_hashes[hash] = img_fn
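Note that because the dict keeps only the first file seen for each hash, its values are already the deduplicated file list. A tiny sketch of that keep-first pattern, with hypothetical string stand-ins for the hash values:

```python
# Keep-first deduplication: map each hash to the first file that produced it.
fake_hashes = ['h1', 'h2', 'h1', 'h3']            # stand-ins for imagehash values
fake_files = ['a.png', 'b.png', 'c.png', 'd.png']

seen = {}
for h, fn in zip(fake_hashes, fake_files):
    if h not in seen:
        seen[h] = fn

unique_files = list(seen.values())
print(unique_files)  # ['a.png', 'b.png', 'd.png'] -- c.png collided with a.png
```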
- Near-exact duplicates: in this case you have to set a threshold and compare the distance between each pair of hash values. The threshold must be found by trial and error for your image content.
from itertools import combinations

from PIL import Image
import imagehash

# image_fns : list of training image files
epsilon = 50

# compute each hash only once
img_hashes = {img_fn: imagehash.average_hash(Image.open(img_fn))
              for img_fn in image_fns}

# compare every unordered pair of images
for img_fn1, img_fn2 in combinations(image_fns, 2):
    if img_hashes[img_fn1] - img_hashes[img_fn2] < epsilon:
        print('{} is near duplicate of {}'.format(img_fn1, img_fn2))
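The snippets above only report duplicates; to actually drop them from the training set you keep one representative per near-duplicate group. A minimal greedy sketch, with plain integers standing in for hash values and XOR bit-count as the distance (the `dedupe` helper is a hypothetical name, not from the library):

```python
def dedupe(hashes, epsilon):
    """Keep one file per near-duplicate group.

    hashes  : dict mapping filename -> integer hash value
    epsilon : minimum bit distance required to count as a distinct image
    """
    kept = {}  # filename -> hash of each kept representative
    for fn, h in sorted(hashes.items()):
        # distance = number of differing bits between the two hashes
        if all(bin(h ^ kh).count('1') >= epsilon for kh in kept.values()):
            kept[fn] = h
    return sorted(kept)

hashes = {'a.png': 0b1111, 'b.png': 0b1110, 'c.png': 0b0000}
print(dedupe(hashes, epsilon=2))  # ['a.png', 'c.png'] -- b.png is 1 bit from a.png
```

Greedy selection keeps the first file of each group in sorted order; for a real data set you would feed in the imagehash values instead of integers.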