How to eliminate suspicious barcode (like 123456) data

Problem description

Here's some barcode data from a pandas DataFrame:

737318  Sikat Botol Pigeon          4902508045506   75170
737379  Natur Manual Breast Pump    8850851860016   75170
738753  Sunlight                    1232131321313   75261
739287  Bodymist bodyshop           1122334455667   75296
739677  Bodymist ale                1234567890123   75367

I want to remove data that is suspicious (i.e. has too many repeated or successive digits), like 1232131321313, 1122334455667, 1234567890123, etc. I am very tolerant of false negatives, but want to avoid false positives (bad barcodes) as much as possible.

Recommended answer

As a first step, I would use the barcode's built-in validation mechanism: the checksum. Your barcodes appear to be GTIN barcodes (specifically GTIN-13), whose last digit is a check digit computed from the first 12, so you can use this method:

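>>> def CheckBarcode(s):
...     """Return True if the last digit of s matches its GTIN-13 check digit."""
...     total = 0  # avoid shadowing the built-in sum()
...     # Weight the digits before the check digit alternately by 1 and 3.
...     for i, digit in enumerate(s[:-1]):
...         total += int(digit) * ((i % 2) * 2 + 1)
...     # The check digit brings the total up to the next multiple of 10.
...     return (10 - total % 10) % 10 == int(s[-1])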

>>> CheckBarcode("4902508045506")
True
>>> CheckBarcode("8850851860016")
True
>>> CheckBarcode("1232131321313")
True
>>> CheckBarcode("1122334455667")
False
>>> CheckBarcode("1234567890123")
False
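
To apply the same check to a whole table, the sketch below filters the sample rows from the question and keeps only those whose barcode passes the checksum. It is a minimal illustration, not a definitive implementation: the function name `is_valid_gtin13` and the column meanings (id, name, barcode, group) are assumptions, since the original data has no headers, and plain tuples stand in for the DataFrame so the example runs without pandas.

```python
def is_valid_gtin13(code: str) -> bool:
    """Return True if code is 13 ASCII digits and its last digit is the
    correct GTIN-13 check digit for the first 12."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


# Sample rows from the question; the column meanings are assumed:
# (id, name, barcode, group)
rows = [
    (737318, "Sikat Botol Pigeon", "4902508045506", 75170),
    (737379, "Natur Manual Breast Pump", "8850851860016", 75170),
    (738753, "Sunlight", "1232131321313", 75261),
    (739287, "Bodymist bodyshop", "1122334455667", 75296),
    (739677, "Bodymist ale", "1234567890123", 75367),
]

# Keep only the rows whose barcode has a valid check digit.
kept = [row for row in rows if is_valid_gtin13(row[2])]

for row in kept:
    print(row)
```

With a real DataFrame, the equivalent filter would be along the lines of `df[df["barcode"].astype(str).map(is_valid_gtin13)]`, where `"barcode"` is a hypothetical column name. Note that 1232131321313 survives this filter because its check digit happens to be valid, so the checksum is only a first step and does not catch every suspicious pattern.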
