How to eliminate suspicious barcode (like 123456) data
Question
Here's some barcode data from a pandas DataFrame:
737318 Sikat Botol Pigeon 4902508045506 75170
737379 Natur Manual Breast Pump 8850851860016 75170
738753 Sunlight 1232131321313 75261
739287 Bodymist bodyshop 1122334455667 75296
739677 Bodymist ale 1234567890123 75367
I want to remove data that is suspicious (i.e. has too many repeated or successive digits), like 1232131321313, 1122334455667, 1234567890123, etc. I am very tolerant of false negatives, but want to avoid false positives (bad bar codes) as much as possible.
Recommended answer
As a first step, I would use the barcodes' built-in validation mechanism: the checksum. As your barcodes appear to be GTIN barcodes (specifically GTIN-13), you can use this method:
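>>> import math
>>> def CheckBarcode(s):
...     # Weighted sum of every digit except the last; weights alternate 1, 3, 1, 3, ...
...     total = 0
...     for i in range(len(s[:-1])):
...         total += int(s[i]) * ((i % 2) * 2 + 1)
...     # The check digit is whatever brings the sum up to the next multiple of 10.
...     return math.ceil(total / 10) * 10 - total == int(s[-1])
...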
>>> CheckBarcode("4902508045506")
True
>>> CheckBarcode("8850851860016")
True
>>> CheckBarcode("1232131321313")
True
>>> CheckBarcode("1122334455667")
False
>>> CheckBarcode("1234567890123")
False
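Note that 1232131321313 passes the checksum, so the check digit alone does not catch every suspicious entry from the question. As a rough sketch (not part of the original answer), the checksum could be combined with a simple pattern heuristic. The helper names below and the threshold of three distinct digits are assumptions chosen for illustration, not an established rule, and the threshold would need tuning against real data:

```python
def gtin13_is_valid(code: str) -> bool:
    """Return True if `code` is 13 digits and its GTIN-13 check digit matches."""
    if len(code) != 13 or not code.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def has_few_distinct_digits(code: str, max_distinct: int = 3) -> bool:
    """Hypothetical heuristic: flag codes built from very few different digits."""
    return len(set(code)) <= max_distinct


def is_suspicious(code: str) -> bool:
    """Suspicious if the checksum fails or the digits look too repetitive."""
    return not gtin13_is_valid(code) or has_few_distinct_digits(code)


# The five barcodes from the question.
barcodes = [
    "4902508045506",
    "8850851860016",
    "1232131321313",
    "1122334455667",
    "1234567890123",
]

kept = [b for b in barcodes if not is_suspicious(b)]
print(kept)  # ['4902508045506', '8850851860016']
```

With a DataFrame, the same function could be applied to the barcode column, for example `df[~df["barcode"].astype(str).map(is_suspicious)]`, where the column name `barcode` is assumed. Barcodes stored as integers lose leading zeros, so they may need zero-padding to 13 characters before validation.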