Extracting substring from string based on delimiter
Question
I am trying to extract the data from an encoded 2D barcode. The extraction part works fine, and I can get the value into a text input.
E.g., the decoded string is
]d20105000456013482172012001000001/:210000000001
Based on the following rules (I couldn't get the table markdown to work, so I attached a picture), I am trying to extract the substrings from the string above.
The substrings I want to extract:
05000456013482 (which is after the delimiter 01)
201200 (which is after the delimiter 17)
00001 (which is after the delimiter 10)
0000000001 (which is after the delimiter 21)
P.S. The first 3 chars in the original string (]d2) are always the same, since they simply signify the decoding method.
Now there are a few quirks:
1) The number of characters after delimiter 10 is not fixed. So, in the example above it is 00001, but it could just as well be 001. Similarly, the number of characters after delimiter 21 is not fixed and can vary in length.
For the variable-length delimiters, I have added a constant /: to determine where the encoded value ends after scanning with a handheld device.
So I have to look for /: after delimiter 10 and extract the string until it hits /: or EOL, then find delimiter 21 and extract the string until it hits /: or EOL.
2) The number of characters after delimiters 01 and 17 is always fixed (14 characters and 6 characters respectively), as shown in the table.
Note: The position of the delimiters could change. In other words, the encoded barcode could be written in a different sequence.
]d20105000456013482172012001000001/:210000000001 - Note: no /: sign after the 21 group, since it is at EOL
]d2172012001000001/:210000000001/:0105000456013482 - Note: both the 10 and 21 groups have the /: sign, signifying that we have to extract up to that sign
]d21000001/:210000000001/:010500045601348217201200 - The first two are of varying length, and the next two are of fixed length.
I am not an expert in regex, and thus far I have only tried some simple patterns like (01)(\d*)(21)(\d*)(10)(\d*)(17)(\d*)$, which doesn't work in the given example since it looks for 10 like the first 2 chars. Also, the substring(x, x) method only works for a fixed-length string, when I know which indexes to pluck the string from.
P.S. Either JS or jQuery help is appreciated.
Recommended answer
While you could try to make a very complicated regex to do this, it would be more readable and maintainable to parse through the string in steps.
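The basic steps are:
- Remove the decoding method characters (]d2).
- Split off the first two characters from the result of step 1.
- Use them to choose a method for extracting the data.
- Remove and save that data from the string, then repeat from step 2 until the string is exhausted.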
Now, since you have a table of the AI (application identifier) and data structure, you can write several methods to extract the different forms of data.
For instance, since AIs 01, 11, 15, and 17 are all fixed length, you can just use the string's slice method with the length:
str.slice(0,14); //for 01
str.slice(0,6); //for 11 15 17
While the variable-length ones, like AI 21, would be something like
var fnc1 = "/:";
var fnc1Index = str.indexOf(fnc1);
str.slice(0,fnc1Index);
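One caveat with the variable-length snippet: when the field is the last one in the barcode there is no /: after it, so indexOf returns -1 and str.slice(0, -1) silently drops the final character. Below is a minimal sketch of two helpers that cover both the fixed-length and the variable-length case. The names takeFixed and takeVariable are hypothetical (they are not part of the original answer), and /: is the terminator chosen in the question.

```javascript
// Hypothetical helpers; each returns the extracted value and the unparsed rest.
function takeFixed(str, length) {
  return { value: str.slice(0, length), rest: str.slice(length) };
}

function takeVariable(str, terminator) {
  var end = str.indexOf(terminator);
  if (end === -1) {
    // No terminator found: the field runs to the end of the string.
    return { value: str, rest: "" };
  }
  // Skip over the terminator itself when computing the rest.
  return { value: str.slice(0, end), rest: str.slice(end + terminator.length) };
}

console.log(takeFixed("05000456013482172012", 14));
console.log(takeVariable("00001/:210000000001", "/:"));
console.log(takeVariable("0000000001", "/:"));
```

Returning the remainder alongside the value keeps the "extract, then shorten the string" step in one place, which is what the parsing loop needs on every iteration.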
Demo
var dataNames = {
  '01': 'GTIN',
  '10': 'batchNumber',
  '11': 'prodDate',
  '15': 'bestDate',
  '17': 'expireDate',
  '21': 'serialNumber'
};
var input = document.querySelector("input");
document.querySelector("button").addEventListener("click", function () {
  var str = input.value;
  console.log(parseGS1(str));
});
function parseGS1(str) {
  var fnc1 = "/:";
  var data = {};
  //remove ]d2
  str = str.slice(3);
  while (str.length) {
    //get the AI identifier: 01,10,11 etc
    let aiIdent = str.slice(0, 2);
    //get the name we want to use for the data object
    let dataName = dataNames[aiIdent];
    //update the string
    str = str.slice(2);
    switch (aiIdent) {
      case "01":
        data[dataName] = str.slice(0, 14);
        str = str.slice(14);
        break;
      case "10":
      case "21": {
        let fnc1Index = str.indexOf(fnc1);
        //eol or fnc1 cases
        if (fnc1Index == -1) {
          data[dataName] = str.slice(0);
          str = "";
        } else {
          data[dataName] = str.slice(0, fnc1Index);
          str = str.slice(fnc1Index + 2);
        }
        break;
      }
      case "11":
      case "15":
      case "17":
        data[dataName] = str.slice(0, 6);
        str = str.slice(6);
        break;
      default:
        //stop parsing on an AI that is not in the table
        console.log("unexpected ident encountered:", aiIdent);
        return false;
    }
  }
  return data;
}
<input><button>Parse</button>
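As a quick check, the same step-by-step approach can be run against the three sample strings from the question without a browser. The sketch below is a table-driven variant of the demo, not the demo code verbatim: AI_TABLE and parseBarcode are hypothetical names, the fixed lengths (14 and 6) come from the question and the answer, and /: is the asker's own terminator. It assumes every AI is exactly two characters long.

```javascript
// Hypothetical lookup table: AI -> field name and fixed length (null = variable length).
var AI_TABLE = {
  "01": { name: "GTIN", length: 14 },
  "10": { name: "batchNumber", length: null },
  "11": { name: "prodDate", length: 6 },
  "15": { name: "bestDate", length: 6 },
  "17": { name: "expireDate", length: 6 },
  "21": { name: "serialNumber", length: null }
};

function parseBarcode(str, terminator) {
  var data = {};
  var rest = str.slice(3); // drop the "]d2" prefix
  while (rest.length) {
    var spec = AI_TABLE[rest.slice(0, 2)];
    if (!spec) {
      return null; // unknown AI: stop rather than guess
    }
    rest = rest.slice(2);
    var end;
    var skip = 0;
    if (spec.length !== null) {
      end = spec.length; // fixed-length field
    } else {
      end = rest.indexOf(terminator);
      if (end === -1) {
        end = rest.length; // last field: runs to the end of the string
      } else {
        skip = terminator.length; // also consume the terminator
      }
    }
    data[spec.name] = rest.slice(0, end);
    rest = rest.slice(end + skip);
  }
  return data;
}

var samples = [
  "]d20105000456013482172012001000001/:210000000001",
  "]d2172012001000001/:210000000001/:0105000456013482",
  "]d21000001/:210000000001/:010500045601348217201200"
];
samples.forEach(function (s) {
  console.log(parseBarcode(s, "/:"));
});
```

All three orderings should yield the same four values, since each field is consumed from the front of the string regardless of where it appears. Keeping the lengths in a table means a new AI can be supported by adding one entry instead of another case branch.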