Extracting substring from string based on delimiter

Problem description

I am trying to extract the data from an encoded 2D barcode. The extraction part is working fine, and I can get the value in a text input.

E.g., the decoded string is

]d20105000456013482172012001000001/:210000000001

Based on the following rules (I couldn't get proper table markdown, so I attached a picture), I am trying to extract the substrings from the string mentioned above.

The substrings I want to extract:

05000456013482 (which is after the delimiter 01)

201200 (which is after the delimiter 17)

00001 (which is after the delimiter 10)

0000000001 (which is after the delimiter 21)

P.S. - The first 3 chars in the original string (]d2) are always the same, since they simply signify the decoding method.

Now there are a few quirks:

1) The number of characters after delimiter 10 is not fixed. So although it is 00001 in the example above, it could just as well be 001. Similarly, the number of characters after delimiter 21 is not fixed and can vary in length.

For the variable-length fields, I have added a constant terminator, /:, to mark where the field ends after scanning with a handheld device.

So I look for /: after delimiter 10 and extract the string until it hits /: or EOL, then find delimiter 21 and extract the string until it hits /: or EOL.

2) The number of characters after delimiters 01 and 17 is always fixed (14 and 6 characters respectively), as shown in the table.

Note: The position of the delimiters could change. In other words, the encoded barcode could be written in a different sequence.

]d20105000456013482172012001000001/:210000000001 - Note: No /: sign after the 21 group since it is at EOL.

]d2172012001000001/:210000000001/:0105000456013482 - Note: Both the 10 and 21 groups have a /: sign to signify that we have to extract up to that sign.

]d21000001/:210000000001/:010500045601348217201200 - Note: The first two are of varying length, and the next two are of fixed length.

I am not an expert in regex, and so far I have only tried some simple patterns like (01)(\d*)(21)(\d*)(10)(\d*)(17)(\d*)$, which doesn't work for the given example since it looks for 10 like the first 2 chars. Also, the substring(x, x) method only works for a fixed-length string, where I know which indexes to pluck the string from.

P.S. - Help in either JS or jQuery is appreciated.

Recommended answer

While you could try to write a very complicated regex to do this, it would be more readable and maintainable to parse the string in steps.

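The basic steps are:

  1. Remove the decoding-method characters (]d2).
  2. Slice off the first two characters from the result of step 1.
  3. Use them to choose a method for extracting the data.
  4. Remove that data from the string and save it, then repeat from step 2 until the string is exhausted.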

Since you have a table describing the structure of each AI (Application Identifier) and its data, you can write several methods to extract the different forms of data.

For instance, since AIs 01, 11, 15 and 17 are all fixed length, you can just use the string's slice method with the length

str.slice(0,14); //for 01
str.slice(0,6);  //for 11 15 17

The variable-length ones, like AI 21, would be something like

var fnc1 = "/:";
var fnc1Index = str.indexOf(fnc1);
str.slice(0,fnc1Index);
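
One detail worth guarding against in the snippet above: indexOf returns -1 when the terminator is absent (the field runs to EOL), and slice(0, -1) would then silently drop the last character. A minimal sketch of a helper that covers both cases follows; the name takeVariable and its return shape are hypothetical, not part of the original answer.

```javascript
// Hypothetical helper: split off a variable-length field that ends either at
// the "/:" terminator or at the end of the string.
function takeVariable(str, terminator = "/:") {
  const end = str.indexOf(terminator);
  if (end === -1) {
    // No terminator found: the field runs to the end of the string.
    return { value: str, rest: "" };
  }
  return {
    value: str.slice(0, end),
    // Skip past the terminator so the next AI starts the remainder.
    rest: str.slice(end + terminator.length)
  };
}

console.log(takeVariable("00001/:210000000001"));
// { value: '00001', rest: '210000000001' }
console.log(takeVariable("0000000001"));
// { value: '0000000001', rest: '' }
```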

Demo

var dataNames = {
  '01': 'GTIN',
  '10': 'batchNumber',
  '11': 'prodDate',
  '15': 'bestDate',
  '17': 'expireDate',
  '21': 'serialNumber'
};

var input = document.querySelector("input");
document.querySelector("button").addEventListener("click",function(){
  var str = input.value;
  console.log( parseGS1(str) );
});

function parseGS1(str) {
  var fnc1 = "/:";
  var data = {};
  
  //remove ]d2
  str = str.slice(3);
  while (str.length) {
    //get the AI identifier: 01,10,11 etc
    let aiIdent = str.slice(0, 2);
    //get the name we want to use for the data object
    let dataName = dataNames[aiIdent];
    //update the string
    str = str.slice(2);

    switch (aiIdent) {
      case "01":
        data[dataName] = str.slice(0, 14);
        str = str.slice(14);
        break;
      case "10":
      case "21":
        let fnc1Index = str.indexOf(fnc1);
        //eol or fnc1 cases
        if(fnc1Index==-1){
          data[dataName] = str.slice(0);
          str = "";
        } else {
          data[dataName] = str.slice(0, fnc1Index);
          str = str.slice(fnc1Index + 2);
        }
        break;
      case "11":
      case "15":
      case "17":
        data[dataName] = str.slice(0, 6);
        str = str.slice(6);
        break;
      default:
        console.log("unexpected ident encountered:", aiIdent);
        return false;
    }
  }
  return data;
}

<input><button>Parse</button>
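
As a quick check outside the browser, the same steps can be sketched in a table-driven form and run against the three orderings from the question. This is only a sketch under stated assumptions: AI_TABLE and parseBarcode are hypothetical names, the table lists only the AIs mentioned in this thread, and "/:" is assumed to be the only terminator.

```javascript
// Hypothetical AI table: a name plus a fixed length, or null for fields that
// end at the "/:" terminator or at the end of the string.
const AI_TABLE = {
  "01": { name: "GTIN", length: 14 },
  "10": { name: "batchNumber", length: null },
  "11": { name: "prodDate", length: 6 },
  "15": { name: "bestDate", length: 6 },
  "17": { name: "expireDate", length: 6 },
  "21": { name: "serialNumber", length: null }
};

function parseBarcode(input, terminator = "/:") {
  const data = {};
  let str = input.slice(3); // drop the "]d2" prefix
  while (str.length > 0) {
    const ai = str.slice(0, 2);
    const spec = AI_TABLE[ai];
    if (!spec) {
      throw new Error("Unexpected AI: " + ai);
    }
    str = str.slice(2);
    if (spec.length !== null) {
      // Fixed-length field: take exactly spec.length characters.
      data[spec.name] = str.slice(0, spec.length);
      str = str.slice(spec.length);
    } else {
      // Variable-length field: read up to the terminator, or to EOL.
      const end = str.indexOf(terminator);
      data[spec.name] = end === -1 ? str : str.slice(0, end);
      str = end === -1 ? "" : str.slice(end + terminator.length);
    }
  }
  return data;
}

// The three orderings given in the question.
const samples = [
  "]d20105000456013482172012001000001/:210000000001",
  "]d2172012001000001/:210000000001/:0105000456013482",
  "]d21000001/:210000000001/:010500045601348217201200"
];
for (const s of samples) {
  console.log(parseBarcode(s));
}
```

All three inputs should produce the same four values (GTIN 05000456013482, expireDate 201200, batchNumber 00001, serialNumber 0000000001), only with the keys inserted in a different order. Keeping the lengths in a table means supporting another fixed-length AI is a one-line change rather than a new case in the switch.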
