Parsing Through a MySQL Column to Extract Data

Question

In a digitized version of a 1950s automotive parts book, the model entries were designed to be read by human eyes, so they were abbreviated to save space. Of course, this doesn't work with my search engine. I know very little about regular expressions and even less about MySQL's REGEXP, so I hope someone here can assist. I did look through the MySQL manual, but I'm afraid I found nothing like this. There are other variations of the data too, but I want to start with this bit. I am currently doing the parsing with a PHP script that takes hours to run; even though it only needs to run when the data is updated, that is far too long.

Here is the data from a single row's field, separated by semicolons (NOTE: these are on one line in the original, but I added line breaks here to make it easier to read):

2452-62-65-67-72-92-95-98; 
2552-62-65-72-77-92-95; 
2650-51-52-62-65-72-77-92-95-97; 
5450-51-52-62-65-67-72-77-82-85-92-95-97

... and those that need parsing will always have the dashes, but I need them parsed to look like this. The parsing needs to ignore any text, since some rows have things like ALL 24TH; 25TH; 26TH, which post separately:

2452 2462 2465 2467 2472 2492 2495 2498; 
2552 2562 2565 2572 2577 2592 2595; 
2650 2651 2652 2662 2665 2672 2677 2692 2695 2697; 
5450 5451 5452 5462 5465 5467 5472 5477 5482 5485 5492 5495 5497
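
The rule implied by the sample above can be written down explicitly. The following is a minimal Python sketch, not part of the original post, and the function names `expand_group` and `expand_field` are made up for illustration. It assumes the first two digits of the leading four-digit number are the prefix shared by every two-digit suffix in the group:

```python
def expand_group(group: str) -> str:
    """Expand one abbreviated group, e.g. '2452-62-65' -> '2452 2462 2465'."""
    head, *suffixes = group.strip().split("-")
    # Assumption: the shared prefix is the first two digits of the leading number.
    prefix = head[:2]
    return " ".join([head] + [prefix + suffix for suffix in suffixes])


def expand_field(field: str) -> str:
    """Expand every semicolon-separated group in a field, skipping empty pieces."""
    groups = [piece for piece in field.split(";") if piece.strip()]
    return "; ".join(expand_group(group) for group in groups)


if __name__ == "__main__":
    print(expand_field("2452-62-65-67-72-92-95-98; 2552-62-65-72-77-92-95"))
```

This only covers fields made up entirely of dash groups; fields that also contain text need to be filtered first.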

This simple query will pull up a four-digit number, but I don't know how to proceed.

SELECT
    Models 
FROM
    parts_listing
WHERE
     Models REGEXP '^[0-9]{4}$'
ORDER BY Models;
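
One likely reason this query stops short is that `^` and `$` anchor the pattern to the whole value, so it only matches fields that are exactly four digits and never the abbreviated ones. The Python sketch below is an illustration only and is not from the original post. The second pattern is a hypothetical alternative, and it is an assumption that these simple constructs behave the same way in MySQL's REGEXP on your server version:

```python
import re

# The pattern from the query above: the whole value must be exactly four digits.
WHOLE_FOUR_DIGITS = re.compile(r"^[0-9]{4}$")
# Hypothetical alternative: a four-digit number followed by one or more
# two-digit "-NN" suffixes, anywhere in the value.
HAS_DASH_GROUP = re.compile(r"[0-9]{4}(-[0-9]{2})+")


def is_whole_four_digits(value: str) -> bool:
    return WHOLE_FOUR_DIGITS.search(value) is not None


def has_dash_group(value: str) -> bool:
    return HAS_DASH_GROUP.search(value) is not None


if __name__ == "__main__":
    samples = ["2501", "2206-13-26-33; 2302-06-13-32-33", "ALL 24TH; 25TH; 26TH"]
    for value in samples:
        print(value, is_whole_four_digits(value), has_dash_group(value))
```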

Here's a shorter example of a basic row, or grouping of models for a given part:

2206-13-26-33; 2302-06-13-32-33

... but the row may also contain other info (that I'll eventually need to parse too), as shown in these examples. For the first and second we need to parse out the ALL and the ordinal, and for the third we need to parse out anything between parentheses. The fourth is an example that begins and ends with ALL values but has some dash-separated values in between.

ALL 24TH; 25TH; 26TH
2262-65-70-71-72-75-76-77-79-80-82-86-92-93-95; ALL 23RD
5401-11-31 (BODIES 5467-77-97)
ALL 22ND; 2301-02-13-32; ALL 24TH; 25TH; 26TH; 54TH

There are also some with LHD (or sometimes RHD), either at the beginning or, in some cases, elsewhere in the row; it always has a comma after it:

LHD, 2401 (BODIES 2462-65-92-95-98); 2501; 2601-33; 5400-33

It's possible there are other variations too but for now it's only the basic model information that I'm after.
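
To make the "basic model information only" requirement concrete, here is a rough Python sketch that keeps the plain model groups and skips everything else. It is not from the original post. The function name `basic_model_groups` is made up, and three things are assumptions: that parenthesized text such as (BODIES ...) can be dropped for now, that LHD/RHD appears at the start of a semicolon-separated piece, and that ALL and ordinal entries are simply ignored:

```python
import re

PARENTHESIZED = re.compile(r"\([^)]*\)")             # e.g. "(BODIES 5467-77-97)"
SIDE_MARKER = re.compile(r"^(LHD|RHD),\s*")          # e.g. "LHD, " at the start of a piece
MODEL_GROUP = re.compile(r"^[0-9]{4}(-[0-9]{2})*$")  # "2501" or "2601-33"


def basic_model_groups(field: str) -> list[str]:
    """Return only the plain model groups found in a field."""
    without_parens = PARENTHESIZED.sub("", field)
    groups = []
    for piece in without_parens.split(";"):
        piece = SIDE_MARKER.sub("", piece.strip()).strip()
        # Text such as "ALL 24TH" or "25TH" does not match and is skipped.
        if MODEL_GROUP.match(piece):
            groups.append(piece)
    return groups


if __name__ == "__main__":
    print(basic_model_groups("LHD, 2401 (BODIES 2462-65-92-95-98); 2501; 2601-33; 5400-33"))
```

The groups returned this way still need the prefix expansion; the ALL, ordinal and BODIES parts would need their own rules later.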

Recommended answer

This will work if the row has up to 9 parts.
If a row can have more parts, you can extend the subquery to include numbers beyond 9:

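-- Expand each semicolon-separated group, e.g. '2206-13-26-33' -> '2206 2213 2226 2233'.
select
  group_concat(
    -- each '-' becomes a space plus the group's two-digit prefix
    replace(t.part, '-', concat(' ', left(t.part, 2)))
    order by t.partno
    separator ' '
  ) Models
from (
  select t.Models, p.partno,
    -- part number partno = (first partno parts) minus (first partno - 1 parts);
    -- trim() drops the space that follows each ';' in the sample data
    trim(replace(replace(
      substring_index(t.Models, ';', p.partno),
      substring_index(t.Models, ';', p.partno - 1),
      ''
    ), ';', '')) part
  from parts_listing t cross join (
    select 1 partno union all select 2 union all select 3 union all
    select 4 union all select 5 union all select 6 union all
    select 7 union all select 8 union all select 9
  ) p
  -- keep only rows made up of digits, dashes, semicolons and spaces
  where replace(replace(replace(t.Models, '-', ''), ';', ''), ' ', '') regexp '^[0-9]*$'
) t
where t.part <> ''
group by t.Models
order by t.Models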
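
To see how the query isolates each part, the same steps can be traced outside the database. This is a Python sketch for illustration only: `substring_index` here is a hand-written stand-in that mimics MySQL's SUBSTRING_INDEX for non-negative counts, and `nth_part` and `expand_part` are made-up names for the inner and outer expressions of the query:

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Mimic MySQL SUBSTRING_INDEX for count >= 0: everything before the count-th delimiter."""
    if count <= 0:
        return ""
    return delim.join(text.split(delim)[:count])


def nth_part(models: str, n: int) -> str:
    """Part n = (first n parts) with (first n - 1 parts) removed, minus the ';'."""
    up_to_n = substring_index(models, ";", n)
    up_to_prev = substring_index(models, ";", n - 1)
    # MySQL's REPLACE leaves the string unchanged when the search string is empty.
    remainder = up_to_n.replace(up_to_prev, "") if up_to_prev else up_to_n
    # strip() drops the space that follows each ';' in the sample data.
    return remainder.replace(";", "").strip()


def expand_part(part: str) -> str:
    """Each '-' becomes a space plus the part's first two digits."""
    return part.replace("-", " " + part[:2])


if __name__ == "__main__":
    row = "2206-13-26-33; 2302-06-13-32-33"
    parts = [nth_part(row, n) for n in range(1, 10)]  # 1..9, as in the subquery
    print(" ".join(expand_part(p) for p in parts if p))
```

For part numbers beyond the last part, both prefixes equal the whole string, so the remainder is empty; that is why the query filters on `part <> ''`.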
