Splitting string to parts with specific pattern and conditions


Problem description

I have an array of about 5k+ strings like the one below, produced as output by a certain application (for security reasons I cannot provide the exact data, but the example format is very similar to the actual data):

kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad
asdfasdf66sdafsdfsdf4560sdfasdfasdf
sdfaasdfs96sadfasdf65459asdfasdf
sadfasdf8asdfasdas06666654asdfasdfsd
fasdjfsdjfhgasdf6456sadfasdfasdf9sdfasdfsadf

Simply put, each line is an unbroken alphanumeric string that consists of 5 parts:

[latin letters][1 or more digits][latin letters][1 or more digits][latin letters]

The length of the letter parts and the number of digits are random, and the overall string length may vary from a few characters to 200-300, but the pattern is always as above.

In practice I'm only interested in the leading and trailing letter parts: the middle section, [1 or more digits][latin letters][1 or more digits], may simply be thrown away, while the 2 outer strings should be extracted to separate cells.

I tried the SUBSTITUTE and SEARCH functions, but I still cannot handle the random number of digits. VBA is the least desired approach, but it is acceptable if pure formulas cannot do the job. Moreover, the solution should be flexible enough for possible future use with similar patterns, so any guidance or general approach will be appreciated.

Solution

If you don't mind using MS Word instead of Excel, there is a very straightforward approach for such tasks: the built-in Find and Replace routine with wildcards. Assuming the data can be opened in Word, do the following:

  1. Press CTRL+H to open the Replace dialog.
  2. Tick the Use wildcards option.
  3. The part of your data you want to throw away matches the following pattern: [0-9]{1,}*[0-9]{1,} - which means one or more digits, then any characters, then one or more digits. Depending on your regional settings you may need ; instead of , inside the braces.
  4. As the replacement, specify any character you like, e.g. ^t (Tab) or ; - it will serve as the separator for splitting the remaining parts.
  5. Perform the replacement.
  6. Optionally, convert the result to a table using the Ribbon feature Insert > Table > Convert Text to Table...

All you need to do now is save the result or paste it where you need it.
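
For readers who would rather script the task than use Word, the same "match the middle, keep the two ends" idea can be sketched with a regular expression. The following is a minimal Python sketch and not part of the original answer: the function name `split_outer_parts` is hypothetical, and the pattern assumes the middle letters are plain ASCII Latin letters, which is stricter than the Word wildcard `*` (that one matches any characters).

```python
import re

# Rough analogue of the Word wildcard [0-9]{1,}*[0-9]{1,}:
# a run of digits, a run of Latin letters, then another run of digits.
# (Assumption: ASCII letters only in the middle part.)
MIDDLE = re.compile(r"[0-9]+[A-Za-z]+[0-9]+")


def split_outer_parts(line):
    """Return (leading, trailing) letter parts, dropping the middle section."""
    parts = MIDDLE.split(line)
    if len(parts) != 2:
        # Either no digits-letters-digits section was found or more than one was.
        raise ValueError(f"line does not match the expected pattern: {line!r}")
    return parts[0], parts[1]


samples = [
    "kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad",
    "asdfasdf66sdafsdfsdf4560sdfasdfasdf",
    "sdfaasdfs96sadfasdf65459asdfasdf",
    "sadfasdf8asdfasdas06666654asdfasdfsd",
    "fasdjfsdjfhgasdf6456sadfasdfasdf9sdfasdfsadf",
]

for sample in samples:
    leading, trailing = split_outer_parts(sample)
    # Tab-separated output pastes into two adjacent spreadsheet cells.
    print(leading, trailing, sep="\t")
```

Raising an error on lines that do not split into exactly two parts is a design choice made here so that malformed input is noticed rather than silently mis-split; for a different but similar pattern, only the `MIDDLE` expression should need to change.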

In fact, the approach is quite powerful, and many routine text-parsing tasks similar to yours can be done quickly this way without special skills or programming. You also don't need any third-party tool for it, since Word is already installed on many PCs.

Read more about patterns and applicable cases:
