Splitting a string containing letters and numbers not separated by any particular delimiter in PHP

Question

Currently I am developing a web application that fetches the Twitter stream, and I am trying to build my own natural language processing.

Since my data comes from Twitter (limited to 140 characters), many words are shortened or, as in this case, spaces are omitted.

For example:

"Hi, my name is Bob. I m 19yo and 170cm tall"

should be tokenized as:

- hi
- my
- name
- bob
- i
- 19
- yo
- 170
- cm
- tall

Notice that 19 and yo in 19yo have no space between them. I use it mostly for extracting numbers with their units.
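Since the main goal is pulling out numbers together with their units, here is a minimal sketch of that step on its own. It assumes a unit is a run of ASCII letters written directly after the digits, and the function name extractNumbersWithUnits is hypothetical, not part of the question or of PHP.

```php
<?php
// Hypothetical helper: pull "number + unit" pairs such as "19yo" or "170cm"
// out of a tweet. Assumption: the unit is a run of ASCII letters that
// directly follows the digits, with no space in between.
function extractNumbersWithUnits(string $text): array
{
    preg_match_all('/(\d+)([a-z]+)/i', $text, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $m) {
        // $m[1] is the digit run, $m[2] is the letter run right after it
        $pairs[] = ['value' => (int) $m[1], 'unit' => strtolower($m[2])];
    }
    return $pairs;
}

print_r(extractNumbersWithUnits("Hi, my name is Bob. I m 19yo and 170cm tall"));
```

Because the pattern requires the letters to follow the digits immediately, a spaced form such as "170 cm" is not picked up by this sketch.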

Simply put, what I need is a way to 'explode' each token that contains a number into chunks of digits and letters, without any delimiter.

'123abc' would become ['123', 'abc']

'abc123' would become ['abc', '123']

'abc123xyz' would become ['abc', '123', 'xyz']

and so on.
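The per-token requirement above can be sketched without splitting at all, by matching every maximal run of digits or non-digits with preg_match_all. This is an alternative technique to the preg_split approach, shown as a minimal sketch; the function name splitToken is hypothetical, and it assumes the input is a single token (since \D+ also matches spaces and punctuation).

```php
<?php
// Hypothetical helper: split one token into alternating runs of digits and
// non-digits by *matching* each maximal run instead of splitting:
//   \d+ -> one or more digits
//   \D+ -> one or more non-digit characters
function splitToken(string $token): array
{
    preg_match_all('/\d+|\D+/', $token, $matches);
    return $matches[0]; // all full-pattern matches, in order
}

foreach (['123abc', 'abc123', 'abc123xyz'] as $token) {
    echo $token, ' => ', implode(', ', splitToken($token)), PHP_EOL;
}
```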

What is the best way to achieve it in PHP?

I found something close to it, but it is in C# and specifically for day/month splitting: How do I split a string in C# based on letters and numbers

Answer

You can use preg_split:

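$string = "Hi, my name is Bob. I m 19yo and 170cm tall";
// Split on: an optional comma followed by whitespace, OR the zero-width position
// between a letter and a digit, OR between a digit and a letter (/i = case-insensitive)
$parts = preg_split('/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i', $string);
var_dump($parts);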

When matching against the digit-letter boundary, the regular expression match must be zero-width: the characters themselves must not be included in the match. Zero-width lookarounds are useful for this.
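To illustrate why the boundary match has to be zero-width, this small sketch contrasts a consuming split with a lookaround-based split on the same token. The variable names are illustrative only.

```php
<?php
$token = 'abc123xyz';

// Consuming split: the digits ARE the delimiter, so they are thrown away.
$consumed = preg_split('/\d+/', $token);
// -> ['abc', 'xyz']

// Zero-width split: the lookbehind/lookahead pair only asserts that a
// letter/digit boundary exists at this position; it matches no characters,
// so nothing is removed from the pieces.
$zeroWidth = preg_split('/(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])/i', $token);
// -> ['abc', '123', 'xyz']

var_dump($consumed, $zeroWidth);
```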

http://codepad.org/i4Y6r6VS
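For reuse, the answer's pattern can be wrapped in a function; the sketch below (function names are hypothetical) lowercases the input and passes PREG_SPLIT_NO_EMPTY so empty pieces are dropped. Note that the answer's first alternative only swallows commas before whitespace, so "Bob." keeps its period. A second, swapped-in technique that matches runs of ASCII letters or digits with preg_match_all avoids that, at the cost of ignoring non-ASCII letters. Neither version removes words such as "is" or "and", which the question's expected list also leaves out.

```php
<?php
// Hypothetical wrapper around the answer's regex.
// PREG_SPLIT_NO_EMPTY drops empty pieces; strtolower() gives lowercase tokens.
function tokenizeBySplit(string $text): array
{
    return preg_split(
        '/(,?\s+)|((?<=[a-z])(?=\d))|((?<=\d)(?=[a-z]))/i',
        strtolower($text),
        -1,
        PREG_SPLIT_NO_EMPTY
    );
}

// Alternative technique: match runs of ASCII letters or runs of digits directly.
// Any other character (punctuation, whitespace) is simply never matched.
function tokenizeByMatch(string $text): array
{
    preg_match_all('/[a-z]+|\d+/', strtolower($text), $matches);
    return $matches[0];
}

$tweet = "Hi, my name is Bob. I m 19yo and 170cm tall";
print_r(tokenizeBySplit($tweet)); // "bob." keeps its period
print_r(tokenizeByMatch($tweet)); // "bob" without punctuation
```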
