Split a sentence into separate words


Question

I need to split a Chinese sentence into separate words. The problem with Chinese is that words are not separated by spaces. For example, a sentence may look like this: 主楼怎么走 (with spaces it would be: 主楼 怎么 走).

At the moment I can think of one solution. I have a dictionary of Chinese words (in a database). The script will:

  1. try to find the first two characters of the sentence in the database (主楼),
  2. if 主楼 is actually a word and it is in the database, try to find the first three characters (主楼怎). 主楼怎 is not a word, so it is not in the database => my application now knows that 主楼 is a separate word,
  3. try the same with the rest of the characters.
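The steps above can be sketched as follows. This is only an illustration of the procedure as described, not the asker's actual script: the names `segment_incremental`, `lookup`, and `DICTIONARY` are hypothetical, an in-memory set stands in for the database table, and treating a single character as a word when no two-character match exists is an assumption the question does not spell out.

```python
from typing import Callable, List


def segment_incremental(sentence: str, is_word: Callable[[str], bool]) -> List[str]:
    """Split `sentence` by growing each candidate while it is still a known word."""
    words = []
    pos = 0
    while pos < len(sentence):
        # Assumption: a lone character counts as a word if nothing longer matches.
        length = 1
        # Try two characters, then three, and so on, until the lookup fails.
        while pos + length < len(sentence) and is_word(sentence[pos:pos + length + 1]):
            length += 1
        words.append(sentence[pos:pos + length])
        pos += length
    return words


# Hypothetical stand-in for the database table of known words.
DICTIONARY = {"主楼", "怎么", "走"}
lookups = []


def lookup(candidate: str) -> bool:
    """Record every lookup so the number of 'database queries' is visible."""
    lookups.append(candidate)
    return candidate in DICTIONARY


print(segment_incremental("主楼怎么走", lookup))
print(len(lookups))
```

For this five-character sentence the sketch performs four lookups (主楼, 主楼怎, 怎么, 怎么走). If each one were a separate database query, the count would grow with the length of the text.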

I don't really like this approach, because analyzing even a small text would require too many database queries.
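One way to reduce the number of queries, offered here only as a sketch and not as something proposed in the question, is to read the dictionary from the database once, keep it in memory as a set, and match the longest known word at each position (commonly called forward maximum matching). The function names and the sample words below are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def load_dictionary(words: Iterable[str]) -> Set[str]:
    """Build the in-memory dictionary; in practice `words` would come from one database query."""
    return set(words)


def segment_max_match(sentence: str, dictionary: Set[str],
                      max_len: Optional[int] = None) -> List[str]:
    """Greedy forward maximum matching against an in-memory dictionary."""
    if max_len is None:
        # Longest dictionary entry bounds how far ahead we need to look.
        max_len = max(1, max((len(w) for w in dictionary), default=1))
    words = []
    pos = 0
    while pos < len(sentence):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            # Assumption: an unknown single character is emitted as its own word.
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                pos += length
                break
    return words


dictionary = load_dictionary(["主楼", "怎么", "走"])
print(segment_max_match("主楼怎么走", dictionary))
```

Set membership tests happen in memory, so the database is touched only when the dictionary is loaded. Greedy matching can still pick the wrong split for ambiguous character sequences, which is one reason dedicated segmentation tools exist.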

Are there any other solutions?

Recommended answer

Thank you all for your help!

After a little research I found some working tools (keeping all your suggestions in mind), which is why I'm answering my own question.

  1. A PHP class (http://www.phpclasses.org/browse/package/2431.html)
  2. A Drupal module, basically another PHP solution with 4 different segmentation algorithms (it is pretty easy to understand how it works) (http://drupal.org/project/csplitter)
  3. A PHP extension for Chinese word segmentation (http://code.google.com/p/phpcws/)

There are some other solutions available if you search baidu.com for "中文分词".

Regards,

Equ
