PHP: word proximity script?


Question

Okay - so, I've spent ages searching Google, and even ran a few specific searches at hotscripts etc., several PHP forums and this place ... nothing (nothing of use, anyway).

I want to be able to take a block of text (page/file/doc) and pull it apart to find the "distance" between specific terms (find the proximity/relative distance etc.).

I would have thought there'd be at least a few such things around, but I'm not finding them. So it may be harder than I thought. I understand it may be a somewhat "hungry" endeavour, as it's likely to be fairly intensive on large documents, but surely it is possible?

In fact, whilst looking around, the majority of references that I find (apart from lame, repetitive SEO sites) seem to suggest advanced linguistic studies, strange/advanced packages to install onto a server etc.

Am I to assume that "proximity" is in fact a highly complex issue that will require serious resources and an awful lot of development? Honestly, in my mind it seems somewhat moderate, so I'm wondering exactly what it is I'm missing. (Note: I mean simple in a relative sense. On a scale from easy (density/count) through to difficult (word stemming/base forms/thesaurus lookups), I would place it somewhere in the middle.)

So - references/suggestions/ideas/thoughts?

Recommended answer

Your example searched for Word1 ... Word2; should Word2 ... Word1 also be matched? A simple solution is to use a regex:

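That is:

  1. Use the regex: \bWord1\b(.*)\bWord2\b
  2. Take the first matched group, split it into an array on whitespace (or whatever boundary you choose), and count the elements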
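
The two steps above can be sketched in PHP roughly as follows. This is only an illustration under a few assumptions: `wordsBetween` is a made-up helper name, the match is case-insensitive, and the capture group is lazy (`.*?`) rather than the greedy `.*` in the pattern above, so that the match stops at the first Word2 after Word1 instead of the last one.

```php
<?php
// Hypothetical helper (not from the original answer): counts the words
// between the first occurrence of $a and the next occurrence of $b.
// Returns null when $a followed by $b is not found in that order.
function wordsBetween(string $text, string $a, string $b): ?int
{
    // Step 1: \bWord1\b(.*?)\bWord2\b, with the search terms escaped.
    // "i" = case-insensitive, "s" = let "." match newlines too.
    $pattern = '/\b' . preg_quote($a, '/') . '\b(.*?)\b'
             . preg_quote($b, '/') . '\b/is';

    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }

    // Step 2: split the captured gap on non-word characters and count.
    $gap = preg_split('/\W+/', $m[1], -1, PREG_SPLIT_NO_EMPTY);

    return count($gap);
}

$text = 'The quick brown fox jumps over the lazy dog';
var_dump(wordsBetween($text, 'quick', 'lazy'));
var_dump(wordsBetween($text, 'lazy', 'quick'));
```

Note that this only looks at Word1 followed by Word2, and only at the first such pair, so it does not necessarily report the closest pair in the text.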

This is the most straightforward method, but definitely not the best one (performance-wise, for instance). I think you need to clarify your needs if you want a more specific answer.
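
For comparison, one possible alternative that avoids regex backtracking is to tokenise the text once and track the last position of each term in a single pass. The sketch below is an assumption about what a cheaper approach might look like, not something from the original answer; `minWordDistance` is a hypothetical name, matching is case-insensitive on ASCII words, and the two terms are assumed to be different words. It also treats Word1 ... Word2 and Word2 ... Word1 the same way.

```php
<?php
// Hypothetical helper: smallest number of words separating $a and $b,
// in either order. Returns null if either term is missing.
function minWordDistance(string $text, string $a, string $b): ?int
{
    // Tokenise once: lowercase, then split on runs of non-word characters.
    $tokens = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $a = strtolower($a);
    $b = strtolower($b);

    $lastA = null; // index of the most recent $a seen so far
    $lastB = null; // index of the most recent $b seen so far
    $best  = null; // smallest gap found so far

    foreach ($tokens as $i => $token) {
        if ($token === $a) {
            $lastA = $i;
        } elseif ($token === $b) {
            $lastB = $i;
        } else {
            continue; // not one of the terms, nothing to update
        }

        if ($lastA !== null && $lastB !== null) {
            // Words strictly between the two positions.
            $gap = abs($lastA - $lastB) - 1;
            if ($best === null || $gap < $best) {
                $best = $gap;
            }
        }
    }

    return $best;
}

var_dump(minWordDistance('alpha x x alpha beta y alpha', 'alpha', 'beta'));
var_dump(minWordDistance('lazy dog and the quick fox', 'quick', 'lazy'));
```

Each token is visited once, so the cost grows linearly with the length of the document; whether that matters in practice depends on the document sizes involved.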

Update:

Now that the two questions have been merged, I see other answers mentioning soundex, Levenshtein and Hamming distance etc. I would suggest that theclueless1 clarify the requirements so that people can give useful help. If this is an application related to searching or document clustering, I also suggest taking a look at mature full-text indexing/searching solutions such as Sphinx or Lucene. I think either of them can be used with PHP.
