Add spaces between words in spaceless string


Question

I'm on OS X, and in Objective-C I'm trying to convert, for example, "bobateagreenapple" into "Bob ate a green apple".

Is there any way to do this efficiently? Would something involving a spell checker work?

Just some extra information: I'm attempting to build something that takes misformatted text (for example, text copy-pasted from old PDFs that ends up without spaces, especially from internet archives like JSTOR). Since the misformatted text is probably going to be long, I'm trying to figure out whether this is feasible before I write the system, only to find out that it takes 2 hours to fix a paragraph of text.

Recommended answer

One possibility, which I will describe in a non-OS-specific manner, is to search through all the possible words that can be formed from the collection of letters.

Basically, you take the first letter off your letter collection and add it to the word you are currently forming. If that makes a word (checked with, e.g., a dictionary lookup), you add it to the current sentence and start a new word; either way, you also try extending the current word with further letters. If you manage to use up all the letters and form words out of all of them, you have a full sentence. You don't have to stop there, though: keep running, and eventually you will produce all possible sentences.

The pseudocode looks like this:

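FindWords(vector<Sentence>& sentences, Sentence& s, Word w, Letters& l)
{
    // Every letter is used and no partial word is pending: s is a full sentence.
    if (l.empty() and w.empty())
    {
        add a copy of s to sentences;
        return;
    }
    // The letters ran out in the middle of a word: dead end.
    if (l.empty())
        return;
    move first letter from l onto the end of w;
    if w in dictionary
    {
        add w to s;
        FindWords(sentences, s, empty word, l);   // start a new word
        remove w from s;
    }
    FindWords(sentences, s, w, l);                // keep extending w
    put last letter from w back onto the front of l;
}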
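
As a rough illustration, the pseudocode can be turned into a short runnable Python sketch. This is only one possible rendering, not the answerer's own code: the function name `find_sentences`, the toy `WORDS` set, and the use of string slicing in place of an explicit letter collection are all assumptions made for the example.

```python
def find_sentences(letters, dictionary):
    """Return every way to split `letters` into words found in `dictionary`."""
    sentences = []

    def search(start, words):
        # All letters consumed and no partial word pending: one full sentence.
        if start == len(letters):
            sentences.append(" ".join(words))
            return
        # Grow the current word one letter at a time, as in the pseudocode.
        for end in range(start + 1, len(letters) + 1):
            word = letters[start:end]
            if word in dictionary:
                words.append(word)   # add w to s
                search(end, words)   # start a new word with the remaining letters
                words.pop()          # remove w from s, then keep extending

    search(0, [])
    return sentences


# Toy dictionary, invented for this example (not a real word list).
WORDS = {"a", "at", "ate", "bob", "green", "apple", "tea"}

for sentence in find_sentences("bobateagreenapple", WORDS):
    print(sentence)
# prints:
# bob a tea green apple
# bob ate a green apple
```

Even with this tiny dictionary the search finds two valid splits, so some extra step (a spell checker, word frequencies, or a human) would still be needed to choose the intended sentence.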

There are, of course, a number of optimizations you could perform to make it run fast. For instance, you can check whether the word being formed is the stem (that is, the beginning) of any word in the dictionary, and abandon that branch as soon as it is not. But this is the basic approach that will give you all possible sentences.
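
The stem check mentioned above could be sketched as follows. This is a minimal illustration under assumptions: the helper `build_prefixes`, the function `find_sentences_pruned`, and the toy `WORDS` set are hypothetical names chosen for the example, and a plain set of prefixes stands in for whatever dictionary structure (such as a trie) a real implementation might use.

```python
def build_prefixes(dictionary):
    """Collect every leading substring of every dictionary word."""
    return {word[:i] for word in dictionary for i in range(1, len(word) + 1)}


def find_sentences_pruned(letters, dictionary):
    """Like the exhaustive search, but abandon words that cannot be completed."""
    prefixes = build_prefixes(dictionary)
    sentences = []

    def search(start, words):
        if start == len(letters):
            sentences.append(" ".join(words))
            return
        for end in range(start + 1, len(letters) + 1):
            word = letters[start:end]
            if word not in prefixes:
                # No dictionary word starts with these letters, so adding
                # more letters cannot help: give up on this branch.
                break
            if word in dictionary:
                words.append(word)
                search(end, words)
                words.pop()

    search(0, [])
    return sentences


# Toy dictionary, invented for this example (not a real word list).
WORDS = {"a", "at", "ate", "bob", "green", "apple", "tea"}
print(find_sentences_pruned("bobateagreenapple", WORDS))
```

The pruning only cuts off branches that can never yield a word; it does not reduce the number of valid sentences, so a long text with many possible splits can still produce a large result.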
