Divide a sentence into words and punctuation

Views: 116 | Published: 2018/12/5 21:49:25 | Tags: java, string, split

Problem description

I need to parse a Sentence into words and punctuation marks (whitespace counts as punctuation), then add all of the resulting elements to a general ArrayList<Sentence>.

Example sentence:

A man, a plan, a canal — Panama!

A => word
whitespace => punctuation
man => word
, + space => punctuation
a => word
[...]

I tried reading the whole sentence one character at a time, collecting consecutive characters of the same kind, and creating a new Word or a new Punctuation from each collected run.

Here is my code:

public class Sentence {

    private String sentence;
    private LinkedList<SentenceElement> elements;

    /**
     * Constructs a sentence.
     * @param aText a string containing all characters of the sentence
     */
    public Sentence(String aText) {
        sentence = aText.trim();
        splitSentence();
    }

    public String getSentence() {
        return sentence;
    }

    public LinkedList<SentenceElement> getElements() {
        return elements;
    }

    /**
     * Split the sentence into words and punctuation
     */
    private void splitSentence() {
        if (sentence == "" || sentence == null || sentence == "\n") {
            return;
        }

        StringBuilder builder = new StringBuilder();

        int j = 0;
        boolean mark = false;
        while (j < sentence.length()) {
            //char current = sentence.charAt(j);

            while (Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Punctuation(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            } 
            mark = true;

            while (!Character.isLetter(sentence.charAt(j))) {
                if (mark) {
                    elements.add(new Word(builder.toString()));
                    builder.setLength(0);
                    mark = false;
                }
                builder.append(sentence.charAt(j));
                j++;
            }
            mark = true;
        }
    }

But the logic of splitSentence() doesn't work correctly, and I can't find the right solution for it.

I want to implement it like this: read the first character and add it to the builder; while the next character is of the same type (letter or punctuation), keep adding to the builder; when the next character's type differs from the content of the builder, create a new word or punctuation element and reset the builder. Then repeat the same logic.

How can I implement this check correctly?

Recommended answer

Split the string on word boundaries (except the one at the start of the string):

String[] parts = sentence.split("(?<!^)\\b");

The array will contain alternating word/punctuation/word/punctuation/word etc.
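To get from that array to the element list the question asks for, each part can be classified and wrapped. The following is only a sketch: the `SentenceElement`, `Word` and `Punctuation` classes here are hypothetical stand-ins, since the question does not show its own definitions, and `toElements` is a name chosen for illustration. Note that `\w` matches only ASCII letters, digits and underscore by default, so text with other letters would need a different test.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitToElements {

    // Hypothetical stand-ins for the question's element types,
    // which are not shown in the original post.
    static abstract class SentenceElement {
        final String text;
        SentenceElement(String text) { this.text = text; }
        @Override public String toString() {
            return getClass().getSimpleName() + "(\"" + text + "\")";
        }
    }

    static class Word extends SentenceElement {
        Word(String text) { super(text); }
    }

    static class Punctuation extends SentenceElement {
        Punctuation(String text) { super(text); }
    }

    /** Splits on word boundaries and wraps each part in a Word or Punctuation. */
    static List<SentenceElement> toElements(String sentence) {
        List<SentenceElement> elements = new ArrayList<>();
        // Guard: "".split(...) would return one empty string.
        if (sentence == null || sentence.isEmpty()) {
            return elements;
        }
        for (String part : sentence.split("(?<!^)\\b")) {
            // \w+ matches a run of ASCII letters, digits or underscores.
            if (part.matches("\\w+")) {
                elements.add(new Word(part));
            } else {
                elements.add(new Punctuation(part));
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        // \u2014 is the dash from the example sentence.
        for (SentenceElement e : toElements("A man, a plan, a canal \u2014 Panama!")) {
            System.out.println(e);
        }
    }
}
```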

Here's some test code:

String sentence = "A man, a plan, a canal — Panama!";
String[] parts = sentence.split("(?<!^)\\b");
for (String part : parts)
    System.out.println('"' + part + "\" (" + (part.matches("\\w+") ? "word" : "punctuation") + ")");

Output:

"A" (word)
" " (punctuation)
"man" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"plan" (word)
", " (punctuation)
"a" (word)
" " (punctuation)
"canal" (word)
" — " (punctuation)
"Panama" (word)
"!" (punctuation)
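The character-by-character approach described in the question can also be made to work. In the posted code, `elements` is never initialized, the inner `while` loops read `sentence.charAt(j)` without checking `j < sentence.length()`, the element type added on each switch is the opposite of what the builder holds, and `sentence == ""` compares references rather than contents. Below is a minimal sketch of the same idea with a single loop; `RunSplitter` and `splitRuns` are illustrative names, and it returns plain strings so that it stays self-contained. Like the question's code it uses `Character.isLetter`, so digits end up in the punctuation runs.

```java
import java.util.ArrayList;
import java.util.List;

public class RunSplitter {

    /** Splits text into maximal runs of letters and maximal runs of non-letters. */
    static List<String> splitRuns(String text) {
        List<String> runs = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return runs;
        }
        StringBuilder builder = new StringBuilder();
        // Type of the run currently being collected.
        boolean inWord = Character.isLetter(text.charAt(0));
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean letter = Character.isLetter(c);
            if (letter != inWord) {
                // The character type changed: close the current run.
                runs.add(builder.toString());
                builder.setLength(0);
                inWord = letter;
            }
            builder.append(c);
        }
        // Flush the final run, which the loop itself never closes.
        runs.add(builder.toString());
        return runs;
    }

    public static void main(String[] args) {
        for (String run : splitRuns("A man, a plan, a canal \u2014 Panama!")) {
            // Every run is non-empty, so its first character gives its type.
            String kind = Character.isLetter(run.charAt(0)) ? "word" : "punctuation";
            System.out.println('"' + run + "\" (" + kind + ")");
        }
    }
}
```

Tracking the type of the current run in one flag, and flushing once after the loop, avoids the two nested loops and the out-of-bounds reads they cause at the end of the string.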
