How to find the number of words, sentences, and paragraphs, and the most repeated word, in a paragraph in C++


Problem description

Hi friends,
I am trying to get the number of words in each sentence but am facing some difficulties. I also have to find the most repeated word in each sentence as well as in the whole paragraph. So, friends, I need your help.

My code:

#include "stdafx.h"
#include "iostream"
#include "string"
#include "sstream"

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
	string userInput="India is a country in South Asia. It is the Seventh-Largest country by area and second-largest by population and most populous democracy in the world.";
	int words = 1;     
	int sentences = 0;
	int paragraphs = 1;

	//cout << "Enter some text: ";
	//getline (cin, userInput);

	for (int i = 0; i < int(userInput.length()); i++) 
    { 		
		if (userInput.empty()) 
		{
			words--;
			paragraphs--;
		}

		if (userInput[i] == ' ')
			words++;

		if (userInput[i] == '.')
			sentences++;

		if (userInput[i] == '\n' && userInput[i] == '\t')
			paragraphs++;
			paragraphs++;
    }
	cout << "words: " << words << endl;
	cout << "sentences: " << sentences << endl;
	cout << "paragraphs: " << paragraphs << endl;

	//cout << "Number of words in sentence :" << endl;
	
	/*
     istringstream iss(userInput);
       do
       {
         string sub;
         iss >> sub;
         cout << "Substring: " << sub << endl;
       } 
	while (iss);*/

	return 0;
}



Thanks in advance,
Johny_sa

Recommended answer

As you said, it is C++. So take advantage of C++ as an object-oriented language: use the standard C++ library and define your own classes. Each part of your task should be solved separately. Solving everything in a single main function, as in a typical student homework approach, is the wrong way to go. The ability to separate tasks is a big advantage of the C language, and the ability to separate them much better is an even bigger advantage of C++. Most likely you will want a class for each part, for instance class Text, class Paragraph, and class Sentence. Text contains an array of Paragraph objects, Paragraph contains an array of Sentence objects, and Sentence is responsible for detecting its most repeated words.
Take advantage of the C++ standard library templates, such as vectors, maps, and iterators.
For instance, to count word occurrences you can use a map of <string,int>: use each encountered word as a key and increment its value.
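
The map-based counting suggested above could look roughly like the sketch below. The function names (normalizeWord, countWordOccurrences, mostRepeatedWord) are hypothetical, and the sketch assumes that words are separated by whitespace, that case should be ignored, and that punctuation attached to the start or end of a word should be dropped. None of these choices come from the original post.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Lower-case a token and strip leading/trailing punctuation,
// so that "Country." and "country" count as the same word (an assumption).
std::string normalizeWord(const std::string& token)
{
    std::size_t begin = 0;
    std::size_t end = token.size();
    while (begin < end && std::ispunct(static_cast<unsigned char>(token[begin])))
        ++begin;
    while (end > begin && std::ispunct(static_cast<unsigned char>(token[end - 1])))
        --end;

    std::string word = token.substr(begin, end - begin);
    for (char& c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

// Count how often each normalized word occurs in the text.
std::map<std::string, int> countWordOccurrences(const std::string& text)
{
    std::map<std::string, int> counts;
    std::istringstream stream(text);
    std::string token;
    while (stream >> token) {          // operator>> skips any run of whitespace
        const std::string word = normalizeWord(token);
        if (!word.empty())
            ++counts[word];            // a missing key starts at 0
    }
    return counts;
}

// Return the most frequent word, or an empty string for input without words.
// Ties go to the alphabetically first word, because std::map iterates in key order.
std::string mostRepeatedWord(const std::string& text)
{
    std::string best;
    int bestCount = 0;
    for (const auto& entry : countWordOccurrences(text)) {
        if (entry.second > bestCount) {
            best = entry.first;
            bestCount = entry.second;
        }
    }
    return best;
}
```

Calling mostRepeatedWord on a single sentence gives the per-sentence result, and calling it on the whole text gives the per-paragraph result, so the same function could serve both a Sentence and a Paragraph class.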


1. Try writing three different functions, one each for counting words, sentences, and paragraphs. You may have to duplicate some code, but by separating the three tasks you will have an easier time testing for the correct conditions, and you will be able to solve one problem at a time.

2. It's a good idea to initialize variables when you define them, but you should use reasonable values. I understand your reasons for initializing some with 1 instead of 0, but it definitely looks odd, and it makes your code harder to understand and its logic harder to follow.

3. Your test for blanks does not consider other cases of 'whitespace', such as tab characters, carriage returns, form feeds, or multiple whitespace characters in a row. It may not apply to the case you are testing here, but if you base your code only on the specific test case you have, you may as well count the words by hand and return these numbers rather than write an entire algorithm around it...

4. '.' is not the only way to end a sentence. Also, depending on where you get your text from, you may be confronted with sequences of multiple punctuation marks!!! ;-)

5. The condition
if (userInput[i] == '\n' && userInput[i] == '\t')


is always false and always will be, no matter the text, because a single character cannot be both '\n' and '\t' at once. Besides, why do you test for a tab character ('\t')? Reconsider the definition of paragraph that you use, or rather the definition of what separates paragraphs.

6. As a general rule, always consider corner cases: e.g. multiple separator characters where you only expect one, a separator omitted or added at the end of the text, variants of the commonly used separators, or characters that are neither part of the readable text nor one of the separators you catch.
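
The six points above can be sketched as three small functions. This is only one possible reading. The names countWords, countSentences, and countParagraphs are hypothetical, and the sketch assumes that any whitespace separates words, that a sentence ends at a run of '.', '!' or '?', and that paragraphs are separated by one or more blank lines. Abbreviations such as "e.g." would still be miscounted as sentence ends.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Words: maximal runs of non-whitespace characters.
// std::isspace covers blank, tab, newline, carriage return, form feed, vertical tab.
int countWords(const std::string& text)
{
    int words = 0;           // starts at 0, so empty input needs no special case
    bool inWord = false;
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            inWord = false;
        } else if (!inWord) {
            inWord = true;
            ++words;
        }
    }
    return words;
}

// Sentences: text followed by '.', '!' or '?' (assumed terminators).
// A run such as "!!!" counts once; trailing text without a terminator counts too.
int countSentences(const std::string& text)
{
    int sentences = 0;
    bool pendingText = false;   // non-space text seen since the last terminator
    for (unsigned char c : text) {
        if (c == '.' || c == '!' || c == '?') {
            if (pendingText)
                ++sentences;
            pendingText = false;
        } else if (!std::isspace(c)) {
            pendingText = true;
        }
    }
    if (pendingText)
        ++sentences;
    return sentences;
}

// Paragraphs: groups of non-blank lines separated by one or more blank lines
// (an assumed definition; a line holding only whitespace counts as blank).
int countParagraphs(const std::string& text)
{
    int paragraphs = 0;
    bool inParagraph = false;
    std::istringstream stream(text);
    std::string line;
    while (std::getline(stream, line)) {
        const bool blank = std::all_of(line.begin(), line.end(),
            [](unsigned char c) { return std::isspace(c) != 0; });
        if (blank) {
            inParagraph = false;
        } else if (!inParagraph) {
            inParagraph = true;
            ++paragraphs;
        }
    }
    return paragraphs;
}
```

Because each function tracks its own state, each corner case from point 6 (repeated separators, a missing final separator, empty input) can be tested against one function at a time.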


Well, my solution answers your question exactly. My code is in Java, so it will help if you are able to read that. My solution displays the ten most repeated words in each paragraph...

Here is a glimpse of the code (the full Java source was linked from the original post):

// Walk the keys from the last one backwards and print at most ten entries.
for (int i = keys.length - 1, count = 0; i >= 0; i--)
{
    if (count == 10) {
        break;
    }
    count++;
    System.out.println(count + ". " + keys[i] + ",    \tFrequency " + map1.get(keys[i]));
}
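
Since the question is about C++, a rough C++ counterpart of this "top ten" idea might look like the sketch below. It is not a translation of the linked Java program, whose full source is not shown here. The function name topWords is hypothetical, and the input is assumed to be a map of <string,int> word counts of the kind suggested in the earlier answer.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Return up to `limit` (word, count) pairs, most frequent first.
// Words with equal counts are ordered alphabetically (an arbitrary tie-break).
std::vector<std::pair<std::string, int>> topWords(
    const std::map<std::string, int>& counts, std::size_t limit)
{
    // Copy the map entries into a vector so they can be sorted by count.
    std::vector<std::pair<std::string, int>> ranked(counts.begin(), counts.end());

    std::sort(ranked.begin(), ranked.end(),
        [](const std::pair<std::string, int>& a,
           const std::pair<std::string, int>& b) {
            if (a.second != b.second)
                return a.second > b.second;   // higher count first
            return a.first < b.first;         // then alphabetical
        });

    if (ranked.size() > limit)
        ranked.resize(limit);                 // keep only the first `limit` entries
    return ranked;
}
```

Calling topWords(counts, 10) and printing each pair would give the same kind of listing as the Java loop.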

