See which words exist in my BST from a file


Problem description

I am trying to output the "misspelled" words (words that are not in the BST) from a file that I read in. I have a fully functional binary search tree; the only functions needed are insert and exists. Inserting my dictionary works fine. When I then read a file that has multiple words on each line, the program crashes (it displays the words that were converted from upper case to lower case and the ones that had punctuation). But when I read a file in which every word is on its own line, the program gives me the misspelled words.

<pre>#include <iostream>
#include <fstream>
#include <cstdlib>
#include <string>
#include <algorithm>
#include <cctype> // for tolower
#include "bst.h"

using namespace std;

int main()
{
    char dictionaryFile[50]; //input dictionary file
	char file[50]; // input file
	
	string misspelt; // misspelt 
	string wordsDictionary;
	string words;
	
	ifstream inputDictionaryFile; // file input
	ifstream inputFile;
	BinarySearchTree *tree = new BinarySearchTree();
	
	/* GETTING DICTIONARY*/
	
	cout << "Enter dictionary file name: ";
	
	cin.getline(dictionaryFile,1000); // getting lines 
	
	inputDictionaryFile.open(dictionaryFile); //open
	
	//if it fails to open - Error
	if(!inputDictionaryFile.is_open())
	{
		cout << "Fail to open file" << endl;
		exit(EXIT_FAILURE);
	}
	
	while(inputDictionaryFile >> wordsDictionary)
	{
		int i =0;
		for( i = 0;wordsDictionary[i]!='\0'; i++) 
		{
			//find upperCase letters
			if(wordsDictionary[i] >= 'A' && wordsDictionary[i] <= 'Z')
			{
				//overwrite to lowerCase
				wordsDictionary[i] = tolower(wordsDictionary[i]);			
					
			}//end of if statement 
			//ignore tab
			if(wordsDictionary[i] == '\t')
			{ 
				wordsDictionary[i] = ' ';
			}//end of if statement
	
			//ignoring punctuation 	
			if(wordsDictionary[i] == ',' || wordsDictionary[i] == '.' || wordsDictionary[i] == '!' || wordsDictionary[i] == '?' || wordsDictionary[i] == '"' || wordsDictionary[i] == ':' || wordsDictionary[i] ==';' || wordsDictionary[i] == '-' || wordsDictionary[i] == '/' || wordsDictionary[i] == '`' || wordsDictionary[i] == '&' || wordsDictionary[i] == '@' || wordsDictionary[i] == '^' || wordsDictionary[i] == '(' || wordsDictionary[i] == ')' || wordsDictionary[i] == '<' || wordsDictionary[i] == '>' || wordsDictionary[i] == '#' || wordsDictionary[i] == '%' || wordsDictionary[i] == '{' || wordsDictionary[i] == '}' || wordsDictionary[i] == '[' || wordsDictionary[i] == ']' || wordsDictionary[i] == '|' || wordsDictionary[i] == '+' || wordsDictionary[i] == '*')
			{
				wordsDictionary[i] = ' ';
			}//end of is statement 
				
			//ignore if there is double space
			if(wordsDictionary[i] == '  ')
			{
				wordsDictionary[i] = ' ';
			}//end of if statement
		}
		tree->insert(wordsDictionary); // insert into the tree
	}
	
	if(tree == nullptr)
	{
		cout << "Empty tree" << endl;
	}
	
	/* GETTING FILE*/
	
	cout << "Enter file name: ";
	
	cin.getline(file,1000); // getting lines 
	
	inputFile.open(file); //open
	//if it fails to open - Error
	if(!inputFile.is_open())
	{
		cout << "Fail to open file" << endl;
		exit(EXIT_FAILURE);
	}
	
	while(inputFile >> words)
	{	
		int i =0;
		for( i = 0;words[i]!='\0'; i++) 
		{
			//find upperCase letters
			if(words[i] >= 'A' && words[i] <= 'Z')
			{
				//overwrite to lowerCase
				words[i] = tolower(words[i]);			
				
			}//end of if statement 
				
			//ignore tab
			if(words[i] == '\t')
			{ 
				words[i] = ' ';
			}//end of if statement
	
			//ignoring punctuation 	
			if(words[i] == ',' || words[i] == '.' || words[i] == '!' || words[i] == '?' || words[i] == '"' || words[i] == ':' || words[i] ==';' || words[i] == '-' || words[i] == '/' || words[i] == '`' || words[i] == '&' || words[i] == '@' || words[i] == '^' || words[i] == '(' || words[i] == ')' || words[i] == '<' || words[i] == '>' || words[i] == '#' || words[i] == '%' || words[i] == '{' || words[i] == '}' || words[i] == '[' || words[i] == ']' || words[i] == '|' || words[i] == '+' || words[i] == '*')
			{
				words[i] = ' ';
			}//end of is statement 
				
			//ignore if there is double space
			if(words[i] == '  ')
			{
				words[i] = ' ';
			}//end of if statement
			
		} //end of for loop	
		//tree->exists(words);
		if(!tree->exists(words))
		{
			cout <<"Misspelled: " << words << endl;
		}
	}
	
	delete tree;
	
	return 0;
}
</pre>

The code above is the spellChecker.cpp file.

<pre lang="c++">// Checks if a word is in the tree
bool BinarySearchTree::exists(std::string word) const
{
	Node* node = root;
	while(node != nullptr)
	{
		if(node->data == word)
		{
			return true;
		}
		else if(word > node->data)
		{
			node = node->right;
		}
		else
		{
			node = node->left;
		}
	}
	return false;
}
</pre>

<pre lang="c++">// Helper function to insert a word into the tree
void insertHelper(Node **node, std::string word)
{
	// If this link is empty, create the new node here
	if(*node == nullptr)
	{
		*node = new Node;
		(*node)->data = word;
		// Set branches to nullptr
		(*node)->left = nullptr;
		(*node)->right = nullptr;
	}
	else // if not empty
	{
		if(word < (*node)->data)
			insertHelper(&(*node)->left, word);
		else if(word > (*node)->data)
			insertHelper(&(*node)->right, word);
		else
			return; // word is already in the tree
	}
}

// Adds a word to the tree
void BinarySearchTree::insert(std::string word)
{
	insertHelper(&root, word);
}
</pre>

Single-word input file (one word per line):

C 
is
the
most
commonly
used
programming
language
for
writing
operating
systems
The 
first
operatingg
system
written 
in 
C 
is 
Unix
Later 
operating
systems 
like 
Linux 
were 
all 
written 
in 
C 
Not 
only 
is 
C 
the 
language 
of 
operating 
systems 
it 
is 
the 
precursor 
and 
insspiration 
for 
almost 
all 
of 
the 
most 
popular 
high 
level 
languages 
available 
today 
In 
fact
Perl 
PHP 
Python 
and 
Ruby 
are 
all 
writtten 
in
c



The other file is similar, but the words are on the same line. There are tabs as well.

What I have tried:

I tried changing my punctuation check

//ignoring punctuation 	
			if(wordsDictionary[i] == ',' || wordsDictionary[i] == '.' || wordsDictionary[i] == '!' || wordsDictionary[i] == '?' || wordsDictionary[i] == '"' || wordsDictionary[i] == ':' || wordsDictionary[i] ==';' || wordsDictionary[i] == '-' || wordsDictionary[i] == '/' || wordsDictionary[i] == '`' || wordsDictionary[i] == '&' || wordsDictionary[i] == '@' || wordsDictionary[i] == '^' || wordsDictionary[i] == '(' || wordsDictionary[i] == ')' || wordsDictionary[i] == '<' || wordsDictionary[i] == '>' || wordsDictionary[i] == '#' || wordsDictionary[i] == '%' || wordsDictionary[i] == '{' || wordsDictionary[i] == '}' || wordsDictionary[i] == '[' || wordsDictionary[i] == ']' || wordsDictionary[i] == '|' || wordsDictionary[i] == '+' || wordsDictionary[i] == '*')
			{
				wordsDictionary[i] = ' ';
			}//end of is statement 
				
			//ignore if there is double sp



to use ispunct, but it didn't work. I have no idea how to ignore punctuation without replacing it with something else. I also tried calling the insert and exists functions in a different loop, and tried an approach something like this:

#include <iostream>
#include <string>
#include <algorithm>
using namespace std;

int main() {
    string str = "this. is my string. it's here.";

    transform(str.begin(), str.end(), str.begin(), [](char ch)
    {
        if( ispunct(ch) )
            return '\0';
        return ch;
    });
}

Recommended answer

//ignore if there is double space
if(words[i] == '  ') // ???
{
	words[i] = ' ';
}//end of if statement



A character literal with more than one character, such as '  ', is a multi-character literal: it has type int and an implementation-defined value, so comparing it with a single character does not do what you intend. I assume you are suppressing, or ignoring, the compiler's warning messages.
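
A related point, offered as a minimal sketch rather than a diagnosis of the crash: `operator>>` on a stream skips leading whitespace and stops at the next whitespace character, so a token read with `while (stream >> word)` never contains a space, a tab, or a newline. The tab and double-space checks are therefore unnecessary in the first place. The helper name `splitWords` is hypothetical and is not part of the question's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into tokens exactly the way
// `while (stream >> word)` does in the question's two loops.
std::vector<std::string> splitWords(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) // skips spaces, tabs and newlines between tokens
    {
        tokens.push_back(word);
    }
    return tokens;
}
```

Calling `splitWords("C is\tthe  most\ncommonly used")` yields one token per word, whether the words were separated by a tab, a double space, or a line break.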


Learn to use the debugger to inspect your code.

I would solve the character check the other way around: only add allowed characters to a word that goes into your tree. I think that would work around your missing CR and LF handling.
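
The "only add allowed characters" idea could be sketched as follows. This is an illustration under assumptions, not the answerer's code: the helper name `keepLetters` is hypothetical, and it assumes that only letters count as allowed, so an apostrophe is dropped too and "it's" becomes "its".

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds a cleaned copy of `raw` that contains only
// letters, converted to lower case. Everything else is dropped rather than
// replaced, so no spaces or '\0' characters end up inside the word.
std::string keepLetters(const std::string& raw)
{
    std::string cleaned;
    cleaned.reserve(raw.size());
    for (unsigned char ch : raw) // unsigned char keeps the <cctype> calls well-defined
    {
        if (std::isalpha(ch))
        {
            cleaned += static_cast<char>(std::tolower(ch));
        }
    }
    return cleaned;
}
```

In both reading loops the word would be passed through the helper before it reaches the tree, for example `words = keepLetters(words);`, and skipped when the result is empty (a token that consisted of punctuation only).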

The code can be simplified in this way:
void insertHelper(Node *node, std::string word)

I would also add a Node constructor that takes a string and sets left and right to NULL.
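
A sketch of how the constructor suggestion could look. The one-line signature above takes a plain `Node*`; the version below swaps in a reference to the pointer (`Node*&`) so that an empty link can be filled in directly, which is an assumption about the intent and not the answerer's code. The `Node` layout is assumed from the `data`, `left` and `right` members used in the question, since `bst.h` is not shown, and `destroy` is a hypothetical cleanup helper.

```cpp
#include <string>

// Assumed node layout; bst.h is not shown in the question.
struct Node
{
    std::string data;
    Node* left;
    Node* right;

    // Constructor that stores the word and nulls both child links.
    explicit Node(const std::string& word)
        : data(word), left(nullptr), right(nullptr) {}
};

// Inserts `word` below `node`; duplicates are ignored.
// `node` is a reference to the parent's link, so assigning to it
// attaches the new node to the tree.
void insertHelper(Node*& node, const std::string& word)
{
    if (node == nullptr)
    {
        node = new Node(word);
    }
    else if (word < node->data)
    {
        insertHelper(node->left, word);
    }
    else if (word > node->data)
    {
        insertHelper(node->right, word);
    }
}

// Hypothetical helper: frees a whole subtree, e.g. from a destructor.
void destroy(Node* node)
{
    if (node == nullptr) return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}
```

With this shape, `BinarySearchTree::insert` would simply call `insertHelper(root, word);`, and the three assignments that initialise a new node in the original helper move into the constructor.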

