See which words exist in my BST from a file
Problem description
I am trying to output the "misspelled" words (words that are not in the BST) from a file that I read in. So basically I have a fully functional binary search tree; the only functions needed are insert and exists. Inserting my dictionary works fine. But when I read a file that has multiple words on each line, it crashes: it displays the words that were converted from upper case to lower case and the ones that had punctuation. When I read a file where every word is on a separate line, the program gives me the misspelled words.
```cpp
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <cctype>
#include <string>
#include <algorithm>
#include "bst.h"
using namespace std;

int main()
{
    char dictionaryFile[50]; // input dictionary file name
    char file[50];           // input file name
    string misspelt;         // misspelt
    string wordsDictionary;
    string words;
    ifstream inputDictionaryFile; // file input
    ifstream inputFile;
    BinarySearchTree *tree = new BinarySearchTree();

    /* GETTING DICTIONARY */
    cout << "Enter dictionary file name: ";
    cin.getline(dictionaryFile, sizeof dictionaryFile); // never read more than the buffer holds
    inputDictionaryFile.open(dictionaryFile); // open
    // if it fails to open - Error
    if(!inputDictionaryFile.is_open())
    {
        cout << "Fail to open file" << endl;
        exit(EXIT_FAILURE);
    }
    while(inputDictionaryFile >> wordsDictionary)
    {
        for(size_t i = 0; i < wordsDictionary.size(); i++)
        {
            // find upper-case letters
            if(wordsDictionary[i] >= 'A' && wordsDictionary[i] <= 'Z')
            {
                // overwrite with lower case
                wordsDictionary[i] = tolower(wordsDictionary[i]);
            } // end of if statement
            // ignore tab
            if(wordsDictionary[i] == '\t')
            {
                wordsDictionary[i] = ' ';
            } // end of if statement
            // ignoring punctuation
            if(wordsDictionary[i] == ',' || wordsDictionary[i] == '.' || wordsDictionary[i] == '!' || wordsDictionary[i] == '?' || wordsDictionary[i] == '"' || wordsDictionary[i] == ':' || wordsDictionary[i] ==';' || wordsDictionary[i] == '-' || wordsDictionary[i] == '/' || wordsDictionary[i] == '`' || wordsDictionary[i] == '&' || wordsDictionary[i] == '@' || wordsDictionary[i] == '^' || wordsDictionary[i] == '(' || wordsDictionary[i] == ')' || wordsDictionary[i] == '<' || wordsDictionary[i] == '>' || wordsDictionary[i] == '#' || wordsDictionary[i] == '%' || wordsDictionary[i] == '{' || wordsDictionary[i] == '}' || wordsDictionary[i] == '[' || wordsDictionary[i] == ']' || wordsDictionary[i] == '|' || wordsDictionary[i] == '+' || wordsDictionary[i] == '*')
            {
                wordsDictionary[i] = ' ';
            } // end of if statement
            // ignore if there is double space
            if(wordsDictionary[i] == ' ')
            {
                wordsDictionary[i] = ' ';
            } // end of if statement
        }
        tree->insert(wordsDictionary); // insert into the tree
    }
    if(tree == nullptr)
    {
        cout << "Empty tree" << endl;
    }

    /* GETTING FILE */
    cout << "Enter file name: ";
    cin.getline(file, sizeof file); // never read more than the buffer holds
    inputFile.open(file); // open
    // if it fails to open - Error
    if(!inputFile.is_open())
    {
        cout << "Fail to open file" << endl;
        exit(EXIT_FAILURE);
    }
    while(inputFile >> words)
    {
        for(size_t i = 0; i < words.size(); i++)
        {
            // find upper-case letters
            if(words[i] >= 'A' && words[i] <= 'Z')
            {
                // overwrite with lower case
                words[i] = tolower(words[i]);
            } // end of if statement
            // ignore tab
            if(words[i] == '\t')
            {
                words[i] = ' ';
            } // end of if statement
            // ignoring punctuation
            if(words[i] == ',' || words[i] == '.' || words[i] == '!' || words[i] == '?' || words[i] == '"' || words[i] == ':' || words[i] ==';' || words[i] == '-' || words[i] == '/' || words[i] == '`' || words[i] == '&' || words[i] == '@' || words[i] == '^' || words[i] == '(' || words[i] == ')' || words[i] == '<' || words[i] == '>' || words[i] == '#' || words[i] == '%' || words[i] == '{' || words[i] == '}' || words[i] == '[' || words[i] == ']' || words[i] == '|' || words[i] == '+' || words[i] == '*')
            {
                words[i] = ' ';
            } // end of if statement
            // ignore if there is double space
            if(words[i] == ' ')
            {
                words[i] = ' ';
            } // end of if statement
        } // end of for loop
        //tree->exists(words);
        if(!tree->exists(words))
        {
            cout << "Misspelled: " << words << endl;
        }
    }
    delete tree;
    return 0;
}
```
Above: the `spellChecker.cpp` file.
```cpp
// Checks if a word is in the tree
bool BinarySearchTree::exists(std::string word) const
{
    Node* node = root;
    while(node != nullptr)
    {
        if(node->data == word)
        {
            return true;
        }
        else
        {
            if (word > node->data)
            {
                node = node->right;
            }
            else
            {
                node = node->left;
            }
        }
    }
    return false;
}
```
```cpp
// Helper function to insert a word into the tree
void insertHelper(Node **node, std::string word)
{
    // Check if nullptr. If so set new node
    if(*node == nullptr)
    {
        // Create new node
        *node = new Node;
        // Set new word
        (*node)->data = word;
        // Set branches to nullptr
        (*node)->left = nullptr;
        (*node)->right = nullptr;
    }
    else // if not empty
    {
        if(word < (*node)->data)
            insertHelper(&(*node)->left, word);
        else if(word > (*node)->data)
            insertHelper(&(*node)->right, word);
        else
            return;
    }
}

// Adds a word to the tree
void BinarySearchTree::insert(std::string word)
{
    insertHelper(&root, word);
}
```
Input file with a single word per line:
C
is
the
most
commonly
used
programming
language
for
writing
operating
systems
The
first
operatingg
system
written
in
C
is
Unix
Later
operating
systems
like
Linux
were
all
written
in
C
Not
only
is
C
the
language
of
operating
systems
it
is
the
precursor
and
insspiration
for
almost
all
of
the
most
popular
high
level
languages
available
today
In
fact
Perl
PHP
Python
and
Ruby
are
all
writtten
in
c
The other file is similar, but the words are on the same line. There are tabs as well.
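For reference: `operator>>` into a `std::string` skips any run of whitespace (spaces, tabs, `\n`, and the `\r` of Windows line endings), so a file with several words per line should yield the same tokens as a file with one word per line; only punctuation stays attached to a token. A minimal sketch of that behaviour; `tokenize` is a hypothetical helper name, not part of the posted code:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits text exactly the way `while (in >> word)` does:
// operator>> skips any whitespace (' ', '\t', '\n', '\r', ...) between tokens,
// but leaves punctuation attached to the token it belongs to.
std::vector<std::string> tokenize(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word)
        tokens.push_back(word);
    return tokens;
}
```

For example, `"C is\tthe most,\r\ncommonly used"` gives the tokens `C`, `is`, `the`, `most,`, `commonly`, `used`.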
What I have tried:
I tried changing my punctuation function

```cpp
//ignoring punctuation
if(wordsDictionary[i] == ',' || wordsDictionary[i] == '.' || wordsDictionary[i] == '!' || wordsDictionary[i] == '?' || wordsDictionary[i] == '"' || wordsDictionary[i] == ':' || wordsDictionary[i] ==';' || wordsDictionary[i] == '-' || wordsDictionary[i] == '/' || wordsDictionary[i] == '`' || wordsDictionary[i] == '&' || wordsDictionary[i] == '@' || wordsDictionary[i] == '^' || wordsDictionary[i] == '(' || wordsDictionary[i] == ')' || wordsDictionary[i] == '<' || wordsDictionary[i] == '>' || wordsDictionary[i] == '#' || wordsDictionary[i] == '%' || wordsDictionary[i] == '{' || wordsDictionary[i] == '}' || wordsDictionary[i] == '[' || wordsDictionary[i] == ']' || wordsDictionary[i] == '|' || wordsDictionary[i] == '+' || wordsDictionary[i] == '*')
{
    wordsDictionary[i] = ' ';
}//end of if statement
//ignore if there is double sp
```
to `ispunct`, but it didn't work. I have no idea how to ignore punctuation without replacing it with something else. I tried calling the insert and exists functions in a different loop. I also tried something like this:
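```cpp
#include <iostream>
#include <string>
#include <algorithm>
#include <cctype>
using namespace std;

int main() {
    string str = "this. is my string. it's here.";
    // Note: this overwrites each punctuation character with '\0';
    // it does not remove it, so the string keeps its original length.
    transform(str.begin(), str.end(), str.begin(), [](char ch)
    {
        if( ispunct(static_cast<unsigned char>(ch)) )
            return '\0';
        return ch;
    });
}
```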
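For reference, the usual way to delete characters from a `std::string`, rather than overwrite them (which is all that returning `'\0'` from `transform` does), is the erase-remove idiom. A minimal sketch; `stripPunctuation` is a hypothetical helper name. The lambda takes `unsigned char` because passing a negative `char` value to `std::ispunct` is undefined behaviour:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Erase-remove idiom: std::remove_if moves every character that is NOT
// punctuation to the front and returns the new logical end; erase() then
// drops the leftover tail, so the string actually gets shorter.
std::string stripPunctuation(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char ch) { return std::ispunct(ch) != 0; }),
            s.end());
    return s;
}
```

With this, `"this. is my string. it's here."` becomes `"this is my string its here"`.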
Recommended answer
```cpp
//ignore if there is double space
if(words[i] == ' ') // ???
{
    words[i] = ' ';
}//end of if statement
```
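You cannot create character constants of more than one character, and you cannot compare a single character with two characters: a literal such as `'  '` (two spaces) is a multi-character constant of type `int` with an implementation-defined value, not a `char`. I assume you are suppressing, or ignoring, compiler warning messages.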
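To illustrate the point: a "double space" is two adjacent characters, so it has to be detected by comparing neighbouring characters (or by searching for the string `"  "`), never by comparing one `char` with a two-character constant. A minimal sketch using `std::unique` with a binary predicate; `collapseSpaces` is a hypothetical name. Note that in the posted code the words come from `operator>>`, which never returns a token containing spaces, so such a check is not needed there at all.

```cpp
#include <algorithm>
#include <string>

// Collapses every run of consecutive spaces into a single space.
// std::unique drops an element when the predicate says it "equals" the
// previously kept one; here that means: a space that follows a space.
std::string collapseSpaces(std::string s)
{
    s.erase(std::unique(s.begin(), s.end(),
                        [](char a, char b) { return a == ' ' && b == ' '; }),
            s.end());
    return s;
}
```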
Learn to use the debugger to inspect your code.
I would solve the character check the other way round: only add allowed characters to a word that goes into your tree. I think that would also work around your missing CR and LF handling.
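A minimal sketch of the "only keep allowed characters" approach, assuming that "allowed" means ASCII letters (so `it's` becomes `its`; extend the condition if you want to keep apostrophes). `normalizeWord` is a hypothetical helper name. Because it builds a new string instead of overwriting characters in place, punctuation never ends up in the word, not even as a space.

```cpp
#include <cctype>
#include <string>

// Builds a new word from the allowed characters only (assumption: letters).
// Letters are stored in lower case; everything else (punctuation, digits,
// control characters, ...) is skipped rather than replaced.
std::string normalizeWord(const std::string& raw)
{
    std::string word;
    for (unsigned char ch : raw)  // unsigned char: safe argument for <cctype>
    {
        if (std::isalpha(ch))
            word += static_cast<char>(std::tolower(ch));
    }
    return word;
}
```

The same function would be applied to dictionary words before `tree->insert` and to text words before `tree->exists`, skipping tokens that normalize to an empty string (such as `--`), so both sides of the comparison are cleaned identically.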
The insertion code can also be simplified, e.g. with this signature:

```cpp
void insertHelper(Node *node, std::string word)
```
I would also add a `Node` constructor that takes a string and sets `left` and `right` to `NULL`.
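A minimal sketch of both suggestions, under stated assumptions: the `Node` struct below is hypothetical (the question does not show `bst.h`), and the pointer is passed by reference (`Node*&`). A plain `Node*` parameter, as in the one-line signature above, is a local copy, so assigning a new node to it would not link that node into the tree. `exists` and `destroy` here are free-function stand-ins for the class members.

```cpp
#include <string>

// Hypothetical Node layout matching the fields used in the question,
// with the suggested constructor that initialises both branches.
struct Node
{
    std::string data;
    Node* left;
    Node* right;

    explicit Node(const std::string& word)
        : data(word), left(nullptr), right(nullptr) {}
};

// Node*& lets the helper assign a new node directly to root or to the
// parent's left/right field: no Node** and no separate field setup needed.
void insertHelper(Node*& node, const std::string& word)
{
    if (node == nullptr)
        node = new Node(word);
    else if (word < node->data)
        insertHelper(node->left, word);
    else if (word > node->data)
        insertHelper(node->right, word);
    // equal: the word is already in the tree, nothing to do
}

// Iterative lookup, same logic as BinarySearchTree::exists in the question.
bool exists(const Node* node, const std::string& word)
{
    while (node != nullptr)
    {
        if (word == node->data)
            return true;
        node = (word < node->data) ? node->left : node->right;
    }
    return false;
}

// Frees every node in post-order (children before parent).
void destroy(Node* node)
{
    if (node == nullptr)
        return;
    destroy(node->left);
    destroy(node->right);
    delete node;
}
```

Starting from `Node* root = nullptr;`, repeated calls to `insertHelper(root, word)` build the tree, and `BinarySearchTree::insert` would reduce to that single call.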