How to replace the words of a document and its vocabulary with their indices in the vocabulary
Problem description
I extract Vocabulary from document with the code below:
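// document is a string variable; Vocabulary is an existing List<string>
// (needs: using System; using System.Linq; using System.Text.RegularExpressions;)
Vocabulary = Vocabulary.Union(Regex.Replace(document, @"\p{P}", " ")
    .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)).ToList();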
How can I replace every word in both Vocabulary and document with its index in Vocabulary, at the same time as the extraction operation in the code above?
For example:
document => "the book book is are is"
should become "0 1 1 2 3 2" (or these values could be stored in a List<int> variable), and
Vocabulary[0] => "the" should become 0
Vocabulary[1] => "book" should become 1
Vocabulary[2] => "is" should become 2
Vocabulary[3] => "are" should become 3
Recommended answer
Interesting problem...
This looks like a word-based document "compression" process.
But I'm a little confused as to why you would replace all of the information in Vocabulary, turning the word strings into the corresponding numbers. Vocabulary is the only record of the mapping between the words and the numbers. Without it, you have no way to reverse the mapping and recreate the original string. So, essentially, the output would be almost arbitrary, since nothing useful could be reconstructed from it!
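To make the reversibility point concrete, here is a minimal C# sketch that rebuilds the text from a list of indices. The names VocabularyDecoder and Decode are hypothetical, not from the question. Decoding only recovers the words while Vocabulary still holds them; once its values have been replaced with "0", "1", "2", ..., the same call just hands the numbers back.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
static class VocabularyDecoder
{
    // Rebuilds a space-separated text from word indices.
    // vocabulary[i] must be the word that index i stands for; this lookup is
    // the only link between the numbers and the original words.
    public static string Decode(IReadOnlyList<string> vocabulary, IEnumerable<int> indices)
    {
        return string.Join(" ", indices.Select(i => vocabulary[i]));
    }
}
```

With the question's example, decoding [0, 1, 1, 2, 3, 2] against ["the", "book", "is", "are"] gives back "the book book is are is"; against ["0", "1", "2", "3"] it only gives "0 1 1 2 3 2".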
If you really do need to replace the Vocabulary values, then:
Vocabulary = Enumerable.Range(0, Vocabulary.Count).Select(n => n.ToString()).ToList();
For doing the replacements in document, I'd probably use a Dictionary<string, int> to hold the word-to-number mapping, instead of needing to scan the Vocabulary list for every word.
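A minimal C# sketch of that Dictionary<string, int> idea, done in the same pass as the vocabulary extraction, could look like the following. The class and method names are hypothetical. It keeps the question's tokenization (punctuation replaced by spaces, then split on ' '), but drops empty entries so that "" never becomes a vocabulary word; it also assumes Vocabulary starts out empty, so indices follow first appearance, which is the order Union would produce.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class VocabularyIndexer
{
    // Extracts the vocabulary and encodes the document in a single pass.
    // vocabulary: distinct words in order of first appearance.
    // encoded:    for each word of document, its index in vocabulary.
    public static (List<string> vocabulary, List<int> encoded) Encode(string document)
    {
        var vocabulary = new List<string>();
        var indexOf = new Dictionary<string, int>(); // word -> index, average O(1) lookup
        var encoded = new List<int>();

        // Same tokenization as the question, minus the empty strings.
        string[] words = Regex.Replace(document, @"\p{P}", " ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string word in words)
        {
            if (!indexOf.TryGetValue(word, out int index))
            {
                // First occurrence: the word gets the next free index.
                index = vocabulary.Count;
                indexOf.Add(word, index);
                vocabulary.Add(word);
            }
            encoded.Add(index);
        }
        return (vocabulary, encoded);
    }
}
```

For "the book book is are is", the encoded list is [0, 1, 1, 2, 3, 2] (string.Join(" ", encoded) gives "0 1 1 2 3 2") and the vocabulary is ["the", "book", "is", "are"], which is kept so the mapping can still be reversed.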
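Another option would be to iterate through the Vocabulary list and, for each word, apply a Regex.Replace across the whole document that substitutes the corresponding number. This will almost certainly misbehave if document can contain numbers that are the same as any of the word replacement values: a number written in for an earlier word gets replaced again when that same number comes up later as a vocabulary word. The pattern also has to match whole words only; otherwise "is" would match inside "this". Finally, it makes one full pass over document per vocabulary word, which is O(N²) in the worst case, where N is the length of document.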
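The Regex.Replace alternative can be sketched in C# as below, mainly to show the collision problem. The names are hypothetical, and the \b word boundaries and Regex.Escape are additions assumed here so that only whole words are replaced and special characters in a word are matched literally.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class RegexReplaceApproach
{
    // For each vocabulary word in turn, rewrites every whole-word occurrence
    // in the document with that word's index: one full pass per word.
    public static string ReplaceWithIndices(string document, IReadOnlyList<string> vocabulary)
    {
        for (int i = 0; i < vocabulary.Count; i++)
        {
            // \b limits matches to whole words; Regex.Escape treats the word literally.
            string pattern = @"\b" + Regex.Escape(vocabulary[i]) + @"\b";
            document = Regex.Replace(document, pattern, i.ToString());
        }
        return document;
    }
}
```

On the question's example it produces "0 1 1 2 3 2". But for the document "book 0" with vocabulary ["book", "0"], the first pass turns it into "0 0" and the second pass rewrites both zeros, giving "1 1" instead of the intended "0 1".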