How to replace each word of a document with its index in the vocabulary, in both the document and the vocabulary

Problem description

I extract the vocabulary from a document with the code below.

// document is a string variable; Vocabulary is an existing List<string>.
// Requires: using System.Linq; using System.Text.RegularExpressions;
Vocabulary = Vocabulary.Union(
    Regex.Replace(document, "\\p{P}", " ").Split(' ')).ToList();



How can I replace every word with its index in Vocabulary, in both Vocabulary and document, at the same time as the extraction in the code above?

For example:
document => "the book book is are is" becomes "0 1 1 2 3 2"
(or these values are stored in a List<int> variable)
Vocabulary[0] => "the" becomes 0
Vocabulary[1] => "book" becomes 1
Vocabulary[2] => "is" becomes 2
Vocabulary[3] => "are" becomes 3

Recommended answer

Interesting problem...

This looks like a word-based document "compression" process.

But I'm a little confused as to why you would replace all of the info in Vocabulary from the word strings to the corresponding number. That is the only information of the mapping between the words and the numbers. Without it, you will have no way to reverse the mapping and recreate the original string. So, essentially, the output can be almost arbitrary since there's no way to reconstruct anything useful!
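To make the point about reversibility concrete, here is a minimal sketch of the reverse mapping. It assumes the vocabulary still holds the original word strings; the class and method names (VocabularyDecoder, Decode) are hypothetical and not part of the question's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VocabularyDecoder
{
    // Rebuilds a space-separated document from word indices.
    // This only works while the vocabulary still contains the words:
    // once its entries are overwritten with "0", "1", ... the
    // original text can no longer be recovered.
    public static string Decode(IEnumerable<int> encoded, IReadOnlyList<string> vocabulary)
    {
        return string.Join(" ", encoded.Select(i => vocabulary[i]));
    }
}

public static class DecodeDemo
{
    public static void Main()
    {
        var vocabulary = new List<string> { "the", "book", "is", "are" };
        var encoded = new List<int> { 0, 1, 1, 2, 3, 2 };
        Console.WriteLine(VocabularyDecoder.Decode(encoded, vocabulary));
        // prints: the book book is are is
    }
}
```

Punctuation and the original spacing were already discarded during extraction, so even this round trip recovers only the sequence of words.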

If it is really required to replace the vocabulary values then:
Vocabulary = Enumerable.Range(0, Vocabulary.Count).Select(n => n.ToString()).ToList();



For doing the replacements in document, I'd probably use a Dictionary<string, int> to hold the word-to-number mapping instead of needing to scan the Vocabulary List at every word.
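A minimal sketch of the dictionary approach follows. It extends the vocabulary and produces the encoded document in one pass, and it keeps the word strings in the vocabulary rather than overwriting them. The names VocabularyIndexer and Encode are hypothetical, and dropping empty entries when splitting is an assumption that differs from the question's Split(' ').

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VocabularyIndexer
{
    // Adds unseen words to `vocabulary` and returns the document
    // as a list of vocabulary indices.
    public static List<int> Encode(string document, List<string> vocabulary)
    {
        // word -> index lookup, seeded with words already in the vocabulary
        var indexOf = new Dictionary<string, int>();
        for (int i = 0; i < vocabulary.Count; i++)
            indexOf[vocabulary[i]] = i;

        // Same punctuation stripping as in the question; empty entries
        // are dropped here (an assumption, not in the original code).
        string[] words = Regex.Replace(document, "\\p{P}", " ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        var encoded = new List<int>();
        foreach (string word in words)
        {
            if (!indexOf.TryGetValue(word, out int index))
            {
                // First time this word is seen: append it to the vocabulary.
                index = vocabulary.Count;
                vocabulary.Add(word);
                indexOf[word] = index;
            }
            encoded.Add(index);
        }
        return encoded;
    }
}

public static class EncodeDemo
{
    public static void Main()
    {
        var vocabulary = new List<string>();
        List<int> encoded = VocabularyIndexer.Encode("the book book is are is", vocabulary);
        Console.WriteLine(string.Join(" ", encoded));    // prints: 0 1 1 2 3 2
        Console.WriteLine(string.Join(" ", vocabulary)); // prints: the book is are
    }
}
```

Each dictionary lookup takes constant time on average, so the whole document is encoded in a single pass instead of scanning the list for every word.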

Another option would be to iterate through the Vocabulary list and, for each word, apply a Regex.Replace across the whole document with the corresponding number. This will almost certainly misbehave if document can contain numbers that are the same as any of the word replacement values. Also, since every vocabulary word triggers a full pass over the document, this is O(N²) on the length of the document in the worst case.
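For comparison, the Regex.Replace variant might look like the sketch below. The names RegexIndexer and ReplaceWithRegex are hypothetical, and the use of Regex.Escape and \b word boundaries is an added assumption so that "is" does not match inside a longer word. The second call in the demo shows the collision described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexIndexer
{
    // Replaces each vocabulary word in the document with its index,
    // making one full pass over the document per vocabulary word.
    public static string ReplaceWithRegex(string document, IReadOnlyList<string> vocabulary)
    {
        for (int i = 0; i < vocabulary.Count; i++)
        {
            // \b keeps the match on whole words only.
            string pattern = "\\b" + Regex.Escape(vocabulary[i]) + "\\b";
            document = Regex.Replace(document, pattern, i.ToString());
        }
        return document;
    }
}

public static class RegexDemo
{
    public static void Main()
    {
        var vocabulary = new List<string> { "the", "book", "is", "are" };
        Console.WriteLine(RegexIndexer.ReplaceWithRegex("the book book is are is", vocabulary));
        // prints: 0 1 1 2 3 2

        // Collision: "the" becomes "0", which the next pass then treats
        // as the vocabulary word "0" and rewrites to "1".
        var numeric = new List<string> { "the", "0" };
        Console.WriteLine(RegexIndexer.ReplaceWithRegex("the 0", numeric));
        // prints: 1 1 (the intended result was 0 1)
    }
}
```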

