Error in an Inverse Document Frequency (IDF) method in C#

Problem description

Inverse document frequency (IDF) is a popular measure of a word's importance. The IDF invariably appears in a host of heuristic measures used in information retrieval. However, so far the IDF has itself been a heuristic.

Mathematically, IDF is defined as

IDF(t, D) = log(total number of documents / number of documents matching the term)
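As a worked instance of the formula above, assume a hypothetical collection of N = 1000 documents, 10 of which contain the term t, and a natural logarithm (which is what Math.Log computes when given a single argument):

```latex
\mathrm{IDF}(t, D) = \ln\frac{N}{\lvert\{d \in D : t \in d\}\rvert}
                   = \ln\frac{1000}{10}
                   = \ln 100 \approx 4.605
```

A term that appears in every document gets ln(1) = 0, so rarer terms receive larger weights.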
I have developed an application for document clustering. In it, I have an IDF method like this:

private static float FindInverseDocumentFrequency(string term)
{
    // DocumentVector dv = new DocumentVector();

    // Find the number of documents in the whole collection that contain the term.
    int count = documentCollection.ToArray().Where(s => r.Split(s.ToUpper()).ToArray().Contains(term.ToUpper())).Count();
    /*
     * Log of the ratio of the total number of documents in the collection to the number of documents containing the term.
     * We can also use Math.Log(count/(1+documentCollection.Count)) to deal with the divide-by-zero case.
     */
    return (float)Math.Log((float)documentCollection.Count / (float)count);
}



This method uses the following declarations in the program.

documentCollection is assigned like this:

documentCollection = collection.DocumentList[dv.content] as Hashtable;



DocumentList is declared like this:

private DocumentCollection docCollection=  new DocumentCollection() { DocumentList = new Hashtable() };



s is a string, used like this:

List<string> removeList = new List<string>(){"\"","\r","\n","(",")","[","]","{","}","","."," ",","};
foreach (string s in removeList)
{
    distinctTerms.Remove(s);
}



r is the regular expression:

private static Regex r = new Regex("([ \\t{}()\",:;. \n])");



The IDF method does not compile. The call documentCollection.ToArray() produces this error:

'System.Collections.Hashtable' does not contain a definition for 'ToArray' and no extension method 'ToArray' accepting a first argument of type 'System.Collections.Hashtable' could be found (are you missing a using directive or an assembly reference?)




Please help me solve this error. Thank you.

Solution

Don't use non-generic collection types (except specialized ones, which are not applicable here). They were rendered obsolete as early as .NET 2.0, when generics were introduced. Look at what you are doing: a dynamic cast with the as operator. The whole point of generics (plus classic OOP) is to avoid that.

Use the type System.Collections.Generic.HashSet<T> instead, together with the ToArray<T>() extension method:
https://msdn.microsoft.com/en-us/library/bb359438%28v=vs.110%29.aspx
https://msdn.microsoft.com/en-us/library/bb298736(v=vs.110).aspx

—SA
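The advice above could be applied roughly as follows. This is only a minimal sketch under stated assumptions, not the asker's actual program: it assumes each document is held as one string of raw text, passes the collection in as a parameter instead of reading a static field, and the class name IdfSketch is hypothetical. It also returns 0 when no document contains the term, which is just one possible convention for the divide-by-zero case mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class IdfSketch
{
    // Delimiters taken from the question's regex. The capturing group is dropped
    // so that Regex.Split returns only the tokens, not the delimiters.
    private static readonly Regex r = new Regex("[ \\t{}()\",:;.\\n]");

    // Assumption: documentCollection holds one string per document.
    // List<string> and HashSet<string> both implement IReadOnlyCollection<string>.
    public static float FindInverseDocumentFrequency(
        string term, IReadOnlyCollection<string> documentCollection)
    {
        // Number of documents containing the term, compared case-insensitively.
        int count = documentCollection.Count(
            doc => r.Split(doc).Contains(term, StringComparer.OrdinalIgnoreCase));

        // A term found in no document would divide by zero below.
        // Returning 0 here is an assumed convention, not part of the original code.
        if (count == 0)
        {
            return 0f;
        }

        return (float)Math.Log((double)documentCollection.Count / count);
    }
}
```

With a generic collection there is no need for the as cast or for an intermediate ToArray() call, because the LINQ operators work directly on IEnumerable<string>. Note that a HashSet<string> stores only distinct strings, so two documents with identical text would be counted once; a List<string> keeps both.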

