I want to compare two files or text boxes to find the degree of similarity between them

Question

Sorry for the disturbance. I've removed the code and edited the post...

The real problem is that I'm trying to find the degree of similarity (or plagiarism) between two texts or files. How can I do that? I'd appreciate any guidance.

I need the code for the above algorithm to be included in my project.

Using Visual Studio 2013, C#.
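One simple way to turn "degree of similarity" into a number is the Jaccard index of the two texts' word sets: the number of distinct words they share divided by the number of distinct words in either text. The sketch below is a swapped-in illustration, not code from this thread; the `WordOverlap` class, its method names and the separator list are assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class WordOverlap
{
    // Assumption: words are separated by whitespace and common punctuation.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    // Lower-cases the text and splits it into words, dropping empty entries
    // (so repeated spaces or punctuation never produce empty "words").
    public static string[] Words(string text)
    {
        return text.ToLowerInvariant()
                   .Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Jaccard index of the two word sets: shared words / all distinct words, 0..1.
    public static double Jaccard(string text1, string text2)
    {
        var a = new HashSet<string>(Words(text1));
        var b = new HashSet<string>(Words(text2));
        if (a.Count == 0 && b.Count == 0)
            return 1.0; // treat two empty texts as identical

        var shared = new HashSet<string>(a);
        shared.IntersectWith(b);                       // words present in both
        int union = a.Count + b.Count - shared.Count;  // distinct words in either
        return (double)shared.Count / union;
    }
}
```

Because it compares sets, this measure ignores word order and repetition, so a reshuffled copy still scores 1.0. In a form like the one in the question, something like `WordOverlap.Jaccard(txtFile1.Text, txtFile2.Text).ToString("P0")` could be displayed in a text box.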

EDIT: OK, so far I've done this:

        int i = 0;
        int j = 0;
        long lena1 = txtFile1.Text.Length;
        long lenb1 = lena1;
        long len2 = txtFile2.Text.Length;
        string str1 = txtFile1.Text;
        string str2 = txtFile2.Text;
        string str3;
        bool match = false;
        int count = 0;
        int nowords1 = 0;
        int nowords2 = 0;
        string str4;
        int k = 0;
        int m = 0;
        int nowords_match = 0;


        char[] array1 = str1.ToArray();
        char[] array2 = str2.ToArray();
        int[] loc1 = new int[1048576];
        int[] loc2 = new int[1048576];

        while (i < array1.Length)
        {
            if (array1[i] == ' ')
            {
                nowords1++;
                loc1[j] = i;
                j++;
            }

            i++;

        }

        i = j = 0;

        while (i < array2.Length)
        {

            if (array2[i] == ' ')
            {
                nowords2++;
                loc2[j] = i;
                j++;
            }

            i++;

        }

        i = j = 0;
        m = 0;

        for (k = 0; k < loc1.Length-2; k++)
        {
            str3 = str1.Substring(loc1[m], loc1[m + 1] - loc1[m]);
            match = true;

            if (match == true && count > 3)
            {
               txtPlagiarism.Text += " " + loc1[i-3] + loc1[i-2] + " " + loc1[i];
            }

            else
            {
                count = 0;
                match = false;
            }

            j = 0;
            i = 0;

            while (i < nowords2)
            {

                if (j != nowords2)
                {
                    str4 = str2.Substring(loc2[j], loc2[j + 1] - (loc2[j]));
                }

                else
                {
                    break;
                }

                if (str4.Equals(str3)) 
                {
                    nowords_match++;
                    count ++;
                }

                j++;
                i++;

            }

            m++;

        }

I'm just counting the number of matched words so that I can pick that many words from the first file's text and put them into the copy-case text. But I'm getting a run-time error:

        System.ArgumentOutOfRangeException was unhandled
          HResult=-2146233086
          Message=Length cannot be less than zero.
          Parameter name: length
          Source=mscorlib
          ParamName=length
          StackTrace:
            at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
            at System.String.Substring(Int32 startIndex, Int32 length)
            at Calculate_File_Checksum.Form1.btnDetectPlagiairism_Click(Object sender, EventArgs e) in c:\Users\BLOOM\Documents\Visual Studio 2013\App2Test\Calculate_File_Checksum\Calculate_File_Checksum\Form1.cs:line 363
            at System.Windows.Forms.Control.OnClick(EventArgs e)
            at System.Windows.Forms.Button.OnClick(EventArgs e)
            at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
            at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
            at System.Windows.Forms.Control.WndProc(Message& m)
            at System.Windows.Forms.ButtonBase.WndProc(Message& m)
            at System.Windows.Forms.Button.WndProc(Message& m)
            at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
            at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
            at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
            at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
            at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
            at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
            at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
            at System.Windows.Forms.Application.Run(Form mainForm)
            at Calculate_File_Checksum.Program.Main() in c:\Users\BLOOM\Documents\Visual Studio 2013\App2Test\Calculate_File_Checksum\Calculate_File_Checksum\Program.cs:line 19
            at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
            at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
            at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
            at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
            at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
            at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
            at System.Threading.ThreadHelper.ThreadStart()
          InnerException:

I don't understand why this happens, because I believe I've passed the correct values. Can anyone help?
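The exception comes from the `Substring` length arguments. `loc1` and `loc2` have 1,048,576 slots, but only the first `nowords1` / `nowords2` entries receive space positions; every other slot stays 0. In the inner loop, when `j` reaches `nowords2 - 1`, `loc2[j + 1]` is one of those unfilled zeros, so `loc2[j + 1] - loc2[j]` is negative and `Substring` throws "Length cannot be less than zero." The outer loop has the same defect, because it runs to `loc1.Length - 2` instead of stopping at the last recorded space. Even with the bounds fixed, cutting between single spaces never extracts the first or last word of either text. One way to avoid the index arithmetic entirely is to split with `String.Split` and look for runs of `n` consecutive words ("shingles") that occur in both texts, which is roughly what the `count > 3` check appears to aim for. This is a minimal sketch under that assumption; `ShingleMatch`, `Shingles` and `Containment` are hypothetical names, and whitespace-only splitting is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class ShingleMatch
{
    // Assumption: whitespace-only splitting; punctuation stays attached to words.
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Builds every run of n consecutive words ("shingle"), joined by single spaces.
    // String.Split replaces the manual space-index arrays, so no Substring
    // lengths are computed by hand and none can go negative.
    public static List<string> Shingles(string text, int n)
    {
        string[] words = text.ToLowerInvariant()
                             .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();
        for (int i = 0; i + n <= words.Length; i++)
            result.Add(string.Join(" ", words, i, n));
        return result;
    }

    // Fraction (0..1) of text1's n-word runs, counting repeats, that also occur
    // somewhere in text2; the distinct shared runs are returned through `shared`.
    public static double Containment(string text1, string text2, int n, out List<string> shared)
    {
        List<string> runs1 = Shingles(text1, n);
        var runs2 = new HashSet<string>(Shingles(text2, n));
        shared = new List<string>();
        if (runs1.Count == 0)
            return 0.0; // text1 has fewer than n words

        int hits = 0;
        var seen = new HashSet<string>();
        foreach (string run in runs1)
        {
            if (runs2.Contains(run))
            {
                hits++;
                if (seen.Add(run))   // report each shared run only once
                    shared.Add(run);
            }
        }
        return (double)hits / runs1.Count;
    }
}
```

With a run length of 4, a call such as `ShingleMatch.Containment(txtFile1.Text, txtFile2.Text, 4, out copied)` would give a score plus the copied phrases, which could then be written to `txtPlagiarism`. Longer runs reduce false positives from common short phrases at the cost of missing lightly edited copies.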

Recommended answer

There are numerous ways to compare the similarity of strings. Here's an algorithm Martin put together for the Levenshtein distance.
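The Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. Below is a minimal C# sketch of the standard dynamic-programming computation; it is not Martin's implementation. `EditDistance`, `Levenshtein` and the `Similarity` helper (which divides by the longer length to get a 0 to 1 score) are hypothetical names chosen for illustration.

```csharp
using System;

public static class EditDistance
{
    // Classic dynamic-programming Levenshtein distance between a and b.
    // Only two rows of the table are kept, so memory is O(b.Length).
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];

        // Row 0: turning the empty prefix of a into b[0..j) takes j insertions.
        for (int j = 0; j <= b.Length; j++)
            prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            // Column 0: turning a[0..i) into the empty string takes i deletions.
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(prev[j] + 1,      // delete a[i-1]
                                            curr[j - 1] + 1), // insert b[j-1]
                                   prev[j - 1] + cost);       // substitute or match
            }
            // The current row becomes the previous row for the next i.
            var tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Hypothetical helper: 1.0 for identical strings, 0.0 when nothing can be kept.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0)
            return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / longest;
    }
}
```

The running time is proportional to the product of the two lengths, so comparing two whole files character by character can be slow; running the same recurrence over arrays of words (for example the output of `String.Split`) is a common alternative.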
