Finding Duplicate causing Performance Issue in Count


Problem description


private void CutData()
{
    if (Collection.SelectedCells == null) return;
    var sb = new StringBuilder();
    var controlList = new List<string>();
    foreach (var variable in Collection.SelectedCells)
    {
        controlList.Add(variable.Column.Header.ToString());
        if (controlList.Count != controlList.Distinct().Count())
        {
            sb.AppendLine(null);
            controlList.Clear();
            controlList.Add(variable.Column.Header.ToString());
        }
        switch (variable.Column.Header.ToString())
        {
            case "Drug":
                sb.Append(((Model)variable.Item).Drug).Append("\t");
                ((Model)variable.Item).Drug = "";
                break;
            case "Dosage":
                sb.Append(((Model)variable.Item).Dosage).Append("\t");
                ((Model)variable.Item).Dosage = "";
                break;
            case "Patient":
                sb.Append(((Model)variable.Item).Patient).Append("\t");
                ((Model)variable.Item).Patient = "";
                break;
            case "Date":
                sb.Append(((Model)variable.Item).Date).Append("\t");
                ((Model)variable.Item).Date = "";
                break;
        }
    }
    Clipboard.SetText(sb.ToString(), TextDataFormat.Text);
    sb.Clear();
}


Problem statement:

controlList.Count != controlList.Distinct().Count()

This line causes a performance issue when I am dealing with lakhs (hundreds of thousands) of data items.

The list of strings never contains more than 8 items, because a duplicate value always shows up within 8 items. After that I clear the list.

I need to increase the speed.


Solution

A call to Clear isn't going to be a performance issue unless you're calling it a lot. You posted a snippet of code that shows some perf numbers, but you really need to look at this in the context of your app. Clear isn't even in this list. The Count method is taking 1.7 seconds to execute while the enumeration is taking 1.3. This is really slow. Notice the member info that is shown. It shows Count as a member of Enumerable, which means it is the LINQ Count() method call you're making, not the list's Count property. For the Count method to run, it has to execute the expression. Your expression is a Distinct call, which requires enumerating the control list and throwing out the dups. You're doing this each time through the loop.

I'm trying to understand the logic for your if statement. It appears that you're looking for duplicate variable headers. What I don't understand is the logic if that case occurs. You wipe the entire list of headers and then start over again. Is that really what you want? Furthermore you're doing this each time through the loop and potentially throwing away previous work you had done. Other than for dup detection you don't even use this control list so I question the need for it.  

My recommendation is to rework your dup detection logic so you don't need the list. It depends completely on how your app works, but a simple approach might be to store the columns you've seen in the list and then detect dups by calling Contains on the list each time through. If you've already seen the column, wipe the list and start again. Not really memory efficient, but it would work. An alternative is to use a bitset or something similar and just reset the bits each time.
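
The two ideas above could be sketched roughly as follows. This is only an illustration under assumptions: the selected cells are modelled as plain (header, value) tuples rather than real grid cells, the class and method names are made up for the example, and a HashSet<string> is swapped in for the List/Contains check (with at most 8 items either would do). The bitmask variant further assumes the four column headers are known up front.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: cells are modelled as (Header, Value) tuples, not real grid cells.
public static class DupDetectionSketch
{
    // Variant 1: remember the headers seen in the current row.
    public static string BuildText(IEnumerable<(string Header, string Value)> cells)
    {
        var sb = new StringBuilder();
        var seen = new HashSet<string>();

        foreach (var (header, value) in cells)
        {
            // HashSet<T>.Add returns false if the item is already present,
            // so a single call both checks for and records the header.
            if (!seen.Add(header))
            {
                sb.AppendLine();   // repeated header: a new row starts here
                seen.Clear();
                seen.Add(header);
            }
            sb.Append(value).Append('\t');
        }
        return sb.ToString();
    }

    // Variant 2: one bit per column. Assumes the set of headers is fixed and known.
    private static readonly string[] KnownHeaders = { "Drug", "Dosage", "Patient", "Date" };

    public static string BuildTextWithBitmask(IEnumerable<(string Header, string Value)> cells)
    {
        var sb = new StringBuilder();
        int seenMask = 0;

        foreach (var (header, value) in cells)
        {
            int index = Array.IndexOf(KnownHeaders, header);
            if (index < 0) continue;       // unknown columns are ignored in this sketch

            int bit = 1 << index;
            if ((seenMask & bit) != 0)
            {
                sb.AppendLine();           // bit already set: a new row starts here
                seenMask = 0;              // resetting the bits replaces Clear()
            }
            seenMask |= bit;
            sb.Append(value).Append('\t');
        }
        return sb.ToString();
    }
}
```

Either variant does a constant amount of work per cell, whereas Distinct().Count() re-enumerates the whole list on every iteration.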

Another alternative is to separate the enumeration of the columns from the writing of the data. You could, for example, create a temp object (of a temp type containing the data you're writing out) and add it to a list. Each time through the loop, set the temp object's data based upon the current column. If you've already seen that column, create a new temp object (and add it to the list), since you're now working with a new set of values. Once you've finished enumerating the columns, you have a list of temp objects, and you enumerate them to generate the string you need without any dup detection.
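
A minimal sketch of that two-phase idea, under the same assumption that cells can be modelled as (header, value) tuples. The CutRow type and the Collect and Format helpers are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical temp type: holds the values collected for one output row.
public sealed class CutRow
{
    private readonly HashSet<string> _headers = new HashSet<string>();
    private readonly List<string> _values = new List<string>();

    // Returns false (and stores nothing) if this row already has a value for the header.
    public bool TryAdd(string header, string value)
    {
        if (!_headers.Add(header)) return false;
        _values.Add(value);
        return true;
    }

    public string ToLine() => string.Join("\t", _values);
}

public static class TempRowSketch
{
    // Phase 1: walk the cells once and split them into rows.
    public static List<CutRow> Collect(IEnumerable<(string Header, string Value)> cells)
    {
        var rows = new List<CutRow>();
        foreach (var (header, value) in cells)
        {
            // Start a new row for the first cell, or when the current row
            // already holds a value for this column.
            if (rows.Count == 0 || !rows[rows.Count - 1].TryAdd(header, value))
            {
                var row = new CutRow();
                row.TryAdd(header, value);
                rows.Add(row);
            }
        }
        return rows;
    }

    // Phase 2: generate the text from the collected rows; no dup detection needed here.
    public static string Format(IEnumerable<CutRow> rows) =>
        string.Join(Environment.NewLine, rows.Select(r => r.ToLine()));
}
```

Calling Format(Collect(cells)) yields one tab-separated line per row. Unlike the original loop, this sketch leaves no trailing tab at the end of each line.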

Of course, this is all a lot of work. Yet another solution is to reevaluate your process to see if you already have the data you need and can just use it without the extra temp stuff. For example, maybe you enumerate the selected rows (your models) and, for each model, enumerate the selected cells and write out the values. Since you already have the rows (which is where your current logic is breaking stuff up), you eliminate the need for a temp list and dup detection altogether.
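
One way this row-first approach might look, assuming each selected cell can be reduced to a (row model, column header) pair. The Model class here is a stand-in with string properties, and grouping the cells by their row with LINQ's GroupBy is just one possible way to get at the rows, not necessarily how your grid exposes them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Stand-in for the question's Model type; all four properties are assumed to be strings.
public sealed class Model
{
    public string Drug { get; set; } = "";
    public string Dosage { get; set; } = "";
    public string Patient { get; set; } = "";
    public string Date { get; set; } = "";
}

public static class RowFirstSketch
{
    public static string CutData(IEnumerable<(Model Row, string Header)> selectedCells)
    {
        var sb = new StringBuilder();

        // Group the cells by the row object they belong to. For in-memory sequences,
        // GroupBy keeps groups in first-seen order and cells in their original order.
        foreach (var row in selectedCells.GroupBy(c => c.Row))
        {
            foreach (var cell in row)
            {
                switch (cell.Header)
                {
                    case "Drug":
                        sb.Append(row.Key.Drug).Append('\t');
                        row.Key.Drug = "";
                        break;
                    case "Dosage":
                        sb.Append(row.Key.Dosage).Append('\t');
                        row.Key.Dosage = "";
                        break;
                    case "Patient":
                        sb.Append(row.Key.Patient).Append('\t');
                        row.Key.Patient = "";
                        break;
                    case "Date":
                        sb.Append(row.Key.Date).Append('\t');
                        row.Key.Date = "";
                        break;
                }
            }
            sb.AppendLine();   // one line per row; note this leaves a trailing newline
        }
        return sb.ToString();
    }
}
```

Because the row boundary comes from the model object itself, no header list is kept and nothing is checked for duplicates.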

