如何通过LINQ直接从CSV文件中查询和提取数据 [英] How to query and extract data from CSV file directly by LINQ

查看:107
本文介绍了如何通过LINQ直接从CSV文件中查询和提取数据的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

虽然我几乎不使用LINQ,但在LINQ中却不是很好.

I hardly use LINQ and not good in LINQ though.

我开发了显示CSR和时间间隔数据的输出 像下面一样

I have developed output which display CSR wise and time interval wise data like below

+----------+--------------------+----------+----------+---------------+-------------+
| CSR Name |   Time Interval    | Incoming | OutGoing | Call Transfer | Missed Call |
+----------+--------------------+----------+----------+---------------+-------------+
| ACCTS    | 14:30:01--15:00:00 |        0 |        0 |             0 |           0 |
| ACCTS    | 15:00:01--15:30:00 |        0 |        0 |             1 |           0 |
| ACCTS    | 13:30:01--14:00:00 |        5 |        2 |             0 |           2 |
| ACCTS    | 14:00:01--14:30:00 |        1 |        0 |             0 |           0 |
| ACCTS    | 16:30:01--17:00:00 |        0 |        3 |             0 |           0 |
| ACCTS    | 17:00:01--17:30:00 |        4 |        0 |             1 |           2 |
| Christy  | 14:30:01--15:00:00 |        1 |        5 |             0 |           0 |
| Christy  | 15:00:01--15:30:00 |        2 |        7 |             1 |           0 |
| Christy  | 13:30:01--14:00:00 |        0 |       11 |             0 |           2 |
| SUZY     | 14:30:01--15:00:00 |        1 |        0 |             0 |           0 |
| SUZY     | 15:00:01--15:30:00 |        0 |        0 |             2 |           2 |
| SUZY     | 13:30:01--14:00:00 |        2 |        1 |             0 |           0 |
+----------+--------------------+----------+----------+---------------+-------------+

在这里,我提供了可带来上述输出的C#程序代码.

Here I am giving my C# program code which brings the above output.

private void FetchData(string tableName)
{
    TimeSpan tsStart, tsEnd;
    string strSql = "";
    srcTable = new DataTable();
    srcTable.TableName = "data";
    srcTable.Columns.Add("CSR Name");
    srcTable.Columns.Add("Time");
    srcTable.Columns.Add("Incoming Calls");
    srcTable.Columns.Add("Outgoing Calls");
    srcTable.Columns.Add("Calls Transfer");
    srcTable.Columns.Add("Missed Calls");
    DataRow dr = null;
    int incall = 0, outcall = 0, transfercall = 0, totmisscall = 0;
    bool flag = true;
    string StartTime = "", EndTime = "";

    string DayOfWeek = DateTime.Parse(dtVal.Value.ToString()).ToString("ddd");
    string st_Time = string.Empty, end_Time = string.Empty;

    st_Time="08:00:00";
    end_Time="17:30:00";

    // GetAllCSR function return datatable with all distinct CSR name excluding name start with VM and Voice Mail
    DataTable dtCSRName = GetAllCSR(tableName,txtCSRName.Text);
    string strCSRName = "";

    if (dtCSRName != null)
    {
        foreach (DataRow row in dtCSRName.Rows)
        {
            if (st_Time.Trim() != "" && end_Time.Trim() != "")
            {
                tsStart = new TimeSpan(DateTime.Parse(st_Time.Trim()).Hour, DateTime.Parse(st_Time.Trim()).Minute, DateTime.Parse(st_Time.Trim()).Second);
                tsEnd = new TimeSpan(DateTime.Parse(end_Time.Trim()).Hour, DateTime.Parse(end_Time.Trim()).Minute, DateTime.Parse(end_Time.Trim()).Second);
            }
            else
            {
                tsStart = new TimeSpan(09, 00, 00);
                tsEnd = new TimeSpan(17, 30, 0);
            }

    // iterate in all CSR name
            if (row["party1name"] != DBNull.Value)
            {
                strCSRName = row["party1name"].ToString();

                // iterate in all time interval like 08:00:00 to 08:30:00, 08:30:00 to 09:00:00, 09:00:00 to 09:30:00.....17:00:00 to 17:30:00
                while (tsStart <= tsEnd)
                {
                    if (!flag)
                    {
                        tsStart = new TimeSpan(tsStart.Hours, tsStart.Minutes, int.Parse("01"));
                    }
                    flag = false;

                    StartTime = tsStart.ToString();
                    tsStart = tsStart.Add(new TimeSpan(00, 30, 00));
                    EndTime = (tsStart.Hours >= 10 ? tsStart.Hours.ToString() : ("0" + tsStart.Hours.ToString())) + ":" + (tsStart.Minutes >= 10 ? tsStart.Minutes.ToString() : ("0" + tsStart.Minutes.ToString())) + ":00";

                    strSql = "select (select count(*) as incoming from " + tableName + " where direction='I' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) >='" + StartTime + "' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) <='" + EndTime + "' ";
                    strSql = strSql + "and Is_Internal=0 and continuation=0 and RIGHT(convert(varchar,[call duration]),8)<> '00:00:00' ";
                    strSql = strSql + "and party1name='" + strCSRName + "') as incoming, ";

                    strSql = strSql + "(select count(*) as OutGoing from " + tableName + " ";
                    strSql = strSql + "where direction='O' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) >='" + StartTime + "' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) <='" + EndTime + "' ";
                    strSql = strSql + "and Is_Internal=0 and continuation=0  and party1name not in ('Voice Mail') ";
                    strSql = strSql + "and party1name='" + strCSRName + "') as OutGoing, ";

                    strSql = strSql + "(select count(*) as CallTransfer from " + tableName + " ";
                    strSql = strSql + "where continuation=1  and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) >='" + StartTime + "' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) <='" + EndTime + "' ";
                    strSql = strSql + "and RIGHT(convert(varchar,[call duration]),8)<> '00:00:00' and party1name not in ('Voice Mail') ";
                    strSql = strSql + "and party1name='" + strCSRName + "') as CallTransfer; ";

                    strSql = strSql + "SELECT count(*) as UnansweredCalls_DuringBusinessHours from ";
                    strSql = strSql + tableName + " where direction='I' and " + Environment.NewLine;
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) >='" + StartTime + "' and ";
                    strSql = strSql + "CONVERT(datetime,right([Call Start],8)) <='" + EndTime + "' ";
                    strSql = strSql + "and RIGHT(convert(varchar,[call duration]),8)= '00:00:00' and [Ring duration]>0 " + Environment.NewLine;
                    strSql = strSql + "and party1name='" + strCSRName + "'" + Environment.NewLine;

                    if (Business.CurrentCountry.CountryCode == "US" || Business.CurrentCountry.CountryCode == "MX" || Business.CurrentCountry.CountryCode == "ES" || Business.CurrentCountry.CountryCode == "NL" || Business.CurrentCountry.CountryCode == "PL")
                    {
                        strSql = strSql + " and Park_Time=0";
                    }

                    strSql = strSql + Environment.NewLine;

                    DataSet oDS = Common.GetDataSet(strSql, "");

                    if (oDS.Tables.Count > 0)
                    {
                        if (oDS.Tables[0].Rows.Count > 0)
                        {
                            dr = srcTable.NewRow();
                            dr[0] = strCSRName;
                            dr[1] = StartTime + "--" + EndTime;
                            if (oDS.Tables[0].Rows[0]["incoming"] != DBNull.Value)
                            {
                                dr[2] = oDS.Tables[0].Rows[0]["incoming"].ToString();
                                incall = incall + int.Parse(oDS.Tables[0].Rows[0]["incoming"].ToString());
                            }
                            else
                            {
                                dr[2] = "0";
                            }

                            if (oDS.Tables[0].Rows[0]["OutGoing"] != DBNull.Value)
                            {
                                dr[3] = oDS.Tables[0].Rows[0]["OutGoing"].ToString();
                                outcall = outcall + int.Parse(oDS.Tables[0].Rows[0]["OutGoing"].ToString());
                            }
                            else
                            {
                                dr[3] = "0";
                            }

                            if (oDS.Tables[0].Rows[0]["CallTransfer"] != DBNull.Value)
                            {
                                dr[4] = oDS.Tables[0].Rows[0]["CallTransfer"].ToString();
                                transfercall = transfercall + int.Parse(oDS.Tables[0].Rows[0]["CallTransfer"].ToString());
                            }
                            else
                            {
                                dr[4] = "0";
                            }
                        }

                        if (oDS.Tables[1].Rows.Count > 0)
                        {
                            if (oDS.Tables[1].Rows[0]["UnansweredCalls_DuringBusinessHours"] != DBNull.Value)
                            {
                                dr[5] = oDS.Tables[1].Rows[0]["UnansweredCalls_DuringBusinessHours"].ToString();
                                totmisscall = totmisscall + int.Parse(oDS.Tables[1].Rows[0]["UnansweredCalls_DuringBusinessHours"].ToString());
                            }
                        }
                        srcTable.Rows.Add(dr);
                    }
                }
            }
        }
    }

    txtIncomming.Text = incall.ToString();
    txtOutGoing.Text = outcall.ToString();
    txtCallTransfer.Text = transfercall.ToString();
    txtMissCall.Text = totmisscall.ToString();
    //srcTable.DefaultView.Sort = "[CSR Name], [Time]";

    DataSet oDs = new DataSet();
    oDs.Tables.Add(srcTable);

    this.outlookGrid1.ExpandIcon = global::BBA.Properties.Resources.Collapse;
    this.outlookGrid1.CollapseIcon = global::BBA.Properties.Resources.Expand;

    if (srcTable.Rows.Count > 0)
    {
        outlookGrid1.BindData(oDs, "data");
        View = "BoundCategory";
        DataGridViewCellEventArgs evt = new DataGridViewCellEventArgs(0, -1);
        object sender = new object();
        dgResult_CellClick(sender, evt);
        outlookGrid1.ExpandAll();
    }
    else
    {
        outlookGrid1.Rows.Clear();
        MessageBox.Show("No data found");
    }
}

那我在做什么?我先读取一个CSV文件,然后将CSV文件数据插入数据库表中,然后在有条件的情况下获取这些数据并在2个循环中形成实际数据.

So what I am doing? I am reading a CSV file first and then insert CSV file data into database table and later fetch those data with condition and form the actual data in 2 loops.

上面的代码和过程运行正常,但速度较慢.有人告诉我仅使用LINQ读取csf文件数据并形成输出.尽管在LINQ中我很虚弱,所以需要您的帮助以形成正确的linq查询以获取准确的输出.

The above code and process works fine but slow. I was told to use LINQ only to read csf file data and form the output. I am weak though in LINQ so need help from you to form the right linq query to have exact output.

到目前为止,我一直以这种方式来构成LINQ查询,但老实说不能正确地安排它.这是我的linq查询,需要添加更多内容才能获得完整代码. 在这里,我粘贴了运行缓慢的linq查询,但没有引发错误.

This way I tried so far to form the LINQ query but honestly not being able to arrange it properly. Here is my linq query which need to add more to be full code. Here I am pasting my working linq query which is slow but not throwing error.

void Main()
{
    var csvlines = File.ReadAllLines(@"M:\smdr(backup08-06-2015).csv");
    var csvLinesData = csvlines.Skip(1).Select(l => l.Split(',').ToArray());

    var users = csvLinesData.Select(data => new User
    {
        CSRName = data[12],
        Incomming = csvLinesData.Count(w => w[4] == "I" 
                                        &&  w[8] == "0"
                                        &&  w[10] == "0"
                                        &&  w[1].ToString().Substring(w[1].ToString().Length-8)!="00:00:00"),

        outgoing = csvLinesData.Count(w => w[4] == "O"
                                        && w[8] == "0"
                                        && w[10] == "0"
                                        &&  w[1].ToString().Substring(w[1].ToString().Length-8)!="00:00:00"),

    })
    .Where(u => !u.CSRName.Contains("VM") && u.CSRName != "Voice Mail").OrderBy(u=> u.CSRName)
    .ToList();
    users.Dump();
}

class User
{
    public string CSRName;
    public int outgoing;
    public int Incomming;

}

因此,我要求LINQ的所有专家请看一下我的C#代码,并帮助我编写完整的linq查询,该查询带来相同的输出而无需循环访问.

So my request to all who are expert in LINQ to please have a look at my C# code and help me to compose the full fledged linq query which bring the same output without iterating in loop.

这是我的csv文件url http://s000.tinyupload.com/?file_id=22225722172003614028,任何人都可以从中下载.如果有人感兴趣,只需单击test.csv文件链接.谢谢

here is my csv file url http://s000.tinyupload.com/?file_id=22225722172003614028 from where anyone can download. just click on test.csv file link if anyone is interested. thanks

推荐答案

我个人认为Linq不会比格式良好的for循环快得多. (除非您利用PLinq),我认为更大的问题是您的数据模型可能需要不同的设计.我要做的是创建一个新类,该类仅代表一行及其重要细节.

I don't personally think that Linq is going to be significantly faster than a well formed for loop. (unless you are taking advantage of PLinq) I think the bigger issue is that your data model may require a different design. What I would do is instead create a new class that represents a row and its important details only.

class CallInformation {
    public DateTime CallStart { get; private set; }
    public Boolean IsOutGoing {get; private set; }
    public String CSRName {get; private set; }
    public int InComingCount { get; set; }
    public int OutgoingCount { get; set; }

    public CallInformation(String[] parts) {
        IsOutGoing = parts[4] == "O";
        CallStart = DateTime.Parse(parts[0]);
        CSRName = parts[12];
    }
    //... Continue with the important properties
}

现在,IO流也可以提高性能,但是为了清楚起见,我将其排除在外.排序和合并项目通常比搜索要快得多.如果可以避免繁琐的字符串比较/创建,那么通常会大大提高性能.

Now a IO Stream may also improve performance however i will exclude that for clarity. It is often much faster to sort and combine your items as you go instead of searching. If you can avoid heavy string comparison/creation you will often greatly improve performance.

在您的

Dictionary<String, CallInformation> callDictionary = new Dictionary<String,CallInformation>();
var csvLinesData = csvlines.Skip(1).Select(l => l.Split(',').ToArray());

foreach(string[] parts in csvLinesData) {
     //Then place this call into a sortedlist or Dictionary. 
     //Here i am counting up the incoming and outgoing calls. 
     if(callDictionary.containsKey(parts[12])) {
         if(parts[4] == "I") {
             callDictionary[parts[12]].InComingCount++;
         } else { 
             callDictionary[parts[12]].OutGoingCount++; 
         }
     } else {
         //Construct your new object based on this row.
         CallInformation call = new CallInformation(parts);
         callDictionary.add(call.CSRName, call);
     }
}

可以根据StartTime或其他字段对排序后的列表进行排序.决定您的输出.我个人无法围绕您要分组和处理的确切方法来解决问题.我个人认为您的表现是基于以下事实:您正在查询数据集以获取永远不会出现的信息.但是,请让我知道您如何对信息进行分组.我认为这更多地是您将如何对信息进行分组.

The sorted list can be sorted based on the StartTime or other field. the that governs your output. I am personally having trouble wrapping my head around the exact method you are grouping and processing. I personally believe your performance is based on the fact that you are querying your data set for information that will never be present. But please let me know about how you are grouping the information. I think this is more of how you are going to be grouping information.

如果您尝试根据通话持续时间来对通话进行存储,则可以使用DateTime部分来找出字典中放置它的存储桶",而不是我上面所做的CSR名称(通过调整(call.CSRName,containsKey[12]),您可以改为创建持续时间"字符串.通过使用日期时间对象,您可以基于时间差进行比较,而不是基于字符串比较.

If you are trying to bucket calls based on their time duration then you can then use the DateTime part to figure out which "bucket" in the Dictionary to place it in, instead of the CSR name i have done above (by adjusting the call.CSRName, and the containsKey[12]) you can instead create Time duration strings. By using the date time object you can make comparisons based on time differences instead of string comparisons.

要将数据从此集合中提取出来,您实际上想使用Linq来执行该操作

To pull the data out of this collection you would actually want to use Linq to perform that

 callDictionary.Where((keyValuePair) => { return keyValuePair.Value.CSRName != "VM" && keyValuePair.Value.CSRName != "VoiceMail"; }).OrderBy(o.Value.StartTime);

OrderBy子句实际上取决于您的外观,因为我们已经取消"了东西,所以我们可以按开始时间进行排序,为了获得更好的控制,您希望StartTime作为开始TimeDuration.任何1个特定项目中的任何一个,大约都是正确的.您还可以添加另一个OrderBy来按CSRName对其进行排序.

The OrderBy clause is really up to you by the looks you are Since we are "bucking" things already we can just sort by the start time, to get finer control you would want the StartTime to be the start of the TimeDuration. of any 1 particular item and it will be approximately right. You could also add another OrderBy to sort it by the CSRName as well.

这篇关于如何通过LINQ直接从CSV文件中查询和提取数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆