在 DataTable 中搜索和替换字符串列很慢? [英] Search and replace inside string column in DataTable is slow?

查看:28
本文介绍了在 DataTable 中搜索和替换字符串列很慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 DataTable (.dt) 的字符串列中获取不同的单词,然后用另一个值替换唯一值,因此基本上将单词更改为其他单词.下面列出的两种方法都有效,但是,对于 90k 记录,该过程不是很快.有没有办法加快这两种方法的速度?

I am fetching distinct words in a string column of a DataTable (.dt) and then replacing the unique values with another value, so essentially changing words to other words. Both approaches listed below work, however, for 90k records, the process is not very fast. Is there a way to speed up either approach?

第一种方法,如下:

   'fldNo is column number in dt
   For Each Word As String In DistinctWordList
      Dim myRow() As DataRow
      myRow = dt.Select(MyColumnName & "='" & Word & "'")
      For Each row In myRow
         row(fldNo) = dicNewWords(Word)
      Next
   Next

第二种基于 LINQ 的方法如下,但实际上也不是很快:

A second LINQ-based approach is as follows, and is actually not very fast either:

   Dim flds as new List(of String)
   flds.Add(myColumnName)
   For Each Word As String In DistinctWordsList
     Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
     ReDim foundrecs(rowData.Count)
     Cnt = 0
     For Each row As DataRow In rowData
       Dim Index As Integer = dt.Rows.IndexOf(row)
       foundrecs(Cnt) = Index + 1 'row.RowId
       Cnt += 1
     Next
     For i = 0 To Cnt
       dt(foundrecs(i))(fldNo) = dicNewWords(Word)
     Next 
   Next

推荐答案

所以你有你的替换字典:

So you have your dictionary of replacements:

Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"

您可以将它们应用于表格的 ReplaceMe 列:

You can apply them to your table's ReplaceMe column:

Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
  If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep 
Next r

在我的机器上,100 万次替换需要 340 毫秒.我可以通过使用列号而不是名称将其减少到 260 毫秒 - If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep

On my machine it takes 340ms for 1 million replacements. I can cut that down to 260ms by using column number rather than name - If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep

时间:

    'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
    Dim d As New Dictionary(Of String, String)
    For i = 0 To 9
        d(i.ToString()) = (i + 10).ToString()
    Next

    'put a million rows in a datatable, randomly assign dictionary keys as row values
    Dim dt As New DataTable
    dt.Columns.Add("ReplaceMe")
    Dim r As New Random()
    Dim k = d.Keys.ToArray()
    For i = 1 To 1000000
        dt.Rows.Add(k(r.Next(k.Length)))
    Next

    'what range of values do we have in our dt?
    Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))

    'it's a crappy way to time, but it'll prove the point
    Dim start = DateTime.Now

    Dim rep As String = Nothing
    For Each ro As DataRow In dt.Rows
        If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
    Next

    Dim ennd = DateTime.Now

    'what range of values do we have now
    Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))


    MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")

这篇关于在 DataTable 中搜索和替换字符串列很慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆