在 DataTable 中搜索和替换字符串列很慢? [英] Search and replace inside string column in DataTable is slow?
问题描述
我在 DataTable (.dt) 的字符串列中获取不同的单词,然后用另一个值替换唯一值,因此基本上将单词更改为其他单词.下面列出的两种方法都有效,但是,对于 90k 记录,该过程不是很快.有没有办法加快这两种方法的速度?
I am fetching distinct words in a string column of a DataTable (.dt) and then replacing the unique values with another value, so essentially changing words to other words. Both approaches listed below work, however, for 90k records, the process is not very fast. Is there a way to speed up either approach?
第一种方法,如下:
'fldNo is column number in dt
For Each Word As String In DistinctWordList
Dim myRow() As DataRow
myRow = dt.Select(MyColumnName & "='" & Word & "'")
For Each row In myRow
row(fldNo) = dicNewWords(Word)
Next
Next
第二种基于 LINQ 的方法如下,但实际上也不是很快:
A second LINQ-based approach is as follows, and is actually not very fast either:
Dim flds as new List(of String)
flds.Add(myColumnName)
For Each Word As String In DistinctWordsList
Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
ReDim foundrecs(rowData.Count)
Cnt = 0
For Each row As DataRow In rowData
Dim Index As Integer = dt.Rows.IndexOf(row)
foundrecs(Cnt) = Index + 1 'row.RowId
Cnt += 1
Next
For i = 0 To Cnt
dt(foundrecs(i))(fldNo) = dicNewWords(Word)
Next
Next
推荐答案
所以你有你的替换字典:
So you have your dictionary of replacements:
Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"
您可以将它们应用于表格的 ReplaceMe 列:
You can apply them to your table's ReplaceMe column:
Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep
Next r
在我的机器上,100 万次替换需要 340 毫秒.我可以通过使用列号而不是名称将其减少到 260 毫秒 - If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
On my machine it takes 340ms for 1 million replacements. I can cut that down to 260ms by using column number rather than name - If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
时间:
'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
Dim d As New Dictionary(Of String, String)
For i = 0 To 9
d(i.ToString()) = (i + 10).ToString()
Next
'put a million rows in a datatable, randomly assign dictionary keys as row values
Dim dt As New DataTable
dt.Columns.Add("ReplaceMe")
Dim r As New Random()
Dim k = d.Keys.ToArray()
For i = 1 To 1000000
dt.Rows.Add(k(r.Next(k.Length)))
Next
'what range of values do we have in our dt?
Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
'it's a crappy way to time, but it'll prove the point
Dim start = DateTime.Now
Dim rep As String = Nothing
For Each ro As DataRow In dt.Rows
If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
Next
Dim ennd = DateTime.Now
'what range of values do we have now
Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
这篇关于在 DataTable 中搜索和替换字符串列很慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!