VBA: convert an unusual string to a Date
Question
I wanted to scrape data from Yahoo as an exercise and then make a graph from it. The problem is that the dates I scrape come back in a rather odd format:
?10??Aug?, ?2020
The question marks in the string are not really question marks. They are characters unknown to me, so I cannot remove them with Replace().
Then, when I try to use CDate() to convert the string to a Date, the code fails with a "Type mismatch" error.
What I need is either a way to find out what those characters are so that I can remove them with Replace(), or some way to convert even this odd format to a Date. Alternatively, improving the scraping procedure would also solve the problem. So far I have been getting the data with, for example:
ie.document.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(3).innerText
If anyone wants to try scraping it too, here is an example URL:
https://finance.yahoo.com/quote/LAC/history?period1=1469404800&period2=1627171200&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true
A sample of my code:
DateString = doc.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(j).innerText
LeftDateString = Clean_NonPrintableCharacters(DateString)
Worksheets("Stock_data").Range("A2").Value = CDate(LeftDateString)
Recommended answer
Use regular expressions:
Function GetDate(txt)
    ' Requires a reference to 'Microsoft VBScript Regular Expressions 5.5' (VBE menu: Tools -> References)
    Dim re As New RegExp, retval(0 To 2), patterns, i, result
    ' Patterns, in order: day (1 or 2 digits), month name, four-digit year
    patterns = Array("\b\d{1,2}\b", "\b[a-zA-Z]+\b", "\b\d{4}\b")
    For i = 0 To 2
        re.Pattern = patterns(i)
        Set result = re.Execute(txt)
        ' Execute never returns Nothing; an empty MatchCollection means no match
        If result.Count = 0 Then Exit Function ' no day, month or year found: GetDate() prints as ""
        retval(i) = result(0).Value
    Next
    GetDate = Join(retval)
End Function
Sub Usage()
    Dim txt
    For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?")
        Debug.Print GetDate(txt)
    Next
End Sub
Prints:
10 Aug 2020
13 Jul 2020
18 March 2021
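The question also asks how to find out what the unknown characters are. A minimal sketch of one way to check: print the Unicode code point of every character with AscW, then remove the offenders with Replace() and ChrW. The helper names CharCodes and StripChars are hypothetical, and the sample string assumes the hidden character is U+200E (left-to-right mark, decimal 8206); the characters in the real scraped text may differ, so inspect them first.

```vba
' Hypothetical helper: lists the Unicode code point of every character in txt.
Function CharCodes(ByVal txt As String) As String
    Dim i As Long, parts() As String
    If Len(txt) = 0 Then Exit Function
    ReDim parts(1 To Len(txt))
    For i = 1 To Len(txt)
        ' AscW returns a signed 16-bit value; masking maps it to 0..65535
        parts(i) = "U+" & Right$("0000" & Hex$(AscW(Mid$(txt, i, 1)) And &HFFFF&), 4)
    Next i
    CharCodes = Join(parts, " ")
End Function

' Hypothetical helper: removes every character whose code point is listed in codes.
Function StripChars(ByVal txt As String, ParamArray codes() As Variant) As String
    Dim c As Variant
    For Each c In codes
        txt = Replace(txt, ChrW(c), "")
    Next c
    StripChars = txt
End Function

Sub InspectDemo()
    Dim s As String
    ' Assumed sample, not the actual Yahoo string: U+200E around the day number
    s = ChrW(8206) & "10" & ChrW(8206) & " Aug, 2020"
    Debug.Print CharCodes(s)          ' shows which code points are present
    Debug.Print StripChars(s, 8206)   ' the text with U+200E removed
End Sub
```

Once the code points are known, pass them to StripChars (or to plain Replace calls) before handing the string to CDate.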
Edit 2
Function GetDate2(txt)
    ' Requires a reference to 'Microsoft VBScript Regular Expressions 5.5' (VBE menu: Tools -> References)
    Static re As RegExp, months As Collection
    Dim result, cnt, m, i
    If re Is Nothing Then ' one-time initialisation
        Set re = New RegExp
        re.Pattern = "[^a-zA-Z0-9]" ' anything that is not a letter or a digit
        re.Global = True
        Set months = New Collection ' maps "jan".."dec" to 1..12
        cnt = 1
        For Each m In Split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", ",")
            months.Add cnt, m
            cnt = cnt + 1
        Next
    End If
    ' Replace non-alphanumeric characters with spaces, collapse repeated spaces, split into tokens
    result = Split(WorksheetFunction.Trim(re.Replace(txt, " ")))
    For i = 0 To UBound(result)
        If Not IsNumeric(result(i)) Then
            ' Month names become month numbers; other tokens keep their lowercased three-letter prefix
            result(i) = Left(LCase(result(i)), 3)
            On Error Resume Next
            result(i) = months(result(i))
            On Error GoTo 0
        End If
    Next
    result = Join(result)
    If IsDate(result) Then GetDate2 = CDate(result)
End Function
Sub Usage2()
    Dim txt
    For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?", _
                          "01/12/2021", "04.18.2020", "15 10 20")
        Debug.Print GetDate2(txt)
    Next
End Sub
Prints:
10.08.2020
13.07.2020
18.03.2021
01.12.2021
18.04.2020
15.10.2020
Note: the order of the day (dd) and month (mm) may vary, because CDate interprets ambiguous strings according to the system's regional settings.
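When the month arrives as a name, as in the Yahoo strings, the day/month ambiguity can be avoided by building the date from explicit parts with DateSerial instead of letting CDate guess. A minimal sketch under that assumption; PartsToDate is a hypothetical helper, and it expects the "day month year" token order that GetDate returns.

```vba
' Hypothetical helper: builds a Date from explicit parts, so regional
' settings cannot swap day and month. Returns Empty if a part is invalid.
Function PartsToDate(ByVal dayPart As String, ByVal monthText As String, _
                     ByVal yearPart As String) As Variant
    Const MONTHS As String = "janfebmaraprmayjunjulaugsepoctnovdec"
    Dim pos As Long
    If Len(monthText) < 3 Then Exit Function
    ' Locate the three-letter month prefix inside the lookup string
    pos = InStr(1, MONTHS, LCase$(Left$(monthText, 3)), vbBinaryCompare)
    ' Reject misses and hits that straddle two month names (e.g. "anf")
    If pos = 0 Or (pos - 1) Mod 3 <> 0 Then Exit Function
    If Not (IsNumeric(dayPart) And IsNumeric(yearPart)) Then Exit Function
    PartsToDate = DateSerial(CInt(yearPart), (pos - 1) \ 3 + 1, CInt(dayPart))
End Function

Sub UsagePartsToDate()
    Dim parts() As String
    parts = Split("10 Aug 2020")   ' e.g. the text returned by GetDate
    ' Format with an explicit pattern so the output does not depend on locale
    Debug.Print Format$(PartsToDate(parts(0), parts(1), parts(2)), "yyyy-mm-dd")
End Sub
```

Purely numeric inputs such as "01/12/2021" remain ambiguous by nature; for those, the expected order has to be decided by the caller.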