VBA convert unusual string to Date


Question

I wanted to scrape data from Yahoo as an exercise and then make a graph from it. I ran into a problem: when I scrape the dates, they come back in a rather odd format:

?10??Aug?, ?2020

The question marks in the string are not really question marks. They are characters I cannot identify, so I cannot remove them with Replace().

Then, when I try to use CDate() to convert this to a Date, the code fails with a "Type mismatch" error.

What I need is either a way to find out what those characters are, so that I can remove them with Replace(), or a way to convert even this odd format to a Date. Alternatively, improving the scraping procedure would also solve the problem. So far I have been using, for example,

ie.document.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(3).innerText

to get the data.

If anyone wants to try scraping it too, here is an example URL:

https://finance.yahoo.com/quote/LAC/history?period1=1469404800&period2=1627171200&interval=1d&filter=history&frequency=1d&includeAdjustedClose=true

A sample of my code:

DateString = doc.getElementsByClassName("Py(10px) Ta(start) Pend(10px)")(j).innerText
LeftDateString = Clean_NonPrintableCharacters(DateString)
Worksheets("Stock_data").Range("A2").Value = CDate(LeftDateString)

Recommended answer

Use regular expressions:

Function GetDate(txt)
    ' Set a reference to 'Microsoft VBScript Regular Expressions 5.5' in the VBE menu Tools->References
    Dim re As New RegExp, retval(0 To 2), patterns, i, result
    ' Patterns, in order: two-digit day, month name, four-digit year
    patterns = Array("\b\d\d\b", "\b[a-zA-Z]+\b", "\b\d{4}\b")
    For i = 0 To 2
        re.Pattern = patterns(i)
        Set result = re.Execute(txt)
        ' Execute always returns a MatchCollection, so test Count rather than Is Nothing.
        ' If no day, month or year is found, GetDate() returns Empty (prints as "").
        If result.Count = 0 Then Exit Function
        retval(i) = result(0).Value
    Next
    
    GetDate = Join(retval)
End Function

Sub Usage()
    Dim txt   ' declared so the sample also compiles under Option Explicit
    For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?")
        Debug.Print GetDate(txt)
    Next
End Sub

Prints:

10 Aug 2020
13 Jul 2020
18 March 2021
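
The question also asked how to find out what the unknown characters are. A minimal sketch, assuming a hypothetical helper named CharCodes (not part of the original answer), lists the Unicode code point of every character so that invisible ones show up. The U+200E character in the usage line is only an illustrative stand-in; the characters actually returned by the page have to be read from the output.

```vba
' Hypothetical helper (not from the original answer): returns the Unicode
' code point of each character in txt, e.g. "U+0031 U+0030".
Function CharCodes(ByVal txt As String) As String
    Dim i As Long, code As Long, parts() As String
    If Len(txt) = 0 Then Exit Function          ' returns "" for empty input
    ReDim parts(1 To Len(txt))
    For i = 1 To Len(txt)
        ' AscW returns a signed Integer; the mask maps it to 0..65535
        code = AscW(Mid$(txt, i, 1)) And &HFFFF&
        parts(i) = "U+" & Right$("0000" & Hex$(code), 4)
    Next i
    CharCodes = Join(parts, " ")
End Function

Sub CharCodesUsage()
    ' ChrW(&H200E) is an assumed example of an invisible character
    Debug.Print CharCodes("10" & ChrW(&H200E) & "Aug")
    ' prints: U+0031 U+0030 U+200E U+0041 U+0075 U+0067
End Sub
```

Once a code point is known, the character can be removed with Replace(txt, ChrW(code), ""), where code is the value read from the output.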

Edit 2

Function GetDate2(txt)
    ' Set a reference to 'Microsoft VBScript Regular Expressions 5.5' in the VBE menu Tools->References
    Static re As RegExp, months As Collection
    Dim result, cnt As Long, m, i As Long
    
    If re Is Nothing Then   'do it once
        Set re = New RegExp
        re.Pattern = "[^a-zA-Z0-9]"
        re.Global = True
        Set months = New Collection
        cnt = 1
        For Each m In Split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec", ",")
            months.Add cnt, m
            cnt = cnt + 1
        Next
    End If
    
    result = Split(WorksheetFunction.Trim(re.Replace(txt, " ")))
    For i = 0 To UBound(result)
        If Not IsNumeric(result(i)) Then
            result(i) = Left(LCase(result(i)), 3)
            On Error Resume Next
            result(i) = months(result(i))
            On Error GoTo 0
        End If
    Next
    result = Join(result)
    If IsDate(result) Then GetDate2 = CDate(result)
End Function

Sub Usage2()
    Dim txt   ' declared so the sample also compiles under Option Explicit
    For Each txt In Array("?10? ?Aug?, ?2020", "Jul 13, 2020", "2021, March?, 18?", _
                          "01/12/2021", "04.18.2020", "15 10 20")
        Debug.Print GetDate2(txt)
    Next
End Sub

Prints:

10.08.2020 
13.07.2020 
18.03.2021 
01.12.2021 
18.04.2020 
15.10.2020 

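Note: the order of dd and mm may vary, because CDate interprets ambiguous numeric dates according to the regional settings. The printed format depends on those settings as well.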
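
One way to sidestep the dd/mm ambiguity is to build the Date from explicit parts with DateSerial instead of handing a string to CDate. The sketch below assumes the input has the "day month-name year" shape that GetDate returns, for example "10 Aug 2020"; the helper name PartsToDate is hypothetical and not part of the original answer.

```vba
' Hypothetical helper: converts "10 Aug 2020" to a Date without relying
' on regional settings. Returns Empty if the input does not fit.
Function PartsToDate(ByVal dmy As String) As Variant
    Const MONTH_ABBR As String = "janfebmaraprmayjunjulaugsepoctnovdec"
    Dim parts() As String, pos As Long
    parts = Split(dmy, " ")
    If UBound(parts) <> 2 Then Exit Function
    If Not IsNumeric(parts(0)) Or Not IsNumeric(parts(2)) Then Exit Function
    If Len(parts(1)) < 3 Then Exit Function
    ' Locate the three-letter abbreviation; it must start on a month boundary
    pos = InStr(1, MONTH_ABBR, Left$(LCase$(parts(1)), 3), vbBinaryCompare)
    If pos = 0 Then Exit Function
    If (pos - 1) Mod 3 <> 0 Then Exit Function
    ' DateSerial takes year, month, day explicitly, so no locale guess is involved
    PartsToDate = DateSerial(CInt(parts(2)), (pos - 1) \ 3 + 1, CInt(parts(0)))
End Function

Sub PartsToDateUsage()
    Dim d As Variant
    d = PartsToDate("10 Aug 2020")
    Debug.Print Year(d), Month(d), Day(d)   ' year 2020, month 8, day 10
End Sub
```

This only covers English month names in the given order. Purely numeric inputs such as "01/12/2021" remain ambiguous, and their order has to be known from the data source.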
