Fix pasted text from PDF credit statement


Question


I have this text file that I copied from a credit card statement.

Sample line 1:

August 18       August 18       Balance Conversion      :02/06          4,671.30

Sample line 2:

August 1        August 2        *Php-Anytimefit-Ezypay   Kuala Lumpur    2,300.00

I copy it from a PDF file into an MS Excel file. The pasted text comes out with double spaces, and each whole line is pasted into a single cell, like the lines further below.

I tried text functions such as =RIGHT(B73,LEN(B73)-E73+2) and the array formula =MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},B73&"0123456789")), which I found while researching, but I still have to tweak the formulas because the length of the month name changes every month, and the day can have one or two digits.
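Rather than locating the first digit with SEARCH, a regular expression can absorb both the variable month-name length and the one- or two-digit day. The sketch below is a hypothetical helper, SplitDates; it assumes Windows (it late-binds the VBScript.RegExp object), the default Option Base 0, and that the line reads as <month><day><month><day>... once every space is removed.

```vba
' Hypothetical helper: returns the two leading dates of a pasted statement line
' as a two-element array, e.g. Array("August 18", "August 18"), or Empty when the
' line does not start with <month><day><month><day> after spaces are removed.
Function SplitDates(ByVal lineText As String) As Variant
    Dim re As Object, mc As Object, s As String

    ' Remove ordinary and non-breaking spaces: "A u g u s t 1 8" -> "August18"
    s = Replace(Replace(lineText, " ", ""), ChrW(160), "")

    Set re = CreateObject("VBScript.RegExp")    ' Windows regex engine, late bound
    ' Month name of any length, then a 1- or 2-digit day, twice
    re.Pattern = "^([A-Za-z]+)(\d{1,2})([A-Za-z]+)(\d{1,2})"

    Set mc = re.Execute(s)
    If mc.Count = 0 Then Exit Function          ' result stays Empty

    With mc.Item(0).SubMatches
        ' Array() is zero-based under the default Option Base 0
        SplitDates = Array(.Item(0) & " " & .Item(1), .Item(2) & " " & .Item(3))
    End With
End Function
```

From a cell, =INDEX(SplitDates(B73),1) should give the posting date and =INDEX(SplitDates(B73),2) the transaction date. Because up to two digits are taken for each day, a description that itself starts with a digit can be misread as part of the second day.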

The amount always has two decimal places, with a comma as the thousands separator and a dot as the decimal separator. On an installment line, the installment (e.g. 01/24, "one of twenty-four") comes directly before the amount, so 01/24 followed by 2,916.25 pastes as 0 1 / 2 4 2 , 9 1 6 . 2 5
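If the installment is always two digits, a slash and two digits, it can be matched as an optional group directly in front of the amount, which separates 01/24 from 2,916.25 even after all spaces are removed. A minimal sketch under exactly those assumptions (NN/NN installments, two-decimal amounts with comma thousands separators, Windows for VBScript.RegExp); SplitAmount is a hypothetical name.

```vba
' Hypothetical helper: splits the end of a pasted line into an optional
' installment ("01/24") and the amount (2916.25). Returns False if no amount
' is found. Assumes installments are always NN/NN and amounts are #,###.##
Function SplitAmount(ByVal lineText As String, ByRef installment As String, _
                     ByRef amount As Double) As Boolean
    Dim re As Object, mc As Object, s As String

    s = Replace(Replace(lineText, " ", ""), ChrW(160), "")   ' drop all spaces

    Set re = CreateObject("VBScript.RegExp")
    ' Optional NN/NN immediately followed by the amount at the very end
    re.Pattern = "(\d{2}/\d{2})?(\d{1,3}(?:,\d{3})*\.\d{2})$"

    Set mc = re.Execute(s)
    If mc.Count = 0 Then Exit Function          ' returns False

    With mc.Item(0).SubMatches
        installment = .Item(0) & ""             ' "" when there is no installment
        amount = Val(Replace(.Item(1), ",", ""))    ' strip thousands separators
    End With
    SplitAmount = True
End Function
```

Val is used instead of CDbl because Val always treats the dot as the decimal separator, whatever the regional settings.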

I'm looking for a VBA solution or a function to fix the pasted values.
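A worksheet function along these lines could be filled down next to the pasted column, one column per field. It is a sketch that assumes the layout described above (two leading <month><day> dates, an optional NN/NN installment, a two-decimal amount at the end) and Windows for VBScript.RegExp; StatementField and its part numbers are hypothetical. The description comes back without spaces, since all spaces are stripped before matching.

```vba
' Hypothetical worksheet function: =StatementField(B73, part)
'   part 1 = posting date, 2 = transaction date, 3 = description (no spaces),
'   part 4 = installment (may be ""), 5 = amount as a number.
' Returns #VALUE! if the line does not fit the assumed layout.
Function StatementField(ByVal lineText As String, ByVal part As Long) As Variant
    Dim re As Object, mc As Object, m As Object, s As String

    s = Replace(Replace(lineText, " ", ""), ChrW(160), "")   ' drop all spaces

    Set re = CreateObject("VBScript.RegExp")
    ' month, day, month, day, shortest description, optional NN/NN, amount
    re.Pattern = "^([A-Za-z]+)(\d{1,2})([A-Za-z]+)(\d{1,2})(.*?)" & _
                 "(\d{2}/\d{2})?(\d{1,3}(?:,\d{3})*\.\d{2})$"

    Set mc = re.Execute(s)
    If mc.Count = 0 Then
        StatementField = CVErr(xlErrValue)
        Exit Function
    End If
    Set m = mc.Item(0)

    Select Case part
        Case 1: StatementField = m.SubMatches(0) & " " & m.SubMatches(1)
        Case 2: StatementField = m.SubMatches(2) & " " & m.SubMatches(3)
        Case 3: StatementField = m.SubMatches(4) & ""
        Case 4: StatementField = m.SubMatches(5) & ""
        Case 5: StatementField = Val(Replace(m.SubMatches(6), ",", ""))
        Case Else: StatementField = CVErr(xlErrValue)
    End Select
End Function
```

For example, with a pasted line in B73, =StatementField(B73,5) should return the amount as a number and =StatementField(B73,3) the description. The lazy description group is what lets the optional installment be captured rather than swallowed into the description.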

A u g u s t 1 8 A u g u s t 1 8 P o w e r M a c C e n t e r - G b 3 : 0 1 / 2 4 2 , 9 1 6 . 2 5
A u g u s t 1 8 A u g u s t 1 8 B a l a n c e C o n v e r s i o n : 0 2 / 0 6 4 , 6 7 1 . 3 0
A u g u s t 1 A u g u s t 2 * P h p - A n y t i m e f i t - E z y p a y K u a l a L u m p u r 2 , 3 0 0 . 0 0
A u g u s t 1 3 A u g u s t 1 5 S t a r b u c k s C o n g r e s s Q c 2 7 5 . 0 0
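If, as the question says, the real word breaks survive the paste as double spaces while single spaces separate the letters inside a word, the spacing can be undone without losing the word boundaries. This is a minimal sketch under that assumption only (the lines above may have lost the double spaces when the page was rendered). CollapseCharSpacing is a hypothetical name, and it uses a tab as a temporary marker, so it also assumes the pasted text contains no tabs.

```vba
' Hypothetical helper. Assumption: word gaps are 2+ spaces, letter gaps are 1 space.
Function CollapseCharSpacing(ByVal s As String) As String
    s = Replace(s, ChrW(160), " ")          ' treat non-breaking spaces as spaces

    ' Shrink every run of 3+ spaces down to exactly 2
    Do While InStr(s, "   ") > 0
        s = Replace(s, "   ", "  ")
    Loop

    s = Replace(s, "  ", vbTab)             ' word gap -> temporary marker
    s = Replace(s, " ", "")                 ' drop the spaces between letters
    CollapseCharSpacing = Trim$(Replace(s, vbTab, " "))   ' marker -> one space
End Function
```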

Solution

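This is some test code that imports the content of a PDF file into Excel by running it through MS Word (Word 2013 and later can open PDF files directly). It writes each Word paragraph to column D and each word to column B.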

Sub pdf2excel()

    ' import pdf file text into excel, using msWord as a proxy

    ' requires a reference to the Microsoft Word Object Library (Tools > References in the VBA editor)

    Dim wdApp As Word.Application
    Set wdApp = New Word.Application

    Dim file As String
    file = "C:\statements\statement.pdf"

    Dim wdDoc As Word.Document
    Set wdDoc = wdApp.Documents.Open( _
                    Filename:=file, ConfirmConversions:=False, _
                    ReadOnly:=True, AddToRecentFiles:=False, _
                    PasswordDocument:="", PasswordTemplate:="", Revert:=False, _
                    WritePasswordDocument:="", WritePasswordTemplate:="", _
                    Format:=wdOpenFormatAuto, XMLTransform:="")

'   wdApp.Visible = True                    ' uncomment to make msWord visible ... would help in determining location of data

    Dim cel As Excel.Range
    Set cel = Range("D2")                   ' put paragraph text in column D

    Dim prgf As Word.Paragraph
    For Each prgf In wdDoc.Paragraphs
        cel.Value = Replace(prgf.Range.Text, vbCr, "")   ' paragraph text without its trailing paragraph mark
        Set cel = cel.Offset(1)             ' point to next cell down
    Next prgf

    Set cel = Range("B2")                   ' put word text in column B

    Dim wrd As Word.Range
    For Each wrd In wdDoc.Words
        cel.Value = wrd.Text
        Set cel = cel.Offset(1)
    Next wrd

    wdDoc.Close False
    Set wdDoc = Nothing

    wdApp.Quit
    Set wdApp = Nothing

End Sub
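The macro above uses early binding, so it needs the Word object library reference. A late-bound variant avoids that reference; since Word's constants are then undefined, wdOpenFormatAuto is replaced by its numeric value 0. This sketch also cleans each paragraph before writing it, removing the paragraph mark and the Chr(7) end-of-cell marker Word adds when text sits in a table. The file path is the same example path as above; pdf2excelLateBound and CleanWordText are hypothetical names, and there is no error handling, so a failure can leave a hidden Word instance running.

```vba
' Late-bound variant: no reference to the Word object library is needed.
Sub pdf2excelLateBound()
    Const wdOpenFormatAuto As Long = 0          ' value of Word's wdOpenFormatAuto

    Dim wdApp As Object, wdDoc As Object, prgf As Object
    Dim cel As Range

    Set wdApp = CreateObject("Word.Application")
    Set wdDoc = wdApp.Documents.Open(Filename:="C:\statements\statement.pdf", _
                    ConfirmConversions:=False, ReadOnly:=True, _
                    AddToRecentFiles:=False, Format:=wdOpenFormatAuto)

    Set cel = ActiveSheet.Range("D2")           ' one paragraph per row, from D2 down
    For Each prgf In wdDoc.Paragraphs
        cel.Value = CleanWordText(prgf.Range.Text)
        Set cel = cel.Offset(1)
    Next prgf

    wdDoc.Close False                           ' close without saving
    wdApp.Quit
End Sub

' Removes Word's paragraph mark, end-of-cell marker and non-breaking spaces.
Function CleanWordText(ByVal s As String) As String
    s = Replace(s, vbCr, "")                    ' paragraph mark
    s = Replace(s, Chr(7), "")                  ' end-of-cell marker (text from a Word table)
    s = Replace(s, ChrW(160), " ")              ' non-breaking space -> ordinary space
    CleanWordText = Trim$(s)
End Function
```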
