Extract Date and Currency value (separated by comma) from file


Problem description


Objective:

Extract the string data, the currency value, the [type of currency] and the date.

Content of file:

[["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"],["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"]]

What I have done so far:

def read_file():
    fp = open('D:\\ReadData2.txt', 'rb')
    content = fp.read()
    data = eval(content)
    l1 = ["%s" % x[1] for x in data]
    return l1

def check_currency(l2):
    import re
    for i in range(len(l2)):
        newstr2 = l2[i]
        val_currency = []
        val_currency.extend(re.findall(r'([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)', newstr2))
        print(" List %s " % val_currency)
        for i in range(len(val_currency)):
            val2 = val_currency[i]
            remove_commas = re.compile(r',(?=\d+)*?')
            val3 = remove_commas.sub('', val2)
            print(val3)

if __name__ == "__main__":
    main()  # main() is not shown in the question

EDIT / UPDATE: I am able to extract the currency values, but negative currency values conflict with the date format (dd-mm-yyyy): the parts of the date are picked up as numbers. Also, while extracting the string value, the characters [.|,|] are extracted as well. How can I avoid reading these characters?

Output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520', '19', '-04', '-2013'] 
>5200.33
>115200.33
>5589965.26
>11520
>19
>-04
>-2013

Expected output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520'] 
>5200.33
>115200.33
>5589965.26
>11520

Solution

I added <RS|$|QR|#>\s* to the start of your regular expression, so that it acts as a prefix for the currency values you want to match. The parts of the date are not preceded by that marker, so they are no longer picked up.

You can change your code to this one:

def check_currency(l2):
    import re
    for i in range(len(l2)):
        newstr2 = l2[i]
        val_currency = []
        val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)', newstr2))
        # skip empty strings and remove comma characters
        val_currency = [v.replace(',', '') for v in val_currency if v]
        print(" List %s " % val_currency)
        for i in range(len(val_currency)):
            val2 = val_currency[i]
            # the commas are already gone at this point, so this substitution is a no-op
            remove_commas = re.compile(r',(?=\d+)*?')
            val3 = remove_commas.sub('', val2)
            print(val3)

Output:

List ['5200.33', '115200.33', '5589965.26', '11520']
5200.33
115200.33
5589965.26
11520
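
The output above covers only the currency values, while the question also asks for the date. A minimal sketch of one way to pull out the dd-mm-yyyy date separately is given here; it is written for Python 3, and the names DATE_RE and extract_dates are made up for illustration and are not part of the original answer.

```python
import re
from datetime import datetime

# Two digits, dash, two digits, dash, four digits. The lookarounds reject
# candidates that are embedded in a longer run of digits.
DATE_RE = re.compile(r"(?<!\d)(\d{2}-\d{2}-\d{4})(?!\d)")


def extract_dates(text):
    """Return datetime.date objects for every valid dd-mm-yyyy date in text."""
    dates = []
    for raw in DATE_RE.findall(text):
        try:
            dates.append(datetime.strptime(raw, "%d-%m-%Y").date())
        except ValueError:
            # Shaped like a date but not a real one (e.g. 99-99-2013); skip it.
            continue
    return dates


print(extract_dates("your this year subscription will expire on 19-04-2013. 9. Back"))
```

Validating with strptime instead of trusting the pattern alone means that a number which merely looks like a date is dropped rather than reported.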

Additions to the code:

val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)',newstr2))
val_currency = [v.replace(',', '') for v in val_currency if v]
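
One detail worth knowing about the pattern above: | and $ are regex metacharacters, so <RS|$|QR|#>\s*(...) is parsed as four alternatives, and only the last one (#>\s* followed by the number) captures anything. The other alternatives produce empty strings, which is why the list comprehension filters with "if v". A possible variant, sketched for Python 3 under the assumption that the marker <RS|$|QR|#> appears literally in the text as it does in the sample, escapes the marker with re.escape so that no filtering is needed. The names PREFIX, CURRENCY_RE and extract_currency are hypothetical.

```python
import re

# Literal marker that precedes every currency value in the sample text.
PREFIX = "<RS|$|QR|#>"

# re.escape() makes the marker literal, so "|" and "$" lose their special
# regex meaning. The number part allows an optional sign, comma-separated
# digit groups and an optional decimal fraction.
CURRENCY_RE = re.compile(re.escape(PREFIX) + r"\s*([+-]?\d+(?:,\d+)*(?:\.\d+)?)")


def extract_currency(text):
    """Return the currency values found in text, with commas removed."""
    return [value.replace(",", "") for value in CURRENCY_RE.findall(text)]


sample = (
    "Your previous month subscription point is <RS|$|QR|#> 5,200.33."
    "Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, "
    "Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game "
    "is <RS|$|QR|#> 11520 your this year subscription will expire on "
    "19-04-2013. 9. Back"
)

print(extract_currency(sample))
# ['5200.33', '115200.33', '5589965.26', '11520']
```

Because the date is not preceded by the marker, it is skipped here just as in the answer's version, and a signed value such as -5,200.33 after the marker would still be captured.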
