Extract Date and Currency value (separated by comma) from file

Views: 197 Published: 2017/11/4 22:07:31 Tags: python regex file-io

Problem description

Objective:

Extract string data, currency value, [type of currency] and date.

Content of file:

[["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"],["1234567890","Your previous month subscription point is <RS|$|QR|#> 5,200.33.Your current month month subscription point is <RS|$|QR|#> 1,15,200.33, Last Year total point earned <RS|$|QR|#> 5589965.26 and point lost in game is <RS|$|QR|#> 11520 your this year subscription will expire on 19-04-2013. 9. Back"]]

What I have done so far:

    def read_file():
        fp = open('D:\\ReadData2.txt', 'rb')
        content = fp.read()
        data = eval(content)
        l1 = ["%s" % x[1] for x in data]
        return l1

    def check_currency(l2):
        import re
        for i in range(l2.__len__()):
            newstr2 = l2[i]
            val_currency = []
            val_currency.extend(re.findall(r'([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)',newstr2))
            print " List %s " % val_currency
            for i in range(len(val_currency)):
                val2 = val_currency[i]
                remove_commas = re.compile(r',(?=\d+)*?')
                val3 = remove_commas.sub('', val2)
                print val3

    if __name__=="__main__":main()

Edit/Update: I am able to extract the currency values, but values with a negative sign conflict with the date format (dd-mm-yyyy): the parts of the date are picked up as signed numbers. Also, while extracting the string values, characters such as [.|,|] are extracted too. How can I avoid reading these characters?

Output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520', '19', '-04', '-2013']
>5200.33
>115200.33
>5589965.26
>11520
>19
>-04
>-2013

Expected output of check_currency:

>List ['5,200.33', '1,15,200.33', '5589965.26', '11520']
>5200.33
>115200.33
>5589965.26
>11520

Solution

I added <RS|$|QR|#>\s* to the beginning of your regular expression so that it is used as a prefix for the currency values you want to match.

You can change your code to this one:

    def check_currency(l2):
        import re
        for i in range(l2.__len__()):
            newstr2 = l2[i]
            val_currency = []
            val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)',newstr2))
            # skip empty strings and remove comma characters
            val_currency = [v.replace(',', '') for v in val_currency if v]
            print " List %s " % val_currency
            for i in range(len(val_currency)):
                val2 = val_currency[i]
                remove_commas = re.compile(r',(?=\d+)*?')
                val3 = remove_commas.sub('', val2)
                print val3

Output:

    List ['5200.33', '115200.33', '5589965.26', '11520']
    5200.33
    115200.33
    5589965.26
    11520

Additions in the code:

    val_currency.extend(re.findall(r'<RS|$|QR|#>\s*([+-]?\d+(?:\,\d+)*?\d+(?:\.\d+)?)',newstr2))
    val_currency = [v.replace(',', '') for v in val_currency if v]
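
One caveat about the pattern above: the characters | and $ in the prefix are not escaped, so the regular expression is read as four alternatives (<RS, end of string, QR, and #>\s* followed by the number). Only the last alternative captures anything, and the others produce empty strings, which is why the list comprehension has to skip them. The following is a minimal sketch, not the answerer's code, that treats the marker as a literal with re.escape and also pulls out the dd-mm-yyyy date asked for in the objective. It assumes Python 3, and it assumes the file is valid JSON (the sample shown in the question is), so json.loads stands in for eval. The names SAMPLE, load_messages, extract_amounts and extract_dates are hypothetical, and the number pattern is a simplified one rather than the original.

```python
import json
import re
from datetime import datetime

# One record copied from the file content shown in the question.
SAMPLE = (
    '[["1234567890","Your previous month subscription point is '
    '<RS|$|QR|#> 5,200.33.Your current month month subscription point is '
    '<RS|$|QR|#> 1,15,200.33, Last Year total point earned '
    '<RS|$|QR|#> 5589965.26 and point lost in game is '
    '<RS|$|QR|#> 11520 your this year subscription will expire on '
    '19-04-2013. 9. Back"]]'
)

# Literal marker that precedes every amount in the sample.
# re.escape() neutralises the regex metacharacters '|' and '$' in it.
CURRENCY_PREFIX = re.escape("<RS|$|QR|#>")

# Prefix, optional whitespace, then an optionally signed number whose
# digits may contain comma separators and one fractional part.
AMOUNT_RE = re.compile(CURRENCY_PREFIX + r"\s*([+-]?\d[\d,]*(?:\.\d+)?)")

# Dates written as dd-mm-yyyy, delimited by word boundaries.
DATE_RE = re.compile(r"\b(\d{2}-\d{2}-\d{4})\b")


def load_messages(content):
    """Parse the file text as JSON and return the second field of each row."""
    return [row[1] for row in json.loads(content)]


def extract_amounts(message):
    """Return the prefixed amounts as strings with the commas removed."""
    return [m.replace(",", "") for m in AMOUNT_RE.findall(message)]


def extract_dates(message):
    """Return every dd-mm-yyyy date in the message as a datetime.date."""
    return [datetime.strptime(d, "%d-%m-%Y").date()
            for d in DATE_RE.findall(message)]


if __name__ == "__main__":
    for message in load_messages(SAMPLE):
        print(extract_amounts(message))
        print(extract_dates(message))
```

Because an amount is only accepted directly after the marker, the pieces of 19-04-2013 are never mistaken for signed numbers, and a genuinely negative amount written after the marker would still be captured with its sign. For the sample record the amounts come out as 5200.33, 115200.33, 5589965.26 and 11520, matching the expected output in the question.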
