Using a regular expression to comma-separate a large number in the South Asian numbering system


Problem description

I am trying to find a regular expression that comma-separates a large number according to the South Asian numbering system.

Some examples:


  • 1,000,000 (Arabic) is 10,00,000 (Indian/Hindu/South Asian)
  • 1,000,000,000 (Arabic) is 100,00,00,000 (Indian/H/SA).

The comma pattern repeats for every 7 digits. For example, 1,00,00,000,00,00,000.
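
The grouping rule itself can be sketched without a regular expression, which gives a reference to check any pattern against. This is only an illustrative sketch: the function name `group_south_asian` is hypothetical, and it assumes the input is a plain string of digits with no sign or decimal point.

```python
def group_south_asian(digits: str) -> str:
    """Group a digit string from the right in sizes 3, 2, 2, repeating."""
    sizes = (3, 2, 2)  # one 7-digit cycle, counted from the right
    groups = []
    end = len(digits)
    i = 0
    while end > 0:
        size = sizes[i % len(sizes)]
        start = max(0, end - size)  # the leftmost group may be shorter
        groups.append(digits[start:end])
        end = start
        i += 1
    # Groups were collected right to left, so reverse before joining.
    return ",".join(reversed(groups))


print(group_south_asian("1000000"))     # 10,00,000
print(group_south_asian("1000000000"))  # 100,00,00,000
```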

From the book Mastering Regular Expressions by Friedl, I have the following regular expression for the Arabic numbering system:

r'(?<=\d)(?=(\d{3})+(?!\d))'
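
For reference, a minimal sketch of how this pattern behaves with `re.sub()`. The wrapper name `group_thousands` is an assumption made for illustration, not something from the book.

```python
import re

# Pattern quoted above: a position preceded by a digit and followed by
# a multiple of three digits, with no further digit after them.
WESTERN = re.compile(r'(?<=\d)(?=(\d{3})+(?!\d))')


def group_thousands(digits: str) -> str:
    """Insert a comma at every zero-width position the pattern matches."""
    return WESTERN.sub(',', digits)


print(group_thousands("1000000000"))  # 1,000,000,000
```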

For the Indian numbering system, I came up with the following expression, but it doesn't work for numbers with more than 8 digits:

r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))'

Using the above pattern, I get 100000000,00,00,000.
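
A short sketch that reproduces this behaviour. The lookahead in the attempted pattern can only succeed when exactly 3, 5, or 7 digits remain before a word boundary, so no comma is ever placed further left than seven digits from the end. The helper name `group_attempt` is hypothetical.

```python
import re

# The attempted pattern from the question, unchanged.
ATTEMPT = re.compile(r'(?<=\d)(?=(((\d{2}){0,2}\d{3})(?=\b)))')


def group_attempt(digits: str) -> str:
    """Apply the attempted pattern; commas land only 3, 5, and 7 digits from the end."""
    return ATTEMPT.sub(',', digits)


print(group_attempt("10000000"))        # 1,00,00,000 (8 digits: correct)
print(group_attempt("1" + "0" * 15))    # 100000000,00,00,000 (16 digits: leading digits ungrouped)
```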

I am using the Python re module (re.sub()). Any ideas?

Recommended answer

Try this:

(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))

For example:

>>> import re
>>> inp = ["1" + "0"*i for i in range(20)]
>>> [re.sub(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))", ",", i) 
     for i in inp]
['1', '10', '100', '1,000', '10,000', '1,00,000', '10,00,000', '1,00,00,000', 
 '10,00,00,000', '100,00,00,000', '1,000,00,00,000', '10,000,00,00,000', 
 '1,00,000,00,00,000', '10,00,000,00,00,000', '1,00,00,000,00,00,000', 
 '10,00,00,000,00,00,000', '100,00,00,000,00,00,000', 
 '1,000,00,00,000,00,00,000', '10,000,00,00,000,00,00,000',
 '1,00,000,00,00,000,00,00,000']

As a verbose regex:

import re

subject = "1000000000"  # example input (an assumption); any string containing digits works
result = re.sub(
    r"""(?x)       # Enable verbose mode (comments)
    (?<=\d)        # Assert that we're not at the start of the number.
    (?=            # Assert that it's possible to match:
     (\d{2}){0,2}  # 0, 2 or 4 digits,
     \d{3}         # followed by 3 digits,
     (\d{7})*      # followed by 0, 7, 14, 21 ... digits,
     (?!\d)        # and no more digits after that.
    )              # End of lookahead assertion.""", 
    ",", subject)

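One way to package the answer's pattern for reuse is to compile it once and wrap it in a small helper. This is a sketch under stated assumptions: the names `SOUTH_ASIAN` and `format_south_asian` are hypothetical, and the helper accepts integers only.

```python
import re

# The answer's pattern, compiled once so repeated calls reuse it.
SOUTH_ASIAN = re.compile(r"(?<=\d)(?=(\d{2}){0,2}\d{3}(\d{7})*(?!\d))")


def format_south_asian(n: int) -> str:
    """Format an integer with South Asian digit grouping."""
    # A leading minus sign is left alone: the lookbehind requires a digit,
    # so no comma can be inserted directly after the sign.
    return SOUTH_ASIAN.sub(",", str(n))


print(format_south_asian(1000000))     # 10,00,000
print(format_south_asian(10 ** 9))     # 100,00,00,000
print(format_south_asian(-1000))       # -1,000
```

Restricting the helper to integers is deliberate. Applied to a string with a decimal fraction, the pattern would also insert commas among the digits after the decimal point, so fractional input would need separate handling.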