字节以供人类读取,然后返回.没有数据丢失 [英] bytes to human readable, and back. without data loss

查看:97
本文介绍了字节以供人类读取,然后返回.没有数据丢失的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要将包含内存使用量(以字节为单位)的字符串转换为精确的字符串,例如:1048576(1M),以供人们阅读,反之亦然.

I need to convert strings which contain the memory usage in bytes, like: 1048576 (which is 1M) into exactly that, a human-readable version, and visa-versa.

注意:我已经看过这里了: 可重用的库以获取文件大小的可读版本吗? /a>

Note: I looked here already: Reusable library to get human readable version of file size?

在这里(即使它不是python): 如何将人类可读的内存大小转换为字节?

And here (even though it isn't python): How to convert human readable memory size into bytes?

到目前为止,没有任何帮助,所以我去了其他地方.

Nothing so far helped me, so I looked elsewhere.

我在这里找到了一些可以帮助我的事情: http://goo.gl/zeJZl

I have found something that does this for me here: http://code.google.com/p/pyftpdlib/source/browse/trunk/test/bench.py?spec=svn984&r=984#137 or, for shorter URL: http://goo.gl/zeJZl

代码:

def bytes2human(n, format="%(value)i%(symbol)s"):
    """
    >>> bytes2human(10000)
    '9K'
    >>> bytes2human(100001221)
    '95M'
    """
    symbols = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n) / prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)

还有另一种转换功能(相同的网站):

And also a function for conversion the other way (same site):

def human2bytes(s):
    """
    >>> human2bytes('1M')
    1048576
    >>> human2bytes('1G')
    1073741824
    """
    symbols = ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y')
    letter = s[-1:].strip().upper()
    num = s[:-1]
    assert num.isdigit() and letter in symbols
    num = float(num)
    prefix = {symbols[0]:1}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    return int(num * prefix[letter])

这很不错,但是有一些信息丢失,例如:

This is great and all, but it has some information loss, example:

>>> bytes2human(10000)
'9K'
>>> human2bytes('9K')
9216

要尝试解决此问题,请更改函数bytes2human

To try to solve this, I change the formatting on the function bytes2human

进入:format="%(value).3f%(symbol)s")

哪个更好,给我这些结果:

Which is much nicer, giving me these results:

>>> bytes2human(10000)
'9.766K'

但是当我尝试使用human2bytes函数将它们转换回去时:

but when I try to convert them back with the human2bytes function:

>>> human2bytes('9.766K')

Traceback (most recent call last):
  File "<pyshell#366>", line 1, in <module>
    human2bytes('9.766K')
  File "<pyshell#359>", line 12, in human2bytes
    assert num.isdigit() and letter in symbols
AssertionError

这是因为.

所以我的问题是,如何在不丢失数据的情况下将人类可读的版本转换回字节版本?

注意:我知道小数点后三位也是一点数据丢失.但是出于这个问题的目的,让我们暂时忽略它,我可以随时将其更改为更大的值.

Note: I know that 3 decimal places is also a little bit of data loss. But for the purposes of this question, lets ignore that for now, I can always change that to something greater.

推荐答案

因此,答案比我想象的要简单得多-我提供的链接之一实际上导致了功能:

So it turns out the answer was much simpler than I thought - one of the links that I provided actually led to a much more detailed version of the function:

我可以处理的任何范围.

Which is able to deal with any scope I give it.

但是谢谢您的帮助:

此处为后代复制的代码:

The code copied here for posterity:

## {{{ http://code.activestate.com/recipes/578019/ (r15)
#!/usr/bin/env python

"""
Bytes-to-human / human-to-bytes converter.
Based on: http://goo.gl/kTQMs
Working with Python 2.x and 3.x.

Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com>
License: MIT
"""

# see: http://goo.gl/kTQMs
SYMBOLS = {
    'customary'     : ('B', 'K', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'),
    'customary_ext' : ('byte', 'kilo', 'mega', 'giga', 'tera', 'peta', 'exa',
                       'zetta', 'iotta'),
    'iec'           : ('Bi', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi', 'Yi'),
    'iec_ext'       : ('byte', 'kibi', 'mebi', 'gibi', 'tebi', 'pebi', 'exbi',
                       'zebi', 'yobi'),
}

def bytes2human(n, format='%(value).1f %(symbol)s', symbols='customary'):
    """
    Convert n bytes into a human readable string based on format.
    symbols can be either "customary", "customary_ext", "iec" or "iec_ext",
    see: http://goo.gl/kTQMs

      >>> bytes2human(0)
      '0.0 B'
      >>> bytes2human(0.9)
      '0.0 B'
      >>> bytes2human(1)
      '1.0 B'
      >>> bytes2human(1.9)
      '1.0 B'
      >>> bytes2human(1024)
      '1.0 K'
      >>> bytes2human(1048576)
      '1.0 M'
      >>> bytes2human(1099511627776127398123789121)
      '909.5 Y'

      >>> bytes2human(9856, symbols="customary")
      '9.6 K'
      >>> bytes2human(9856, symbols="customary_ext")
      '9.6 kilo'
      >>> bytes2human(9856, symbols="iec")
      '9.6 Ki'
      >>> bytes2human(9856, symbols="iec_ext")
      '9.6 kibi'

      >>> bytes2human(10000, "%(value).1f %(symbol)s/sec")
      '9.8 K/sec'

      >>> # precision can be adjusted by playing with %f operator
      >>> bytes2human(10000, format="%(value).5f %(symbol)s")
      '9.76562 K'
    """
    n = int(n)
    if n < 0:
        raise ValueError("n < 0")
    symbols = SYMBOLS[symbols]
    prefix = {}
    for i, s in enumerate(symbols[1:]):
        prefix[s] = 1 << (i+1)*10
    for symbol in reversed(symbols[1:]):
        if n >= prefix[symbol]:
            value = float(n) / prefix[symbol]
            return format % locals()
    return format % dict(symbol=symbols[0], value=n)

def human2bytes(s):
    """
    Attempts to guess the string format based on default symbols
    set and return the corresponding bytes as an integer.
    When unable to recognize the format ValueError is raised.

      >>> human2bytes('0 B')
      0
      >>> human2bytes('1 K')
      1024
      >>> human2bytes('1 M')
      1048576
      >>> human2bytes('1 Gi')
      1073741824
      >>> human2bytes('1 tera')
      1099511627776

      >>> human2bytes('0.5kilo')
      512
      >>> human2bytes('0.1  byte')
      0
      >>> human2bytes('1 k')  # k is an alias for K
      1024
      >>> human2bytes('12 foo')
      Traceback (most recent call last):
          ...
      ValueError: can't interpret '12 foo'
    """
    init = s
    num = ""
    while s and s[0:1].isdigit() or s[0:1] == '.':
        num += s[0]
        s = s[1:]
    num = float(num)
    letter = s.strip()
    for name, sset in SYMBOLS.items():
        if letter in sset:
            break
    else:
        if letter == 'k':
            # treat 'k' as an alias for 'K' as per: http://goo.gl/kTQMs
            sset = SYMBOLS['customary']
            letter = letter.upper()
        else:
            raise ValueError("can't interpret %r" % init)
    prefix = {sset[0]:1}
    for i, s in enumerate(sset[1:]):
        prefix[s] = 1 << (i+1)*10
    return int(num * prefix[letter])


if __name__ == "__main__":
    import doctest
    doctest.testmod()
## end of http://code.activestate.com/recipes/578019/ }}}

这篇关于字节以供人类读取,然后返回.没有数据丢失的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆