Twitter图像编码挑战 [英] Twitter image encoding challenge

查看:196
本文介绍了Twitter图像编码挑战的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



如果图片的价值为1000字,强>:这是它的人!赏金截止日期是在这里,经过一些艰难的审议,我已决定支持的任何栅格格式都算合理。

  • 您的程序必须输出可在140个或更少的Unicode代码点中表示的消息;在 U + 0000 - U + 10FFFF 范围内的140个代码点> U + FFFE , U + FFFF U + n FFFE U + n FFFF 其中 n 1 - 10 十六进制,范围 U + FDD0 - U + FDEF )和代理码点( U + D800 - U + DFFF )。它可以以您选择的任何合理编码输出;






    我提供 500代表(加上StackOverflow所用的50)对于我喜欢的解决方案,最好的,基于上述标准。



    截止日期



    这个比赛将一直运行,直到奖金用完,大约下午6点在星期六,5月30日。我不能说精确的时间,它将结束;它可以是从5至7 PM的任何地方。我会保证,我会看看下午2点提交的所有条目,我会尽我所能看看下午4点提交的所有条目;如果在此之后提交解决方案,我可能没有机会给他们公平的看,我必须作出我的决定。此外,您提交的时间越早,您投票的机会越多,可以帮助我选择最佳解决方案,因此请在截止日期前尝试提交,而不是提交。



    Unicode注释



    对于允许的Unicode字符,还有一些混淆。可能的Unicode代码点的范围是 U + 0000 U + 10FFFF 。有一些代码点永远不能在任何开放的数据交换中用作Unicode字符;这些是非字符代理代码点。非字符在 Unidode标准5.1.0第16.7节中定义为值 U + FFFE U + FFFF U + em> n FFFE U + FFFF 其中 n 1 - 10 范围 U + FDD0 - U + FDEF 。这些值旨在用于特定于应用程序的内部使用,并且符合应用程序可以从由它们处理的文本中剥离这些字符。代理代码点,在 Unicode标准5.1.0第3.8节中定义作为 U + D800 - U + DFFF 用于UTF-16中的基本多语言平面之外的字符编码;因此,不可能将这些代码点直接表示在UTF-16编码中,并且在任何其他编码中对它们进行编码是无效的。因此,为了这个比赛的目的,我将允许任何编码图像的程序,从范围 U + 0000 - <$的序列不超过140个Unicode代码点c $ c> U + 10FFFF ,不包括上面定义的所有非字符和代理对。



    解决方案,只使用分配的字符,甚至更好的使用聪明的子集的指定字符或做一些有趣的字符集,他们使用的东西。有关已分配字符的列表,请参阅 Unicode字符数据库;请注意,某些字符是直接列出的,而一些只列出作为范围的开始和结束。还要注意,代理代码点在数据库中列出,但是如上所述被禁止。如果您想利用字符的某些属性来使您输出的文本更有趣,则可以使用品种可用的字符信息数据库,例如命名代码块列表各种字符属性



    因为Twitter没有指定他们支持的确切字符集,所以我会宽松的解决方案,实际上不能与Twitter一起工作,因为某些字符计数额外或某些字符被剥离。优选但不是必需的是,所有编码的输出应当能够通过Twitter或另一微博服务被无损地传送,例如 identi.ca 。我看到一些文档,说明Twitter实体编码<>和&,因此计数那些分别为4,4和5个字符,但我没有自己测试,他们的JavaScript字符计数器看起来像这样计数。



    提示&链接




    • 规则中有效的Unicode字符的定义有点复杂。

    • 您可以使用现有的图片库,例如 ImageMagick Python图像库,用于图像处理。

    • 如果您需要帮助了解Unicode字符集及其各种编码,请参阅此快速指南在Linux和Unix上的UTF-8的详细常见问题

    • 您得到解决方案的时间越早,其他人投票)必须看看它。如果您改进它,您可以编辑您的解决方案;

    • 如果你想要一个简单的图像格式来解析和书写(而不想要的话)只需使用现有的格式),我建议使用 PPM格式。这是一种非常易于使用的基于文本的格式,您可以使用 ImageMagick 进行转换。



    编辑日志



    5月30日:赏金截止日期,

    5月28日:截止日期的更精确的详细信息

    5月28日:清理链接,提及PPM

    5月27日:重组和清理规则和指南;添加的提示和链接

    5月26日:提及的截止期限问题并删除了100行的软上限

    5月22日:添加了参考图像,关于Unicode的说明以及粗略的评分标准>
    5月21日:更新为允许140个任意Unicode字符。虽然Twitter网络表单按UTF-16代码单位计算,但该API允许140个任意Unicode字符,因此我们将使用该定义来允许更大的灵活性。

    5月21日:添加截止时间信息

    5月21日:发布的原始挑战

    解决方案

    好了,这是我的: nanocrunch.cpp CMakeLists.txt 文件,以使用 CMake 构建它。它依赖于 Magick ++ ImageMagick API大部分的图像处理。它还需要 GMP 库的bignum算术字符​​串编码。



    基于我的解决方案从分形图像压缩,与一些独特的扭曲。基本思想是拍摄图像,将副本缩小到50%,并查找看起来类似于原始图像中的非重叠块的各种方向的图像。这个搜索需要一个非常强力的方法,但这只是为了更容易介绍我的修改。



    第一个修改是,而不是只看90度旋转和翻转,我的程序也考虑45度方向。



    另一件事是为每个块的每个颜色分量存储一个对比度/亮度调整是每个块的一个颜色分量,方式太贵了。相反,我存储一个重量化的颜色(调色板只有4 * 4 * 4 = 64颜色),只是以一定比例混合。在数学上,这相当于每种颜色的可变亮度和恒定对比度调整。



    一旦计算出每个块的位置,方向和颜色,就会将其编码为UTF-8串。首先,它生成一个非常大的bignum来表示块表中的数据和图像大小。这种方法类似于Sam Hocevar的解决方案 - 一个大数字,具有随位置而变化的基数。



    然后,它将它转换为一个基数可用字符集的大小。默认情况下,它充分利用分配的unicode字符集,减去小于,大于,与号,控制,组合,代理和私有字符。它不漂亮,但它的工作。您也可以注释掉默认表,并选择可打印的7位ASCII(再次排除<>,和&字符)或CJK统一表意文字。



    无论如何,这里有一些图像和时间(如测量的那样)。这些字符代码可用的表存储有以无效和有效字符的交替运行编码的游程长度。在我的旧的3.0GHz P4),并压缩到140个字符在完全分配的unicode集如上所述。总的来说,我非常高兴他们都是如此。如果我有更多的时间来工作,我可能会尝试减少解压缩图像的块效应。不过,我认为结果是极好的压缩比。解压缩的图像是位印象,但我发现它相对容易看到位对应于原始。



    Stack Overflow Logo(8.6s编码,7.9s解码,485字节):



    Lena(32.8秒编码,13.0秒解码,477字节):



    Mona Lisa(43.2秒编码,14.5秒解码,490字节):



    编辑:CJK统一字符



    Sam在关于使用CJK的评论中提出。这里是一个从CJK Unified字符集压缩到139个字符的Mona Lisa版本:




    咏璘驞凄脒鵚鵚蛥鸂拗朐朖朖辿辿緍緍緍緍緍緍緍緍緍緍渫渫渫渫渫渫渫渫渫渫渫渫渫渫櫰矍昀昀鰛掾撄粂敽諽粳蔍蔍螎葙峬覧绌抆抆惫諽諽諽諽媄媄媄媄媄狌狌狌諽諽諽諽諽諽諽鷍鸮駫駫毤埙悖萜萜旖鞰萗勹勹鈱哳垬濅鬒鬒鬒瞛洆洆狋狋氙闼籴珵珵仾氙熜謋勋勋勋縿縿珝珝勋勋縿縿珝珝珝擸萿



    我使用的程序顶部的调整参数为:19,19,4,4,3,10,11,1000,1000.我也注释掉了第一个定义的number_assigned和代码,并取消注释它们的最后定义以选择CJK统一字符集。


    If a picture's worth 1000 words, how much of a picture can you fit in 140 characters?

    Note: That's it folks! Bounty deadline is here, and after some tough deliberation, I have decided that Boojum's entry just barely edged out Sam Hocevar's. I will post more detailed notes once I've had a chance to write them up. Of course, everyone should feel free to continue to submit solutions and improve solutions for people to vote on. Thank you to everyone who submitted and entry; I enjoyed all of them. This has been a lot of fun for me to run, and I hope it's been fun for both the entrants and the spectators.

    I came across this interesting post about trying to compress images into a Twitter comment, and lots of people in that thread (and a thread on Reddit) had suggestions about different ways you could do it. So, I figure it would make a good coding challenge; let people put their money where their mouth is, and show how their ideas about encoding can lead to more detail in the limited space that you have available.

    I challenge you to come up with a general purpose system for encoding images into 140 character Twitter messages, and decoding them into an image again. You can use Unicode characters, so you get more than 8 bits per character. Even allowing for Unicode characters, however, you will need to compress images into a very small amount of space; this will certainly be a lossy compression, and so there will have to be subjective judgements about how good each result looks.

    Here is the result that the original author, Quasimondo, got from his encoding (image is licensed under a Creative Commons Attribution-Noncommercial license):

    Can you do better?

    Rules

    1. Your program must have two modes: encoding and decoding.
    2. When encoding:

      1. Your program must take as input a graphic in any reasonable raster graphic format of your choice. We'll say that any raster format supported by ImageMagick counts as reasonable.
      2. Your program must output a message which can be represented in 140 or fewer Unicode code points; 140 code points in the range U+0000U+10FFFF, excluding non-characters (U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 110 hexadecimal, and the range U+FDD0U+FDEF) and surrogate code points (U+D800U+DFFF). It may be output in any reasonable encoding of your choice; any encoding supported by GNU iconv will be considered reasonable, and your platform native encoding or locale encoding would likely be a good choice. See Unicode notes below for more details.

    3. When decoding:

      1. Your program should take as input the output of your encoding mode.
      2. Your program must output an image in any reasonable format of your choice, as defined above, though for output vector formats are OK as well.
      3. The image output should be an approximation of the input image; the closer you can get to the input image, the better.
      4. The decoding process may have no access to any other output of the encoding process other than the output specified above; that is, you can't upload the image somewhere and output the URL for the decoding process to download, or anything silly like that.

    4. For the sake of consistency in user interface, your program must behave as follows:

      1. Your program must be a script that can be set to executable on a platform with the appropriate interpreter, or a program that can be compiled into an executable.
      2. Your program must take as its first argument either encode or decode to set the mode.
      3. Your program must take input in one or more of the following ways (if you implement the one that takes file names, you may also read and write from stdin and stdout if file names are missing):

        1. Take input from standard in and produce output on standard out.

          my-program encode <input.png >output.txt
          my-program decode <output.txt >output.png
          

        2. Take input from a file named in the second argument, and produce output in the file named in the third.

          my-program encode input.png output.txt
          my-program decode output.txt output.png
          

    5. For your solution, please post:

      1. Your code, in full, and/or a link to it hosted elsewhere (if it's very long, or requires many files to compile, or something).
      2. An explanation of how it works, if it's not immediately obvious from the code or if the code is long and people will be interested in a summary.
      3. An example image, with the original image, the text it compresses down to, and the decoded image.
      4. If you are building on an idea that someone else had, please attribute them. It's OK to try to do a refinement of someone else's idea, but you must attribute them.

    Guidelines

    These are basically rules that may be broken, suggestions, or scoring criteria:

    1. Aesthetics are important. I'll be judging, and suggest that other people judge, based on:

      1. How good the output image looks, and how much it looks like the original.
      2. How nice the text looks. Completely random gobbledigook is OK if you have a really clever compression scheme, but I also want to see answers that turn images into mutli-lingual poems, or something clever like that. Note that the author of the original solution decided to use only Chinese characters, since it looked nicer that way.
      3. Interesting code and clever algorithms are always good. I like short, to the point, and clear code, but really clever complicated algorithms are OK too as long as they produce good results.

    2. Speed is also important, though not as important as how good a job compressing the image you do. I'd rather have a program that can convert an image in a tenth of a second than something that will be running genetic algorithms for days on end.
    3. I will prefer shorter solutions to longer ones, as long as they are reasonably comparable in quality; conciseness is a virtue.
    4. Your program should be implemented in a language that has a freely-available implementation on Mac OS X, Linux, or Windows. I'd like to be able to run the programs, but if you have a great solution that only runs under MATLAB or something, that's fine.
    5. Your program should be as general as possible; it should work for as many different images as possible, though some may produce better results than others. In particular:

      1. Having a few images built into the program that it matches and writes a reference to, and then produces the matching image upon decoding, is fairly lame and will only cover a few images.
      2. A program that can take images of simple, flat, geometric shapes and decompose them into some vector primitive is pretty nifty, but if it fails on images beyond a certain complexity it is probably insufficiently general.
      3. A program that can only take images of a particular fixed aspect ratio but does a good job with them would also be OK, but not ideal.
      4. You may find that a black and white image can get more information into a smaller space than a color image. On the other hand, that may limit the types of image it's applicable to; faces come out fine in black and white, but abstract designs may not fare so well.
      5. It is perfectly fine if the output image is smaller than the input, while being roughly the same proportion. It's OK if you have to scale the image up to compare it to the original; what's important is how it looks.

    6. Your program should produce output that could actually go through Twitter and come out unscathed. This is only a guideline rather than a rule, since I couldn't find any documentation on the precise set of characters supported, but you should probably avoid control characters, funky invisible combining characters, private use characters, and the like.

    Scoring rubric

    As a general guide to how I will be ranking solutions when choosing my accepted solution, lets say that I'll probably be evaluating solutions on a 25 point scale (this is very rough, and I won't be scoring anything directly, just using this as a basic guideline):

    • 15 points for how well the encoding scheme reproduces a wide range of input images. This is a subjective, aesthetic judgement
      • 0 means that it doesn't work at all, it gives the same image back every time, or something
      • 5 means that it can encode a few images, though the decoded version looks ugly and it may not work at all on more complicated images
      • 10 means that it works on a wide range of images, and produces pleasant looking images which may occasionally be distinguishable
      • 15 means that it produces perfect replicas of some images, and even for larger and more complex images, gives something that is recognizable. Or, perhaps it does not make images that are quite recognizable, but produces beautiful images that are clearly derived from the original.
    • 3 points for clever use of the Unicode character set
      • 0 points for simply using the entire set of allowed characters
      • 1 point for using a limited set of characters that are safe for transfer over Twitter or in a wider variety of situations
      • 2 points for using a thematic subset of characters, such as only Han ideographs or only right-to-left characters
      • 3 points for doing something really neat, like generating readable text or using characters that look like the image in question
    • 3 points for clever algorithmic approaches and code style
      • 0 points for something that is 1000 lines of code only to scale the image down, treat it as 1 bit per pixel, and base64 encode that
      • 1 point for something that uses a standard encoding technique and is well written and brief
      • 2 points for something that introduces a relatively novel encoding technique, or that is surprisingly short and clean
      • 3 points for a one liner that actually produces good results, or something that breaks new ground in graphics encoding (if this seems like a low number of points for breaking new ground, remember that a result this good will likely have a high score for aesthetics as well)
    • 2 points for speed. All else being equal, faster is better, but the above criteria are all more important than speed
    • 1 point for running on free (open source) software, because I prefer free software (note that C# will still be eligible for this point as long as it runs on Mono, likewise MATLAB code would be eligible if it runs on GNU Octave)
    • 1 point for actually following all of the rules. These rules have gotten a bit big and complicated, so I'll probably accept otherwise good answers that get one small detail wrong, but I will give an extra point to any solution that does actually follow all of the rules

    Reference images

    Some folks have asked for some reference images. Here are a few reference images that you can try; smaller versions are embedded here, they all link to larger versions of the image if you need those:

    Prize

    I am offering a 500 rep bounty (plus the 50 that StackOverflow kicks in) for the solution that I like the best, based on the above criteria. Of course, I encourage everyone else to vote on their favorite solutions here as well.

    Note on deadline

    This contest will run until the bounty runs out, about 6 PM on Saturday, May 30. I can't say the precise time it will end; it may be anywhere from 5 to 7 PM. I will guarantee that I'll look at all entries submitted by 2 PM, and I will do my best to look at all entries submitted by 4 PM; if solutions are submitted after that, I may not have a chance to give them a fair look before I have to make my decision. Also, the earlier you submit, the more chance you will have for voting to be able to help me pick the best solution, so try and submit earlier rather than right at the deadline.

    Unicode notes

    There has also been some confusion on exactly what Unicode characters are allowed. The range of possible Unicode code points is U+0000 to U+10FFFF. There are some code points which are never valid to use as Unicode characters in any open interchange of data; these are the noncharacters and the surrogate code points. Noncharacters are defined in the Unidode Standard 5.1.0 section 16.7 as the values U+FFFE, U+FFFF, U+nFFFE, U+nFFFF where n is 110 hexadecimal, and the range U+FDD0U+FDEF. These values are intended to be used for application-specific internal usage, and conforming applications may strip these characters out of text processed by them. Surrogate code points, defined in the Unicode Standard 5.1.0 section 3.8 as U+D800U+DFFF, are used for encoding characters beyond the Basic Multilingual Plane in UTF-16; thus, it is impossible to represent these code points directly in the UTF-16 encoding, and it is invalid to encode them in any other encoding. Thus, for the purpose of this contest, I will allow any program which encodes images into a sequence of no more than 140 Unicode code points from the range U+0000U+10FFFF, excluding all noncharacters and surrogate pairs as defined above.

    I will prefer solutions that use only assigned characters, and even better ones that use clever subsets of assigned characters or do something interesting with the character set they use. For a list of assigned characters, see the Unicode Character Database; note that some characters are listed directly, while some are listed only as the start and end of a range. Also note that surrogate code points are listed in the database, but forbidden as mentioned above. If you would like to take advantage of certain properties of characters for making the text you output more interesting, there are a variety of databases of character information available, such as a list of named code blocks and various character properties.

    Since Twitter does not specify the exact character set they support, I will be lenient about solutions which do not actually work with Twitter because certain characters count extra or certain characters are stripped. It is preferred but not required that all encoded outputs should be able to be transferred unharmed via Twitter or another microblogging service such as identi.ca. I have seen some documentation stating that Twitter entity-encodes <, >, and &, and thus counts those as 4, 4, and 5 characters respectively, but I have not tested that out myself, and their JavaScript character counter doesn't seem to count them that way.

    Tips & Links

    • The definition of valid Unicode characters in the rules is a bit complicated. Choosing a single block of characters, such as CJK Unified Ideographs (U+4E00–U+9FCF) may be easier.
    • You may use existing image libraries, like ImageMagick or Python Imaging Library, for your image manipulation.
    • If you need some help understanding the Unicode character set and its various encodings, see this quick guide or this detailed FAQ on UTF-8 in Linux and Unix.
    • The earlier you get your solution in, the more time I (and other people voting) will have to look at it. You can edit your solution if you improve it; I'll base my bounty on the most recent version when I take my last look through the solutions.
    • If you want an easy image format to parse and write (and don't want to just use an existing format), I'd suggest using the PPM format. It's a text based format that's very easy to work with, and you can use ImageMagick to convert to and from it.

    Edit log

    May 30: Bounty deadline up, picked winner
    May 28: More precise details on deadline
    May 28: Cleanup of links, mention PPM
    May 27: Reorganized and cleaned up rules and guidelines; added tips and links
    May 26: Mentioned deadline issues and removed the soft cap of 100 lines
    May 22: Added reference images, clarifications about Unicode, and a rough scoring rubric
    May 21: Updated to allow 140 arbitrary Unicode characters. While the Twitter web form counts by UTF-16 code units, the API allows 140 arbitrary Unicode characters, so we'll use that definition to allow more flexibility.
    May 21: Added deadline information
    May 21: Posted original challenge

    解决方案

    Alright, here's mine: nanocrunch.cpp and the CMakeLists.txt file to build it using CMake. It relies on the Magick++ ImageMagick API for most of its image handling. It also requires the GMP library for bignum arithmetic for its string encoding.

    I based my solution off of fractal image compression, with a few unique twists. The basic idea is to take the image, scale down a copy to 50% and look for pieces in various orientations that look similar to non-overlapping blocks in the original image. It takes a very brute force approach to this search, but that just makes it easier to introduce my modifications.

    The first modification is that instead of just looking at ninety degree rotations and flips, my program also considers 45 degree orientations. It's one more bit per block, but it helps the image quality immensely.

    The other thing is that storing a contrast/brightness adjustment for each of color component of each block is way too expensive. Instead, I store a heavily quantized color (the palette has only 4 * 4 * 4 = 64 colors) that simply gets blended in in some proportion. Mathematically, this is equivalent to a variable brightness and constant contrast adjustment for each color. Unfortunately, it also means there's no negative contrast to flip the colors.

    Once it's computed the position, orientation and color for each block, it encodes this into a UTF-8 string. First, it generates a very large bignum to represent the data in the block table and the image size. The approach to this is similar to Sam Hocevar's solution -- kind of a large number with a radix that varies by position.

    Then it converts that into a base of whatever the size of the character set available is. By default, it makes full use of the assigned unicode character set, minus the less than, greater than, ampersand, control, combining, and surrogate and private characters. It's not pretty but it works. You can also comment out the default table and select printable 7-bit ASCII (again excluding <, >, and & characters) or CJK Unified Ideographs instead. The table of which character codes are available is stored a run-length encoded with alternating runs of invalid and valid characters.

    Anyway, here are some images and times (as measured on my old 3.0GHz P4), and compressed to 140 characters in the full assigned unicode set described above. Overall, I'm fairly pleased with how they all turned out. If I had more time to work on this, I'd probably try to reduce the blockiness of the decompressed images. Still, I think the results are pretty good for the extreme compression ratio. The decompressed images are bit impressionistic, but I find it relatively easy to see how bits correspond to the original.

    Stack Overflow Logo (8.6s to encode, 7.9s to decode, 485 bytes):

    Lena (32.8s to encode, 13.0s to decode, 477 bytes):

    Mona Lisa (43.2s to encode, 14.5s to decode, 490 bytes):

    Edit: CJK Unified Characters

    Sam asked in the comments about using this with CJK. Here's a version of the Mona Lisa compressed to 139 characters from the CJK Unified character set:

    咏璘驞凄脒鵚据蛥鸂拗朐朖辿韩瀦魷歪痫栘璯緍脲蕜抱揎頻蓼債鑡嗞靊寞柮嚛嚵籥聚隤慛絖銓馿渫櫰矍昀鰛掾撄粂敽牙稉擎蔍螎葙峬覧絀蹔抆惫冧笻哜搀澐芯譶辍澮垝黟偞媄童竽梀韠镰猳閺狌而羶喙伆杇婣唆鐤諽鷍鴞駫搶毤埙誖萜愿旖鞰萗勹鈱哳垬濅鬒秀瞛洆认気狋異闥籴珵仾氙熜謋繴茴晋髭杍嚖熥勳縿餅珝爸擸萿

    The tuning parameters at the top of the program that I used for this were: 19, 19, 4, 4, 3, 10, 11, 1000, 1000. I also commented out the first definition of number_assigned and codes, and uncommented out the last definitions of them to select the CJK Unified character set.

    这篇关于Twitter图像编码挑战的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

  • 查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆