Highlight String Differences

Question

I am looking for a way to highlight the differences between 2 strings. The idea is to show, in a terminal, what characters were changed by iconv. Both strings are already processed to remove leading and trailing spaces, but internal spaces must be handled.

RED="$(tput setaf 1)"    ##    Short variables for the tput ->
CYA="$(tput setaf 6)"    ## -> commands to make output strings ->
CLS="$(tput sgr0)"       ## -> easier to read
str1="[String nâmè™]"    # String prior to iconv
str2="[String name[tm]]" # String after iconv -f utf-8 -t ascii//translit

Ultimately I want to automate the formatting of the differences so they are surrounded by tput color codes that I can echo to the terminal.

${str1} = Highlight in red, characters not common to both strings

${str2} = Highlight in cyan, characters not common to both strings

Wanted Output:

output1="[String n${RED}â${CLS}m${RED}è™${CLS}]"
output2="[String n${CYA}a${CLS}m${CYA}e[tm]${CLS}]"

Most diff utilities I looked at work on the line or word level. I was thinking of parsing the output of cmp for the byte# of the first diff, but I would have to re-parse for multiple differences it seems.

Any way I think about it, it seems like it is going to be an involved process, so I just want to make sure I'm not missing an obvious solution or tool.

Right now I'm thinking the easiest way would be to format each string to put a single byte on a new line and then my options open up.

nstr1="$(fold -w1 <<< "${str1}")"    # one character (or byte) per line
nstr2="$(fold -w1 <<< "${str2}")"
diff <(printf '%s\n' "${nstr1}") <(printf '%s\n' "${nstr2}")
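
A caveat on fold -w1: depending on the implementation and version, fold may count bytes rather than characters, in which case multibyte characters such as â or ™ get split across lines. A minimal sketch of a character-wise split using bash substring expansion, which indexes by character when a UTF-8 locale is active (an assumption here); splitChars is a hypothetical helper name, not part of the attempt above:

```shell
#!/usr/bin/env bash

# splitChars (hypothetical helper): print each character of $1 on its own line.
# Bash indexes ${str:i:1} by character in a multibyte locale and by byte
# in the C/POSIX locale, so a UTF-8 locale is assumed for non-ASCII input.
splitChars() {
  local str=$1 i
  for (( i = 0; i < ${#str}; i++ )); do
    printf '%s\n' "${str:i:1}"
  done
}

# Internal spaces are kept as lines containing a single space.
splitChars 'a b'
```

The output of two such calls can be fed to diff through process substitution, in the same way as the fold-based version.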

This is as far as I got, and I didn't want to go further unless I was on the right track. I'm certain there are a zillion ways to do this, but is there a more efficient way to go here?

Recommended answer

Putting it all together:

#!/usr/bin/env bash

# Using stdin input, outputs each char. on its own line, with actual newlines
# in the input represented as literal '\n'.
toSingleCharLines() {
  sed 's/\(.\)/\1\'$'\n''/g; s/\n$/\'$'\n''\\n/'
}

# Using stdin input, reassembles a string split into 1-character-per-line output
# by toSingleCharLines().
fromSingleCharLines() {
  awk '$0=="\\n" { printf "\n"; next} { printf "%s", $0 }'
}

# Prints a colored string read from stdin by interpreting embedded color references such
# as '${RED}'.
printColored() {
  local str=$(</dev/stdin)
  local RED="$(tput setaf 1)" CYA="$(tput setaf 6)" RST="$(tput sgr0)"
  str=${str//'${RED}'/${RED}}
  str=${str//'${CYA}'/${CYA}}
  str=${str//'${RST}'/${RST}}
  printf '%s\n' "$str"
}

# The non-ASCII input string.
strOrg='[String nâmè™]'

# Create its ASCII-chars.-only transliteration.
strTransLit=$(iconv -f utf-8 -t ascii//translit <<<"$strOrg")

# Print the ORIGINAL string with the characters that NEED transliteration
# highlighted in RED.
diff --changed-group-format='${RED}%=${RST}' \
  <(toSingleCharLines <<<"$strOrg") <(toSingleCharLines <<<"$strTransLit") |
    fromSingleCharLines | printColored

# Print the TRANSLITERATED string with the characters that RESULT FROM
# transliteration highlighted in CYAN.
diff --changed-group-format='${CYA}%=${RST}' \
  <(toSingleCharLines <<<"$strTransLit") <(toSingleCharLines <<<"$strOrg") |
    fromSingleCharLines | printColored

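This prints both strings with the differing characters highlighted: red in the original, cyan in the transliterated version.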
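
The two diff calls in the script can be folded into one reusable function. The following is a sketch under stated assumptions, not the answer's own code: highlightChanged and splitChars are hypothetical names, GNU diff is assumed (for the group-format options), the input is assumed to be a single line, and the markers must not contain a % character. Unlike the script, it also sets --old-group-format and --new-group-format, so characters present in only one of the strings are handled explicitly instead of falling back to diff's defaults.

```shell
#!/usr/bin/env bash

# splitChars (hypothetical helper): print each character of $1 on its own line.
splitChars() {
  local str=$1 i
  for (( i = 0; i < ${#str}; i++ )); do
    printf '%s\n' "${str:i:1}"
  done
}

# highlightChanged (hypothetical helper): print string $1 with every run of
# characters that differs from string $2 wrapped in $3 (start) and $4 (end).
# Assumes GNU diff, single-line input, and markers without '%'.
highlightChanged() {
  local a=$1 b=$2 on=$3 off=$4
  # %< expands to the lines (characters) taken from the first input.
  # Characters that exist only in the second input are dropped.
  diff --changed-group-format="${on}%<${off}" \
       --old-group-format="${on}%<${off}" \
       --new-group-format='' \
       <(splitChars "$a") <(splitChars "$b") | tr -d '\n'
  printf '\n'
}

# Visible markers instead of color codes, to make the result easy to inspect.
highlightChanged 'name' 'nome' '<' '>'   # prints: n<a>me
```

For colored terminal output, pass tput sequences as the markers, for example: highlightChanged "$strOrg" "$strTransLit" "$(tput setaf 1)" "$(tput sgr0)".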