Parsing a CS:GO language file with encoding in Python

Views: 213  Published: 2017/8/16 21:39:53  Tags: python parsing encoding utf-8


Problem description


This question is related to the earlier one, Parsing a CS:GO script file in Python, but it concerns a different problem. I'm working with content from CS:GO, and I'm trying to write a Python tool that imports all the data from the /scripts/ folder into Python dictionaries.

The next step after parsing that data is to parse the language resource files from /resources and link the dictionaries to the localized strings.

Here is the original file for the English localization: https://github.com/spec45as/PySteamBot/blob/master/csgo_english.txt

The file format is similar to the one in the previous task, but I have run into other problems. All the language files are encoded in UTF-16-LE, and I don't understand how to work with encoded files and strings in Python (I mostly work with Java). I have tried a few approaches based on open(fileName, encoding='utf-16-le').read(), but I don't know how to work with such strings in pyparsing.

pyparsing.ParseException: Expected quoted string, starting with " ending with " (at char 0), (line:1, col:1)

Another problem is lines containing \"-style escape sequences, for example:

"musickit_midnightriders_01_desc"       "\"HAPPY HOLIDAYS, ****ERS!\"\n    -Midnight Riders"

How can I parse these symbols if I want to keep such lines as they are?

Solution

There are a few new wrinkles to this input file that were not in the original CS:GO example:

embedded \" escaped quotes in some of the value strings
some of the quoted value strings span multiple lines
some of the values end with a trailing environment condition (such as [$WIN32], [$OSX])
embedded comments in the file, marked with '//'

The first two are addressed by modifying the definition of value_qs. Since values are now more fully-featured than keys, I decided to use separate QuotedString definitions for them:

key_qs = QuotedString('"').setName("key_qs")
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")
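As a quick check of what these two options do, here is a minimal sketch that runs value_qs on two made-up strings (they are illustrative inputs, not lines copied from csgo_english.txt). It assumes pyparsing is installed and uses the same camelCase names as the rest of this answer.

```python
from pyparsing import QuotedString

# Same definition as above.
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")

# Hypothetical input 1: escaped quotes inside the value.
escaped = r'"\"HAPPY HOLIDAYS!\" -Midnight Riders"'
escaped_value = value_qs.parseString(escaped)[0]

# Hypothetical input 2: a value that spans two physical lines.
multi = '"first line\nsecond line"'
multi_value = value_qs.parseString(multi)[0]

print(escaped_value)      # the backslashes are gone, the inner quotes remain
print(repr(multi_value))  # the embedded line break is kept
```

Without escChar the first string would end at the first inner quote, and without multiline=True the second would fail to match at the line break.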

The third requires a bit more refactoring. These qualifying conditions work much like #ifdef directives in C: a definition is enabled only if the environment matches its condition. Some of these conditions are even boolean expressions:

[!$PS3]
[$WIN32||$X360||$OSX]
[!$X360&&!$PS3]
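To make the #ifdef analogy concrete, here is a rough sketch of how such a condition could be evaluated against a set of defined platform symbols. The function condition_holds is hypothetical and not part of the parser below. It assumes that && binds tighter than ||, that there are no parentheses, and that each term is a $NAME with an optional leading !; the file format itself may allow more than this.

```python
def condition_holds(expr, defined):
    """Evaluate a qualifier such as '!$X360&&!$PS3' against a set of names.

    Hypothetical helper: assumes '&&' binds tighter than '||', no
    parentheses, and at most one leading '!' per term.
    """
    def term_is_true(term):
        term = term.strip()
        negated = term.startswith('!')
        name = term.lstrip('!').lstrip('$')
        present = name in defined
        return (not present) if negated else present

    # An OR of AND-clauses: any clause may hold, and all terms in it must hold.
    return any(
        all(term_is_true(term) for term in clause.split('&&'))
        for clause in expr.split('||')
    )


print(condition_holds('!$PS3', set()))
print(condition_holds('$WIN32||$X360||$OSX', {'OSX'}))
print(condition_holds('!$X360&&!$PS3', {'PS3'}))
```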

This could lead to duplicate keys in the definition file, such as in these lines:

"Menu_Dlg_Leaderboards_Lost_Connection"     "You must be connected to Xbox LIVE to view Leaderboards. Please check your connection and try again." [$X360]
"Menu_Dlg_Leaderboards_Lost_Connection"     "You must be connected to PlayStation®Network and Steam to view Leaderboards. Please check your connection and try again." [$PS3]
"Menu_Dlg_Leaderboards_Lost_Connection"     "You must be connected to Steam to view Leaderboards. Please check your connection and try again."

which contain 3 definitions for the key "Menu_Dlg_Leaderboards_Lost_Connection", depending on what environment values were set.

In order to not lose these values when parsing the file, I chose to modify the key at parse time by appending the condition if one was present. This code implements the change:

LBRACK,RBRACK = map(Suppress, "[]")
qualExpr = Word(alphanums+'$!&|')
qualExprCondition = LBRACK + qualExpr + RBRACK

key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
def addQualifierToKey(tokens):
    tt = tokens[0]
    if 'qual' in tt:
        tt[0] += '/' + tt.pop(-1)
key_value.setParseAction(addQualifierToKey)

So that in the sample above, you would get 3 keys:

Menu_Dlg_Leaderboards_Lost_Connection/$X360
Menu_Dlg_Leaderboards_Lost_Connection/$PS3
Menu_Dlg_Leaderboards_Lost_Connection
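Once the keys carry their qualifier, picking the right string for a platform becomes a plain dictionary lookup. The sketch below uses an ordinary dict with shortened placeholder values in place of the parsed results, and a hypothetical helper named lookup. It only handles single-symbol qualifiers such as $X360, not the boolean expressions shown earlier.

```python
def lookup(tokens, key, platform=None):
    """Return the platform-specific value for key if one exists,
    otherwise fall back to the unqualified definition.

    Hypothetical helper; handles only simple '$NAME' qualifiers.
    """
    if platform is not None:
        qualified = f"{key}/${platform}"
        if qualified in tokens:
            return tokens[qualified]
    return tokens[key]


# Placeholder values standing in for the three full messages.
tokens = {
    "Menu_Dlg_Leaderboards_Lost_Connection/$X360": "Xbox LIVE message",
    "Menu_Dlg_Leaderboards_Lost_Connection/$PS3": "PlayStation message",
    "Menu_Dlg_Leaderboards_Lost_Connection": "Steam message",
}

key = "Menu_Dlg_Leaderboards_Lost_Connection"
print(lookup(tokens, key, "X360"))
print(lookup(tokens, key, "OSX"))   # no OSX variant, falls back
print(lookup(tokens, key))
```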

Lastly, there is the handling of comments, which is probably the easiest part. Pyparsing has built-in support for skipping over comments, just as it skips whitespace. You just need to define the expression for the comment and have the top-level parser ignore it. To support this feature, several common comment forms are pre-defined in pyparsing. In this case, the solution is just to change the final parser definition to:

parser.ignore(dblSlashComment)
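Here is a small stand-alone illustration of ignore() on a simplified grammar of quoted key/value pairs. The sample text and the name pairs are made up for this sketch; it assumes pyparsing is installed.

```python
from pyparsing import Group, OneOrMore, QuotedString, dblSlashComment

qs = QuotedString('"')
pairs = OneOrMore(Group(qs + qs))

# Tell the top-level expression to skip '//' comments wherever
# it would otherwise skip whitespace.
pairs.ignore(dblSlashComment)

# Made-up sample with a full-line comment and a trailing comment.
sample = '''
// a full-line comment
"key_one"   "value one"   // a trailing comment
"key_two"   "value two"
'''

result = pairs.parseString(sample).asList()
print(result)
```

Calling ignore() on the outermost expression is enough, because it is passed down to the expressions it contains.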

And LASTLY lastly, there is a minor bug in the implementation of QuotedString, in which standard whitespace string literals like \t and \n are not handled, and are just treated as an unnecessarily-escaped 't' or 'n'. So for now, when this line is parsed:

"SFUI_SteamOverlay_Text"  "This feature requires Steam Community In-Game to be enabled.\n\nYou might need to restart the game after you enable this feature in Steam:\nSteam -> File -> Settings -> In-Game: Enable Steam Community In-Game\n" [$WIN32]

For the value string you just get:

This feature requires Steam Community In-Game to be enabled.nnYou 
might need to restart the game after you enable this feature in 
Steam:nSteam -> File -> Settings -> In-Game: Enable Steam Community 
In-Gamen

instead of:

This feature requires Steam Community In-Game to be enabled.

You might need to restart the game after you enable this feature in Steam:
Steam -> File -> Settings -> In-Game: Enable Steam Community In-Game

I will have to fix this behavior in the next release of pyparsing.

Here is the final parser code:

from pyparsing import (Suppress, QuotedString, Forward, Group, Dict, 
    ZeroOrMore, Word, alphanums, Optional, dblSlashComment)

LBRACE,RBRACE = map(Suppress, "{}")

key_qs = QuotedString('"').setName("key_qs")
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")

# use this code to convert integer values to ints at parse time
def convert_integers(tokens):
    if tokens[0].isdigit():
        tokens[0] = int(tokens[0])
value_qs.setParseAction(convert_integers)

LBRACK,RBRACK = map(Suppress, "[]")
qualExpr = Word(alphanums+'$!&|')
qualExprCondition = LBRACK + qualExpr + RBRACK

value = Forward()
key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
def addQualifierToKey(tokens):
    tt = tokens[0]
    if 'qual' in tt:
        tt[0] += '/' + tt.pop(-1)
key_value.setParseAction(addQualifierToKey)

struct = (LBRACE + Dict(ZeroOrMore(key_value)) + RBRACE).setName("struct")
value <<= (value_qs | struct)
parser = Dict(key_value)
parser.ignore(dblSlashComment)

sample = open('cs_go_sample2.txt').read()
config = parser.parseString(sample)


print(config.keys())
for k in config.lang.keys():
    print('- ' + k)

#~ config.lang.pprint()
print(config.lang.Tokens.StickerKit_comm01_burn_them_all)
print(config.lang.Tokens['SFUI_SteamOverlay_Text/$WIN32'])
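On the encoding part of the question: the ParseException at char 0 is consistent with a byte order mark (BOM) being left at the start of the text, although that is an assumption about the file rather than something confirmed here. In Python, the 'utf-16-le' codec keeps a leading BOM as the character U+FEFF, while the 'utf-16' codec reads the BOM to detect the byte order and then drops it. The sketch below shows the difference on a small temporary file written for the purpose (the file name and contents are made up).

```python
import codecs
import os
import tempfile

# Write a tiny UTF-16-LE file that starts with a BOM.
text = '"lang"\n{\n}\n'
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))

# 'utf-16-le' leaves the BOM in the decoded string as U+FEFF.
with open(path, encoding="utf-16-le") as f:
    raw = f.read()

# 'utf-16' consumes the BOM, so the text starts with the opening quote.
with open(path, encoding="utf-16") as f:
    clean = f.read()

print(repr(raw[0]))
print(repr(clean[0]))
```

If the real file does begin with a BOM, opening it with encoding='utf-16' and passing the resulting string to parser.parseString() should let the first quoted string match at line 1, column 1.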

