Parsing an iCalendar file in C

Question

I am looking to parse iCalendar files using C. I already have a structure set up and the file reading in place, and I want to parse each line into its components.

For example I would need to parse something like the following:

UID:uid1@example.com
DTSTAMP:19970714T170000Z
ORGANIZER;CN=John Doe;SENT-BY="mailto:smith@example.com":mailto:john.doe@example.com
CATEGORIES:Project Report, XYZ, Weekly Meeting
DTSTART:19970714T170000Z
DTEND:19970715T035959Z
SUMMARY:Bastille Day Party

Here are some rules:

  • The first word on each line is the property name
  • The property name is followed by a colon (:) or a semicolon (;)
  • If it is a colon, the property value is everything to the right of the colon, up to the end of the line
  • A further layer of complexity is that a comma-separated list of values is allowed, which would then be stored in an array. The CATEGORIES line, for example, would have 3 elements in its value array
  • If the property name is followed by a semicolon, optional parameters follow
  • The optional parameter format is ParamName=ParamValue. Again, a comma-separated list is supported here
  • There can be more than one optional parameter, as seen on the ORGANIZER line. Each further parameter is introduced by another semicolon, followed by its name and value
  • And to throw in yet another wrench, quotation marks are allowed in the values. Anything inside quotes must be treated as part of the value rather than as syntax. So a semicolon inside quotes does not start another parameter; it is part of the value

I was going about this using strchr() and strtok() and have got some basic elements working, but the code is getting very messy and unorganized, and this does not seem to be the right way to do it.

How can I implement a parser like this with the standard C library (or the POSIX regex library)? I am not looking for a whole solution, just a starting point.

Recommended answer

This answer assumes that you want to roll your own parser using Standard C. In practice it is usually better to use an existing parser, because its authors have already thought of and handled all the weird things that can come up.

My high-level approach would be:

  • Read a line
  • Pass a pointer to the start of this line to a function parse_line:
    • Use strcspn on the pointer to find the first : or ; (abort if no marker is found)
    • Save the text so far as the property name
    • While the parsing pointer points to ;:
      • Call a function extract_name_value_pair, passing the address of your parsing pointer
      • That function will extract and save the name and value, and update the pointer to point to the ; or : following the entry. Of course, this function must handle quote marks in the value, and the fact that there might be a ; or : inside the value
    • Once the parsing pointer points to :, pass the rest of the line to a function parse_csv to split the comma-separated values
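The steps above could be sketched roughly as follows. This is only one possible shape, not a complete iCalendar parser: the struct names, the MAX_PARAMS limit and the error convention (0 for success, -1 for failure) are assumptions made for illustration, the line is assumed to have its trailing newline already stripped, and the results are stored as pointer-plus-length slices into the original line so that no memory allocation is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 8   /* arbitrary limit chosen for this sketch */

/* A piece of the input line: start pointer plus length, not null-terminated. */
struct slice { const char *p; size_t n; };

struct param { struct slice name, value; };

struct parsed_line {
    struct slice name;                 /* property name, e.g. ORGANIZER */
    struct param params[MAX_PARAMS];   /* optional ;NAME=VALUE entries  */
    size_t nparams;
    struct slice value;                /* everything after the colon    */
};

/* On entry *pp points just past a ';'. On success, fills *out and leaves
 * *pp on the ';' or ':' that follows the entry. Returns 0, or -1 on error. */
static int extract_name_value_pair(const char **pp, struct param *out)
{
    const char *s = *pp;
    size_t n = strcspn(s, "=;:");
    if (s[n] != '=' || n == 0)
        return -1;                     /* no NAME= before the next delimiter */
    out->name.p = s;
    out->name.n = n;
    s += n + 1;                        /* step over '=' */
    out->value.p = s;

    /* Scan the value; ';' and ':' inside double quotes are plain text. */
    int in_quotes = 0;
    while (*s != '\0' && (in_quotes || (*s != ';' && *s != ':'))) {
        if (*s == '"')
            in_quotes = !in_quotes;
        s++;
    }
    if (*s == '\0')
        return -1;                     /* unterminated quote or missing ':' */
    out->value.n = (size_t)(s - out->value.p);
    *pp = s;
    return 0;
}

int parse_line(const char *line, struct parsed_line *out)
{
    assert(line != NULL && out != NULL);

    const char *s = line;
    size_t n = strcspn(s, ":;");
    if (s[n] == '\0' || n == 0)
        return -1;                     /* no marker found */
    out->name.p = s;
    out->name.n = n;
    out->nparams = 0;
    s += n;

    while (*s == ';') {
        if (out->nparams == MAX_PARAMS)
            return -1;                 /* too many parameters for this sketch */
        s++;                           /* step over ';' */
        if (extract_name_value_pair(&s, &out->params[out->nparams]) != 0)
            return -1;
        out->nparams++;
    }

    /* s now points to ':'; the rest of the line is the property value. */
    s++;
    out->value.p = s;
    out->value.n = strlen(s);
    return 0;
}
```

Called on the ORGANIZER line from the question, parse_line would report the name ORGANIZER, two parameters (CN and SENT-BY, the latter keeping its quotes and the colon inside them), and the value mailto:john.doe@example.com. The value slice is left unsplit here; it is what a comma-splitting helper would receive next.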

The functions parse_csv and extract_name_value_pair should in fact be developed and tested first. Make a test suite and check that they work properly. Then write your overall parser function, which calls those functions as needed.
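As an illustration of developing a helper in isolation, here is one way parse_csv might look. The signature, the slice struct and the decision to keep surrounding whitespace and quote characters in each item are assumptions of this sketch; it follows the question's rule that a comma inside double quotes does not separate values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A piece of the input: start pointer plus length, not null-terminated. */
struct slice { const char *p; size_t n; };

/* Split s at commas that are outside double quotes. Stores at most max
 * items in out and returns the total number of items found, which may
 * exceed max. An empty string counts as one empty item. */
size_t parse_csv(const char *s, struct slice *out, size_t max)
{
    assert(s != NULL && (out != NULL || max == 0));

    size_t count = 0;
    const char *start = s;
    int in_quotes = 0;

    for (;; s++) {
        if (*s == '"')
            in_quotes = !in_quotes;
        if (*s == '\0' || (*s == ',' && !in_quotes)) {
            if (count < max) {
                out[count].p = start;
                out[count].n = (size_t)(s - start);
            }
            count++;
            if (*s == '\0')
                break;
            start = s + 1;             /* next item begins after the comma */
        }
    }
    return count;
}
```

A small test suite for this function could feed it the CATEGORIES value from the question and check that three items come back, then a quoted item containing a comma, then an empty string. Note that the second item of the CATEGORIES value comes back as " XYZ" with its leading space; whether to trim that is a decision for the caller or for a separate helper.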

Also, write all the memory allocation code as separate functions. Think of what data structure you want to store your parsed result in. Then code up that data structure, and test it, entirely independently of the parsing code. Only then, write the parsing code and call functions to insert the resulting data in the data structure.
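For instance, the array of values could live in a small growable string list that knows nothing about parsing. The name strlist and its three functions are hypothetical, chosen only for this sketch; the point is that every malloc, realloc and free sits behind this interface and can be tested on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A growable list of heap-allocated, null-terminated strings. */
struct strlist {
    char **items;
    size_t count, cap;
};

void strlist_init(struct strlist *l)
{
    assert(l != NULL);
    l->items = NULL;
    l->count = l->cap = 0;
}

/* Append a copy of the n bytes starting at s.
 * Returns 0 on success, -1 if memory could not be allocated. */
int strlist_push(struct strlist *l, const char *s, size_t n)
{
    if (l->count == l->cap) {
        size_t cap = l->cap ? l->cap * 2 : 4;
        char **p = realloc(l->items, cap * sizeof *p);
        if (p == NULL)
            return -1;                 /* list is unchanged on failure */
        l->items = p;
        l->cap = cap;
    }
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, s, n);
    copy[n] = '\0';
    l->items[l->count++] = copy;
    return 0;
}

/* Release every string and the array itself, leaving an empty list. */
void strlist_free(struct strlist *l)
{
    for (size_t i = 0; i < l->count; i++)
        free(l->items[i]);
    free(l->items);
    strlist_init(l);
}
```

Because strlist_push takes a pointer and a length, the parsing code can hand it a piece of the input line directly, without writing a null terminator into the line first.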

You really don't want to have memory management code mixed up with parsing code. That makes it exponentially harder to debug.

When making a function that accepts a string (e.g. all three named functions above, plus any other helpers you decide you need), you have a few options for its interface:

  • Accept a pointer to a null-terminated string
  • Accept a pointer to the start and a pointer to one past the end
  • Accept a pointer to the start and an integer length

Each way has its pros and cons: it's annoying to write null terminators everywhere and then unwrite them later if need be; but it's also annoying when you want to use strcspn or other string functions but you received a length-counted piece of string.
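To make the trade-off concrete, here is the same small helper written against each of the three interfaces. It returns the length of the property name at the start of a line; the function names are made up for this comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Null-terminated string: strcspn can be used directly. */
size_t name_len_z(const char *s)
{
    assert(s != NULL);
    return strcspn(s, ":;");
}

/* Start pointer plus length: strcspn cannot be told where to stop,
 * so the scan is written by hand. */
size_t name_len_n(const char *s, size_t len)
{
    size_t i = 0;
    while (i < len && s[i] != ':' && s[i] != ';')
        i++;
    return i;
}

/* Start and one-past-the-end pointers: a thin wrapper over the
 * length-counted version. */
size_t name_len_range(const char *begin, const char *end)
{
    assert(begin != NULL && begin <= end);
    return name_len_n(begin, (size_t)(end - begin));
}
```

The null-terminated version is the shortest, but it only works if the caller can guarantee a terminator where the text of interest ends. The other two work on any piece of a larger buffer, at the cost of hand-written loops.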

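Also, when the function needs to let the caller know how much text it consumed in parsing, you have two options:

  • Accept a pointer to character and return the number of characters consumed; the calling function adds the two together to find where parsing stopped
  • Accept a pointer to pointer to character, and update the pointer to character. The return value can then be used for an error code.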
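The two conventions look like this for a trivial helper that skips leading spaces and tabs. Both function names are hypothetical; a helper like this could, for example, be used to trim the space that follows each comma in the CATEGORIES value.

```c
#include <assert.h>
#include <stddef.h>

/* Option 1: return the number of characters consumed.
 * The caller advances its own pointer: p += skip_spaces_count(p); */
size_t skip_spaces_count(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    return n;
}

/* Option 2: advance the caller's pointer in place.
 * The return value is free to carry an error code: 0 on success, -1 on bad input. */
int skip_spaces_update(const char **pp)
{
    if (pp == NULL || *pp == NULL)
        return -1;
    while (**pp == ' ' || **pp == '\t')
        (*pp)++;
    return 0;
}
```

With the first form the caller has to remember to add the result to its pointer, and errors need a separate channel. With the second form the call site is a single statement, but every call needs the address of the parsing pointer.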

There's no one right answer; with experience you will get better at deciding which option leads to the cleanest code.
