Tokenization of a string


Problem description


Hi guys,
This morning I ran into this problem while parsing my text file.
I have some simple code to parse a string, as follows:

const char* LastLineOfFile = FinalExecutionOfJob[NumberEntriesInLastExecution + 1];
char* pch;
char* TempStr = strstr((char*)LastLineOfFile, "");
const char* list[5];
pch = strtok(TempStr, " ");
const char* ListOfToken[5];
int tokIndex = 0;
while (pch != NULL)
{
    pch = strtok(NULL, " ");
    switch (tokIndex)
    {
        case 0:
            ListOfToken[0] = pch;
            break;
        case 1:
            ListOfToken[1] = pch;
            break;
        case 2:
            ListOfToken[2] = pch;
            break;
        case 3:
            ListOfToken[3] = pch;
            break;
        case 4:
            ListOfToken[4] = pch;
            break;
    }
    tokIndex++;
}


For a string like
"03-May-2013 18:04:03 service 105 snapshots were destroyed"

I need list[4] = "snapshots were destroyed";
But because I tokenize on " " (space) as usual, that text is broken into three separate tokens:
snapshots
were
destroyed
I need all of it in a single string.

Solution

This is not possible by splitting on spaces with strtok alone, because the function overwrites the delimiter that ends each token with a null character ('\0'), so every word becomes its own token.
You could do the following to get what you want:

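/* Run this on the original line, before strtok has modified it. */
pch = strchr(TempStr, ' ');   /* space after the date */
pch = strchr(pch + 1, ' ');   /* space after the time */
pch = strchr(pch + 1, ' ');   /* space after "service" */
pch = strchr(pch + 1, ' ');   /* space after "105" */
/* pch + 1 now points at "snapshots were destroyed" */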
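
The four strchr calls above assume that every call finds a space. A slightly more defensive sketch of the same idea is shown below. The helper name skip_fields and its parameters are made up for illustration, and it assumes the fields are separated by exactly one space each.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the text that follows the first
 * `count` space-separated fields of `line`, or NULL if the line contains
 * fewer than `count` spaces. The input string is not modified. */
const char *skip_fields(const char *line, int count)
{
    assert(line != NULL);
    const char *p = line;
    for (int i = 0; i < count; i++) {
        p = strchr(p, ' ');
        if (p == NULL)
            return NULL;   /* ran out of fields */
        p++;               /* step past the space */
    }
    return p;
}
```

With the sample line from the question, skip_fields(LastLineOfFile, 4) would point at the text that starts with "snapshots". Because nothing is written into the string, this also works on a const char* without a cast.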


Another thing I noticed in your code is that strtok is called twice before anything is stored in the list, so the first token is skipped.
You may want to move the strtok(NULL, " ") call to after the switch statement.
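
If you prefer to stay with strtok, one possible arrangement is to store each token before advancing, and to switch to an empty delimiter set for the last slot so that the rest of the line comes back as one token. This is only a sketch: split_line and its parameters are hypothetical names, it modifies the buffer in place (so it needs a writable copy of the line), and it assumes single spaces between fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits `line` in place into at most `max_tokens`
 * tokens. The first max_tokens - 1 tokens are split on spaces; the last one
 * receives the rest of the line unsplit. Returns the number of tokens stored. */
int split_line(char *line, char *tokens[], int max_tokens)
{
    assert(line != NULL && tokens != NULL && max_tokens > 0);
    int count = 0;
    /* With a single slot, the whole line is the "rest". */
    char *tok = strtok(line, max_tokens == 1 ? "" : " ");
    while (tok != NULL) {
        tokens[count++] = tok;   /* store first ... */
        if (count == max_tokens)
            break;
        /* ... then advance. An empty delimiter set makes strtok return
         * everything that is left as a single token. */
        tok = strtok(NULL, count == max_tokens - 1 ? "" : " ");
    }
    return count;
}
```

Called with a writable copy of the sample line and max_tokens set to 5, this fills tokens[0] through tokens[3] with the date, time, "service" and "105", and tokens[4] with the remaining text. Note that strtok keeps internal state, so it is not safe to use from several threads at once.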

