MATLAB: Simple string analysis - Find locations


Question

Here I have an example of a piece of literature that I would like to do a simple analysis on. Notice the different sections:

str =   "Random info - at beginning-man. "+ ...
        "Random info still continues. "+ ...
        "CHAPTER 1. " + ...
        "Random info in middle one, "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence one of verse one, "+ ...
        "This still sentence one of verse one. "+ ...
        "2 This is sentence one of verse two. "+ ...
        "This is sentence two of verse two. "+ ...
        "3 This is sentence one of verse three; "+ ...
        "this still sentence one of verse three. "+ ...
        "CHAPTER 2. " + ...
        "Random info in middle two. "+ ...
        "Random info still continues. "+ ...
        "1 This is sentence four? "+ ...
        "2 This is sentence five, "+ ...
        "3 this still sentence five but verse three!"+ ...
        "Random info at end's end."+ ...
        "Random info still continues. ";

I'm interested in all the data that can be called "Random info in middle", which comes after a chapter name and before the first verse.

I would like to use the function extractBetween to extract the information found between "CHAPTER #" and "1" (the first verse).

I know how to use extractBetween, but how can I determine the locations just after "CHAPTER #" and just before "1" (the first verse), for any number of chapters?

In the end, I would like a result in which the random information for each chapter is placed in a table.

I've tried regexp() and findstr(), but without success. Any help would be appreciated. Thanks!

Recommended answer

You can use a regular expression with regexp to match the text.

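% tokens: one cell per match, holding the two captured groups
% (chapter name, text up to the next "1"); matches: the full matched text.
[tokens, matches] = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens', 'match');

% Print each chapter name and its random info, separated by a tab.
for k = 1:numel(tokens)
    fprintf('%s\t%s\n', tokens{k}(1), tokens{k}(2));
    % or: fprintf('%s\t%s\n', tokens{k});
end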

This will print:

CHAPTER 1   Random info in middle one, Random info still continues. 
CHAPTER 2   Random info in middle two. Random info still continues. 
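Since the question asks for a table rather than printed lines, the tokens can also be collected into a table. The following is only a sketch: the shortened str is a stand-in for the question's data, and the variable names pairs, result, Chapter, and RandomInfo are illustrative choices, not part of the original answer.

```matlab
% Shortened stand-in for the question's str (assumption: same layout).
str = "Intro text. " + ...
      "CHAPTER 1. Random info in middle one, Random info still continues. " + ...
      "1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Random info in middle two. Random info still continues. " + ...
      "1 Verse one. 2 Verse two.";

% Same pattern as in the answer: two capture groups per match.
tokens = regexp(str, '(CHAPTER \d)\.\s*(.*?)1', 'tokens');

% Stack the per-match tokens into an n-by-2 string array.
% string() makes this work whether the tokens come back as
% string arrays or as cell arrays of character vectors.
pairs = string(vertcat(tokens{:}));

% One row per chapter; strtrim drops the trailing space before "1".
result = table(pairs(:,1), strtrim(pairs(:,2)), ...
               'VariableNames', {'Chapter', 'RandomInfo'});
disp(result)
```

Each row of result then holds a chapter name and its middle text, so result.RandomInfo(k) gives the text for the k-th chapter.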

Explanation of the regular expression (CHAPTER \d)\.\s*(.*?)1:

  • (CHAPTER \d) matches CHAPTER followed by a single digit, and the surrounding () parentheses capture the match in the tokens variable.
  • \. matches the period.
  • \s* matches any whitespace that follows (zero or more characters).
  • (.*?)1 captures any text up to the next 1 in the text. Note the question mark, which makes the match lazy; without it, the group would match all the text up to the last 1 in str.
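The question also asks for the locations themselves, so that extractBetween can be used. One possible sketch uses regexp's 'tokenExtents' option, which returns the start and end index of each captured group. The pattern here is a variation, not the answer's own: \d+ allows multi-digit chapter numbers, and \s+1\s requires the 1 to stand alone between whitespace. It is still an assumption that the middle text contains no standalone 1. The shortened str and the names ext and info are illustrative.

```matlab
% Shortened stand-in for the question's str (assumption: same layout).
str = "Intro text. " + ...
      "CHAPTER 1. Random info in middle one, Random info still continues. " + ...
      "1 Verse one. 2 Verse two. " + ...
      "CHAPTER 2. Random info in middle two. Random info still continues. " + ...
      "1 Verse one. 2 Verse two.";

% For each match, ext{k} is a 1-by-2 row [startIndex endIndex] of the
% captured group, i.e. the text between the chapter heading and verse 1.
ext = regexp(str, 'CHAPTER \d+\.\s*(.*?)\s+1\s', 'tokenExtents');

% Extract each span by position; both positions are inclusive.
info = strings(numel(ext), 1);
for k = 1:numel(ext)
    info(k) = extractBetween(str, ext{k}(1), ext{k}(2));
end
disp(info)
```

The loop keeps each extractBetween call to a single start and end position, which avoids relying on how the function handles several positions for one string.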
