Looking for a function like break that includes the last element

Problem description

This should be easy, but somehow I can't find how to do it. Take the following string: "0@workspace_command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"

The part 0@workspace_command can be something else, but it will never contain a '-'. I want a function that returns the following result: ["0@workspace_command-","7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"]. All the functions that I have found give me this result instead: ["0@workspace_command","-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"].

Solution

Here is an example of the kind of approach I advocate in my comment above, to show how lightweight it can be. I'll use regex-applicative for my parser combinators; a similar approach (with some tweaks) will work with other combinator libraries. I'll also use the uuid package. So, the boilerplate:

import Data.List
import Data.UUID
import Text.Regex.Applicative
import Text.Regex.Applicative.Common

A custom-tailored type (I picked a name without really knowing what these are for):

data IndexedCmd = IndexedCmd
    { index :: Int
    , command :: String
    , uuid :: UUID
    } deriving (Eq, Ord, Read, Show)

If we can get our hands on one of these, it offers a lot of features that your break version doesn't, and it doesn't require much code to cook up. Here's how we build one. We need to build a parser for UUIDs first:

hexDigitAsChar :: RE Char Char
hexDigitAsChar = psym $ \c -> or
    [ '0' <= c && c <= '9'
    , 'a' <= c && c <= 'f'
    , 'A' <= c && c <= 'F'
    ]

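-- Five groups of 8, 4, 4, 4, and 12 hex digits, separated by dashes.
-- The regex has already checked the shape, so the Read instance of UUID
-- is expected to accept the matched text.
parseUUID :: RE Char UUID
parseUUID = id
    . fmap read
    . sequenceA
    . intercalate [sym '-']
    $ [replicate n hexDigitAsChar | n <- [8,4,4,4,12]]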

Once we have that in place, our parser for IndexedCmds is short and sweet:

parseIndexedCmd :: RE Char IndexedCmd
parseIndexedCmd = pure IndexedCmd
    <*> decimal <* sym '@'
    <*> many anySym <* sym '-'
    <*> parseUUID

That's the whole development. It's a bit longer than the other answers, but it also does a lot more, including a lot of work that you would probably want to do anyway even if you had the exact variant of break that you want. For example, it extracts a structured representation of the 0@workspace_command- prefix; and it checks that the UUID is in the right format, a task that is so annoying to do with bare Data.List functions that I would probably tire of writing the code and skip it entirely if I were doing this without parser combinators.
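For comparison, the exact variant of break mentioned above is itself only a few lines. The following is a minimal sketch using nothing but the Prelude; breakAfter and splitEntry are hypothetical names chosen for illustration, not functions from Data.List, and the result is a pair (as with break) rather than the two-element list shown in the question.

```haskell
-- Hypothetical helper, not part of Data.List: like 'break', but the first
-- element satisfying the predicate stays at the end of the first part.
breakAfter :: (a -> Bool) -> [a] -> ([a], [a])
breakAfter p xs = case break p xs of
    (before, d : after) -> (before ++ [d], after)
    (before, [])        -> (before, [])  -- no element matched

-- Split an entry just after its first dash.
splitEntry :: String -> (String, String)
splitEntry = breakAfter (== '-')
```

This does none of the validation that the parser performs: it will happily split any string that contains a dash, whether or not what follows is a UUID.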

We can now use match to parse a single string if we want:

> match parseIndexedCmd "0@workspace_command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"
Just (IndexedCmd {index = 0, command = "workspace_command", uuid = 7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c})

As a side bonus, we can now even handle commands which have dashes in them, an effect that would be very tedious indeed to replicate using break as our primitive:

> match parseIndexedCmd "0@workspace-command-7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c"
Just (IndexedCmd {index = 0, command = "workspace-command", uuid = 7a86cb8b-0987-fe40-61f2-42b0f6fc3e1c})

We can also continue our development and embed this parser in a larger one for entire files full of these strings or as part of some other structured file format.
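One low-tech way to handle a whole file, assuming it holds exactly one of these strings per line (an assumption, the question does not say), is to run the single-entry parser on each line. The sketch below is deliberately generic: parseLines is a hypothetical helper that takes any function of type String -> Maybe a, which in this answer would be match parseIndexedCmd.

```haskell
-- Hypothetical helper: run a single-entry parser on every line of the input.
-- On failure, report the 1-based number and the text of the first bad line.
parseLines :: (String -> Maybe a) -> String -> Either (Int, String) [a]
parseLines parseOne = traverse parseNumbered . zip [1 ..] . lines
  where
    parseNumbered (n, line) = maybe (Left (n, line)) Right (parseOne line)
```

With the definitions above it would be called as parseLines (match parseIndexedCmd) contents. Splitting on lines first, rather than writing many (parseIndexedCmd <* sym '\n'), keeps many anySym from matching across line boundaries.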
