Simple RSS downloader in Haskell


Problem description


Yesterday I tried to write a simple RSS downloader in Haskell with the help of the Network.HTTP and Feed libraries. I want to download the link from each RSS item and name the downloaded file after the item's title.

Here is my short code:

import Control.Monad
import Control.Applicative
import Network.HTTP
import Text.Feed.Import
import Text.Feed.Query
import Text.Feed.Types
import Data.Maybe
import qualified Data.ByteString as B
import Network.URI (parseURI, uriToString)

getTitleAndUrl :: Item -> (Maybe String, Maybe String)
getTitleAndUrl item = (getItemTitle item, getItemLink item)

downloadUri :: (String,String) -> IO ()
downloadUri (title,link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri" ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

I have reached a state where I have a list of tuples, each containing a name and the corresponding link. I also have a downloadUri function which properly downloads the given link to a file named after the RSS item's title.

I already tried to modify downloadUri to work on (Maybe String, Maybe String) by fmap-ing over get and writeFile, but failed horribly.

  • How can I apply my downloadUri function to the result of the getTuples function? I want to implement the following main function:

    main :: IO ()
    main = some magic incantation downloadUri more incantation getTuples

  • The character encoding of the result of getItemTitle is broken: it puts code points in place of the accented characters. The feed is UTF-8 encoded, and I thought that all Haskell string manipulation functions default to UTF-8. How can I fix this?

Edit:

Thanks for your help, I successfully implemented my main and helper functions. Here is the code:

downloadUri :: (Maybe String,Maybe String) -> IO ()
downloadUri (Just title,Just link) = do
  item <- get link
  B.writeFile title item
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri" ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = print "Somewhere something went Nothing"

getTuples :: IO (Maybe [(Maybe String, Maybe String)])
getTuples = fmap (map getTitleAndUrl) <$> fmap (feedItems) <$> parseFeedString <$> decodeString <$> (simpleHTTP (getRequest "http://index.hu/24ora/rss/") >>= getResponseBody)

downloadAllItems :: Maybe [(Maybe String, Maybe String)] -> IO ()
downloadAllItems (Just feedlist) = mapM_ downloadUri $ feedlist
downloadAllItems _ = error "feed does not get parsed"

main = getTuples >>= downloadAllItems

The character encoding issue has been partially solved: I put decodeString before the feed parsing, so the files get named properly. But if I want to print the titles out, the issue still occurs. Minimal working example:

main = getTuples

Solution

It sounds like it's the Maybes that are giving you trouble. There are many ways to deal with Maybe values, and some useful library functions like fromMaybe and fromJust. However, the simplest way is to do pattern matching on the Maybe value. We can tweak your downloadUri function to work with the Maybe values. Here's an example:

downloadUri :: (Maybe String, Maybe String) -> IO ()
downloadUri (Just title, Just link) = do
  file <- get link
  B.writeFile title file
    where
      get url = let uri = case parseURI url of
                      Nothing -> error $ "invalid uri" ++ url
                      Just u -> u in
                simpleHTTP (defaultGETRequest_ uri) >>= getResponseBody
downloadUri _ = error "One of my parameters was Nothing"

Or maybe you can let the title default to blank, in which case you could insert this just before the last line in the previous example:

downloadUri (Nothing, Just link) = downloadUri (Just "", Just link)
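
The fromMaybe function mentioned above offers another way to express the same default. The following is only a sketch: the names fileNameFor and downloadTarget and the placeholder "untitled" are hypothetical and not part of the original code. A non-empty placeholder is used here on the assumption that an empty string is unlikely to be a usable file name.

```haskell
import Data.Maybe (fromMaybe)

-- | Pick a file name for an item, falling back to a placeholder when the
-- feed item has no title. "untitled" is a hypothetical default chosen for
-- this sketch; it does not come from the original code.
fileNameFor :: Maybe String -> FilePath
fileNameFor = fromMaybe "untitled"

-- | Pair the chosen file name with the link, if there is a link at all.
-- An item without a link cannot be downloaded, so it yields Nothing.
downloadTarget :: (Maybe String, Maybe String) -> Maybe (FilePath, String)
downloadTarget (title, link) = fmap (\l -> (fileNameFor title, l)) link
```

A caller could then pattern match on the single Maybe returned by downloadTarget instead of on the pair.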

Now the only Maybe you need to work with is the outer one, applied to the list of tuples. Again, we can pattern match. It might be clearest to write a helper function like this:

downloadAllItems (Just ts) = ??? -- hint: try a `mapM`
downloadAllItems Nothing = ??? -- don't do anything, or report an error, or...
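
As an alternative to handling every Nothing inside the download functions, the incomplete items can be filtered out first, so that the action only ever sees complete pairs. This is a minimal sketch of that idea. The names completeItem and completeItems are hypothetical, and treating an unparsed feed as an empty list is a choice made for the sketch, not something the original code does.

```haskell
import Data.Maybe (mapMaybe)

-- | Keep an item only when both its title and its link are present.
completeItem :: (Maybe a, Maybe b) -> Maybe (a, b)
completeItem (Just title, Just link) = Just (title, link)
completeItem _                       = Nothing

-- | Flatten the outer Maybe (the feed failed to parse) and the inner ones
-- (an item lacks a title or a link) into a plain list of usable pairs.
-- An unparsed feed becomes the empty list in this sketch.
completeItems :: Maybe [(Maybe a, Maybe b)] -> [(a, b)]
completeItems = maybe [] (mapMaybe completeItem)
```

With this, the original downloadUri that takes a (String, String) pair could be applied as `getTuples >>= mapM_ downloadUri . completeItems`. The trade-off is that a parse failure is silently ignored rather than reported.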

As for your encoding issue, my guesses are:

  1. You're reading the information from a file that isn't UTF-8 encoded, or your system doesn't realise that it's UTF-8 encoded.
  2. You are reading the information correctly, but it gets messed up when you output it.

In order to help you with this problem, I need to see a full code example, which shows how you're reading the information and how you output it.
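
One possibility consistent with the second guess, offered only as an assumption since the printing code is not shown: if the titles are displayed with print (or as a result in GHCi), they go through show, which escapes non-ASCII characters as decimal code points. putStrLn writes the characters themselves. In the sketch below, the names rendered and putTitle are hypothetical.

```haskell
import System.IO (hSetEncoding, stdout, utf8)

-- | The text that `print` would write for a String (without the trailing
-- newline). `print` is defined as `putStrLn . show`, and `show` escapes
-- non-ASCII characters as backslash followed by the decimal code point.
rendered :: String -> String
rendered = show

-- | Write a title as plain text. Forcing UTF-8 on stdout is an assumption
-- about the terminal; drop that line if the locale is already correct.
putTitle :: String -> IO ()
putTitle title = do
  hSetEncoding stdout utf8
  putStrLn title
```

If that is the cause, replacing print with putStrLn for each title should show the accented characters, provided the string was decoded with decodeString first.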
