Is there a way to use readability (a text extraction algorithm) and a custom algorithm in Python to extract links from text?

Posted 2020/6/18 19:18:28 | Tags: python, html-content-extraction, text-extraction



I'd like to find a way to extract the links that appear in the body text of an article.

1.) I use readability in Python: https://github.com/gfxmonk/python-readability

2.) I'd then like to compare the extracted text against the original HTML so that I can pull out only the links in the actual body of the article.
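One way to sketch the comparison in step 2, assuming you already have both the raw HTML and the plain text readability extracted: collect every link and its anchor text from the original HTML, then keep only the links whose anchor text also occurs in the extracted text. The helper name `links_in_body` and the anchor-text matching heuristic are illustrative assumptions, not part of readability; the sketch uses only the standard library.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None if no href attribute
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def links_in_body(original_html, extracted_text):
    """Hypothetical helper: keep links whose anchor text appears in the
    text readability extracted, as a heuristic for 'in the article body'."""
    collector = _LinkCollector()
    collector.feed(original_html)
    collector.close()
    body = " ".join(extracted_text.split())  # same whitespace normalization
    return [href for href, text in collector.links if text and text in body]
```

For example, a navigation link is dropped while a link inside the article paragraph is kept:

```python
html = '<nav><a href="/home">Home</a></nav><p>See <a href="/docs">the docs</a> for details.</p>'
print(links_in_body(html, "See the docs for details."))  # ['/docs']
```

Plain substring matching is a deliberately simple choice: short, generic anchor text such as "here" can match by accident, so a stricter version might compare surrounding context as well.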

Answer: Well, it looks like readability returns a BeautifulSoup tree, so you should be able to do something like:

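```python
# page is assumed to be the readability object built from the raw HTML
article = page.summary()      # Extract the article body using readability
links = article.findAll("a")  # List of all <a> tags inside the article
```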
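If `summary()` hands back an HTML string rather than a tree (this may depend on the readability version you have installed, so it is worth checking), the links can still be pulled out by parsing that string. With BeautifulSoup, `BeautifulSoup(str(article), "html.parser").find_all("a", href=True)` works in both cases. Below is a minimal standard-library sketch; the helper name `article_links` and the optional `base_url` parameter (used to turn relative links into absolute ones via `urllib.parse.urljoin`) are hypothetical additions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _HrefParser(HTMLParser):
    """Records the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip <a> tags with a missing or empty href
                self.hrefs.append(href)


def article_links(article_html, base_url=""):
    """Hypothetical helper: return the hrefs found in the article HTML,
    resolved against base_url when one is given.

    str() lets this accept either an HTML string or a parsed tree whose
    str() is its HTML (as a BeautifulSoup tree's is)."""
    parser = _HrefParser()
    parser.feed(str(article_html))
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Usage, with a made-up summary and page URL:

```python
summary_html = '<div><p>Read <a href="/docs">the docs</a> or <a href="https://example.org/">this</a>.</p></div>'
print(article_links(summary_html, "https://example.com/post/1"))
# ['https://example.com/docs', 'https://example.org/']
```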
