How to keep track of webpages opened in a web browser using Python?

Question

I want to write a Python script that keeps track of which webpages have been opened in my web browser (Mozilla Firefox 23). I don't know where to start. Python's standard webbrowser module allows webpages to be opened, but its documentation says nothing about interacting with a webpage.

So do I need to write a plugin for my browser that sends the data to my Python script, or am I missing functionality in the standard library?

I have looked at some related questions like this one, but they are all about simulating a web browser in Python using mechanize and/or selenium. I don't want to do that. I want to get data from my web browser using standard Python libraries.

Edit

Just to add some clarity to the question: I want to keep track of the webpages currently open in Firefox.

Recommended answer

This answer may be a bit fuzzy, because the question is not very specific.

If I understand correctly, you want to examine the history of visited pages. The problem is that the history is not directly related to HTML, to the HTTP protocol, or to web services. The history (which you can view in Firefox by pressing Ctrl-H) is a tool implemented in Firefox and, as such, is definitely implementation dependent. No standard library can be expected to extract that information.

As for the HTTP protocol and the HTML content of pages, there is no such thing as interaction with the content of a page. The protocol uses GET with a URL as the argument, and the web server sends back the text body along with some meta information. The caller (the browser) can do anything it likes with the returned data. The browser takes the tagged text and interprets it as a readable document, with its parts rendered as nicely as possible. Interaction (clicking on an a href link) is implemented by the browser; it causes further GET commands of the HTTP protocol.
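To make the point about links concrete, here is a minimal sketch of what a client does with returned HTML: it finds the a href targets and resolves each one to the absolute URL that a later GET would request. The names LinkCollector and extract_links and the example.com page are made up for illustration; only the standard html.parser and urllib.parse modules are used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    # This absolute URL is what a browser would GET on a click.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the absolute URLs of all links found in html_text."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical page body, as a server might return it for a GET request.
page = ('<p>See <a href="/docs/index.html">the docs</a> '
        'or <a href="faq.html">the FAQ</a>.</p>')
for link in extract_links(page, 'http://example.com/guide/start.html'):
    print(link)
```

Nothing in the HTML itself records which link the user followed; that knowledge exists only inside the browser, which is why the history has to be read from the browser's own files.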

To answer your question, you need to find out how Mozilla Firefox 23 stores the history. It is likely that you can find it somewhere in its internal SQLite databases.
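As a rough sketch of reading such a database with the standard sqlite3 module: to my knowledge, Firefox keeps visited pages in a file named places.sqlite in the profile folder, in a table called moz_places. Neither name comes from the answer above, so treat both, along with the column names and the microsecond timestamps, as assumptions to check against your own Firefox version. The function name recent_history is made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def recent_history(db_path, limit=10):
    """Return (visit_time, url, title) tuples, most recent first.

    Assumes a table moz_places with columns url, title and last_visit_date.
    """
    query = (
        'SELECT last_visit_date, url, title FROM moz_places '
        'WHERE last_visit_date IS NOT NULL '
        'ORDER BY last_visit_date DESC LIMIT ?'
    )
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()
    # Assumption: last_visit_date is in microseconds since the Unix epoch.
    return [
        (datetime.fromtimestamp(ts / 1_000_000, tz=timezone.utc), url, title)
        for ts, url, title in rows
    ]
```

Run it on a copy of the database, for example recent_history('places.sqlite') in a working directory, because Firefox may keep the original file locked while it is running.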

Update 2015-08-24: See erasmortg's comment about changes to where Firefox places this information. (The text below is older than this update.)

Update: The list of open tabs is bound to the user. As you probably want it for Windows, you should first get a path like c:\Users\myname.mydomain\AppData\Roaming\Mozilla\Firefox\Profiles\yoodw5zk.default-1375107931124\sessionstore.js. The profile name should probably be extracted from c:\Users\myname.mydomain\AppData\Roaming\Mozilla\Firefox\profiles.ini. I just copied sessionstore.js to try to get the data. Although the extension suggests JavaScript, I used the standard json module to parse it. You basically get a dictionary. The item with the key 'windows' holds a list of window dictionaries, and each window's 'tabs' entry in turn holds information about the tabs.
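Extracting the profile folder from profiles.ini could look roughly like the sketch below, which uses the standard configparser module. It assumes the usual layout of that file: sections named Profile0, Profile1 and so on, each with a Path key, an IsRelative flag, and Default=1 on the default profile. These key names are not stated in the answer, so verify them against your own profiles.ini. The function name default_profile_dir is made up for illustration.

```python
import configparser
import os


def default_profile_dir(ini_path):
    """Return the directory of the default Firefox profile, or None."""
    # Interpolation is disabled so that a '%' in a path is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(ini_path, encoding='utf-8')
    base_dir = os.path.dirname(os.path.abspath(ini_path))

    profiles = [s for s in parser.sections() if s.startswith('Profile')]
    # Prefer a section flagged Default=1; otherwise take the first profile.
    flagged = [s for s in profiles
               if parser.get(s, 'Default', fallback='0') == '1']
    candidates = flagged or profiles
    if not candidates:
        return None
    chosen = candidates[0]

    path = parser.get(chosen, 'Path')
    # Assumption: IsRelative=1 means Path is relative to the folder that
    # holds profiles.ini and is written with forward slashes.
    if parser.get(chosen, 'IsRelative', fallback='1') == '1':
        path = os.path.join(base_dir, *path.split('/'))
    return os.path.normpath(path)
```

Joining the returned folder with sessionstore.js then gives the file to copy, for example os.path.join(default_profile_dir(ini_path), 'sessionstore.js').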

Copy your sessionstore.js to a working directory and execute the following script there:

#!python3

import json

with open('sessionstore.js', encoding='utf-8') as f:
    content = json.load(f)

# The loaded content is a dictionary. List the keys first (console).
for k in content:
    print(k)

# Now list the content bound to the keys. As the console may not be able
# to display all characters, write it to a file instead.
with open('out.txt', 'w', encoding='utf-8') as f:

    # Write the overview of the content.
    for k, v in content.items():
        # Write the key and the type of the value.
        f.write('\n\n{}:  {}\n'.format(k, type(v)))

        # The value could be of a list type, or just one item.
        if isinstance(v, list):
            for e in v:
                f.write('\t{}\n'.format(e))
        else:
            f.write('\t{}\n'.format(v))

    # Write the content of the tabs in each window.
    f.write('\n\n=======================================================\n\n')
    windows = content['windows']
    for n, w in enumerate(windows, 1):  # the enumerate is used just for numbering the windows
        f.write('\n\tWindow {}:\n'.format(n))
        tabs = w['tabs']
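        for tab in tabs:
            # Each tab is a dictionary; 'entries' holds its history as a list
            # of dictionaries. Skip tabs with no history (e.g. blank tabs).
            entries = tab.get('entries', [])
            if not entries:
                continue
            # Assumption: 'index' is the 1-based position of the page
            # currently shown; fall back to the last entry when it is absent.
            i = min(max(tab.get('index', len(entries)), 1), len(entries)) - 1
            e = entries[i]
            f.write('\t\t{}\n\t\t{}\n\n'.format(e['url'], e.get('title', '')))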

The result is both displayed on the console (a few lines) and written to the out.txt file in the working directory. In my case, the end of out.txt contains something like this:

Window 1:
    http://www.cyrilmottier.com/
    Cyril Mottier

    http://developer.android.com/guide/components/fragments.html#CommunicatingWithActivity
    Fragments | Android Developers

    http://developer.android.com/guide/components/index.html
    App Components | Android Developers

    http://www.youtube.com/watch?v=ONaD1mB8r-A
    ▶ Introducing RoboSpice: A Robust Asynchronous Networking Library for Android - YouTube

    http://www.youtube.com/watch?v=5a91dBLX8Qc
    Rocking the Gradle with Hans Dockter - YouTube

    http://stackoverflow.com/questions/18439564/how-to-keep-track-of-webpages-opened-in-web-browser-using-python
    How to keep track of webpages opened in web-browser using Python? - Stack Overflow

    https://www.google.cz/search?q=Mozilla+firefox+list+of+open+tabs&ie=utf-8&oe=utf-8&rls=org.mozilla:cs:official&client=firefox-a&gws_rd=cr
    Mozilla firefox list of open tabs - Hledat Googlem

    https://addons.mozilla.org/en-US/developers/docs/sdk/latest/dev-guide/tutorials/list-open-tabs.html
    List Open Tabs - Add-on SDK Documentation

    https://support.mozilla.org/cs/questions/926077
    list all tabs button not showing | Fórum podpory Firefoxu | Podpora Mozilly

    https://support.mozilla.org/cs/kb/scroll-through-your-tabs-quickly
    Scroll through your tabs quickly | Nápověda k Firefox
