What is the procedure for debugging a production-only error?


Problem description

Let me say upfront that I'm so ignorant on this topic that I don't even know whether this question has objective answers or not. If it ends up being "not," I'll delete or vote to close the post.

Here's the scenario: I just wrote a little web service. It works on my machine. It works on my team lead's machine. It works, as far as I can tell, on every machine except for the production server. The exception that the production server spits out upon failure originates from a third-party JAR file, and is skimpy on information. I searched the web for hours, but didn't come up with anything useful.

So what's the procedure for tracking down an issue that occurs only on production machines? Is there a standard methodology, or perhaps a category/family of tools, for this?

The error that inspired this question has already been fixed, but that was due more to good fortune than a solid approach to debugging. I'm asking this question for future reference.

EDIT:
The answer to this so far seems to be summed up by one word: logging. The one issue with logging is that it requires forethought. What if a situation comes up in an existing system with poor logging, or the client is worried about sensitive data and does not want extensive logging in the system in the first place?
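For the case described in the scenario, where the exception comes out of a third-party JAR and says little, one low-forethought form of logging is to wrap only the failing call and record the whole cause chain, masking anything that looks sensitive. The following is a minimal sketch rather than a definitive approach: the class `CauseChainLogger`, the `callThirdParty` stand-in, and the redaction pattern (`password=` and `token=` pairs) are all assumptions made for illustration, and it uses only `java.util.logging` from the JDK.

```java
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.regex.Pattern;

final class CauseChainLogger {
    private static final Logger LOG = Logger.getLogger(CauseChainLogger.class.getName());

    // Hypothetical redaction rule: masks the value in "password=..." or "token=..." pairs.
    private static final Pattern SECRET = Pattern.compile("(?i)(password|token)=[^\\s&;,]+");

    // Returns the message with sensitive-looking values replaced by "***".
    static String redact(String message) {
        if (message == null) {
            return "(no message)";
        }
        return SECRET.matcher(message).replaceAll("$1=***");
    }

    // Flattens an exception and its causes into one line, outermost first.
    static String describe(Throwable error) {
        StringBuilder sb = new StringBuilder();
        Throwable current = error;
        // Cap the depth so a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < 20; depth++) {
            if (depth > 0) {
                sb.append(" <- caused by ");
            }
            sb.append(current.getClass().getName())
              .append(": ")
              .append(redact(current.getMessage()));
            current = current.getCause();
        }
        return sb.toString();
    }

    // Stand-in for the real third-party call (hypothetical).
    static void callThirdParty() {
        throw new RuntimeException("request failed",
                new IllegalStateException("auth rejected for token=abc123"));
    }

    public static void main(String[] args) {
        try {
            callThirdParty();
        } catch (RuntimeException e) {
            // Logs the redacted cause chain only, not the raw stack trace.
            LOG.log(Level.SEVERE, describe(e));
        }
    }
}
```

Passing the exception itself to the logger would also print the stack trace, but with the original, unredacted messages, so whether to do that depends on how strict the client's data concerns are.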

Some related questions:
Test accounts and products in a production system
Running test on Production Code/Server

Solution

In addition to logging, which is invaluable, here are some other techniques my co-workers and I have used over the years... going back to 16-bit Windows on client machines we had no access to. (Did I date myself?) Granted, not everything can/will work.

  • Analyze any and all behavior you see.
  • Reproduce it, if at all possible.
  • Desk check: walk through the code you suspect.
  • Rubber duck it with team members AND people who have little or no familiarity with the code. The more you have to explain something to someone, the better chance you have of uncovering something.
  • Don't get frustrated. Take a 5-10 minute break. Take a quick walk across the building/street/whatever. Don't think about the problem for that time.
  • Listen to your instincts.
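
When the code is identical on every machine, analyzing behavior and trying to reproduce the failure often comes down to finding what differs in the environment. A minimal sketch of that idea, assuming you can run a small class on both the production server and a working machine: `EnvSnapshot` is a hypothetical helper (not from the original answer) that captures a few JVM settings, reports which location a class was loaded from, and diffs two snapshots.

```java
import java.net.URL;
import java.nio.charset.Charset;
import java.security.CodeSource;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TimeZone;
import java.util.TreeMap;
import java.util.TreeSet;

final class EnvSnapshot {
    // Collects a few settings that commonly differ between machines.
    static Map<String, String> capture() {
        Map<String, String> snap = new TreeMap<>();
        String[] keys = {"java.version", "java.vendor", "os.name", "os.arch", "java.class.path"};
        for (String key : keys) {
            snap.put(key, System.getProperty(key, "(unset)"));
        }
        snap.put("default.charset", Charset.defaultCharset().name());
        snap.put("default.locale", Locale.getDefault().toString());
        snap.put("default.timezone", TimeZone.getDefault().getID());
        return snap;
    }

    // Reports where a class was loaded from, e.g. to spot a different JAR version.
    static String loadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return (location == null) ? "(bootstrap or unknown)" : location.toString();
    }

    // Returns only the keys whose values differ between the two snapshots.
    static Map<String, String> diff(Map<String, String> a, Map<String, String> b) {
        Map<String, String> out = new TreeMap<>();
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        for (String key : keys) {
            String left = a.getOrDefault(key, "(missing)");
            String right = b.getOrDefault(key, "(missing)");
            if (!left.equals(right)) {
                out.put(key, left + " != " + right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the snapshot so the output from two machines can be compared.
        capture().forEach((key, value) -> System.out.println(key + "=" + value));
        System.out.println("EnvSnapshot loaded from: " + loadedFrom(EnvSnapshot.class));
    }
}
```

Running `main` on each machine and comparing the output by eye or with a diff tool is usually enough. Calling `loadedFrom` on a class from the third-party JAR would show whether production picks it up from a different file.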
