Post by "Lennoli", 03-08-2007, 9:26
-----------------------------------------------------
Over the past few days I came across an article by a well-known foreign testing expert on how to reproduce "unreproducible" bugs. I thought it was quite good, so I'm sharing it with everyone:
=================================================================================================
Five Dimensions of Exploratory Testing
(Edit: was Four Dimensions of Exploratory Testing. Reader feedback has shown that I missed a dimension.)
I've been working on analyzing what I do when Exploratory Testing. I've found that describing what I actually do when testing can be difficult, but I'm doing my best to attempt to describe what I do when solving problems. One area that I've been particularly interested in lately is intermittent failures. These often get relegated to bug purgatory by being marked as "Non-reproducible" or "unrepeatable bugs". I'm amazed at how quickly we throw up our hands and "monitor" the bugs, allowing them to languish sometimes for years. In the meantime, a customer somewhere is howling, or quietly moving on to a competitor's product because they are tired of dealing with it. If only "one or two" customers complain, I know that means there are many others who are not speaking up and possibly quietly moving to a competitor's product.
So what do I do when Exploratory Testing when I track down an intermittent defect? I started out trying to describe what I do with an article for Better Software called Repeating the Unrepeatable Bug, and I've been mulling over concepts and ideas. Fortunately, James Bach has just blogged about How to Investigate Intermittent Problems which is an excellent, thorough post describing ideas on how to make an intermittent bug a bug that can be repeated regularly.
When I do Exploratory Testing, it is frequently to track down problems after the fact. When I test software, I look at risk, I draw from my own experience testing certain kinds of technology solutions, and I use the software. As I use the software, I inevitably discover bugs, or potential problem areas, and often follow hunches. I constantly go through a conjecture and refutation loop where I almost subconsciously posit an idea about an aspect of the software, and design a test to attempt to refute that conjecture. (See the work of Karl Popper for more on conjectures, refutations, and falsifiability.) It seems that I do this so much, I rarely think about it. Other times, I very consciously follow the scientific method, and design an experiment with controlled variables and a manipulated variable, and observe the responding variables.
When I spot an intermittent bug, I begin to build a theory about it. My initial theory is usually wrong, but I keep gathering data, adjusting the theory, and following the conjecture/refutation model. I draw information from others as I gather data and build the theory. I run the theory by experts in particular areas of the software or system to get more insight.
When I do Exploratory Testing to track down intermittent failures, these are five dimensions that I consider:
* Product
* Environment
* Patterns
* People
* Tools & Techniques
Product
This means having the right build, installed and configured properly. This is usually a controlled variable. This must be right, as a failure may occur at a different rate depending on the build. I record what builds I have been using, and the frequency of the failure on a particular build.
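To make that record concrete, here is a minimal sketch, assuming a hypothetical results.log in which each line records a build identifier and a run outcome. It simply tallies how often the intermittent failure shows up on each build:

```python
# A minimal sketch, assuming a hypothetical results.log with lines like:
#   build-1.2.3 PASS
#   build-1.2.3 FAIL
from collections import Counter

runs, fails = Counter(), Counter()
with open("results.log") as log:
    for line in log:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed or blank lines
        build, outcome = parts
        runs[build] += 1
        if outcome == "FAIL":
            fails[build] += 1

# Report the failure frequency per build.
for build in sorted(runs):
    rate = fails[build] / runs[build]
    print(f"{build}: {fails[build]}/{runs[build]} failures ({rate:.0%})")
```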
Environment
Taking the environment into account is a big deal. Installing the same build on slightly different environments can have an impact on how the software responds. This is another controlled variable that can be a challenge to maintain, especially if the test environment is used by a lot of people. Failures can manifest themselves differently depending on the context where they are found. For example, if one test machine has less memory than another, it might exacerbate the underlying problem. Sometimes knowing this information is helpful for tracking a failure down, so I don't hesitate to change environments if an intermittent problem occurs more frequently in one than another, using the environment as a manipulated variable.
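As one way to keep that variable under control, here is a minimal sketch (my own illustration, not from the article) that records a machine fingerprint to store alongside each test run, so failure rates can later be correlated with environment differences:

```python
# A minimal sketch: capture an environment fingerprint per test run.
# Field choices are my own assumptions about what is worth recording.
import json
import os
import platform

env = {
    "os": platform.platform(),
    "machine": platform.machine(),
    "hostname": platform.node(),
    "python": platform.python_version(),
    "cpu_count": os.cpu_count(),
}
print(json.dumps(env, indent=2))
```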
Patterns
When we first learn to track down bugs in a product, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous information to get a concise bug report. With intermittent bugs, those step-by-step details may not be important. In many cases I've seen the same bug logged as several separate entries in a defect database, some of them going back two or three years. We seldom look for patterns; instead, we focus on actions. With intermittent bugs, it is important to set the details aside and apply an underlying pattern to the emerging theory.
For example, if a web app is crashing at a certain point, and we see SQL or database connection information in a failure log, a conjecture might be: "Could it be a database synchronization issue?" Through collaboration with others, and using tools, I could find information on where else in the application the same kind of call to a database is made, and test each scenario that makes the same kind of call to try to refute that conjecture. Note that this conjecture is based on the information we have available at the time, and is drawn from inference. It isn't blind guesswork. The conjecture can be based on inference to the best explanation of what we are observing, or "abductive inference".
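As a rough sketch of how one might stress that conjecture, the following hammers the same kind of database call from several threads at once and logs any failure. The database file, table, and query are hypothetical placeholders, not from the original article:

```python
# A minimal sketch: probe a suspected database synchronization issue by
# making the same kind of call concurrently. app.db and the query are
# hypothetical stand-ins for the real call under test.
import sqlite3
import threading
import traceback

def make_db_call():
    # Stand-in for "the same kind of call to a database" the app makes.
    conn = sqlite3.connect("app.db", timeout=1)
    try:
        conn.execute("SELECT count(*) FROM sessions").fetchone()
    finally:
        conn.close()

def worker(thread_id, attempts=100):
    for n in range(attempts):
        try:
            make_db_call()
        except Exception:
            print(f"thread {thread_id}, attempt {n}: FAILURE")
            traceback.print_exc()

# Ten concurrent callers raise the odds of hitting a timing window.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If the failure rate climbs with concurrency, that is evidence for the synchronization conjecture; if it stays flat, the conjecture is weakened and the theory needs adjusting.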
A pattern will emerge over time as this is repeated, and more information is drawn in from outside sources. That conjecture might be false, so I adjust, retest, and record the resulting information. Once a pattern is found, the details can be filled in once the bug is repeatable. This is difficult to do, and requires patience and introspection as well as collaboration with others. This introspection is something I call "after the fact pattern analysis". How do I figure out what was going on in the application when the bug occurred, and how do I find a pattern to explain what happened? This emerges over time, and may change direction as more information is gathered from various sources. In some cases, my original hunch was right, but getting a repeatable case involved investigating the other possibilities and ruling them out. Aspects from each of these experiments shed new light on an emerging pattern. In other cases, a pattern was discovered by a process of elimination where I moved from one wrong theory to the next in a similar fashion.
The different patterns that I apply are the manipulated variables in the experiment, and the resulting behavior is the responding variable. Once I can repeat the responding variable on command, it is time to focus on the details and work with a developer on getting a fix.
Update:
Patterns are probably the most important dimension, and reader feedback shows I didn't go into enough detail in this section. I'll work on the patterns dimension and explain it more in another post.
People
When we focus on technical details, we frequently forget about people. I've posted before about creating a user profile, and creating a model of the user's environment. James Bach pointed me to the work of John Musa who has done work in software reliability engineering. The combination of the user's profile and their environment I was describing is called an "operational profile".
I also rely heavily on collaboration when working on intermittent bugs. Many of these problems would have been impossible for me to figure out without the help and opinions of other testers, developers, operations people, technical writers, customers, etc. I recently described this process of drawing in information at the right time from different specialists to some executives. They commented that it reminded them of medical work done on a patient. One person doesn't do it all, and certain health problems can only be diagnosed with the right combination of information from specialists applied at just the right time. I like the analogy.
Tools & Techniques
When Exploratory Testing, I am not only testing manually, but also using whatever tools and techniques help me build a model of the problem I'm trying to solve. Information from automated tests, log analyzers, the source code, and system details can all be relevant to building a model of what might be causing the defect. As James Bach and Cem Kaner say, ET isn't a technique, it's a way of thinking about testing. Exploratory Testers use diverse techniques to help gather information and test out theories.
I refer to the use of automated and diagnostic testing tools by a term I got from Cem Kaner: "Computer Assisted Testing." Automated test results might provide me with information, while other automated tests might help me repeat an intermittent defect more frequently than manual testing alone. I sometimes automate certain features in an application and run them while I do manual work as well, which I've found to be a powerful combination for repeating certain kinds of intermittent problems. I prefer the term Computer Assisted Testing over "automated tests" because it doesn't imply that the computer takes the place of a human. Automated tests still require a human brain behind them, and a human to analyze their results. They are a tool, not a replacement for human thinking and testing.
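As one possible illustration of that manual-plus-automated combination, here is a minimal sketch of a background loop that keeps exercising one feature while a human explores the application manually. The endpoint URL is a hypothetical placeholder; the timestamps let any failure be matched against what the tester was doing at that moment:

```python
# A minimal sketch of "Computer Assisted Testing": steady background
# load on one (hypothetical) feature while a human tests manually.
# Stop with Ctrl-C.
import datetime
import time
import urllib.request

URL = "http://localhost:8080/login"  # hypothetical feature under test

while True:
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            result = resp.status
    except Exception as exc:
        result = f"ERROR: {exc}"
    print(f"{stamp}  {URL} -> {result}")
    time.sleep(2)  # gentle, constant background load
```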
Next time you see a bug get assigned to an "unrepeatable" state, review James' post. Be patient, and don't be afraid to stand up to adversity to get to the cause. Together we can wipe out the term "unrepeatable bug". (Original source: http://www.kohl.ca/blog/archives/000115.html)
==================================================================================================
A brief summary of the article:
To reproduce unreproducible bugs, we first need the concept of ET (Exploratory Testing), an approach championed by James Bach. It is very effective when the information available about the system under test is limited. If people are interested, I'll put together a separate article on ET.
Before getting into how to reproduce unreproducible bugs, a few words on why it is worth doing at all (bear with me ^_^). For the project or product as a whole, if an unreproducible bug is serious, say it crashes the system, and it cannot be located and fixed promptly and accurately, then once the released software reaches users and the bug surfaces, it is bound to damage the image of the software and of the company in users' eyes; in bad cases it will "drive" users to a competitor's product, which no company wants to see. For testers, an unreproducible bug is actually an excellent opportunity to practice and improve: if you simply file a defect report and kick the ball over to the developers, you not only miss a chance to sharpen your testing skills, you may also sour your relationship with them.
Preamble over; on to the main topic. When an unreproducible bug shows up, consider the following five aspects:
1. The build of the system under test
Which build am I actually testing? Knowing this serves two purposes. First, it confirms that I am testing an official build; if not, I record the problem and switch to an official build before continuing (a developer's informal, experimental change may itself cause an unreproducible bug). Second, it allows comparison across builds: if other builds do not show a similar problem, I can compare what differs between the two builds.
2. Environment
Environment here means the test environment in which the unreproducible bug appeared, for example the machine used for testing. If an unreproducible bug appears, would a similar problem appear on a different machine? In other words, by varying the environment I can gather more information about the bug.
3. Patterns
A pattern here is my understanding of how the bug occurs. I first posit a pattern for the bug, for example "is the database connection being interrupted?", then keep testing, collecting more information to revise and refine the pattern, and repeat until the bug can be reproduced reliably; at that point, applying the pattern is enough to reproduce the bug.
4. People
"People" here carries two meanings. First, testing is done by people, and people differ in how they operate and how they think; analyzing that information may also yield clues to an unreproducible bug. Second, reproducing an unreproducible bug usually requires several people working together, such as testers and developers; with good communication and collaboration, reproduction becomes much easier.
5. Testing tools
Use debug tools, logging tools, and the like to collect information such as memory contents, analyze it, and look for what the different occurrences have in common, for example a block of memory that is always overwritten; that commonality is what lets you reproduce the bug.
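As a rough illustration of that log-analysis idea, here is a minimal sketch; the file name and crash markers are assumptions. It counts which hex addresses recur across failure lines, since a value present in every occurrence (such as one memory block that is always overwritten) is exactly the kind of common point described above:

```python
# A minimal sketch: scan a (hypothetical) application log for failure
# lines and count which hex addresses recur across them.
import re
from collections import Counter

hits = Counter()
with open("app.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        if "CRASH" in line or "FATAL" in line:
            for addr in re.findall(r"0x[0-9a-fA-F]+", line):
                hits[addr] += 1

# Addresses that appear in every failure are worth investigating.
for addr, count in hits.most_common(5):
    print(f"{addr} appeared on {count} failure lines")
```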
All five aspects above are closely tied to the thinking behind ET: through continuous testing and continuous collection and analysis of information, you gradually turn vague, uncertain testing into clear, well-defined testing, and that is how "unreproducible" bugs get reproduced. When gathering information, the five aspects above make a good checklist.