John Stuart Mill, an English philosopher during the Victorian period, proposed several methods for getting to the bottom of what is causing a particular effect under investigation. The Wikipedia article on Mill’s methods provides a helpful introduction. I first came across these in an inductive logic course (Copi’s Introduction to Logic was the textbook).
While writing this post, two of the “See also” links at the bottom of the Wikipedia page caught my attention. The first is on the Baconian method, the second on Bayesian networks. It turns out that Mill published his methods both to popularize and extend Bacon’s work. Hopefully I’ll get some time someday to read both Mill’s and Bacon’s works. Bacon’s looks particularly abstruse, clothed in idiosyncratic philosophical language that I’ll likely need to paraphrase a good deal before I understand it.
Bayesian networks look like they take Bacon’s and Mill’s methods in a more formal direction. If Bacon is philosophically murky, Bayesian networks are mathematically imposing, but they look worth further investigation. Although the math looks a little above me now, it does involve two sexy areas (at least for me) of math: probability and graph theory. I’ll try to give my experience with it when I get to it.
For now, I’ll stick with Mill’s methods and how they’ve helped me in my current job.
While the Windows system currently serving the school keeps things running well overall, there often are cryptic problems that seem inexplicable when first encountered. It seems that Microsoft had good intentions about extensive and careful error logging, but so far, most error codes I come across are too generic to be meaningful. Looking these codes up on TechNet and elsewhere on the net sometimes helps, but is still more miss than hit. Also, I’m having to deal with the electrical subsystem armed only with a smattering of memories of time spent with my electrical engineer uncle and half-remembered high school general science education on electricity. I can think of several other species of problems that don’t lend themselves to immediate solution via Google.
The biggest (technical) challenge in my job seems to lie in identifying the problem. This invariably involves understanding causes and effects–the very sort of thing Mill was interested in. So what are Mill’s methods and how have they helped me in this job? Since Wikipedia has already done a good job summarizing them semi-formally, I’ll give a more informal explanation along with examples from my job.
Method of Agreement
If you have 2 or more cases of a problem, and they all have one particular thing in common, that thing is the cause, or part of the cause of the problem.
Recently, I’ve been trying to find out why 6 computers had an error getting updates from the local automatic update server (WSUS). Today I dug into Event Viewer on each of these machines and looked for any errors at all. I found that the computers all had gripes about ASP.NET not finding an IIS server. So in Mill’s terminology, these cases agree on the circumstance causing the problem. Although they each have numerous differences in software, hardware, etc. from each other, they had this one thing in common. With that, I was able to search more effectively and found a simple one-liner for the command line that seems to have fixed the problem, since the machines no longer show up with an update error.
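Here is a minimal Python sketch of the method of agreement, treating each machine as a set of observed errors and keeping whatever they all share. The machine names and error labels are invented for illustration; they are not the actual Event Viewer entries.

```python
# Hypothetical observations: machine name -> errors seen in Event Viewer.
# All names and labels below are made up for the example.
observations = {
    "lab-01": {"aspnet_no_iis", "disk_warning"},
    "lab-02": {"aspnet_no_iis", "printer_offline"},
    "office-07": {"aspnet_no_iis", "time_sync_drift"},
}


def common_circumstances(cases):
    """Return the circumstances shared by every case (method of agreement)."""
    sets = list(cases.values())
    if not sets:
        return set()
    return set.intersection(*sets)


print(common_circumstances(observations))
```

Whatever survives the intersection is only a candidate cause. It still has to be checked, which is where the other methods come in.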
Method of Difference
If you have two cases that are mostly the same, but one has a problem and the other doesn’t, the problem lies somewhere in the difference between the two cases.
Ok, not the best paraphrase, so here’s an example. In our computer labs, the machines are mostly the same. They are the same model, bought at the same time, from the same manufacturer, and have the same OS. When I have a problem with one lab computer, I can look at the one sitting right next to it that doesn’t and see what’s different between the two. While there are often a number of differences, it still narrows the possible source of the problem down significantly.
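As a rough sketch of the same idea in Python, assume each machine can be described as a dictionary of settings. The keys and values here are hypothetical placeholders, not an inventory of the real lab machines.

```python
def config_differences(working, broken):
    """Return {setting: (working_value, broken_value)} for settings that differ."""
    keys = working.keys() | broken.keys()
    return {
        key: (working.get(key), broken.get(key))
        for key in sorted(keys)
        if working.get(key) != broken.get(key)
    }


# Hypothetical settings for two neighbouring lab machines.
working_pc = {"model": "X100", "os_patch": "SP3", "nic_driver": "5.1"}
broken_pc = {"model": "X100", "os_patch": "SP3", "nic_driver": "4.9", "proxy": "on"}

print(config_differences(working_pc, broken_pc))
```

A setting that is missing on one side shows up as None, so an extra piece of software on the broken machine counts as a difference too.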
This method tends to be a little more powerful than the method of agreement because I sometimes find different partitions of problems when I use agreement. In my example for agreement above, I actually found that 4 of the computers had a problem with ASP.NET, but 2 had problems with SQL Server 2005. The method still worked, but I had to do a little thinking to realize I was dealing with two separate problems, not the same one.
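The partitioning step can be sketched in Python by grouping machines on the error they report. The machine names and error labels are assumptions for the example; only the 4-and-2 split comes from my case.

```python
from collections import defaultdict


def group_by_error(error_by_machine):
    """Group machine names by the error each one reports."""
    groups = defaultdict(list)
    for machine, error in sorted(error_by_machine.items()):
        groups[error].append(machine)
    return dict(groups)


# Hypothetical machine names; the split mirrors the 4 + 2 described above.
error_by_machine = {
    "pc-1": "aspnet_no_iis",
    "pc-2": "aspnet_no_iis",
    "pc-3": "aspnet_no_iis",
    "pc-4": "aspnet_no_iis",
    "pc-5": "sql_server_2005",
    "pc-6": "sql_server_2005",
}

for error, machines in group_by_error(error_by_machine).items():
    print(error, machines)
```

More than one group is a hint that there may be more than one problem, and that agreement should be applied within each group separately.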
With the method of difference, on the other hand, if I’m lucky enough to have a clear idea of where the difference lies, the problem is immediately narrowed and I can home in more directly on the nature of the problem. I can’t really explain it, but it does seem to get results faster.
Also, the method of difference often works well even if the machines are not both lab computers and have greater differences. Although a computer in a teacher’s classroom may be a completely different model and have a different OS, they are still both computers filling very similar roles. More often than not I already know the problem lies on the networking layer and can begin by looking at differences in how they are handling DNS or if they are both pinging well or whatever. The greater the similarity between two cases, the easier it is to determine the area of difference. The very purpose of a scientist’s controlled experiment is to try to create two cases in which everything is the same except one purposely introduced difference. However, there are many areas of science where experimentation simply isn’t possible, only observation. The method of difference can still be used to good effect, just with more effort and less forceful conclusions.
A smart cookie might think about combining the methods of agreement and difference, and that’s just what Mill did…
Joint Method of Agreement and Difference
Not much to say here, and honestly, every definition I’ve read confuses me more than enlightens, so I’ll go directly to an example.
Let’s say I ended up with many different computers all over campus having a problem connecting to the domain. I would start eliminating possibilities of the source of the problem using both methods:
Do all computers point to the same server? No. But the ones with problems point to 192.168.1.253 (agreement).
If I point one of the problem computers to a different server, do I still have a problem? No. It works now. So the problem has to do with the difference between the working machines and the broken ones, and that difference involves that IP or the server using that IP (difference).
If I change the IP of the server for the machines that have a problem, and then point one of them to the new IP, do I still have a problem? Yes. So the problem is not with the address itself, but with the server (difference, or is it agreement?).
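The elimination above can be sketched in Python as a combination of the two methods: keep what every failing case shares, then discard anything that also appears in a working case. Apart from 192.168.1.253, every value below (the second server address, the buildings, the OS labels) is a made-up assumption.

```python
def joint_method(cases):
    """Return circumstances present in every failing case and in no working case.

    `cases` is a list of (circumstances, has_problem) pairs.
    """
    failing = [c for c, has_problem in cases if has_problem]
    working = [c for c, has_problem in cases if not has_problem]
    if not failing:
        return set()
    in_all_failing = set.intersection(*failing)  # agreement
    in_any_working = set().union(*working)       # used for difference
    return in_all_failing - in_any_working


# Hypothetical cases; only the .253 address comes from the example above.
cases = [
    ({"server=192.168.1.253", "building=north", "os=a"}, True),
    ({"server=192.168.1.253", "building=south", "os=b"}, True),
    ({"server=192.168.1.254", "building=north", "os=a"}, False),
    ({"server=192.168.1.254", "building=south", "os=b"}, False),
]

print(joint_method(cases))
```

This only flags the server setting as the suspect. Telling the address apart from the machine behind it still takes the manual experiment of changing the IP.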
Anyone reading this who does troubleshooting goes through steps like these, probably without thinking for a moment about Mill’s methods. Most troubleshooters may have never even heard of Mill. I didn’t need Mill’s methods to fix a problem like this, but I’ve found that being aware of them has increased the clarity I have regarding a problem and encouraged me to be more guided in my experimentation.
From all this, I’ve learned to continually (almost continuously) ask two simple questions, which embody the methods discussed so far:
“What is the difference between the machines that are working and those that are not?”
“Of the machines that have a problem, what do they have in common?”
Finally, once things are narrowed down, carefully introducing one difference at a time to test whether the common factor really is the problem often yields results.
That’s the end of my post today. But wait! There’s more! We still have two more methods to cover, and we will in a later post.