Read The Log Files…
Posted by Kevin Powe on 06 Feb 2009 | Tagged as: How To, Plain English

I don’t claim to be a Top Gun consultant by any stretch of the imagination. Sadly, the Val Kilmer of IT consulting (whoever that might be) is unlikely to offer to let me ride his tail any time soon. (Seriously, how could you deliver that dialogue with a straight face? Even in the 80s?)
But I have gotten a lot of wins on client sites by following one deceptively simple rule: read the log files. When strange things are happening intermittently, the first and best port of call is the log files for the system you’re debugging. Now, I don’t claim to be a genius for having discovered this rule. But the number of times I’ve worked with really smart people and gotten a blank stare when I’ve asked ‘what do the log files say?’ is somewhere between surprising and depressing.
And it’s wisdom I’ve come by the hard way myself. One of my first jobs was installing an HRMS on client sites, and while troubleshooting a major hiccup with one of those installs, I was on the phone to our guru in the office. He asked if I’d checked the log files. I realised… I didn’t even know where the log files were. It turned out that if I’d checked them, I would have saved myself and the client a lot of grief, including a day’s work restoring from an old backup.
Log files are the forensic evidence of technical troubleshooting. And if we want to be the Grissoms of our own CSI, then there are a few pointers worth following.
Know where your log files are
It might sound like an obvious rule of thumb, but cases often have more than one crime scene, and the most important one might not be particularly obvious. You might be driving requests to an application through an interface that’s not giving you much more than an ‘Error occurred’, but chances are a log file somewhere is dutifully scribing reams of text about the problem that’s occurring.
For JEE applications, for example, you’ve got your default server log file, and application-specific log files at the very least. In the case of WebLogic Server, you’ve got your domain log file as well, and if you’re having problems with your back-end database, then the real answer might be in XA trace logs on your database server.
Once you’ve established where your log files are, understand where you can go to get additional context on problems that are occurring, as well. If you’re fortunate enough to have a unique message id associated with the error that’s occurring, understand where to find the specific documentation for your product stack so you can get more information on that specific error. And if that doesn’t give you a satisfactory answer, your next step might be a good solid Googling.
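If you’re not sure which files a system is actually writing to, one rough way to find out is to reproduce the problem and then look for log files that changed recently. The sketch below is only an illustration: the function name `recent_log_files`, the one-hour window and the `.log`/`.out` suffixes are all assumptions of mine, not anything specific to a particular product.

```python
import time
from pathlib import Path


def recent_log_files(root, max_age_seconds=3600, suffixes=(".log", ".out")):
    """Return log-like files under root modified within the window, newest first."""
    cutoff = time.time() - max_age_seconds
    hits = []
    for path in Path(root).rglob("*"):
        # Only consider regular files with a suffix we treat as "log-like" (an assumption).
        if path.is_file() and path.suffix in suffixes:
            mtime = path.stat().st_mtime
            if mtime >= cutoff:
                hits.append((mtime, path))
    # Sort on the modification time only, most recent first.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [path for _, path in hits]
```

Pointing this at an install directory right after triggering the error gives you a short list of crime scenes to visit, rather than a guess.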
Don’t search on text strings
If there’s one thing my years of intensive crime show watching have taught me, it’s that forensics are all about understanding the problem that’s there, not the one you think you have. Almost invariably, if you search on a text string to try and troubleshoot, you’ll miss something.
A big part of troubleshooting is understanding specific problems or errors in the context of anything else that might be happening at runtime. Searching for keywords or text strings will catapult you past pages of logging that may contain the vital clue that cracks the case.
Look for patterns
Individual error messages by themselves may not mean much, but combining error messages and application behaviour can be a telling sign. It could well be that a specific server-side component is causing database connection leaks. Or, maybe it’s accessing a particular page in your web application that is causing all of those JNI crashes.
Understanding the patterns of behaviour in an application (if you’re lucky enough to have several days of log file information to debug from) can be crucial to finding the core problem.
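As a minimal sketch of what pattern-hunting can look like once you’ve read through the logs, the snippet below tallies errors by hour and by component. The line format (`date time LEVEL [Component] message`) and the component names are hypothetical; real server logs will need their own regular expression.

```python
import re
from collections import Counter

# Hypothetical line format: "2009-02-06 14:03:11 ERROR [OrderService] message text"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) \[(?P<component>[^\]]+)\]"
)


def error_counts(lines):
    """Count ERROR lines per (date, hour, component); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            key = (match.group("date"), match.group("hour"), match.group("component"))
            counts[key] += 1
    return counts
```

A component that dominates the tally, or a spike at the same hour every day, is the sort of pattern that a single error message won’t show you. It’s a supplement to reading the log, not a substitute for it.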
Know how to turn on the fire hose
Once you’ve gotten a fairly clear idea of what the problem is, knowing how to configure more verbose levels of debug information can be very important. Keeping a reference handy on the debug switches supported for the product you’re troubleshooting is never a bad idea. In the case of WebLogic Server, for example, from WebLogic Server 9.0 onward, a number of configurable debug switches can be selected from inside the Administration Console. Handy! (It’s important to note, though, that these switches are far from the complete list you can use.)
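The same idea applies in most logging frameworks: turn the fire hose on for the one area you suspect, not for everything. As a generic illustration only (this is Python’s standard `logging` module, not WebLogic’s debug switches, and the logger names `app.db` and `app.web` are made up), a sketch might look like this:

```python
import logging


def set_verbosity(logger_name, verbose):
    """Switch one named logger between DEBUG and WARNING, leaving other loggers alone."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    return logger


# Hypothetical usage: open the fire hose on the database layer only.
db_logger = set_verbosity("app.db", verbose=True)
```

Scoping the extra detail to one subsystem keeps the log readable, and remembering to turn it back off afterwards keeps the disk from filling up.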
I’d like to talk about some rules of thumb that I think are good to follow to make a log file to be proud of as an application developer, and also how to troubleshoot some specific technologies, but that’s enough for now. I’ll come back to the other stuff.
Got any log file experiences? Bizarre error messages involving processes being sent insane by zombie children or the like? Times where you’ve saved the day and come out a hero, all thanks to a few choice log file messages? Share them in the comments, or just drop me a line and let me know what you think!

