Using Software Asset Management (SAM) tools is a lot like driving a car: you can operate either one without knowing exactly how it works under the hood. For a car, you step on the gas and you go; you step on the brake and you stop. For a SAM tool, you click on some option to display the licenses of a computer, and you don’t know (or care to know) that your request is taking place on port 2153 accessing a relational database on Super Server version 28 and retrieving 22 records from the sam.license table. (Well, OK, now you know; class dismissed 😊.)
The SAM tools you use need to be running in tip-top shape (like your car) to provide you with accurate data regarding your license position. That means there’s an entire crew of service people whose sole responsibility, to paraphrase the lyrics of the song “Car Wash”, is to “keep your SAM machines humming”. They change the oil (apply patches), check the hoses (check and troubleshoot connectivity) and occasionally replace parts when they wear out (apply upgrades). I am one of those service people. What’s it like to maintain and service the SAM tools you’ve been using for so long?
I could answer that in one word: “Interesting”. (That’s only because the end of the first verse of “Car Wash” is currently playing in your head.) I play a number of roles as a support person, but the one I seem to play most often is Forensic Software Engineer, also known as a Detective. When things go wrong, where do I look to start my troubleshooting? If the error is in a user interface part of the SAM tool, I’m usually sent (or ask for) an e-mail with a picture of the error message. But what about the “silent” errors, like “40% of my machines have no VM Manager data” or “I installed the xyz agent on 12 computers a week ago and only 5 are showing up”? Where do I look then? The answer lies in error logs, and sometimes it’s a job just to *find* them because there are so many. Some examples:
- The IBM License Metric Tool (ILMT) and BigFix Inventory (BFI) have logs for each of the individual BigFix elements, the relational database elements (which could be either Microsoft SQL Server or IBM Db2) and the applications themselves. BigFix Relays have their own logs, too.
- Flexera’s FlexNet Manager software has logs on both the server (if it’s installed on-premises) and the Inventory Beacons. And to make things even more interesting, some of these logs are in hidden folders, although a shortcut to some of them is provided in the Inventory Beacon interface.
- Snow Software has several folders that contain logs for the various services the application uses.
Let’s assume for the moment that the issue here is that a particular Windows Server 2008 R2 machine in a development environment isn’t showing up in the list of servers in any of the tools listed above. (This is actually a very common issue that could have all sorts of solutions, and we’re going to look at a slightly contrived one now.) And let’s say that I’ve now found the log I think I need: because it’s one particular server that isn’t showing up, I figured out that I should probably be looking at the agent logs on that particular server. However, do you honestly think the log has the issue blinking in red saying “here’s the problem you’re looking for”? Well, it does – sometimes. (Surprised, aren’t you?) Issues like lost database connectivity or failed connections usually manifest themselves as repeated messages that tell me that something that should be happening isn’t. It’s sometimes obvious when networking is notworking (pun intended). However, sometimes you have to eyeball the log carefully. Many issues are much more subtle, such as an error message that might appear only once: “Error 6 – handle is not valid”. Have you tried looking that up in Google? I have – and the results don’t tell you much.
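To make the "repeated message versus one-off message" distinction concrete, here is a minimal Python sketch of that kind of log triage. Everything about the log layout is an assumption for illustration (a `YYYY-MM-DD HH:MM:SS` timestamp followed by the message); real agent logs from ILMT, FlexNet Manager or Snow are formatted differently, and the sample lines are made up.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS <message>". Real agent logs will differ.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+")

def summarize_errors(lines):
    """Count each distinct error message, ignoring its timestamp."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        if "error" in message.lower():
            counts[message] += 1
    return counts

def split_by_frequency(counts, threshold=3):
    """Separate noisy repeats from messages seen fewer than `threshold` times."""
    repeated = {m: n for m, n in counts.items() if n >= threshold}
    rare = {m: n for m, n in counts.items() if n < threshold}
    return repeated, rare

# Made-up sample lines, not taken from any real SAM agent log.
sample_log = [
    "2019-01-10 03:05:00 Error: connection to server failed",
    "2019-01-10 03:06:00 Error: connection to server failed",
    "2019-01-10 03:07:00 Error: connection to server failed",
    "2019-01-10 03:07:30 Info: retrying",
    "2019-01-10 03:08:00 Error 6 - handle is not valid",
]

repeated, rare = split_by_frequency(summarize_errors(sample_log))
print("Repeated:", repeated)
print("Rare:", rare)
```

The repeated bucket is where the "blinking in red" problems tend to land; the rare bucket is the short list worth eyeballing by hand.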
In our little scenario, I now have the actual error, “Error 6 – handle is not valid”, but that is totally meaningless to me (and to Google). So now what? The next thing I usually ask myself (or the server owner) is, “Has anything changed recently?” Can you guess the answer the server owner will give me? “No, of course not, I’d know if a change was applied!” is the correct answer (but sometimes, it’s not the truth). I’ll try to figure out when the issue started happening according to the data in the logs. In this example, I figured out that the issue started happening 5 days ago. At this point, all I know is that “five days ago, something happened”. I’ll then check Windows Programs & Features for anything newly installed, or temporarily turn off virus protection and restart the agent, or take a look at whether Windows Update installed something automatically.
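Pinning down when an error first appeared is easy to script once you know which message to look for. The sketch below is only an illustration: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which is a hypothetical layout, and the sample lines are invented.

```python
from datetime import datetime

def first_occurrence(lines, needle):
    """Return the timestamp of the first line containing `needle`, or None.

    Assumes each line starts with "YYYY-MM-DD HH:MM:SS" (a hypothetical
    layout); lines without a parseable timestamp are skipped.
    """
    for line in lines:
        if needle in line:
            try:
                return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            except ValueError:
                continue
    return None

# Invented sample lines for illustration only.
sample_log = [
    "2019-01-09 22:00:00 Info: inventory upload complete",
    "2019-01-10 03:08:00 Error 6 - handle is not valid",
    "2019-01-11 03:08:00 Error 6 - handle is not valid",
]

onset = first_occurrence(sample_log, "Error 6")
print("First seen:", onset)
```

The last clean entry before that timestamp is just as useful as the first bad one, because together they bracket the window in which "something happened".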
In this example, Windows Update did indeed install a recent monthly rollup, and it was the one for January 9th, 2019. It was installed 6 days ago at the default time of 3am. Reading the description of the update, it mentions something about a known issue, where “the network interface controller may stop working on some client software configurations” and that “the exact problematic configurations are currently unknown”. (Ulp!) The issue is starting to look like it isn’t even related to the specific SAM tool at all! So, then I opened a command line and tried the “net use” command to connect to this same server (to test out networking without really going on the network), and to my surprise I saw the same “error 6” message that was in the agent log. This proved the issue was outside of the SAM tool – but now, how does this get fixed?
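The detective step here is really a date comparison: which changes landed shortly before the error first appeared? A minimal Python sketch of that correlation follows. The list of changes, its entries and the two-day window are all hypothetical; in practice the data would come from Programs & Features, the Windows Update history, or a change log.

```python
from datetime import datetime, timedelta

def changes_before(onset, changes, window=timedelta(days=2)):
    """Return (name, when) pairs applied within `window` before `onset`,
    newest first. The window size is an arbitrary starting point."""
    suspects = [
        (name, when)
        for name, when in changes
        if onset - window <= when <= onset
    ]
    return sorted(suspects, key=lambda change: change[1], reverse=True)

# Hypothetical change history for illustration only.
changes = [
    ("antivirus definition update", datetime(2019, 1, 2, 12, 0)),
    ("monthly rollup (hypothetical entry)", datetime(2019, 1, 14, 3, 0)),
]
onset = datetime(2019, 1, 15, 9, 0)  # made-up time of the first logged error

suspects = changes_before(onset, changes)
for name, when in suspects:
    print(f"{when:%Y-%m-%d %H:%M}  {name}")
```

A short suspect list doesn't prove anything on its own, which is why the follow-up test outside the SAM tool matters.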
Having the server in a development environment is actually helpful to me because I don’t have to work through a production change control process to get a “fix” uninstalled. Permission was granted to revert the machine to the restore point taken before the monthly rollup was applied. After what felt like forever (because a watched pot never boils), the restore completed and the agent was restarted. The logs were checked and (yay!) the error had disappeared. A few hours later, the machine did appear in the SAM server list. The root cause of the issue was a “helpful” Microsoft patch that caused networking to fail. (This is also a fairly common occurrence, which is why many companies test these patches out before applying them to production servers.)
While this scenario was somewhat contrived, it does accurately portray a day in the life of a SAM support person. It sounds easy when told as a story like this, but in reality the process can be very time-consuming: sometimes you have to open a support ticket with the product vendor to find the problem, and it can take days (sometimes weeks) before the issue is finally resolved. So, the next time you need to call in an issue to your SAM support team, give them a hearty “thank you” for the work they do to keep your tools running. (Buying them a drink would be even better! 😊)
If you’re a SAM support person reading this, I’d love to hear some of your war stories of issues you’ve had to resolve.