Welcome back to the new season of the IBM License Metric Tool (ILMT) and BigFix Inventory (BFI) best practices and lessons learned series. I know you’ve been waiting since the end of the last episode of season one, sitting on that web page just itching to read some technical best practices. So without making you wait anymore, here we go!

The topic that I want to talk about, and that is nearest & dearest to my heart, is deployment. (Can software deployments be romantic? I think the Hallmark Channel is running that special right now …)

ILMT can be deployed in several ways. If you choose Linux & DB2, you get an “all-in-one” deployment option that installs both the BigFix platform and ILMT on a single server. The all-in-one installer is meant to be a shortcut for deploying BigFix when you are using it specifically for ILMT. BigFix Inventory doesn’t have an all-in-one option, and I sometimes wish that ILMT didn’t have it either. The all-in-one option hides some of the complexity and sets some things up automatically, but to better understand and support BigFix, I think it’s important to understand that complexity. For example, rather than letting the all-in-one installer schedule a software scan on EVERY computer every 7 days, starting at whatever time of day it happens to land on, it is a best practice to schedule the scans yourself: spread the load across the week and run the scans during off-hours. How can the all-in-one installer possibly know your “off-hours”? It doesn’t, and that’s the point.
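
To make “spread the load across the week and run at off-hours” concrete, here is a minimal Python sketch that assigns each computer to one of 28 weekly scan slots (7 nights, with hourly start times from 01:00 to 04:00). The off-hours window, the `stagger_scans` function, and the `ScanSlot` type are hypothetical illustrations, not part of ILMT or BigFix, so substitute your own off-hours. One way to apply the result might be to turn each slot into a computer group and a scheduled software scan action in the BigFix console. You would not run this script against BigFix directly.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import time

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


@dataclass(frozen=True)
class ScanSlot:
    """A hypothetical weekly scan slot: a night of the week and a start hour."""
    day: str
    start: time


def stagger_scans(computers, first_hour=1, slots_per_night=4):
    """Assign each computer a weekly scan slot, spread evenly over 7 nights.

    Assumed off-hours window: `slots_per_night` hourly start times beginning
    at `first_hour` (default 01:00, 02:00, 03:00, 04:00 local time).
    """
    if first_hour < 0 or slots_per_night < 1 or first_hour + slots_per_night > 24:
        raise ValueError("off-hours window must fit within a single day")
    slots = [
        ScanSlot(day, time(hour=first_hour + h))
        for day in DAYS
        for h in range(slots_per_night)
    ]
    # Round-robin over the sorted names so every slot gets a near-equal share.
    return {name: slots[i % len(slots)] for i, name in enumerate(sorted(computers))}


if __name__ == "__main__":
    fleet = [f"server{n:04d}" for n in range(3500)]
    load = Counter(stagger_scans(fleet).values())
    print(len(load), min(load.values()), max(load.values()))  # prints: 28 125 125
```

With 3,500 servers, each of the 28 slots gets 125 machines. Under the all-in-one default, the start time is whatever the installer happened to pick.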

Deploying BigFix really requires careful planning (“plan, plan and plan some more”). The all-in-one deployment option for ILMT raises the temptation to rush the install because it’s “easy”. Well, remember the Acme Cracker Company from our last episode – you know – the one whose motto is “we’re crackers”? They definitely adhere to truth in advertising! Here’s their story. Remember, the story I’m about to tell you is true; as in the last episode, only the company name has been changed, to make it more fun.

ILMT was deployed at the Acme Cracker Company some time ago (let’s say two years). It was the all-in-one deployment on Linux, and the team (or external consultant; no one knows for sure) that deployed it is long gone, along with their knowledge of the decisions made and why. The deployment was a single server with everything on it, and all of the clients (or at least those that were working) were set to connect directly to the BigFix server. No BigFix relays were deployed. No scan schedules were set up; everything ran with whatever defaults the all-in-one installer had set. The initial deployment covered about 1,000 servers; two years later, about 3,500 servers were being managed.

When Acme Crackers approached Siwel for help with their deployment, their ILMT version was quite old; it was barely version 9. Mysterious issues would crop up every now and then, and it seemed almost impossible to get rid of the dreaded “No VM Manager Data” issue. Because no planning had gone into the original deployment, we recommended redeploying ILMT from scratch with a better-thought-out architecture that would use relays and adhere to best practices wherever possible.

Unfortunately, Acme Crackers cracked up at the thought of a planned architecture with several relays. They did not want to go through the work of setting up firewall rules from the clients to the relays. They didn’t want to support the additional relays, even though all of that is done through the BigFix console. “Too much work”, they said. To make a short story long, they decided not to accept our recommendations and (let’s read this real slow) to go with the all-in-one architecture again, this time with up-to-date versions of the software, and maybe (maybe!) someone would record the installation decisions that were made. All clients would connect directly to the BigFix server, as before. When the installation was finished, they wanted Siwel to verify it for them (that is, verify that the functionality works).

Although the company name was changed, this story is real. Learn from Acme Crackers’ mistakes:

  • Why did they want to create the same environment that’s giving them trouble now all over again? (Rhetorical question)
  • The rule of thumb is to have one relay for every 1,000 servers in a BigFix infrastructure. With about 3,500 servers, that means 4 relays (3,500 ÷ 1,000, rounded up). That count doesn’t account for isolated networks. Where clients sit behind a firewall, it is a best practice to have a single relay-to-relay connection cross the firewall boundary, rather than having “n” client connections cross it to reach a relay on the other side.
  • The BigFix server’s main job is to take the client computer properties and load that data into the BigFix database. This process is called FillDB. If a large number of computers are trying to send data to the BigFix server while FillDB is busy, the server is going to bog down, because it’s trying to do both at once. Now add the work of ILMT licensing calculations (the data import), plus one or more analysts performing bundling activities that might require licensing recalculations. All of these activities are data intensive, so what do you think will happen to the server’s CPU? Hopefully the squirrels on that CPU’s conveyor belt are very strong, because they’re going to be running at full tilt for a long time. Ideally, you want to spread the workload out rather than have a single server do everything (unless you have a really small deployment). Relays take over the task of managing client communication from the BigFix server, and they package the data from multiple clients together to make that exchange more efficient. It is the use of relays that enables BigFix to scale to hundreds of thousands of clients.
  • The all-in-one installation creates a faux schedule that runs the software scan every 7 days, but the scan can launch at any time of day. The best practice is to stagger the scans and run them at off-peak times, which you do by setting up your own schedule.
  • Acme Crackers did not have executive sponsorship for the initial deployment, and it is unclear whether they have it even now. Without ownership of the process, there is little incentive to work with the tool, unless the threat of an IBM audit is what’s driving the deployment.
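
The relay sizing rule is simple arithmetic, but isolated networks change the answer because each firewalled segment needs its own relay. Here is a minimal Python sketch of that estimate. The `relays_needed` function and the segment names are hypothetical. The 1,000-clients-per-relay figure is the rule of thumb quoted in the list, not a hard product limit.

```python
import math


def relays_needed(clients_by_segment, clients_per_relay=1000):
    """Estimate relay count from the one-relay-per-1,000-clients rule of thumb.

    clients_by_segment maps a (hypothetical) network segment name to the
    number of clients in it. Each firewalled segment is sized on its own, so
    its clients report to a local relay and only that relay's connection to
    its parent relay has to cross the firewall.
    """
    if clients_per_relay <= 0:
        raise ValueError("clients_per_relay must be positive")
    total = 0
    for segment, clients in clients_by_segment.items():
        if clients < 0:
            raise ValueError(f"negative client count for {segment!r}")
        # Round up: 3,500 clients need 4 relays, not 3.5.
        total += math.ceil(clients / clients_per_relay)
    return total


if __name__ == "__main__":
    print(relays_needed({"datacenter": 3500}))  # prints 4
    print(relays_needed({"datacenter": 3200, "dmz": 300}))  # prints 5
```

The sketch sizes each segment separately. With 3,200 data center servers plus 300 behind a DMZ firewall, the estimate comes to 5 relays. That is one more than the flat 3,500 ÷ 1,000 calculation suggests.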

Perhaps Acme Crackers will realize that this second deployment is not all it’s cracked up to be and take a step back to examine their situation. Don’t be Acme Crackers!

I’d love to hear about a best practice you’ve implemented as part of your tool deployment that stands the test of time. What lessons have you learned that you’d like to share?