In my last article, I made the case for “thinking like a database” while you’re working in Excel files. The idea, in a nutshell, is to instill certain habits of clean and organized data so that you can upsize to a database if you desire. Why would you want to upsize? Well, to keep it brief, databases are much better than Excel at handling large amounts of data and controlling the types of data contained in the records. The advantage to you, the SAM professional, is vastly improved performance for most files that you use on a repeated basis. This is especially true for situations in which you make incremental changes to a subset of data, and then need to run reports to tabulate and summarize the data.

In this blog entry, I’d like to take things just a little farther along from the foundation we laid last time. If you’d like to review what I wrote, go here for a refresher: SAM and the Art of Data Discipline Part I

Last time, I mentioned a few things to try to integrate into your work process when using Excel. These form a good starting point to “thinking like a database”:

  • Ensure each column of contains the same data type. Avoid mixing text and numeric entries in the same column, unless you are okay with the number being treated as text!
  • Be economical and clear in your column names. If these ever get imported into a database, and a SQL query must be written to refer to them, to-the-point names make it much easier to write and edit such queries.
  • Avoid duplicating column names.
  • Introduce unique ID numbers for each row of data.

Hopefully, you’ve been able to experiment with some of these recommendations and can see that they are helpful in general, not just in the event you import these records to a database. They also establish an order to the data that makes it easier to navigate the spreadsheets and focus attention where it needs to go. You probably find things like pivot tables work better too!

Now let’s take it to the next level: truly thinking relationally, like a relational database. Databases like SQL Server, MySQL, Microsoft Access, and others are what are known as Relational Databases. They are at their best when rows of records are boiled down to separate but related tables which can then be linked to form queries and reports to combine and summarize them as desired.

For example, consider a simple contacts list. You may have some contacts who have multiple mailing addresses. The natural tendency may be to have a complete row for each address, with the customer’s name (or portion thereof) repeated for every mailing address. Not a particularly hard thing to manage for a small list, but what if you have to manage hundreds of contacts and mailing addresses? In a relational database, the strategy would be to identify each unique contact with one row of summary information – name, company, title. Then a second table would contain each of the mailing addresses associated with the contact. (To make things more reliable and compact, you would assign each contact a unique ID, and each address a unique ID, with a “foreign key” pointing to the contact’s unique ID.) In a large database, this structure reduces the volume of data by eliminating duplication.

The mailing list example is a simple one, and I use it for illustration purposes because it’s easy to understand. But it’s a small example too. Much larger lists gain efficiency by leaps and bounds – consider a list of customers and all the unique items they’ve ordered. Or, in the SAM realm, a ready example is the list of software entitlements the company owns, and all the machines on which it’s installed. Considering that such lists run into the millions of rows, you can easily see why a relational structure presents advantages to populating, maintaining, and querying them!

This might seem like overkill in your daily Excel adventures. Why go to all this extra work, when you can just break apart the data if you ever decide to move to a database? Herein lies the balancing act. It’s true that many Excel files are temporary, rapidly evolving sketch pads. They may be shared between multiple people and need to be as easy to use as possible. But there are probably at least a few files in your arsenal that are candidates for future upsizing, and if you can set them up properly now, you’ll avoid time consuming headaches later on. And if you’re collaborating on files, this is a great opportunity to instill a common appreciation for a clean and orderly approach to the data.

Dividing your data into its parent and child tables and relating them back via joined queries can also be handled via Excel’s Power Pivot tools, so that you may get some of these benefits without ever leaving the Excel format.

So, think about it: what files and data do you work with that could benefit from this relational approach?