For all our talk in these blog pages about the importance of NOT managing your SAM program via countless Excel spreadsheets, the not-so-secret truth is that there are many spreadsheets floating around in all our SAM lives.

I have a confession to make: for years, I worked professionally every day in Microsoft Access. And during that time, I scarcely touched Excel – in fact, I had no idea how to create a simple pivot table! It wasn’t until I found myself in a SAM analyst role that I had to break into the Excel realm. In this way, my experience is the polar opposite of most folks’. But my years of working with data tables have taught me valuable lessons that are worth keeping in mind when working with Excel files. I’m happy to say I’ve become quite comfortable with Excel, but I thought I’d use today’s blog to make the case for “thinking like a database” when using spreadsheets.

Why, you might ask, would this be useful? And in particular, why useful to SAM? The answer is this: anything we can do to create repeatable, standardized data formats in all our work will make mountains of difference when we want to scale up to larger data volumes. The sheer volume of data generated in SAM activities, and the tendency to find ourselves repeating the same steps with updated data, make a great case for scaling up to a database. So it can only benefit us to practice database-like approaches to how we set up our spreadsheet files.

How often do we find ourselves working with ever-larger Excel files that eventually become too big and slow? Sometimes they even have more rows of data than Excel can accommodate (1,048,576 rows per worksheet in current versions)! Those VLOOKUPs and SUMIFs start to crawl when there’s too much data, don’t they? With SAM data, we’re often dealing with hundreds of thousands of records extracted from discovery tools, creating unwieldy and bloated spreadsheet files.

If you can become comfortable with the basics of Access or SQL, you can take these burdensome, slow files, and let the database engine rip through queries and calculations in a fraction of the time required by Excel. What’s more, once you’ve established a format for the table, you can easily import NEW data and QUICKLY RERUN the same analyses without the setup time required in Excel. I won’t get into those basic skills in this forum, but you can see the benefit a database offers in this regard.
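To make that concrete, here’s a minimal sketch using Python’s built-in sqlite3 module: it loads a handful of made-up discovery-tool records into an in-memory table, then lets the database engine do the grouped sum that a SUMIF would handle in Excel. The table and field names are hypothetical examples, not a prescribed schema.

```python
import sqlite3

# A tiny stand-in for a discovery-tool extract; in practice this
# could be hundreds of thousands of rows imported from a CSV.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE installs ("
    "record_id INTEGER PRIMARY KEY, "
    "hostname TEXT, product TEXT, install_count INTEGER)"
)
rows = [
    (1, "HOST-01", "Visio", 1),
    (2, "HOST-02", "Visio", 2),
    (3, "HOST-02", "Project", 1),
]
conn.executemany("INSERT INTO installs VALUES (?, ?, ?, ?)", rows)

# The SUMIF equivalent: total installs per product, done by the engine.
totals = dict(
    conn.execute(
        "SELECT product, SUM(install_count) FROM installs GROUP BY product"
    )
)
print(totals)
```

The point of the sketch is the rerun advantage: once the table exists, importing next month’s extract and rerunning the same GROUP BY query is trivial, with none of Excel’s formula setup time.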

Sounds great, but there is a significant caveat. With Excel, you have the beauty of almost limitless possibilities – you can put any kind of entry in any cell you choose; you can change your mind and put something completely new in a cell you previously populated. You can add colors and formatting to stress certain points. But in a database, the table is just a dumb container of data. Once you set the data types allowed in a given field, you cannot put just any value there – a numerical field requires numbers, not text; a date field is always a date field; etc. There’s little to no formatting available to highlight certain data. As you fill up that table with the data, it will just sit there, boring and basic. So why do it? Well, again, it comes down to the speed advantages afforded by databases when you’re ready to analyze the data.
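Here’s a small illustration of that typed-field contrast, again with Python’s built-in sqlite3 module. SQLite is loosely typed by default, so this sketch enforces the rule with a CHECK constraint; the table and column names are hypothetical.

```python
import sqlite3

# Unlike an Excel cell, a database column can refuse values of the
# wrong type. Here, install_count must actually be an integer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE installs ("
    "hostname TEXT, "
    "install_count INTEGER CHECK (typeof(install_count) = 'integer'))"
)

conn.execute("INSERT INTO installs VALUES ('HOST-01', 3)")  # accepted

rejected = False
try:
    conn.execute("INSERT INTO installs VALUES ('HOST-02', 'three')")
except sqlite3.IntegrityError:
    rejected = True  # text in a numeric field is refused outright

print("rejected:", rejected)
```

That rigidity is exactly what keeps the table clean enough for fast, reliable queries later.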

Perhaps you don’t see an immediate need for a database table and prefer to keep working in Excel. This is certainly a valid approach – you’re familiar with Excel and its features, after all. But there’s every reason to cultivate the habit of thinking of your Excel files as database tables – that is, treating every column as a database field requiring a consistent data type, and every row as a single record. Resist the temptation to break with these conventions as you add or manipulate data. A few suggestions to start with:

  • If you want to include a comment about a particular record, use a comment field to do so – avoid embedding comments using Excel’s “add comment” feature. 
  • Keep column names brief and descriptive; avoid lengthy field names.
  • Don’t repeat column names. If you must, make them unique by adding a numeric suffix.
  • Give each record a unique numerical ID. This is useful in databases, but also gives you a handy way to return to the original sort order after you’ve re-sorted the table.
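These conventions can even be checked automatically before a file gets imported anywhere. Here’s a rough sketch in Python that validates a CSV export against two of them – unique column names and a unique numeric ID per record. The column names are hypothetical.

```python
import csv
import io

def check_table(csv_text, id_field="record_id"):
    """Validate a CSV export against the spreadsheet conventions above."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    if len(headers) != len(set(headers)):
        return "duplicate column names"
    if id_field not in headers:
        return "missing ID column"
    ids = [row[id_field] for row in reader]
    if not all(i.isdigit() for i in ids):
        return "non-numeric ID"
    if len(ids) != len(set(ids)):
        return "duplicate IDs"
    return "ok"

sample = (
    "record_id,hostname,comment\n"
    "1,HOST-01,rebuilt in March\n"
    "2,HOST-02,\n"
)
print(check_table(sample))              # ok
print(check_table("name,name\nA,B\n"))  # duplicate column names
```

Note the comment lives in its own column, per the first suggestion above, rather than in an embedded Excel comment that a CSV export would silently drop.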

If you’re good about this, you’ll find you have nice, clean data that is easy to import into a database table whenever you desire. It also enforces the discipline of thinking in terms of individual fields, rather than a completely free-form layout. Then, if you decide to upsize to a database table, you can do so quickly and painlessly, and take advantage of the scale and power of a database engine.

Get in the habit, too, of establishing templates with data types spelled out for your teammates and others in your organization. The more you can cultivate a culture where data is provided and exchanged in a standardized and clean format, the better for everybody! This instills an expectation that data will be prepared carefully and without “one-off” treatments. Even in Excel, this makes it easy to know what to expect in a given file – whether or not you ever upsize to a database table.
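One lightweight way to spell out such a template is a simple field-to-type mapping that anyone on the team can validate a row against before passing a file along. This is just a sketch, with hypothetical field names:

```python
# A shared template: each expected field and the type it must hold.
TEMPLATE = {
    "record_id": int,
    "hostname": str,
    "install_date": str,   # ISO date kept as text, e.g. "2024-01-31"
    "install_count": int,
}

def conforms(row):
    """True if a parsed row has exactly the template's fields and types."""
    if set(row) != set(TEMPLATE):
        return False
    return all(isinstance(row[field], ftype)
               for field, ftype in TEMPLATE.items())

good = {"record_id": 7, "hostname": "HOST-07",
        "install_date": "2024-01-31", "install_count": 2}
bad = dict(good, record_id="seven")  # text where a number belongs

print(conforms(good), conforms(bad))  # True False
```

However you express it – a code snippet, a one-page document, a locked-down header row – the value is in having the expected fields and types written down once and agreed on by everyone.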

I find these points are particularly valuable for SAM, simply because of the volume of data in view for many of our analyses. Though these pointers are applicable to a broad range of use cases, SAM presents us with a remarkably persuasive argument for these “think like a database” concepts.

What about you? Do you like to “think like a database” when handling your SAM (or other) data? Give us your tips and pointers!