app; development

The App: Data Storage Choices for Scale

William H. Noel

13 Apr 2016 • 2 min read

Pro tip: If you are cutting and pasting using Edge and Ghost, be sure to copy and paste instead. Just lost an entire post trying to split one post into two. sigh

I come from an old school relational background. We made line of business medical billing apps.[^n] Scale was never really an issue. Now apps cost $1. No scale = no point.

So now I'm trying to create an app that could possibly scale up to be quite large. And that changes things.

SQL Server has been the go to product for me for years but it has it's limitations when things get really big. Specifically: management, including backing stuff up, keeping customers' data separate and service availability (99.999% uptime) become harder to do - and harder to do affordably - using our tried-and-true workhorse.

Since this is a greenfield (new) development effort, we can choose whatever storage technology we want. I've decided to learn about Azure Storage as a way to address these management and scale issues.

SQL Server storage allows you to create many indexes for efficiently querying a database on almost any basis. And that allows people like me to be sloppy and just collect everything, normalize it and let the server engine sort it out.

But Table Storage only gives you one index. One. That means you have to be really smart about choosing how you store your data so it can be accessed efficiently.

Which leads us to not normalizing the data. Normalizing means storing only one copy of a piece of information. You only have one birthday, right (let's hope)? So it should only be represented one time in the system. If that value ever changes, then it's automatically correct for everyone who accesses it. SQL Server becomes responsible for finding out where that birthdate lives and serving it up to whomever is asking.

But with Table Storage, information isn't "joined" together for you like it is with SQL Server. You have to go find it. And multiple requests to the table are inefficient. So the answer is to store it in multiple places and manually keep them in sync. Ugh.

This used to be heresy in the old days. Relational data design meant never storing the same piece of information in two places. Fine, but it has some issues when you scale that kind of architecture.

Do I believe that this app will ever outstrip SQL Server's ability to keep up? I wish, but probably not. But if it does we're screwed trying to convert everything. Better to pick an architecture that we can grow with and not worry about it. I need to join this century's programming effort anyway.

Table Storage can grow to billions of records (which they call entities) and millions of users without blinking. The key is architecting for web scale.

Table Storage gives us the ability to cheaply store multiple copies of the information - sacrificing normalization to gain speed, lower costs and have the ability to scale up without changing our code.

I will admit that this is really foreign to think this way and it'll take me a minute.

Off to find blog posts and tutorials on how to make this work.

[^n] We designed our database using relational techniques before SQL Server was invented.

Don't sign up. I'm not sending you things.