Is It Time to Rethink the Concept of the Data Warehouse?
January 14, 2011 by btaub
Filed under Business Intelligence, Data Warehousing, Latest
Infinite MIPS, Or How Your Hardware Vendor Let You Down
The Concept of Data Warehousing is Fundamentally Flawed
Ever step back, think about what you’re doing and then ask yourself, “Why?” Ever ask it about the concept of data warehousing? Let’s grow up and face a fact here – while it may be necessary, the concept of data warehousing is flawed.
Think about it. We already have all the tasty data we need in our operational systems. So, let’s chow down. Hey, wait a minute… Y’know what would be great fun? Let’s design a completely new database called a data warehouse. Then, let’s write programs to bring all of that data into our warehouse. Along the way, let’s integrate it all so we get a business view of it, rather than a source-specific view. Hey, let’s also make sure it’s clean. And, let’s make sure we’ve built all the infrastructure necessary to schedule jobs, trap errors, verify totals, … Oh, and let’s ask our managers and shareholders to pay for all of this.
OK, is it just me or, when you step back, does this sound insane?
What Is the “Right Solution”?
So, what’s better? Well, in a really good world, all your data and systems would be integrated from the start AND you’d be able to report directly from them.
In a perfect world you wouldn’t have to integrate the data from multiple systems, you would have only one system and it would support all of your operational and informational (i.e. reporting and analysis) needs. So, what, or who, is keeping us from this perfect world?
Who’s The Villain?
(I’m sure that Dataspace employees and alumni know where I’m heading here.) Who’s letting us down? Who’s making us spend all that extra money and do all that extra work just so we can actually use the data we capture?
Hardware vendors… J’accuse!
Hardware vendors? Why? Because they haven’t figured out how to master the laws of physics to give us infinite MIPS (there it is, Dataspace folks) – infinite computing power.
Think about it; if we had infinite computing power we’d put all of our data into a single, enormous, integrated, normalized database. That database would support both our operational and informational needs. It would be complex but it could be made to look simple by layering views on top of it. It would keep all the history we could ever want because, well, why not? Best of all, response time to any query, no matter how complex, would be instantaneous. Why? Because we’d have infinite computing power.
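To make the "layering views" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns, and the sales_report view are all hypothetical names invented for this illustration, and a three-table toy is obviously nowhere near an enterprise schema. It only shows how a view can hide the joins of a normalized design behind one flat, report-friendly table.

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come from a real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
    CREATE TABLE order_line (
        order_id    INTEGER,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  INTEGER REFERENCES product(product_id),
        quantity    INTEGER
    );

    -- The view hides the joins, so a report writer sees one flat table.
    CREATE VIEW sales_report AS
    SELECT c.name AS customer,
           p.name AS product,
           ol.quantity * p.unit_price AS revenue
    FROM order_line ol
    JOIN customer c ON c.customer_id = ol.customer_id
    JOIN product  p ON p.product_id  = ol.product_id;
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(10, "Widget", 2.5), (11, "Gadget", 10.0)])
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                 [(100, 1, 10, 4), (101, 1, 11, 1), (102, 2, 10, 2)])


def revenue_by_customer(connection):
    """Query the view as if it were a simple table; the joins run underneath."""
    return connection.execute(
        "SELECT customer, SUM(revenue) FROM sales_report "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


print(revenue_by_customer(conn))
```

Every query against the view re-runs the joins against the detail tables. With infinite MIPS that cost would not matter, which is the whole point of the thought experiment.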
So, in the end, data warehousing is really just a way to make up for the fact that hardware (and maybe communication) vendors, with as many PhDs as they have, just haven’t done that one little thing we need them to do – create a computer with infinite MIPS. (C’mon guys, get your act together!)
Is Data Warehousing the Only Solution?
Given the fact that hardware providers are smart yet, clearly, clueless, we’ve come up with a ‘dirty’ solution to help us get at our data – we build a data warehouse. In essence, we do a lot of preprocessing on data (integrating it, aggregating it, and putting it into user-friendly formats) because we don’t have the horsepower to do that work when queries are issued.
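Here is a rough sketch of that trade-off in plain Python. The order records and field names are made up for the example. One function scans the detail rows every time a question is asked; the other does the scan once, ahead of time, roughly the way a scheduled warehouse load might.

```python
from collections import defaultdict

# Hypothetical operational records; the field names are invented for this sketch.
orders = [
    {"region": "East", "month": "2011-01", "amount": 120.0},
    {"region": "East", "month": "2011-01", "amount": 80.0},
    {"region": "West", "month": "2011-01", "amount": 50.0},
    {"region": "East", "month": "2011-02", "amount": 30.0},
]


def total_at_query_time(rows, region, month):
    """Scan every detail row each time somebody asks the question."""
    return sum(
        r["amount"] for r in rows
        if r["region"] == region and r["month"] == month
    )


def build_summary(rows):
    """Do the scan once, up front, and keep one total per (region, month)."""
    summary = defaultdict(float)
    for r in rows:
        summary[(r["region"], r["month"])] += r["amount"]
    return dict(summary)


summary = build_summary(orders)

# Same answer either way; the summary just pays the cost before the query arrives.
assert summary[("East", "2011-01")] == total_at_query_time(orders, "East", "2011-01")
```

Once the summary exists, a lookup is a single dictionary access instead of a pass over every order. The price is the load job itself, plus the scheduling and error handling that come with it.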
But, is this the only way to do the job? Perhaps, given our lack of infinite MIPS, it is. Still, the idea of a single, enterprise-wide database is enticing. And, actually, there is a partial solution that, while not eliminating the need for informational data stores (i.e. data warehouses and data marts), minimizes the effort required to build them. That partial solution is integrating operational systems or, in its more common form, master data management.
Integrating data before, or as, you build a data warehouse has a number of advantages:
- It makes building the warehouse easier and cheaper.
- It ensures that, operationally, the whole organization is seeing the same picture (unlike one client who called us after different data definitions led to a multimillion-dollar ordering mistake).
- It creates a logical view of the single database concept, bringing you closer to that true picture of one, integrated database underlying your entire company.
- It opens you up to reporting out of a new generation of BI tools, ones that integrate data but don’t require traditional data warehouses yet don’t stress your operational systems each time a query is run. (more on this in a later posting)
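As a rough illustration of the "logical view of a single database" idea, here is a tiny sketch of one common master data management building block: a cross-reference that maps each system's local customer key to a shared master ID. The system names, keys, and fields are all hypothetical, and real MDM tools do far more (matching, survivorship rules, stewardship), so treat this as a sketch under those assumptions.

```python
# Hypothetical source systems and IDs, invented purely for illustration.
billing_customers = {"B-17": {"name": "ACME Corp.", "balance": 1200.0}}
shipping_customers = {"S-903": {"name": "Acme Corporation", "open_shipments": 3}}

# Each (system, local key) pair points at one master ID.
# Building and maintaining this mapping is the hard part in practice.
cross_reference = {
    ("billing", "B-17"): "CUST-0001",
    ("shipping", "S-903"): "CUST-0001",
}


def integrated_view(xref, sources):
    """Merge attributes from every source system under the shared master ID."""
    merged = {}
    for (system, local_key), master_id in xref.items():
        record = merged.setdefault(master_id, {"master_id": master_id})
        for field, value in sources[system][local_key].items():
            # Prefix with the system name so conflicting fields stay visible.
            record[f"{system}.{field}"] = value
    return merged


customers = integrated_view(
    cross_reference,
    {"billing": billing_customers, "shipping": shipping_customers},
)
print(customers["CUST-0001"])
```

Both systems keep their own keys and their own spelling of the customer's name, yet anything reading the merged record sees one customer. That shared identity is what makes the later warehouse load easier.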
Where Does This Leave Us?
So, data warehouses and data marts do accomplish a lot and, largely, are still necessary. But, integrating data between your operational systems will save you headaches, lower your cost of warehousing and, in some cases, maybe even eliminate the need for a data warehouse.
Where to start? Well, let’s leave that for a later post, too.
Any comments? I’d love to see them. Please submit them below.
– Ben



