Scalability and Performance of Cloud Data
Two of the biggest business trends of the past decade have been the increased reliance on data-driven decision-making in mid- and large-size companies and the fact that much of that data is now likely to be stored in the Cloud. Each of these trends seems harmless enough on its own. Combined, however, they can pose a challenge for business intelligence professionals looking to generate reports in near-real time based on their company’s data.
Those challenges can quickly multiply as the amount of information being warehoused in the Cloud scales upward to 100 GB or more. The integration functions and report generation that may have gone smoothly when small amounts of company data were being accessed from the cloud can grind to a halt when millions of records are involved.
Simply stated, dragging massive amounts of data over the Internet brings inherent limitations that can make it frustrating for business users to get the records they need in the time frame they need them.
SaaS and Scalability
Having a cloud-based data warehouse that is both scalable and easily accessible is especially tricky for companies working with Software as a Service (SaaS) enterprise platforms like Salesforce.com and NetSuite. The SaaS resource consumption model is designed to prevent a single user from monopolizing the resources of the Cloud server, so that performance stays reasonable for all users. Governors and limits on the number of cloud-based records that any client can access in one session keep the system moving, but they also prevent customers from getting all the reports they want, all the time, in unlimited quantities.
The vast majority of those self-imposed SaaS bottlenecks are time-based. For example, if a business user wants to generate reports on sales completion rates based on two million records, that request would likely time out well before any reports were delivered, leaving the user empty-handed. By building these time constraints into any request for data, SaaS providers essentially force their clients to receive data from the cloud in much smaller chunks, and that’s a numbers game many companies will have trouble winning. At only 40 records a second, the average rate for many vendors and SaaS providers, performing data integration on five million cloud-based records requires about 35 hours of continuous transfer, far longer than a single workday. Delays like that can end up crippling a company’s real-time decision making and put it at a serious disadvantage compared with more nimble competitors.
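The arithmetic behind that estimate can be checked with a short sketch. The function name `transfer_hours` is purely illustrative, and the calculation assumes a constant 40 records per second with no time-outs, retries, or pauses, so real-world figures would likely be worse.

```python
def transfer_hours(records: int, records_per_second: float) -> float:
    """Hours needed to move `records` at a constant rate (idealized, no retries)."""
    seconds = records / records_per_second
    return seconds / 3600


# The article's two examples, both at 40 records per second.
two_million = transfer_hours(2_000_000, 40)
five_million = transfer_hours(5_000_000, 40)

print(f"2 million records: {two_million:.1f} hours")   # 13.9 hours
print(f"5 million records: {five_million:.1f} hours")  # 34.7 hours
```

Even under these generous assumptions, the five-million-record job spans more than four eight-hour workdays.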
A Hybrid Solution to Accessible Cloud-Based Data
Regardless of the industry, mid- to large-size companies are looking for integration solutions that are truly scalable in a cloud-based data warehouse. The reality for both B2B and B2C companies is that the amount of data they will warehouse is likely to continue to scale exponentially. Today, even mid-size firms can have half a terabyte of data already stored in the cloud, and in the next few years a terabyte or more will likely be the norm. Being limited to data queries at 40 records per second because of internet latency or restrictions put in place by SaaS platforms means most of a company’s data will be, for all intents and purposes, inaccessible.
One way to improve the accessibility of data is to mirror cloud data to a data warehouse sitting on an inexpensive on-premises or colocated server. Many products do this, although they differ vastly in scalability, performance, and robustness. What is necessary is some form of query logic that either streams the data or pulls it from the Cloud in chunks, limiting the number of records queried in one pass. Different vendors use different techniques, some of which are protected by patents that would block other vendors from copying them.
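To make the idea of pulling data in chunks concrete, here is a minimal sketch of the general pattern, not any particular vendor's technique. Everything in it is an assumption for illustration: `fake_fetch_page` stands in for a real SaaS API call, `MAX_PER_REQUEST` mimics a provider-imposed limit, and an in-memory SQLite database plays the role of the local warehouse.

```python
import sqlite3
from typing import Callable, Iterator, List, Tuple

Record = Tuple[int, str]
FetchPage = Callable[[int, int], List[Record]]


def fetch_in_chunks(fetch_page: FetchPage, chunk_size: int) -> Iterator[List[Record]]:
    """Yield successive pages until the source returns an empty one."""
    offset = 0
    while True:
        page = fetch_page(offset, chunk_size)
        if not page:
            return
        yield page
        offset += len(page)


def mirror_to_local(fetch_page: FetchPage, conn: sqlite3.Connection,
                    chunk_size: int = 200) -> int:
    """Copy every record into a local table, one chunk per transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    total = 0
    for page in fetch_in_chunks(fetch_page, chunk_size):
        # INSERT OR REPLACE keeps a re-run from failing on rows already copied.
        conn.executemany("INSERT OR REPLACE INTO records VALUES (?, ?)", page)
        conn.commit()
        total += len(page)
    return total


# Hypothetical stand-in for a cloud source that caps records per request.
MAX_PER_REQUEST = 200
_FAKE_CLOUD: List[Record] = [(i, f"record-{i}") for i in range(1, 1001)]


def fake_fetch_page(offset: int, limit: int) -> List[Record]:
    if limit > MAX_PER_REQUEST:
        raise ValueError("request exceeds the per-call record limit")
    return _FAKE_CLOUD[offset:offset + limit]


conn = sqlite3.connect(":memory:")
copied = mirror_to_local(fake_fetch_page, conn)
local_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(copied, local_count)
```

Once the mirror is populated, reports can query the local table directly instead of going back to the cloud source for every request. A production tool would also need incremental updates, error handling, and retry logic, which this sketch omits.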
Employing a replicated data warehouse allows you to do reporting and integration involving large data requests at speeds several times faster than what SaaS platforms can deliver. And having an inexpensive on-premises server or a private Cloud solution not only allows for faster reports and integration functions, but also gives companies the additional peace of mind that they’ll have a backup of their data for compliance purposes.