Integration solutions can have one or more “deployment modes,” which define where the actual hardware and software reside. The modes are:
- On-Premise Integration runs on your own servers in your office or data center. It is also known as a “Virtual Integration Appliance,” a completely redundant marketing buzzword that adds no real value.
- Integration Appliance is a rack-mounted server that you purchase or lease from the integration vendor that runs the vendor software in your office or data center.
- Platform as a Service (PaaS) is where you install and maintain the integration software of your choice on collocated servers that a third party manages but you have access to.
- Integration Platform as a Service (iPaaS) is the integration vendor’s software running on collocated servers that are also provided by the integration vendor. This is the more precise definition of “a cloud-based solution.”
- Hybrid Integration is a hosted development workbench user interface with an on-premise “headless ETL” runtime.
We will discuss the ins and outs of each deployment mode and what the vendor hype regarding “cloud based” solutions really means.
Data integration started with fixed length records in “flat file” interfaces between systems, often on 80-column punch cards and 9-track tapes. If you had a record that required more than 80 columns, there was a “continuation punch” in a column to indicate that the COBOL program was to read the next physical record to assemble a complete logical record. This was the way things worked for decades, before the data integration product industry existed.
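The continuation mechanism can be illustrated with a small sketch. The column position and the marker character below are assumptions chosen for illustration; real layouts varied from one installation to the next.

```python
RECORD_WIDTH = 80
CONTINUATION_COLUMN = 80   # hypothetical: 1-based column that holds the continuation punch
CONTINUATION_MARK = "+"    # hypothetical marker character


def assemble_logical_records(physical_records):
    """Join 80-column physical records into complete logical records."""
    logical = []
    pending = ""
    for card in physical_records:
        # Pad or truncate so every card is exactly RECORD_WIDTH characters.
        card = card.ljust(RECORD_WIDTH)[:RECORD_WIDTH]
        data = card[:CONTINUATION_COLUMN - 1]
        continued = card[CONTINUATION_COLUMN - 1] == CONTINUATION_MARK
        pending += data
        if not continued:
            # Trailing padding is dropped here for readability only.
            logical.append(pending.rstrip())
            pending = ""
    if pending:
        # The last card asked for a continuation that never arrived.
        logical.append(pending.rstrip())
    return logical


cards = ["A" * 79 + "+", "B" * 10, "C" * 5]
records = assemble_logical_records(cards)
print(len(records), [len(r) for r in records])
```

The first two cards are joined into one logical record because the first carries the marker in the assumed continuation column; the third card stands alone.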
The next level was to integrate the databases of the time: IMS, IDMS, VSAM, etc. This was typically done with custom COBOL programs. There were a few package solutions which allowed you to map data to databases using hand-edited control records for the mappings.
With the advent of relational databases, metadata about the database became available to the application layer, enabling a general purpose program to understand any database schema. This flexibility allowed vendors to create ETL programs with graphical user interfaces for the integration developer that ran on Windows PCs.
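As a rough illustration of why catalog metadata matters, the sketch below asks a database to describe itself and builds a generic picture of its schema without any table-specific code. It uses Python's built-in `sqlite3` module purely for convenience; the `customer` and `invoice` tables are invented for the example, and a commercial ETL product would read the equivalent catalog views of whatever database it targets.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]} read from the catalog."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


# Hypothetical tables, created only so there is something to inspect.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

for table, columns in describe_schema(conn).items():
    print(table, columns)
```

Because the program learns table and column names at run time, the same code works against any schema, which is the property that made general-purpose mapping tools practical.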
Next, web browsers became sophisticated enough to host graphical user interfaces that interact with database metadata. This enabled cross-platform ETL products that would run on UNIX or Windows servers with a remote user interface.
Both fat client and browser interfaces are commonly used today. Some vendors even support both implementations.
All of these advances in integration technology had one thing in common: they existed entirely on-premise in a customer’s data center, with perhaps a remote web user interface to access a central ETL server.
Having the ETL application physically collocated near the data means there is no middle layer to go through. Database calls are made directly to the physical database services, with few network hops.
With an on-premise solution, you can throw as much hardware at the problem as you want to improve performance. On-premise servers these days are inexpensive and scalable, with gigabytes of memory and multi-core CPUs that can handle most integration loads. Everything is under your control, unless you have outsourced your data center.
In the context of on-premise solutions, an integration vendor would be wise to offer cross-platform capabilities for both Windows and UNIX servers. This makes a big difference to some companies. Java and JDBC (Java Database Connectivity) are a good way to accomplish this with one code base.
On-premise integration means you only have to worry about your employees destroying data or giving it away. Of course, a well-designed hack can breach on-premise databases by exploiting careless employees who open phishing emails or viruses, or visit malicious web sites at work. Someone is always targeting on-premise databases, but at least you can take responsibility for solving the problem, and for handing in your card key when you’re blamed for not stopping a data breach. On balance, on-premise databases can be just as insecure as any other.
One of the true benefits of on-premise integration solutions is that they can access both on-premise and Cloud data, as long as firewall settings permit. Decent internet connection speeds are required to push or pull Cloud data, but local data is a breeze. Getting access to your own databases is merely a matter of getting database connection and password information from your DBA. Data encryption or transmission compression of local data is usually not even a factor in this context.
If you want control of the runtime environment, have competent system administrators, know how to build a data center, and need access to both on-premise and cloud data, an on-premise deployment is probably a good solution. The deciding factor will be whether you have on-premise data, as the other solutions we discuss next have limitations in this regard.
Shipping an integration product on pre-configured hardware was an interesting concept back in 2005 when one vendor unveiled a $250,000 “appliance” that was just software installed on a generic rack-mounted server. There was nothing special about the hardware or the generic operating system, only the price. Within a year, the only known vendor to do this offered leases at around $62,000 per year, then offered less and less expensive hardware. One customer was told that if their $250,000 appliance ever failed for any reason after the warranty period expired, they could buy another one for the same price. This same vendor finally offered a “virtual appliance,” which was their software on your hardware, and a “Cloud appliance” in their colocation facility.
No other vendor has since offered an appliance-based solution, to the knowledge of this author. It’s a really big financial commitment for a customer, with a really big risk if there is ever a hardware problem, and there are no special hardware features that need to be exploited to perform data integration tasks. Need we say more?
Integration Platform as a Service (iPaaS)
With the growing popularity of SaaS applications, it was only a matter of time before someone invented the concept of a Cloud application to move all that data. Cloud applications have the perceived advantage of being “somebody else’s problem” to provide and support, freeing IT departments from having to maintain hardware and software, or maybe even having the business free themselves from having an IT department! Cloud Integration vendors, supported by analysts such as Gartner and Forrester, are persuading potential integration customers to use a one-size-fits-all “Cloud-based” solution with the argument of “People are doing this, so you should too, or you’ll get left behind” and little other business justification. There has yet to be an argument made by any of them as to why this is a good thing based on technical reasons.
Cloud-based solutions require vendor infrastructure for both the metadata (the designer interface) and the actual run-time engine. Having your integration engine hosted in a vendor’s infrastructure has many disadvantages, though.
- Your vendor can’t “see” your on-premise data. Unless you open up your firewall to the vendor, there is no internet access to your data. The only way this would work is if all of your application data were available through secure APIs and integrated with other application data using secure APIs. No vendor is going to open up their SQL database to the outside world, and neither should you, since that creates a security risk of a data breach of your infrastructure or of someone sniffing unencrypted network data traffic for protocols that don’t support encryption.
- The runtime engine has to scale to handle spikes in demand from any of the vendor’s customers. This is generally solved either with “bare metal” hardware dedicated to each customer or with virtual machines, where many customers share an array of hardware and capacity is allocated as needed.
- You are required to entrust the passwords for sensitive corporate and customer data to your integration vendor, both in transit and at rest in their colocation facility. This is yet another security risk in an age where everything can be and is hacked by foreign and domestic governments, private organizations, and individual hackers. A Cloud Integration vendor’s employees who have access to your data are also a potential risk to your data security. If the NSA can get hacked, your Cloud Integration vendor can get hacked for the wealth of commercially useful information available from all of the customers. Why would a hacker go after just your data when they can access all of the data from all of the customers of just one vendor?
- Latency creates performance issues. Unless you are sending buffers or streams of data to an API that accepts many records at a time and handles the upsert (update / insert) logic, the inherent latency of sending individual records over the internet will kill performance. Pushing data to a local database can give thousands of records per second, while sending data to a remote database one record at a time will give you one or two records per second, tops. What is required at the other end is a process to ingest batches or streams of records, and very few relational databases meet this requirement.
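The latency point can be made concrete with back-of-the-envelope arithmetic. The sketch below is a simplified model, not a benchmark: it assumes each batch costs one network round trip plus a fixed amount of work per record. The half-second round trip and the half-millisecond per-record cost are assumed figures, picked only to match the “one or two records per second” and “thousands of records per second” ranges quoted above.

```python
import math


def estimated_seconds(record_count, batch_size, round_trip_s, per_record_s):
    """Rough load time: one network round trip per batch plus per-record work."""
    round_trips = math.ceil(record_count / batch_size)
    return round_trips * round_trip_s + record_count * per_record_s


def records_per_second(record_count, batch_size, round_trip_s, per_record_s):
    """Effective throughput under the same simplified model."""
    total = estimated_seconds(record_count, batch_size, round_trip_s, per_record_s)
    return record_count / total


# Assumed figures for illustration only.
ROUND_TRIP_S = 0.5      # remote call over the internet
PER_RECORD_S = 0.0005   # database work per record

for batch_size in (1, 100, 1000):
    rate = records_per_second(10_000, batch_size, ROUND_TRIP_S, PER_RECORD_S)
    print(f"batch size {batch_size:>4}: about {rate:,.0f} records/second")
```

Under these assumptions, sending records one at a time yields roughly two records per second, while batches of 1,000 yield roughly a thousand per second. The round trip dominates the cost until the batch is large enough to amortize it, which is why the receiving end needs a way to ingest many records per call.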
The mad rush to move everything to the cloud is beginning to calm down as organizations realize that not everything should be there. The world of applications existed long before the cloud, and some applications will never see the cloud at all. Legacy applications in a bank, for example, are by necessity located on-premise because that is where they are used; banks do not need cash dispenser technology in the cloud. Moving something to the cloud is only necessary if there is a compelling reason to move it.
The explosion of devices connected to the Internet is also clouding the cloud-only theory. As intelligence on the Internet of Things becomes critical to decision-making, it will become necessary for analytics to reside close to the Things being analyzed, in order to ensure continuous operation. This means that some devices need to remain on-premise, where analysis can be performed, and enabled by gateways that also communicate the data to the cloud. Think of health monitoring equipment in a hospital, connected to the nurse’s station via a local area network, where many patients can be monitored at once. This way, continuous operation is ensured even if the cloud fails.
Security and privacy issues remain a concern for firms looking at the cloud. No one wants their private financial information, or intimate photos, to be hacked into. Outside the U.S., there are regulatory issues that prevent data from going to the cloud. The European Union is particularly worried about placing sensitive data in the cloud, citing security and privacy as the main barriers to government adoption.
One paradigm that many vendors have adopted is to put the configuration workbench in the Cloud and the run-time engine on premise. This gives prospective buyers the fiction that they are buying a Cloud integration product, when in fact a local server running vendor-provided software does the heavy lifting. The arrangement gives you multiple points of failure, since the on-premise server has to work in harmony with metadata that lives in the Cloud, and it accomplishes nothing that a purely on-premise solution would not, other than possible improvements in the ease of software installation and upgrades.
Following Pareto’s rule, we predict that 80% of all integration projects will require on-premise as well as cloud connectivity. This means that only 20% will be deployed only in the cloud, or only on-premise. The world of integration is an AND world, not an exclusively OR world. With this “the more the merrier” philosophy, there is no need to have cloud and on-premise integration silos.