Integration Topology


The integration market will continue to grow and expand, fueled by the growth of SaaS applications and the Internet of Things (IoT), with growth projected to far surpass the 30% mark. With this explosion of things to connect and the need for integration hotter than ever, the big question becomes: do you need a bicycle or a bus? Naturally, it depends on your use case. A bicycle is easier to use and costs less, but it can't carry as many people!

Point to Point Integration Model

In the simplest case you might be trying to hook up just two products (or perhaps just two APIs from those products). In that case you are probably fine doing this via REST and point-to-point integration. The benefit of the approach is that it is simple, cheap, and fast.

The simplicity of point-to-point goes out the window as soon as you add another system to the mix. When you have two endpoints you have only 2 connections to worry about (A to B and B to A), but when you add a third system you have 6 (A to B, B to A, A to C, C to A, B to C, C to B). This follows Metcalfe's law, where the "value of the network is proportional to the square of the number of connected systems." For integration this practically means that if you have more than just a few endpoints (3 to 4), there needs to be a better design than point-to-point.
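
The scaling difference is easy to see in a quick sketch (plain Python, illustrative only): point-to-point connections grow quadratically, while a hub keeps growth linear.

```python
def point_to_point_connections(n: int) -> int:
    """Directed integrations needed to fully connect n systems.

    Each ordered pair (A -> B) is a separate integration to build
    and maintain, so the count is n * (n - 1).
    """
    return n * (n - 1)


def hub_and_spoke_connections(n: int) -> int:
    """With a central hub, each system needs one bidirectional link
    to the hub: 2 * n directed connections in total."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, point_to_point_connections(n), hub_and_spoke_connections(n))
# 2 systems:  2 vs 4;  10 systems: 90 vs 20
```

At two endpoints point-to-point is actually cheaper, which is why it is a reasonable starting point; the crossover comes quickly after that.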

Why Point-to-Point Integration Leads to Hardened Silos

Most business intelligence analysts agree there is some need for customizing data integration and report generation by individual departments, not only because the goals of each department can differ, but also because they’re often using different SaaS platforms to help achieve those goals.

Marketing, sales, and CRM divisions may rely on Salesforce.com, for example, while accounting may be more comfortable with NetSuite as their SaaS platform for accounting. Challenges arise when an organization fails to develop a central data hub, or static data bus, and establish policies and uniform practices to guide how every department replicates and integrates data.

Data managed by one application may be interpreted differently by another application used by another division. Translating both semantics and codes from one application to another requires a deep understanding of the data and use cases within both applications.

That leaves business intelligence analysts with no visibility into the full lineage of an organization’s data or the relationships between data flows across the enterprise. It also adds to distrust as no one department can be sure data quality rules are being applied uniformly in other parts of the organization—or that everyone is speaking the same language when it comes to the semantics of integration.

Silos are almost inevitable in mid to large size organizations but without a central data bus and well-articulated policies guiding how that central data hub should be used, those silos can become hardened and end up serving as barriers to the effective sharing and analyzing of data.

Hub and Spoke Integration Model

Instead of point-to-point and project-specific tactics, organizations are finding data integration and data management can be a far more efficient and effective process with a hub and spoke architecture. The use of a central data hub that every department and business intelligence analyst must connect through mandates the adoption of company-wide policies on data formats and semantics, as well as the use of uniform replication and ETL tools.

With this approach there is some hub that controls all the integration points; it is the job of each system to connect to the hub, and the job of the hub to keep each system updated. The huge benefits of this approach are very good data quality and a model that is simple to understand and implement. The Hub-and-Spoke model ensures that data changed in the hub is automatically sent out via the spokes to every system, so every system has access to the same information. No longer is the customer record in Salesforce different from the customer record in Oracle.


But the Hub-and-Spoke architecture also provides another significant benefit. It frees the sender and receiver of the information from needing to know about each other. The hub keeps track of all of this, and as a sender of information I send it to the hub, and as a receiver I just wait for a message from the hub. I don’t need to worry about who sent it and how to get back to them. This also has a great impact on audit and logging where the hub can take care of all these functions since all information is flowing through the hub.
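
That decoupling can be sketched in a few lines of Python (a toy model with invented names, not a real product API): senders publish to the hub, and the hub fans each message out to every registered spoke while keeping a central audit log.

```python
from typing import Callable


class Hub:
    """Toy message hub: spokes register a callback; senders publish
    to the hub, which fans the message out and logs everything."""

    def __init__(self) -> None:
        self._spokes: dict[str, Callable[[dict], None]] = {}
        self.audit_log: list[tuple[str, dict]] = []

    def register(self, name: str, handler: Callable[[dict], None]) -> None:
        self._spokes[name] = handler

    def publish(self, sender: str, message: dict) -> None:
        # Central audit: every message flows through the hub.
        self.audit_log.append((sender, message))
        for name, handler in self._spokes.items():
            if name != sender:  # don't echo back to the sender
                handler(message)


# Usage: two systems stay in sync without knowing about each other.
hub = Hub()
crm_records: list[dict] = []
erp_records: list[dict] = []
hub.register("crm", crm_records.append)
hub.register("erp", erp_records.append)
hub.publish("crm", {"customer": "Acme", "status": "active"})
print(erp_records)  # the ERP spoke received the CRM's update
```

Neither spoke holds a reference to the other; adding a fourth or fifth system is one `register` call, not a new mesh of point-to-point links.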

But, alas, there are downsides to this model: every message has to go through the hub, so the hub can become a bottleneck and a single point of failure. As the number of endpoints increases there is a need to get off the bicycle and onto a bus!

Hub-Centric vs. Point-to-Point Integration

One of the biggest challenges any ambitious organization will face is figuring out the right data integration models for when they’re just starting out, and also for when they’ve expanded their operations through the addition of new personnel, departments and SaaS providers.

Naive organizations often assume that the point-to-point integration model that worked well when they were small can simply be tweaked for larger data needs. But that ends up as a blueprint for failure, especially with organizations that have expanded to the point where there are multiple departments all using different SaaS platforms with different APIs.

A point-to-point approach can also promote a silo mentality among departments that can make it harder to develop data-related best practices that work across an entire organization.

Finally, this model requires expertise in both the source and target APIs, data models, and business rules. Finding people who understand two or more SaaS technologies is much more difficult, and the resulting solution is inherently more fragile.

As a 2011 Gartner Group Research Study authored by analyst Ted Friedman sums up, “Point-to-point and application-/project-specific approaches to data integration increase costs, create redundancies, introduce governance challenges and complicate change management.”


Enterprise Marketing Advisor and commentator Mark Herring points out the only reason that any organization should ever stick with point-to-point integration is if they’re not anticipating any growth at all. As Herring succinctly explains: “Success will become your biggest enemy! The simplicity of Point-to-Point goes out the window as soon as you add another system to the mix.”

Point-to-point in a growing operation often equals redundancy and waste with each department purchasing their own off-the-shelf solution to perform specific integration tasks that may work for one project, but are ill-suited for future jobs and are often incompatible with the tools being used by other divisions.

As Herring notes, “The approach is very simple to understand and implement. The Hub-and-Spoke model ensures that data changed in the hub is automatically sent out via the spokes to every system and every system has access to the same information. No longer is the customer record in Salesforce different to the customer record in Oracle.”

A move to a hub and spoke approach means fewer data-oriented interfaces within an organization. Instead of having to learn and support interfaces that may only be used by one division for one short-term project, IT departments are instead freed up to focus on effective organization-wide data management and support.

The use of a hub and spoke integration approach also means better data quality. Gartner’s Friedman explains, “Data quality rules can be applied to data flowing through the hub, ensuring that data quality issues are identified and addressed before data is delivered to consuming applications and processes.”

The ability to better monitor data flow through a centralized hub also gives companies greater insights into how each department is using data to develop business intelligence. Connecting the departments in an organization through a data hub ends up breaking down barriers and silos, leading to greater inter-departmental coordination and better business insights.

Herring suggests one other major advantage to organizations: "It frees the sender and receiver of the information from needing to know about each other. The hub keeps track of all of this, and as a sender of information I send it to the hub, and as a receiver I just wait for a message from the hub. I don't need to worry about who sent it and how to get back to them. This also has a great impact on audit and logging where the hub can take care of all these functions since all information is flowing through the hub."


Optimizing Hub and Spoke Data Integration

Getting the most out of a hub and spoke approach requires appropriate data integration and replication solutions. Data replication modules are purpose-built, leveraging each popular SaaS provider's API in ways that one-size-fits-all ETL products cannot, and their scalability and performance features let data volumes grow as an organization's information needs grow. That means a business intelligence analyst can replicate data from Salesforce.com to the integration hub, transform it to NetSuite's format with an ETL tool, and then have that same data sent to a NetSuite accounting system. When you want a full copy, use a replication product with automation of schema creation and interval copying built into the tool. When you need to transform data, use a pure Extract-Transform-Load tool.
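
As a hedged illustration of that replicate-then-transform step (the field names below are invented for the sketch; real Salesforce and NetSuite schemas differ), the transform stage can be as simple as a declarative field mapping from one system's record shape to the other's:

```python
# Illustrative mapping from a hypothetical Salesforce-style customer
# record to a hypothetical NetSuite-style one. A real ETL tool would
# also handle type conversion, lookups, and code translation.
FIELD_MAP = {
    "AccountName": "companyname",
    "BillingCity": "city",
    "AnnualRevenue": "revenue",
}


def transform(sf_record: dict) -> dict:
    """Rename fields per FIELD_MAP, dropping anything unmapped."""
    return {dst: sf_record[src] for src, dst in FIELD_MAP.items()
            if src in sf_record}


row = {"AccountName": "Acme", "BillingCity": "Oslo", "Id": "001"}
print(transform(row))  # {'companyname': 'Acme', 'city': 'Oslo'}
```

Keeping the mapping in one declarative table, rather than scattered across point-to-point scripts, is what makes the hub's semantics auditable.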

If the business logic for data integration is done in native SQL, it is not only transparent and easily understood by the people who actually end up doing the work, but getting data back to each department's SaaS platform can also be done with straightforward copy commands. Using SQL, the common language of databases, also means greater consistency in integration and replication, making it easier to maintain high data quality. Testing, sanity checking, data cleansing, and error checking can be done in the database prior to replicating data to the cloud. This also gives integration developers the ability to produce error reports, as data quality issues will be reported repeatedly until replication to the cloud succeeds. Point-to-point integration generally does not allow for a feedback process to correct data quality errors: if you don't catch the errors the first time, the records never make it through the system. In contrast, error flags in warehouse records and matching reject-log table entries indicate which records are having issues.
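
A sketch of that error-flag-and-reject-log pattern, using SQLite as a stand-in warehouse (table and column names are illustrative): a quality rule flags bad records, logs them to a reject table, and only clean rows are selected for replication to the cloud.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_customers (
        id INTEGER PRIMARY KEY,
        email TEXT,
        error_flag INTEGER DEFAULT 0);
    CREATE TABLE reject_log (customer_id INTEGER, reason TEXT);
    INSERT INTO warehouse_customers (id, email) VALUES
        (1, 'a@example.com'),
        (2, NULL);
""")

# Data quality rule: every customer must have an email address.
conn.execute(
    "UPDATE warehouse_customers SET error_flag = 1 WHERE email IS NULL")
conn.execute("""
    INSERT INTO reject_log (customer_id, reason)
    SELECT id, 'missing email' FROM warehouse_customers
    WHERE error_flag = 1
""")

# Only clean records are replicated to the cloud; flagged rows stay
# behind with a logged reason and get re-reported on the next run.
clean = conn.execute(
    "SELECT id FROM warehouse_customers WHERE error_flag = 0").fetchall()
print(clean)
```

Because the flag persists in the warehouse, the same bad record keeps appearing on the error report until someone fixes it, which is exactly the feedback loop point-to-point pipelines lack.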

The case for a hub and spoke approach to data integration is already quite compelling — and the use of a product or approach that facilitates the automated replication and integration functions makes it even more so.