Integrating vendor package applications should be a breeze. You have complete online documentation, API’s, lots of metadata, great vendor support, and a plentiful supply of consulting experts who are competing to get your business. What could go wrong?
The mantra of Business Process Reengineering began in the 1990’s with the rise of major new vendor applications to handle common financial and manufacturing functions. This reengineering involves the redesign of core business processes, with the stated purpose of improving productivity and quality. Companies start with a clean slate and rethink existing processes to deliver more value to the customer. The drivers of this were application vendors like Oracle and SAP, and the big consulting firms, who have made hundreds of billions of dollars changing the world, one patch release at a time. Whether reengineering business processes was a need of the business to improve or a need to conform to the design parameters of shiny new vendor applications that replaced archaic systems, we’re there. Over time, people found their careers changing from being developers of custom applications to being system integrators.
Preventing Vendor Lock-In
The first thing that buyers of vendor applications want to know is – will they actually work? And if the vendor applications don’t work for your business or the vendor can’t support their customers, goes out of business, gets bought by a competitor, or abusively raises their annual fees – then what? And what are your corporate goals? If you want to sell your company and your data is locked up by your Cloud vendor, you’ve created an obstacle to acquisition that could be the difference between a payday and a bankruptcy.
The ideal strategy would be if your decision makers had prior experience with every candidate solution before they made the major decision to change key systems. That is never going to happen. So to get in, you have to be able to get out. Vendor lock-in is to be avoided at all costs, if you at all value having a reliable infrastructure for your business and the ability to change systems according to your business needs.
Application vendors have varying support for data exports. Perhaps your CRM will produce simple lists in CSV format but will not dump the data in any way that could be used to migrate to another CRM. Desktop application vendors have been doing this for years, and some – but not all – Cloud vendors do it too. So if you are to have the peace of mind from being able to escape vendor lock-in, find a third-party solution that will give you a useful export of your data. The export can be either to a database schema or, at a minimum, CSV files. You’ll find other uses for a database data warehouse, if you go that route.
Untangling the Vendor’s Database Design
In some cases, the physical implementation of the vendor’s back-end database isn’t even accessible. Legacy applications used proprietary or embedded databases that were not accessible with ordinary ETL or reporting tools. API-based Cloud applications typically do not give you direct access to the physical database; the API layer hides the physical structure with a metadata-driven logical view. One cloud ERP vendor represents all the repeating data wrapped in XML, so that a single record returned by the API contains all the child records within it in long text fields. Another ERP vendor never updates database records; all writes to the database are new records with date stamps so you have all versions of a financial record. This makes navigating the data for reporting a challenge because you have to find the most recent version of each business record to get the current view of the data, but it’s great for auditing purposes.
Custom Objects and Fields
Most vendor packages allow for varying degrees of user customization, with new fields and objects – collections of fields – in the platform’s data dictionary. One CRM vendor stores up to 500 standard fields in one database table for all objects, and up to 500 custom fields in another database table. To get a single record, you use the object type and GUID (Global Unique Identifier) to query the standard fields, and join that to the corresponding row of the custom fields table. This means that the database structure itself is never physically modified, only the content. All data is stored as text, with data and numeric fields converted as appropriate in the user interface or in computations. The data dictionary describes the internal and external names, type, length, and edit mask of each column in the database. If you had access to the back-end database, it would not make sense without the data dictionary.
Documentation for a vendor package can consist of external documentation, an active data dictionary, a metadata layer, or just meaningful object and field names exposed in the code. Although external documentation such as web pages or PDF files may contain a lot of detail, the danger is that it may be obsolete or inaccurate. If the product uses an active data dictionary – one that is instrumental to the functioning of the product and not just a place to store information for business analysts – then you at least know that the dictionary definitions match reality. If programmatic code can get to the documentation through a metadata API layer, then you can automate many processes, such as building a database schema that matches the source system’s logical view. In the worst case, you might be able to see the field names by such techniques as Java reflection, which examines hard coded names in Java code provided by the vendor; then you can automate the standard fields but need a separate method of automating custom fields. All of these levels of documentation can lead to automation, but you’ll have to be adept at using whatever you can get.
Vendor application packages typically use proprietary API’s (Application Programming Interfaces) for many tasks. When the application is in the Cloud and you have no database access, there has to be an API for reading and writing data, and often times for the metadata layer. If you are updating any vendor application’s database, you will probably be doing this through proprietary API’s so all the business logic and data editing is properly performed. This is especially true for accounting packages, which have to create and balance transactions with integrity. If you have access to a relational database for an on-premise solution, you can use ordinary JDBC or ODBC to access the data and the database metadata. JDBC and ODBC are non-proprietary API’s that work with relational databases, but have also been adapted to non-relational sources by clever vendors who realize the value of these universal interfaces.
Cloud vs. On-Premise Integration Factors
The biggest difference to you as an integrator between Cloud and on-premise applications is that you can’t just write database queries if you don’t have access to the database. Everything is easier if you have a SQL database. You can see the schema and the data with ordinary reporting and ETL tools, as well as the administrative or programming tool set provide by the database vendor. Performance is faster, since you don’t have to drag data over the internet. You don’t have to learn a new API just to write reports. Queries don’t time out when you ask for millions of records. All of the benefits of owning a Cloud application are to the detriment of the integration developer. Many cloud vendors even bash IT departments and their hopeless backlogs of work – until the business demands that the vendor open up their architecture to customization and integration. Salesforce.com used to prominently display a graphic with the word “Programming” and a red slash through it – until customers demanded a programmatic interface.
Once you have learned a Cloud vendor’s API, the bottlenecks of performance and scalability are your enemy. Everything takes longer over the internet. SaaS multi-tenant architecture means first and foremost that you only get a slice of the API server’s resources. There are gates and governors and enforced timeouts on all queries. Some API’s limit how many records can be queried or updated in a single transaction. Some API’s will force a timeout if you query too much data, or will die halfway through the query due to system instability during the hours it takes to retrieve millions of records. One application vendor shuts down the system at 5pm every Thursday night for maintenance, leaving any processes hanging. Your Cloud application vendor has internal reports they use to identify customers that put excessive load on the system, and you can expect a phone call if you are a resource abuser.
Using a Data Warehouse as an Integration Hub
For all these reasons, having a data warehouse that manifests the data in a more obvious structure and format makes sense for reporting and integration. The translation and contextualization of the raw data can be done once. The physical structure can use the names provided by the platform’s data dictionary and the numeric, date, or text data types most appropriate to the data. If the data dictionary is insufficient or doesn’t even exist, perhaps there is documentation. Some vendors provide a pre-configured warehouse that makes this translation for you using a hard coded schema and pre-wired ETL mappings. Failing all these possibilities, you can model the data by inspection and do your own data mapping in an ETL tool.
You may also choose to use a different structure entirely than the vendor’s physical or logical view. If you’re into star schemas, you will spend weeks or months reengineering the data into your preferred data warehouse style. Given the IT backlog, the supposed advantages of a star schema may take a back seat to more of a schema cloning process, if that can be automated. It all depends on the resources and time you have available, and how valuable any transformations would be to your user community.
Integrating vendor application systems is generally harder than custom applications. You’ll have to learn proprietary API’s. You’ll have to fight the physical limitations of transporting data over the internet. Often very different data values are used to identify the same information. You’ll be dealing with the complexity of a solution designed to handle every kind of customization and the sum of requirements from all the vendor’s customers, including database objects for which you may have no use. It is likely that you’ll appreciate the opportunity to learn a new skill that makes you more professionally marketable, but meanwhile, things will be harder than using an on-premise database designed with just one company’s needs in mind.