Where do you begin with your data warehouse? How do you know who needs it, why they need it, what they want in it, where you’re going to get the data, and how you’re going to load it? It’s probably best to get requirements before you start coding.
Warehouse design falls into two prime areas of responsibility: logical and physical design. Logical design concerns the broader aspects of the design and focuses on the relationships between objects. Physical design concerns the details of how information is stored and transferred. The physical design cannot be done without an understanding of what data is used by the business, what data is needed to accomplish new business objectives, who will use it, and how it will be used. Our purpose in developing a business data model is to discover and document the structure and meaning of data.
- Data modeling is business analysis. Data Modeling is a business analysis process that typically consists of 90% discovery and understanding, and 10% documentation.
- Data modeling is business driven. It is based on business rules from the business community NOT from data processing.
- Data Modeling is more than drawing data models
- Data modeling is a practice, not a methodology. Data Modeling may — but need not be — integrated into an SDLC which involves process modeling or other analysis practices.
- Business data modeling is a business analysis process, not a design process!
- Do not judge a model as good or bad. Do judge a model as True or False
- Structure of a data model: All the facts shown in the data model must be functionally dependent on their key, their whole key, and nothing but their key!
- What does a data model look like? Typically consists of a graphic “blueprint” and definitions of the data components.
Model Acquisition Steps
First, identify key people and inform participants of the need to get together. This is a task that is going to take serious time and focus from the business community. Hold a scope session and set scope as business functions, data subjects, history depth, data quality, and integration with other models. Schedule and conduct acquisition sessions for topics in scope. Get commitment by the players to spend the time and minimize distractions. Source the attributes. Conduct review sessions.
Model acquisition is an iterative process, not a linear one. The process of conducting data model acquisition is to:
- decide what the topics will be (function/data)
- discuss only one topic at a time
- open the discussion of topic
- work toward group consensus on result
- determine recommended participants
- determine recommended procedure and steps
This is a learning experience, a communication vehicle for user driven decisions and group consensus on results. What the user community learns about its own business through this process can be as important as the input into the design of the data warehouse. It is an opportunity to discover key conflicts or different points of view within the organization, such as “what business are we in,” “is our focus short or long term,” or “what is our end game?”
Acquisition Session Participants
- The Sponsoring Organization Management (Business area) schedules the sessions, provides requirements and knowledge of business functions, provides a Session facilitator, and provides a tie breaker.
- Subject Matter Experts provide detail knowledge of the function.
- Application Team Functional Analysts provide knowledge of the function, and provide or act as scribe(s) to record function results
- Data Analysts analyze the data, provide the session moderator, and provide or act as scribe(s) to record data modeling results
Acquisition Session Responsibilities
- Facilitator (Business area assigned) leads the discussion about the topic and keeps the conversation centered on the topic being analyzed
- Tie Breaker (Business area assigned) makes the final business decision on any business dispute, and may escalate decisions if absolutely necessary
- Moderator guides the group through the modeling steps and develops and draws the model on a large surface visible to all.
- Scribe(s) are responsible for the permanent recording and publishing of the session results
Acquisition Session Logistics
- Should schedule no more than 5 hours a day.
- No more than 20 hours should be scheduled a week.
- If more than 5 hours a day are scheduled then schedule them at least every other day.
- Have proper facilities.
Acquisition Session Tasks
- Record Knowledge of the Topic. Organize everyone’s thoughts, ideas and facts before diagramming process begins. This is an information exchange and education opportunity.
- Actually build the Data Model
- Question understanding of the information to arrive at a picture and definitions which represent the data needs of the topic
- Refine the model to be sure that it represents the consensus of the group
- Moderator “reads back” the data model
- Subject matter experts provide “scenarios”
- Moderator “walks the model”
Benefits of Data Modeling
- Maximum flexibility is gained by storing the data structurally as close to its business definition as possible.
- A practical design approach — especially for relational databases.
- Can develop the parts in the context of the whole.
- Understandable by all.
Uses of Data Modeling
- Provides a top-down view of information resources as they support strategic business functions.
- Facilitates system integration and re-engineering by identifying common data.
- Identifies data requirements in support of software package purchase analysis.
- Provides a foundation from which to develop solid, expandable databases.
Common Corporate Vocabulary
- What business are we in?
- What is the core purpose of the business – products, services, “marketing intermediary”, charity / NGO, social, et cetera?
- Share knowledge about what data exists
- Avoid building redundant systems
- Agree on common terminology
Document Business Rules and Policies
This is the application layer, where procedural logic is applied to transactional data to understand what stage in the business process it is, when exceptions need to be handled, what pricing to apply, etc.
Document the Relationships between Data
This is where logical entities that model real world relationships are grouped. The classical example would be a customer, its offices, its staff members, its financial transactions (estimates, orders, invoices, and credits), line items of each of the financial transactions, products, etc. The physical data model may involve more complexity than the business data model in order to operate efficiently, but at this point, we only want to capture a business oriented view of the data.
Define the Domains of Data
What values are allowed in each field? What is a legal value, and what do you do when you find one that doesn’t follow the rules? Such data exceptions can cause information to get “stuck in the system,” with loss of functionality or revenue or customer delivery.
Determine Data Sources
- Contact business people to determine where the real data lives
- Look for non-computerized information
Determine the Quality of Data
If the data doesn’t match the expectations of the business, what do we do with it? Address cleansing to postal standards is a common application. Anything that can be keyed in directly by a human is suspect, especially if the source system doesn’t have tight edits or those edits are applied in the user interface after much of the data is created over time. A provisioning system where the requested date of service is sometimes prior to the order date is symptomatic of favoritism to customers wanting to jump the queue, but is terribly confusing to someone trying to understand the work flow of the service organization. Data that is imported from spreadsheets also has a high potential for incorrect mappings (postal code mapped to the state code) or non-standardized values (US, USA, U.S., United States being logically the same thing).
Determine User Needs
The first step in analyzing data architecture requirements is to assess the deliverables required to achieve business goals. Once this is determined, the systems sourced for data must be analyzed to determine the best way to fulfill requirements. What is the system of record? Determine User / Group / Departmental access requirements to systems, objects / tables, and fields. Determine Access Styles for bulk, incremental, historical, level of detail and latency.
Business Data Model acquisition is a process of learning about the business, not structuring data based on physical design objectives. The process is as important as the results. The logistics of interdepartmental cooperation and the intensity and duration of the meetings requires great care and commitment by the business community, but the rewards extend far beyond constructing a data warehouse.