Physical Database Design

The physical design of your database optimizes performance while ensuring data integrity by avoiding unnecessary data redundancies. The task of building the physical design is a job that truly never ends. You need to continually monitor the performance and data integrity as time passes. Many factors necessitate periodic refinements to the physical design.

In this article, we will discuss the concept of how physical structures of databases affect performance, including specific examples, guidelines, as well as best and worst practices.

 

Physical Database Design Process

Physical database design is the process of transforming a data model into the physical data structure of a particular database management system (DBMS). Physical Design is accomplished in multiple steps, which include expanding a business model into a fully attributed model (FAM) and then transforming the fully attributed model into a physical design model. 

Conceptual, Logical, and Physical Data Models

You begin with a summary-level business data model that’s most often used on strategic data projects.  It typically describes an entire enterprise, which allows you to understand at a high level the different entities in your data and how they relate to one another.  Due to its highly abstract nature, it may be referred to as a conceptual model.

Common characteristics of a conceptual data model:

  • Identifies important entities and the high-level relationships among them.
  • No attribute is specified.
  • No primary key is specified.
  • Complexity increases as you expand from a conceptual data model.

A logical data model, otherwise known as a fully attributed data model, allows you to understand the details of your data without worrying about how the data will be implemented in the database. A logical data model will normally be derived from and or linked back to objects in a conceptual data model. It is independent of DBMS, technology, data storage or organizational constraints.

Common characteristics of a logical data model:

  • Includes all entities and relationships among them.
  • All attributes for each entity are specified.
  • The primary key for each entity is specified.
  • Foreign keys (keys identifying the relationship between different entities) are specified.
  • Normalization occurs at this level.

The steps for designing the logical data model are as follows:

  • Specify primary keys for all entities.
  • Find the relationships between different entities.
  • Find all attributes for each entity.
  • Resolve many-to-many relationships

Finally, the physical data model will show you exactly how to implement your data model in the database of choice. It shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. The target implementation technology may be a relational DBMS, an XML document, a spreadsheet, or any other data implementation option.

Common characteristics of a physical data model:

  • Describes data requirements for a single project or application.
  • Specifies all tables and columns.
  • Contains foreign keys used to identify relationships between tables.
  • Physical considerations may cause the physical data model to be different from the logical data model.

The physical design is where you translate schemas into actual database structures.  You transform the entities into tables, instances into rows, and attributes into columns.  At this time, you have to map:

  • Entity to Table
  • Attribute to Column
  • Primary Key and Alternate Key to Unique Index
  • Index to Non-unique Index
  • Foreign Keys to Non-unique Index

Transformation

  • Choosing a physical data structure for the data constructs in the data model.
  • Optionally choosing DBMS options for the existence constraints in the data model.
  • Does not change the business meaning of the data.
  • First transformation of a data model should be a “one to one” transformation.
  • Should not denormalize the data unless required for performance.
  • Based on shop design standards and DBA experience & biases.
  • Entity Subsetting – Choosing to transform only a subset of the attributes in an entity.

 

data-warehouse-architecture_entity_subsetting

 

  • Dependent Encasement – Collapsing a dependent entity into its parent to form a repeating group of attributes or collapsing a dependent entity into its parent to form a new set of attributes.

 

data-warehouse-architecture_dependent_encasement

 

  • Category Encasement
  • Category Discriminator Collapse
  • Horizontal Split
  • Vertical Split
  • Data Migration
  • Synthetic Keys – The merging of similar entities using a made up key. Always loses business vocabulary and is never more than BCNF. Used a lot in packages to make them easily extendable.

 

data-warehouse-architecture_synthetic_keys

 

  • Adding Summaries – Adding new entities that are each a summary of data upon a single level of a dimension.

 

data-warehouse-architecture_adding_summaries

 

  • Adding Dimensions

 

data-warehouse-architecture_adding_dimensions

 

It is often necessary to apply multiple transforms to a single entity to get the desired physical performance characteristics. All physical design transformations are compromises.

 

data-warehouse-architecture_adding_dimensions_post_transform

 

Database Definition Language (DDL)

Except for data cubes, a one to one translation of the Physical Model into the specific DDL of the database:

  • Entity to Table
  • Attribute to Column
  • Primary Key and Alternate Key to Unique Index
  • Index to Non-unique Index
  • Foreign Keys to Non-unique Index 

Data Cubes

Data cubes are the exception to the DDL one to one translation because:

  • Proprietary physical databases
  • Have their own structure definition language
  • Have their own data loaders

Watson-Watt’s Law of Third Best

  • Best never comes.
  • Second Best takes too long.
  • Identify the Third Best -the design that can be validated in time to meet an identified need and get on with it.