datagility 2 – data centricity

If we do not take determined action, the future of our organisation’s system landscape will become increasingly unmanageable, and irredeemably fragmented.

This is the second blog in the Data Topic series based on the book datagility. The previous blog is datagility 1 – data simplicity.

In Data Simplicity we described the importance of events and process definitions, but how can we make sure that, for instance, we respond effectively to a prospect’s request for a product? This seems far too simple to be a real challenge. And yet for many larger organisations, understanding something as basic as their client base has become problematic.

For example, recognising that a prospect for a new product is an existing client can be surprisingly difficult. This can arise when their data is captured by different functional areas, or when each view of their clients sits in a business line’s siloed system.

Even though our organisations may have access to more data than ever before, the possibility of them gaining beneficial holistic insights can become increasingly remote.

And this problem is unlikely to disappear any time soon. We know that for the majority of our organisations, a heterogeneous system landscape is inevitable for the foreseeable future.

What we describe as the fragmentation of our data would be more accurately described as the fragmentation of the shared understanding of our data.

Fixing this problem relies on defining and sharing a universal understanding of our data as a first step. These shared definitions will create shared mental models of our data. Such models are absolutely fundamental to our organisation’s survival and key to delivering true data agility.

“We don’t need models anymore!”

Technologists repeatedly claim that modelling our data is no longer required because of ‘something’ they advocate involving technology. This implies that technology can somehow understand our organisation’s events, operations and strategic aspirations in a way that means we no longer need to bother understanding them ourselves.

The promise is that the machines will do it all for us!

There is some merit to this argument, but only within the confines of analysing a legacy system’s data representation where there is little documentation or SME support available.

What it cannot do though, is understand the areas for which the existing systems have poor or non-existent fit.

We must derive the business understanding of our data from our business operations and strategy.

To share the data across the organisation demands that we use a common language. We cannot allow each engineer to make up the language of our data based upon their immediate requirements.

Imagine the same approach at an international congress of the World Health Organisation where all the delegates only spoke their own language and no translators were available. How much progress on any issue could we expect? The language barrier would quickly bring any collaboration to a complete standstill! Yet often we allow exactly this approach to prevail in our organisations’ system landscapes! 

Shared Communication Language

In Data Simplicity we described events within the context of the organisation’s data definitions. For example, we tend to think of our event data in the context of orders, clients, payments or products.

These data definitions form the basis of our language. We call these data terms business entities and we call their characteristics their attributes.

However, there is more to our data definitions than simply the definition of its business entities. In addition to defining the organisation’s business entities, we need a way to document and share the associations between them. It is precisely to provide this capability that structural data models come into the picture – quite literally.

Geographical maps link the physical world’s landscape elements to each other. In the same way we need to link our business entities to each other and construct a data definition landscape. Structured data models allow us to construct data landscape maps that define, share and allow agreement of the structures of the required business data for our organisations.

Levels Of Business Data Models

So far we have implied that our structured data models are a single homogeneous set. But this is a simplification. There are multiple levels of our Business Data Models.

The highest level is typically referred to as a conceptual data model. This should be a definition of the ‘Organisation’s data on a page’. The conceptual business entities it contains must be agreed with the business and must be described in business and operational terms. The following diagram illustrates a typical conceptual data model that you may want to compare with your own business.

Figure 1 – A Typical Conceptual Data Model

Constructing these models should be rapid: the first cut should be agreed in half a day. And although they must not be allowed to become stale, ongoing updates should be infrequent and very limited in scope.

The more detailed level of data model must be derived from this high-level conceptual data model. Each of the high-level entities would probably become a relatively complex set of fine-grained structures as more detailed analysis is carried out. Each one should therefore be allocated its own diagram often called a subject area. The following diagram illustrates this technique conceptually.

Figure 2 – Developing Business Logical Models

Sharing Our Business Entity Definitions

The following diagram illustrates a very simple model fragment of a Business Logical Data Model.

Figure 3 – A Simple Client Product Model Fragment

Without getting hung up on the detail of the visual syntax, I hope you can see that this simple fragment describes the basic data rules that:

“Each Client can have one or more Product Subscriptions allocated to them”

And that:

“Each Product is consumed via one or more Product Subscriptions”
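As an illustrative sketch only (the entity and attribute names here are hypothetical, not taken from the book), the two rules above could be expressed as simple object structures, which also shows that the rules are independent of any database technology:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the model fragment: each Client can hold one or
# more Product Subscriptions, and each Product is consumed via one or
# more Product Subscriptions.

@dataclass
class Product:
    name: str

@dataclass
class ProductSubscription:
    product: Product          # every subscription consumes exactly one Product
    start_date: str

@dataclass
class Client:
    name: str
    subscriptions: list[ProductSubscription] = field(default_factory=list)

gold = Product("Gold Account")
alice = Client("Alice")
alice.subscriptions.append(ProductSubscription(gold, "2024-01-01"))
```

Note that nothing here mentions tables or keys; the same associations could be realised in a relational, document or graph store.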

But the model syntax requires some skills to decipher and may be held in a data modelling tool that typically has access limited to only a few stakeholders. It can therefore present a communication barrier to all but a limited audience.

In the past I have used a Data Lexicon (think Wikipedia) to publish the models’ definitions to a wider audience. This approach can use richer descriptions in terms of linking to intranet and extranet resources, videos, images and presentations.

Technology Agnostic

Many technophiles dismiss data models as anachronistic, as impediments to writing code, and as somehow inextricably linked to relational databases. But notice that none of the preceding description has mentioned relational databases, or indeed, any technology at all. The models provide an essential tool for capturing, agreeing and disseminating the business language of our data.

Figure 4 – Data Definitions Must Be Technology Agnostic

But this business language must be technology agnostic as illustrated in figure 4.

Business Processes <=> Business Data

As a part of defining our process universe at a low-level, we should also map the data usage of each low-level process.

Figure 5 – Mapping Data Usage To Processes

This exercise is essential so we can understand how the two universes interact, but it also provides validation of each. In other words, do we have all the data structures we require to support all our processes, and does every process have valid data usages?

We can quickly spot areas of no overlap and address these.

For example, where we have data definitions with no corresponding processes, does this indicate that there are missing process definitions, or does it indicate that in fact we don’t really need to record and use the data defined in the data model?
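This cross-check is mechanical enough to sketch in code. A minimal example, assuming illustrative entity and process names (not from the book): set differences between the data universe and the processes’ data usages expose exactly the gaps described above.

```python
# Hypothetical data universe: entities defined in the Business Data Model.
entities = {"Client", "Product", "Product Subscription", "Payment"}

# Hypothetical process universe: each low-level process mapped to the
# entities it uses.
process_usage = {
    "Register Prospect": {"Client"},
    "Subscribe To Product": {"Client", "Product", "Product Subscription"},
}

used = set().union(*process_usage.values())

# Entities with no corresponding process: missing process definitions,
# or data we don't really need to record?
unused_entities = entities - used

# Processes referencing undefined entities: gaps in the data model?
undefined_usages = used - entities

print(unused_entities)   # {'Payment'}
print(undefined_usages)  # set()
```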

Agile Data Language

But if we are to define the language of our organisation based upon structural data models, how do we prevent them impeding rapid change? The answer is that our data language models must adopt a modelling style that uses ‘fixed’ structures while supporting dynamic agility in the data’s definitions.

Many years ago I developed a style I called rule-based modelling[1] and its approach can be truly transformational.

Rule-based modelling provides a metadata driven approach to data modelling. Data definitions can rapidly be adapted, not by changing the models or their implementations, but by simply modifying the metadata stored in the models’ physicalised structures!

Figure 6 illustrates the technique for an OCR implementation using AI/Machine Learning capability.

Figure 6 – Rule Based Modelling Example

For each Document Type we can record the set of Document Type Attributes that it contains. So for a passport for example, this might include: holder’s name, passport number, country of issue and date of issue.

When we scan the actual documents, the OCR processing will record the actual Document Attribute values for each one of these defined Document Type Attributes. This is shown schematically in figure 7.

Figure 7 – Rule Based Modelling AI/ML Implementation

Now the route to agile change becomes one of providing and governing the metadata. If we want to process a new Document Type e.g. U.S. Driver’s Licence, we can simply create the metadata to define it and its associated Document Type Attributes. When the OCR machine learning (ML) is fed actual Licences’ data, it can learn how to extract the attribute data contained in them. Humans can seed the metadata and use this to train the ML.

Recording The Organisation’s DNA

Our Business Data Models allow us to define our data using models. These models are themselves data – think of the list of our Entities and their Attributes.

We use the term metadata to describe the data that describes our data. Hence our Business Models are truly metadata models of our operational and business data.
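To make this concrete with a hypothetical fragment (entity names are illustrative): the model itself can be held as ordinary data, which is exactly what makes it metadata.

```python
# The model is itself data: a minimal metadata record of two business
# entities and their attributes (names are illustrative).
business_model = {
    "Client": ["client name", "date of birth"],
    "Product": ["product name", "launch date"],
}

# Querying the metadata: which entities does our model define?
entity_names = sorted(business_model)
print(entity_names)  # ['Client', 'Product']
```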

Often organisations consider their metadata with some indifference – if at all. But a data-centric organisation should see the true significance of its metadata as recording its Intellectual Property (IP).

In fact, a truly data-centric organisation would value its metadata far more highly than almost anything else. Capturing or using any data without fully understanding its meaning and context compromises its benefit.

If an organisation’s metadata is in good shape, everything else can easily be fixed. But if the metadata is broken, then because everything else depends on it, it will be inordinately difficult to get anything else right!

Although metadata has a complex landscape of its own, we tend to segment it according to each domain’s maintenance processes, or specialist functions such as CDO, legal, or finance. The following diagram illustrates a typical universe of metadata.

Figure 8 – A Typical Metadata Universe

Also notice that in this schematic, the business data model is placed as the hub, or the centre of the universe of the metadata. It makes sense to take this approach because the business data model embodies the overall definitions and structures of all our data. This makes it the natural lynchpin for the rest of the metadata universe.

We will return to the way that this metadata universe can be captured, maintained and used to drive the system landscape in the coming Data Topics.


What we have learned in this blog is that we need to

  1. take control of the data definitions of our organisations
  2. ensure Business Operational stakeholders define our data
  3. use Business Logical Data Models to fully define, share and agree our data definitions
  4. ensure our data definitions are technology agnostic
  5. map our data to corresponding low level operational process definitions
  6. adopt a Rule-based Modelling style to drive data agility from our model definitions
  7. use our Business Data Model as the central hub of our metadata universe

The next blog in this Data Topic is datagility 3 – data driven delivery, where we will learn how to use our data definitions to drive the system landscape delivery.

[1] For more detail on this technique refer to “The Data Model Toolkit – ISBN: 978-1782224730” by the same author.
