
How Graph Databases Supercharge Master Data Management

This article is more than 8 years old.

Master data management is a practice a company adopts when it gets serious about making use of its data. Recent trends and emerging analyst research suggest that when companies get serious about master data management, they start using graph database technology. Companies traveling the long road to becoming data-driven organizations should take a close look at why graph databases are taking master data management to a new level. Analysts like Forrester’s Michele Goetz have pointed out the disruptive potential of Graph DBs for MDM. (See “Disruption Coming For MDM - The Hub of Context”)

MDM 101

The basics of master data management, or MDM as it is called — not to be confused with Mobile Device Management — are pretty simple. If you have many different business systems that track customer data, for example, you want to make sure the data is the same everywhere. If a customer shows up in a new channel, you want to be able to connect them to their master record. If you make a change to an address or a phone number, you want that change to show up in all systems. If one system wants to get detailed information about a customer from another system, you want that to be easy.

In practice, MDM usually works by creating a central repository that collects the fields common to all the business systems. The master records in this repository become the authoritative source of information. MDM is applied to a company’s most important data, most commonly data about customers, products, employees, or any other important data that shows up in many systems. Figure 1 shows the typical configuration.
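To make the hub idea concrete, here is a minimal sketch of a central repository, assuming each business system identifies a customer by its own local ID. The class and method names (MasterRepository, link, resolve, update) are hypothetical and not taken from any MDM product. It shows the three jobs described above: connecting a local record to its master record, looking up the master from any system, and working out which systems must receive a change.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    master_id: str
    attributes: dict
    # (system name, local id) pairs that refer to this master record
    links: set = field(default_factory=set)


class MasterRepository:
    """Hypothetical central hub holding the authoritative records."""

    def __init__(self):
        self.records = {}   # master_id -> MasterRecord
        self.by_link = {}   # (system, local_id) -> master_id

    def register(self, master_id, attributes):
        self.records[master_id] = MasterRecord(master_id, dict(attributes))

    def link(self, master_id, system, local_id):
        # Connect a record seen in some business system to its master record
        self.records[master_id].links.add((system, local_id))
        self.by_link[(system, local_id)] = master_id

    def resolve(self, system, local_id):
        # Find the master record behind a local ID, or None if unmatched
        master_id = self.by_link.get((system, local_id))
        return self.records.get(master_id) if master_id else None

    def update(self, master_id, **changes):
        # Apply the change centrally and return the systems to notify
        record = self.records[master_id]
        record.attributes.update(changes)
        return sorted(record.links)


repo = MasterRepository()
repo.register("C-001", {"name": "Ada Lovelace", "phone": "555-0100"})
repo.link("C-001", "crm", "crm-42")
repo.link("C-001", "billing", "b-7")
print(repo.resolve("billing", "b-7").master_id)  # C-001
notified = repo.update("C-001", phone="555-0199")
print(notified)  # [('billing', 'b-7'), ('crm', 'crm-42')]
```

A production hub would also handle record matching (deciding that two local records are the same customer), which this sketch assumes has already happened.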

The MDM repository can work in several ways. Sometimes all master records are created and edited in the central repository. In other setups, one system becomes the master repository and changes radiate out from there. In still others, changes can happen in any system and are then synchronized. Many vendors, such as Teradata, SAP, Oracle, and Informatica, offer solutions based on traditional software license models. Talend and Pentaho offer open-source-powered versions as well.
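The third style, where changes can happen anywhere and are synchronized, needs a rule for resolving conflicting edits. One common policy is field-level "last write wins"; the sketch below assumes that policy and assumes the timestamps from different systems are directly comparable, which real deployments have to arrange carefully. Neither assumption comes from the article, and vendors offer other survivorship rules.

```python
from typing import Dict, Tuple

# Each system reports field -> (value, timestamp).
# Assumption: timestamps from all systems share one clock (e.g. epoch seconds).
FieldVersions = Dict[str, Tuple[str, int]]


def merge_last_write_wins(*system_views: FieldVersions) -> FieldVersions:
    """Merge per-field versions, keeping the most recent write for each field."""
    merged: FieldVersions = {}
    for view in system_views:
        for name, (value, ts) in view.items():
            # Keep the newer value; on a tie, the earlier system's value survives
            if name not in merged or ts > merged[name][1]:
                merged[name] = (value, ts)
    return merged


crm = {"address": ("1 Main St", 100), "phone": ("555-0100", 120)}
billing = {"address": ("9 Elm Ave", 150)}
golden = merge_last_write_wins(crm, billing)
print(golden)  # {'address': ('9 Elm Ave', 150), 'phone': ('555-0100', 120)}
```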

The end result of an MDM program is a situation in which you have a high quality set of master data, providing all sorts of benefits:

  • Master records are accurate in all systems.
  • Data quality is improved.
  • Errors due to bad data are reduced.
  • Master records can be used to enhance other data during analysis.
  • One system can find information related to a master record in another system.

MDM becomes important after the initial stage of a company’s growth because many of the problems it solves don’t show up until many business systems are operating at the same time. When that happens, the types of errors that MDM addresses crop up, and people turn to the practice of MDM for a solution.

How Graph Databases Take MDM to a New Level

So, what’s the problem? Why can’t we just be happy with what MDM has become over the past decade or so?

The problem is that we are leaving valuable information on the table when we think of MDM as a synchronized repository. The world of MDM is essentially defined by the sort of primary key relationships that are used to connect relational databases. For the customer record, the Social Security number often becomes the key, and we can easily use it to grab the master records for a customer from the master repository or from a business system.

Here’s where the glory of graph databases starts to emerge for MDM. If we now look at the collection of customer master records, what else can we find out about them? Graph databases treat the connection between two pieces of information as a first-class object. This means that you can search and categorize data not only by its fields but also by the relationships between the data.
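What "relationships as first-class objects" means can be shown with a toy in-memory property graph: edges carry their own type and properties, and you can query on them just as you query on node fields. This is a hypothetical illustration, not the data model or API of Neo4j or any other product, and it scans a list of edges where a real graph database would use indexed adjacency.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str
    rel_type: str
    target: str
    props: dict = field(default_factory=dict)  # the relationship has its own data


class PropertyGraph:
    """Hypothetical toy graph: nodes and edges both carry properties."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target, **props):
        self.edges.append(Edge(source, rel_type, target, props))

    def find_by_field(self, key, value):
        # Classic field-based search on node properties
        return sorted(n for n, p in self.nodes.items() if p.get(key) == value)

    def find_by_relationship(self, rel_type, target, where=lambda props: True):
        # Search by the relationship itself, optionally filtering on its properties
        return sorted(
            e.source for e in self.edges
            if e.rel_type == rel_type and e.target == target and where(e.props)
        )


g = PropertyGraph()
g.add_node("alice", state="OR", age=34)
g.add_node("bob", state="WA", age=34)
g.add_node("widget", kind="product")
g.add_edge("alice", "BOUGHT", "widget", quantity=2)
g.add_edge("bob", "BOUGHT", "widget", quantity=1)
print(g.find_by_field("age", 34))                # ['alice', 'bob']
print(g.find_by_relationship("BOUGHT", "widget",
                             where=lambda p: p["quantity"] >= 2))  # ['alice']
```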

For example, with customer data, a graph database can quickly tell you which customers bought the same products and also live in the same state and are the same age. You can then quickly find all the customers of the same state and age who didn’t buy the product. It’s not that you couldn’t eventually ask and answer the same question using a relational model. You could. The problem is that the complexity of the query you would have to write, and the effort involved, make it harder to answer important questions.
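Here is a minimal sketch of that two-step question as graph traversal, using plain Python adjacency sets as a stand-in for a graph database. The customers, products, and function names are made up for illustration. Step one hops customer to products to other buyers and keeps those in the same state and age cohort; step two takes the same cohort and subtracts a product’s buyers.

```python
from collections import defaultdict

# Hypothetical sample data
customers = {
    "alice": {"state": "OR", "age": 34},
    "bob": {"state": "OR", "age": 34},
    "carol": {"state": "OR", "age": 34},
    "dave": {"state": "WA", "age": 34},
}
bought = [("alice", "kayak"), ("bob", "kayak"), ("dave", "kayak"), ("alice", "tent")]

# Adjacency in both directions: customer -> products, product -> customers
products_of = defaultdict(set)
buyers_of = defaultdict(set)
for c, p in bought:
    products_of[c].add(p)
    buyers_of[p].add(c)


def cohort(customer):
    """Customers with the same state and age (attribute scan, for simplicity)."""
    a = customers[customer]
    return {c for c, b in customers.items()
            if b["state"] == a["state"] and b["age"] == a["age"]}


def similar_buyers(customer):
    """Others in the cohort who bought at least one of the same products."""
    shared = set()
    for product in products_of[customer]:   # hop: customer -> product
        shared |= buyers_of[product]        # hop: product -> other buyers
    return sorted((shared & cohort(customer)) - {customer})


def cohort_non_buyers(customer, product):
    """Same state and age, but no BOUGHT edge to `product`."""
    return sorted(cohort(customer) - buyers_of[product])


print(similar_buyers("alice"))              # ['bob']
print(cohort_non_buyers("alice", "kayak"))  # ['carol']
```

In a graph database, state and age could themselves be modeled as nodes, turning the cohort scan into another hop; this sketch keeps them as attributes to stay short.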

“Master Data Management innovators use graph databases to ask new questions and discover new answers within their existing data,” explained Emil Eifrem, CEO and co-founder of Neo Technology, makers of Neo4j, the most popular graph database. “They are finally achieving that desirable 360-degree view of the customer in real time.”

Figure 2 shows the way that a graph database extends a relational MDM repository.

What usually happens when a Graph DB is introduced into the MDM process is that the master repository is copied into the graph database so that relationships between the data can be captured and analyzed. It is common for some of the transactional data from business systems to be added to the graph database to supplement the master data. Graph databases do a great job of storing and searching sparse or incomplete datasets. In this way, MDM doesn’t have to just be about the master data but all of the data that tells the whole story of the customer.
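The copy-and-supplement step might look like the sketch below, assuming master records and order rows arrive as simple dictionaries. The record layouts, labels (Customer, Order, Product), and relationship names (PLACED, CONTAINS) are hypothetical. Note that only the fields actually present are stored on each node, which is why sparse records are cheap in a graph, and that orders without a matching master record are skipped here rather than matched.

```python
def build_graph(master_records, orders):
    """Copy master records into graph nodes, then attach transactional data."""
    nodes, edges = {}, []
    for rec in master_records:
        # Store only the fields the record actually has (sparse-friendly)
        nodes[rec["id"]] = {"label": "Customer",
                            **{k: v for k, v in rec.items() if k != "id"}}
    for o in orders:
        if o["customer_id"] not in nodes:
            continue  # orphaned transaction; a real pipeline would send it to matching
        nodes[o["order_id"]] = {"label": "Order"}
        nodes.setdefault(o["sku"], {"label": "Product"})
        edges.append((o["customer_id"], "PLACED", o["order_id"]))
        edges.append((o["order_id"], "CONTAINS", o["sku"]))
    return nodes, edges


master_records = [
    {"id": "C-001", "name": "Ada", "state": "OR"},
    {"id": "C-002", "name": "Grace"},  # sparse: no state on file
]
orders = [  # hypothetical rows from an order system
    {"order_id": "O-9", "customer_id": "C-001", "sku": "kayak"},
    {"order_id": "O-10", "customer_id": "C-002", "sku": "tent"},
    {"order_id": "O-11", "customer_id": "C-003", "sku": "tent"},
]
nodes, edges = build_graph(master_records, orders)
print(len(nodes), len(edges))  # 6 4
```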

In some cases, the information that completes the graph for an individual can be assembled at query time. When an inquiry about a specific customer comes in, services use APIs to reach out and bring the relevant information into the graph, which allows a real-time picture to be assembled.
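A sketch of query-time assembly follows. The two fetch functions are hypothetical stand-ins for service APIs (a real version would make network calls); they are called concurrently with the standard library’s ThreadPoolExecutor, and their results become typed edges hanging off the master node for that one request.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for service APIs
def fetch_support_tickets(customer_id):
    return [{"ticket": "T-1", "status": "open"}] if customer_id == "C-001" else []


def fetch_recent_orders(customer_id):
    return [{"order": "O-9", "sku": "kayak"}] if customer_id == "C-001" else []


# Relationship type -> service that supplies its targets
SOURCES = {"HAS_TICKET": fetch_support_tickets, "PLACED": fetch_recent_orders}


def assemble_customer_graph(customer_id, master_attributes):
    """Build a per-request subgraph: the master node plus freshly fetched edges."""
    graph = {"node": {"id": customer_id, **master_attributes}, "edges": []}
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {rel: pool.submit(fetch, customer_id)
                   for rel, fetch in SOURCES.items()}
    # Leaving the with-block waits for every fetch to finish
    for rel, future in futures.items():
        for item in future.result():
            graph["edges"].append({"type": rel, "target": item})
    return graph


g = assemble_customer_graph("C-001", {"name": "Ada"})
print([e["type"] for e in g["edges"]])  # ['HAS_TICKET', 'PLACED']
```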

Using these methods, MDM becomes a far more powerful and real-time practice that can be used to inform apps or employees of everything that is known about a customer or other master record. Figure 3 illustrates these capabilities.

It is important to note that Graph DBs complement the existing MDM application within the MDM infrastructure, serving as a way to store and retrieve master data. Out of the box, Graph DBs do not provide the foundational capabilities of MDM 101. Rather, Graph DBs take MDM to a new level.

The Power of the Next Level of MDM

By adding Graph DBs, MDM stops being a sort of slow-motion background process and starts to look a lot more like an operational data store, one with the added benefit of being able to perform graph analytics in real time. To me, this model of Graph DBs as a cache on top of a variety of data stores is a hugely powerful pattern that will be put to many good uses. Please reach out to me if you are doing this.

With respect to MDM, Graph DBs complete the mission of MDM in the following ways:

  • The amount of information describing a master record is expanded, creating a richer picture.
  • Graph analytics can reveal new features about the master record.
  • The whole picture can be updated in real time.
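As one small example of a feature that graph analytics can add to a master record, the sketch below computes each customer’s degree in the co-purchase projection of the customer-product graph: how many distinct other customers share at least one product with them. The feature and data are illustrative choices, not something the article prescribes.

```python
from collections import defaultdict


def co_purchase_degree(purchases):
    """For each customer, count distinct other customers who bought a shared product."""
    buyers_of = defaultdict(set)
    for customer, product in purchases:
        buyers_of[product].add(customer)
    neighbors = defaultdict(set)
    for buyers in buyers_of.values():
        for c in buyers:
            # Link c to every other buyer of this product
            neighbors[c] |= buyers - {c}
    return {c: len(n) for c, n in sorted(neighbors.items())}


purchases = [("alice", "kayak"), ("bob", "kayak"), ("bob", "tent"),
             ("carol", "tent"), ("dave", "stove")]
print(co_purchase_degree(purchases))
# {'alice': 1, 'bob': 2, 'carol': 1, 'dave': 0}
```

The resulting number could be written back onto each customer node as a new attribute and recomputed whenever new purchase edges arrive.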

I suspect that Graph DB-powered MDM will become a godsend for mobile app developers who want to get everything they can in one quick call and then keep using that graph on the device.

So, this lays out the basic case for using Graph DBs for MDM. In my next article, I will focus more deeply on the kinds of questions that graph analytics can answer about master data.


Dan Woods is on a mission to help people find the technology they need to succeed. Users of technology should visit CITO Research, a publication where early adopters find technology that matters. Vendors should visit Evolved Media for advice about how to find the right buyers. See list of Dan's clients on this page.