Metadata Server Exchange

The metadata server exchange solution describes how third party metadata servers can exchange metadata through an Open Metadata Repository Cohort (or Cohort for short).

A cohort uses a peer-to-peer exchange protocol. Servers that implement the protocol’s open metadata APIs and event exchange sequences can become members of one or more cohorts. Each member of a cohort can notify the other members about updates to its metadata, and can query and update metadata in all of the member repositories.
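The notification side of this exchange can be pictured with a small toy model. This is only a sketch: the class names `Cohort` and `CohortMember` and their methods are hypothetical and are not Egeria's API. It shows one member publishing an update and every other member receiving the notification.

```python
from dataclasses import dataclass, field


@dataclass
class CohortMember:
    """Hypothetical stand-in for a server that has joined a cohort."""
    name: str
    metadata: dict = field(default_factory=dict)  # guid -> description held by this member
    received: list = field(default_factory=list)  # notifications seen from other members


class Cohort:
    """Toy model of the peer-to-peer exchange (an assumption, not the real protocol)."""

    def __init__(self):
        self.members = []

    def register(self, member):
        self.members.append(member)

    def publish(self, sender, guid, description):
        # The sender updates its own repository first ...
        sender.metadata[guid] = description
        # ... then every other member is notified; there is no central hub.
        for member in self.members:
            if member is not sender:
                member.received.append((sender.name, guid, description))


cohort = Cohort()
alpha, beta, gamma = CohortMember("alpha"), CohortMember("beta"), CohortMember("gamma")
for m in (alpha, beta, gamma):
    cohort.register(m)

cohort.publish(alpha, "guid-1", "Customer Id glossary term")
```

After the call to `publish`, `beta` and `gamma` each hold one notification from `alpha`, while `alpha` itself receives nothing.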

Since the cohort protocols are open, they can be implemented by any technology. However, in this solution we focus on integrating third party metadata servers that do not implement the protocol.

Introducing the Repository Proxy

Third party metadata servers that do not directly support any of the open metadata APIs and protocols need an adapter to convert their events and APIs into open metadata events and APIs as well as manage the protocol event sequencing.

To make this easy, Egeria provides a special OMAG Server called the Repository Proxy that is an adapter for third party metadata servers. Inside the repository proxy are plug points for two repository connectors:

The repository proxy represents the third party metadata server in the cohort and calls the connectors as required. You need one repository proxy for each third party metadata server that you want to be in the solution.
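The adapter role described above can be sketched as follows. Everything here is illustrative: `ThirdPartyServer`, `RepositoryProxySketch`, the native record layout, and the type mapping are assumptions made for the example, not Egeria's connector interfaces. The sketch translates a native API record and a native change event into a neutral, open-metadata-style shape.

```python
class ThirdPartyServer:
    """Hypothetical third party server with its own native record format."""

    def __init__(self):
        self._records = {"42": {"label": "Customer Id", "kind": "term"}}

    def fetch(self, record_id):
        return self._records[record_id]


class RepositoryProxySketch:
    """Represents one third party server and translates on its behalf."""

    TYPE_MAP = {"term": "GlossaryTerm"}  # assumed mapping, for illustration only

    def __init__(self, server, collection_id):
        self.server = server
        self.collection_id = collection_id

    def get_entity(self, record_id):
        # API translation: native record -> neutral entity description.
        native = self.server.fetch(record_id)
        return {
            "guid": f"{self.collection_id}:{record_id}",
            "type": self.TYPE_MAP[native["kind"]],
            "properties": {"displayName": native["label"]},
            "homeCollection": self.collection_id,
        }

    def to_open_event(self, native_event):
        # Event translation: native change notification -> neutral event.
        return {
            "eventType": native_event["change"],
            "entity": self.get_entity(native_event["id"]),
        }


proxy = RepositoryProxySketch(ThirdPartyServer(), "igc")
entity = proxy.get_entity("42")
event = proxy.to_open_event({"id": "42", "change": "created"})
```

One proxy object wraps exactly one server, which mirrors the rule of one repository proxy per third party metadata server.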

Figure 1 shows the repository proxy in action:

Figure 1

Figure 1: showing a repository proxy acting as an adapter for a third party metadata server

You can create your own implementation of the repository connectors for your favorite metadata server using these instructions. Alternatively, Egeria provides repository connector implementations for two third party metadata servers: IBM Information Governance Catalog (IGC) and Apache Atlas.

We will use these implementations to illustrate the metadata server exchange solution. In this example, we assume that glossary terms are maintained in IGC and that the organization wants to connect these terms to the Hadoop data sources described in Apache Atlas.

Working with read-only third party metadata repository connectors

Most third party metadata servers do not support the storing of metadata from other metadata servers. The sticking point is typically that they cannot store information about where the metadata came from, and they cannot guarantee that metadata from another metadata server is not updatable through their APIs and user interfaces. There can also be more subtle issues with the scale (size) of metadata descriptions, or errors caused by unexpected values they contain.

This is why it is common that the repository connectors for third party metadata servers only support what we call read-only operation. They can publish information about metadata stored in the third party metadata server, and support open metadata queries to that repository. However, they do not pass metadata from other metadata servers to the third party metadata server.
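A minimal sketch of read-only operation, under stated assumptions: the class and method names are hypothetical, and raising an exception on an attempted save is one possible behavior chosen for the example (a real connector might simply ignore the request).

```python
class ReadOnlyError(Exception):
    """Raised when a caller tries to store metadata through a read-only connector."""


class ReadOnlyConnectorSketch:
    def __init__(self, native_metadata):
        # Metadata owned by the third party server: guid -> name.
        self._native = dict(native_metadata)

    def publish_all(self):
        # Outbound: describe locally stored metadata to the rest of the cohort.
        return [{"guid": g, "name": n} for g, n in sorted(self._native.items())]

    def find(self, text):
        # Inbound query: answered from the local repository only.
        return [g for g, n in sorted(self._native.items()) if text.lower() in n.lower()]

    def save_reference_copy(self, guid, name):
        # Metadata from other members is never written to the third party server.
        raise ReadOnlyError(f"cannot store {guid}: connector is read-only")


atlas = ReadOnlyConnectorSketch({"a-1": "hive_table customers"})
try:
    atlas.save_reference_copy("igc-7", "Customer Id")
    stored = True
except ReadOnlyError:
    stored = False
```

The connector can still publish and answer queries, but nothing that originates elsewhere ever reaches its repository.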

Both IGC’s and Atlas’s repository connectors are read-only. Figure 2 shows them connected to their repository proxies and how they operate.

Figure 2

Figure 2: Read-only repository connector operation

Because of their read-only nature, if we just connected them together in a cohort, it would be like two people talking and no one listening. There would be no value to the solution.

Creating an enterprise view

Figure 3 shows a possible extension using an OMAG Server called the Metadata Access Point. This provides specialist APIs and events for retrieving and maintaining open metadata. The metadata access point can be augmented with a View Server to support a UI, or provide services to other third party tools.

Figure 3

Figure 3: Using a metadata access point to create an enterprise view

With this approach, it is possible to issue queries that return metadata content from both Atlas and IGC as if they were one metadata repository.

However, there is no support for updates or linking this metadata together.
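The enterprise view amounts to a federated query: ask every member repository and merge the answers. The sketch below is a simplification with made-up data and a hypothetical `federated_find` function; it is not how the metadata access point is implemented.

```python
def federated_find(repositories, text):
    """Query every member repository and merge the answers into one list."""
    results = []
    for repo_name, entities in repositories.items():
        for guid, name in entities.items():
            if text.lower() in name.lower():
                # Keep track of the home repository of each result.
                results.append({"guid": guid, "name": name, "home": repo_name})
    return sorted(results, key=lambda e: e["guid"])


# Illustrative contents only; real repositories hold far richer metadata.
repositories = {
    "igc": {"igc-7": "Customer Id"},
    "atlas": {"atlas-3": "customer_id column"},
}

hits = federated_find(repositories, "customer")
```

The caller receives a single result list covering both repositories, yet nothing in this model lets the caller store a link between the two hits.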

Linking metadata from different metadata servers

Figure 4 adds an Egeria Metadata Server to the cohort enabling the storage of new metadata. This means that the APIs of the metadata access point can be used to link glossary terms from IGC to asset definitions from Atlas. These links (called relationships) are stored in the Egeria Metadata Server. When queries for metadata are made through the metadata access point, the IGC glossary terms are shown linked to the Atlas assets as if all of the metadata is stored in a single repository.

Figure 4

Figure 4: Using a metadata server to provide storage for relationships between IGC and Atlas metadata
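The idea of storing only the link in the metadata server, while the linked items stay in their home repositories, can be sketched like this. The names `Relationship`, `MetadataServerSketch`, and `assets_for_term` are hypothetical, and the relationship type name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    type_name: str
    term_guid: str   # identifies a glossary term homed in IGC
    asset_guid: str  # identifies an asset homed in Atlas
    home: str        # the repository that stores the relationship itself


class MetadataServerSketch:
    """Toy metadata server that can store new relationships."""

    def __init__(self, name):
        self.name = name
        self.relationships = []

    def link(self, type_name, term_guid, asset_guid):
        rel = Relationship(type_name, term_guid, asset_guid, home=self.name)
        self.relationships.append(rel)
        return rel


def assets_for_term(server, term_guid):
    # Follow stored relationships from a term to the assets linked to it.
    return [r.asset_guid for r in server.relationships if r.term_guid == term_guid]


egeria = MetadataServerSketch("egeria-metadata-server")
rel = egeria.link("SemanticAssignment", term_guid="igc-7", asset_guid="atlas-3")
linked = assets_for_term(egeria, "igc-7")
```

Neither IGC nor Atlas is modified: the relationship refers to their metadata by identifier and lives only in the metadata server.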

Expanding the scope of metadata being captured

With the metadata server in place, it is possible to connect an Integration Daemon to the metadata server to provide metadata from additional third party technologies through the metadata access point, as shown in Figure 5.

Figure 5

Figure 5: Using a metadata server to provide storage for new metadata

With the above capabilities deployed, there is now a rich source of metadata visible through the metadata access point. Metadata from the IGC and Atlas repositories can be retrieved, combined together and used in new ways without needing to change their implementation.

However, there is no additional metadata being made available through either the IGC or Atlas UIs since they only access metadata stored in their own private metadata repositories.

Integrating third party metadata servers through the integration daemon

There is an alternative path for integrating third party metadata servers into the open metadata ecosystem, even when they do not meet the requirements for their repository connectors to write metadata into their private metadata repository.

Figure 6 shows IGC connected using this alternative approach. IGC is now connecting through an integration daemon in a similar way to the other third party technologies shown in Figure 5. Storing metadata from other repositories is now possible because IGC is no longer providing metadata services to the broader metadata ecosystem as part of the cohort federated queries, removing the requirement to store information about where the metadata came from. The downside is that the metadata in the IGC’s xMeta repository is no longer visible to the metadata access point because IGC is no longer a member of the cohort. IGC’s metadata will have to be extracted by the integration daemon and stored in the Egeria metadata server for it to be more broadly used.

With this approach, IGC can update its own metadata, and any metadata created through the metadata access point. However, an attempt to update metadata that originated in Atlas would fail when the integration daemon attempted to publish this update into the Egeria metadata server. (See metadata provenance to understand why.)
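The update rule in the paragraph above can be expressed as a small check. This is a deliberately simplified model of that paragraph, not Egeria's actual provenance logic: the `LOCAL` marker, the class, and its methods are all assumptions made for the example.

```python
LOCAL = "local"  # hypothetical marker for metadata created through the metadata access point


class ProvenanceError(Exception):
    """Raised when an update comes from a source that does not own the metadata."""


class MetadataStoreSketch:
    def __init__(self):
        self._instances = {}  # guid -> {"value": ..., "origin": ...}

    def create(self, guid, value, origin):
        self._instances[guid] = {"value": value, "origin": origin}

    def update(self, guid, value, requester):
        instance = self._instances[guid]
        # Allowed only for locally created metadata or the requester's own metadata.
        if instance["origin"] not in (LOCAL, requester):
            raise ProvenanceError(
                f"{guid} originated in {instance['origin']}, not {requester}"
            )
        instance["value"] = value

    def value_of(self, guid):
        return self._instances[guid]["value"]


store = MetadataStoreSketch()
store.create("term-1", "Customer Id", origin="igc")
store.create("link-1", "term-1 -> table-1", origin=LOCAL)
store.create("table-1", "customers", origin="atlas")

store.update("term-1", "Customer Identifier", requester="igc")           # own metadata
store.update("link-1", "term-1 -> table-1 (reviewed)", requester="igc")  # locally created
try:
    store.update("table-1", "clients", requester="igc")                  # originated in Atlas
    rejected = False
except ProvenanceError:
    rejected = True
```

In this model the first two updates succeed and the third is refused, leaving the Atlas-originated value unchanged.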

Figure 6

Figure 6: Integrating a third party metadata server through the integration daemon

Note: this pattern could be repeated to move Apache Atlas to connect through an integration daemon too.

Summary

In this solution, you have seen different mechanisms for integrating third party metadata servers together and then building out the metadata ecosystem to enable new use cases.

There are two main integration approaches: through a repository proxy, as a member of a cohort, or through an integration daemon.

Further information

More about the different types of Cohort Members including information on how to configure them. Specifically

There is also specific configuration information for the IBM Information Governance Catalog (IGC) and Apache Atlas setup below:

These are links to more information about cohorts

This link provides guidance if you are interested in writing your own repository connectors:



License: CC BY 4.0, Copyright Contributors to the ODPi Egeria project.