The open metadata type system
Knowledge about data is spread amongst many people and systems. One of the roles of a metadata repository is to provide a place where this knowledge can be collected and correlated in as automated a fashion as possible. To enable many different tools and processes to populate the metadata repository, we need agreement on what data should be stored and in what format (structures).
Figure 1 shows the different areas of metadata that we need to support for a wide range of metadata management and governance tasks. This metadata may be spread across different metadata repositories that each specialize in particular use cases or communities of users.
Figure 1: Common metadata areas
- Area 0 describes base types and infrastructure. This includes the root type for all open metadata entities called OpenMetadataRoot and types for Asset, DataSet, Infrastructure, Process, Referenceable, SoftwareServer and Host.
- Area 1 collects information from people using the data assets. It includes their use of the assets and their feedback. It also manages crowd-sourced enhancements to the metadata from other areas before it is approved and incorporated into the governance program.
- Area 2 describes the data assets. These are the data sources, APIs, analytics models, transformation functions and rule implementations that store and manage data. The definitions in Area 2 include connectivity information that is used by the open connector framework (and other tools) to get access to the data assets.
- Area 3 describes the glossary. It holds the definitions of terms and concepts and how they relate to one another. Linking the concepts/terms defined in the glossary to the data assets in Area 2 defines the meaning of the data that is managed by the data assets. This is a key relationship that helps people locate and understand the data assets they are working with.
- Area 4 defines how the data assets should be governed. This is where the classifications, policies and rules are defined.
- Area 5 is where standards are established. This includes data models, schema fragments and reference data that are used to assist developers and architects in using best practice data structures and valid values as they develop new capabilities around the data assets.
- Area 6 provides the additional information that automated metadata discovery engines have discovered about the data assets. This includes profile information, quality scores and suggested classifications.
- Area 7 provides the structures for recording lineage.
Figure 2 provides more detail of the metadata structures in each area and how they link together.
Bottom left is Area 0 - the foundation of the open metadata types along with the IT infrastructure that digital systems run on such as platforms, servers and network connections. Sitting on the foundation are the assets. The base definition for Asset is in Area 0 but Area 2 (middle bottom) builds out some of the common types of assets that an organization uses. These assets are hosted and linked to the infrastructure described in Area 0. For example, a data set could be linked to the file system description to show where it is stored.
Area 5 (right middle) focuses on defining the structure of data and the standard sets of values (called reference data). The structure of data is described in schemas and these are linked to the assets that use them.
Many assets have technical names. Area 3 (top middle) captures business and real world terminologies and organizes them into glossaries. The individual terms described can be linked to the technical names and labels given to the assets and the data fields described in their schemas.
Area 6 (bottom right) holds additional metadata captured through automated analysis of data. These analysis results are linked to the assets that hold the data so that data professionals can evaluate the suitability of the data for different purposes. Area 7 (left middle) captures the lineage of assets from a business and technical perspective. Above that, in Area 4, are the definitions that control the governance of all of the assets. Finally, Area 1 (top right) captures information about users (people and automated processes), their organization (such as teams and projects), and their feedback.
Figure 2: Metadata detail within the metadata areas
Within each area, the definitions are broken down into numbered packages to help identify groups of related elements. The numbering system relates to the area that the elements belong to. For example, Area 1 has models 0100-0199, Area 2 has models 0200-0299, etc. Each area’s sub-models are dispersed along its range, ensuring there is space to insert additional models in the future.
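The numbering convention can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of any Egeria API, and the sketch assumes the pattern given for Areas 1 and 2 extends to every area from 0 to 7 (so that Area 0 covers 0000-0099).

```python
# Hypothetical helpers illustrating the model numbering convention.
# Assumption: every area N (0-7) owns the model numbers N00 to N99.

def area_for_model(model_number: int) -> int:
    """Return the area that a model number belongs to."""
    if not 0 <= model_number <= 799:
        raise ValueError(f"model number {model_number} is outside areas 0-7")
    return model_number // 100


def model_range_for_area(area: int):
    """Return the first and last four-digit model numbers for an area."""
    if not 0 <= area <= 7:
        raise ValueError(f"area {area} is not one of the areas 0-7")
    first = area * 100
    return f"{first:04d}", f"{first + 99:04d}"


if __name__ == "__main__":
    print(area_for_model(10))       # model 0010 falls in Area 0
    print(model_range_for_area(2))  # range reserved for Area 2
```

Because only some numbers in each range are used, a new model can be slotted between existing ones without renumbering.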
Figure 3 shows a couple of fragments from the models. Each of the UML classes represents an open metadata type. The stereotype shown in double angle brackets on each UML class (entity, relationship or classification) defines the category of the type: Entity, Relationship or Classification respectively. A line between entities with a large arrowhead means “inheritance”: a type points to its supertype.
The example on the left of Figure 3 comes from model 0010. It shows that Asset inherits from Referenceable, which inherits from OpenMetadataRoot. This means that Asset is a subtype of Referenceable, which is a subtype of OpenMetadataRoot. Put the other way, OpenMetadataRoot is the supertype of Referenceable, which is the supertype of Asset. This inheritance identifies which attributes (instance properties) are valid for an instance of a particular type: the valid set is the aggregation of the attributes defined explicitly for the type and for all of its supertypes. For example, Asset has two attributes defined: name and description. It also supports qualifiedName and additionalProperties because they are inherited from Referenceable. OpenMetadataRoot does not have any attributes defined, so Asset gets nothing from it.
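The aggregation of attributes along the inheritance chain can be illustrated with a minimal Python sketch. The `TypeDef` class and its `all_attributes` method are hypothetical names invented for this example, not the Egeria type definition API; only the type names and attribute names come from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TypeDef:
    """Hypothetical, simplified stand-in for an open metadata entity type."""
    name: str
    supertype: Optional["TypeDef"] = None
    attributes: List[str] = field(default_factory=list)

    def all_attributes(self) -> List[str]:
        """Attributes defined on this type plus those of every supertype."""
        inherited = self.supertype.all_attributes() if self.supertype else []
        return inherited + self.attributes


# The three types from model 0010, as described in the text.
open_metadata_root = TypeDef("OpenMetadataRoot")
referenceable = TypeDef(
    "Referenceable", open_metadata_root, ["qualifiedName", "additionalProperties"]
)
asset = TypeDef("Asset", referenceable, ["name", "description"])

if __name__ == "__main__":
    print(asset.all_attributes())
    # ['qualifiedName', 'additionalProperties', 'name', 'description']
```

Walking up the supertype chain first means inherited attributes are listed before the ones the type defines itself; the order is a choice made for this sketch, not something the type system prescribes.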
The fragment on the right-hand side of Figure 3 comes from model 0011. It shows the classification called Template that can be connected to a Referenceable. Since Referenceable is already defined in model 0010, it is shown without the white box where the attributes are shown (called the “attribute container” in UML parlance).
SourcedFrom is a relationship that connects two instances of Referenceable or any of its subtypes. This means SourcedFrom could connect two instances of type Asset together. The types of the instances connected do not need to be the same: SourcedFrom could connect a Referenceable instance with an Asset instance.
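The rule for which instances SourcedFrom can connect amounts to a subtype check on each end of the relationship. The Python sketch below shows the idea under simplifying assumptions: the `SUPERTYPE` table and both function names are hypothetical, and only the three types discussed above are modeled.

```python
# Hypothetical supertype table covering only the types discussed in the text.
SUPERTYPE = {
    "OpenMetadataRoot": None,
    "Referenceable": "OpenMetadataRoot",
    "Asset": "Referenceable",
}


def is_type_or_subtype(type_name: str, expected: str) -> bool:
    """True if type_name is expected itself or inherits from it."""
    current = type_name
    while current is not None:
        if current == expected:
            return True
        current = SUPERTYPE[current]
    return False


def can_connect_sourced_from(end1_type: str, end2_type: str) -> bool:
    """Both ends of SourcedFrom must be Referenceable or one of its subtypes."""
    return (is_type_or_subtype(end1_type, "Referenceable")
            and is_type_or_subtype(end2_type, "Referenceable"))


if __name__ == "__main__":
    print(can_connect_sourced_from("Asset", "Asset"))             # True
    print(can_connect_sourced_from("Referenceable", "Asset"))     # True
    print(can_connect_sourced_from("OpenMetadataRoot", "Asset"))  # False
```

The last call returns False because OpenMetadataRoot sits above Referenceable in the hierarchy, so an instance of it does not satisfy the relationship's end type.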
Figure 3: Guide to reading the open metadata type models
The UML model diagrams show the currently active types. Some types and attributes have been deprecated and removed from the model diagrams. However, each deprecated type is still described, along with the active types to use instead. Although the deprecated types can still be used (for backwards compatibility), it is preferable to use the latest types since they are typically more efficient and more consistent than their predecessors.
Links to the detailed description of the types
- Area 0 - base types and infrastructure. This includes types for Asset, DataSet, Infrastructure, Process, Referenceable, Server and Host.
- Area 1 - people, projects, communities, collaboration and feedback.
- Area 2 - specific types of assets such as data sources, APIs, analytics models, transformation functions and rule implementations that store and manage data.
- Area 3 - glossaries and semantic knowledge.
- Area 4 - governance definitions.
- Area 5 - models, schemas and reference data.
- Area 6 - metadata captured during automated analysis of assets.
- Area 7 - the structures for recording lineage.
Index of open metadata types
Attribute Type Definitions
These are the types defined for attributes (properties) in the open metadata types.
Below are the open metadata types in alphabetical order. The link takes you to the UML model and description page for the type.
AbstractConcept AcceptedAnswer ActionAssignment Actions ActionTarget ActivityDescription ActivityType ActorProfile AdjacentLocation AnalyticsModel AnalyticsModelExperiment AnalyticsModelExperimentParticipant AnalyticsModelRole AnalyticsEngine AnalyticsProject Anchors Annotation AnnotationStatus AnnotationExtension AnnotationReview AnnotationReviewLink Antonym APIEndpoint APIHeader APIManager APIOperation APIOperations APIParameterList APIParameter APIRequest APIResponse APISchemaType Application ApplicationServer ApplicationService ApprovedPurpose ArrayDocumentType ArraySchemaType Asset AssetDiscoveryReport AssetLocation AssetManager AssetOrigin AssetOwner AssetOwnership AssetOwnerType AssetSchemaType AssetZoneMembership AttachedComment AttachedLike AttachedNoteLog AttachedNoteLogEntry AttachedRating AttachedStorage AttachedTag AttributeForSchema AuditLog AuditLogFile AvroFile
CalculatedValue Campaign CanonicalVocabulary CategoryAnchor CategoryHierarchyLink Certification CertificationType ClassificationAnnotation ClassWord CloudPlatform CloudProvider CloudService CloudTenant CohortMember CohortMemberMetadataCollection CohortRegistryStore Collection CollectionMembership Comment CommentType Community CommunityMember CommunityMembership CommunityMembershipType ComplexSchemaType ComponentOwner ConceptBead ConceptBeadAttribute ConceptBeadAttributeCoverage ConceptBeadAttributeLink ConceptBeadLink ConceptBeadRelationshipEnd ConceptModelAttributeCoverageCategory ConceptModelDecoration ConceptModelElement Confidence ConfidenceLevel Confidentiality ConfidentialityLevel Connection ConnectionConnectorType ConnectionEndpoint ConnectionToAsset ConnectorCategory ConnectorImplementationChoice ConnectorType ConnectorTypeDirectory ContactDetails ContactMethodType ContactThrough ContentCollectionManager ContentManager ContextDefinition ContributionRecord ControlFlow ControlledGlossaryTerm ControlPoint ControlPointDefinition Criticality CriticalityLevel CrowdSourcingContribution CrowdSourcingContributor CrowdSourcingRole CSVFile CyberLocation
Database DatabaseManager DatabaseServer DataClass DataClassAnnotation DataClassAssignment DataClassAssignmentStatus DataClassComposition DataClassDefinition DataClassHierarchy DataContentForDataSet DataField DataFieldAnalysis DataFieldAnnotation DataFlow DataFolder DataFile DataItemOwner DataItemSortOrder DataMeasurementLevel DataMovementEngine DataProcessingAction DataProcessingDescription DataProcessingPurpose DataProcessingSpecification DataProcessingTarget DataProfileAnnotation DataProfileLogAnnotation DataProfileLogFile DataSet DataSourceMeasurementAnnotation DataSourcePhysicalStatusAnnotation DataStore DataStoreEncoding DataValue DataVirtualizationEngine DependentSoftwareComponent DeployedAPI DeployedAnalyticalComponent DeployedConnector DeployedDatabaseSchema DeployedReport DeployedSoftwareComponent DeployedVirtualContainer DerivedRelationalColumn DerivedSchemaAttribute DerivedSchemaTypeQueryTarget DesignModel DesignModelElement DesignModelElementOwnership DesignModelGroup DesignModelGroupHierarchy DesignModelGroupMembership DesignModelGroupOwnership DesignModelElementsInScope DesignModelImplementation DesignModelScope DesignPattern DetailedProcessingActions DigitalService DigitalServiceDependency DigitalServiceDesign DigitalServiceImplementation DigitalServiceManagement DigitalServiceManager DigitalServiceOperator DigitalSupport DiscoveredAnnotation DiscoveredDataField DiscoveredNestedDataField DiscoveryEngineReport DiscoveryInvocationReport DiscoveryRequestStatus DiscoveryServiceRequestStatus DisplayDataContainer DisplayDataField DisplayDataSchemaType DivergentAttachmentAnnotation DivergentAttachmentClassificationAnnotation DivergentAttachmentRelationshipAnnotation DivergentAttachmentValueAnnotation DivergentClassificationAnnotation DivergentDuplicateAnnotation DivergentRelationshipAnnotation DivergentValueAnnotation DockerContainer Document DocumentSchemaType DocumentSchemaAttribute DocumentStore DuplicateType
ElementSupplement EmbeddedConnection EmbeddedProcess Endianness Endpoint EnforcementPoint EnforcementPointDefinition Engine EngineHostingService EnumSchemaType EnterpriseAccessLayer EventBroker EventSchemaAttribute EventSet EventType EventTypeList ExceptionBacklog ExceptionLogFile ExecutionPointDefinition ExecutionPointUse ExternalGlossaryLink ExternalId ExternalIdLink ExternalIdScope ExternallySourcedGlossary ExternalReference ExternalReferenceLink ExternalSchemaType
Glossary GlossaryCategory GlossaryProject GlossaryTerm GovernanceAction GovernanceActionExecutor GovernanceActionEngine GovernanceActionFlow GovernanceActionProcess GovernanceActionRequestSource GovernanceActionService GovernanceActionStatus GovernanceActionType GovernanceActionTypeExecutor GovernanceActionTypeUse GovernanceApproach GovernanceClassificationLevel GovernanceClassificationSet GovernanceClassificationStatus GovernanceConfidentialityLevel GovernanceControl GovernanceControlLink GovernanceDomain GovernanceDomainDescription GovernanceDomainSet GovernanceDaemon GovernanceDefinition GovernanceDefinitionMetric GovernanceDefinitionScope GovernanceDriver GovernanceDriverLink GovernanceEngine GovernanceImplementation GovernanceMeasurements GovernanceMeasurementsDataSet GovernanceMetric GovernanceObligation GovernanceOfficer GovernancePolicy GovernancePolicyLink GovernancePrinciple GovernanceProcedure GovernanceProcess GovernanceProcessImplementation GovernanceProject GovernanceResponse GovernanceResponsibility GovernanceResponsibilityAssignment GovernanceResults GovernanceRole GovernanceRoleAssignment GovernanceRule GovernanceRuleImplementation GovernanceService GovernanceStrategy GovernanceZone GovernedBy GraphEdge GraphEdgeLink GraphSchemaType GraphStore GraphVertex GroupedMedia
Impact ImpactedResource ImpactSeverity ImplementationLocation ImplementationSnippet IncidentClassifier IncidentClassifierSet IncidentDependency IncidentOriginator IncidentReport IncidentReportStatus Incomplete InformalTag InformationSupplyChain InformationSupplyChainComposition InformationSupplyChainImplementation InformationSupplyChainSegment InformationView Infrastructure InstanceMetadata ISARelationship IsATypeOfRelationship ITInfrastructure ITProfile
LastAttachment LastAttachmentLink LatestChange LatestChangeAction LatestChangeTarget LibraryCategoryReference LibraryTermReference License LicenseType Like LineageMapping LinkedExternalSchemaType LinkedFile LinkedMedia LinkedType ListenerInterface LiteralSchemaType Location LogFile
MapDocumentType MapFromElementType MapSchemaType MapToElementType MasterDataManager MediaCollection MediaFile MediaReference MediaType MediaUsage Meeting Meetings Memento MetadataAccessService MetadataCohortPeer MetadataCollection MetadataIntegrationService MetadataRepository MetadataRepositoryCohort MetadataServer MetamodelInstance MeteringLog MobileAsset Modifier MoreInformation
NamingConventionRule NamingStandardRule NamingStandardRuleSet NestedFile NestedLocation NestedSchemaAttribute Network NetworkGateway NetworkGatewayLink NextGovernanceAction NextGovernanceActionType NoteEntry NoteLog NoteLogAuthor NoteLogAuthorship NotificationManager
ObjectAttribute ObjectIdentifier ObjectSchemaType OpenDiscoveryAnalysisReport OpenDiscoveryEngine OpenDiscoveryService OpenDiscoveryPipeline OpenMetadataRoot OperationalStatus OperatingPlatform OperatingPlatformManifest OrderBy Organization OrganizationalCapability OrganizationalControl Ownership OwnerType
Peer PermittedProcessing PermittedSynchronization Person PersonalContribution PersonRole PersonRoleAppointment PolicyAdministrationPoint PolicyDecisionPoint PolicyEnforcementPoint PolicyInformationPoint PolicyRetrievalPoint Port PortAlias PortDelegation PortImplementation PortSchema PortType PreferredTerm PrimaryCategory PrimaryKey PrimeWord PrimitiveSchemaType Process ProcessCall ProcessContainmentType ProcessInput ProcessHierarchy ProcessOutput ProcessPort ProcessVariable ProcessVariableMapping ProfileIdentity Project ProjectCharter ProjectCharterLink ProjectDependency ProjectHierarchy ProjectManager ProjectManagement ProjectScope ProjectTeam PropertyFacet PublisherInterface
Rating Regulation RegulationArticle RegulationCertificationType Referenceable ReferenceableFacet ReferenceCodeTable ReferenceCodeMappingTable ReferenceData ReferenceValueAssignment RelatedDesignPattern RelatedMedia RelatedKeyword RelatedTerm RelationalColumn RelationalColumnType RelationalDBSchemaType RelationalTableType RelationalTable RelationalView RelationshipAdviceAnnotation RelationshipAnnotation ReplacementTerm ReportingEngine RepositoryProxy RequestForAction RequestResponseInterface RequirementsLibrary ResponsibilityStaffContact ResourceList Retention RetentionBasis ReusableTechnique ReusableTechniqueUse RunnableSoftwareComponent RuntimeForProcess
SchemaAnalysisAnnotation SchemaAttribute SchemaAttributeDefinition SchemaAttributeType SchemaElement SchemaLinkElement SchemaLinkToType SchemaQueryImplementation SchemaType SchemaTypeChoice SchemaTypeDefinition SchemaTypeOption SchemaTypeSnippet SchemaTypeImplementation SearchKeyword SearchKeywordLink SecureLocation SecurityTags SemanticAnnotation SemanticAssignment ServerAssetUse ServerAssetUseType ServerEndpoint Set SetDocumentType SetSchemaType SimpleDocumentType SimpleSchemaType SoftwareComponent SoftwareManifest SoftwareModule SoftwareModuleContent SoftwareLibrary SoftwarePackageManifest SoftwareServer SoftwareServerCapability SoftwareServerDeployment SoftwareServerPlatform SoftwareServerPlatformDeployment SoftwareServerSupportedCapability SoftwareService SolutionBlueprint SolutionBlueprintComposition SolutionComponent SolutionComponentImplementation SolutionComponentPort SolutionComposition SolutionLinkingWire SolutionPort SolutionPortDelegation SolutionPortDirection SourceCodeFile SourceComponent SourceControlLibrary SourcedFrom SpineObject SpineAttribute StarRating StorageVolume StewardshipServer StructDocumentType StructSchemaType SubjectArea SubjectAreaDefinition SubjectAreaGovernance SubjectAreaHierarchy SubjectAreaOwner SubscriberList SupplementaryProperties SupportedComponentVariable SupportedDiscoveryService SupportedGovernanceService SupportedProcessVariable SupportedVariableType SuspectDuplicateAnnotation Synonym
TabularColumnType TabularColumn TabularFileColumn TabularSchemaType TargetForAction Task Taxonomy Team TeamLeader TeamLeadership TeamMember TeamMembership TeamStructure TechnicalControl Template TermAnchor TermAssignmentStatus TermCategorization TermHASARelationship TermISATypeOFRelationship TermTYPEDBYRelationship TermRelationshipStatus Threat ToDo ToDoSource ToDoStatus Topic TopicSubscribers TransientEmbeddedProcess Translation TypeEmbeddedAttribute
ValidValue ValidValueAssignment ValidValueDefinition ValidValueMember ValidValuesImplementation ValidValuesMapping ValidValuesSet ValueCategory VerificationPoint VerificationPointDefinition VirtualContainer VirtualMachine VirtualConnection
License: CC BY 4.0, Copyright Contributors to the ODPi Egeria project.