ISO/IEC JTC1/SC21/WG3 N
Source: USA
Date: October 18, 1991

The following persons participated in the development of this report:
Roger Burkhart, Deere & Company
Scott Dickson, Ontek Corporation
John Hanna, Vitro Corporation
Sandra Perez, Concept Technology, Inc.
Tony Sarris, Ontek Corporation
Madhu Singh, Bell Communications Research
John Sowa, IBM Corporation
Cliff Sundberg, Digital Equipment Corporation

Acknowledgment: Robert Meersman, a contributor to current and earlier work on ISO IRDS and database standards, contributed to description of the conceptual schema layers and to interpretation of ISO background documents.

TABLE OF CONTENTS
1.0 Introduction
2.0 Scope
3.0 Definitions and Abbreviations
4.0 Foundations of the Conceptual Schema
5.0 Roles of the Conceptual Schema
6.0 Conceptual Schema Taxonomy
6.1 Layers of Schema Definition
6.2 Types of Model Expressiveness
6.3 The Three Schema Architecture
6.4 Views and View Integration
6.5 Life Cycle Phases
6.6 Model Generality
7.0 Levels of Description
8.0 Conceptual Schema of the IRDS
9.0 Schema Language Framework
10.0 References

1.0 Introduction
2.0 Scope
An IRDS describes and manages an organization's information system resources and information describing other entities such as engineering designs, equipment, manufacturing processes, operational procedures, databases and documents. It serves as a repository for all the information about these resources that users and computer systems need to share. Because this information comes from many sources both within and outside the enterprise, it is important that it be sharable on as wide a basis as possible. The IRDS supports the integration of an enterprise's information environment and information models conforming to national and international standards. The IRDS conceptual schema specifies the basic concepts, definitions, rules and integration algorithms that make this integration and sharing possible.
An IRDS serves as a communications path between senders and receivers of information who may be widely separated in time and place. Communication cannot take place, however, unless the sender and receiver share a common understanding, or interpretation, of the data they transfer. The IRDS puts the meaning of information at the core of its design, and keeps this interpretation entirely separate from details of its representation, storage, processing or presentation. The IRDS separates the conceptual view of information so that it can be used and applied in many different ways. It can be presented in multiple external forms, for either users or computer programs, and it can be stored and processed using any selected technology, including separate systems such as database managers.
Because it may be used across many different industries and for many needs throughout the systems life cycle, the IRDS is based on a generic design that is not constrained by current technology or a short-term vision of needs. The contents of an IRDS are entirely customizable, and the range of information systems it describes is unrestricted. This means that its conceptual schema must be general enough to cover the needs of any potential user community.
A solution to the need for generality is to base the IRDS conceptual schema on the basic structures of meaning inherent in any attempt to communicate. This approach assumes that the structures of meaning captured by an information system are the same as those used by people when they communicate. The IRDS need not discover these principles on its own, but can instead draw from established sources in logic, linguistics, mathematics, computer science and philosophy. The IRDS conceptual schema must be based on such foundations to remain neutral for any selected application.
3.0 Definitions and Abbreviations
Application Schema - A schema in which domain-specific types of objects and the rules obeyed by those objects are described.
Conceptual Model - The composite of the conceptual schema and the information base which together describe objects in a universe of discourse.
Conceptual Schema - An ontology for the objects and relationships belonging to a universe of discourse along with necessary propositions about those objects and relationships.
IRDS Definition Schema - A schema in which primitive concepts, object types and operations, and fundamental lexical and syntactic categories are defined. Collectively these define the basic modeling capability of an IRDS.
IRDS Normative Schema - A schema which defines formal modeling constructs available for defining models and for translating between models.
Layers of Schema Definition - An approach to describing the IRDS Conceptual Schema. This approach divides the IRDS Conceptual Schema into four layers each having unique roles and characteristics.
Level of Description - The relative position of a conceptual model in a series of conceptual models that describe each other.
Modeling Schema - A schema that defines a framework or system for capturing abstract conceptual content of a model.
Ontology - A system for classifying everything that exists in a universe of discourse.
Proposition - A description of a state of affairs. Formally, a proposition is a primitive, abstract entity on which logical operations may be performed.
Universe of Discourse - Those entities and happenings that have been, are or ever might be and about which there exists a collection of represented information having a common understanding.
4.0 Foundations of the Conceptual Schema
The IRDS conceptual schema is founded on the assumption that the meaning of information can be separated, at least conceptually, from the specific forms or languages used to represent it. This separation is a necessary base for managing the conceptual schema as a separate resource, and for independently mapping the conceptual schema to its external and internal interfaces.
Talking about the conceptual schema can be somewhat difficult, since it is separate from language, yet some language must always be used to conduct the discussion. The first requirement in defining the conceptual schema is to understand precisely its relation both to language and to the objects belonging to an application domain. These relationships were expressed by Ogden and Richards [1] in a diagram called the meaning triangle:
Figure 1. The Meaning Triangle.
Each of the points on the triangle indicates a separate component that may be involved in thought or communication. The Object is any entity from some real or imagined world about which an idea is held. The Concept is the idea or thought of the object as held in the mind of a person. The Symbol is an auditory, visual, or other form of utterance which is taken to stand for the object when communicated as part of a language.
Any one of these components may be present without the others. An object does not depend on ideas formed about it; a concept may be formed about an object which does not exist (such as a unicorn); and a symbol may be held without knowing what object it stands for (such as a word from a foreign language). The relation between symbol and object is not direct, but is imputed as the combination of the relation between symbol and concept and the relation between concept and object. The conceptual schema is concerned with defining the concepts that lie at the center of meaning and which are separate both from the symbols that express them and from the objects they refer to.
The corners of the triangle each represent a single symbol, concept, or object that may be involved in communication. Typical application domains, however, contain a rich assortment of objects that require a complex structure of concepts and symbols to describe them. The meaning triangle can be extended to show these larger collections of elements:
Figure 2. Collections of Meaning Elements.
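The composition described above, in which the symbol-object relation is imputed through the concept rather than held directly, can be sketched in Python. The lookup tables, their string contents, and the function `refers_to` are hypothetical illustrations chosen for this sketch; they mirror only the structure of the relations, not how an IRDS would actually record concepts.

```python
from typing import Optional

# Hypothetical vocabulary: which concept each symbol expresses.
SYMBOL_TO_CONCEPT = {
    "horse": "concept:horse",
    "unicorn": "concept:unicorn",
}

# Hypothetical referents: which object each concept is about.
# "concept:unicorn" is deliberately absent: a concept of an object that does not exist.
CONCEPT_TO_OBJECT = {
    "concept:horse": "object:horse-1",
}


def refers_to(symbol: str) -> Optional[str]:
    """Impute the symbol-object relation as the composition of
    symbol -> concept and concept -> object; there is no direct link."""
    concept = SYMBOL_TO_CONCEPT.get(symbol)
    if concept is None:
        # A symbol held without knowing what it stands for (e.g. a foreign word).
        return None
    # Still None when the concept has no existing object.
    return CONCEPT_TO_OBJECT.get(concept)


print(refers_to("horse"))    # object:horse-1
print(refers_to("unicorn"))  # None
print(refers_to("Pferd"))    # None
```

Keeping the two tables separate is the point of the sketch: changing which concept a word expresses, or which object a concept is about, never requires touching a direct symbol-object table, because none exists.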
The application domain is the collection of all real or imagined objects of interest. These objects include not only particular individuals, but changes to these objects and associations between them. The conceptual schema is the collection of types and generic rules for objects that may exist in a domain, and the information base is the collection of concepts for the individual objects that exist in the domain. The terminology of conceptual schema and information base is derived from the ISO technical report TR9007, "Concepts and Terminology for the Conceptual Schema" [2].
Language is a structure of symbols used to communicate concepts, either general concepts belonging to the conceptual schema, or concepts of individuals belonging to the information base. As a structure of communicated symbols, language is the only element of the meaning triangle that can be stored and processed in a computer system. Forms of representation as computer data are specialized forms of language. The conceptual schema and information base correspond to ideas held in somebody's mind; they must be reduced to a particular form of language before they can be processed by a computer. The role of computer processing is to manipulate language strings, including translation from one form of language to another. The final interpretation of data processed by a computer can be performed only by a person to whom the data are presented externally.
5.0 Roles of the Conceptual Schema
A conceptual schema identifies the types of objects that exist in some domain of interest and the rules these objects must obey. (See the ISO technical report TR9007 [2] for an extended treatment.) There is not just one conceptual schema, but many different ones, each defined by the particular domain of interest to which it applies. The way its domain is selected defines a variety of roles for the conceptual schemas managed by an IRDS.
For the IRDS, the domain of interest consists of the information systems used by an enterprise, along with the structure and behavior of the enterprise itself and its surrounding environment. In principle, the IRDS may be used to record any information about these resources that an enterprise chooses. The content is entirely open-ended so that an enterprise can define anything it wants.
An initial role of the IRDS conceptual schema is to define the information stored in the IRDS, including these customized contents. In this role, the IRDS conceptual schema is much like the data definition facilities of a traditional database system. The IRDS can always be used as a database management system, the contents of which happen to be descriptions of other information systems. Unlike some database systems, the IRDS requires the ability to dynamically modify the definition of its content as it evolves, and it is generally much richer in the structure of its conceptual schema. In some respects, however, the IRDS can be regarded as a database that contains information about information resources or anything else of interest.
Because the subject matter of the IRDS consists of information resources, much of its content is subject to potential standardization. Most enterprises rely on standard techniques to model the enterprise and specify its information and other systems. There is an assortment of standard models for documenting the requirements of a system and for defining its logical and physical design. An additional role of the IRDS conceptual schema is to support the definition of standardized contents. Because potentially standard contents can come from many different sources, the IRDS must provide a generic framework in which any portion of an information resource description can be standardized by a responsible organization.
Contents which are so general as to apply to any information system, or which are widely used but not represented by a formal standards group, are likely to be standardized as an inherent part of the IRDS. An important role of the IRDS conceptual schema architecture is to supply an overall framework in which particular standards can be positioned and related to each other.
Certain aspects of an information system are so basic that there is no escaping them even in a minimal description. One of these is its conceptual schema, along with the implementation of the conceptual schema in internal and external interfaces. The techniques the IRDS supplies for defining its own conceptual schema can be used just as well to specify the conceptual schema of another system. Using the IRDS to detail the conceptual schema of an information system, along with its external and internal interfaces, yields several major advantages.
One compelling advantage of describing conceptual schemas of the enterprise is the ability to integrate partial views of the enterprise. Each information system deals with only a portion of the enterprise, which typically overlaps portions of the enterprise covered by other systems. Each such partial view may adopt its own conventions or rules for the objects under discussion, and may represent these rules under a variety of formalisms or languages. Because the IRDS breaks the conceptual schema down to its basic elements of meaning, the IRDS provides the basis for deciphering what is really meant by all these partial views and for specifying how they relate to each other. Providing the ability to map views to each other and to automatically translate between them is a major goal that drives the reduction of the IRDS conceptual schema to its fundamental level. The IRDS conceptual schema defines a canonical form in which to capture the meaning of all enterprise views, which can then be expressed in a variety of forms.
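The role of a canonical form in integrating partial views can be illustrated with a small Python sketch. It assumes two hypothetical views of the same fact (a payroll record and a staffing list) and an assumed code table relating their vocabularies; each view is mapped to canonical propositions, the views are compared in that form, and view A is translated into view B's conventions through it. None of these names come from the IRDS.

```python
# Two hypothetical partial views of the same enterprise fact, each with its own conventions.
view_a = [{"emp_name": "Lee", "dept_code": "ENG"}]   # e.g. a payroll system
view_b = [("Lee", "Engineering")]                    # e.g. a staffing list

# Assumed correspondence between the two views' vocabularies.
DEPT_CODES = {"ENG": "Engineering"}


def a_to_canonical(rows):
    """Map view A records to canonical propositions (works_in, person, department)."""
    return {("works_in", r["emp_name"], DEPT_CODES[r["dept_code"]]) for r in rows}


def b_to_canonical(rows):
    """Map view B tuples to the same canonical propositions."""
    return {("works_in", name, dept) for name, dept in rows}


def canonical_to_b(props):
    """Express canonical propositions in view B's conventions.
    Assumes every proposition is a works_in proposition."""
    return sorted((person, dept) for (_, person, dept) in props)


# The views agree when their canonical forms match; A is translated to B via the canonical form.
print(a_to_canonical(view_a) == b_to_canonical(view_b))  # True
print(canonical_to_b(a_to_canonical(view_a)))            # [('Lee', 'Engineering')]
```

With n views, each needs only a mapping to and from the canonical form rather than a separate translator for every other view, which is one practical reason for reducing views to a common set of concepts.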
Once captured in this way, the conceptual schema becomes a major resource for the enterprise in its own right. It has many potential uses not only for supporting information systems, but for understanding, controlling, and managing the enterprise.
A more technical use of the conceptual schema is to specify the function of an information system separately from its implementation. By rigorously separating the "what" of an information system from its "how," the conceptual schema enables the selection of any technology that can best perform the job. If the description of the technology is complete enough, the IRDS can even be used to locate data at run time and to retrieve or process data using its own standard interfaces. Carried to this extent, the IRDS becomes a virtual database manager for all the enterprise information.
6.0 Taxonomy of Conceptual Schemas
The IRDS conceptual schema taxonomy is a system for classifying the variety of conceptual schemas managed by the IRDS. This taxonomy establishes several independent dimensions for classifying the IRDS contents. Following are these basic dimensions:
1. Layers of Schema Definition
2. Types of Model Expressiveness
3. Three Schema Architecture
4. Views and View Integration
5. Life Cycle Phases
6. Model Generality
These dimensions are each independent systems of classification for possible contents of the IRDS conceptual schema. Because they are independent, they may be combined with each other to define many fine-grained subdivisions of the total content. The most fundamental dimensions are listed first, followed by dimensions useful in the specific context of an IRDS. The following sections present each of these dimensions.
6.1 Layers of Schema Definition
Layer 1 - IRDS Definition Schema
The IRDS Definition Schema defines the primitive concepts, object types and operations, and fundamental lexical and syntactic categories that define the basic modeling capability of an IRDS.
It contains the primitives used to define the IRDS Normative Schema layer. All IRDS modeling constructs are ultimately defined in terms of the theoretical constructs defined by this layer. These constructs are taken from philosophy, logic, linguistics, mathematics, computer science and other disciplines. This layer supplies a theoretical foundation for the normative schema; it has no direct operational role within the IRDS.
It captures the basic structures of meaning that implicitly lie behind any attempt to communicate by either natural or formal languages. It is expected that there are multiple sets of primitives that can capture these structures, with each set being internally complete and consistent, and equivalent to the other sets. Equivalence of sets means that the primitives of one set can be defined using the primitives of the other, and vice versa. The IRDS standard will note these equivalent formulations, but will select and use one of them as the basis for defining the constructs of the IRDS Normative Schema layer.
The ultimate definition of the modeling primitives in this layer can only be expressed informally, using as precise a form of language as possible. All meaning captured by the IRDS is ultimately reducible to these constructs, but their primitive level would likely make such a reduction burdensome and inconvenient. No construct is included in this layer if it can be defined as a combination of other constructs; this is what it means for a construct to be primitive.
To support the definition of the IRDS Normative Schema and subsequent layers, the primitives must include an ability to define further constructs that are abbreviations or macros for combinations of these primitives. A language is supplied for specifying these macro constructs. This language, called the Defining Language, is used to define the constructs of the subsequent normative schema layer.
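As an illustration of a non-primitive construct defined as an abbreviation for a combination of primitives, the following Python sketch treats a type-instance proposition as the only primitive and defines a subtype construct purely in terms of it. The functions `instance_of` and `subtype` are hypothetical names, and Python merely stands in for the Defining Language, whose syntax this report does not specify.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical primitive: the proposition that object `x` is an instance of type `t`.
Proposition = Tuple[str, str, str]


def instance_of(x: str, t: str) -> Proposition:
    return ("instance_of", x, t)


def subtype(sub: str, sup: str) -> Callable[[FrozenSet[Proposition]], bool]:
    """A derived construct defined only in terms of the primitive instance_of:
    every object recorded as an instance of `sub` must also be an instance of `sup`."""
    def rule(facts: FrozenSet[Proposition]) -> bool:
        return all(
            instance_of(x, sup) in facts
            for (kind, x, t) in facts
            if kind == "instance_of" and t == sub
        )
    return rule


facts = frozenset({instance_of("f1", "Fighter"), instance_of("f1", "Airplane")})
fighter_is_airplane = subtype("Fighter", "Airplane")
print(fighter_is_airplane(facts))                                    # True
print(fighter_is_airplane(facts - {instance_of("f1", "Airplane")}))  # False
```

Because `subtype` expands entirely into checks on `instance_of`, removing it would lose convenience but no expressive power, which matches the criterion for what may be left out of the primitive layer.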
Example contents: Object, Type, Type-Instance Relation, Proposition, Event, Symbol, Lambda Abstraction
Layer 2 - IRDS Normative Schema
The IRDS Normative Schema supplies the complete set of formal modeling constructs available for defining models used with the IRDS. This model supplies a common interpretation, or semantics, for all modeling languages used with the IRDS. It supports the unification of models expressed in different languages, and may be used to translate between them. The constructs include all those of the IRDS Definition Schema layer plus additional ones which are not primitive but are defined for convenience in unifying a wide variety of common models.
The constructs of the IRDS Normative Schema are fully specified, and fixed, as part of any given version of the IRDS standard. There is no specific rule to decide which constructs must be included in this layer, but at a minimum any construct included in two or more modeling frameworks is a candidate. The main role of this layer is to make unification between modeling frameworks more convenient than unification directly at the layer of primitives would be. No essential modeling capability can be lost from an incomplete selection, since all the primitives are automatically included.
Constructs belonging to this layer are defined either directly in terms of primitives from the IRDS Definition Schema layer, or in a language that can express all the contents of this layer. The IRDS directly supports such a language, called the IRDS Normative Language, to permit the input of definitions, and of mapping definitions, for the modeling schemas in layer 3.
Example contents: Attribute, Relationship, Subtype, Aggregation, Process, Trigger
Layer 3 - Modeling Schemas
Each Modeling Schema defines a framework or system for capturing the conceptual content of a model.
Each such modeling system is a set of constructs and their definitions that express all or some of the modeling capability defined by the IRDS Definition Schema layer. Each such definition can also refer to the predefined constructs supplied by the IRDS Normative Schema layer. These models are called modeling schemas because their domain is the process of modeling itself, separate from details of any particular domain being modeled.
Many modeling approaches will be needed to cover the variety of IRDS application domains. While these modeling approaches may all be equivalent in some fundamental sense, particular ones may be considerably more convenient or familiar in specific modeling domains. Once defined and registered as part of the IRDS, these modeling frameworks may be used to specify information systems that can still be integrated with systems defined under other frameworks.
Modeling schemas are defined using a specification language provided by layer 2. Modeling schemas may be standardized by the user of the IRDS, by the supplier of the IRDS, or by any national or international standardization group. The modeling schemas defined as part of the IRDS standard will be those that represent widely used approaches but lack a clearly identifiable standards group to take responsibility for their definition.
Example contents: E-R Schema, Data Flow Schema, SQL Schema, ANS138, Programming Language Schema, ISO IRDS, Object Oriented Schema
Layer 4 - Application Schemas
Application Schemas define the types of objects that exist in some chosen domain, plus the rules that those objects must obey. These objects include both tangible kinds of objects, such as airplanes or drawings, and intangible kinds of objects, such as plans or allocations. Each such schema may be communicated to or from the IRDS by one or more schema languages, but the IRDS Normative Schema defines an abstract conceptual meaning for the schema that is language-independent.
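To illustrate one application schema whose language-independent meaning is communicated through more than one schema language, the following Python sketch renders a single abstract type definition into an SQL-style declaration and into a textual E-R notation. The `EntityType` class, the domain-to-SQL table, and the `ENTITY` notation are hypothetical assumptions for illustration; they are neither the IRDS Normative Language nor any standardized E-R syntax.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EntityType:
    """Hypothetical language-independent description of an application type."""
    name: str
    attributes: Tuple[Tuple[str, str], ...]  # (attribute name, value domain)


# Assumed mapping from abstract value domains to SQL column types.
SQL_TYPES = {"text": "VARCHAR(100)", "integer": "INTEGER"}


def to_sql(t: EntityType) -> str:
    """Render the type in an SQL-style schema language."""
    cols = ", ".join(f"{a} {SQL_TYPES[d]}" for a, d in t.attributes)
    return f"CREATE TABLE {t.name} ({cols});"


def to_er(t: EntityType) -> str:
    """Render the same type in a made-up textual E-R notation."""
    return f"ENTITY {t.name}: " + ", ".join(a for a, _ in t.attributes)


airplane = EntityType("Airplane", (("tail_number", "text"), ("seats", "integer")))
print(to_sql(airplane))  # CREATE TABLE Airplane (tail_number VARCHAR(100), seats INTEGER);
print(to_er(airplane))   # ENTITY Airplane: tail_number, seats
```

Both renderings are derived from the same `EntityType` value, so neither language is privileged; adding a third schema language means adding one more rendering function rather than translating from SQL or from E-R.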
The domain over which the schema applies may be specified as part of a larger domain on which other schemas are defined, and the schema may be subdivided into subschemas that apply to selected portions of its domain.
Application domains for real-world problems require extensive type structures to capture the complexity of their object structure and behavior. Types are built out of elementary types, which classify single objects, associations between objects, or changes to objects. Types asserted about objects define propositions, which are the basic units of meaning in an information system. Complex propositions can be built by combining simpler propositions. Types may be specified not only for the static state of some application world, but for changes which occur in that world and for processes that define a sequence of changes.
Example contents: Enterprise schemas, Parts of enterprise schemas
6.2 Types of Model Expressiveness
Static Models vs. Dynamic Models vs. Higher-Order Models
This dimension classifies models according to the expressiveness of their constructs. Rather than being a classification of models, it is really a classification of theoretical constructs in the IRDS Definition Schema layer. Complete models, however, can be built using a subset of the available constructs. Such models utilize only a selected portion of the concepts expressible in language, but these concepts may be adequate for many purposes. The various types of expressiveness can be used to simplify presentation of concepts.
One type of expressiveness is defined by static objects, which either do not change or whose change is not described formally by the system. Objects which do not change include mathematical entities such as numbers, or the state of some object as recorded at some point in time. The operations defined on static objects are to assert or test the truth of propositions recorded about them.
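The two operations on static objects, asserting and testing propositions, can be sketched in Python as follows. The tuple encoding of a proposition and the class and method names are hypothetical; the sketch also treats any unrecorded proposition as false, a closed-world simplification this report does not prescribe. `test_all` shows a complex proposition built as the conjunction of simpler ones.

```python
from typing import Set, Tuple

Proposition = Tuple[str, ...]  # (type name, object, ...), an illustrative encoding


class StaticModel:
    """Recorded propositions about static objects; the only operations are assert and test."""

    def __init__(self) -> None:
        self._recorded: Set[Proposition] = set()

    def assert_prop(self, *prop: str) -> None:
        """Record a proposition as true."""
        self._recorded.add(prop)

    def test(self, *prop: str) -> bool:
        """Test whether a proposition has been recorded as true (closed-world)."""
        return prop in self._recorded

    def test_all(self, *props: Proposition) -> bool:
        """Test a complex proposition formed as the conjunction of simpler ones."""
        return all(self.test(*p) for p in props)


m = StaticModel()
m.assert_prop("Airplane", "N123")
m.assert_prop("assigned_to", "N123", "route-7")
print(m.test("Airplane", "N123"))  # True
print(m.test("Drawing", "N123"))   # False
print(m.test_all(("Airplane", "N123"), ("assigned_to", "N123", "route-7")))  # True
```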
The static level is captured in the data storage structures of an information system.
An additional type of expressiveness is reached when the concept of change is included. Changes to objects can be classified under event types, and event types can be linked in cause and effect relations. Modeling the dynamic aspects of a domain specifies the processes which can occur in it. The dynamic level is captured by operations that simulate or control the behavior of objects in a domain.
Further types of expressiveness are defined when propositions at either the static or dynamic level are defined as objects about which further propositions can be asserted. These include propositions that modify other propositions, such as stating a level of evidence or mode of belief, and reasoning about change and time. This level of complexity is captured in knowledge representation systems and by systems of modal and temporal logic.
6.3 The Three Schema Architecture
External vs. Conceptual vs. Internal
The conceptual schema has been widely discussed as part of a three-schema architecture for data management, as derived from the work of ANSI/SPARC [3][4]. In this architecture, the conceptual schema is at the middle of a three-level structure, between the external schema, directed to users of the database, and the internal schema, directed to internal storage systems of the database. The IRDS conceptual schema fully supports the three-schema view, and extends its scope beyond data management to include events and processes.
Any portion of the conceptual schema, including all the parts defined by the conceptual schema architecture, can be presented in both external and internal forms. Multiple external or internal schemas may be defined, so that the same conceptual schema can be presented in multiple alternative forms either to external users or to the internal machinery of a computer.
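As a minimal sketch of one conceptual fact presented in more than one form, the following Python code maps a single proposition to an external form (an English sentence) and to an internal form (a fixed-layout byte record packed with the standard `struct` module), and recovers the fact from the internal form. The proposition, the sentence pattern, and the 12-byte layout are assumptions made for illustration only.

```python
import struct

# Conceptual content: one proposition, independent of any representation.
fact = ("seats", "N123", 180)


def to_external(f) -> str:
    """One possible external form: an English sentence for a human reader."""
    attr, obj, value = f
    return f"{obj} has {value} {attr}"


def to_internal(f) -> bytes:
    """One possible internal form: an 8-byte ASCII key followed by a big-endian
    unsigned 32-bit integer (12 bytes in total)."""
    _, obj, value = f
    return struct.pack(">8sI", obj.encode("ascii"), value)


def from_internal(data: bytes):
    """Recover the conceptual fact from the internal form; this layout is
    assumed to be used only for the 'seats' attribute."""
    obj, value = struct.unpack(">8sI", data)
    return ("seats", obj.rstrip(b"\x00").decode("ascii"), value)


print(to_external(fact))                         # N123 has 180 seats
print(len(to_internal(fact)))                    # 12
print(from_internal(to_internal(fact)) == fact)  # True
```

Further external or internal schemas (another sentence pattern, a different byte layout) would each be one more pair of mapping functions over the same `fact`, leaving the conceptual content untouched.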
Both the external and internal schemas, in fact, are mappings of the conceptual schema to a linguistic representation. The basic difference between them is that the external schema is defined for communication with an external user, and the internal schema is defined for direct execution or storage on a machine. Both kinds of presentation may take many alternative forms, from strings of text to graphical displays to messages on a network or inside a computer. Both may include events in addition to static information; a keystroke or mouse click in a user interface is as much a linguistic event as a spoken word, and a process of transforming data in a computer system can simulate events occurring in an application domain.
The external and internal schemas are not properly a part of the conceptual schema they communicate, but they are closely related to it. Given any conceptual schema, the application domain can be expanded to include not only the objects in the application domain, but the external or internal forms used to communicate about these objects. The forms of representation can also be selected as special domains for detailed description. A new conceptual schema can be established to cover the structures of symbols that communicate the original domain. The meaning triangle discussed earlier can be used to illustrate the relationship of this new conceptual schema to the original one:
Figure 3. Conceptual Schema for a Representation.
In the IRDS conceptual schema architecture, the layers of abstraction are defined by their subject matter, as established by the contents of the application domain at each layer. The schemas for languages used to communicate about these domains define a separate subdivision of the conceptual schema content. By convention, the domain at each application layer is considered to include the forms of representation used to communicate about it.
Each layer is defined by a core conceptual schema stripped of all representation issues, plus additional specialized schemas for each form of external or internal representation. Following is an illustration of this structure:
Figure 4. External and Internal Subschemas For A Conceptual Schema.
Languages for communicating information are important potential candidates for standardization. For any such language, both the information it communicates and the symbols it employs to communicate the information must be specified. The information communicated defines the semantics of the language, and the structure of symbols defines the grammar or syntax of the language. The semantics can be based on the core structures of meaning defined by the IRDS conceptual schema, but each language must specify its own conventions for the mappings between symbols and the meanings they carry.
Even though the external and internal schemas always include the definition of a linguistic or representational form, this does not exhaust their content. Each schema can include further information about the uses of the interface or what lies behind it. The external schema, for example, can include information about users of a particular interface, their skill level, or their expectations. The internal interface can include extensive information about performance of machines in manipulating its representations, or any other implementation-related information. Compiling a specification into a directly executable form is a special case of mapping from a conceptual schema to an internal schema, and embeds many assumptions beyond merely representational ones. The external interface should be understood as including all aspects of how an information system relates to its external environment, and the internal interface should be understood as encompassing all aspects of the technology base on which it is implemented.
6.4 Views and View Integration
The conceptual schema for an entire enterprise is large and complex.
Given its scale, many persons must contribute to its development over many years. As the enterprise changes, its conceptual schema must evolve to reflect new rules and structures for operating. There is no global perspective that can capture the many kinds of activities and components the enterprise includes. To be practical, the conceptual schema must support partial views that can be defined independently of each other, and yet be combined later when their overlap or relationship to each other is discovered or resolved.
A view is a part of a conceptual schema whose relationship to the rest of the schema is either known or still unspecified. Like any part of a conceptual schema, a view can also be presented in many different linguistic forms, but this mapping is classified under the three-schema distinction and should not be regarded as defining a separate view. As defined here, a view is simply a subschema belonging to some larger schema.
A view can be related to the rest of the conceptual schema in many different ways. It can select a different but equivalent set of defined modeling constructs in which to express its information. The modeling constructs can be used to express its contents as being wholly dependent on the contents of another view. Separate views that contain the same information can be mapped to a third view to explain their precise relationship to each other.
Integration of separate views is a major role for the IRDS. The mappings between views can be chosen as a specialized domain, and detailed structures of information built to describe the mappings. All integration between views is specified by relating them to a common set of underlying concepts. The meaning of these concepts is ultimately defined by the core set of concepts from the IRDS Definition Schema layer.
6.5 Life Cycle Phases
The development of the total enterprise system is an important process that needs to be managed over time.
The development of an information system can be split into distinct life cycle phases according to the amount and kind of information a specification contains and the tasks that must be performed on it. This spectrum extends from front-end analysis and design through all stages of physical design, operation, and support for previously deployed versions. The types of system specifications and the transformations that take them from stage to stage can be defined as part of an application model for the systems development process. Many of these transformations require the addition of information by the systems developer, but some of them can be partially or wholly automated if the description is complete enough. The application model can retain a complete forward and backward trace of the steps by which the specification was generated.
Life cycle needs do not end once the initial development of a system is complete. Systems evolve in response to changes in requirements or in the most effective solution. The facilities of the IRDS must be complete enough to handle the version and configuration control needs of the systems development process. Evolution of a system from one version to another can be modeled by defining each version as a view that derives in part from the preceding version.
The facilities of the IRDS conceptual schema are complete enough for an entire existing information system to be modeled as part of a new system. This provides a powerful capability for subsuming old systems or for emulating their functions and interfaces.
6.6 Model Generality
Generic vs. Industry vs. User
Conceptual schemas differ in the generality of the domain over which they apply. Enterprises are not all different; most enterprises contain processes and substructures similar to those of other enterprises. As a practical matter, there is no reason every enterprise should repeat the specification of portions of its business that match those of other businesses.
Additionally, an enterprise does not exist in isolation. Many of the processes it conducts include interaction with other enterprises, and to communicate with each other, enterprises must share a portion of their conceptual schemas.

The generality of a domain is measured by the number of times it occurs. A general domain is encountered many times, either within an enterprise or across many different enterprises. By sharing conceptual schemas, the cost of formalizing the conceptual schema for a general domain can be invested once but recouped many times. Moreover, the users of a shared conceptual schema will already have established the basis for their communication. The formal construction of the conceptual schema clarifies the precise meaning of the information that users or enterprises communicate. The ability to build and integrate conceptual views also provides a high degree of flexibility: a general schema can be adopted as a core definition and then extended and customized with local definitions. Because it decouples specifications from the entanglements of language, technology, and interfaces, a conceptual schema based approach is the most promising foundation for widespread software reuse.

General models result not merely from omitting details that differ, but from analysis that identifies fundamental building blocks that can be assembled in many different ways. In-depth analysis can yield a wide assortment of basic concepts that virtually everybody shares. The structures of meaning at the core of the IRDS conceptual schema are one such example; other universally shared concepts include those that describe the physical world and the basic ways that people organize and plan their activity.

An IRDS should not be regarded as an empty container into which an enterprise must pour everything that fills it. The initial population of standardized concepts may become an important measure of an IRDS system.
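The reuse pattern described in this section, in which a general schema is adopted as a core definition and then extended with local definitions, can be sketched as a chain of schema layers searched from the most specific to the most general. This is a minimal illustration under stated assumptions, not an IRDS mechanism: each layer is reduced to a hypothetical mapping from a type name to its supertype, the layer contents are invented for illustration, and Python's `collections.ChainMap` stands in for whatever integration facility an IRDS would actually provide.

```python
from collections import ChainMap

# Hypothetical schema layers: each maps a type name to its supertype (None = root).
# A real conceptual schema would also carry relations and rules.
generic = {"Thing": None, "Organization": "Thing", "Product": "Thing", "Person": "Thing"}
industry = {"Vehicle": "Product", "Car": "Vehicle"}          # e.g. an automotive layer
enterprise = {"Taxicab": "Car", "Dealer": "Organization"}    # local definitions only

# Lookups search the enterprise layer first, then industry, then generic.
schema = ChainMap(enterprise, industry, generic)


def ancestors(type_name: str) -> list[str]:
    """Return the supertype chain of a type, resolved across all layers."""
    chain = []
    parent = schema[type_name]
    while parent is not None:
        chain.append(parent)
        parent = schema[parent]
    return chain


def defining_layer(type_name: str) -> str:
    """Report which layer supplies the definition of a type."""
    for layer_name, layer in zip(("enterprise", "industry", "generic"), schema.maps):
        if type_name in layer:
            return layer_name
    raise KeyError(type_name)


print(ancestors("Taxicab"))        # ['Car', 'Vehicle', 'Product', 'Thing']
print(defining_layer("Vehicle"))   # industry
```

Only the enterprise layer is written locally; the generic and industry layers are shared unchanged, so their formalization cost is paid once. Any local definition with the same name as a shared one would shadow it, which is one simple (and assumed) policy for customization.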
Standardized concepts can flow from many sources. Concepts that apply across any enterprise, or that are used to communicate between enterprises, would likely be established by a formal standardization process. Concepts needed for specific application domains could be established by industry groups dealing with that domain (e.g., automotive, aerospace/defense, consumer electronics). Users can also represent concepts particular to their enterprise. In specialized domains for which public standardization is not possible, an enterprise could standardize its own concepts for use within the enterprise or with selected partners. A conceptual schema based approach provides a complete and rigorous foundation for expanding such efforts, and an IRDS can assist in managing and integrating the models they produce.

7.0 Levels of Description

A conceptual schema supplies basic definitions for the content of an IRDS, but an IRDS holds more than just conceptual schemas. A conceptual schema specifies the fundamental types for classifying objects in some domain, along with rules that define proper usage of those types. To describe the actual contents of a domain, the conceptual schema must be supplemented by the collection of instances to which it applies. The ISO technical report TR9007 [2] refers to this collection of instances as the information base for a conceptual schema. In this report, the combination of a conceptual schema and its associated information base is defined as the conceptual model for a domain.

In TR9007, both the conceptual schema and the information base are related formally to a universe of discourse containing the objects they describe. The universe of discourse is the subject matter for the entire conceptual model. This subject-matter relation is entirely distinct from any internal structure within the conceptual model.
For example, the internal structure of a conceptual schema, including all the dimensions of the conceptual schema taxonomy, is contained within the model, as is the population of objects identified by the model and related to the schema through the type-instance relation of the IRDS Definition Schema. The subject-matter relation positions the entire conceptual model in its containing environment by specifying the universe of objects that it describes.

The information systems belonging to an enterprise comprise a conceptual model whose subject matter is the people, processes, entities, and relationships that belong to the enterprise. For an IRDS, the principal subject matter is ordinarily the information systems constructed by an enterprise, but it may also include the enterprise itself as necessary to explain or manage those information systems. The IRDS is also an information system, and it may be described by the same or a different IRDS.

The use of information systems to describe other information systems is a special case of the recursiveness with which the domain of a conceptual model may be defined. A new conceptual model is defined whenever a new collection of objects is identified to serve as its subject matter. Once defined, the conceptual model itself, along with each of its elements, can be identified as objects belonging to a new universe of discourse, and a new level of conceptual model can be constructed that has the previous conceptual model as its universe of discourse. This recursiveness of domain definition means that any fixed prescription of levels of IRDS description is overly restrictive. An IRDS can be used to describe any information system, regardless of what that information system describes. Though normally used to describe the information systems of an enterprise, an IRDS can also be used to describe the enterprise itself, or another IRDS.
The description of one IRDS by another, repeated any number of times, can be useful in dealing with heterogeneous or distributed systems. To provide upward compatibility, a newer IRDS could be used to describe an older installed version. An IRDS can also be used to describe itself.

A level of description is defined by the pair of a conceptual model and the domain it describes. While the number of levels is not fixed, some basic levels can be defined by their relation to the enterprise and to the definition of the IRDS itself. These levels show the most likely positioning of an IRDS in an overall information system architecture. Working up from the basic level of the enterprise, four distinct levels can be identified:

   4. IRDS Definition
   3. IRDS Example Contents
   2. Information Systems ...
   1. Enterprise

The Enterprise level consists of the actual people, processes, entities, and relationships that information systems are ultimately constructed to describe or control. The Information Systems level consists of the databases and processing systems that hold encoded descriptions of the enterprise objects. The IRDS level describes these systems and may establish partial or total control over them; it can also hold information about the enterprise itself. The IRDS level may be split as many times as necessary for one IRDS to describe another. The top level, IRDS Definition, defines the structure and capabilities of an IRDS. If an IRDS contains a description of itself, this level consists of the subset of the IRDS that is used to define itself. The definition of an IRDS in a standard is another form of description at this level. Each IRDS level in this structure can be a complete conceptual model, containing both the types of a conceptual schema and the instances of a corresponding information base.
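The level structure just described, in which each level can be a complete conceptual model holding both types and instances and describing the level below it, can be sketched as a small recursive data structure. This is a hypothetical illustration, not part of any IRDS specification: the class `ConceptualModel`, its fields, the function `levels`, and all type and instance names are assumptions chosen for the example, and the enterprise is represented only as the terminus of the chain because it is subject matter rather than a model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptualModel:
    """Hypothetical sketch: a conceptual schema plus its information base."""
    name: str
    schema: set[str]                                                  # type names
    information_base: dict[str, str] = field(default_factory=dict)   # instance -> type
    describes: Optional[ConceptualModel] = None   # level below; None = the enterprise

    def untyped_instances(self) -> list[str]:
        """Instances whose type is not defined in this model's own schema."""
        return [i for i, t in self.information_base.items() if t not in self.schema]


def levels(top: ConceptualModel) -> list[str]:
    """List level names from the top model down to the enterprise.

    Assumes the chain of descriptions is acyclic; a self-describing IRDS
    would need a cycle check that this sketch omits."""
    names = []
    model = top
    while model is not None:
        names.append(model.name)
        model = model.describes
    names.append("Enterprise")  # the enterprise is the subject matter, not a model
    return names


info_systems = ConceptualModel(
    "Information Systems", {"Customer", "Order"},
    {"cust-17": "Customer", "ord-42": "Order"})
irds = ConceptualModel(
    "IRDS", {"Database", "Host"},
    # An upper level may hold instances too, e.g. where a database resides.
    {"orders-db": "Database", "host-a": "Host"},
    describes=info_systems)
irds_definition = ConceptualModel(
    "IRDS Definition", {"ModelType"},
    # The types of the level below appear here as instances.
    {"Database": "ModelType", "Host": "ModelType"},
    describes=irds)

print(levels(irds_definition))
# ['IRDS Definition', 'IRDS', 'Information Systems', 'Enterprise']
print(irds.untyped_instances())  # []
```

Because every level uses the same structure, inserting another IRDS between two levels, or describing an older IRDS with a newer one, only changes which model a `describes` field points to.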
The inclusion of both types and instances within an upper level is a departure from earlier approaches to defining information systems levels, such as those of the ISO IRDS Framework [5] and the draft ISO Reference Model of Data Management [6]. These define an upper level as containing only the types, or schema, for the lower level of a level pair. The departure is due to the expansion of IRDS scope from an Information Resource Dictionary System to a system for building a comprehensive description of information resources and their surrounding enterprise environment. An information systems dictionary limits the role of an upper level to defining types for a lower level, but an expanded IRDS supports open-ended description of lower levels by upper levels, including information about particular instances.

Metadata about both types and instances may be distributed throughout upper levels. For example, the directory component of an IRDS specifies locations or addresses for particular instances belonging to other systems. Support for three-schema views might require the IRDS to record the particular format in which a particular user wants to see some piece of information. On the implementation side, a database might be recorded as residing on a particular host machine having a particular network address. An IRDS could also hold a history of changes to information contained in a base system. Although upper levels naturally tend to hold less information than lower levels, an expanded IRDS does not enforce this limitation and does not exclude instance information from upper levels. Each level provides a complete conceptual model of the lower level, and the description of one level by another can be repeated as many times as necessary.

8.0 Conceptual Schema of the IRDS

An IRDS can be one of the information systems that belong to an enterprise.
All capabilities the IRDS provides for describing other information systems can equally well be used to describe itself. The IRDS conceptual schema and its external and internal interfaces can be defined using the same facilities that are provided for direct use by the enterprise. This self-application of IRDS facilities is a major simplification in the architecture of the IRDS: it means that one generic set of facilities can be furnished to satisfy all needs, and that IRDS facilities are tested and exercised as an inherent part of IRDS development.

The core part of the IRDS conceptual schema describes the structure of schemas managed and administered by the IRDS, including the composition, organization, behavior, construction, evolution, and presentation of the elements that make up these descriptions. Generic portions of the IRDS conceptual schema describe means for interpreting the contents of modeling schemas, application schemas, and even application instances in a unified manner across all modeled enterprise domains. Unified interpretation views include the directory, dictionary, thesaurus, and encyclopedia. The IRDS conceptual schema describes how to derive these views from the framework of IRDS models, including those that represent the business, technology, system, and implementation aspects of an enterprise and its products.

Other parts of the complete IRDS conceptual schema describe the external and internal schemas that define the IRDS services and technology interfaces. External schemas of the IRDS describe communication interfaces to application programs, other IRDS instances, or human users. Internal schemas describe the interfaces to storage and processing systems within the implementation domain of an IRDS. The external and internal interfaces support a variety of abstraction levels and modes of interaction.

Multiple IRDS instances may be needed by an enterprise to support decentralized development and control of conceptual schemas.
Each instance provides conceptual integrity for a designated enterprise domain. IRDS domains would be selected to provide appropriate islands of independent development while facilitating the enterprise-wide integration of understanding between domains. Enterprise policy concerning the balance between top-down control and bottom-up learning could also be implemented through the IRDS. One IRDS could be used to integrate or control the contents of another through their external services interfaces.

The services interface component of the IRDS encompasses both traditional import-export mechanisms and a new class of IRDS management services. These services include, but are not limited to, object version and configuration management, work flow and life cycle control mechanisms, security policy enforcement, generalized inquiry facilities, and directory management services. The conceptual schema of the IRDS provides the definition capability to support enforcement of IRDS management service policies.

The simplest IRDS services interface functionality is data import-export through file transfer. Enhanced functionality supports shared access to information resources through a query/response dialog. More complex is the interaction between multiple IRDS instances to identify a common means for communicating about their domains. Finally, the most complex interaction is between an IRDS and a human user, either for learning about existing models or for adding new descriptions of enterprise knowledge and experience.

9.0 Schema Language Framework

The IRDS conceptual schema must satisfy a wide variety of requirements: readability by practitioners; a rigorous foundation that would satisfy theoreticians; and enough expressive power to support all existing conceptual schema languages. No single language can meet all these requirements simultaneously. Instead, the IRDS standard permits different languages to be used to express a conceptual schema:

1. Existing schema languages: Many languages have been implemented in vendor systems, and many others have been proposed in the research literature. These include the ISO IRDS SQL Data Model, Express, Entity-Relationship diagrams, NIAM diagrams, many vendor languages, and research prototypes. The IRDS conceptual schema languages must support and coexist with these languages.

2. Defining language: For theoretical precision, the semantics of the conceptual schema languages are defined by their mapping to a language with a model-theoretic semantics, such as a version of predicate calculus. The defining language is used only to establish the foundations of the other conceptual schema languages, and no practitioner is ever required to learn it in order to read or write a conceptual schema.

3. Normative languages: Since the defining language must have a limited number of primitives in order to simplify the semantic foundations, it is likely to be too low-level to provide all the features needed in a practical conceptual schema language. To support those features, normative languages are defined. The semantics of a normative language is defined by its mapping to the IRDS Normative Schema, and it will be rich enough to include a superset of the semantics of all existing schema languages. Of the existing schema languages, Conceptual Graphs [7][8] were selected as a basis for developing the initial IRDS Normative Language. Conceptual graphs are a highly readable graphic language with a well-defined theoretical basis, a 15-year history of research publications, a worldwide user community, and commercial applications in production use. Because conceptual graphs have a direct mapping to predicate calculus, and the defining language is a version of predicate calculus, a normative form of predicate calculus is defined as an additional normative language.
Other normative languages can be defined, provided that they supply a complete mapping to the semantics defined by the IRDS Normative Schema.

For readability by programmers and designers who have not been trained in formal logic, a conceptual schema is definable in a stylized version of some natural language, such as Structured English or Structured French. These stylized languages would have an unambiguous syntax, and their semantics would be defined by a formal mapping to the IRDS Normative Schema.

Figure 5. Conceptual Schema Languages

The schema languages on the right of the diagram have been used to express conceptual schemas for various systems. The normative language must be rich enough to include a superset of the semantics of all of them. The defining language, however, would be a more primitive version of predicate calculus, and some constructs in the normative language may have to map to complex expressions in the defining language. The solid arrow from the schema languages on the right indicates that all of their semantics can be fully represented by constructs in the normative language. The dotted arrow from the normative language to the schema languages on the right indicates that some constructs in the normative language might not be mappable into certain schema languages, since many of them have limited expressive power. For readability, every construct in the normative language has an equivalent representation in a stylized natural language, including versions for English, French, and others. The readability of conceptual graphs, their great expressive power, and their smooth mapping to and from natural languages are among the reasons for proposing them as a basis for the IRDS normative language.

9.1 Conceptual Graphs as a Normative Language

The normative language is based on conceptual graphs, but it also supports layers of representations to accommodate type hierarchies, entity-relationship diagrams, and NIAM diagrams.
In effect, each of these other notations can represent a view of the conceptual schema at a different level of detail. A type hierarchy has the least amount of detail. It shows the entity types with their subtype-supertype links. Entity-relationship diagrams show the permissible relations between types and the cardinality constraints on the relations. NIAM is essentially a superset of the two: it combines type hierarchies with cardinality constraints on relations. By showing inheritance through the type hierarchy, NIAM can also support object-oriented schemas.

The difference between conceptual graphs and the other graphic notations is based on the distinction between first-order logic and higher-order logic. First-order logic describes instances of entities and relationships, as in the sentence "Bob has a green Volvo". With quantifiers such as "every" and "some", first-order logic can also state general principles: "Every vehicle has some color." The following diagram shows conceptual graphs and predicate calculus notations for these two sentences:

Figure 6. Conceptual Graph Notation

In a conceptual graph, boxes represent concepts and circles represent conceptual relations. The POSS relation represents possession, and the ATTR relation represents attribute. Conceptual graphs also have a fully equivalent linear notation that uses square brackets for the boxes and parentheses for the relations. Following is the linear notation for the two graphs in Figure 6:

   [PERSON: Bob]->(POSS)->[VOLVO]->(ATTR)->[COLOR: green].
   [VEHICLE: @every]->(ATTR)->[COLOR].

Second-order logic makes statements about types of entities and relationships, such as "CAR is a subtype of VEHICLE" or "VEHICLE has attribute COLOR." Whenever a fact can be stated in either a first-order or a second-order form, the second-order version is usually shorter:

   First-order:
      (Ax)(car(x) -> vehicle(x)).
      (Ax)(Ey)(vehicle(x) -> (color(y) & attribute(x,y))).
   Second-order:
      car < vehicle.
      has_attribute(vehicle, color).

As these examples illustrate, a simple second-order statement can express information that requires quantifiers in first-order logic. Because of the absence of quantifiers, many proofs in this restricted version of second-order logic can be simpler and faster than in first-order logic. From the statements that every car is a vehicle and every vehicle has a color, first-order logic can derive the conclusion that every car has a color:

      (Ax)(Ey)(car(x) -> (color(y) & attribute(x,y))).

This inference requires a multistep proof, but the equivalent inference in second-order logic follows by the simpler principle of inheritance: since CAR is a subtype of VEHICLE, all attributes of VEHICLE are inherited by CAR:

      has_attribute(car, color).

Second-order logic with quantifiers over types can become highly complex, but the restricted version without quantifiers is a concise and efficient language for expressing many kinds of constraints. Some graphic notations, such as type hierarchies, entity-relationship diagrams, and NIAM diagrams, express second-order statements without quantifiers. Since these notations are useful for many applications, the normative language does support them.

Conceptual graphs are a general graphic language that can express both first-order and higher-order statements. Following are first-order graphs for the statements that every car is a vehicle and every vehicle has some color:

   [CAR: @every]- - -[VEHICLE].
   [VEHICLE: @every]->(ATTR)->[COLOR].

The equivalents in second-order graphs are statements about types. As in predicate calculus notation, the first-order graphs require quantifiers, but the second-order graphs have no quantifiers:

   [TYPE: car]<-(SUBT)<-[TYPE: vehicle].
   [TYPE: vehicle]->(HAS-ATTR)->[TYPE: color].

The relation HAS-ATTR between types states that the ATTR relation holds between instances of those types.
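The inheritance step described above can be sketched directly over the two second-order relations, SUBT and HAS-ATTR. This is a minimal illustration, not the inference procedure of the IRDS: the facts are stored as Python sets of type pairs, the function names are hypothetical, and the `taxicab` subtype is added only to show a two-step chain. `has_attribute` answers the second-order question by walking the type hierarchy, while `missing_attributes` applies the first-order reading that the ATTR relation must hold for each instance.

```python
# Hypothetical encoding of the second-order facts as sets of type pairs.
SUBT = {("car", "vehicle"), ("taxicab", "car")}  # (subtype, supertype); taxicab is illustrative
HAS_ATTR = {("vehicle", "color")}                # (type, attribute type)


def supertypes(t: str) -> set[str]:
    """All types above t: the transitive closure of SUBT, by fixpoint iteration."""
    result: set[str] = set()
    frontier = {t}
    while frontier:
        step = {sup for (sub, sup) in SUBT if sub in frontier} - result
        result |= step
        frontier = step
    return result


def has_attribute(t: str, attr: str) -> bool:
    """Second-order check by inheritance: t or one of its supertypes declares attr.

    No quantifiers are involved; this is a lookup over the type hierarchy."""
    return any((s, attr) in HAS_ATTR for s in {t} | supertypes(t))


def missing_attributes(instance_types: dict[str, str],
                       attr_values: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """First-order (instance-level) reading of HAS-ATTR: every instance needs a
    value for each attribute its type declares or inherits."""
    problems = []
    for inst, t in instance_types.items():
        for attr in sorted({a for (_, a) in HAS_ATTR}):
            if has_attribute(t, attr) and attr not in attr_values.get(inst, {}):
                problems.append((inst, attr))
    return problems


print(has_attribute("car", "color"))   # True, i.e. has_attribute(car, color)
print(sorted(supertypes("taxicab")))   # ['car', 'vehicle']
print(missing_attributes({"bobs_volvo": "car", "cab_7": "taxicab"},
                         {"bobs_volvo": {"color": "green"}}))
# [('cab_7', 'color')]
```

The second-order check never inspects instances, which is why it needs no quantifiers; the "every ... some" content of the first-order formula reappears only in the instance-level check.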
Second-order conceptual graphs of this kind map directly to type hierarchies and E-R diagrams: the SUBT relation represents the subtype links in a type hierarchy, and the HAS-ATTR relation represents the attribute links in an E-R diagram. Similar second-order relations in conceptual graphs can be used to represent every kind of link in an E-R or NIAM diagram.

Since conceptual graphs can represent both first-order and higher-order statements, with or without quantifiers, they can represent all the constraints that can be expressed in type hierarchies, E-R diagrams, and NIAM in essentially the same way. But they can also state constraints that are not expressible in those notations. Some typical examples include:

   If a car has color yellow, it is a taxicab.
   No person under the age of 18 may drive a car that has more than 150 horsepower.
   No vehicle may have more wheels on its front axle than on its rear axle.

These sentences use negations, if-then rules, comparisons, and constants such as 18 or 150. None of them could be represented in E-R diagrams or NIAM, but all of them could be stated in predicate calculus or conceptual graphs.

9.2 Mapping Grammars

For each schema language, there must be a pair of mapping grammars that determine how it is to be mapped to and from conceptual graphs (the normative language). The next diagram shows the languages and their mapping grammars.

Figure 7. Mapping Grammars

Since the normative language is more expressive, the mappings from right to left are total mappings, but the mappings from left to right are only partial mappings. Since the normative language can be mapped into stylized natural languages, it is possible to annotate any of the translations with comments in natural language.

Suppose, for example, that someone wanted to map E-R diagrams into NIAM.
Since NIAM is a more expressive language, it would be possible to map everything in E-R into the normative language and then into NIAM:

   E-R -> Normative Language -> NIAM

In the other direction, however, not all the information in a NIAM diagram can be mapped into E-R. Whatever is not expressible in E-R would therefore be mapped into stylized English:

   NIAM -> Normative Language -> E-R + English

The stylized natural languages can therefore supplement the existing schema languages with comments that are readable by both people and machines.

10.0 References

[1] Ogden, C. K., Richards, I. A., The Meaning of Meaning, Harcourt Brace Jovanovich, New York, 1989 (first published 1923).
[2] ISO, Concepts and Terminology for the Conceptual Schema, ISO TR9007, 1987.
[3] ANSI/X3/SPARC, Study Group on Data Base Management Systems: Interim Report 75-02-08, in ACM SIGMOD Newsletter, Vol. 7, No. 2, 1975.
[4] Tsichritzis, D., Klug, A. (eds.), The ANSI/X3/SPARC DBMS Framework, Report of Study Group on Database Management Systems, AFIPS Press, Montvale, NJ, 1977.
[5] ISO, Information Technology - Information Resource Dictionary System (IRDS) Framework, International Standard ISO/IEC 10027, 1990.
[6] ISO, Information Technology - Reference Model of Data Management, Draft International Standard ISO/IEC DIS 10032, 1991.
[7] Sowa, J. F., Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, Reading, MA, 1984.
[8] Sowa, J. F., Towards the Expressive Power of Natural Language, in Principles of Semantic Networks (J. F. Sowa, ed.), Morgan Kaufmann Publishers, San Mateo, CA, 1991, pp. 157-189.