Creating a data warehouse model based on the corporate data model. Corporate databases: what elements make up the corporate data model

Increasingly, IT professionals are turning their attention to data management solutions based on industry-standard data models and business solution templates. Ready-to-download comprehensive physical data models and business intelligence reports for specific areas of activity make it possible to unify the information component of an enterprise and significantly speed up business processes. Solution templates let service providers tap the non-standard information hidden in existing systems, thereby reducing project lead times, costs and risks. For example, real-world projects show that data model and business solution templates can reduce development effort by 50%.

An industry logical model is a domain-specific, integrated, and logically structured view of all the information that must reside in an enterprise data warehouse to answer both strategic and tactical business questions. The main purpose of such models is to make it easier to navigate the data space and to highlight the details that matter for business development. In today's conditions, a successful business requires a clear understanding of the connections between its various components and a good grasp of the overall picture of the organization. Identifying all the details and relationships with the help of models allows the company's time and organizational tools to be used most efficiently.

Data models are abstract models that describe how data is presented and accessed. Data models define data items and the relationships between them in a particular area. A data model is a navigation tool for both business and IT professionals that uses a specific set of symbols and words to accurately explain a specific class of real-world information. This allows for better communication within the organization and thus creates a more flexible and stable application environment.


An example of a “GIS for government and local government” model.

Today, it is strategically important for software and service providers to be able to quickly respond to changes in the industry associated with technological innovations, the removal of government restrictions and the complexity of supply chains. Along with the changes in the business model, the complexity and cost of the information technology required to support a company's operations increases. Data management is especially difficult in an environment where corporate information systems, as well as functional and business requirements for them, are constantly changing.

Industry data models are intended to facilitate and optimize this process and to bring the enterprise's approach to IT up to the modern level.

Industry data models from Esri

Esri ArcGIS data models are working templates for use in GIS projects and for creating data structures for different application areas. Data model building involves creating a conceptual design, logical and physical structure that can then be used to build a personal or corporate geodatabase. ArcGIS provides tools for creating and managing a database schema, and Data Model Templates are used to quickly launch a GIS project across a variety of applications and industries. Esri has spent a significant amount of time with the user community to develop a number of templates that can provide a quick start to the design of an enterprise geodatabase. These projects are described and documented at support.esri.com/datamodels. Below, in the order in which they appear on this site, is a semantic translation of Esri's industry model names:

  • Address register
  • Agriculture
  • Meteorology
  • Basic spatial data
  • Biodiversity
  • Interior space of buildings
  • Greenhouse gas accounting
  • Maintaining administrative boundaries
  • Military establishment. Intelligence service
  • Energy (including the new ArcGIS MultiSpeak protocol)
  • Ecological structures
  • Ministry of Emergency Situations. Fire brigade
  • Forest cadastre
  • Forestry
  • Geology
  • National level GIS (e-gov)
  • Groundwater and waste water
  • Health care
  • Archeology and conservation of memorial sites
  • National security
  • Hydrology
  • International Hydrographic Organization (IHO). S-57 format for ENC
  • Irrigation
  • Land Registry
  • Municipal government
  • Nautical navigation
  • State cadastre
  • Oil and gas structures
  • Pipelines
  • Raster storage
  • Bathymetry, seabed relief
  • Telecommunications
  • Transport
  • Water supply, sewerage, housing and communal services

These models have all the necessary characteristics of an industry standard, namely:

  • are freely available;
  • are not tied to the technology of the “chosen” manufacturer;
  • created as a result of the implementation of real projects;
  • created with the participation of industry experts;
  • are designed to provide information interaction between various products and technologies;
  • do not contradict other standards and regulations;
  • used in completed projects around the world;
  • are designed to work with information throughout the entire life cycle of the system being created, and not the project itself;
  • expandable according to the needs of the customer without losing compatibility with other projects and / or models;
  • accompanied by additional materials and examples;
  • used in guidelines and technical materials of various industrial companies;
  • are supported by a large community of participants, with access to the community open to everyone;
  • are referenced by a large number of publications in recent years.

Esri is part of an expert group of independent bodies that recommend various industry models, such as PODS (Pipeline Open Data Standards, an open standard for the oil and gas industry; PODS is currently implemented as the PODS Esri Spatial 5.1.1 geodatabase) or the geodatabase (GDB) from ArcGIS for Aviation, which takes into account ICAO and FAA recommendations as well as the AIXM 5.0 navigation data exchange standard. In addition, there are recommended models that strictly adhere to existing industry standards, such as S-57 and ArcGIS for Maritime (marine and coastal features), as well as models created from work performed by Esri Professional Services that have become de facto standards in their areas. For example, GIS for the Nation and Local Government influenced the NSDI and INSPIRE standards, and Hydro and Groundwater (hydrology and groundwater) are heavily used in the freely available professional ArcHydro suite and in commercial third-party products. It should be noted that Esri also supports de facto standards such as NHDI. All proposed data models are documented and ready for use in enterprise IT processes. Accompanying materials for the models include:

  • UML entity-relationship diagrams;
  • data structures, domains, directories;
  • ready-made geodatabase templates in ArcGIS GDB format;
  • sample data and sample applications;
  • examples of data loading scripts, examples of analysis utilities;
  • reference books on the proposed data structure.

Esri consolidates its experience in building industry models in the form of books and localizes published materials. The following books have been localized and published by Esri CIS:

  • Geospatial Service Oriented Architecture (SOA);
  • Designing geodatabases for transport;
  • Corporate geographic information systems;
  • GIS: new energy for electrical and gas enterprises;
  • Oil and gas on a digital map;
  • Modeling our world. Esri Geodatabase Design Guide;
  • Thinking about GIS. GIS Planning: A Manual for Managers;
  • Geographic information systems. Basics;
  • GIS for administrative and economic management;
  • Web GIS. Principles and Applications;
  • Systems Design Strategies, 26th Edition;
  • 68 issues of ArcReview magazine with publications of companies and users of GIS systems;
  • ... and many other thematic notes and publications.

For example, the book "Modeling Our World..." (available in translation) is a comprehensive guide and reference for GIS data modeling in general, and for the geodatabase data model in particular. The book shows how to make the right data modeling decisions, decisions that affect every aspect of a GIS project, from database design and data collection to spatial analysis and visualization. It describes in detail how to design a geodatabase appropriate to the project, configure database functionality without programming, manage workflows in complex projects, model various network structures such as river, transport or electrical networks, integrate satellite imagery into geographic analysis and display, and create 3D models of GIS data. The book "Designing Geodatabases for Transportation" contains methodological approaches that have been tested on a large number of projects and fully comply with the legislative requirements of Europe and the United States, as well as with international standards. Finally, the book "GIS: New Energy for Electric and Gas Enterprises" uses real-world examples to show the benefits a corporate GIS can bring to an energy supplier, covering aspects such as customer service, network operations and other business processes.


Some of the books, both translated and original, published in Russian by Esri CIS and DATA+. They address both conceptual issues of GIS technology and many applied aspects of modeling and deploying GIS of various sizes and purposes.

We will consider the application of industry models using the example of BISDM (Building Interior Space Data Model) version 3.0, an information model of a building's interior space. BISDM develops the more general BIM (Building Information Model) concept and is intended for use in the design, construction, operation and decommissioning of buildings and structures. Used in GIS software, it allows geodata to be exchanged efficiently with other platforms. It belongs to the general group of FM tasks (managing an organization's infrastructure). Let us list the main advantages of BISDM, whose use allows you to:

  • organize the exchange of information in a heterogeneous environment according to uniform rules;
  • get a "physical" implementation of the BIM concept and recommended rules for construction project management;
  • to maintain by means of GIS a single repository throughout the entire life cycle of a building (from design to decommissioning);
  • coordinate the work of various specialists in the project;
  • visualize the planned schedule and construction stages for all participants;
  • give a preliminary estimate of the cost and timing of construction (4D and 5D data);
  • monitor the progress of the project;
  • ensure high-quality operation of the building, including maintenance and repairs;
  • become a part of the asset management system, including the functions of analyzing the effectiveness of the use of space (leasing, warehouse, employee management);
  • calculate and manage energy efficiency objectives of the building;
  • simulate the movement of human flows.

BISDM defines the rules for working with spatial data at the level of a building's internal premises, including their purpose and use, laid communications, installed equipment, records of repairs and maintenance, incident logging, and relationships with other company assets. The model helps create a unified repository of geographic and non-geographic data. The experience of the world's leading companies was drawn on to isolate the entities and to model, at the geodatabase level, the spatial and logical relationships of all the physical elements that form both the building itself and its internal premises. Following the principles of BISDM significantly simplifies integration with other systems. The first stage is usually CAD integration; then, during the operation of the building, data is exchanged with ERP and EAM systems (SAP, TRIRIGA, Maximo, etc.).


Visualization of BISDM structural elements using ArcGIS.

When BISDM is used, the customer / owner of the facility gets an end-to-end exchange of information: from the idea of creating the object, through development of the complete design, construction control with up-to-date information by the time the facility is commissioned, and monitoring of parameters during operation, right up to reconstruction or decommissioning of the facility. Following the BISDM paradigm, the GIS and the GDB created with it become a common data warehouse for related systems. The GDB often contains data created and operated by third-party systems, and this must be taken into account when designing the architecture of the system being created.

At a certain stage, the accumulated "critical mass" of information allows you to move to a new qualitative level. For example, upon completion of the design phase of a new building, it is possible to automatically visualize 3D survey models in GIS, compile a list of equipment to be installed, calculate the mileage of utilities to be laid, perform a number of checks and even give a preliminary financial estimate of the project cost.
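As a rough sketch of this kind of automated estimate: the snippet below totals the length of laid utility lines by type and applies per-meter unit costs to produce a preliminary figure. The feature records, type names and costs are hypothetical stand-ins for data that would be queried from the BISDM geodatabase.

```python
from collections import defaultdict

# Hypothetical utility-line features exported from the geodatabase:
# (utility type, length in meters).
features = [
    ("power_cable", 120.5),
    ("water_pipe", 80.0),
    ("power_cable", 45.2),
    ("ventilation_duct", 60.7),
]

# Hypothetical unit costs per meter, including installation.
unit_cost = {"power_cable": 14.0, "water_pipe": 22.5, "ventilation_duct": 31.0}

totals = defaultdict(float)
for kind, length_m in features:
    totals[kind] += length_m              # total length to be laid, per type

estimate = sum(totals[k] * unit_cost[k] for k in totals)
for kind, total in sorted(totals.items()):
    print(f"{kind}: {total:.1f} m")
print(f"Preliminary cost estimate: {estimate:.2f}")
```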

Once again, we note that when BISDM and ArcGIS are used together, 3D models can be built automatically from the accumulated data, since the geodatabase contains a complete description of the object, including z-coordinates, floor membership, types of element connections, equipment installation methods, materials, available personnel movement paths, the functional purpose of each element, and so on. Note that after the initial import of all design materials into the BISDM GDB, additional information needs to be entered for:

  • placement of 3D models of objects and equipment in designated places;
  • collecting information about the cost of materials and the procedure for their laying and installation;
  • checking clearances (passability) against the dimensions of the non-standard equipment to be installed.

Thanks to ArcGIS, it is easier to import additional 3D objects and reference data from external sources, because the ArcGIS Data Interoperability module lets you create procedures for importing such data and placing it correctly within the model. All formats used in the industry are supported, including IFC, Autodesk Revit and Bentley MicroStation.

Industry data models from IBM

IBM provides a set of data warehouse models and management tools for a variety of business areas:

  • IBM Banking and Financial Markets Data Warehouse (finance)
  • IBM Banking Data Warehouse
  • IBM Banking Process and Service Models
  • IBM Health Plan Data Model (healthcare)
  • IBM Insurance Information Warehouse (insurance)
  • IBM Insurance Process and Service Models
  • IBM Retail Data Warehouse (retail)
  • IBM Telecommunications Data Warehouse (telecommunications)
  • InfoSphere Warehouse Pack:
    - for Customer Insight (for understanding customers)
    - for Market and Campaign Insight (for understanding the market and campaigns)
    - for Supply Chain Insight (for understanding suppliers).

For example, the IBM Banking and Financial Markets Data Warehouse model is designed to address the specific problems of the banking industry in terms of data, and IBM Banking Process and Service Models in terms of processes and SOA (service-oriented architecture). For the telecommunications industry, IBM offers the IBM Information FrameWork (IFW) and IBM Telecommunications Data Warehouse (TDW) models. They help to significantly speed up the creation of analytical systems and to reduce the risks associated with developing business intelligence applications, managing corporate data and organizing data warehouses, taking into account the specifics of the telecommunications industry. The capabilities of IBM TDW cover the entire spectrum of the telecommunications market: from Internet providers and cable network operators offering wired and wireless telephony, data transmission and multimedia content, to multinational companies providing telephone, satellite, long-distance and international communication services, as well as organizations operating global networks. Today TDW is used by large and small wired and wireless service providers around the world.

The InfoSphere Warehouse Pack for Customer Insight tool provides structured and easy-to-deploy business content for a growing number of business projects and industries, including banking, insurance, finance, health insurance, telecommunications, retail and distribution. InfoSphere Warehouse Pack for Market and Campaign Insight helps business users maximize the efficiency of market analysis and marketing campaigns through a step-by-step development process that takes the specifics of the business into account. With InfoSphere Warehouse Pack for Supply Chain Insight, organizations can obtain current information on supply chain operations.


Esri's position within the IBM solution architecture.

Particularly noteworthy is IBM's approach to utility companies. To meet the growing demands of consumers, utilities need a more flexible architecture than those in use today, as well as an industry-standard object model to facilitate the free flow of information. This will enhance the communication capabilities of utilities, enabling more cost-effective interoperability and giving new systems better visibility of all the resources they need, no matter where those resources are located within the organization. The basis for this approach is SOA (service-oriented architecture), a component model that links the functions of departments and the services of various applications, which can be reused. The "services" of such components exchange data through interfaces without rigid coupling, hiding from the user all the complexity of the systems behind them. In this mode, enterprises can easily add new applications regardless of software vendor, operating system, programming language or other intrinsic characteristics of the software. The SAFE (Solution Architecture for Energy) concept is implemented on top of SOA; it allows a utility company to obtain a standards-based, holistic view of its infrastructure.

Esri ArcGIS® is a globally recognized software platform for geographic information systems (GIS) that provides for the creation and management of the digital assets of electric power, gas transmission, distribution and telecommunication networks. ArcGIS allows the most complete inventory of the components of an electrical distribution network, taking their spatial location into account. ArcGIS dramatically extends the IBM SAFE architecture by providing the tools, applications, workflows, analytics and information integration capabilities needed to manage a smart energy enterprise. Within IBM SAFE, ArcGIS makes it possible to receive information from various sources about infrastructure facilities, assets, customers and employees, with accurate data about their location, and to create, store and process georeferenced information about enterprise assets (supports, pipelines, wires, transformers, cable ducts, etc.). Within the SAFE infrastructure, ArcGIS dynamically connects core business applications by combining data from GIS, SCADA and customer service systems with external information such as traffic intensity, weather conditions or satellite imagery. Utilities use this combined information for a variety of purposes, from building an overall picture of the operational environment to site inspection, maintenance, network analysis and planning.

The information components of a utility company can be modeled at several levels, ranging from the lowest, physical, level to the highest, most complex level of business logic. These layers can be integrated to meet typical industry requirements, such as automated meter reading and SCADA management. By building the SAFE architecture, utilities are making significant strides in promoting an industry-wide open object model called the Common Information Model (CIM) for energy and utilities. This model provides the necessary basis for moving many enterprises towards a service-oriented architecture, as it encourages the use of open standards for structuring data and objects. Because all systems use the same objects, the confusion and inflexibility associated with different implementations of the same objects is reduced to a minimum. Thus the definition of the customer object and of other important business objects is unified across all the systems of the power supply company. With CIM, service providers and service consumers can now share a common data structure, making it easier to outsource expensive business components, since CIM establishes a common base on which to build information exchange.

Conclusion

Comprehensive industry data models provide companies with a single, integrated view of their business information. Many companies find it difficult to integrate their data, even though integration is a prerequisite for most enterprise-wide projects. According to a study by The Data Warehousing Institute (TDWI), more than 69% of organizations surveyed found integration to be a significant barrier to adopting new applications. Conversely, implementing data integration generates tangible revenue and efficiency gains for the company.

A well-constructed model uniquely identifies the meaning of the data, which in this case is structured data (as opposed to unstructured data such as an image, binary file, or text, where the meaning can be ambiguous). The most effective are the industry models offered by professional vendors such as Esri and IBM. The high return on use of their models is achieved due to the significant level of detail and accuracy. They usually contain many data attributes. In addition, both Esri and IBM have extensive modeling experience and are well versed in building industry-specific models.


Zaitsev S.L., Ph.D.

Repeating groups

Repeating groups are attributes for which a single instance of an entity can have more than one value. For example, a person can have more than one skill. If the business requirements call for knowing the skill level for each skill, and each person can have only two skills, we can create the entity shown in Fig. 1.6: a PERSON entity with two attributes for storing skills and the skill level of each.

Fig. 1.6. This example uses repeating groups.

The problem with repeating groups is that we cannot know exactly how many skills a person might have. In real life, some people have one skill, some have several, and some have none yet. Figure 1.7 shows the model reduced to first normal form. Note the added Skill ID, which uniquely identifies each SKILL.

Fig. 1.7. The model reduced to first normal form.
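To make the transformation concrete, here is a minimal sketch of both structures as DDL executed through Python's sqlite3 module. The table and column names are illustrative, not prescribed by the article:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Repeating group (Fig. 1.6): two skill slots are baked into PERSON,
# so a person can never have more than two skills.
con.execute("""
CREATE TABLE person_with_repeating_group (
    person_id     INTEGER PRIMARY KEY,
    name          TEXT,
    skill_1       TEXT,
    skill_level_1 INTEGER,
    skill_2       TEXT,
    skill_level_2 INTEGER
)""")

# First normal form (Fig. 1.7): the repeating group becomes rows keyed by
# the added Skill ID, so a person may have any number of skills.
con.execute("""
CREATE TABLE person_skill (
    person_id   INTEGER,
    skill_id    INTEGER,
    skill_name  TEXT,
    skill_level INTEGER,
    PRIMARY KEY (person_id, skill_id)
)""")
```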

One fact in one place

If the same attribute is present in more than one entity and is not a foreign key, then this attribute is considered redundant. The logical model should not contain redundant data.

Redundancy requires additional space, and while storage efficiency matters, the real problem lies elsewhere. Keeping redundant data synchronized is overhead, and you always run the risk of conflicting values.

In the previous example, SKILL depends on both Person ID and Skill ID. This means that no SKILL can exist until a PERSON possessing that skill appears. It also makes it difficult to change a Skill Name: you have to find every entry carrying that Skill Name and change it for every PERSON who has the skill.

Figure 1.8 shows the model in second normal form. Notice the added SKILL entity, to which the Skill Name attribute has been moved. The skill level remains, accordingly, at the intersection of PERSON and SKILL.

Fig. 1.8. In second normal form, the repeating group is moved to another entity. This provides the flexibility to add any number of skills and to change a Skill Name or Skill Description in one place.
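A sketch of the same step in DDL, with illustrative names: Skill Name (and a Skill Description) now live in SKILL, while the skill level stays on the intersection table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Second normal form (Fig. 1.8): Skill Name depends on Skill ID alone,
-- not on the whole (person_id, skill_id) key, so it moves to SKILL.
CREATE TABLE skill (
    skill_id    INTEGER PRIMARY KEY,
    skill_name  TEXT,
    description TEXT
);
CREATE TABLE person_skill (
    person_id   INTEGER,
    skill_id    INTEGER REFERENCES skill(skill_id),
    skill_level INTEGER,   -- still depends on the full composite key
    PRIMARY KEY (person_id, skill_id)
);
""")
```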

Each attribute depends on the key

Each attribute of an entity must depend on the primary key of that entity. In the previous example, School Name and Geographic Area are present in the PERSON table but do not describe the person. To achieve third normal form, these attributes must be moved to an entity where they depend on the key. Figure 1.9 shows the model in third normal form.

Fig. 1.9. In third normal form, School Name and Geographic Area are moved to an entity where their values depend on the key.
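In DDL form (names again illustrative), the school-related attributes get an entity of their own:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Third normal form (Fig. 1.9): School Name and Geographic Area describe
-- the school, not the person, so they move to SCHOOL, whose key they
-- depend on.
CREATE TABLE school (
    school_id       INTEGER PRIMARY KEY,
    school_name     TEXT,
    geographic_area TEXT
);
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT
);
""")
```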

Many-to-many relationships

Many-to-many relationships reflect the reality of the surrounding world. Note that in Figure 1.9 there is a many-to-many relationship between PERSON and SCHOOL: it accurately reflects the fact that a PERSON can study at many SCHOOLS and a SCHOOL can teach many PERSONS. To achieve fourth normal form, an associative entity is created that eliminates the many-to-many relationship by generating a separate entry for each unique combination of school and person. Figure 1.10 shows the model in fourth normal form.

Fig. 1.10. In fourth normal form, the many-to-many relationship between PERSON and SCHOOL is resolved by introducing an associative entity, in which a separate entry is allocated for each unique combination of SCHOOL and PERSON.
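A sketch of the associative entity in DDL, with illustrative names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE school (school_id INTEGER PRIMARY KEY, school_name TEXT);

-- Associative entity (Fig. 1.10): one row per unique combination of
-- person and school replaces the many-to-many relationship.
CREATE TABLE person_school (
    person_id INTEGER REFERENCES person(person_id),
    school_id INTEGER REFERENCES school(school_id),
    PRIMARY KEY (person_id, school_id)
);
""")
```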

Formal definitions of normal forms

The following definitions of normal forms can seem daunting. Think of them simply as formulas for achieving normalization. Normal forms are based on relational algebra and can be interpreted as mathematical transformations. Although this book is not devoted to a detailed discussion of normal forms, modelers are encouraged to study the subject more deeply.

In a given relation R, attribute Y is functionally dependent on attribute X, in symbols R.X → R.Y (read "R.X functionally determines R.Y"), if and only if each value of X in R is associated with exactly one value of Y in R (at any given time). Attributes X and Y may be composite (Date C.J. An Introduction to Database Systems. 6th edition. Williams, 1999, 848 pp.).
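The definition can be checked mechanically. This small sketch tests whether X → Y holds in a relation represented as a list of rows; the attribute names and sample data are illustrative:

```python
def holds_fd(rows, x_attrs, y_attrs):
    """Return True if X -> Y holds: each X-value maps to exactly one Y-value."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in x_attrs)
        y = tuple(row[a] for a in y_attrs)
        if seen.setdefault(x, y) != y:   # same X, different Y: FD violated
            return False
    return True

r = [
    {"skill_id": 1, "skill_name": "welding",  "skill_level": 3},
    {"skill_id": 1, "skill_name": "welding",  "skill_level": 5},
    {"skill_id": 2, "skill_name": "drafting", "skill_level": 3},
]
print(holds_fd(r, ["skill_id"], ["skill_name"]))   # True
print(holds_fd(r, ["skill_id"], ["skill_level"]))  # False
```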

A relation R is in first normal form (1NF) if and only if all of its underlying domains contain only atomic values (Date, ibid.).

A relation R is in second normal form (2NF) if and only if it is in 1NF and every non-key attribute is fully dependent on the primary key (Date, ibid.).

A relation R is in third normal form (3NF) if and only if it is in 2NF and no non-key attribute is transitively dependent on the primary key (Date, ibid.).

A relation R is in Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key.

NOTE Below is a brief explanation of some of the abbreviations used in Date's definitions.

MVD (multivalued dependency). Applies only to entities with three or more attributes. In a multivalued dependency, the value of an attribute depends on only part of the primary key.

FD (functional dependency). In a functional dependency, the value of an attribute depends on the value of another attribute that is not part of the primary key.

JD (join dependency). In a join dependency, the primary key of the parent entity is traced down to at least third-level descendants, while retaining the ability to be used in a join on the original key.

A relation R is in fourth normal form (4NF) if and only if, whenever an MVD exists in R, say A →→ B, all attributes of R are also functionally dependent on A. In other words, R contains only dependencies (FD or MVD) of the form K → X (i.e., a functional dependency of attribute X on a candidate key K). Accordingly, R meets 4NF if it meets BCNF and all MVDs are in fact FDs (Date, ibid.).
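A multivalued dependency can be tested the same way: A →→ B holds exactly when, for each A-value, the observed combinations of B-values and remaining-attribute values form a full cross product. A minimal sketch with illustrative data (it assumes a non-empty relation):

```python
from collections import defaultdict
from itertools import product

def holds_mvd(rows, a_attrs, b_attrs):
    c_attrs = [k for k in rows[0] if k not in a_attrs and k not in b_attrs]
    groups = defaultdict(list)
    for r in rows:                        # group tuples by their A-value
        groups[tuple(r[k] for k in a_attrs)].append(r)
    for grp in groups.values():
        bs = {tuple(r[k] for k in b_attrs) for r in grp}
        cs = {tuple(r[k] for k in c_attrs) for r in grp}
        pairs = {(tuple(r[k] for k in b_attrs),
                  tuple(r[k] for k in c_attrs)) for r in grp}
        if pairs != set(product(bs, cs)): # (B, C) pairs must be independent
            return False
    return True

r = [
    {"person": "p1", "skill": "welding",  "school": "s1"},
    {"person": "p1", "skill": "welding",  "school": "s2"},
    {"person": "p1", "skill": "drafting", "school": "s1"},
    {"person": "p1", "skill": "drafting", "school": "s2"},
]
print(holds_mvd(r, ["person"], ["skill"]))      # True: full cross product
print(holds_mvd(r[:3], ["person"], ["skill"]))  # False: one combination missing
```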

For fifth normal form, a relation R satisfies the join dependency (JD) *(X, Y, …, Z) if and only if R is equivalent to the join of its projections onto X, Y, …, Z, where X, Y, …, Z are subsets of the set of attributes of R.

There are many other normal forms for complex data types and specific situations that are beyond the scope of this discussion. Any modeling enthusiast would do well to study the other normal forms too.

Business normal forms

In his book (An Introduction to Information Engineering: From Strategic Planning to Information Systems. Reading, Massachusetts: Addison-Wesley, 1989), Clive Finkelstein took a different approach to normalization. He defines business normal forms in terms of the transformations needed to reach those forms. Many modelers find this approach more intuitive and more pragmatic.

The first business normal form (1BNF) moves repeating groups out to another entity. That entity receives its own name, and its primary (composite) key attributes come from the original entity and its repeating group.

The second business normal form (2BNF) moves attributes that are only partially dependent on the primary key out to another entity. The primary (composite) key of that entity is the primary key of the entity in which the attributes originally resided, together with the additional keys on which each attribute fully depends.

The third business normal form (3BNF) moves attributes that do not depend on the primary key into another entity, where they fully depend on that entity's primary key.

The fourth business normal form (4BNF) moves attributes that depend on the value of the primary key, or that are optional, into a secondary entity, where they depend entirely on the value of the primary key or where they must (necessarily) be present.

The fifth business normal form (5BNF) appears as a structural entity when there is a recursive or other dependency between instances of a secondary entity, or when a recursive dependency exists between instances of its primary entity.

Completed Logical Data Model

The completed logical model must satisfy the requirements of the third business normal form and include all the entities, attributes, and relationships necessary to support the data requirements and business rules associated with the data.

All entities must have names that describe their content and have a clear, concise, complete description or definition. A future post will cover an initial set of guidelines for the correct formation of entity names and descriptions.

Entities must have a complete set of attributes, so that every fact about each entity can be represented by its attributes. Each attribute must have a name that reflects its meaning, a logical data type, and a clear, short, complete description or definition. In a future post, we will look at an initial set of guidelines for correctly forming attribute names and descriptions.

Relationships should include a verb construction that describes the relationship between the entities, along with characteristics such as cardinality, the necessity of existence, or the possibility of absence of the relationship.

NOTE The cardinality of a relationship describes the maximum number of secondary-entity instances that can be associated with an instance of the original entity. The necessity of existence or possibility of absence of a relationship determines the minimum number of secondary-entity instances that can be associated with an instance of the original entity.

Physical data model

Once a complete and adequate logical model has been created, you are ready to decide on an implementation platform. The choice of platform depends on the requirements for data use and on the strategic principles shaping the corporation's architecture. Choosing a platform is a complex issue beyond the scope of this book.

In ERwin, a physical model is a graphical representation of a real database. The physical database is made up of tables, columns and relationships. The physical model depends on the platform chosen for implementation and on the data usage requirements. The physical model for IMS will be very different from one for Sybase, and the physical model for OLAP reports will look different from a model for OLTP (online transaction processing).

The data modeler and the database administrator (DBA) use the logical model, the usage requirements, and corporate architecture policy to develop a physical data model. You can denormalize the physical model to improve performance, and create views to support the usage requirements. The following sections detail the process of denormalization and view creation.
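As a sketch of both techniques (illustrative names, via sqlite3): a view satisfies a reporting requirement without touching the normalized tables, while denormalization copies an attribute back into a table, trading redundancy and synchronization overhead for fewer joins at read time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT);
CREATE TABLE person_skill (
    person_id   INTEGER,
    skill_id    INTEGER REFERENCES skill(skill_id),
    skill_level INTEGER,
    PRIMARY KEY (person_id, skill_id)
);

-- A view supports the usage requirement: users see one flat row per
-- person and skill, while the stored tables stay normalized.
CREATE VIEW person_skill_report AS
SELECT ps.person_id, s.skill_name, ps.skill_level
FROM person_skill ps JOIN skill s ON s.skill_id = ps.skill_id;

-- Denormalization for performance: skill_name is copied back in,
-- accepting redundancy (and its synchronization cost) to avoid a join.
CREATE TABLE person_skill_denormalized (
    person_id   INTEGER,
    skill_id    INTEGER,
    skill_name  TEXT,      -- redundant copy of skill.skill_name
    skill_level INTEGER,
    PRIMARY KEY (person_id, skill_id)
);
""")
```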

This section provides an overview of the process of building a physical model, collecting data usage requirements, defining the components of a physical model, and providing reverse engineering. In the following publications, these issues are covered in more detail.

Collecting data usage requirements

Typically, you collect data usage requirements early, during interviews and work sessions. The requirements should define the users' use of the data as fully as possible. Superficial treatment and gaps in the physical model can lead to unplanned costs and delayed project delivery. Usage requirements include:

    Access and performance requirements

    Volumetric characteristics (an estimate of the amount of data to be stored), which give the administrator an idea of the physical size of the database (a simple sizing sketch follows this list)

    Estimating the number of users who need concurrent access to data to help you design your database for acceptable performance levels

    Aggregate, pivot, and other calculated or derived data that can be considered candidates for storage in persistent data structures

    Reporting requirements and standard queries to help the database administrator build indexes

    Views (persistent or virtual) that will assist the user when performing data aggregation or filtering operations.
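A back-of-the-envelope sizing sketch for the volumetric characteristics mentioned above; the row counts, average row widths and overhead multiplier are all illustrative assumptions:

```python
tables = {
    # table name: (expected rows, average row size in bytes)
    "person":       (50_000, 120),
    "skill":        (400, 60),
    "person_skill": (150_000, 24),
}

overhead = 1.6   # assumed multiplier for indexes, free space and logs

raw = sum(rows * width for rows, width in tables.values())
print(f"Raw data: {raw / 2**20:.1f} MiB")
print(f"Planned volume with overhead: {raw * overhead / 2**20:.1f} MiB")
```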

In addition to the chairman, the secretary and the users, the modeler, the database administrator and the database architect must take part in the usage requirements session. The users' historical data requirements should be discussed. The length of time data is retained has a significant impact on the size of the database. Older data is often stored in summarized form, while atomic data is archived or deleted.

Users should bring examples of queries and reports to the session. Reports must be strictly defined and must include the atomic values used for any summary or aggregate fields.

Physical Data Model Components

The components of a physical data model are tables, columns and relationships. The entities of the logical model are likely to become tables in the physical model. Logical attributes become columns. Logical relationships become referential integrity constraints. Some logical relationships cannot be implemented in a physical database.

Reverse engineering

When a logical model is not available, it becomes necessary to recreate the model from the existing database. In ERwin, this process is called reverse engineering. Reverse engineering can be done in several ways. The modeler can explore the data structures in the database and recreate the tables in the visual modeling environment. Alternatively, data definition language (DDL) can be imported into a tool that supports reverse engineering (such as ERwin). Advanced tools such as ERwin include functions that communicate over ODBC with an existing database to create a model by reading the data structures directly. Reverse engineering with ERwin will be discussed in detail in a future post.
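A toy illustration of the idea using SQLite's own catalog; ERwin does the equivalent over ODBC against a production DBMS, and the sample table here is illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE skill (skill_id INTEGER PRIMARY KEY, skill_name TEXT)")

# "Reverse engineer" the schema by reading the catalog instead of a model.
for (table,) in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    print(table)
    for cid, name, coltype, notnull, default, pk in con.execute(
            f"PRAGMA table_info({table})"):
        print(f"  {name} {coltype}{' PRIMARY KEY' if pk else ''}")
```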

Using corporate functional boundaries

When building a logical model, it is important for the modeler to ensure that the new model is consistent with the corporate model. Using corporate functional boundaries means modeling data in the terms used within the corporation. The way data is used in a corporation changes faster than the data itself. In every logical model, data must be represented holistically, regardless of the business domain it supports. Entities, attributes and relationships must define business rules at the level of the corporation.

NOTE Some of my colleagues refer to these corporate functional boundaries as real-world modeling. Real-world modeling encourages the modeler to view information in terms of its inherent relationships and associations.

Using corporate functional boundaries for an appropriately constructed data model provides the basis for supporting the information needs of any number of processes and applications, which enables the corporation to exploit more efficiently one of its most valuable assets: information.

What is an Enterprise Data Model?

An enterprise data model (EDM) contains the entities, attributes and relationships that represent the information needs of a corporation. The EDM is usually divided into subject areas, which represent groups of entities supporting specific business needs. Some subject areas may cover specific business functions, such as contract management, while others may group entities describing products or services.

Each logical model must correspond to an existing subject area of the corporate data model. If a logical model does not meet this requirement, a subject-area model must be added to the corporate model. This comparison ensures that the corporate model is improved or adjusted and that all logical modeling efforts are coordinated within the corporation.

The EDM also includes specific entities that define the range of values for key attributes. These entities have no parents and are defined as independent. Independent entities are often used to maintain relationship integrity. They go by several different names, such as code tables, reference tables, type tables or classification tables. We will use the term "corporate business object": an entity that contains a set of attribute values that do not depend on any other entity. Corporate business objects should be used consistently within a corporation.

Building a corporate data model by augmenting

There are organizations where the corporate model has been built from start to finish in a single concerted effort. In most organizations, however, reasonably complete corporate models are created by building up.

Building up means constructing something sequentially, layer by layer, just as an oyster grows a pearl. Each data model created contributes to the formation of the EDM. Building an EDM in this way requires additional modeling steps to add new data structures and domains or to extend existing data structures. This makes it possible to build an enterprise data model by augmentation, iteratively adding levels of detail and refinement.

Modeling methodology concept

There are several visual data modeling methodologies. ERwin supports two:

    IDEF1X (Integration Definition for Information Modeling).

    IE (Information Engineering).

IDEF1X is a good methodology, and its notation is widely used.

Integrated description of information models

IDEF1X is a highly structured data modeling methodology that extends the IDEF1 methodology adopted as a FIPS (Federal Information Processing Standards) standard. IDEF1X uses a highly structured set of modeling construct types and yields a data model that requires an understanding of the physical nature of the data before such information can be made available.

The rigid structure of IDEF1X forces the modeler to assign characteristics to entities that may not correspond to the realities of the surrounding world. For example, IDEF1X requires all entity subtypes to be exclusive. As a result, a person cannot be both a client and an employee, even though real practice tells us otherwise.

Information Engineering

Clive Finkelstein is often referred to as the father of information engineering, although similar concepts were also advanced by James Martin (Martin, James. Managing the Database Environment. Upper Saddle River, New Jersey: Prentice Hall, 1983). Information engineering uses a business-driven approach to information management and a different notation to represent business rules. IE serves as an extension and development of the notation and core concepts of the ER methodology proposed by Peter Chen.

IE provides the infrastructure to support information requirements by integrating corporate strategic planning with information systems that are being developed. This integration allows the management of information resources to be more closely aligned with the long-term strategic prospects of the corporation. This business-driven approach has led many modelers to choose IE over other methodologies that tend to focus on short-term development challenges.

IE proposes a sequence of actions that leads a corporation to identify all of its information needs for collecting and managing data and identifying relationships between information objects. As a result, information requirements are clearly articulated based on management directives and can be directly translated into a management information system that will support strategic information needs.

Conclusion

Understanding how to use a data modeling tool like ERwin is only part of the problem. You must also understand when data modeling tasks arise and how the information requirements and business rules to be represented in the data model are collected. Work sessions provide the most conducive setting for collecting information requirements, bringing together domain experts, users and information technology professionals.

Building a good data model requires analyzing and researching the information requirements and business rules collected through work sessions and interviews. The resulting data model should be compared to the enterprise model, if possible, to ensure that it does not conflict with existing object models and includes all the required objects.

The data model consists of logical and physical models that represent the information requirements and business rules. The logical model must be reduced to third normal form. Third normal form eliminates insert, update and delete anomalies in the data structures and supports the "one fact in one place" principle. The collected information requirements and business rules should be analyzed and researched, and compared against the enterprise model to ensure that they do not conflict with existing object models and that they include all the required objects.

In ERwin, the data model includes both a logical and a physical model. ERwin implements the ER approach and lets you create logical and physical model objects to represent information requirements and business rules. Logical model objects include entities, attributes and relationships. Physical model objects include tables, columns and referential integrity constraints.

One of the following publications will cover the issues of identifying entities, defining entity types, choosing entity names and descriptions, as well as some techniques to avoid the most common modeling errors associated with the use of entities.

Entities must have a complete set of attributes, so that every fact about each entity can be represented by its attributes. Each attribute must have a name that reflects its meaning, a Boolean data type, and a clear, short, complete description or definition. In a future blog post, we will look at an initial set of guidelines for properly formatting attribute names and descriptions. Relationships should include a verbal construction that describes the relationship between entities, along with characteristics such as plurality, necessity of existence or possibility of absence of a relationship.

NOTE Plurality relationship describes the maximum number of secondary entity instances that can be associated with an instance of the original entity.Necessity of existence or possibility of absence relationship serves to determine the minimum number of instances of a secondary entity that can be associated with an instance of the original

Increasingly, IT professionals are turning their attention to data management solutions based on industry standard data models and business decision templates. Ready-to-download complex physical data models and business intelligence reports for specific areas of activity allow you to unify the information component of the enterprise and significantly speed up the execution of business processes. Solution templates enable service providers to leverage the power of non-standard information hidden in existing systems, thereby reducing project lead times, costs and risks. For example, real-world projects show that data model and business decision templates can reduce development effort by 50%.

An industry logic model is a domain-specific, integrated, and logically structured view of all the information that needs to reside in an enterprise data warehouse to answer both strategic and tactical business questions. The main purpose of models is to facilitate orientation in the data space and help in highlighting the details that are important for business development. In modern conditions, for a successful business, it is imperative to have a clear understanding of the connections between the various components and to have a good idea of ​​the overall picture of the organization. Identification of all details and relationships using models allows the most efficient use of the time and tools of organizing the work of the company.

Data models are abstract models that describe how data is presented and accessed. Data models define data items and the relationships between them in a particular area. A data model is a navigation tool for both business and IT professionals that uses a specific set of symbols and words to accurately explain a specific class of real-world information. This allows for better communication within the organization and thus creates a more flexible and stable application environment.


An example of a “GIS for government and local government” model.

Today, it is strategically important for software and service providers to be able to quickly respond to changes in the industry associated with technological innovations, the removal of government restrictions and the complexity of supply chains. Along with the changes in the business model, the complexity and cost of the information technology required to support a company's operations increases. Data management is especially difficult in an environment where corporate information systems, as well as functional and business requirements for them, are constantly changing.

Industry data models are intended to help facilitate and optimize this process, in moving the IT approach to the modern level.

Industry data models from the companyEsri

Esri ArcGIS data models are working templates for use in GIS projects and for creating data structures for different application areas. Data model building involves creating a conceptual design, logical and physical structure that can then be used to build a personal or corporate geodatabase. ArcGIS provides tools for creating and managing a database schema, and Data Model Templates are used to quickly launch a GIS project across a variety of applications and industries. Esri has spent a significant amount of time with the user community to develop a number of templates that can provide a quick start to the design of an enterprise geodatabase. These projects are described and documented at support.esri.com/datamodels. Below, in the order in which they appear on this site, is a semantic translation of Esri's industry model names:

  • Address register
  • Agriculture
  • Meteorology
  • Basic spatial data
  • Biodiversity
  • Interior space of buildings
  • Greenhouse gas accounting
  • Maintaining administrative boundaries
  • Military establishment. Intelligence service
  • Energy (including the new ArcGIS MultiSpeak protocol)
  • Ecological structures
  • Ministry of Emergency Situations. Fire brigade
  • Forest cadastre
  • Forestry
  • Geology
  • National level GIS (e-gov)
  • Groundwater and waste water
  • Health care
  • Archeology and conservation of memorial sites
  • National security
  • Hydrology
  • International Hydrographic Organization (IHO). S-57 format for ENC
  • Irrigation
  • Land Registry
  • Municipal government
  • Nautical navigation
  • State cadastre
  • Oil and gas structures
  • Pipelines
  • Raster storage
  • Bathymetry, seabed relief
  • Telecommunications
  • Transport
  • Water supply, sewerage, housing and communal services

These models contain all the necessary features of the industry standard, namely:

  • are freely available;
  • are not tied to the technology of the “chosen” manufacturer;
  • created as a result of the implementation of real projects;
  • created with the participation of industry experts;
  • are designed to provide information interaction between various products and technologies;
  • do not contradict other standards and regulations;
  • used in completed projects around the world;
  • are designed to work with information throughout the entire life cycle of the system being created, and not the project itself;
  • expandable according to the needs of the customer without losing compatibility with other projects and / or models;
  • accompanied by additional materials and examples;
  • used in guidelines and technical materials of various industrial companies;
  • a large community of participants, while access to the community is open to everyone;
  • a large number of references to data models in publications in recent years.

Esri is part of an expert group of independent bodies that recommend various industry models, such as PODS (Pipeline Open Data Standards - an open standard for the oil and gas industry; PODS is currently being implemented as an Esri PODS Esri Spatial 5.1.1 geodatabase) or a geodatabase (GDB) from ArcGIS for Aviation, which takes into account ICAO and FAA recommendations, as well as the AIXM 5.0 navigation data exchange standard. In addition, there are recommended models that strictly adhere to existing industry standards, such as S-57 and ArcGIS for Maritime (marine and coastal features), as well as models created from the work performed by Esri Professional Services and are de facto standards in the corresponding area. For example, GIS for the Nation and Local Government influenced NSDI and INSPIRE standards, and Hydro and Groundwater (hydrology and groundwater) are heavily used in the freely available professional ArcHydro suite and commercial products. third parties. It should be noted that Esri also supports de-facto standards such as NHDI. All proposed data models are documented and ready for use in enterprise IT processes. Accompanying materials for models include:

  • UML diagrams of relationships of entities;
  • data structures, domains, directories;
  • ready-made geodatabase templates in ArcGIS GDB format;
  • sample data and sample applications;
  • examples of data loading scripts, examples of analysis utilities;
  • reference books on the proposed data structure.

Esri consolidates its experience in building industry models in the form of books and localizes published materials. The following books have been localized and published by Esri CIS:

  • Geospatial Service Oriented Architecture (SOA);
  • Designing geodatabases for transport;
  • Corporate geographic information systems;
  • GIS: new energy for electrical and gas enterprises;
  • Oil and gas on a digital map;
  • Modeling our world. Esri Geodatabase Design Guide;
  • Thinking about GIS. GIS Planning: A Manual for Managers;
  • Geographic information systems. Basics;
  • GIS for administrative and economic management;
  • Web GIS. Principles and Applications;
  • Systems Design Strategies, 26th Edition;
  • 68 issues of ArcReview magazine with publications of companies and users of GIS systems;
  • ... and many other thematic notes and publications.

For example, the book " Modeling our world ..."(translation) is a comprehensive guide and reference for GIS data modeling in general, and geodatabase data model in particular. The book shows how to come up with the right data modeling decisions, decisions that are involved in every aspect of a GIS project, from database design to data and data collection to spatial analysis and visualization Describes in detail how to design a geographic database appropriate to the project, configure database functionality without programming, manage workflow in complex projects, model various network structures such as river, transport or electrical networks, integrate satellite imagery into the process of geographic analysis and display, and create 3D models of GIS data. Book " Designing geodatabases for transportation"contains methodological approaches that have been tested on a large number of projects and fully comply with the legislative requirements of Europe and the United States, as well as international standards. And in the book" GIS: New Energy for Electric and Gas Enterprises"Using real-world examples, it shows the benefits that corporate GIS can bring to the energy supplier, including aspects such as customer service, network operations and other business processes.


Some of the books, translated and original, published in Russian by Esri CIS and DATA +. They address both conceptual issues related to GIS technology and many applied aspects of modeling and deploying GIS of various sizes and purposes.

We will consider the application of industry models using the example of the data model BISDM (Building Interior Space Data Model, information model of the internal space of a building) version 3.0. BISDM is a development of a more general BIM (Building Information Model) model and is intended for use in the design, construction, operation and decommissioning of buildings and structures. Used in GIS software, it allows you to efficiently exchange geodata with other platforms and interact with them. Refers to the general group of FM tasks (management of the organization's infrastructure). Let's list the main advantages of the BISDM model, the use of which allows:

  • organize the exchange of information in a heterogeneous environment according to uniform rules;
  • get a "physical" implementation of the BIM concept and recommended rules for construction project management;
  • to maintain by means of GIS a single repository throughout the entire life cycle of a building (from design to decommissioning);
  • coordinate the work of various specialists in the project;
  • visualize the planned schedule and construction stages for all participants;
  • give a preliminary estimate of the cost and timing of construction (4D and 5D data);
  • monitor the progress of the project;
  • ensure high-quality operation of the building, including maintenance and repairs;
  • become a part of the asset management system, including the functions of analyzing the effectiveness of the use of space (leasing, warehouse, employee management);
  • calculate and manage energy efficiency objectives of the building;
  • simulate the movement of human flows.

BISDM defines the rules for working with spatial data at the level of the internal premises in a building, including the purpose and uses, laid communications, installed equipment, accounting for repairs and maintenance, logging incidents, and relationships with other company assets. The model helps to create a unified repository of geographic and non-geographic data. The experience of the world's leading companies was used to isolate entities and model at the geodatabase (geodatabase) level of the spatial and logical relationships of all physical elements that form both the building itself and its internal premises. Following the principles of BISDM can significantly simplify the tasks of integration with other systems. The first stage is usually CAD integration. Then, during the operation of the building, data exchange with ERP and EAM systems (SAP, TRIRIGA, Maximo, etc.) is used.


Visualization of BISDM structural elements using ArcGIS.

In the case of using BISDM, the customer / owner of the facility receives an end-to-end exchange of information from the idea of ​​creating an object to the development of a complete project, control of construction with receiving up-to-date information by the time the facility is put into operation, control of parameters during operation, and even during reconstruction or decommissioning of the facility. Following the BISDM paradigm, GIS and the GDB created with its help become a common data warehouse for related systems. Often, the GDB contains data created and operated by third-party systems. This must be taken into account when designing the architecture of the system being created.

At a certain stage, the accumulated "critical mass" of information makes it possible to move to a new qualitative level. For example, upon completion of the design phase of a new building, it is possible to automatically visualize 3D models in GIS, compile a list of equipment to be installed, calculate the length of utility lines to be laid, perform a number of checks, and even give a preliminary financial estimate of the project cost.

Once again, we note that when BISDM and ArcGIS are used together, it becomes possible to automatically build 3D models from the accumulated data, since the geodatabase contains a complete description of the object, including z-coordinates, floor membership, element connection types, equipment installation methods, materials, available personnel movement paths, the functional purpose of each element, and so on. Note that after the initial import of all design materials into the BISDM geodatabase, additional information content is needed for:

  • placement of 3D models of objects and equipment in designated places;
  • collecting information about the cost of materials and the procedure for their laying and installation;
  • control of permeability according to the dimensions of the installed non-standard equipment.

Thanks to ArcGIS, it is easier to import additional 3D objects and reference data from external sources: the ArcGIS Data Interoperability module allows you to create procedures for importing such data and placing it correctly within the model. All formats used in the industry are supported, including IFC, Autodesk Revit, and Bentley MicroStation.

Industry data models from IBM

IBM provides a set of data warehouse models and management tools for a variety of business areas:

  • IBM Banking and Financial Markets Data Warehouse (finance)
  • IBM Banking Data Warehouse
  • IBM Banking Process and Service Models
  • IBM Health Plan Data Model (healthcare)
  • IBM Insurance Information Warehouse (insurance)
  • IBM Insurance Process and Service Models
  • IBM Retail Data Warehouse (retail)
  • IBM Telecommunications Data Warehouse (telecommunications)
  • InfoSphere Warehouse Pack:
    - for Customer Insight (for understanding customers)
    - for Market and Campaign Insight (for understanding the market and campaigns)
    - for Supply Chain Insight (for understanding suppliers).

For example, the IBM Banking and Financial Markets Data Warehouse model is designed to address the specific problems of the banking industry in terms of data, while the IBM Banking Process and Service Models address them in terms of processes and SOA (service-oriented architecture). For the telecommunications industry, the IBM Information FrameWork (IFW) and IBM Telecommunications Data Warehouse (TDW) models are offered. They help to significantly speed up the creation of analytical systems and to reduce the risks associated with developing business intelligence applications, managing corporate data, and organizing data warehouses that take into account the specifics of the telecommunications industry. The capabilities of IBM TDW cover the entire spectrum of the telecommunications market: from Internet providers and cable network operators offering wired and wireless telephony, data transmission, and multimedia content, to multinational companies providing telephone, satellite, long-distance, and international communication services, as well as operators of global networks. Today TDW is used by large and small wired and wireless service providers around the world.

The InfoSphere Warehouse Pack for Customer Insight provides structured and easy-to-deploy business content for a growing number of business projects and industries, including banking, insurance, finance, health insurance, telecommunications, retail, and distribution. The InfoSphere Warehouse Pack for Market and Campaign Insight helps business users maximize the efficiency of market analysis and marketing campaigns through a step-by-step development process that takes the specifics of the business into account. With the InfoSphere Warehouse Pack for Supply Chain Insight, organizations can receive current information on supply chain operations.


Esri's position within the IBM solution architecture.

Particularly noteworthy is the IBM approach to utilities. To meet the growing demands of consumers, utilities need a more flexible architecture than those in use today, as well as an industry-standard object model to facilitate the free flow of information. This will enhance the communication capabilities of utilities, enabling more cost-effective interoperability and giving new systems better visibility of all the resources they need, no matter where those resources are located within the organization. The basis for this approach is SOA (service-oriented architecture), a component model that maps the functions of departments and services onto reusable application components. The "services" of such components exchange data through interfaces without rigid binding, hiding from the user all the complexity of the systems behind them. In this mode, enterprises can easily add new applications regardless of the software vendor, operating system, programming language, or other intrinsic characteristics of the software. The SAFE (Solution Architecture for Energy) concept is implemented on top of SOA; it allows a utility company to obtain a standards-based, holistic view of its infrastructure.

Esri ArcGIS® is a globally recognized software platform for geographic information systems (GIS) that provides for the creation and management of digital assets of electric power, gas transmission, distribution, and telecommunication networks. ArcGIS allows you to carry out a complete inventory of the components of an electrical distribution network, taking their spatial location into account. ArcGIS dramatically extends the IBM SAFE architecture by providing the tools, applications, workflows, analytics, and information integration capabilities needed to manage a smart energy enterprise. Within IBM SAFE, ArcGIS allows you to receive information from various sources about infrastructure facilities, assets, customers, and employees, with accurate data about their location, and to create, store, and process georeferenced information about enterprise assets (supports, pipelines, wires, transformers, cable ducts, etc.). Within the SAFE infrastructure, ArcGIS dynamically connects core business applications by combining data from GIS, SCADA, and customer service systems with external information such as traffic intensity, weather conditions, or satellite imagery. Utilities use this combined information for a variety of purposes, from maintaining an overall picture of the operational environment to site inspection, maintenance, network analysis, and planning.

The information components of a utility company can be modeled at several levels, ranging from the lowest, physical, level to the highest, most complex level of business logic. These layers can be integrated to meet typical industry requirements such as automated measurement logging and SCADA management. By building the SAFE architecture, utilities are making significant strides in promoting an industry-wide open object model called the Common Information Model (CIM) for energy and utilities. This model provides the necessary basis for moving many enterprises towards a service-oriented architecture, since it encourages the use of open standards for structuring data and objects. Because all systems use the same objects, the confusion and inflexibility associated with different implementations of the same objects is reduced to a minimum. Thus, the definition of the customer object and other important business objects is unified across all systems of the power supply company. With CIM, service providers and service consumers can share a common data structure, making it easier to outsource expensive business components, since CIM establishes a common base on which to build information exchange.

Conclusion

Comprehensive industry data models provide companies with a single, integrated view of their business information. Many companies find it difficult to integrate their data, although this is a prerequisite for most enterprise-wide projects. According to a study by The Data Warehousing Institute (TDWI), more than 69% of the organizations surveyed consider integration a significant barrier to the adoption of new applications. Conversely, implementing data integration generates tangible revenue and efficiency gains for the company.

A well-constructed model uniquely defines the meaning of the data, which in this case is structured data (as opposed to unstructured data such as an image, binary file, or text, where the meaning can be ambiguous). The most effective models are the industry models offered by professional vendors such as Esri and IBM. The high return on their models is achieved through a significant level of detail and accuracy; they usually contain many data attributes. In addition, both Esri and IBM have extensive modeling experience and are well versed in building industry-specific models.


DB architecture

The KMD schema is a description of the structure of the data model from the administrator's point of view; it describes the structure of the data: records and fields.

The AMD schema is a description of the internal, or physical, model. This is where the description of the physical location of the data on the media is stored: the schema holds direct indications of the location of data in memory (volumes, disks).

DBMSs are based on three main types of data models:

1. The hierarchical model. It assumes a root record from which branches descend.

Not all objects are conveniently described this way: the hierarchy has no horizontal links between branches, and there is considerable redundancy of information.

2. The network model. It allows all the complexities of relationships to be represented correctly.

The model is convenient for representing links with data from the external environment, but less convenient for description in a database: the user has to do extra work studying navigation through the links.

3. The relational model. It is based on the mathematical term relation, which in the simplest case is a table, for example a rectangular two-dimensional one.

The relational data structure was developed in the late 1960s by a number of researchers, of whom the most significant contribution was made by the IBM employee Edgar Codd. With the relational approach, data is presented in the form of two-dimensional tables, the form most natural for humans. For data processing, Codd proposed using the apparatus of set theory: union, intersection, difference, and Cartesian product.

Data type: this concept has the same meaning as in programming languages, i.e., the data type determines the internal representation in computer memory and the way instances of the data are stored, as well as the set of values the data can take and the set of valid operations on it. All modern databases support special data types for storing integers, fractional floating-point numbers, characters and strings, and calendar dates. Many database servers implement other types as well; for example, the InterBase server has a special data type for storing large arrays of binary information (BLOB).

A domain is a potential set of values of a simple data type; it resembles a data subtype in some programming languages. A domain is defined by two elements: a data type and a boolean expression applied to the data. If the expression evaluates to true, the data instance belongs to the domain.

A relation is a two-dimensional table of a special kind, consisting of a heading and a body.

A heading is a fixed set of attributes, each of which is defined on some domain, with a one-to-one correspondence between the attributes and their defining domains.

Each attribute is defined on its own domain: for example, the domain may be the integer data type with the boolean condition n > 0. The heading is immutable over time, in contrast to the body of the relation. The body of a relation is a collection of tuples, each of which is a set of attribute-value pairs.
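
Such a domain can be sketched in SQL using the standard CREATE DOMAIN statement (supported, for example, by PostgreSQL and Firebird; the domain and table names here are hypothetical):

    -- A domain: the INTEGER data type plus the boolean condition n > 0.
    CREATE DOMAIN positive_int AS INTEGER
        CHECK (VALUE > 0);

    -- An attribute (column) defined on this domain.
    CREATE TABLE part (
        part_code        INTEGER PRIMARY KEY,
        consumption_rate positive_int  -- only values with n > 0 are accepted
    );

An attempt to insert a negative consumption rate would then be rejected by the DBMS itself rather than by application code.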

The cardinality of a relation is the number of its tuples, and the degree of a relation is the number of its attributes.

The degree is constant for a given relation, while the cardinality changes over time. The cardinality of a relation is also called its cardinal number.

The above concepts are theoretical and are used in the development of language tools and software systems for relational DBMSs. In everyday work, their informal equivalents are used instead:

relation - table;

attribute - column or field;

tuple - record or row.

Thus, the degree of a relation is the number of columns in the table, and the cardinal number is the number of rows.

Since a relation is a set, and in classical set theory a set by definition cannot contain coinciding elements, a relation cannot have two identical tuples. Therefore, for a given relation there always exists a set of attributes that uniquely identifies a tuple. This set of attributes is called a key.

The key must meet the following requirements:

· Must be unique;

· Must be minimal, that is, removing any attribute from the key leads to a violation of uniqueness.

As a rule, the number of attributes in a key is less than the degree of the relation; in the extreme case, however, the key may contain all the attributes, since the combination of all attributes satisfies the uniqueness condition. A relation typically has several keys. Of all the keys of a relation (also called candidate keys), one is chosen as the primary key. When choosing the primary key, preference is usually given to the key with the fewest attributes. It is also impractical to use keys with long string values.

In practice, a special numeric attribute, an auto-increment field, is often used as the primary key; its value can be generated by a trigger (a special procedure invoked when changes are made to the database) or by built-in means of the DBMS engine.
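
A minimal sketch of such a surrogate primary key, using the SQL:2003 identity-column syntax (supported, for example, by PostgreSQL, Oracle, and DB2; the table and column names are hypothetical):

    -- A surrogate primary key generated by the DBMS engine itself.
    CREATE TABLE employee (
        employee_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        last_name   VARCHAR(60) NOT NULL,
        first_name  VARCHAR(60) NOT NULL
    );

    -- The key value is assigned automatically on insert.
    INSERT INTO employee (last_name, first_name) VALUES ('Codd', 'Edgar');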

The basic concepts described in this chapter are not specific to any particular database implementation, but are common to all of them. Thus, these concepts are the basis of a certain general model, which is called the relational data model.

Date, one of the pioneers of the relational approach, established that the relational model has three parts:

· Structural;

· Manipulative;

· Integrity.

In the structural part of the model, relations are fixed as the only data structure used in the relational model.

In the manipulation part, two basic mechanisms for manipulating relational bases are fixed - relational algebra and relational calculus.

The integrity part is understood as a mechanism that ensures the data cannot be destroyed or corrupted. It contains two basic integrity requirements for relational databases: entity integrity and referential integrity.

The entity integrity requirement is that any tuple of any relation must be distinguishable from any other tuple of that relation; in other words, every relation must have a primary key. This requirement is satisfied automatically if the basic properties of relations are observed.
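
Both requirements can be sketched in SQL, where entity integrity is enforced by a primary key and referential integrity by a foreign key (the table and column names are hypothetical):

    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY        -- entity integrity: no two identical tuples
    );

    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL
            REFERENCES department (dept_id)  -- referential integrity: the referenced
                                             -- department must exist
    );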

The data manipulation language and the query language rest on a mathematical apparatus called relational algebra, in which the following operations are defined:

1. Standard set operations: ∩ (intersection), ∪ (union), \ (difference), × (Cartesian product).

2. Specific operations: projection, restriction (selection), join, division.

a. Union.

Let two relations with the same heading be given:

R1 (part code, material code, unit of measure, consumption rate)

R2 (part code, material code, unit of measure, consumption rate)

We need to find the union R1 ∪ R2, the set of tuples belonging to R1, to R2, or to both. In this operation the degree is preserved, and the cardinality of the result does not exceed the sum of the cardinalities of R1 and R2.

b. Intersection.

The intersection R1 ∩ R2 selects the tuples that occur in both relations.

c. Difference.

The difference R1 \ R2 excludes from R1 the tuples that also occur in R2.

d. Cartesian product.

Here tuples are concatenated: each tuple of one relation is concatenated with each tuple of the other.

For example, let two relations be given, R1 of degree 2 and cardinality 4, and R2 of degree 3 and cardinality 3. Then the Cartesian product R1 × R2 has degree 2 + 3 = 5 and cardinality 4 × 3 = 12, i.e., 12 rows and 5 columns.
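
In SQL these algebraic operations correspond directly to standard operators (a sketch; r1 and r2 are assumed to be tables with identical headings, as above, and EXCEPT is called MINUS in Oracle):

    SELECT * FROM r1 UNION     SELECT * FROM r2;   -- union R1 ∪ R2
    SELECT * FROM r1 INTERSECT SELECT * FROM r2;   -- intersection R1 ∩ R2
    SELECT * FROM r1 EXCEPT    SELECT * FROM r2;   -- difference R1 \ R2
    SELECT * FROM r1 CROSS JOIN r2;                -- Cartesian product R1 × R2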


TOPIC V. CORPORATE DATABASES

V.1. Organization of data in corporate systems. Corporate databases.

V.2. DBMS and structural solutions in corporate systems.

V.3. Internet/Intranet technologies and corporate solutions for database access.

V.1. ORGANIZATION OF DATA IN CORPORATE SYSTEMS. CORPORATE DATABASES

A corporate database is the central link of a corporate information system and makes it possible to create a single information space for the corporation (Fig. 1.1).

There are various definitions of databases.

A database (DB) is understood as a set of logically connected information that makes up a single body of data stored in the memory devices of a computer. This body of data serves as the initial data for the tasks solved during the operation of automated control systems, data processing systems, and information and computing systems.

The term database can also be summarized as a collection of logically related data intended for shared use.

A database is also understood as a set of data stored together with such minimal redundancy that it can be used in an optimal way for one or more applications.

The purpose of creating databases as a form of data storage is to build a data system that does not depend on the algorithms (software) adopted, the technical means used, or the physical location of the data in the computer. A database assumes multipurpose use (several users, many forms of documents and queries of a single user).

Basic requirements for databases:

  • Completeness of data presentation. The data in the database should adequately represent all the information about the object and should be sufficient for the data processing system.
  • Database integrity. The data must be preserved during processing by the data processing system and in any situations that arise in the course of work.
  • Data structure flexibility. The database should allow data structures to be changed without violating its integrity and completeness when external conditions change.
  • Feasibility. There must be an objective representation of the various objects, their properties, and relationships.
  • Availability. Access control to the data must be ensured.
  • Redundancy. The database should have minimal redundancy in representing data about any object.

Knowledge is understood as a set of facts, patterns, and heuristic rules that can be used to solve a problem.

A knowledge base (KB) is a set of databases and rules obtained from decision-makers. The knowledge base is an element of expert systems.

Different ways of presenting data are distinguished.

Physical data is data stored in the computer's memory.

A logical data representation corresponds to a user's view of the physical data. The difference between a physical representation and the corresponding logical representation is that the latter reflects some important relationships between the physical data.

A corporate database is understood as a database that unites, in one form or another, all the necessary data and knowledge about the organization being automated. In corporate information systems, the concept of integrated databases has become established, in which the principle of single entry and repeated use of information is implemented.

Fig. 1.1. The structure of the interaction of departments with the information resources of the corporation.

Corporate databases can be concentrated (centralized) or distributed.

A concentrated (centralized) database is a database whose data is physically stored on the storage devices of a single computer. Fig. 1.2 presents a diagram of a server application for accessing databases on various platforms.

Fig. 1.2. Diagram of a heterogeneous centralized database.

Centralization of information processing made it possible to eliminate such disadvantages of traditional file systems as data incoherence, inconsistency, and redundancy. However, as databases grow, and especially when they are used in geographically dispersed organizations, problems arise. For example, for a concentrated database located at a node of a telecommunications network through which various departments of the organization access the data, the following difficulties arise as the volume of information and the number of transactions grow:

  • Large flow of data exchange;
  • High traffic on the network;
  • Low reliability;
  • Poor overall performance.

While the security, integrity, and update consistency of information are easier to ensure in a centralized database, the difficulties listed above remain. Data decentralization is proposed as a possible solution. Decentralization achieves:

  • Higher degree of simultaneity of processing due to load balancing;
  • Improved use of local data when performing remote queries;
  • Lower costs;
  • Ease of managing local databases.

The costs of creating a network whose nodes host workstations (small computers) are much lower than the costs of creating a similar system based on a large computer. Figure 1.3 shows the logical diagram of a distributed database.

Fig. 1.3. Distributed corporate database.

Let's give the following definition of a distributed database.

A distributed database is a collection of information, files (relations) stored in different nodes of an information network and logically connected in such a way as to make up a single body of data (the connection can be functional or through copies of the same file). Thus, it is a set of databases that are logically connected but physically located on several machines within the same computer network.

The most important performance requirements for a distributed database are:

  • Scalability;
  • Compatibility;
  • Support for various data models;
  • Portability;
  • Location transparency;
  • Autonomy of distributed database nodes (Site Autonomy);
  • Distributed query processing;
  • Execution of distributed transactions;
  • Support for a homogeneous security system.

Location transparency allows users to interact with databases without knowing anything about their location. Autonomy of distributed database nodes (site autonomy) means that each database can be maintained independently of the others. A distributed query is a query (SQL statement) whose execution accesses objects (tables or views) of different databases. When distributed transactions are executed, concurrency control is coordinated across all the databases involved. Oracle7 uses two-phase commit technology to perform distributed transactions.
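
In Oracle, for example, a distributed query can reference a table in a remote database through a database link (a sketch; the link, connection, and table names are hypothetical):

    -- A database link to a remote node.
    CREATE DATABASE LINK sales_link
        CONNECT TO report_user IDENTIFIED BY report_pwd
        USING 'sales_node';

    -- A distributed query: the local customer table is joined with
    -- the order_header table stored in the remote database.
    SELECT c.name, o.order_date
    FROM   customer c
    JOIN   order_header@sales_link o ON o.customer_id = c.customer_id;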

The databases that make up a distributed database need not be homogeneous (i.e., maintained by one DBMS) or processed in the environment of the same operating system and/or on computers of the same type. For example, one database can be an Oracle database on a SUN machine running SunOS (UNIX), a second can be a DB2 database on an IBM 3090 mainframe with the MVS operating system, and a third can be maintained by SQL/DS, also on an IBM mainframe but with the VM operating system. Only one condition is required: all machines with databases must be accessible over the network they are part of.

The main task of a distributed database is the distribution of data over the network and the provision of access to it. This problem can be solved in the following ways (a sketch of the last option follows the list):

  • Each node stores and uses its own data set, which is available for remote queries. This distribution is called partitioned.
  • Some data frequently used at remote sites may be duplicated. This distribution is called partially duplicated.
  • All data is duplicated at each node. This distribution is called fully duplicated.
  • Some files can be split horizontally (a subset of records is selected) or vertically (a subset of attribute fields is selected), with the selected subsets stored in different nodes along with the unsplit data. This distribution is called split (fragmented).
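
A minimal SQL sketch of horizontal and vertical fragmentation (the employee table and its columns are hypothetical; in a real distributed database each fragment would reside at a different node):

    -- Horizontal fragmentation: a subset of records per node.
    CREATE TABLE employee_north AS
        SELECT * FROM employee WHERE region = 'north';

    CREATE TABLE employee_south AS
        SELECT * FROM employee WHERE region = 'south';

    -- Vertical fragmentation: a subset of attribute fields per node;
    -- the key emp_id is kept in every fragment so tuples can be rejoined.
    CREATE TABLE employee_payroll AS
        SELECT emp_id, salary FROM employee;

    CREATE TABLE employee_contact AS
        SELECT emp_id, phone, address FROM employee;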

When creating a distributed database, at the conceptual level, you have to solve the following tasks:

  • It is necessary to have a single conceptual schema of the entire network. This provides logical data transparency for the user, who can then form a query to the entire database from a single terminal, as if working with a centralized database.
  • A schema is needed that describes the location of data on the network. This provides transparency of data placement, thanks to which the user does not have to specify where to send a request in order to obtain the required data.
  • It is necessary to solve the problem of heterogeneity of distributed databases. Distributed databases can be homogeneous or heterogeneous in terms of hardware and software. The problem of heterogeneity is relatively easy to solve if the distributed database is heterogeneous in hardware but homogeneous in software (the same DBMS in every node). If different DBMSs are used in the nodes of a distributed system, tools for converting data structures and languages are required. This should provide transparent conversion across the nodes of the distributed database.
  • It is necessary to solve the problem of dictionary management. To provide all kinds of transparency in a distributed database, programs are needed that manage numerous dictionaries and reference books.
  • Methods for executing queries in a distributed database must be defined. They differ from those in centralized databases, since individual parts of a query must be executed at the location of the corresponding data and partial results must be passed to other nodes; at the same time, coordination of all processes must be ensured.
  • It is necessary to solve the problem of parallel query execution. A distributed database needs a sophisticated concurrency control mechanism, which, in particular, must ensure synchronization when information is updated, thus guaranteeing data consistency.
  • A developed methodology for the distribution and placement of data, including splitting, is one of the main requirements for a distributed database.

One of the actively developing new areas of computing system architecture, and a powerful tool for non-numerical information processing, is database machines. Database machines are used to solve non-numerical tasks such as storing, searching, and transforming documents and facts, and working with objects. If data is defined as digital and graphic information about objects of the surrounding world, then numerical and non-numerical processing put different content into the concept of data. Numerical processing uses objects such as variables, vectors, matrices, multidimensional arrays, and constants, while non-numerical processing uses objects such as files, records, fields, hierarchies, networks, and relations. Non-numerical processing is interested directly in information about objects (for example, a specific employee or a group of employees), not in the employee file as such: the file is not indexed here merely to select a specific person; rather, the content of the desired record is what matters. Large amounts of information are usually subjected to non-numerical processing. In various applications, one can perform, for example, the following operations on such data:

  • increase the salary of all employees of the company;
  • calculate the bank interest on the accounts of all clients;
  • make changes to the list of all goods in stock;
  • find the required abstract from all texts stored in the library or in the bibliographic information retrieval system;
  • find a description of the required contract in a file containing legal documents;
  • browse all files containing descriptions of patents and find a patent (if any) similar to the proposed one.

To implement database machines, parallel and associative architectures were developed as an alternative to the uniprocessor von Neumann structure, allowing work with large amounts of information in real time.

Database machines are gaining importance in connection with the research and application of artificial intelligence concepts such as knowledge representation, expert systems, inference, pattern recognition, etc.

Data warehouses. Today many acknowledge that most companies already operate several databases and that successful work with information requires not only different types of databases but also different generations of DBMSs. According to statistics, each organization uses an average of 2.5 different DBMSs. The need became obvious to "isolate" the business of companies, or rather the people engaged in that business, from the technological features of databases and to provide users with a single view of corporate information regardless of where it is physically stored. This stimulated the emergence of data warehousing technology (Data Warehousing, DW).

The main purpose of DW is to create a single logical representation of the data contained in different types of databases or, in other words, a single corporate data model.

The new round of DW development became possible thanks to the improvement of information technologies in general, in particular the emergence of new types of databases based on parallel query processing, which in turn relied on advances in parallel computers. Query builders with intuitive graphical interfaces were created, making it easy to build complex database queries. Various middleware provided communication between different types of databases, and, finally, storage devices fell sharply in price.

A data bank may also be present in the structure of a corporation.

A data bank is a functional and organizational component in automated control systems and information and computing systems that provides centralized information support for a group of users or a set of tasks solved in the system.

A data bank is considered an information and reference system whose main purpose is:

  • to accumulate and maintain in working order the aggregate of information constituting the information base of the entire automated system or of a certain set of tasks solved in it;
  • to issue the data required by a task or user;
  • to provide collective access to the stored information;
  • to ensure the necessary management of the use of information contained in the information base.

Thus, a modern data bank is a complex software and hardware complex that includes technical, system, and network tools, databases and DBMSs, and information retrieval systems for various purposes.

V.2. DBMS AND STRUCTURAL SOLUTIONS IN CORPORATE SYSTEMS

Database and knowledge management systems

An important component of modern information systems are database management systems (DBMS).

DBMS - a set of software and language tools intended for the creation, maintenance and use of databases.

A database management system provides data processing systems with access to databases. As already noted, DBMSs play an important role in the creation of corporate information systems and an especially important role in information systems that use distributed information resources based on modern network computer technologies.

The main feature of modern DBMSs is that they support technologies such as:

  • Client/server technology.
  • Support for database languages: a schema definition language (SDL, Schema Definition Language), a data manipulation language (DML), and integrated languages such as SQL (Structured Query Language), QBE (Query-By-Example), and QMF (Query Management Facility), an advanced query specification and reporting tool for DB2, etc.
  • Direct data management in external memory.
  • Management of RAM buffers.
  • Transaction management: OLTP technology (On-Line Transaction Processing) and, for DW, OLAP technology (On-Line Analytical Processing).
  • Data protection and integrity. Use of the system is allowed only to users who have the right to access the data, and when users perform operations on data, the consistency of the stored data (integrity) is maintained. This is important in corporate multi-user information systems.
  • Journaling.

Modern DBMS must ensure compliance with the database requirements listed above. In addition, they must comply with the following principles:

  • Data independence.
  • Versatility. The DBMS must have powerful conceptual data model support for displaying custom logical views.
  • Compatibility. The DBMS must remain operational with the development of software and hardware.
  • Minimal data redundancy. Unlike file systems, a database must be a single collection of integrated data.
  • Data protection. The DBMS must provide protection against unauthorized access.
  • Data integrity. The DBMS must prevent users from breaking the database.
  • Management of simultaneous work. The DBMS must protect the database from inconsistencies in the shared access mode. To ensure a consistent state of the database, all user requests (transactions) must be executed in a specific order.
  • The DBMS must be universal. It should support different data models on a single logical and physical basis.
  • The DBMS should support both centralized and distributed databases and, thus, become an important link in computer networks.

Considering a DBMS as a class of software products aimed at maintaining databases in automated systems, we can distinguish two most essential features that determine the types of DBMS. According to them, a DBMS can be viewed from two points of view:

  • their capabilities in relation to distributed (corporate) databases;
  • their relationship to the type of the data model implemented in the DBMS.

In relation to corporate (distributed) databases, the following types of DBMS can be conditionally distinguished:

  • DBMS "desktop". These products are primarily focused on working with personal data ("desktop" data). They have command sets for sharing common databases, but small in size (like a small office). First of all, it is a DBMS such as Assess, dBASE, Paradox, EohPgo. Why Assess, dBASE, Paradox, EohPgo have poor access to corporate data. The point is, there is no easy way to overcome the barrier between personal and corporate data. And the point is not even that the mechanism of the personal data (or small office) DBMS is focused on accessing data through many gateways, internetworking products, etc. The problem is that these mechanisms are usually associated with complete file transfers and a lack of forked index support, resulting in server queues virtually stalling on large systems.
  • Specialized high-performance multi-user DBMS. Such DBMSs are characterized by the presence of a multiuser system kernel, a data manipulation language and the following functions typical for developed multiuser DBMSs:
  • organization of the buffer pool;
  • the presence of a system for processing queues of transactions;
  • the presence of mechanisms for multi-user data locking;
  • transaction logging;
  • availability of access control mechanisms.

DBMSs such as Oracle, DB2, SQL Server, Informix, Sybase, ADABAS, Titanium, and others provide a wide range of services for processing corporate databases.

When working with databases, the transaction mechanism is used.

A transaction is a logical unit of work.

A transaction is a sequence of data manipulation statements executed as a whole (all or nothing) that transfers the database from one consistent state to another consistent state.

A transaction has four important properties, known as the ACID properties:

  • (A) Atomicity. A transaction is executed as an atomic operation: either the entire transaction is executed or none of it is.
  • (C) Consistency. A transaction moves the database from one consistent state to another consistent state. Within a transaction, database consistency may be temporarily violated.
  • (I) Isolation. Transactions of different users must not interfere with each other (for example, as if they were executed strictly one after another).
  • (D) Durability. If a transaction has completed, the results of its work must be preserved in the database even if the system crashes the next moment.

A transaction usually starts automatically from the moment the user connects to the DBMS and continues until one of the following events occurs (a sketch of an explicit transaction follows the list):

  • The COMMIT WORK command is issued.
  • The ROLLBACK WORK command is issued.
  • The user disconnects from the DBMS.
  • A system failure occurs.
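
A minimal sketch of such a transaction in standard SQL (the account table and its rows are hypothetical):

    -- Transfer money between two accounts as a single logical unit of work.
    BEGIN;  -- START TRANSACTION in some dialects

    UPDATE account SET balance = balance - 100 WHERE account_id = 1;
    UPDATE account SET balance = balance + 100 WHERE account_id = 2;

    -- Either make both changes permanent...
    COMMIT WORK;
    -- ...or, on error, undo both changes:
    -- ROLLBACK WORK;

Until COMMIT WORK is issued, other users do not see the intermediate state in which the money has left one account but has not yet arrived at the other.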

To the user, a transaction usually appears atomic, although in fact it is a complex mechanism of interaction between the user (application) and the database. Enterprise systems software uses real-time transaction processing engines (On-Line Transaction Processing, OLTP); in particular, accounting programs, software for receiving and processing client orders, and financial applications produce a lot of information. These systems are designed (and appropriately optimized) to handle large amounts of data, complex transactions, and intensive read/write operations.

Unfortunately, the information placed in the databases of OLTP systems is not very suitable for use by ordinary users (because of the high degree of table normalization, specific data presentation formats, and other factors). Therefore, data from the various information pipelines is sent (in the sense of copied) to a warehouse for storage, sorting, and subsequent delivery to the consumer. In information technology, the role of such warehouses is played by data warehouses.

Delivery of information to the end user is handled by real-time analytical data processing systems (On-Line Analytical Processing, OLAP), which provide extremely easy access to data through convenient means of query generation and result analysis. In OLAP systems, the value of an information product is increased by applying various methods of analysis and statistical processing. In addition, these systems are optimized for the speed of data extraction and the collection of generalized information, and they are aimed at ordinary users (they have an intuitive interface). If an OLTP system gives answers to simple questions like "what was the level of sales of product N in region M in January 199x?", then OLAP systems are ready for more complex user requests, for example: "Provide an analysis of sales of product N in all regions against the plan for the second quarter in comparison with the two previous years."
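
The contrast between the two kinds of questions can be sketched in SQL (the sales table, its columns, and the period values are hypothetical):

    -- An OLTP-style question: a short, selective read.
    SELECT SUM(amount)
    FROM   sales
    WHERE  product = 'N' AND region = 'M' AND sale_month = '1998-01';

    -- An OLAP-style question: aggregation across all regions and several years.
    SELECT region, sale_year, SUM(amount) AS total_sales
    FROM   sales
    WHERE  product = 'N' AND sale_quarter = 2
           AND sale_year IN (1996, 1997, 1998)
    GROUP BY region, sale_year
    ORDER BY region, sale_year;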

Client / server architecture

In modern systems of distributed information processing, client/server technology takes center stage. In a system of client-server architecture, data processing is divided between the client computer and the server computer, which communicate over a network. This separation of data processing is based on the grouping of functions: typically, a dedicated database server computer performs database operations, while a client computer runs application programs. Figure 2.1 shows a simple client-server architecture system that includes a computer acting as the server and another computer acting as its client. Each machine performs different functions and has its own resources.

Fig. 2.1. A client-server architecture system: a database server computer connected over a network to IBM-compatible client PCs running applications.

The main function of the client computer is to run the application (the user interface and presentation logic) and to communicate with the server when the application requires it.

A server is an object (computer) that provides services to other objects at their request.

As the term itself implies, the main function of the server computer is to serve the needs of the client. The term "server" is used to refer to two different groups of functions: file server and database server (hereinafter these terms mean, depending on the context, either the software implementing these groups of functions or computers with this software). File servers are not designed to perform database operations; their main function is to share files among multiple users, i.e., to provide many users with simultaneous access to files on the file-server computer. An example of a file server is Novell's NetWare operating system. A database server can be installed and run on a file-server computer: the Oracle DBMS in the form of an NLM (Network Loadable Module) runs in the NetWare environment on the file server.

A local network server must have resources appropriate to its functional purpose and to the needs of the network. Note that, given the focus on the open-systems approach, it is more correct to speak of logical servers (meaning a set of resources and software that provide services over those resources), which are not necessarily located on different computers. A feature of a logical server in an open system is that if, for efficiency reasons, it becomes advisable to move the server to a separate computer, this can be done without modifying either the server itself or the applications that use it.

One of the important server requirements is that the operating system hosting the database server must be multitasking (and preferably, but not necessarily, multi-user). For example, an Oracle DBMS installed on a personal computer running the MS-DOS (or PC-DOS) operating system, which does not meet the multitasking requirement, cannot be used as a database server. But the same Oracle DBMS installed on a computer with the multitasking (though not multi-user) OS/2 operating system can be a database server. Many flavors of UNIX, as well as MVS, VM, and some other operating systems, are both multitasking and multi-user.

Distributed computing

The term "distributed computing" is often used to refer to two different, albeit complementary, concepts:

  • Distributed database;
  • Distributed data processing.

The application of these concepts makes it possible to organize access to information stored on multiple machines for end users using different tools.

There are many types of servers:

  • Database server;
  • Print server;
  • Remote access server;
  • Fax server;
  • Web server, etc.

Client/server technology rests on such basic technologies as:

  • Operating system technologies, the concept of interaction of open systems, the creation of object-oriented environments for the functioning of programs;
  • Telecommunication technologies;
  • Network technologies;
  • Graphical User Interface Technologies ( GUI);
  • Etc.

Advantages of client-server technology:

  • Client/server technology allows computing in heterogeneous computing environments. Platform independence: access to heterogeneous network environments that include different types of computers with different operating systems.
  • Independence from data sources: access to information from heterogeneous databases. Examples of such systems are DB2, SQL / DS, Oracle, Sybase.
  • Load balance between client and server.
  • Perform computation where it is most efficient;
  • Provide the ability to scale efficiently;
  • Cross-platform computing, defined simply as the implementation of technologies in heterogeneous computing environments. The following capabilities should be provided here:
  • The application must run on multiple platforms;
  • It must have the same interface and logic on all platforms;
  • The application must integrate with the native operating environment;
  • It should behave the same on all platforms;
  • Simple and consistent support should be provided for it.

Distributed computing. Distributed computing involves the distribution of work among several computers (although distributed computing is a broader concept).

Downsizing. Downsizing is the porting of mainframe applications to small computer platforms.

  • Reduced infrastructure and hardware costs. The availability of inexpensive computing equipment and the growing proliferation of local area networks make client-server technology more economical than other data processing technologies. Equipment can be upgraded as soon as the need arises.

  • Reducing the overall execution time of the application;

  • Reducing client memory usage;

  • Reducing network traffic.

  • Ability to work with multimedia: to date, many multimedia programs have been created for the PC. There are no such programs for the terminal-host configuration, or they are very expensive.
  • The ability to attract large computing resources for database operations: since applications are executed on client computers, additional resources, such as CPU power and main memory, become available for database operations (compared to the terminal-host configuration).
  • Better programmer productivity: programmer productivity is increased by using tools such as SQL*Forms and CASE, which allow applications to be developed faster than with programming languages such as C, PL/1, or COBOL.
  • Increased end-user productivity: Nowadays, many end-users have mastered systems such as Lotus, Paradox, Word Perfect, Harvard Graphics, etc.

The server-side interface is defined and fixed. Therefore, it is possible to create new client parts of an existing system (an example of interoperability at the system level).

Fig. 2.2. Illustration of client access to a server share.

How to implement client-server technology

This section discusses the installation of a system based on client-server technology and capable of distributed data processing. The following computer hardware and software are required:

  • database server computer;
  • client computers;
  • communication network;
  • network software;
  • application software.

The SQL language. The high-level query language SQL (Structured Query Language) is used to implement queries to databases, serving as both a data manipulation language and a data definition language, and has been adopted as a standard. The SQL language was originally adopted as the data language of IBM software products and of the relational DBMS SYSTEM R from IBM. An important feature of SQL is that one and the same language is presented through two different interfaces: an interactive interface and an application programming interface (dynamic SQL). Dynamic SQL consists of many built-in SQL language features provided specifically for constructing interactive applications, where an interactive application is understood as a program written to support access to the database by an end user working at an interactive terminal. The SQL language provides functions for defining, manipulating, and managing database data and is transparent to the user with respect to the underlying DBMS.

Fig. 2.3. Scheme for executing user queries to distributed databases.

The internal structure of databases is determined by the data models used. The conceptual model has greater abstraction capabilities and richer semantics than external models. External models are often called syntactic or operational models, referring to their syntactic nature of control and their use as a means of user interaction with the database. In information modeling there are different levels of abstraction, from the conceptual model to the physical data model, which affect the architecture of the DBMS.

The data model has three components:

  • The data structure representing the user's view of the database.
  • The valid operations on that data structure. It must be possible to work with the structure using various data definition and data manipulation operations; a rich structure is worthless if there is no way to manipulate its contents.
  • Integrity constraints. The data model should be provided with means of maintaining its integrity and protecting it. As an example, consider the following two constraints:
  • Each subtree must have a source node. In hierarchical databases, child nodes cannot be stored without a source node.
  • With respect to a relational database, there cannot be identical tuples. For a file, this requirement means that all records must be unique.

One of the most important characteristics of a DBMS is the ability to link objects.

There are the following types of links between objects (the first three are sketched in SQL after the list):

  • One-to-one (1:1). One object of one set can be associated with one object of another set.
  • One-to-many (1:M). One object of one set can be associated with many objects of another set.
  • Many-to-many (M:N). One object of one set can be associated with many objects of another set, while at the same time one object of the other set can be associated with many objects of the first set.
  • Ramified. One object of one set can be associated with objects of many sets.
  • Recursive. One object of a given set can be linked to an object of the same set.
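
A sketch of how the first three types map onto relational keys (all table and column names are hypothetical, and the referenced tables person, department, student, and course are assumed to exist):

    -- One-to-one (1:1): a passport belongs to exactly one person.
    CREATE TABLE passport (
        person_id INTEGER PRIMARY KEY REFERENCES person (person_id),
        number    VARCHAR(20) NOT NULL
    );

    -- One-to-many (1:M): a department has many employees.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department (dept_id)
    );

    -- Many-to-many (M:N): a junction table links students and courses.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student (student_id),
        course_id  INTEGER REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );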

The following basic data models exist:

  • Relational data model.
  • Hierarchical data model.
  • Incomplete network data model.
  • CODASYL data model.
  • Extended network data model.

V.3. INTERNET/INTRANET TECHNOLOGIES AND CORPORATE DATABASE ACCESS SOLUTIONS

The main problem of systems based on the client-server architecture is that, in accordance with the concept of open systems, they are required to be mobile in the widest possible class of hardware and software solutions of open systems. Even if we limit ourselves to UNIX-based local area networks, different networks use different equipment and communication protocols. Attempts to create systems that support all possible protocols lead to their overload with network details to the detriment of functionality.

An even more complex aspect of this problem is associated with the possibility of using different representations of data in different nodes of a heterogeneous local network. Different computers may have different addressing, number representation, character encoding, etc. This is especially important for high-level servers: telecommunications, computing, databases.

A common solution to the problem of mobility in systems based on a client-server architecture is to rely on software packages that implement Remote Procedure Call (RPC) protocols. With these tools, a call to a service at a remote site looks like a normal procedure call. The RPC tools, which naturally contain all the information about the specifics of the local network hardware and the network protocols, translate the call into a sequence of network interactions. Thus, the specifics of the network environment and protocols are hidden from the application programmer.

When a remote procedure is called, RPC programs convert client data formats to intermediate machine-independent formats and then convert to server data formats. When passing the response parameters, similar transformations are performed.


Industry data models


The data model uniquely defines the meaning of the data, which in this case is structured data (as opposed to unstructured data such as, for example, an image, binary file, or text, where the meaning can be ambiguous).

As a rule, models are distinguished by level: higher-level models (more general in content) and lower-level ones (correspondingly, more detailed). The upper level of modeling comprises the so-called conceptual data models, which give the most general picture of the functioning of an enterprise or organization. A conceptual model includes the main concepts or subject areas that are critical to the functioning of the organization; usually their number does not exceed 12-15. Such a model describes the classes of entities that are important to the organization (business objects), their characteristics (attributes), and the associations between pairs of these classes (that is, relationships). Since terminology in business modeling has not yet fully settled, in various English-language sources conceptual data models may also be called domain models or subject enterprise data models.

The next hierarchical level is logical data models. They can also be called enterprise data models or business models. These models contain data structures, their attributes, and business rules, and represent the information used by the enterprise from a business perspective. In such a model, data is organized in the form of entities and relationships between them. The logical model presents data in a way that makes it easy for business users to understand. A logical model may include a data dictionary: a list of all entities with their precise definitions, which allows different categories of users to share a common understanding of all the input and output information streams of the model. The next, lower level of modeling is the physical implementation of the logical model using specific software tools and technical platforms.

The logical model contains a detailed corporate business solution, which usually takes the form of a normalized model. Normalization is a process that ensures that each data item in a model has only one value and is completely and uniquely dependent on the primary key. Data items are organized into groups according to their unique identification. The business rules governing data items must be fully incorporated into the normalized model, with prior verification and validation. For example, a data item such as Customer Name is likely to be split into First Name and Last Name and grouped with other related data items into a Customer entity with a primary key, Customer ID.
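
As a minimal sketch of the Customer example above (the attribute names here are illustrative only, not taken from any particular model):

```python
from dataclasses import dataclass

# The compound "Customer Name" item is split into atomic attributes,
# each fully dependent on the primary key Customer ID.
@dataclass
class Customer:
    customer_id: int   # primary key
    first_name: str    # "Customer Name" split into its atomic parts
    last_name: str

print(Customer(customer_id=1, first_name="Anna", last_name="Petrova"))
```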

The logical data model is independent of application technologies such as databases, networking technologies, or reporting tools, and the means of their physical implementation. There can be only one Enterprise Data Model in an organization. Logical models typically include thousands of entities, relationships, and attributes. For example, a data model for a financial institution or telecommunications company can contain about 3,000 industry concepts.

It is important to distinguish between logical and semantic data models. The logical data model represents an enterprise-wide business solution, while the semantic data model represents an applied business solution. The same corporate logical data model can be implemented using different semantic models; that is, semantic models can be viewed as the next level of modeling, approaching the physical models. Moreover, each of these models represents a separate "slice" of the corporate data model in accordance with the requirements of various applications. For example, in the corporate logical data model the Client entity will be completely normalized, while in the semantic model for a data mart it may be represented as a multidimensional structure.

A company has two ways to create a corporate logical data model: build it independently or use a ready-made industry model (industry logical data model). Here the difference in terms reflects only different approaches to building the same logical model. If a company independently develops and implements its own logical data model, such a model is usually called simply a corporate logical model. If an organization decides to use a ready-made product from a professional supplier, then we can speak of an industry logical data model. The latter is a ready-made logical data model that reflects the functioning of a particular industry with a high degree of accuracy. An industry logical model is a domain-specific and integrated view of all the information that must reside in an enterprise data warehouse to answer both strategic and tactical business questions. Like any other logical data model, an industry model is independent of application solutions. It also does not include derived data or other calculations for faster data retrieval. As a rule, most of the logical structures of such a model translate well into an efficient physical implementation. Such models are developed by many suppliers for a wide variety of areas of activity: finance, manufacturing, tourism, healthcare, insurance, etc.

An industry logical data model contains information that is common to the industry and therefore cannot be a comprehensive solution for a company. Most companies have to grow the model by an average of 25% by adding data items and expanding definitions. The out-of-the-box models contain only key data elements, and the rest of the elements must be added to the corresponding business objects during the installation of the model in the company.

Industry logical data models contain a significant amount of abstraction. Abstraction here means combining similar concepts under common names such as Event or Participant. This adds flexibility and uniformity to industry models. Thus, the concept of an Event is applicable to all industries.

Business Intelligence Specialist Steve Hoberman identifies five factors to consider when deciding whether to acquire an industry data model. The first is the time and money needed to build the model. If an organization needs to achieve results quickly, then the industry model will be beneficial. Using an industry model may not immediately provide a picture of the entire organization, but it can save a significant amount of time. Instead of modeling itself, time will be spent linking existing structures to the industry model and discussing how best to customize it to the needs of the organization (for example, which definitions should be changed and which data items should be added).

The second factor is the time and money required to keep the model in good working order. If the enterprise data model is not part of a methodology that monitors its accuracy and compliance with current standards, the model becomes outdated very quickly. The industry data model prevents this risk, since it is kept up to date by external resources. Of course, changes taking place within the organization must be reflected in the model by the company itself, but industry-wide changes will be reproduced in the model by its supplier.

The third factor is experience in risk assessment and modeling. Creation of a corporate data model requires qualified resources from both the business and the IT staff. As a rule, managers are well aware of either the work of the organization as a whole, or the activities of a particular department. Few of them have both broad (company-wide) and deep (within departments) knowledge of their business. Most managers usually know only one area well. Therefore, in order to get the overall corporate picture, significant business resources are required. This also increases the demands on the IT staff. The more business resources are required to create and test a model, the more experienced analysts must be. They must not only know how to get information from the business staff, but also be able to find a common point of view in contentious areas and be able to present all this information in an integrated way. The person creating the model (in many cases the same analyst) must have good modeling skills. Building enterprise logic models requires modeling “for the future” and the ability to literally convert complex business “into squares and lines”.

On the other hand, the industry model allows outside expertise to be leveraged. Industry-specific logic models are built using proven modeling methodologies and teams of experienced professionals to avoid common and costly problems that can arise when developing enterprise data models within an organization.

The fourth factor is the existing application infrastructure and supplier relationships. If an organization already uses many tools from one supplier and has an established relationship with it, then it makes sense to order the industry model from that supplier as well. Such a model will work seamlessly with other products from the same supplier.

The fifth factor is intra-industry exchange of information. If a company needs to communicate with other organizations working in the same field, then the industry model can be very useful in this situation. Organizations within the same industry use similar structural components and terminology. Nowadays, in most industries, companies are forced to exchange data in order to successfully conduct business.

The most effective are the industry models offered by professional suppliers. High efficiency of their use is achieved due to the significant level of detail and accuracy of these models. They usually contain many data attributes. In addition, the creators of these models not only have extensive modeling experience, but are also well versed in building models for a particular industry.

Industry data models provide companies with a single, integrated view of their business information. Many companies find it difficult to integrate their data, although this is a prerequisite for most enterprise-wide projects. According to a study by The Data Warehousing Institute (TDWI), more than 69% of organizations surveyed found integration to be a significant barrier to the adoption of new applications. Conversely, implementing data integration generates tangible returns for the company.

The industry data model, in addition to linking to existing systems, provides great benefits for enterprise-wide projects such as Enterprise Resource Planning (ERP), master data management, business intelligence, data quality improvement, and employee development.

Thus, industry logical data models are an effective tool for integrating data and obtaining a holistic view of the business. The use of logical models seems to be a necessary step towards the creation of corporate data warehouses.

Publications

  1. Steve Hoberman. Leveraging the Industry Logical Data Model as Your Enterprise Data Model.
  2. Claudia Imhoff. Fast-Tracking Data Warehousing & Business Intelligence Projects via Intelligent Data Modeling.

To sell, you need to understand what you are selling

Let's define the terminology and concepts. A data warehouse is not a system of key performance indicators (KPIs), not simply a large database, not an analytical OLAP tool, not an intelligent system that extracts new data and statistical dependencies, and not a system of unified reference data: none of these, taken as a single item, is a data warehouse.

An enterprise data warehouse is a specially organized array of the data of an enterprise (organization), processed and stored in a single hardware and software complex that provides fast access to operational and historical information, multidimensional analysis of data (KPIs across various dimensions), and the derivation of forecasts and statistics in terms of agreed reference data (normative and reference information, NSI).
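
To make the "multidimensional analysis" part of this definition concrete, here is a toy sketch of a warehouse-style star schema in Python with SQLite; all table, column, and value names are invented for illustration and do not come from any specific product:

```python
import sqlite3

# A toy star schema: one fact table plus a dimension table, so that a
# KPI (here, revenue) can be sliced along different dimensions.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (
    region_id INTEGER REFERENCES dim_region(region_id),
    month     TEXT,
    revenue   REAL
);
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO fact_sales VALUES (1, '2010-07', 100.0),
                              (1, '2010-08', 120.0),
                              (2, '2010-07', 80.0);
""")
# The same KPI aggregated along a chosen dimension (by region):
for name, revenue in con.execute("""
        SELECT r.name, SUM(f.revenue)
        FROM fact_sales f JOIN dim_region r USING (region_id)
        GROUP BY r.name"""):
    print(name, revenue)
```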

Who are the potential customers for an enterprise data warehouse, and what do they get?

How do you identify potential corporate customers looking for a data warehouse?

  1. First of all, a lot of information should arise in the daily activities of the company. These can be phone calls, financial transactions, customer complaints / reviews, customer requests for shipment, information from spy satellites, etc. In principle, anything, the main thing is that there is a lot of data.
  2. The potential client should have a desire to see and analyze this information. At the same time, the analysis period should be quite extensive - from a day or even an hour, to an analysis of several years.
  3. The client must have a normally working infrastructure (there should be no servers connected by a single twisted-pair cable or via a USB port). If the client has no infrastructure, it needs to be sold to them as well.

What benefits does a customer get from implementing an enterprise data warehouse?

  1. A unified information system for storing corporate data appears, in which unified reference data is used.
  2. It becomes possible to conduct a comprehensive analysis of the business: for example, which clients bring the most revenue and profit; which service is most in demand with which clients; which kinds of claims are most frequent and in which regions, etc.
  3. It becomes possible to conduct analysis using historical data. Often, operational systems (those automating daily business processes) do not allow this: they simply lack the space to store history and the capacity for analysis.
  4. It becomes possible to connect and analyze information previously stored in different information systems. For example, traffic data for different branches may be stored in billing systems from different developers. After the warehouse is implemented, it becomes possible to analyze them together, in a single report.
  5. It becomes possible to analyze and cross data of different kinds. For example, money and traffic, the number of staff and the number of refusals or claims, etc.
  6. A basis appears for more accurate service costing: information from the corporate data warehouse yields more adequate data for physical allocation bases.

What is a corporate data warehouse

From a technical point of view, what components does an enterprise data warehouse use?

Components of an enterprise data warehouse

  1. The client always has operational systems, the data sources for the corporate data warehouse: accounting, billing, banking and other systems.
  2. Using an ETL application (software that extracts, transforms, and loads data), data from the source systems gets into the data warehouse database; a minimal sketch of the idea follows this list. The following can be used as ETL tools: Informatica PowerCenter, IBM DataStage, Oracle Data Integrator, Oracle Warehouse Builder. There are also products from other vendors, but they are almost unrepresented on the Russian market.
  3. The warehouse database itself is not an arbitrary structure (a set of tables, their fields, and the relationships between tables); it is created on the basis of data models. In most cases, either Oracle or Teradata is used as the database.
  4. A data model is a description of all the entities and database objects of the corporate data warehouse and includes a conceptual data model, a logical data model, and a physical database model. At the level of the conceptual model, entities and the relationships between them are defined. At the level of the logical model, entities are divided into business areas, given detailed and complete descriptions, and the relationships are spelled out. When developing the physical database model, the entire structure of the database is determined, from tables and their fields to partitions and indexes. Today, IBM, SAP, and Oracle supply data models to the market, but buying a data model does not automatically mean building the right enterprise warehouse. A data model is not a boxed product; it needs to be adapted to the needs of a particular client.
  5. Further, using data from the corporate data warehouse, the analysis areas, reporting, and data marts are built. Users can then independently build the reports they need and carry out multidimensional analysis. Business Objects, Oracle Discoverer, IBM Alphablox, and other products are mainly used as analysis tools.
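
The idea behind any ETL tool can be shown in a few lines. The sketch below is a toy illustration only (the commercial tools named above do far more); the source and warehouse table names are invented:

```python
import sqlite3

source = sqlite3.connect(":memory:")       # stands in for a billing system
source.executescript("""
CREATE TABLE billing_calls (phone TEXT, minutes INTEGER);
INSERT INTO billing_calls VALUES ('555-01', 10), ('555-01', 10),
                                 ('555-02', 3);
""")

warehouse = sqlite3.connect(":memory:")    # stands in for the warehouse
warehouse.execute("CREATE TABLE fact_calls (phone TEXT, minutes INTEGER)")

# Extract rows from the source, transform them (here: drop exact
# duplicates, as in the ETL illustration below), and load the result.
rows = source.execute("SELECT phone, minutes FROM billing_calls").fetchall()
deduplicated = sorted(set(rows))
warehouse.executemany("INSERT INTO fact_calls VALUES (?, ?)", deduplicated)
print(warehouse.execute("SELECT * FROM fact_calls").fetchall())
```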

What the components of an enterprise data warehouse look like (data model, ETL processes, data marts)

Below are illustrative examples of a data model, an ETL process implementation, data entry forms for maintaining unified reference data, and data marts.


Logical data model.
Defines entities, their attributes, and the relationships between them.


ETL process: elimination of duplicates in the source data


Data entry form for the formation of a unified directory


Data mart in the form of a tabular report


Data mart with graphics and color highlighting of data according to a given condition


Data mart with a chart

Related software and hardware

First of all, in addition to the services for developing a corporate data warehouse, licenses are sold both for server software (OS, database, application server, etc.) and for client workstations (anti-virus and security software).

The client's existing servers may not be suited to deploying a data warehouse. Requirements must be drawn up for them, and the hardware sold to the potential client.

In addition to the servers themselves, disk arrays are needed to store a significant amount of information.

When planning to build a corporate data warehouse, a potential client does not always understand how backups will be provided. Often the client's existing backup systems cannot handle backing up data volumes of 20-30 TB.

Typically, the client's specialists and users require training courses.

M.V. Kovtun, August 2010


  • 1. Relational data model
    • 1.1 The relational data model. Basic definitions
    • 1.2 Operations on relations
  • 2. Corporate information systems

1. Relational data model

1.1 The relational data model. Basic definitions

In mathematical disciplines, the concept of a "table" corresponds to the concept of a "relation". The table reflects an object of the real world, an entity, and each of its rows reflects a specific instance of that entity. Each column has a name unique within the table. Rows have no names, their order is not defined, and their number is not logically limited. One of the main advantages of the relational data model is homogeneity: each row in a table has the same format. Whether the corresponding entities are homogeneous is for the user to decide, which settles the question of the model's suitability.

Basic concepts:

* A relation is a two-dimensional table containing some data.

* Entity - an object of any nature, data about which is stored in the database. Attributes are properties that characterize an entity (columns).

* The degree of a relation is the number of its columns.

* Relation schema - a list of attribute names, for example, EMPLOYEE (No., Full name, Year of birth, Position, Department).

* Domain - the set of values an attribute of a relation may take (a data type).

* A tuple is a table row.

* Cardinality - the number of rows in the table.

* A primary key is an attribute that uniquely identifies the rows of a relation. A primary key made up of several attributes is called a composite primary key. The primary key cannot be wholly or partially empty (null). Keys that could serve as primary keys are called candidate or alternate keys.

* A foreign key is an attribute (or set of attributes) of one table that serves as the primary key of another table; it references that table's primary key.
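
As a minimal illustration of the primary and foreign key concepts above, here is a sketch in Python with SQLite; the table and column names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")    # SQLite enforces FKs on request
con.executescript("""
CREATE TABLE department (
    dept_no INTEGER PRIMARY KEY,   -- primary key: unique, never NULL
    name    TEXT
);
CREATE TABLE employee (
    emp_no  INTEGER PRIMARY KEY,
    name    TEXT,
    dept_no INTEGER REFERENCES department(dept_no)   -- foreign key
);
INSERT INTO department VALUES (1, 'Geology');
INSERT INTO employee VALUES (10, 'Petrov', 1);
""")
# Inserting an employee whose dept_no has no matching department row
# would raise sqlite3.IntegrityError: the foreign key keeps references
# to the other table's primary key valid.
```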

Normalization is a process aimed at reducing the redundancy of information in a database. In addition to the data itself, various names, object names and expressions can also be normalized in the database.

A non-normalized database keeps information in one or several tables in such a way that data seems to have been placed in a particular table for no apparent reason. This state of affairs can negatively affect data security, rational use of disk space, query speed, the efficiency of database updates and, perhaps most importantly, the integrity of the stored information. Before normalization, the database is a structure that has not yet been logically broken down into smaller, more manageable tables.

The normal form is a kind of indicator of the level, or depth, of database normalization. The normalization level of the database corresponds to the normal form in which it is located.

1.2 Operations on relations

To bring the table to the first normal form (1NF), you need to follow two rules:

1. Atomicity or indivisibility. Each column must contain one indivisible value.

2. The table should not contain duplicate columns or groups of data.

For example, if a table contains in one field the full address of a person (street, city, postal code), it will not meet the 1NF rules, since it will contain different values ​​in one column, which would be a violation of the atomicity rule. Or if the database contains data about movies and it contains the columns actor1, actor2, actor3, it will also not comply with the rules, since the data will be repeated.

Normalization should begin with checking the database structure for compatibility with 1NF. All columns that are not atomic must be split into their constituent columns. If there are duplicate columns in the table, they must be moved to a separate table.

To bring the table to first normal form, you should:

* Find all fields that contain multipart pieces of information.

* The data that can be broken down into component parts must be placed in separate fields.

* Move duplicate data into a separate table.

* Check if all tables match the conditions of the first normal form.
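
Applying these steps to the movie example above might look like the following sketch (Python with SQLite; names are illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Instead of: CREATE TABLE film (title TEXT,
--                                actor1 TEXT, actor2 TEXT, actor3 TEXT);
CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE film_actor (
    film_id INTEGER REFERENCES film(film_id),
    actor   TEXT
);
""")
con.execute("INSERT INTO film VALUES (1, 'Solaris')")
con.executemany("INSERT INTO film_actor VALUES (1, ?)",
                [("Banionis",), ("Bondarchuk",)])
# A film may now have any number of actors, not exactly three, and each
# column holds a single atomic value.
```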

To bring the tables to the second normal form (2NF), the tables should be already in 1NF. Normalization should be done in order.

In second normal form the following condition must be met: any non-key column (including foreign keys) must depend on the entire primary key. Typically, columns whose values are independent of the key are easy to identify. If the data contained in a column is not related to the key that describes the row, it should be separated into its own table. The primary key must be carried back to the old table.

To bring the base to the second normal form, you need to:

* Identify all columns that are not directly dependent on the primary key of this table.

* Create the required tables and, for each, select a primary key from the existing fields or create a new one.

* Each table needs its own primary key

* Create foreign keys and designate their relationships between tables. The final step of normalization to 2NF will be the allocation of foreign keys for communication with associated tables. The primary key of one table must be a foreign key in another.
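
A minimal sketch of such a split, assuming a hypothetical order-line table keyed by (order_id, product_id), where the product name depends on product_id alone:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- The product name moved to its own table, keyed by product_id.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_line (
    order_id   INTEGER,
    product_id INTEGER REFERENCES product(product_id),
    quantity   INTEGER,
    PRIMARY KEY (order_id, product_id)  -- every non-key column now
);                                      -- depends on the whole key
""")
```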

Hints:

Another way to convert a schema to 2NF is to look at the relationships between the tables. Ideally, create all one-to-many relationships. Many-to-many relationships need restructuring.

A properly normalized table will never have duplicate rows (two or more rows whose values ​​are not keys and contain the same data).

The database will be in third normal form if it is in second normal form and no non-key column depends on another non-key column. If you have followed the normalization process correctly up to this point, there may be no issues with converting to 3NF. You should be aware that 3NF is violated if changing the value in one column requires a change in another column.

To bring the base to the third normal form, you need:

* Determine which fields of which tables have interdependencies, i.e. fields that depend more on each other than on the row as a whole.

* Create the corresponding tables. If problematic columns were found in step 1, create separate tables for them.

* Create or allocate primary keys. Each table must have a primary key.

* Create the required foreign keys that form any of the relationships.
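
A minimal sketch of removing a transitive dependency, assuming a hypothetical staff table where salary depends on a pay grade rather than on the employee directly:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- The interdependent pair (grade, salary) moves to its own table.
CREATE TABLE pay_scale (grade INTEGER PRIMARY KEY, salary REAL);
CREATE TABLE staff (
    staff_id INTEGER PRIMARY KEY,
    name     TEXT,
    grade    INTEGER REFERENCES pay_scale(grade)
);
""")
# Changing the salary for a grade now touches one pay_scale row instead
# of every staff row with that grade: the update anomaly 3NF removes.
```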

In the fourth normal form, an additional rule is that it is necessary to exclude multivalued dependencies. In other words, all rows in the table must be independent of each other. The presence of some row X should not mean that row Y is also somewhere in this table.

2. Corporate information systems


A system (from the Greek systema - a whole, a compound made up of parts) is a collection of elements that interact with each other, forming a certain integrity, unity. Here are some concepts that are often used to characterize a system.

1. A system element is a part of a system that has a specific functional purpose. Complex elements of systems, in turn, consisting of simpler interconnected elements, are often called subsystems.

2. Organization of the system - internal orderliness, consistency of the interaction of system elements, manifested, in particular, in limiting the variety of states of elements within the system.

3. The structure of the system - the composition, order and principles of interaction of the elements of the system, which determine the basic properties of the system. If the individual elements of the system are spaced across different levels and the internal connections between the elements are organized only from higher to lower levels and vice versa, then we speak of the hierarchical structure of the system. Purely hierarchical structures are practically rare, therefore, somewhat expanding this concept, the hierarchical structure is usually understood as such structures, where, among other connections, hierarchical relationships are of primary importance.

4. System architecture - a set of system properties that are essential for the user.

5. Integrity of the system - the fundamental irreducibility of the properties of the system to the sum of the properties of its individual elements (emergence of properties) and, at the same time, the dependence of the properties of each element on its place and function within the system.

An information system is an interconnected set of tools, methods, and personnel used to store, process, and deliver information in order to achieve a set goal.

The Federal Law "On Information, Informatization and Information Protection" provides the following definition:

"Information system is an organizationally ordered set of documents (arrays of documents) and information technologies, including the use of computer technology and communication, realizing information processes"

Classification by scale

In terms of scale, information systems are divided into the following groups:

* single;

* group;

* corporate.

A corporate information system is a scalable system designed for the integrated automation of all types of economic activities of large and medium-sized enterprises, including corporations consisting of a group of companies requiring unified management.

A corporate information system can be considered a system that automates more than 80% of the divisions of an enterprise.

Recently, many publications devoted to the use of information technologies in the management of economic objects have used the term "corporate information systems", by which they mean, in essence, automated information systems of economic objects.

An automated information system (AIS) is a combination of various types of support, as well as the specialists, intended to automate the processing of accounting and analytical information. As a rule, the types of support are similar in composition across different systems, which makes it possible to implement the principle of system compatibility during their operation. In the process of studying an AIS as a complex system, it is necessary to single out its individual parts and elements and consider the features of their use at the stages of creation and operation.

Corporate information systems are an evolution of systems for workgroups, they are focused on large companies and can support geographically dispersed nodes or networks. Basically, they have a hierarchical structure of several levels. Such systems are characterized by a client-server architecture with specialization of servers or a multi-tier architecture. When developing such systems, the same database servers can be used as when developing group information systems. However, in large information systems, the most widely used servers are Oracle, DB2 and Microsoft SQL Server.

For group and corporate systems, the requirements for operational reliability and data safety are significantly higher. These properties are ensured by maintaining data, referential, and transactional integrity in the database servers.

Classification by scope

According to the scope of application, information systems are usually divided into four groups:

* transaction processing systems;

* decision making systems;

* information and reference systems;

* office information systems.



Zaitsev S.L., Ph.D.

Repeating groups

Repeating groups are attributes for which a single instance of an entity can have more than one value. For example, a person can have more than one skill. If, in terms of business requirements, we need to know the skill level for each skill, and each person can have only two skills, we can create the entity shown in Fig. 1.6: the entity PERSON with two attributes for storing skills and the skill level for each.

Fig. 1.6. This example uses repeating groups.

The problem with repeating groups is that we cannot know exactly how many skills a person might have. In real life, some people have one skill, some have several, and some have none yet. Figure 1.7 shows the model reduced to first normal form. Note the added Skill ID, which uniquely identifies each SKILL.

Fig. 1.7. The model reduced to first normal form.

One fact in one place

If the same attribute is present in more than one entity and is not a foreign key, then this attribute is considered redundant. The logical model should not contain redundant data.

Redundancy requires additional space, but while memory efficiency is important, the real problem lies elsewhere. Ensuring that redundant data is synchronized is overhead, and you always run the risk of conflicting values.

In the previous example, SKILL depends on both Person ID and Skill ID. This means that you cannot have a SKILL until there is a PERSON possessing that skill. It also makes it difficult to change a Skill Name: you have to find every entry with that Skill Name and change it for every Person who has the skill.

Figure 1.8 shows the model in second normal form. Note the added entity SKILL; the attribute Skill Name has been moved into it. The skill level remains, accordingly, at the intersection of PERSON and SKILL.

Fig. 1.8. In second normal form, the repeating group is moved to another entity. This provides the flexibility to add the required number of Skills and to change the Skill Name or Skill Description in one place.

Each attribute depends on the key

Each attribute of an entity must depend on the primary key of that entity. In the previous example, School Name and Geographic Area are present in the PERSON table but do not describe the person. To achieve third normal form, these attributes must be moved to an entity where they will depend on the key. Figure 1.9 shows the model in third normal form.

Fig. 1.9. In third normal form, School Name and Geographic Region are moved to an entity where their values depend on the key.

Many-to-many relationships

Many-to-many relationships reflect the reality of the surrounding world. Note that in Figure 1.9 there is a many-to-many relationship between PERSON and SCHOOL: the relationship accurately reflects the fact that a PERSON can study at many SCHOOLs and a SCHOOL can teach many PERSONs. To achieve fourth normal form, an associative entity is created that eliminates the many-to-many relationship by generating a separate entry for each unique combination of school and person. Figure 1.10 shows the model in fourth normal form.

Fig. 1.10. In fourth normal form, the many-to-many relationship between PERSON and SCHOOL is resolved by introducing an associative entity, in which a separate entry is allocated for each unique combination of SCHOOL and PERSON.
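
A sketch of the associative entity of Fig. 1.10 as physical tables (Python with SQLite; the column names are invented for illustration):

```python
import sqlite3

# The PERSON-SCHOOL many-to-many relation, resolved by an associative
# table that holds one row per unique (person, school) pair.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE school (school_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_school (
    person_id INTEGER REFERENCES person(person_id),
    school_id INTEGER REFERENCES school(school_id),
    PRIMARY KEY (person_id, school_id)  -- one row per unique combination
);
""")
```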

Formal definitions of normal forms

The following definitions of normal forms can seem daunting. Think of them simply as formulas to achieve normalization. Normal forms are based on relational algebra and can be interpreted as mathematical transformations. Although this book is not devoted to a detailed discussion of normal forms, modelers are encouraged to take a deeper look at the subject.

In a given relation R, attribute Y is functionally dependent on attribute X, in symbolic form R.X → R.Y (read as "R.X functionally determines R.Y"), if and only if each value of X in R is associated with exactly one value of Y in R (at any given time). Attributes X and Y may be composite (Date C.J. An Introduction to Database Systems. 6th edition. Williams, 1999, 848 pp.).

The relation R corresponds to the first normal form (1NF) if and only if all domains belonging to it contain only atomic values ​​(Date, ibid.).

A relation R corresponds to second normal form (2NF) if and only if it corresponds to 1NF, and each non-key attribute is completely dependent on the primary key (Date, ibid.).

The relation R corresponds to third normal form (3NF) if and only if it corresponds to 2NF, and each non-key attribute does not transitively depend on the primary key (Date, ibid.).

The relation R corresponds to Boyce-Codd normal form (BCNF) if and only if every determinant is a candidate key.

NOTE: Below is a brief explanation of some of the abbreviations used in Date's definitions.

MVD (multi-valued dependency). Used only for entities with three or more attributes. In a multivalued dependency, one attribute determines a set of values of another attribute, independently of the remaining attributes.

FD (functional dependency) - functional dependency. With functional dependency, the value of an attribute depends on the value of another attribute that is not part of the primary key.

JD (join dependency). In a join dependency, the primary key of the parent entity is traced down to at least third-level descendants, while retaining the ability to be used in a join on the original key.

A relation R is in fourth normal form (4NF) if and only if, whenever there is an MVD in R, say A →→ B, all attributes of R are also functionally dependent on A. In other words, R contains only dependencies (FD or MVD) of the form K → X (that is, a functional dependency of attribute X on a candidate key K). Accordingly, R meets the requirements of 4NF if it is in BCNF and all MVDs are in fact FDs (Date, ibid.).

For fifth normal form, a relation R satisfies the join dependency (JD) *(X, Y, ..., Z) if and only if R is equivalent to the join of its projections onto X, Y, ..., Z, where X, Y, ..., Z are subsets of the set of attributes of R.

There are many other normal forms for complex data types and specific situations that are beyond the scope of this discussion. Any model development enthusiast would like to learn other normal forms as well.

Business normal forms

In his book (An Introduction to Information Engineering: From Strategic Planning to Information Systems. Reading, Massachusetts: Addison-Wesley, 1989), Clive Finklestein took a different approach to normalization. He defines business normal forms in terms of the transformations that bring a model to those forms. Many modelers find this approach more intuitive and more pragmatic.

The first business normal form (1BNF) moves repeating groups into another entity. That entity gets its own name, and its primary (composite) key combines the key attributes of the original entity and of the repeating group.

The second business normal form (2BNF) moves attributes that depend on only part of the primary key into another entity. The primary (composite) key of that entity is the primary key of the entity where it originally resided, together with the additional keys on which the attribute fully depends.

The third business normal form (3BNF) moves attributes that do not depend on the primary key into another entity, where they fully depend on that entity's primary key.

The fourth business normal form (4BNF) moves attributes that depend on the value of the primary key or are optional into a secondary entity, where they fully depend on the value of the primary key, or where they must (necessarily) be present in that entity.

The fifth business normal form (5BNF) appears as a structural entity when there is a recursive or other dependency between instances of a secondary entity, or a recursive dependency between instances of its primary entity.

Completed Logical Data Model

The completed logical model must satisfy the requirements of the third business normal form and include all the entities, attributes, and relationships necessary to support the data requirements and business rules associated with the data.

All entities must have names that describe their content and have a clear, concise, complete description or definition. A future post will cover an initial set of guidelines for the correct formation of entity names and descriptions.

Entities must have a complete set of attributes, so that every fact about each entity can be represented by its attributes. Each attribute must have a name that reflects its meaning, a logical data type, and a clear, short, complete description or definition. In a future publication, we will look at an initial set of guidelines for properly forming attribute names and descriptions.

Relationships should include a verbal construction that describes the relationship between entities, along with characteristics such as plurality, necessity of existence or possibility of absence of a relationship.

NOTE: The plurality of a relationship describes the maximum number of secondary entity instances that can be associated with an instance of the original entity. The necessity of existence or possibility of absence of a relationship is used to determine the minimum number of secondary entity instances that can be associated with an instance of the original entity.

Physical data model

Once a complete and adequate logical model has been created, you are ready to make a decision about the choice of an implementation platform. The choice of platform depends on the requirements for the use of data and the strategic principles of shaping the architecture of the corporation. Choosing a platform is a complex issue beyond the scope of this book.

In ERwin, a physical model is a graphical representation of a real-world database. The physical database will be made up of tables, columns, and relationships. The physical model depends on the platform chosen for implementation and the requirements for using the data. The physical model for IMS will be very different from that for Sybase. The physical model for OLAP reports will look different from the model for OLTP (online transaction processing).

The data modeler and database administrator (DBA) use the logical model, usage requirements, and corporate architecture policy to develop a physical data model. You can denormalize the physical model to improve performance, and create views to support usage requirements. The following sections detail the process of denormalizing and creating views.

This section provides an overview of the process of building a physical model, collecting data usage requirements, defining the components of a physical model, and providing reverse engineering. In the following publications, these issues are covered in more detail.

Collecting data usage requirements

Typically, you collect data usage requirements early on during interviews and work sessions. These requirements should capture as fully as possible how users will use the data. A superficial attitude and gaps in the physical model can lead to unplanned costs and project delays. Requirements for use include:

    Access and performance requirements

    Volumetric characteristics (an estimate of the amount of data to be stored) that allow the administrator to represent the physical volume of the database

    Estimating the number of users who need concurrent access to data to help you design your database for acceptable performance levels

    Aggregate, pivot, and other calculated or derived data that can be considered candidates for storage in persistent data structures

    Reporting requirements and standard queries to help the database administrator build indexes

    Views (materialized or virtual) that will assist the user when performing data aggregation or filtering operations.

In addition to the chairman, secretary, and users, the modeler, database administrator, and database architect must participate in the usage requirements session. The user's historical data requirements should be discussed. The length of time that data is retained has a significant impact on the size of the database. Older data is often stored in a generalized form, and atomic data is archived or deleted.

Users should bring examples of queries and reports with them to the session. Reports must be strictly defined and must include the atomic values used for any total and summary fields.

Physical Data Model Components

The components of a physical data model are tables, columns, and relationships. The entities of the logical model are likely to become tables in the physical model. Logical attributes become columns. Logical relationships become relationship integrity constraints. Some logical relationships cannot be implemented in a physical database.

Reverse engineering

When a logical model is not available, it becomes necessary to recreate the model from the existing database. In ERwin, this process is called reverse engineering. Reverse engineering can be done in several ways. The modeler can explore the data structures in the database and recreate the tables in a visual modeling environment. You can import data definition language (DDL) scripts into a tool that supports reverse engineering (such as ERwin). Advanced tools such as ERwin include functions that provide ODBC communication with an existing database to create a model by directly reading its data structures. Reverse engineering with ERwin will be discussed in detail in a future publication.

Using corporate functional boundaries

When building a logical model for the modeler, it is important to ensure that the new model is consistent with the corporate model. Using corporate functional boundaries means modeling data in terms used within a corporation. The way data is used in a corporation is changing faster than the data itself. In each logical model, data must be presented in a holistic manner, regardless of the business domain it supports. Entities, attributes, and relationships must define business rules at the corporation level.

NOTE: Some of my colleagues refer to these corporate functional boundaries as real-world modeling. Real-world modeling encourages the modeler to view information in terms of its actually inherent connections and relationships.

An appropriately constructed data model that uses corporate functional boundaries provides the basis for supporting the information needs of any number of processes and applications, which enables the corporation to exploit more efficiently one of its most valuable assets: information.

What is an Enterprise Data Model?

Enterprise data model (EDM) contains entities, attributes, and relationships that represent the information needs of a corporation. EDM is usually categorized according to subject areas, which represent groups of entities related to supporting specific business needs. Some subject areas may cover specific business functions such as contract management, while others may cover entities that describe products or services.

Each logical model must correspond to the existing domain of the corporate data model. If the logical model does not meet this requirement, a domain model must be added to it. This comparison ensures that the corporate model is improved or adjusted and that all logical modeling efforts are coordinated within the corporation.

EDM also includes specific entities that define the scope of values ​​for key attributes. These entities have no parents and are defined as independent. Independent entities are often used to maintain the integrity of relationships. These entities are identified by several different names, such as code tables, reference tables, type tables, or classification tables. We will use the term "corporate business object". An enterprise business object is an entity that contains a set of attribute values ​​that are independent of any other entity. Corporate business objects should be used consistently within a corporation.

Building a corporate data model by augmenting

There are organizations where the corporate model has been built from start to finish as a result of a single concerted effort. On the other hand, most organizations create fairly complete corporate models by scaling up.

Building up means building something sequentially, layer by layer, just like an oyster grows a pearl. Each data model created provides a contribution to the formation of the EDM. Building an EDM in this way requires additional modeling steps to add new data structures and domains or augment existing data structures. This makes it possible to build an enterprise data model by augmenting, iteratively adding levels of detail and refinement.

Modeling methodology concept

There are several visual data modeling methodologies. ERwin supports two:

    IDEF1X (Integration Definition for Information Modeling - an integrated description of information models).

    IE (Information Engineering).

IDEF1X is a good methodology, and the use of its notation is widespread.

Integrated description of information models

IDEF1X is a highly structured data modeling methodology that extends the IDEF1 methodology adopted as a FIPS (Federal Information Processing Standards) standard. IDEF1X uses a highly structured set of modeling construct types and results in a data model that requires an understanding of the physical nature of the data before such information can be made available.

The rigid structure of IDEF1X forces the modeler to assign characteristics to entities that may not correspond to the realities of the surrounding world. For example, IDEF1X requires all entity subtypes to be exclusive. As a result, a person could not be both a client and an employee, although real practice tells us otherwise.

Information Engineering

Clive Finklestein is often referred to as the father of information engineering, although similar concepts were shared with him by James Martin (Martin, James. Managing the Database Environment. Upper Saddle River, New Jersey: Prentice Hall, 1983.). Information Engineering uses a business-driven approach to information management and uses a different notation to represent business rules. IE serves as an extension and development of the notation and core concepts of the ER methodology proposed by Peter Chen.

IE provides the infrastructure to support information requirements by integrating corporate strategic planning with information systems that are being developed. This integration allows the management of information resources to be more closely aligned with the long-term strategic prospects of the corporation. This business-driven approach has led many modelers to choose IE over other methodologies that tend to focus on short-term development challenges.

IE proposes a sequence of actions that leads a corporation to identify all of its information needs for collecting and managing data and identifying relationships between information objects. As a result, information requirements are clearly articulated based on management directives and can be directly translated into a management information system that will support strategic information needs.

Conclusion

Understanding how to use a data modeling tool like ERwin is only part of the problem. In addition, you must understand when data modeling tasks are solved and how the information requirements and business rules that must be represented in the data model are collected. Work sessions provide the most conducive environment for collecting information requirements, bringing together domain experts, users, and information technology professionals.

Building a good data model requires analyzing and researching the information requirements and business rules collected through work sessions and interviews. The resulting data model should be compared to the enterprise model, if possible, to ensure that it does not conflict with existing object models and includes all the required objects.

The data model consists of logical and physical models that represent the information requirements and business rules. The logical model must be reduced to third normal form. Third normal form eliminates insert, update, and delete anomalies in the data structures to support the "one fact in one place" principle. The collected information requirements and business rules should be analyzed and researched, and compared against the enterprise model to ensure that they do not conflict with existing object models and include all the required objects.

In ERwin, the data model includes both logical and physical models. ERwin implements the ER approach and allows you to create logical and physical model objects to represent information requirements and business rules. Logical model objects include entities, attributes, and relationships. Physical model objects include tables, columns, and constraints on the integrity of relationships.

One of the following publications will cover the issues of identifying entities, defining entity types, choosing entity names and descriptions, as well as some techniques to avoid the most common modeling errors associated with the use of entities.
