Objectives:
Data Integration & Interoperability (DII) describes processes related to the movement and consolidation of data within and between data stores, applications and organisations. Integration consolidates data into coherent forms, either physical or virtual; Interoperability is the ability of multiple systems to communicate with each other.
DII solutions enable basic functions of data management which most organisations depend on:
- Migration and conversion of data
- Consolidation of data in hubs or marts
- Integration of vendor packages into an organisation’s application portfolio
- Sharing of data between applications and between organisations
- Distribution of data across data stores and data centers
- Data storage
- Management of Data Interfaces
- Obtaining and ingesting external data
- Integration of structured and unstructured data
- Provision of operational intelligence and management decision support
DII depends on other areas of data management:
- Data Governance
- Data Architecture
- Data Security
- Metadata
- Data Storage and Operations
- Data Modeling and Design
Data Integration and Interoperability are pivotal for Data Warehousing and Business Intelligence, as well as for Reference Data and Master Data Management, as all of these systems focus on the transformation and integration of data from source systems to hubs of consolidated data and from hubs to target systems, where they can be delivered to data users (both systems and humans).
They are also crucial for the emerging area of Big Data management. Big Data seeks to integrate various types of data, including structured data stored in databases, unstructured text data in documents or files, and other types of unstructured data such as audio, video and streaming data. This integrated data can be mined, used to develop predictive models and deployed in operational intelligence activities.
Data Integration and Interoperability activities entail getting data where it is needed, when it is needed and in the form in which it is needed. Data integration activities follow a development life cycle: they start with planning and move through design, development, testing and implementation. Once implemented, integrated systems have to be updated, monitored and improved.
Activities carried out by our Team:
Planning and analysing
- Defining data integration and life cycle requirements: Defining data integration requirements entails understanding the business objectives of the organisation, as well as the data required and the technology initiatives proposed to achieve those objectives. The requirements definition process creates and reveals valuable metadata, which has to be managed throughout the data life cycle, from discovery to use in operational processes. The more complete and accurate an organisation’s metadata is, the better it can manage the risks and costs of data integration.
- Running data discovery: Data discovery has to be run before planning. Its objective is to identify potential sources of data for the data integration effort. Discovery identifies where data can be acquired and where it can be integrated. The process combines technical research, using tools that scan the metadata and/or the actual content of an organisation’s data sets, with subject matter expertise.
- Documenting data lineage: Analysing data lineage reveals how data flows through an organisation. This information can be used to document high-level data lineage: how the data under analysis is acquired or created by the organisation, where it moves and is changed within the organisation, and how the organisation uses it for analytics, decision-making or event triggering.
- Profiling data: Understanding the content and structure of data is essential for successful data integration. Actual data often differs from what its documentation or its owners assume, and profiling helps integration teams discover these differences and use that knowledge to make better sourcing and design decisions. If data profiling is skipped, information that should influence design will not be discovered until testing or operations. Basic profiling involves analysing the following:
  - Data format as defined in the data structures and as inferred from the actual data
  - Data population, including the levels of null, blank or default data
  - Data values and how closely they correspond to a defined set of valid values
  - Patterns and relationships within the data set, such as related fields and cardinality rules
  - Relationships with other data sets
- Gathering business rules: Business rules are a crucial subset of requirements. A business rule is a statement that defines or constrains an aspect of business processing. Business rules aim to support the business structure or to control or influence business behaviour. They fall into one of four categories: definitions of business terms, facts relating terms to each other, constraints or action assertions, and derivations.
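The basic profiling checks listed above (format, population and valid values) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the `VALID_COUNTRIES` reference set, the ISO date pattern and the `profile_column` helper are all hypothetical and do not describe any particular profiling tool.

```python
import re
from collections import Counter

# Hypothetical sample rows; field names and values are illustrative only.
ROWS = [
    {"customer_id": "C001", "country": "IT", "signup_date": "2021-03-14"},
    {"customer_id": "C002", "country": "", "signup_date": "2021-04-02"},
    {"customer_id": "C003", "country": "XX", "signup_date": "14/05/2021"},
    {"customer_id": "C003", "country": None, "signup_date": "2021-06-30"},
]

# Assumed reference set and expected format for this example.
VALID_COUNTRIES = {"IT", "FR", "DE"}
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def profile_column(rows, column, valid_values=None, pattern=None):
    """Return basic profiling counts for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Populated values are those that are neither null nor blank.
    populated = [v for v in values if v not in (None, "")]
    report = {
        "rows": len(values),
        "null": sum(v is None for v in values),
        "blank": sum(v == "" for v in values),
        "distinct": len(set(populated)),
        "duplicates": sum(c - 1 for c in Counter(populated).values()),
    }
    if valid_values is not None:
        # Values outside the defined set of valid values.
        report["invalid"] = sum(v not in valid_values for v in populated)
    if pattern is not None:
        # Values whose format does not match the expected pattern.
        report["bad_format"] = sum(pattern.fullmatch(v) is None for v in populated)
    return report


if __name__ == "__main__":
    print(profile_column(ROWS, "country", valid_values=VALID_COUNTRIES))
    print(profile_column(ROWS, "signup_date", pattern=ISO_DATE))
    print(profile_column(ROWS, "customer_id"))
```

Counts like these are the kind of evidence that can feed sourcing and design decisions before testing begins; a real profiling run would also look at relationships across data sets, which this sketch leaves out.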
Designing data integration solutions
- Designing the data integration architecture: Data integration solutions have to be specified both at the enterprise level and at the individual solution level. By establishing enterprise standards, the organisation saves time when implementing individual solutions, because the evaluations and negotiations have already been performed before the need arises. An enterprise approach also saves money, both on licence costs through group discounts and on the cost of operating a consistent and less complex set of solutions.
- Modelling data hubs, interfaces, messages and data services: The data structures needed for Data Integration and Interoperability include those in which data persists, such as Master Data Management hubs, data warehouses and marts, and operational data stores, and those that are transient and used only for moving or transforming data, such as interfaces, message layouts and canonical models.
- Mapping data sources to targets: Almost all data integration solutions transform data from source structures to target structures. Mapping sources to targets entails specifying the rules for transforming data from one location and format to another. The transformation can be run on a batch schedule or be triggered by the occurrence of a real-time event. It can be achieved by physically persisting the data in the target format or by presenting the data virtually in the target format.
- Planning Data Orchestration: Data flow in a data integration solution must be planned and documented. Data Orchestration is the pattern of data flows from beginning to end, including intermediate steps, necessary to complete the transformation and/or transaction.
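The source-to-target mapping described above can be expressed as a declarative specification that pairs each target field with a source field and a transformation rule. The sketch below is a minimal illustration, assuming hypothetical source columns (`CUST_NO`, `NAME`, `SIGNUP`, `CTRY`) and a hypothetical `map_record` helper; it is not the format used by any specific integration tool.

```python
from datetime import datetime

# Hypothetical mapping specification:
# target field -> (source field, transformation rule).
MAPPING = {
    "customer_id": ("CUST_NO", str.strip),
    "full_name": ("NAME", lambda v: v.strip().title()),
    # Assumes the source system stores dates as day/month/year.
    "signup_date": (
        "SIGNUP",
        lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat(),
    ),
    "country": ("CTRY", str.upper),
}


def map_record(source, mapping):
    """Apply every mapping rule to one source record and return the target record."""
    target = {}
    for target_field, (source_field, transform) in mapping.items():
        target[target_field] = transform(source[source_field])
    return target


if __name__ == "__main__":
    source_row = {
        "CUST_NO": " 00042 ",
        "NAME": "  ada LOVELACE",
        "SIGNUP": "14/03/2021",
        "CTRY": "it",
    }
    print(map_record(source_row, MAPPING))
```

Keeping the rules in a data structure rather than in procedural code means the same specification can be run by a batch job or by an event handler, and it can be documented as metadata alongside the solution.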
Developing data integration solutions
- Developing data services: Services are developed to access, transform and deliver data as specified, matching the selected interaction model. Tools or vendor suites are most frequently used to implement data integration solutions such as data transformation, Master Data Management and data warehousing. Using consistent tools or standard vendor suites for these various purposes across the organisation can simplify operational support and reduce operating costs by enabling shared support solutions.
- Developing data flows: Integration or ETL data flows are usually developed within specialised tools that manage those flows in a proprietary way. Batch data flows are developed in a scheduler (usually the enterprise standard scheduler) that manages the order, frequency and dependencies of executing the data integration components that have been built. Interoperability requirements may include developing mappings or coordination points between data stores.
- Developing the data migration approach: Data must be moved when new applications are implemented or when applications are retired or merged. Data migration projects are frequently underestimated or under-designed, because programmers are simply told to “move the data” and do not carry out the analysis and design activities that data integration requires. When data is migrated without proper analysis, it often looks different from data that came in through normal processing, or the migrated data may not work with the application as expected.
- Developing a publication approach: Systems in which critical data is created or maintained must make that data available to other systems in the organisation. New or changed data should be pushed by data-producing applications to other systems when the data changes or on a periodic schedule. Best practice is to define common message definitions for the various types of data in the organisation and to let data consumers with appropriate access authority subscribe to receive notification of any changes to the data of interest.
- Developing complex event processing flows: Developing complex event processing solutions requires:
  - Preparing the historical data about an individual, organisation, product or market and pre-populating the predictive models
  - Processing the real-time data stream to fully populate the predictive model and identify meaningful events (opportunities or threats)
  - Executing the triggered action in response to the prediction
- Maintaining DII metadata: While developing DII solutions, an organisation creates and discovers valuable metadata that should be managed and maintained to ensure a correct understanding of the data in the system and to avoid having to rediscover it for future solutions. Reliable metadata improves an organisation’s ability to manage risks, reduce costs and obtain more value from its data. The data structures of all systems involved in the integration should be documented as source, target or staging, and the documentation should include business and technical definitions (structure, format and size), as well as the transformations of data between persistent data stores.
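The three steps of complex event processing described above (preparing a model from historical data, scoring the real-time stream and executing a triggered action) can be sketched as follows. Everything here is an assumption made for illustration: the "predictive model" is reduced to a simple mean and standard deviation baseline, it is not updated by the stream, and `BaselineModel`, `process_stream` and the event fields are hypothetical names.

```python
from statistics import mean, stdev


class BaselineModel:
    """Toy stand-in for a predictive model, pre-populated from historical data."""

    def __init__(self, history):
        self.mean = mean(history)
        self.stdev = stdev(history)

    def score(self, value):
        """Return how many standard deviations the value lies from the mean."""
        return abs(value - self.mean) / self.stdev


def process_stream(model, events, threshold, action):
    """Score each incoming event and execute the action for significant ones."""
    triggered = []
    for event in events:
        if model.score(event["amount"]) > threshold:
            action(event)  # the triggered response
            triggered.append(event["id"])
    return triggered


if __name__ == "__main__":
    # Step 1: historical data pre-populates the model.
    model = BaselineModel([100, 102, 98, 101, 99, 100, 103, 97])
    # Step 2: real-time events are scored as they arrive.
    stream = [
        {"id": "e1", "amount": 101},
        {"id": "e2", "amount": 140},
        {"id": "e3", "amount": 95},
    ]
    # Step 3: the action here only records the event; a real one might open a case.
    alerts = []
    print(process_stream(model, stream, threshold=3.0, action=alerts.append))
```

In practice the model and the action would be far richer, but the separation between the pre-populated model, the stream loop and the pluggable action mirrors the three requirements in the list.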
Implementing and monitoring
Data services that have been developed and tested must be activated. Real-time data processing requires real-time monitoring for problems. Parameters that indicate potential processing problems must be established, along with direct notification of those problems. Both automated and manual monitoring must be in place, particularly as the complexity and risk of the triggered responses increase.
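As a minimal sketch of the automated side of this monitoring, the parameters can be modelled as thresholds that are checked against reported metrics, with a notification sent for each breach. The metric names, limits and the `check_metrics` helper below are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str    # name of the monitored parameter
    limit: float   # value above which a problem is suspected
    message: str   # text included in the notification


# Hypothetical parameters and limits for a real-time data service.
THRESHOLDS = [
    Threshold("queue_depth", 1000, "message backlog is growing"),
    Threshold("error_rate", 0.05, "too many failed messages"),
    Threshold("latency_seconds", 2.0, "processing is slower than expected"),
]


def check_metrics(metrics, thresholds, notify):
    """Compare each reported metric with its limit and notify on every breach."""
    breaches = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is not None and value > t.limit:
            notify(f"{t.metric}={value} exceeds {t.limit}: {t.message}")
            breaches.append(t.metric)
    return breaches


if __name__ == "__main__":
    current = {"queue_depth": 1500, "error_rate": 0.01, "latency_seconds": 2.5}
    # print stands in for a real notification channel such as e-mail or paging.
    check_metrics(current, THRESHOLDS, notify=print)
```

The `notify` callback keeps the check independent of the notification channel, so the same thresholds can drive an automated alert and a dashboard that people review by hand.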