A Data Science Central Community
Dynamic data integration that incorporates data quality and master data management (MDM) ensures consistency and reliability for upstream analytics and information sharing. A pragmatic approach that treats all of a company’s data as big data will facilitate integration efforts.
In Part I, we stressed the importance of data quality. In this post, we focus on MDM and the connection between the two as part of a data governance strategy.
As financial institutions seek to become data-driven enterprises, data governance should be regarded as a strategic centerpiece of that mission. This is particularly true when a principal objective of data governance is to reduce exposure to the risks lurking within departmental or individual data silos.
It Starts with Data Governance
Executing successfully on a data integration strategy should start with a formal data governance program that establishes procedural and cultural best practices for defining data elements across different user and application silos. Line-of-business users should be involved throughout the process to define the use cases and objectives the data will serve. Examples include adherence to GRC (governance, risk and compliance) initiatives, maintaining accurate customer records, and ensuring consistency of reference data for algorithmic trading. Along with data quality, MDM is a key initiative for meeting ROI targets around cost reduction, productivity enhancement or new revenue generation.
Business units or departments often create or purchase applications tactically to support specific functional requirements. Many of these apps are deployed with little regard for, or sometimes in spite of, what other groups have deployed. The resulting technology fragmentation, differences in semantic definitions or format nuances, and data silos limit enterprise analytics capabilities and hamper oversight, causing missed ROI opportunities while exposing firms to undue risks and compliance violations. As suggested in our last post, the first step to successful data integration is to audit existing systems: take inventory of the data assets that exist, identify the app and data “owners,” and understand how data is used and managed.
A formal program should be overseen by a data governance council composed of business unit leaders, who bring subject matter knowledge, and IT staff, who bring technology expertise. The council’s role is to oversee all aspects of master data, including identifying the data elements, creating consistent definitions for use across the enterprise, establishing rules (and rewards) for data entry, maintenance and changes to the master data, and setting data retention and auditing guidelines. Each data element must have a name, a structural format and a definition, all of which are then documented within a core metadata repository. Without a well-defined decision-making body that has senior management buy-in, and without processes that create, maintain and validate master data, the probability that an MDM project will fail rises significantly.
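To make the “name, structural format and definition” requirement concrete, here is a minimal Python sketch of a metadata repository entry. It is an illustration under stated assumptions, not a reference design: the `DataElement` and `MetadataRepository` classes, the `owner` field and the sample element are all hypothetical, the structural format is assumed to be expressible as a regular expression, and a real repository would live in a governed database or metadata tool rather than in memory.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """One governed data element: a name, a structural format and a definition."""
    name: str
    format_pattern: str  # structural format, assumed to be a regular expression
    definition: str
    owner: str  # hypothetical field: the accountable business "owner"


class MetadataRepository:
    """In-memory stand-in for a core metadata repository (hypothetical)."""

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}

    def register(self, element: DataElement) -> None:
        # Each element is defined once; changing it should go through the council
        if element.name in self._elements:
            raise ValueError(f"{element.name} is already defined")
        self._elements[element.name] = element

    def conforms(self, name: str, value: str) -> bool:
        # Check a value against the element's documented structural format
        element = self._elements[name]
        return re.fullmatch(element.format_pattern, value) is not None


repo = MetadataRepository()
repo.register(DataElement(
    name="customer_ssn",
    format_pattern=r"\d{3}-\d{2}-\d{4}",
    definition="Social Security number of the primary account holder",
    owner="Retail Banking",
))
print(repo.conforms("customer_ssn", "123-45-6789"))  # True
print(repo.conforms("customer_ssn", "123456789"))    # False
```

Rejecting a second definition under the same name is one simple way to force changes through the governance process rather than letting each application redefine an element on its own.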
MDM Faces Many Challenges and Risks
MDM is part of a comprehensive data management strategy that enables an enterprise to link all of its critical data to one file, called a master or “golden” file, which then serves as a common point of reference. It is meant to provide a consistent definition of reference data, such as customer name, address, Social Security number and product ID codes. Its most critical feature is a matching engine, which reconciles duplicate records, often the result of semantic differences introduced by disparate systems or end users, to arrive at a common “version of the truth”.
When implemented judiciously, MDM brings automation and scale to gaining a contextual understanding of the relationships among data elements from both traditional and newer data sources and formats, enabling better analytics and sharing. But “when” is the operative word, because MDM is often not implemented properly.
That is because MDM is an arduous, time-consuming and resource-intensive undertaking that is often derailed by politically protected data silos, insufficient ROI justification, cross-system complexity or a lack of management commitment to an ongoing data governance process. What often happens is that master data from disparate source systems is fed into a large repository, where the data is matched and cleansed to produce that “golden” file, which is then accessed by all the source systems. However, many projects die after the master repository is implemented, due to lax updating and maintenance (including a lack of regular input from the application user groups) or inadequate enforcement of data governance.
Complicating matters further, more end users rely on data that originates from external sources, such as a SaaS-based app or a cloud-based market data provider, or on data that is managed entirely by third parties. In addition, many users still revert to spreadsheets, personal databases and storage media that circumvent the data management policies of a formal governance program, wreaking havoc in production environments when their data does not sync with the “golden” file.
Moreover, because master data is used by multiple applications, an error in the “golden” file propagates to every application that uses it. Consequences can range from compliance violations and fines under new Legal Entity Identifier (LEI) requirements to catastrophic trading losses caused by inaccurate or inconsistent reference data. These examples illustrate the importance of linking MDM to data quality, so that the data going into the central repository is clean and has a clear versioning history.
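A matching engine can be approximated, very roughly, by normalizing records and grouping them on a match key. The sketch below assumes two hypothetical source systems (`crm` and `trading`) and uses simple deterministic matching on normalized name plus postal code; commercial MDM engines typically add fuzzy or probabilistic matching and human review, which this example does not attempt.

```python
import re
from collections import defaultdict

# Hypothetical customer records from two source systems with different conventions
records = [
    {"source": "crm", "id": "C-001", "name": "Smith, John", "zip": "10001"},
    {"source": "trading", "id": "T-9", "name": "JOHN SMITH", "zip": "10001"},
    {"source": "crm", "id": "C-002", "name": "Jane Doe", "zip": "94105"},
]


def normalize_name(raw: str) -> str:
    """Remove formatting differences: case, punctuation, 'Last, First' order."""
    raw = raw.strip()
    if "," in raw:
        last, first = [part.strip() for part in raw.split(",", 1)]
        raw = f"{first} {last}"
    return " ".join(re.findall(r"[a-z]+", raw.lower()))


def match_key(record: dict) -> tuple[str, str]:
    # Deterministic match key: normalized name plus postal code
    return (normalize_name(record["name"]), record["zip"].strip())


def find_duplicates(rows: list[dict]) -> list[list[str]]:
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in rows:
        groups[match_key(row)].append(f'{row["source"]}:{row["id"]}')
    # Only groups with more than one member are candidate duplicates
    return [ids for ids in groups.values() if len(ids) > 1]


print(find_duplicates(records))  # [['crm:C-001', 'trading:T-9']]
```

The two “John Smith” records collapse to the same key even though the source systems spell the name differently, which is the kind of semantic difference the matching engine has to reconcile.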
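One way to link MDM to data quality is a validation gate that runs before any update reaches the master record, combined with append-only versioning. The following Python sketch assumes hypothetical rules: the LEI must be 20 uppercase alphanumeric characters (the ISO 17442 format; check digits are not verified here) and the legal name must be non-empty. The LEI value in the example is made up, and the class and function names are illustrative only.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MasterRecord:
    key: str
    # Append-only history of (timestamp, source, attributes) tuples
    versions: list[tuple[datetime, str, dict]] = field(default_factory=list)

    def current(self) -> dict:
        return self.versions[-1][2] if self.versions else {}


# Hypothetical quality rules; in practice they would come from governed metadata
RULES = {
    "lei": lambda v: re.fullmatch(r"[A-Z0-9]{20}", v) is not None,  # format only
    "legal_name": lambda v: bool(v.strip()),
}


def validate(attributes: dict) -> list[str]:
    """Return the attributes that are missing or fail their rule."""
    return [name for name, rule in RULES.items()
            if name not in attributes or not rule(attributes[name])]


def apply_update(record: MasterRecord, source: str, attributes: dict) -> list[str]:
    # Quality gate: reject bad data before it can reach every consuming application
    errors = validate(attributes)
    if not errors:
        # Append instead of overwrite, so the versioning history stays intact
        record.versions.append((datetime.now(timezone.utc), source, dict(attributes)))
    return errors


entity = MasterRecord(key="counterparty-42")
print(apply_update(entity, "onboarding",
                   {"lei": "ABCDEFGHIJ0123456789", "legal_name": "Example Bank plc"}))  # []
print(apply_update(entity, "spreadsheet", {"lei": "12345", "legal_name": ""}))
# ['lei', 'legal_name']
print(len(entity.versions))  # 1
```

The rejected spreadsheet update never touches the master record, and because accepted updates are appended rather than overwritten, the record retains a history of where each version came from.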
A Pragmatic Approach
In an ideal world, MDM works most effectively when deployed across the entire institution. But is a single version of the truth ever attainable in firms with far-flung operating units that have their own budgets, application and data silos, and business objectives? The risk and expense of an enterprise-wide effort, coupled with political issues around data ownership, may make it easier to proceed incrementally, with a first project that uses a few key sources of metadata. A few short-term successes can secure further support and resources from stakeholders.
Rather than architecting a centralized MDM hub, consider a registry-style hub. A registry still matches and cleanses the data, but it creates a metadata index that points to the master data in the source systems instead of storing the entire master data file. Since most financial institutions have large, heterogeneous systems, a registry approach can be faster and cheaper to implement, with less political friction, because master data is not moved into a centralized hub that then overwrites the master data in the source systems.
The key is to shift from worrying about the “golden” file to providing application users with a consistent representation of shared information. The registry’s metadata is updated regularly and automatically from all data sources, giving users the availability they need to derive context for upstream analytics and sharing. The more involved line-of-business users are in all aspects of governance, the better their expectations and requirements can be met. Success with the registry approach can then be extended to a broader MDM initiative, particularly as firms seek to incorporate big data into their analytics and decision-making.
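The difference between a centralized hub and a registry can be sketched in a few lines. In this hypothetical Python example, the registry stores only cross-references from a master ID to records in each source system, and it fetches the attributes from those sources at read time, so the source systems remain the owners of their data. The `SOURCES` dictionary stands in for real application databases, and the matching step that decides which records to link is assumed to have already run.

```python
from collections import defaultdict

# Hypothetical source systems; each keeps, and remains the owner of, its own records
SOURCES = {
    "crm": {"C-001": {"name": "Smith, John", "email": "jsmith@example.com"}},
    "trading": {"T-9": {"name": "JOHN SMITH", "account_type": "margin"}},
}


class Registry:
    """Registry-style hub: stores cross-references only, never the master data."""

    def __init__(self) -> None:
        self._index: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def link(self, master_id: str, source: str, local_id: str) -> None:
        # Record that a source record belongs to an already-matched master entity
        pointer = (source, local_id)
        if pointer not in self._index[master_id]:
            self._index[master_id].append(pointer)

    def resolve(self, master_id: str) -> dict[str, dict]:
        # Read current attributes from each source system at query time
        return {
            f"{source}:{local_id}": SOURCES[source][local_id]
            for source, local_id in self._index.get(master_id, [])
        }


registry = Registry()
registry.link("M-1", "crm", "C-001")
registry.link("M-1", "trading", "T-9")
print(sorted(registry.resolve("M-1")))  # ['crm:C-001', 'trading:T-9']
```

Because `resolve` reads from the sources on every call, a change made in the CRM system shows up immediately, and nothing is written back to the sources. A production registry would also need scheduled refreshes of its index, caching and access controls.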