Useful Metrics for Assessing Data Quality

February 7, 2013. When building a data quality scorecard, companies need to define the relevant metrics so that the task at hand is carried out as effectively as possible. At the same time, it is extremely important to understand the differences and the relationship between tools for assessing information quality and tools for managing data. The material we offer here will help you make the right choice among the many tools for assessing data quality. (The material is published in English.)
Processes for computing raw data quality scores for base-level metrics can then feed different levels of metrics using different views to address the scorecard needs of various groups across the organization. Ultimately, this drives the description, definition and management of base-level and complex data quality metrics so that:
  • Scorecard relevancy is based on a hierarchical rollup of metrics.
  • The definition of each metric is separated from its context, thereby allowing the same measurement to be used in different contexts with different validity thresholds and weights.
  • Appropriate reporting can be generated based on the level of detail expected for the data consumer’s specific role and accountability.
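As a rough illustration of the second point, the sketch below computes raw scores once and rolls them up in two different contexts, each with its own weights and validity thresholds. The metric names, score values, view names, and the weighted-average rollup are all assumptions made for the example; the source does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricUse:
    """One use of a base-level metric inside a particular scorecard view."""
    metric: str       # name of the base-level metric (hypothetical)
    weight: float     # relative importance of the metric in this view
    threshold: float  # minimum acceptable score in this view


def rollup(raw_scores, uses):
    """Return (weighted average score, metrics below their threshold)."""
    total_weight = sum(u.weight for u in uses)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(raw_scores[u.metric] * u.weight for u in uses) / total_weight
    failing = [u.metric for u in uses if raw_scores[u.metric] < u.threshold]
    return score, failing


# Raw scores are computed once, independent of any view (illustrative values).
raw_scores = {"customer_email_completeness": 0.90, "address_validity": 0.80}

# Two contexts reuse the same measurements with different weights and thresholds.
marketing_view = [
    MetricUse("customer_email_completeness", weight=3.0, threshold=0.95),
    MetricUse("address_validity", weight=1.0, threshold=0.70),
]
billing_view = [
    MetricUse("customer_email_completeness", weight=1.0, threshold=0.80),
    MetricUse("address_validity", weight=3.0, threshold=0.85),
]

marketing_score, marketing_failing = rollup(raw_scores, marketing_view)
billing_score, billing_failing = rollup(raw_scores, billing_view)
```

Because the measurement is stored separately from its context, the same email completeness score can be unacceptable for one group and acceptable for another without being measured twice.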

Defining Relevant Metrics

Good data quality metrics exhibit certain characteristics. Defining metrics that share these characteristics will lead to a data quality scorecard that is meaningful:
  • Business Relevance. The metric must be defined within a business context that explains how its score relates to improved business performance.
  • Measurable. The metric must have a process that quantifies it within a discrete range.
  • Controllable. The metric must reflect a controllable aspect of the business process so that when the measurement is not in a desirable range, some action to improve the data is triggered.
  • Reportable. The metric’s definition should provide the right level of information to the data steward when the measured value is not acceptable.
  • Traceable. Documenting a time series of reported measurements must provide insight into the result of improvement efforts over time as well as support statistical process control.
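The Measurable and Traceable characteristics can be sketched with a small time series check. The example below is only an illustration under stated assumptions: scores are taken to lie between 0 and 1, the sample values are invented, and the control limits are simply the baseline mean plus or minus three sample standard deviations, which is a simplification of the control charts used in statistical process control.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Lower and upper limits derived from a baseline window of scores."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - sigmas * spread, center + sigmas * spread


def out_of_control(series, limits):
    """Return (index, value) pairs for measurements outside the limits."""
    lower, upper = limits
    return [(i, v) for i, v in enumerate(series) if v < lower or v > upper]


# Hypothetical weekly scores for one metric, each in the range 0..1.
baseline = [0.96, 0.97, 0.95, 0.96, 0.97, 0.96, 0.95, 0.96]
recent = [0.96, 0.95, 0.88, 0.97]

limits = control_limits(baseline)
alerts = out_of_control(recent, limits)  # measurements that should trigger action
```

Keeping the history rather than only the latest value is what lets a steward distinguish normal variation from a real change in the process.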
In addition, because a reported metric only summarizes its value, the scorecarding environment should also provide the ability to reveal the underlying data that contributed to a particular metric score. Reviewing data quality measurements and evaluating the data instances that contributed to any unsatisfactory scores requires the ability to drill down into the performance metric. Drilling down provides a better understanding of any existing or emergent patterns contributing to a poor score, supports an assessment of the impact, and helps in root-cause analysis.
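A minimal sketch of such a drill-down is shown below, assuming a metric is computed by applying a validity rule to each record. The record layout, the `has_email` rule, and the `source` field are hypothetical; the idea is simply that the measurement keeps the failing records alongside the score so they can be grouped and inspected later.

```python
from collections import Counter


def measure(records, rule):
    """Return (score, failing records) for one validity rule."""
    if not records:
        raise ValueError("cannot score an empty data set")
    failures = [r for r in records if not rule(r)]
    score = 1.0 - len(failures) / len(records)
    return score, failures


def has_email(record):
    """Hypothetical rule: the record carries a non-empty email value."""
    return bool(record.get("email"))


# Illustrative records only; the fields are assumptions for this example.
records = [
    {"id": 1, "email": "a@example.com", "source": "web"},
    {"id": 2, "email": "", "source": "call_center"},
    {"id": 3, "email": None, "source": "call_center"},
    {"id": 4, "email": "d@example.com", "source": "web"},
]

score, failures = measure(records, has_email)

# Drill down: group the failing records to look for a pattern.
failures_by_source = Counter(r["source"] for r in failures)
```

Grouping the failures by an attribute such as the originating system is one simple way to surface a pattern that points toward a root cause.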

Source: sas.com