Data mining challenges


Data mining is not a “one-off” exercise to be done and finished with. Rather, it is an ongoing process: one examines a data set, identifies features of possible interest, discusses them with an expert, goes back to the data in the light of these discussions, and so on.

How to efficiently mine quality information from multiple data sources is a challenging research problem, especially in the current big data era, because in real-world applications data stored in multiple places often conflict. These conflicts include:

  • data name conflicts: (a) the same object has different names in different data sources, or (b) two different objects from different data sources have the same name;
  • data format conflicts: the same object has different data types, domains, scales, or precisions in different data sources;
  • data value conflicts: the same object is recorded with different values in different data sources;
  • data source conflicts: different data sources have different database structures.
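As a rough illustration of the first three kinds of conflict, the sketch below normalizes two records that describe the same object and then reports the fields on which they still disagree. The records, the field names, and the mapping tables are all hypothetical and exist only for this example; a real system would need mappings that match its own sources.

```python
# Hypothetical records describing the same product in two data sources.
source_a = {"prod_name": "Widget", "weight_kg": 1.5, "price": 10.0}
source_b = {"title": "Widget", "weight_g": 1500, "price": 12.0}

# Assumed mappings: source field -> (shared field name, converter to the shared format).
# Renaming handles name conflicts; the converters handle format conflicts.
MAPPING_A = {
    "prod_name": ("name", str),
    "weight_kg": ("weight_kg", float),
    "price": ("price", float),
}
MAPPING_B = {
    "title": ("name", str),
    "weight_g": ("weight_kg", lambda grams: grams / 1000.0),
    "price": ("price", float),
}


def normalize(record, mapping):
    """Rename fields and convert values into the shared schema."""
    return {
        shared: convert(record[field])
        for field, (shared, convert) in mapping.items()
    }


def value_conflicts(rec1, rec2):
    """Return the shared fields on which two normalized records disagree."""
    return {
        key: (rec1[key], rec2[key])
        for key in rec1.keys() & rec2.keys()
        if rec1[key] != rec2[key]
    }


norm_a = normalize(source_a, MAPPING_A)
norm_b = normalize(source_b, MAPPING_B)
conflicts = value_conflicts(norm_a, norm_b)
print(conflicts)  # {'price': (10.0, 12.0)}
```

Once names and units are aligned, the weights agree and only the price remains as a genuine value conflict. Exact equality is used here for simplicity; measured values would usually be compared with a tolerance.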

To overcome these conflicts, four approaches have been adopted to mine useful information and discover new knowledge from multiple data sources:

  • pattern analysis, which mines useful patterns and information from one or several data sources in accordance with changing conditions, constraints, or relationships;
  • multiple data source classification, which labels data sources according to a certain standard, then classifies them by their labels;
  • multiple data source clustering, which clusters data sources according to their similarities/distances;
  • multiple data source fusion, which combines data from multiple data sources to achieve higher accuracy and more specific inferences.
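To make the last approach more concrete, here is a minimal sketch of data fusion by majority vote: for each field, the value reported by the most sources wins. Majority voting is only one simple fusion strategy, chosen here for illustration, and the function name `fuse_by_majority` and the sample reports are hypothetical.

```python
from collections import Counter


def fuse_by_majority(records):
    """Fuse records about one object by keeping the most common value per field."""
    fused = {}
    fields = sorted({field for record in records for field in record})
    for field in fields:
        # Only sources that actually report the field get a vote.
        values = [record[field] for record in records if field in record]
        # most_common(1) yields the value with the highest count;
        # on a tie, the value seen first is kept.
        fused[field] = Counter(values).most_common(1)[0][0]
    return fused


# Hypothetical reports about the same city from three data sources.
reports = [
    {"city": "Paris", "population_m": 2.1},
    {"city": "Paris", "population_m": 2.2},
    {"city": "paris", "population_m": 2.1, "country": "France"},
]

fused = fuse_by_majority(reports)
print(fused)  # {'city': 'Paris', 'country': 'France', 'population_m': 2.1}
```

A field reported by only one source, such as `country` above, is carried over unchanged. In practice, sources are often weighted by their reliability instead of being counted equally.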

Based on these approaches, we can mine useful information from multiple data sources and discover new knowledge according to individual needs.
