Preprocessing in data mining: data transformation


Data transformation

In data transformation, the data are transformed into forms appropriate for mining.

Data transformation tasks:

  • Normalization
  • Attribute construction
  • Aggregation
  • Attribute subset selection
  • Discretization
  • Generalization


  1. Normalization: the attribute data are scaled so as to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0.
  2. Attribute construction (or feature construction): new attributes are constructed and added from the given set of attributes to help the mining process.
  3. Aggregation: summary or aggregation operations are applied to the data. For example, the daily sales data may be aggregated so as to compute monthly and annual total amounts.
  4. Discretization: the range of a continuous attribute is divided into intervals. For example, values for numerical attributes, like age, may be mapped to higher-level concepts, like youth, middle-aged, and senior.
  5. Generalization: low-level or “primitive” (raw) data are replaced by higher-level concepts through the use of concept hierarchies. For example, categorical attributes, like street, can be generalized to higher-level concepts, like city or country.
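
Normalization and attribute construction can be illustrated with a minimal Python sketch. It assumes min-max scaling as the normalization method (the text does not name one), and the function names, the income values, and the width/height records are hypothetical examples, not part of the original material.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def add_area(records):
    """Attribute construction: derive a new 'area' attribute
    from the existing 'width' and 'height' attributes."""
    return [dict(r, area=r["width"] * r["height"]) for r in records]


# Hypothetical sample data.
incomes = [12000, 35000, 58000, 98000]
scaled_0_1 = min_max_normalize(incomes)              # range 0.0 to 1.0
scaled_sym = min_max_normalize(incomes, -1.0, 1.0)   # range -1.0 to 1.0

plots = [{"width": 2.0, "height": 3.0}, {"width": 4.0, "height": 5.0}]
plots_with_area = add_area(plots)
```

The smallest input always maps to the lower bound of the target range and the largest to the upper bound; other scaling methods could be substituted without changing the idea.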

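Aggregation, discretization, and generalization can be sketched in the same way. The sales figures, the age cut points (30 and 60), and the street/city/country hierarchy below are assumptions chosen only for illustration; the text does not specify any of them.

```python
from collections import defaultdict

# Aggregation: roll daily sales up to monthly and annual totals.
# Hypothetical (date, amount) pairs with dates in YYYY-MM-DD form.
daily_sales = [
    ("2023-01-05", 120.0),
    ("2023-01-20", 80.0),
    ("2023-02-03", 200.0),
    ("2024-01-10", 50.0),
]
monthly_totals = defaultdict(float)
annual_totals = defaultdict(float)
for date, amount in daily_sales:
    year, month, _day = date.split("-")
    monthly_totals[(year, month)] += amount
    annual_totals[year] += amount


# Discretization: map a numeric age onto labelled intervals.
def discretize_age(age, cuts=(30, 60)):
    """Return 'youth', 'middle-aged', or 'senior' (cut points are assumed)."""
    if age < cuts[0]:
        return "youth"
    if age < cuts[1]:
        return "middle-aged"
    return "senior"


# Generalization: climb a concept hierarchy street -> city -> country.
# The hierarchy itself is a hypothetical lookup table.
STREET_TO_CITY = {"Baker Street": "London", "Rue de Rivoli": "Paris"}
CITY_TO_COUNTRY = {"London": "United Kingdom", "Paris": "France"}


def generalize_street(street, level="city"):
    """Replace a street with its city or country from the hierarchy."""
    city = STREET_TO_CITY[street]
    return city if level == "city" else CITY_TO_COUNTRY[city]


age_labels = [discretize_age(a) for a in (17, 45, 72)]
city = generalize_street("Baker Street")
country = generalize_street("Rue de Rivoli", level="country")
```

In practice the interval boundaries and the concept hierarchy would come from domain knowledge or from the data itself rather than from hard-coded tables.
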