Thursday, March 8, 2012

Data Mining KB

  1. Setting Up Oracle Data Miner 11g Release 2
  2. Using Oracle Data Miner 11g Release 2
  3. ODM 11gR2–Attribute Importance

    Data Mining

    Data mining uses large quantities of data to create models. These models can provide insights that are revealing, significant, and valuable. For example, data mining can be used to:
    • Predict those customers likely to change service providers.
    • Discover the factors involved with a disease.
    • Identify fraudulent behavior.
    Data mining is not restricted to solving business problems. For example, data mining can be used in the life sciences to discover gene and protein targets and to identify leads for new drugs.
    Oracle Data Mining performs data mining in the Oracle Database. Because no data has to move between the database and an external mining server, this approach eliminates redundancy, makes data storage and processing more efficient, ensures that up-to-date data is used, and maintains data security.
    For detailed information about Oracle Data Mining, see Oracle Data Mining Concepts.

    Oracle Data Mining Functionality

    Oracle Data Mining supports the major data mining functions. There is at least one algorithm for each data mining function.
    Oracle Data Mining supports the following data mining functions:
    • Classification: Grouping items into discrete classes and predicting which class an item belongs to; classification algorithms are Decision Tree, Naive Bayes, Generalized Linear Models (Binary Logistic Regression), and Support Vector Machines.
    • Regression: Approximating and predicting continuous numerical values; the algorithms for regression are Support Vector Machines and Generalized Linear Models (Multivariate Linear Regression).
    • Anomaly Detection: Detecting anomalous cases, such as fraud and intrusions; the algorithm for anomaly detection is one-class Support Vector Machines.
    • Attribute Importance: Identifying the attributes that have the strongest relationships with the target attribute (for example, whether a customer is likely to churn); the algorithm for attribute importance is Minimum Description Length.
    • Clustering: Finding natural groupings in the data that are often used for identifying customer segments; the algorithms for clustering are k-Means and O-Cluster.
    • Associations: Analyzing "market baskets" to find items that are likely to be purchased together; the algorithm for associations is Apriori.
    • Feature Extraction: Creating new attributes (features) as a combination of the original attributes; the algorithm for feature extraction is Non-Negative Matrix Factorization.
    In addition to mining structured data, ODM permits mining of text data (such as police reports, customer comments, or physician's notes) or spatial data.
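    To make the associations function more concrete, here is a minimal Python sketch of market-basket analysis. It is not Oracle's implementation: it only counts frequent item pairs using the Apriori idea that a pair can be frequent only if both of its items are frequent. The function name frequent_pairs, the min_support parameter, and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs found in at least
    min_support (a fraction) of the baskets."""
    n = len(baskets)

    # Pass 1: count single items and keep only the frequent ones.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    # Pass 2: count pairs built only from frequent items (Apriori pruning).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


# Hypothetical sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
print(frequent_pairs(baskets, min_support=0.5))
```

    A full Apriori run would repeat the same pruning step for itemsets of size three and larger and then derive rules from them; the sketch stops at pairs to stay short.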

    Oracle Data Mining Interfaces

    Oracle Data Mining APIs provide extensive support for building applications that automate the extraction and dissemination of data mining insights.
    Data mining activities such as model building, testing, and scoring are accomplished through a PL/SQL API, a Java API, and SQL Data Mining functions. The Java API is compliant with the data mining standard JSR 73. The Java API and the PL/SQL API are fully interoperable.
    Oracle Data Mining allows the creation of a supermodel, that is, a model that contains the instructions for its own data preparation. The embedded data preparation can be implemented automatically, manually, or both. Embedded Data Preparation supports user-specified data transformations; Automatic Data Preparation supports algorithm-required data preparation, such as binning, normalization, and outlier treatment.
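    The three preparation steps named above can be illustrated outside the database. The following Python sketch shows one common variant of each: min-max normalization, equal-width binning, and clipping outliers to fixed bounds. These variants and all function names are assumptions chosen for illustration; they are not a description of what Automatic Data Preparation does internally.

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using bins of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so cap it at the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def clip_outliers(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical sample column, for illustration only.
ages = [18, 25, 40, 62, 95]
print(min_max_normalize(ages))
print(equal_width_bins(ages, n_bins=3))
print(clip_outliers(ages, lower=20, upper=80))
```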
    SQL Data Mining functions support the scoring of classification, regression, clustering, and feature extraction models. Within the context of standard SQL statements, pre-created models can be applied to new data and the results returned for further processing, just like any other SQL query.
    Predictive Analytics automates the process of data mining. Without user intervention, Predictive Analytics routines manage data preparation, algorithm selection, model building, and model scoring so that the user can benefit from data mining without having to be a data mining expert.
    ODM programmatic interfaces include:
    • Data mining functions in Oracle SQL for high performance scoring of data
    • DBMS_DATA_MINING PL/SQL package for model creation, description, analysis, and deployment
    • DBMS_DATA_MINING_TRANSFORM PL/SQL package for transformations required for data mining
    • Java interface based on the Java Data Mining standard for model creation, description, analysis, and deployment
    • DBMS_PREDICTIVE_ANALYTICS PL/SQL package, which supports the following procedures:
      • EXPLAIN - Ranks attributes in order of influence in explaining a target column
      • PREDICT - Predicts the value of a target column
      • PROFILE - Creates segments and rules that identify the records that have the same target value
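    To give a feel for what "ranking attributes in order of influence" means, here is a small Python sketch that ranks categorical attributes by information gain with respect to a target column. Information gain is a stand-in chosen for brevity; it is not the Minimum Description Length algorithm that Oracle Data Mining uses, and the function names and sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy after splitting rows by one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def rank_attributes(rows, target):
    """Return (attribute, gain) pairs, most informative attribute first."""
    attributes = [a for a in rows[0] if a != target]
    scores = {a: information_gain(rows, a, target) for a in attributes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data: contract type separates churners, region does not.
rows = [
    {"contract": "monthly", "region": "north", "churn": "yes"},
    {"contract": "monthly", "region": "south", "churn": "yes"},
    {"contract": "yearly", "region": "north", "churn": "no"},
    {"contract": "yearly", "region": "south", "churn": "no"},
]
for name, gain in rank_attributes(rows, target="churn"):
    print(name, round(gain, 3))
```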

      Reference: Oracle.com

OLAP tools


OLAP tools are widely used to analyze information from different perspectives. They provide functions such as drill-down and slice-and-dice, and their vendors promise very high performance. The table below lists the OLAP and reporting tools included in our 100% vendor-independent BI & OLAP tool survey 2012, a comparison made on 103 criteria.

OLAP, ROLAP & MOLAP

OLAP is an abbreviation for On-line Analytical Processing. An OLAP tool contains a multidimensional or relational data store designed to provide quick access to pre-summarized data and multidimensional analysis. There are three flavours:

MOLAP: Multidimensional OLAP – enabling OLAP by providing cubes.

ROLAP: Relational OLAP – enabling OLAP using a relational database management system.

DOLAP: Desktop OLAP – enabling OLAP functionality on the local computer (simulation); DOLAP products do not contain an OLAP server.
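
The operations mentioned above (pre-summarized data, drill-down, slice-and-dice) can be sketched on a toy in-memory fact table. This Python example is only an illustration of the concepts and does not reflect how any of the surveyed products work; the fact rows, the dimension names, and the functions roll_up and slice_rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales amount).
facts = [
    (2011, "EU", "laptop", 100),
    (2011, "EU", "phone", 50),
    (2011, "US", "laptop", 70),
    (2012, "EU", "laptop", 120),
    (2012, "US", "phone", 80),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, keep):
    """Sum sales over every dimension that is not named in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_rows(rows, dimension, value):
    """Keep only the rows where one dimension has a fixed value."""
    i = DIMENSIONS.index(dimension)
    return [row for row in rows if row[i] == value]


# Summarized view: total sales per year.
by_year = roll_up(facts, keep=("year",))
# Drill-down: add the region dimension to see more detail.
by_year_region = roll_up(facts, keep=("year", "region"))
# Slice: fix region to "EU", then summarize by product.
eu_by_product = roll_up(slice_rows(facts, "region", "EU"), keep=("product",))

print(by_year)
print(by_year_region)
print(eu_by_product)
```

Computing such summaries ahead of time and storing them corresponds roughly to the cube idea behind MOLAP, while running the same aggregations as queries against relational tables on demand corresponds to the ROLAP approach.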

List of reporting & OLAP tools

The following products, which include OLAP tools, were thoroughly examined on 103 criteria considered important for high productivity and for reporting systems that actually add value to your organization. The OLAP tools are listed in random order.

Reporting & OLAP tools                  Version     Vendor                OLAP included?

Oracle Enterprise BI Server             11g1        Oracle
Microsoft BI & OLAP tools               2008/2010   Microsoft
IBM Cognos Series 10                    10.1        IBM
QlikView                                11          QlikTech
Board Management Intelligence Toolkit   7.1         Board International
Oracle Hyperion System                  9           Oracle
SAP NetWeaver BI                        7.3         SAP
Microstrategy                           9           Microstrategy
SAP Business Objects Enterprise         XI r4       SAP
SAS Enterprise BI Server                9.2         SAS Institute
BizzScore Suite                         7.3         EFM Software
WebFOCUS                                8           Information Builders
JasperSoft (open source)                4.5         JasperSoft
Style Intelligence                      11          InetSoft
Pentaho BI suite (open source)          4           Pentaho
Tableau Software                        6.1         Tableau Software

Links

http://apandre.wordpress.com/tools/comparison/

http://qlikviewvsolap.blogspot.com/2009/06/qlikview-vs-sas.html

http://mydatamine.com/?p=751

http://www.the-data-mine.com/Software/MostPopularDataMiningSoftware

http://www.salford-systems.com/doc/elder.pdf

http://www.cs.uvm.edu/~icdm/algorithms/10Algorithms-08.pdf
