DATA MINING PROJECT
Step 1: Define Business Objectives- This step is similar to any
information system project. First of
all, determine whether a data mining solution is really needed. State the
objectives. Are we looking to improve our direct marketing campaigns? Do we
want to detect fraud in credit card usage? Are we looking for associations between
products that sell together? In this step, define expectations. Express how the
final results will be presented and used.
Step 2: Prepare Data- This step consists of data
selection, preprocessing of data, and data transformation.
Select the data to be extracted from the data warehouse. Use the business
objectives to determine what data has to be selected. Include appropriate
metadata about the selected data. Select the appropriate data mining technique(s)
and algorithm(s). The mining algorithm has a bearing on data selection.
Unless the data is extracted from the data warehouse, in which case it is
assumed to be already cleansed, preprocessing may be required to cleanse the
data. Preprocessing could also involve enriching the selected data with
external data. In the preprocessing sub-step, remove noisy data, that is, data
blatantly out of range. Also ensure that there are no missing values.
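The preprocessing sub-step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess`, the sample `customers` records, the valid range of 0 to 120 for age, and the choice to fill missing values with the mean of the valid values are all hypothetical, not something a particular tool prescribes.

```python
from statistics import mean


def preprocess(records, field, low, high):
    """Drop records whose value is out of range; fill missing values with the mean."""
    valid = [r[field] for r in records
             if r[field] is not None and low <= r[field] <= high]
    if not valid:
        raise ValueError("no valid values to compute a fill value from")
    fill = mean(valid)  # assumption: mean imputation is acceptable for this field

    cleaned = []
    for r in records:
        value = r[field]
        if value is None:
            cleaned.append({**r, field: fill})  # missing value: impute
        elif low <= value <= high:
            cleaned.append(dict(r))             # valid value: keep as-is
        # otherwise the value is blatantly out of range and the record is dropped
    return cleaned


# Hypothetical sample data: one missing age and one noisy age.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 430},
    {"id": 4, "age": 46},
]
cleaned = preprocess(customers, "age", 0, 120)
```

Here record 3 is discarded as noise and record 2 receives the mean of the remaining valid ages. Whether to drop, cap, or impute is a business decision that should follow from the objectives defined in Step 1.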
Step 3: Perform Data Mining- This is the crucial step. The knowledge discovery
engine applies the selected algorithm to the prepared data. The output from this
step is a set of relationships or patterns. However, this step and the next
step of evaluation may be performed in an iterative manner. After an initial
evaluation, there may be a need to adjust the data and redo this step. The
duration and intensity of this step depend on the type of data mining
application. If the database is being segmented, not many iterations are
needed. If a predictive model is being created, the models are repeatedly set
up and tested with sample data before testing with the real database.
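To make "a set of relationships or patterns" concrete, here is a minimal sketch that mines pairwise associations between products that sell together. It is a simple pair-counting illustration, not the algorithm of any particular knowledge discovery engine; the function name `mine_pairs` and the sample `baskets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pairs(baskets):
    """Return 'if A then B' rules with support and confidence for every item pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    total = len(baskets)
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total              # share of baskets containing both items
        # Confidence: of the baskets containing the "if" item, how many also
        # contain the "then" item.
        rules.append({"if": a, "then": b, "support": support,
                      "confidence": count / item_counts[a]})
        rules.append({"if": b, "then": a, "support": support,
                      "confidence": count / item_counts[b]})
    return rules


# Hypothetical transactions.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["milk"],
]
rules = mine_pairs(baskets)
```

In this sample, every basket with butter also contains bread, so the rule "if butter then bread" has a confidence of 1.0 with a support of 0.5. Real engines use more scalable algorithms, but the output has the same shape: candidate patterns with measures attached.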
Step 4: Evaluate Results- The aim is to discover
interesting patterns or relationships that help in the understanding of
customers, products, profits, and markets. In the selected data, there are
potentially many patterns or relationships. In this step, all the resulting
patterns are examined, and a filtering mechanism is applied so as to select
only the promising patterns for presentation and use.
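One simple form of filtering mechanism is a pair of thresholds on the measures attached to each pattern. The sketch below assumes each pattern carries a support and a confidence value; the function name `select_promising`, the threshold defaults, and the sample patterns are hypothetical and would need tuning for a real application.

```python
def select_promising(patterns, min_support=0.3, min_confidence=0.7):
    """Keep patterns that meet both thresholds, strongest confidence first."""
    kept = [p for p in patterns
            if p["support"] >= min_support and p["confidence"] >= min_confidence]
    return sorted(kept, key=lambda p: p["confidence"], reverse=True)


# Hypothetical patterns produced by a mining run.
patterns = [
    {"rule": "bread -> butter", "support": 0.50, "confidence": 0.67},
    {"rule": "jam -> bread",    "support": 0.40, "confidence": 0.80},
    {"rule": "butter -> milk",  "support": 0.25, "confidence": 0.50},
    {"rule": "butter -> bread", "support": 0.50, "confidence": 1.00},
]
promising = select_promising(patterns)
```

Only "butter -> bread" and "jam -> bread" pass both thresholds here. Thresholds alone do not guarantee that a pattern is interesting to the business, so a human review of the surviving patterns is still advisable.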
Step 5: Present Discoveries- Presentation of the patterns and associations
discovered may be in the form of visual navigation, charts, graphs, or
free-form text. Presentation also includes storing interesting discoveries in
the knowledge base for repeated use.
Step 6: Ensure Usage of Discoveries- The goal of any data mining operation is
to understand the
business, discern new patterns and possibilities, and also turn this understanding
into actions. This step is for using the results to create actionable items in
the business. The results of the discovery are disseminated so that action can
be taken to improve the business.
Selecting Data Mining Software Tools
Before we get into a detailed list of criteria for selecting data mining tools,
let us make a few general but important observations about tool selection.
• The tool must integrate well with the data warehouse environment: it must
accept data from the warehouse and be compatible with the overall metadata
framework.
• The patterns and relationships discovered must be as accurate as
possible. Discovering erratic patterns is more dangerous than not discovering
any patterns at all.
• In most cases, an explanation for the working of the model and
how the results were produced is required. The tool must be able to explain the
rules and how patterns were discovered.
Let us now analyse a list
of criteria for evaluating data mining tools. The list is by no means
exhaustive, but it covers the essential points.
Data Access: The data mining tool must be able to access data sources
including the data warehouse and quickly bring over the required datasets to
its environment. On many occasions data from other sources may be needed to
augment the data extracted from the data warehouse. The tool must be capable of
reading other data sources and input formats.
Data Selection: While selecting and extracting data for mining, the tool must be
able to perform its operations according to a variety of criteria. Selection abilities
must include filtering out of unwanted data and deriving new data items from
existing ones.
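The two selection abilities named above, filtering out unwanted data and deriving new data items, can be sketched as follows. The function name `select_and_derive`, the `orders` rows, and the derived `total` field are hypothetical and only illustrate the idea.

```python
def select_and_derive(rows, keep, derive):
    """Keep rows for which keep(row) is true and add the fields returned by derive(row)."""
    return [{**row, **derive(row)} for row in rows if keep(row)]


# Hypothetical rows extracted from the warehouse.
orders = [
    {"customer": "A", "quantity": 2, "unit_price": 5.0, "status": "complete"},
    {"customer": "B", "quantity": 1, "unit_price": 9.0, "status": "cancelled"},
    {"customer": "C", "quantity": 3, "unit_price": 4.0, "status": "complete"},
]

selected = select_and_derive(
    orders,
    keep=lambda r: r["status"] == "complete",                    # filter unwanted data
    derive=lambda r: {"total": r["quantity"] * r["unit_price"]},  # derive a new item
)
```

Passing the criteria in as functions is one way to support "a variety of criteria" without changing the selection routine itself.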
Sensitivity to Data Quality: Because of its importance, data quality is worth
mentioning again. The tool must be able to
recognize missing or incomplete data and compensate for the problem. The tool
must also be able to produce error reports.
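As a rough illustration of what a basic error report might contain, the sketch below counts missing values per field. The function name `missing_value_report`, the treatment of `None` and empty strings as missing, and the sample rows are assumptions made for this example.

```python
def missing_value_report(rows, fields):
    """Count, for each field, the rows where the value is absent, None, or empty."""
    report = {}
    for field in fields:
        report[field] = sum(1 for row in rows if row.get(field) in (None, ""))
    return report


# Hypothetical rows with gaps.
rows = [
    {"id": 1, "age": 34, "city": "Pune"},
    {"id": 2, "age": None, "city": ""},
    {"id": 3, "city": "Delhi"},
]
report = missing_value_report(rows, ["id", "age", "city"])
```

For these rows the report shows two missing ages and one missing city. A tool that is sensitive to data quality would surface this kind of summary before mining starts rather than silently skipping the affected rows.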
Data Visualization: Data mining techniques process substantial data volumes and produce
a wide range of results. Inability to display results graphically and diagrammatically
diminishes the value of the tool severely.
Extensibility: The tool architecture must be able to integrate with the data warehouse
administration and other functions such as data extraction and metadata management.
Performance: The tool must provide consistent performance irrespective of the
amount of data to be mined, the specific algorithm applied, the number of variables
specified, and the level of accuracy demanded.
Scalability: Data mining needs to work with large volumes of data to discover
meaningful and useful patterns and relationships. Therefore, ensure that the
tool scales up to handle huge data volumes.
Openness: This is a desirable feature. Openness refers to being able to
integrate with the environment and other types of tools. It is desirable for
the tool to connect to external applications so that users can access data
mining algorithms from within those applications. The tool must be able to
share the output with desktop tools such as graphical displays, spreadsheets,
and database utilities. The feature of openness must also include availability
of the tool on leading server platforms.
Suite of Algorithms: A tool that provides several different data mining algorithms is
more advantageous than one that supports only a few.
Multi-technique Support: A data mining tool supporting more than one technique is worth
consideration. The organization may not presently need a composite tool with many
techniques, but a multi-technique tool opens up more possibilities. Moreover, many
data mining analysts desire to cross-validate discovered patterns using several
techniques.
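One simple way to cross-validate patterns across techniques is to keep only those that more than one technique reports. The sketch below assumes each technique's output has already been reduced to a set of comparable pattern labels; the function name `corroborated`, the technique names, and the labels are hypothetical.

```python
from collections import Counter


def corroborated(patterns_by_technique, min_techniques=2):
    """Return patterns reported by at least min_techniques different techniques."""
    votes = Counter()
    for patterns in patterns_by_technique.values():
        votes.update(set(patterns))  # each technique votes once per pattern
    return sorted(p for p, n in votes.items() if n >= min_techniques)


# Hypothetical results from three techniques.
found = {
    "association_rules": {"bread+butter", "bread+milk"},
    "clustering": {"bread+butter"},
    "decision_tree": {"bread+butter", "bread+milk", "milk+jam"},
}
agreed_by_two = corroborated(found)
agreed_by_all = corroborated(found, min_techniques=3)
```

In practice different techniques produce different kinds of output, so mapping them to comparable labels is the hard part; this sketch only shows the final agreement check.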