Concept of data mining
Understanding data mining first requires a basic understanding of what databases are and how we use them in everyday life. Databases give organizations the means to store, manage, and efficiently retrieve information. “A database is an integrated collection of logically related data elements” (O’Brien & Marakas, 2009, p. 174). A database can be thought of as a collection of tables, and each table consists of columns and rows, much like an Excel spreadsheet, so it does not take much of a leap to go from an understanding of Excel to database tables. Each row corresponds to a single record, and each column represents a different attribute, which in a database is called a field. For example, a database of names and phone numbers might have the columns “FirstName”, “LastName”, and “TelephoneNumber”. Each row, or record, under those columns, or fields, holds the data to be stored. A table of 75 employees would therefore contain 75 rows, or records. Data may be logically organized into characters, fields, records, files, and databases, just as writing can be organized into letters, words, sentences, paragraphs, and documents (O’Brien & Marakas, 2009, p. 170).
Data mining is used today primarily by companies with a strong customer focus, which aggregate extensive data in order to make better decisions. The data to be mined is typically held in a data warehouse. “One important characteristic about the data in a data warehouse is that, unlike a typical database in which changes can occur constantly, data in a data warehouse are static, which means that once the data are gathered up, formatted for storage, and stored in the data warehouse, they will never change” (O’Brien & Marakas, 2009, p. 192). Data warehouses are the stores from which data is pulled, or mined, in order to identify patterns that can then be evaluated and interpreted into business knowledge.
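To make the table, record, and field terminology concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table name `employees`, the helper `build_phone_book`, and the sample people and phone numbers are made up for illustration; only the three column names come from the example in the text.

```python
import sqlite3

def build_phone_book(rows):
    """Create an in-memory table and load one record per (first, last, phone) tuple."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees ("
        "FirstName TEXT, LastName TEXT, TelephoneNumber TEXT)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

# Hypothetical sample data; each tuple becomes one row (record).
sample_rows = [
    ("Ada", "Lovelace", "555-0100"),
    ("Alan", "Turing", "555-0101"),
    ("Grace", "Hopper", "555-0102"),
]

conn = build_phone_book(sample_rows)

# The fields are the columns of the table definition (index 1 is the column name).
field_names = [col[1] for col in conn.execute("PRAGMA table_info(employees)")]

# The number of records equals the number of rows inserted.
record_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# Retrieve one field from one record.
turing_phone = conn.execute(
    "SELECT TelephoneNumber FROM employees WHERE LastName = ?", ("Turing",)
).fetchone()[0]

print(field_names, record_count, turing_phone)
```

A table of 75 employees would be built the same way, with 75 tuples in place of the three shown here.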
Advanced software applies algorithms, mathematical procedures, and statistical techniques to sift through all the data and extract strategic business information that may previously have been unknown. Data mining is used in many different ways. One is market-basket analysis, which determines which products to bundle together. Others include identifying manufacturing problems and helping to pinpoint quality issues in the manufacturing process. Popular uses are acquiring new customers, preventing customer attrition, and cross-selling to existing customers. Profiling customers with greater accuracy is another common application.
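Market-basket analysis can be illustrated with a small sketch that counts how often pairs of products appear in the same purchase. This is a simplified pairwise version, not any particular vendor's algorithm; the function name `pair_rules`, the `min_support` parameter, and the sample baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=2):
    """Return {(a, b): (support, confidence)} for rules 'a -> b'.

    support    = number of baskets containing both a and b
    confidence = support / number of baskets containing a
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue                         # too rare to suggest a bundle
        rules[(a, b)] = (support, support / item_counts[a])
        rules[(b, a)] = (support, support / item_counts[b])
    return rules

# Hypothetical purchase data.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]

rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support}, confidence={confidence:.2f}")
```

Pairs with high support and confidence, such as bread and butter in this toy data, are the candidates for a product bundle.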
Oracle Data Mining 10gR2
The Oracle in-Database Miner is a GUI for a software engine with a wide range of capabilities, including anomaly detection, attribute importance, association rules, clustering, classification and regression, nonnegative matrix factorization, mining of structured and unstructured data (text mining), and BLAST (Basic Local Alignment Search Tool), a similarity search algorithm used in the life sciences. Running BLAST inside the database makes it possible to build BLAST searches into complex analytical pipelines and to restrict a search to a portion of the database selected with SQL. Oracle Data Mining 10gR2 also includes a spreadsheet add-in for predictive analytics, as well as PL/SQL and Java APIs for developing advanced analytical applications.
Oracle Data Mining provides summary statistics on the data before mining begins. The GUI shows histograms and data summaries, along with viewers for model performance and evaluation. These graphical displays simplify the user's work and help automate the data mining process.
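The kind of summary that is useful before mining can be sketched in a few lines of plain Python. This is not the Oracle interface or its API; the functions `summarize` and `histogram`, the bin width, and the income values are all hypothetical.

```python
import statistics
from collections import Counter

def summarize(values):
    """Basic summary statistics for one numeric attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

def histogram(values, bin_width):
    """Count values per bin; each bin is labelled by its lower edge."""
    return Counter((v // bin_width) * bin_width for v in values)

# Hypothetical annual incomes.
incomes = [32000, 45000, 51000, 58000, 62000, 75000, 90000]

summary = summarize(incomes)
hist = histogram(incomes, 20000)

print(summary)
for edge in sorted(hist):
    print(f"{edge:>6} {'#' * hist[edge]}")   # simple text histogram
```

Looking at such a summary first helps reveal skewed distributions or suspicious values before any model is built.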
Decision trees provide one example of data mining. Suppose the problem is to find customers who are likely to buy a new car and to profile them. In Oracle Data Mining 10gR2, the decision tree method applied to this automotive problem covers classification, prediction, and customer profiling. The following rule illustrates the process.
IF (Income > 50K AND Gender = F AND Status > Single …) THEN P(Buy Car = 1)
In this example, the rule has a confidence of 0.77, the estimated probability that a customer matching the conditions buys a car, and a support of 250.
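How confidence and support can be measured for such a rule is sketched below on toy data. The sketch assumes that support counts the records matching the IF part and that confidence is the share of those records in which the customer bought a car; the helper `rule_stats`, the field names, and the records are hypothetical. The status condition is left out because the rule above elides its remaining conditions, and the toy data is not meant to reproduce the 0.77 and 250 figures.

```python
def rule_stats(records, condition, target="bought_car"):
    """Return (support, confidence) for 'IF condition THEN target = 1'.

    support    = number of records that satisfy the condition (assumed definition)
    confidence = fraction of those records where the target equals 1
    """
    matched = [r for r in records if condition(r)]
    support = len(matched)
    if support == 0:
        return 0, 0.0
    hits = sum(1 for r in matched if r[target] == 1)
    return support, hits / support

# Hypothetical customer records.
customers = [
    {"income": 62000, "gender": "F", "bought_car": 1},
    {"income": 58000, "gender": "F", "bought_car": 1},
    {"income": 75000, "gender": "F", "bought_car": 0},
    {"income": 51000, "gender": "F", "bought_car": 1},
    {"income": 40000, "gender": "F", "bought_car": 0},
    {"income": 90000, "gender": "M", "bought_car": 1},
]

# IF (Income > 50K AND Gender = F) THEN P(Buy Car = 1)
support, confidence = rule_stats(
    customers, lambda r: r["income"] > 50000 and r["gender"] == "F"
)
print(f"support={support}, confidence={confidence:.2f}")
```

A decision tree produces one such rule per leaf, so each customer profile comes with its own confidence and support.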
The data mining process at Xerox (Thieret, 2006) followed a systematic series of steps that can be repeated as necessary. The process starts with defining the problem and acquiring knowledge of the domain. It then turns to the data: selecting target data sets, preprocessing the data, and reducing it. This means assembling the relevant data sources and business processes, finding useful variables, and summarizing the data with SQL. Next come the selection of the data mining task and of the algorithm, which includes identifying clusters that describe behaviors, determining which variables describe the problem, and choosing among statistical methods, decision trees, Bayesian nets, and so on. The data mining itself and the interpretation of results follow: the mining step searches for patterns and discovers knowledge, and the interpretation step explains the mined patterns and quantifies correlations. Rules can also be created at this stage. Finally, the knowledge is deployed through tools and documentation, through reports and proposals for business decisions and implementation, and by quantifying the benefits through rollout and feedback.
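The middle steps of this process (target data, preprocessing, reduction, mining, and interpretation) can be sketched as a small pipeline. This is only an illustration of the sequence, not the Xerox implementation; every function name, the field names `pages` and `needs_service`, the bin width, the threshold, and the records are hypothetical.

```python
def select_target(records, wanted_fields):
    """Target data: keep only the fields relevant to the problem."""
    return [{f: r.get(f) for f in wanted_fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def reduce_data(records, field, bin_width):
    """Data reduction: replace a numeric field by the lower edge of its bin."""
    return [{**r, field: (r[field] // bin_width) * bin_width} for r in records]

def mine(records, group_field, target_field):
    """Mining: rate of the target outcome within each group."""
    totals, hits = {}, {}
    for r in records:
        key = r[group_field]
        totals[key] = totals.get(key, 0) + 1
        hits[key] = hits.get(key, 0) + r[target_field]
    return {k: hits[k] / totals[k] for k in totals}

def interpret(rates, threshold):
    """Interpretation: report the groups whose rate reaches the threshold."""
    return [
        f"bin {k}: rate {rate:.2f}"
        for k, rate in sorted(rates.items())
        if rate >= threshold
    ]

# Hypothetical device records: monthly page volume and a service flag.
raw = [
    {"id": 1, "pages": 1200, "needs_service": 0},
    {"id": 2, "pages": 5400, "needs_service": 1},
    {"id": 3, "pages": None, "needs_service": 1},
    {"id": 4, "pages": 5900, "needs_service": 1},
    {"id": 5, "pages": 1800, "needs_service": 0},
    {"id": 6, "pages": 5100, "needs_service": 0},
]

target = select_target(raw, ["pages", "needs_service"])
clean = preprocess(target)
reduced = reduce_data(clean, "pages", 5000)
rates = mine(reduced, "pages", "needs_service")
findings = interpret(rates, 0.5)
print(findings)
```

Because each step is a separate function, any of them can be rerun with different fields or thresholds, which mirrors the repeatable nature of the process.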
Companies that understand data mining and how to use it will be well placed to succeed and flourish in the 21st century. The reward of finding new things that matter can be immeasurable, and both corporate and personal financial benefits can be realized through data mining.
Berger, C. (2006). Sr. Dir. Product Management, Life & Health Sciences Industry & Data Mining Technologies Oracle Corporation.
O’Brien, J. A., & Marakas, G. M. (2009). Management Information Systems (9th ed.). McGraw-Hill Irwin.
Thieret, T. (2006). Principal Scientist, Imaging and Systems Technology Center Xerox Innovation Group. Webster, New York.