Researching, writing, and reading in LIS: Data mining and its applications

Maruja De Villa Lorica
Paper written in June, 2010

Introduction

The vast and growing amount of information brings both challenges and opportunities to the library and information science profession. According to Chen (2005), data mining, which refers to the search for valuable information in large volumes of data, is one way to meet this challenge.

Definition of Data mining

Stair and Reynolds (2003, p. 121) define data mining as an information analysis tool that involves the automated discovery of patterns and relationships in a data warehouse. It makes use of advanced statistical techniques and machine learning to discover facts in a large database, including databases on the Internet. Data mining is also known as knowledge discovery in databases (KDD). Suh, Park, and Jeon (2010) describe it as the process of identifying hidden knowledge, unknown patterns, and new rules in large databases that are potentially useful and ultimately understandable for making crucial decisions.

According to Rob and Coronel (2007, p. 527), data mining tools analyze the data, uncover problems and opportunities hidden in data relationships, form computer models based on their findings, and then use the models to predict business behavior, requiring minimal end-user intervention. End users utilize the system's findings to gain knowledge that may yield competitive advantages.

Lloyd-Williams (1997) enumerated four major stages of data mining: selection, pre-processing, data mining, and interpretation. Selection entails the creation of the target data set that will undergo analysis. Pre-processing refers to the preparation of the data set for analysis by the data mining software. This involves resolving undesirable data characteristics such as missing data (non-complete fields), irrelevant fields, non-variant fields, skewed fields, and outlying data points. Data mining involves subjecting the cleaned data to analysis by the data mining software to identify hidden trends or patterns, or to test specific hypotheses. The last stage, interpretation, is the analysis and interpretation of the results produced.
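The four stages can be illustrated with a minimal Python sketch. This is an illustration only, not a procedure taken from Lloyd-Williams (1997): the loan records, the field names, the 60-day outlier cutoff, and the function names are all hypothetical, and the "mining" step is reduced to a simple per-group average.

```python
from statistics import mean

# Hypothetical loan records; None marks a missing value.
RAW_RECORDS = [
    {"patron": "p1", "branch": "main", "subject": "history", "days_out": 14},
    {"patron": "p2", "branch": "main", "subject": "history", "days_out": None},
    {"patron": "p3", "branch": "main", "subject": "science", "days_out": 21},
    {"patron": "p4", "branch": "main", "subject": "history", "days_out": 7},
    {"patron": "p5", "branch": "main", "subject": "science", "days_out": 400},
]


def select(records, fields):
    """Selection: build the target data set from the fields of interest."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records, numeric_field, max_value):
    """Pre-processing: drop incomplete rows, outliers, and non-variant fields."""
    complete = [r for r in records if all(v is not None for v in r.values())]
    in_range = [r for r in complete if r[numeric_field] <= max_value]
    if not in_range:
        return []
    # A field with a single distinct value carries no information.
    variant = [f for f in in_range[0] if len({r[f] for r in in_range}) > 1]
    return [{f: r[f] for f in variant} for r in in_range]


def mine(records, group_field, numeric_field):
    """Data mining (simplified): average the numeric field per group."""
    groups = {}
    for r in records:
        groups.setdefault(r[group_field], []).append(r[numeric_field])
    return {group: mean(values) for group, values in groups.items()}


def interpret(pattern):
    """Interpretation: report the group with the highest average."""
    return max(pattern, key=pattern.get)


target = select(RAW_RECORDS, ["branch", "subject", "days_out"])
cleaned = preprocess(target, "days_out", max_value=60)
pattern = mine(cleaned, "subject", "days_out")
longest = interpret(pattern)
```

In this sketch, pre-processing removes the record with a missing loan period, the 400-day outlier, and the `branch` field, which never varies. Real data mining software would replace the averaging step with a far richer analysis.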

Applications of Data mining

Data mining extracts patterns, trends, and rules from data warehouses to evaluate proposed business strategies and predict their outcomes. The system's findings create knowledge that could improve competitiveness, increase profits, and transform business processes. In the field of marketing, data mining is used to improve customer retention, cross-selling opportunities, campaign management, and customer segmentation analysis, among others (Stair and Reynolds, 2003, p. 121). In the financial sector, banks and credit card companies use knowledge-based analysis to detect fraud, thereby decreasing fraudulent transactions (Rob and Coronel, 2007, p. 528).

Data mining can also be used for predictive analysis. Predictive analysis is a form of data mining that combines historical data with assumptions about future conditions to predict outcomes of events such as future sales or the probability that a customer will default on a loan (Stair and Reynolds, 2003, p. 122). In South Korea, the research team of Suh et al. (2010) applied text and data mining techniques to forecast the trend of petitions filed to e-People. They used data mining techniques to reduce the time-consuming tasks of reading and classifying a large number of petitions, and to increase accuracy in evaluating the trend of petitions. Their results helped petition inspectors give more attention to detecting and tracking important groups of petitions that could become potential problems. Further, the trend values created by data mining were used as the baseline for making better decisions.
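As a rough illustration of combining historical data with an assumption about the future, the sketch below fits a straight line to past monthly counts by least squares and extends it one month ahead. It is not the method used by Suh et al. (2010); the monthly figures and the function names are hypothetical, and the assumption being made is simply that the past linear trend continues.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def forecast(values, steps_ahead=1):
    """Extend the fitted line steps_ahead periods past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * (len(values) - 1 + steps_ahead)


# Hypothetical number of petitions filed in four consecutive months.
monthly_petitions = [100, 110, 120, 130]
next_month = forecast(monthly_petitions)  # 140.0 for this perfectly linear series
```

A straight-line extrapolation is only a baseline. It ignores seasonality and sudden shifts, which is one reason richer text and data mining techniques are applied in practice.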

Chen (2005) used data mining techniques to provide personalized book recommendations for library users. Readers' borrowing history records served as the source for data mining. In the data mining process, the degree of similarity of readers' borrowing histories and the association levels of some attributes were analyzed to construct a decision tree. High association levels in the decision tree served as the basis for recommending the most suitable book(s) for the reader.
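The idea of comparing readers' borrowing histories can be sketched in a few lines of Python. Note that this example does not reproduce Chen's decision tree approach; it swaps in a simpler technique, Jaccard similarity between sets of borrowed titles, to show how similarity alone can drive a recommendation. The reader names, book titles, and function names are hypothetical.

```python
def jaccard(a, b):
    """Share of titles two readers have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, histories):
    """Rank unread books by the summed similarity of the readers who borrowed them."""
    own = histories[target]
    scores = {}
    for reader, books in histories.items():
        if reader == target:
            continue
        similarity = jaccard(own, books)
        for book in books - own:
            scores[book] = scores.get(book, 0.0) + similarity
    # Keep only books backed by at least one similar reader; break ties by title.
    candidates = [book for book, score in scores.items() if score > 0]
    return sorted(candidates, key=lambda book: (-scores[book], book))


# Hypothetical borrowing histories.
histories = {
    "ana": {"Data Mining Basics", "Library Catalogs", "Statistics 101"},
    "ben": {"Data Mining Basics", "Statistics 101", "Machine Learning"},
    "cy": {"Gardening", "Cooking"},
    "dee": {"Library Catalogs", "Metadata"},
}

suggestions = recommend("ana", histories)
```

Here the reader "ben" shares two of four distinct titles with "ana", so his unread title ranks first, while books borrowed only by readers with no overlap are left out.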

References

Chen, C. C. (2005, September). Using Data Mining Techniques to Discover Personalized Book Recommendation for the Library. Journal of Educational Media & Library Sciences, 43(1), 87-107. Retrieved June 14, 2010, from University of North Texas Electronic Resources, Library Literature & Information Science Full Text database.

Lloyd-Williams, M. (1997). Discovering the hidden secrets in your data - the data mining approach to information. Information Research, 3(2). Retrieved June 14, 2010, from http://informationr.net/ir/3-2/paper36.html

Rob, P., & Coronel, C. (2007). Database systems: Design, implementation, and management (7th ed.). Boston, MA: Thomson.

Stair, R. M., & Reynolds, G. W. (2003). Fundamentals of information systems (2nd ed.). Boston, MA: Thomson.

Suh, J. H., Park, C. H., & Jeon, S. H. (2010). Applying text and data mining techniques to forecasting the trend of petitions filed to e-People. Expert Systems with Applications, 37(10), 7255-7268. Retrieved June 14, 2010, from the University of North Texas Electronic Resources, Library, Information Science & Technology Abstracts Full Text database.

Monday, January 17, 2011
