Data Mining - Bounding the Definition



Generally, data mining (sometimes called knowledge or data discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among many fields in large relational databases. The realm of data mining is very broad and is the topic of hundreds of papers. The goal here is to acknowledge the breadth of activities that data mining encompasses and to define the specific subset of activities that concern this paper.
Large-scale information technology has been evolving separate transaction and analytical systems. Data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks.

The purist's definition of data mining is the use of mathematical models to discover previously unknown patterns or trends in data that can then be used to make predictions. Broader definitions of data mining include the traditional online analytical processing (OLAP), where people ask specific questions of data and then get specific answers or possibly discover new information through canned reports or other means.

To show how broad the term "data mining" is, consider the two examples given below. The first is a more physical world view of "data mining" as expressed in the following news article statement:

"One hopes the audience knows the perils of discarding storage with sensitive data but this article drives home the point. Two MIT grad students bought used drives from eBay and secondhand computer stores. Among the data found on the 158 drives were 5,000 credit-card numbers, porn, love-letters and medical information."

A second example of data mining involves the use of sophisticated data mining tools by a Midwest grocery chain. The chain used its data mining capacity to analyze local buying patterns. It discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could move the beer display closer to the diaper display. It could also make sure beer and diapers were sold at full price on Thursdays.
Both of the above are valid examples of data mining. Clearly there is a wide range of activities "between" these examples that can also be properly characterized as data mining. Because of their data sources, neither case is likely to be useful for the Equipment Asset Management community. For utilities, the newer customer relationship management systems may be potential targets for discovery analysis with neural network or decision tree type data mining applications. In both cases, though, note that skilled and experienced personnel were needed to extract the "value" from the information discovered.
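
The diaper-and-beer pattern in the grocery example is the kind of relationship usually quantified with association-rule measures (support, confidence and lift). The sketch below is illustrative only: the baskets are invented, the function name rule_metrics is hypothetical, and nothing here is taken from the grocery chain's actual analysis.

```python
# Hypothetical checkout baskets; each set is one transaction (invented data).
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    both = sum(1 for b in baskets if antecedent in b and consequent in b)
    ante = sum(1 for b in baskets if antecedent in b)
    cons = sum(1 for b in baskets if consequent in b)
    support = both / n                                # share of all baskets with both items
    confidence = both / ante if ante else 0.0         # share of antecedent baskets that also hold the consequent
    lift = confidence / (cons / n) if cons else 0.0   # confidence relative to the consequent's base rate
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than their individual frequencies would predict. As the text notes, deciding what to do with such a finding still requires experienced people.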

For this paper, we will define data mining as information processing and OLAP. For Equipment Asset Management, data mining involves two processes: "exploring" data to determine what is available and useful, and "applying" the results of the exploration by taking what is available and useful and creating information delivery applications for routine use (i.e., the information a decision maker needs arrives when needed).

Many writers on this topic stress the importance of having specific information goals for your data mining efforts. The second point often made is the need for good quality data, referred to by one writer as "cleaned and pressed" data. The preparation of raw data for use in data mining is labor-intensive. To obtain value from data mining, users must address, among other issues, incorrectly entered data, duplicate data (especially in transactional systems) and situations where the "data keys" are inconsistent.
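
The three data quality issues named above (incorrect entries, duplicates and inconsistent keys) can be sketched as a small cleaning pass. This is a minimal illustration under stated assumptions: the record layout, the field names (equipment_id, work_order, date, labor_hours) and the key normalization rule are all hypothetical, and a real system would need rules agreed with the people who own the data.

```python
def normalize_key(raw_key):
    """Normalize an equipment ID: trim, uppercase, drop dashes and spaces (assumed rule)."""
    return raw_key.strip().upper().replace("-", "").replace(" ", "")


def clean_records(records):
    """Return (cleaned, rejected) lists of work-order records."""
    cleaned, rejected, seen = [], [], set()
    for rec in records:
        key = normalize_key(rec["equipment_id"])
        hours = rec["labor_hours"]
        # Incorrectly entered data: missing or negative labor hours are set aside for review.
        if hours is None or hours < 0:
            rejected.append(rec)
            continue
        # Duplicate data: the same work order on the same equipment and date is kept once.
        fingerprint = (key, rec["work_order"], rec["date"])
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append({**rec, "equipment_id": key})
    return cleaned, rejected


# Invented sample records showing each problem.
records = [
    {"equipment_id": "cb-101", "work_order": "WO1", "date": "2003-01-15", "labor_hours": 4.0},
    {"equipment_id": "CB 101 ", "work_order": "WO1", "date": "2003-01-15", "labor_hours": 4.0},  # duplicate, inconsistent key
    {"equipment_id": "LTC-7", "work_order": "WO2", "date": "2003-02-01", "labor_hours": -3.0},   # bad entry
    {"equipment_id": "ltc-7", "work_order": "WO3", "date": "2003-02-10", "labor_hours": 2.5},
]
cleaned, rejected = clean_records(records)
print(len(cleaned), "kept;", len(rejected), "rejected")
```

Rejected records are returned rather than silently dropped so that they can be corrected at the source.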

The grocery chain in the example above was able to discern the useful "pattern" from the large number of transactions documented at its stores. It was able to do this because all of its data is captured, coded and stored in a consistent manner and in a common data system. This situation is not found at most utilities.

For most power delivery companies, the issues are less about discovering unknown relationships or knowledge within the data and more about capturing the low hanging fruit based on already known relationships. Most already know that equipment aging will be tied not only to calendar time but also to several operational factors such as the following:
-number of operations for load tap changers (LTC) or circuit breakers
-run time for compressors or pumps
-operation at or above maximum ratings
-number and magnitude of faults or lightning strikes
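
As a rough sketch of how these known operational factors might be tallied per piece of equipment, the example below accumulates them from a simple event log. The log format, the event type names and the equipment IDs are hypothetical; an actual utility would draw these values from its own SCADA, relay or maintenance systems.

```python
from collections import defaultdict

# Hypothetical operating log: (equipment_id, event_type, value). Invented data.
log = [
    ("LTC-7", "operation", 1),
    ("LTC-7", "operation", 1),
    ("CB-101", "operation", 1),
    ("CB-101", "fault_kA", 12.5),
    ("CB-101", "fault_kA", 20.0),
    ("PUMP-3", "run_hours", 6.0),
    ("PUMP-3", "run_hours", 8.5),
    ("XFMR-2", "hours_over_rating", 1.5),
]


def summarize_stress(log):
    """Accumulate the operational aging factors for each piece of equipment."""
    summary = defaultdict(lambda: {
        "operations": 0,
        "run_hours": 0.0,
        "hours_over_rating": 0.0,
        "fault_count": 0,
        "max_fault_kA": 0.0,
    })
    for equipment_id, event_type, value in log:
        s = summary[equipment_id]
        if event_type == "operation":
            s["operations"] += int(value)
        elif event_type == "run_hours":
            s["run_hours"] += value
        elif event_type == "hours_over_rating":
            s["hours_over_rating"] += value
        elif event_type == "fault_kA":
            # Track both the number of faults and the largest magnitude seen.
            s["fault_count"] += 1
            s["max_fault_kA"] = max(s["max_fault_kA"], value)
    return dict(summary)


stress = summarize_stress(log)
for equipment_id, factors in sorted(stress.items()):
    print(equipment_id, factors)
```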

The cost of maintenance consists of the labor hours and materials consumed in the maintenance activities. Benefits produced by the maintenance spending can be seen in the performance of the subject equipment. Answers to other types of questions such as "Will we be able to perform all of the tasks that are required by our regulators in the required time frame?" should be available from the business data systems that support the operation of your business. So how do you get them?

The remainder of this paper will discuss data mining from the perspective of capitalizing on existing data systems. To do that, usually the quality of the data in these systems must be improved. The goal of this paper, as stated initially, is to discuss data mining as a means to leverage existing investments in information technology and business process improvement. The goal is to make effective use of available data while using the same tools and techniques to increase the value of existing data systems through the business improvement process.
The data mining examples presented will demonstrate various types of information that can be delivered to decision makers. These can include the following:

-Identify classes or predetermined groups (e.g., all equipment of a particular age, manufacturer, or model within an equipment type that has demonstrated a higher than desired failure rate).
-Identify clusters, or data items that are grouped according to logical relationships (e.g., all circuit breakers of a specific type that have not been operated within a specified time interval, or a set of pumps that have been operated near their NPSH limit for some cumulative amount of time).
-Identify associations between events and the state of certain equipment, and look for ongoing trends.
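
The first two deliverables in the list above can be illustrated with a short sketch: a failure rate per equipment class, and a list of breakers not operated within a chosen interval. The equipment records, field names and the 365-day threshold are assumptions made for illustration, not values from any real asset register.

```python
from collections import defaultdict
from datetime import date

# Hypothetical asset register entries (invented data).
equipment = [
    {"id": "CB-101", "type": "breaker", "maker": "A", "model": "X1", "failed": True,  "last_operated": date(2001, 3, 1)},
    {"id": "CB-102", "type": "breaker", "maker": "A", "model": "X1", "failed": True,  "last_operated": date(2003, 5, 10)},
    {"id": "CB-103", "type": "breaker", "maker": "A", "model": "X1", "failed": False, "last_operated": date(2003, 6, 1)},
    {"id": "CB-201", "type": "breaker", "maker": "B", "model": "Y2", "failed": False, "last_operated": date(2000, 11, 20)},
]


def failure_rate_by_class(equipment):
    """Classes: fraction of failed units for each (type, maker, model) group."""
    totals = defaultdict(lambda: [0, 0])  # [failures, units]
    for e in equipment:
        group = (e["type"], e["maker"], e["model"])
        totals[group][0] += int(e["failed"])
        totals[group][1] += 1
    return {group: failures / units for group, (failures, units) in totals.items()}


def not_operated_since(equipment, equipment_type, as_of, max_days):
    """Clusters: IDs of the given type whose last operation is more than max_days before as_of."""
    return [
        e["id"]
        for e in equipment
        if e["type"] == equipment_type and (as_of - e["last_operated"]).days > max_days
    ]


rates = failure_rate_by_class(equipment)
idle = not_operated_since(equipment, "breaker", as_of=date(2003, 7, 1), max_days=365)
print(rates)
print(idle)
```

Both functions rely on already known relationships rather than on discovering new ones, which matches the "low hanging fruit" emphasis of this paper.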

These information deliverables may directly relate to equipment and the operational stress it experiences. They could be related to the business processes that are associated with maintaining the equipment. The management of either can benefit from an improved decision support system based on focused data mining efforts.
