In the traditional file processing approach, which was used in business data processing for many years, each business application was designed to use one or more specialized data files containing only specific types of data records. However, the file processing approach eventually became too cumbersome, costly, and inflexible to supply the information needed to manage a modern business, and, as we shall soon see, it was replaced by the database management approach. Despite the apparent logic and simplicity of the traditional file processing approach, file processing systems had the following major problems:
Data Redundancy
- Independent data files included a lot of duplicated data.
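- Duplicated data: the same data (such as a customer’s name and address) were recorded and stored in several files.
- This data redundancy caused problems when data had to be updated, because every copy in every file had to be changed; any copy that was missed left the files inconsistent.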
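The update problem caused by redundancy can be illustrated with a minimal Python sketch. It assumes two hypothetical application files (`billing_file` and `shipping_file`), each keeping its own copy of the same customer record, and a hypothetical `update_address` routine that only the billing application runs.

```python
# Hypothetical "files": each application keeps its own copy of customer data.
billing_file = [
    {"customer_id": 1, "name": "Customer A", "address": "12 Main Street"},
]
shipping_file = [
    {"customer_id": 1, "name": "Customer A", "address": "12 Main Street"},
]

def update_address(records, customer_id, new_address):
    """Update the address in ONE file only; copies in other files are untouched."""
    for rec in records:
        if rec["customer_id"] == customer_id:
            rec["address"] = new_address

# The billing application updates its own file...
update_address(billing_file, 1, "7 Side Road")

# ...but the shipping file still holds the old address, so the files disagree.
inconsistent = billing_file[0]["address"] != shipping_file[0]["address"]
print(inconsistent)  # True
```

Every program that owns a copy would need its own update logic, and any one that is forgotten leaves stale data behind.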
Lack of Data Integration
- Having data in independent files made it difficult to provide end users with information for ad hoc requests that required accessing data stored in several different files.
- Special computer programs had to be written to retrieve data from each independent file.
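As a rough illustration, the Python sketch below shows the kind of one-off program an ad hoc request required. It assumes a hypothetical customer file and order file linked only by a shared `customer_id` field; the function name `total_spend_by_name` is also hypothetical.

```python
# Hypothetical independent files maintained by different applications.
customer_file = [
    {"customer_id": 1, "name": "Customer A"},
    {"customer_id": 2, "name": "Customer B"},
]
order_file = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 2, "amount": 80.0},
    {"order_id": 102, "customer_id": 1, "amount": 40.0},
]

def total_spend_by_name(customers, orders):
    """Special-purpose program: answer one ad hoc question across two files."""
    # Build a lookup table from the customer file...
    names = {c["customer_id"]: c["name"] for c in customers}
    totals = {}
    # ...then scan the order file and match each record by customer_id.
    for o in orders:
        name = names[o["customer_id"]]
        totals[name] = totals.get(name, 0.0) + o["amount"]
    return totals

totals = total_spend_by_name(customer_file, order_file)
print(totals)  # {'Customer A': 290.0, 'Customer B': 80.0}
```

A different question (for example, orders by region) would need yet another program written against the same files.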
Data Dependence
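- The organization of files, their physical locations and storage hardware, and the application software used to access those files depended on one another, so a change to one of them (for example, a new file layout) required changes to the programs that used it.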
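Data dependence can be sketched in Python with a fixed-width record layout hard-coded into a program. The layout (ID in columns 0-9, name in columns 10-29) and the function `read_name` are hypothetical, chosen only to show how a change in file organization breaks an unchanged program.

```python
# Hypothetical record layout hard-coded into an application program:
# columns 0-9 hold the customer ID and columns 10-29 hold the name.
NAME_START, NAME_END = 10, 30

def read_name(record: str) -> str:
    """Extract the name field using the layout the program was written for."""
    return record[NAME_START:NAME_END].rstrip()

old_record = "0000000001" + "Customer A".ljust(20)
print(read_name(old_record))  # Customer A

# The file is reorganized: a 2-character region code is inserted after the ID.
# The program is unchanged, so it now reads the wrong columns.
new_record = "0000000001" + "NE" + "Customer A".ljust(20)
print(read_name(new_record))  # NECustomer A
```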
Lack of Data Integrity or Standardization
- Data elements were defined differently by different end users and applications.
- For example, end user 1 named a field address, end user 2 named the same field addresses, and end user 3 named it add.
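The effect of non-standard data element names can be shown with a small Python sketch. The three records below mirror the address, addresses, and add example; the `SYNONYMS` mapping and `standardize` function are hypothetical illustrations of agreeing on one standard name, not a specific product feature.

```python
# Three hypothetical applications store the same fact under different names.
records = [
    {"address": "12 Main Street"},   # end user 1
    {"addresses": "7 Side Road"},    # end user 2
    {"add": "3 Hill Avenue"},        # end user 3
]

# A program that expects "address" misses two of the three values.
naive = [r.get("address") for r in records]
print(naive)  # ['12 Main Street', None, None]

# Mapping every variant to one standard data element name fixes the lookup.
SYNONYMS = {"address": "address", "addresses": "address", "add": "address"}

def standardize(record):
    """Rename known variant field names to the standard name."""
    return {SYNONYMS.get(k, k): v for k, v in record.items()}

standard = [standardize(r)["address"] for r in records]
print(standard)  # ['12 Main Street', '7 Side Road', '3 Hill Avenue']
```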
To solve the problems encountered with the file processing approach, the database management approach was conceived as the foundation of modern methods of managing organizational data. The database management approach consolidates data records, formerly held in separate files, into databases that can be accessed by many different application programs. In addition, a database management system (DBMS) serves as a software interface between users and databases, which helps users access the data in a database easily. Thus, database management involves the use of database management software to control how databases are created, interrogated, and maintained to provide the information needed by end users.
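A minimal sketch of the database approach, using Python's built-in `sqlite3` module as a stand-in DBMS, is shown below. The table and column names are hypothetical. The customer's address is stored once, so a single update is visible to every program, and an ad hoc question is answered with a query instead of a special file-matching program.

```python
import sqlite3

# An in-memory SQLite database stands in for an organization's database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Customer A', '12 Main Street');
INSERT INTO orders VALUES (100, 1, 250.0), (101, 1, 40.0);
""")

# The address is stored once, so one UPDATE is seen by every application.
conn.execute("UPDATE customer SET address = ? WHERE customer_id = ?",
             ("7 Side Road", 1))

# An ad hoc request joins the tables through the DBMS's query language.
row = conn.execute("""
    SELECT c.name, c.address, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
""").fetchone()
print(row)  # ('Customer A', '7 Side Road', 290.0)
```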
A database management system (DBMS) is the main software tool of the database management approach, because it controls the creation, maintenance, and use of the databases of an organization and its end users. The three major functions of a DBMS are to create new databases and database applications, to maintain the quality of the data in an organization’s databases, and to use an organization’s databases to provide the information needed by its end users.
Examples of DBMS
- Microcomputer DBMS package - MS Access 2003
- Mainframe and server versions - Oracle Database 10g, IBM DB2 UDB 8.2, Microsoft SQL Server 2005, Sybase ASE 15
- Open Source DBMS - MySQL 5.0
Today, businesses also use data warehouse systems, rather than traditional files, to support business analysis. A data warehouse stores data that have been extracted from various operational, external, and other databases of an organization. It is a central source of data that have been cleaned, transformed, and cataloged so they can be used by managers and other business professionals for data mining, online analytical processing, and other forms of business analysis, market research, and decision support. A data warehouse may be subdivided into data marts, which hold subsets of data from the warehouse that focus on specific aspects of a company, such as a department or a business process.
The data acquisition process of a data warehouse might include activities such as consolidating data from several sources, filtering out unwanted data, correcting incorrect data, converting data to new data elements, or aggregating data into new data subsets.
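The five acquisition activities can be strung together in a short Python sketch. The source lists, field names, the rule that records with status "test" are unwanted, and the choice of cents as the converted data element are all hypothetical assumptions made for illustration; real acquisition tools are far more elaborate.

```python
from collections import defaultdict

# Hypothetical operational sources that use slightly different conventions.
store_sales = [
    {"region": "north", "amount": "120.50", "status": "ok"},
    {"region": "South", "amount": "80.00", "status": "test"},
]
web_sales = [
    {"region": "NORTH ", "amount": "30.00", "status": "ok"},
    {"region": "south", "amount": "25.00", "status": "ok"},
]

def acquire(*sources):
    """Return total sales in cents per region, ready to load into a warehouse."""
    # 1. Consolidate: combine records from several sources into one list.
    rows = [row for source in sources for row in source]
    # 2. Filter: drop unwanted records (here, test transactions).
    rows = [row for row in rows if row["status"] != "test"]
    totals = defaultdict(int)
    for row in rows:
        # 3. Correct: normalize inconsistent region spellings.
        region = row["region"].strip().title()
        # 4. Convert: turn the text amount into a new numeric element (cents).
        cents = round(float(row["amount"]) * 100)
        # 5. Aggregate: sum the converted amounts per region.
        totals[region] += cents
    return dict(totals)

result = acquire(store_sales, web_sales)
print(result)  # {'North': 15050, 'South': 2500}
```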
These data are then stored in the enterprise data warehouse, from which they can be moved into data marts or to an analytical data store that holds data in a more useful form for certain types of analysis. Metadata (data that define the data in the data warehouse) are stored in a metadata repository and cataloged by a metadata directory. Finally, a variety of analytical software tools can be provided to query, report, mine and analyze the data for delivery via Internet and intranet Web systems to business end users.
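The movement of warehouse data into a data mart, together with a metadata repository, can be sketched with Python's `sqlite3` module. The table names (`warehouse_sales`, `sales_mart`, `metadata`) and their contents are hypothetical; here a data mart is simply a copied subset of warehouse rows for one department, which is only one of several ways data marts are built in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse table.
CREATE TABLE warehouse_sales (department TEXT, month TEXT, revenue REAL);
INSERT INTO warehouse_sales VALUES
    ('Sales', '2024-01', 500.0),
    ('Sales', '2024-02', 650.0),
    ('Marketing', '2024-01', 120.0);

-- Data mart: a copied subset focused on one department.
CREATE TABLE sales_mart AS
    SELECT month, revenue FROM warehouse_sales WHERE department = 'Sales';

-- Metadata repository: data that define the data in the warehouse.
CREATE TABLE metadata (table_name TEXT, column_name TEXT, description TEXT);
INSERT INTO metadata VALUES
    ('warehouse_sales', 'revenue', 'Revenue for the month'),
    ('sales_mart', 'revenue', 'Copied from warehouse_sales for Sales only');
""")

mart = conn.execute("SELECT month, revenue FROM sales_mart ORDER BY month").fetchall()
print(mart)  # [('2024-01', 500.0), ('2024-02', 650.0)]

# Analysts can look up what a column means before querying it.
desc = conn.execute(
    "SELECT description FROM metadata WHERE table_name = ? AND column_name = ?",
    ("sales_mart", "revenue"),
).fetchone()[0]
print(desc)
```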
One important characteristic of the data in a data warehouse is that, unlike a typical database in which changes can occur constantly, data in a data warehouse are static: once the data are gathered, formatted for storage, and stored in the data warehouse, they do not change. This restriction exists so that queries can look for complex patterns or historical trends that might otherwise go unnoticed in dynamic data that change constantly as a result of new transactions and updates.
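The contrast between an operational record that is overwritten and warehouse data that are only added to can be sketched in Python. The `Snapshot` class and `load` function are hypothetical; `frozen=True` makes each stored snapshot read-only, while new loads append further snapshots, which is what preserves the history for trend analysis.

```python
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Snapshot:
    """One warehouse row; frozen=True makes each stored row read-only."""
    load_date: str
    customer_id: int
    balance: float

# Operational data: the balance is overwritten by each new transaction.
operational = {1: 100.0}
# Warehouse: each load appends new snapshots instead of updating old ones.
warehouse: list[Snapshot] = []

def load(date: str) -> None:
    for customer_id, balance in operational.items():
        warehouse.append(Snapshot(date, customer_id, balance))

load("2024-01-01")
operational[1] = 40.0  # a new transaction updates the operational record
load("2024-01-02")

history = [s.balance for s in warehouse if s.customer_id == 1]
print(history)  # [100.0, 40.0]

try:
    warehouse[0].balance = 0.0  # attempt to change stored history
except FrozenInstanceError:
    print("stored snapshots are read-only")
```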
Data mining is a major use of data warehouse databases and the static data they contain. In data mining, the data in a data warehouse are analyzed to reveal hidden patterns and trends in historical business activity. This analysis can be used to help managers make decisions about strategic changes in business operations to gain competitive advantages in the marketplace.
Data mining can discover new correlations, patterns, and trends in vast amounts of business data stored in data warehouses. Data mining software uses advanced pattern recognition algorithms as well as a variety of mathematical and statistical techniques to sift through mountains of data to extract previously unknown strategic business information. For example, many companies use data mining to:
- Perform “market-basket analysis” to identify new product bundles.
- Find root causes of quality or manufacturing problems.
- Prevent customer attrition and acquire new customers.
- Cross-sell to existing customers.
- Profile customers with more accuracy.
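Market-basket analysis can be illustrated with a naive Python sketch that counts item pairs and reports rules of the form "customers who buy X also buy Y". The baskets and the thresholds (`min_support`, `min_confidence`) are hypothetical, and this pair-counting approach is a simplified stand-in, not the Apriori algorithm or any particular data mining product.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

def association_rules(baskets, min_support=0.4, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs in basket | lhs in basket)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

rules = association_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy baskets, every basket containing butter also contains bread, so the rule butter -> bread passes both thresholds and could suggest a product bundle.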