With the development of computer technology, the volume of stored data keeps doubling, but methods for analyzing such massive data have struggled to keep pace. Data mining technology was born out of this situation of a "data ocean, knowledge desert". By applying data mining technology, the public sector can bring greater convenience to the population and receive the most direct feedback on its governance.
The basic requirements of data mining applications are the creation, linking, circulation, and integration of data across all kinds of databases. When the data being mined are identifiable personal data, and users of government databases or data exchange platforms can use mining techniques to integrate, identify, and classify fragmented records of people's lives into data profiles, data mining becomes a powerful tool for improving the administrative efficiency of the public sector. One of the most high-profile data mining applications in the world today is the series of border-security measures introduced by the United States Department of Homeland Security since the September 11 attacks; these measures have since been extended to require European airlines to submit passenger data. The Bureau of Labor Insurance, Ministry of Labor, also uses data mining technology to cross-check data. Under the new retirement system, retirement reserve funds are managed centrally by the Bureau of Labor Insurance. Because both pension contributions and labor insurance premiums are now remitted to the Bureau, the long-standing underreporting of insured wages for labor insurance will come to light: by comparing the insured salary an employer reports for labor insurance with the wage on which its retirement reserve contributions are based, any underpaid amounts are exposed. In short, data mining analysis is a set of methods and techniques for discovering latent patterns and extracting useful knowledge from massive data. It can not only analyze existing problems but also predict future trends, and its results are easy to understand and apply. For these reasons, data mining analysis has attracted attention in many fields.
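The wage cross-check described above comes down to linking two datasets on a shared key and flagging mismatches. The Python sketch below illustrates only that idea, under stated assumptions: the `Record` layout, the `find_underreported` function, and every figure are hypothetical and do not reflect the Bureau's actual systems or data. It reports the salary gap only; turning a gap into unpaid premiums would require the applicable premium rates, which are outside this sketch.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical layout: one reported monthly salary per employee.
    employee_id: str
    monthly_salary: int


def find_underreported(insurance: list[Record], pension: list[Record]) -> dict[str, int]:
    """Link both datasets on employee_id and return, per employee, how far the
    salary reported for labor insurance falls below the pension wage."""
    pension_by_id = {r.employee_id: r.monthly_salary for r in pension}
    gaps = {}
    for r in insurance:
        pension_wage = pension_by_id.get(r.employee_id)
        # Only employees present in both datasets can be compared.
        if pension_wage is not None and r.monthly_salary < pension_wage:
            gaps[r.employee_id] = pension_wage - r.monthly_salary
    return gaps


# Hypothetical figures for illustration only.
insurance = [Record("A01", 25200), Record("A02", 40100), Record("A03", 30300)]
pension = [Record("A01", 36300), Record("A02", 40100), Record("A03", 30300)]
print(find_underreported(insurance, pension))  # {'A01': 11100}
```

A dictionary keyed on the employee ID keeps the linkage step linear in the number of records, which matters once the two sources grow to national scale.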
Data mining is a relatively new discipline that combines computer technology, artificial intelligence, and statistics, drawing on scientific methods from mathematics, statistics, artificial intelligence, and neural networks. Using techniques such as cluster analysis, association analysis, and decision trees, it uncovers hidden, previously unknown, and potentially valuable relationships, patterns, and trends in large volumes of data. This knowledge and these rules are then used to build decision-support models, which provide the methods, tools, and processes for predictive decision support.
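Of the techniques named above, association analysis is the easiest to show in a few lines. The sketch below, run on hypothetical shopping-basket data, computes support (the share of transactions containing an itemset) and confidence (the support of a pair divided by the support of the rule's left-hand side) for every pair of items. It is a brute-force illustration, not the Apriori algorithm or any library's API; the function names and thresholds are assumptions chosen for the example.

```python
from itertools import combinations


def support(transactions: list[set[str]], itemset: set[str]) -> float:
    """Share of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions: list[set[str]], min_support: float = 0.5,
               min_confidence: float = 0.7) -> list[tuple[str, str, float, float]]:
    """Brute-force search for rules {lhs} -> {rhs} over every pair of items.

    Returns (lhs, rhs, support, confidence) tuples. Enumerating all pairs is
    only practical for toy data; Apriori-style miners prune the search instead.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):  # try both rule directions
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
print(pair_rules(baskets))  # [('butter', 'bread', 0.5, 1.0)]
```

Here butter -> bread holds in every basket that contains butter (confidence 1.0), whereas bread -> butter reaches only about 0.67 and is filtered out, which shows why confidence is measured per direction.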
According to CRISP-DM (the Cross-Industry Standard Process for Data Mining), the data mining process can be divided into the following six steps:
- Business Understanding: Understand the project's requirements and ultimate objectives from a business perspective, and translate these objectives into a data mining problem definition and plan.
- Data Understanding: Extract the relevant data from the databases as required and assess the available data.
- Data Preparation (preprocessing): Process the extracted data to meet the modeling requirements: check its integrity and consistency, fill in missing values, handle noisy data, and so on.
- Modeling: Apply data mining tools to build models.
- Evaluation: Evaluate the resulting model and, in particular, check whether the results are acceptable from a business standpoint.
- Deployment: Organize the findings and the knowledge gained into a readable form, for example by writing a data mining report.
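To make the data preparation, modeling, and evaluation steps above concrete, here is a minimal Python sketch on a tiny hypothetical dataset. The specific choices are assumptions for illustration only, since CRISP-DM itself does not prescribe any technique: missing values are filled with the column mean, the model is a nearest-centroid classifier, and evaluation is accuracy on a small held-out set. The function names `prepare`, `fit_centroids`, `predict`, and `accuracy` are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Optional


def prepare(rows: list[list[Optional[float]]]) -> list[list[float]]:
    """Data preparation: replace missing values (None) with the column mean."""
    n_cols = len(rows[0])
    col_means = [mean(r[c] for r in rows if r[c] is not None) for c in range(n_cols)]
    return [[col_means[c] if r[c] is None else r[c] for c in range(n_cols)] for r in rows]


def fit_centroids(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Modeling: a nearest-centroid classifier keeps one mean vector per class."""
    centroids = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids: dict[str, list[float]], x: list[float]) -> str:
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


def accuracy(centroids: dict[str, list[float]], X: list[list[float]], y: list[str]) -> float:
    """Evaluation: share of held-out rows that are classified correctly."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)


# Hypothetical training data with two missing values.
train_raw = [[1.0, 2.0], [1.5, None], [8.0, 9.0], [None, 8.5]]
train_y = ["low", "low", "high", "high"]
train_X = prepare(train_raw)  # [[1.0, 2.0], [1.5, 6.5], [8.0, 9.0], [3.5, 8.5]]
model = fit_centroids(train_X, train_y)

# Hypothetical held-out rows for the evaluation step.
test_X = [[1.0, 1.0], [9.0, 9.0]]
test_y = ["low", "high"]
print(accuracy(model, test_X, test_y))  # 1.0
```

A nearest-centroid classifier is used only because it fits in a few lines; in practice the modeling step would use whichever technique (a decision tree, clustering, and so on) suits the business objective, and evaluation would rely on a much larger held-out set.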