Timeliness of SPP Payments at SMK Tritech Informatika Using the Naive Bayes Algorithm

ABSTRACT


INTRODUCTION
SMK Tritech Informatika is a private educational institution that focuses on vocational education. The cost of education is one of the supporting inputs for implementing education, and it plays a very important role in achieving good education. One of the education costs at SMK Tritech Informatika is the student fee that must be paid every month, better known as the Education Development Contribution (SPP). SPP fees are generally applied by private institutions or schools and are charged to each student, because the management of private schools is funded by the community or by local policy. Public schools are different, since their costs are generally borne by the government. The school therefore charges a monthly tuition fee to students' guardians, which aims to ensure the continuity of education at SMK Tritech Informatika [1] [2].
The problem that arises regarding tuition payments is that students pay their tuition fees later than the specified time [3][4]. This is a problem because tuition payments are one of the important factors in providing good service quality at the school. Interviews with the school caregivers indicate that some students are late in paying their tuition fees [5] [6]. The factors that generally cause this problem are the economic circumstances of the students' parents or guardians and the habit of some students who delay payment even though their parents have already given them the money; students often spend money intended for the boarding school (pesantren) fees on personal expenses [7][8]. This is a difficult problem because many students pay their tuition late, which reduces the school's income. Meanwhile, these funds are needed for the continuity of school education, such as paying teacher salaries and paying for school needs [9]. A solution is therefore needed that classifies payments by their level of timeliness, so that the results can serve as evaluation material for the school to increase the number of tuition payments made on time [10] [11]. The purpose of this research is to classify tuition payments based on the timeliness of payment using the Naive Bayes algorithm. The Naive Bayes algorithm is one of the Top 10 Algorithms in Data Mining identified at the IEEE International Conference on Data Mining (ICDM) in December 2006 [12] [13]. Naive Bayes is a classification algorithm rooted in Bayes' theorem of probability. It is often used in machine learning for classification, that is, predicting the category or class of a sample based on its features. The Naive Bayes algorithm assumes that the features are conditionally independent of one another given the class [14] [15]. Although this assumption is often not completely true in real-world contexts, the Naive Bayes algorithm remains simple, fast, and often quite effective. Naive Bayes belongs to the field of data mining, and many data mining cases use it. Data mining is the process of extracting information and patterns that can be used as needed from large data sets. This research therefore utilizes the data mining process and the Naive Bayes algorithm [16] [17].
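In standard notation (the symbols below are introduced for illustration and are not taken from the paper), Bayes' theorem and the conditional independence assumption combine into the Naive Bayes decision rule, where C is a timeliness class (for example, on time or late) and x_1, ..., x_n are the features of a payment record:

```latex
P(C \mid x_1,\dots,x_n) = \frac{P(C)\,P(x_1,\dots,x_n \mid C)}{P(x_1,\dots,x_n)}

% Conditional independence of the features given the class:
P(x_1,\dots,x_n \mid C) = \prod_{i=1}^{n} P(x_i \mid C)

% The denominator does not depend on C, so the predicted class is
\hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```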

Data Mining
Data mining is a computational process for discovering patterns in large data sets. Viewed as part of knowledge discovery in databases (KDD), it uses algorithms, statistical tools, and machine learning to extract previously unknown patterns [18]. Data mining supports data analysis by identifying clusters, detecting anomalies, discovering dependencies, and finding correlations [19].

Classification
Classification is one of the data mining methods or techniques [20][21]. Classification is the task of assessing data objects in order to assign each of them to one of a number of available classes. It involves two main tasks: constructing a model that is stored as a prototype, and using that model to recognize, classify, or predict other data objects so that it is known which class of the built model each object belongs to [22]. The classification method proceeds through several phases, starting with the training data and ending with the testing process, so that an accurate decision is produced. Figure 1 shows the flow of the classification method [23].
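The two phases described above, building a model from training data and then using it to classify testing data, can be sketched with a small categorical Naive Bayes classifier. This is a from-scratch illustration, not the RapidMiner operator used in the study; the features, their values, the toy records, and the function names are hypothetical, and add-one (Laplace) smoothing is an assumption of this sketch.

```python
from collections import Counter, defaultdict


def train(rows, labels):
    """Training phase: count classes and feature values per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(lambda: defaultdict(Counter))  # [class][feature][value]
    feature_values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[label][i][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row):
    """Testing phase: pick the class maximizing P(c) * prod_i P(x_i | c)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # prior P(c)
        for i, value in enumerate(row):
            # Add-one smoothing avoids zero probabilities for unseen values.
            score *= (value_counts[label][i][value] + 1) / (n + len(feature_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: (parent_income, spends_fee_money) -> class.
train_rows = [("low", "yes"), ("low", "no"), ("high", "no"),
              ("medium", "no"), ("low", "yes"), ("high", "no")]
train_labels = ["TERLAMBAT", "TERLAMBAT", "TEPAT", "TEPAT", "TERLAMBAT", "TEPAT"]
model = train(train_rows, train_labels)

# Hypothetical testing data, classified with the trained model.
test_rows = [("low", "yes"), ("high", "no")]
print([predict(model, r) for r in test_rows])  # ['TERLAMBAT', 'TEPAT']
```

Multiplying raw probabilities is fine for a handful of features; with many features, summing log-probabilities avoids numerical underflow.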

Figure 1. Classification Method Flow

Payment of Tuition Fees
SPP payment is a monthly payment, which can also be understood as payment of the operational costs of an educational institution that must be made every month. SPP can be interpreted as an education development contribution paid by students at school. Its purpose is to enable the institution to finance its educational operations and facilities so that the school can carry out better learning activities.

Naive Bayes Algorithm
An algorithm is a technique for organizing the stages of solving a problem, expressed in a limited number of steps arranged logically and systematically. Algorithms are also often defined as procedures for solving problems using a finite number of specific steps. The accuracy of the Naive Bayes algorithm is generally tested with the confusion matrix method, using the formulas summarized in Table 1 [24], where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives:
• Precision measures the proportion of data predicted as positive that are actually positive, calculated as $\text{Precision} = \frac{TP}{TP + FP}$.
• Recall shows the percentage of actual positive data that are successfully predicted as positive, calculated as $\text{Recall} = \frac{TP}{TP + FN}$.
• Accuracy is the ratio of correctly classified data to the total amount of data, calculated as $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
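A minimal sketch of these formulas, assuming the class TEPAT (on time) is treated as the positive class and TERLAMBAT (late) as the negative class; the function names and toy labels are hypothetical. The zero-division guards are a design choice of the sketch for the case where no positive predictions or no positive data exist.

```python
def confusion_counts(actual, predicted, positive="TEPAT"):
    """Tally TP, FP, TN, FN for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, tn, fn


def scores(tp, fp, tn, fn):
    """Precision, recall, and accuracy from the confusion matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, accuracy


actual = ["TEPAT", "TEPAT", "TERLAMBAT", "TERLAMBAT"]
predicted = ["TEPAT", "TERLAMBAT", "TEPAT", "TERLAMBAT"]
counts = confusion_counts(actual, predicted)
print(counts, scores(*counts))  # (1, 1, 1, 1) (0.5, 0.5, 0.5)
```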

RapidMiner
RapidMiner is software for knowledge discovery that provides approximately 400 data mining operators, including input and output operators and operators for data preprocessing and visualization. RapidMiner can also be described as data processing software based on data mining principles: it extracts patterns from large data sets by combining statistical methods, artificial intelligence, and databases. It is used in machine learning, data mining, text mining, and predictive analytics [25]. The advantage of RapidMiner for data mining is that its operators make processing large amounts of data easy. Each operator transforms data: the data are connected to the nodes of an operator, and the user only needs to connect the nodes to see the results, which RapidMiner displays visually as graphs. These advantages make RapidMiner a software of choice for extracting information with data mining methods [26].

Research Flow
The research flow used in this study is shown in Figure 2.
4. Data selection: selecting, from the collected data, the data to be used in the data mining process, separated from the operational data.
5. Preprocessing/cleaning: discarding unused data, checking the data, and correcting any errors, such as typographical errors.
8. Data mining implementation: analyzing the data with the Naive Bayes classifier algorithm to produce a model and measure its accuracy.
9. Interpretation/evaluation: in the last phase, drawing conclusions from the results that have been obtained.
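The preprocessing/cleaning step can be sketched as follows. The paper does not list its dataset's columns or the typos it found, so the field names, the typo table, and the rule of discarding incomplete records below are all hypothetical.

```python
# Hypothetical field names; the paper does not list the dataset's columns.
USED_FIELDS = ("parent_income", "month", "status")
# Hypothetical typo corrections for the class label.
LABEL_FIXES = {"TEPT": "TEPAT", "TERLAMBT": "TERLAMBAT"}
VALID_LABELS = {"TEPAT", "TERLAMBAT"}


def clean(records):
    """Keep only used fields, drop incomplete rows, and correct label typos."""
    cleaned = []
    for rec in records:
        row = {field: rec.get(field) for field in USED_FIELDS}
        if any(value in (None, "") for value in row.values()):
            continue  # discard records with missing values
        status = row["status"].strip().upper()
        status = LABEL_FIXES.get(status, status)
        if status not in VALID_LABELS:
            continue  # discard rows whose label cannot be repaired
        row["status"] = status
        cleaned.append(row)
    return cleaned


records = [
    {"name": "A", "parent_income": "low", "month": "Jan", "status": " tept "},
    {"name": "B", "parent_income": "", "month": "Feb", "status": "TEPAT"},
    {"name": "C", "parent_income": "high", "month": "Mar", "status": "terlambat"},
]
print(clean(records))
```

The unused `name` field is dropped, record B is discarded because its income is missing, and the misspelled or lower-case labels are normalized.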

Testing Results
Testing the data in RapidMiner produces several outputs, beginning with the process shown in Figure 3. The Naive Bayes calculation in RapidMiner is performed by first importing the training data and testing data and then connecting the data to the RapidMiner operators, as shown in Figure 3. The process is then run.

Accuracy Level of Algorithm Model Implementation
The accuracy level is tested with the confusion matrix method, which consists of precision, recall, and accuracy. The confusion matrix for the testing data, computed with Microsoft Excel, is shown in the following table:

Figure 2. Research Flow
Based on Figure 2, the research flow can be explained as follows:
1. Problem identification: determining the background of the problem, its parameters, and its solution.
2. Data collection: collecting data through observation, interviews, and literature studies, which produced the SPP payment data provided by the boarding school (both primary and secondary data) as well as literature related to SPP.
3. discussion in this research.

Figure 3. RapidMiner Naive Bayes Operator Configuration Process
After the process is run, the classification results appear in the example set. The results of the Naive Bayes calculation in RapidMiner are shown in Table 6 (Classification Results).

Figure 4. Results of RapidMiner Tools
Figure 4 shows the result of testing with RapidMiner, where TEPAT means on time and TERLAMBAT means late: 11 true positives (predicted TEPAT, actual class TEPAT), 6 false positives (predicted TEPAT, actual class TERLAMBAT), 3 true negatives (predicted TERLAMBAT, actual class TERLAMBAT), and 2 false negatives (predicted TERLAMBAT, actual class TEPAT). Therefore, the classification results for the tuition payment data obtained with RapidMiner and Microsoft Excel are the same.
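As a worked check, substituting the four counts reported for Figure 4 into the confusion matrix formulas, with TEPAT as the positive class, gives the values below. These figures follow arithmetically from the stated counts (22 test records in total) and are not taken from Table 7.

```latex
\text{Precision} = \frac{11}{11 + 6} = \frac{11}{17} \approx 0.647

\text{Recall} = \frac{11}{11 + 2} = \frac{11}{13} \approx 0.846

\text{Accuracy} = \frac{11 + 3}{11 + 3 + 6 + 2} = \frac{14}{22} \approx 0.636
```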

Table 1. Confusion Matrix Formula

Table 2. Primary Data of Tuition Payments

Table 3. Secondary Data of Tuition Payments

Table 4. Data Processing Results
In this phase, data that do not yet have clearly defined entities are transformed into a valid form that is ready for the data mining process.
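The transformation step can be sketched as follows. The paper does not state its exact transformation rules, so the due date, the income thresholds, and the function names below are hypothetical: a payment is labeled TEPAT when made on or before its due date and TERLAMBAT otherwise, and a numeric income is discretized into bands so that a categorical Naive Bayes model can use it.

```python
from datetime import date


def timeliness_label(paid_on: date, due_on: date) -> str:
    # Hypothetical rule: paying on or before the due date counts as on time.
    return "TEPAT" if paid_on <= due_on else "TERLAMBAT"


def income_band(monthly_income: int, low: int = 2_000_000, high: int = 5_000_000) -> str:
    # Hypothetical thresholds (in rupiah) for turning a number into a category.
    if monthly_income < low:
        return "low"
    if monthly_income < high:
        return "medium"
    return "high"


# Example: one hypothetical raw record transformed into a categorical row.
raw = {"paid_on": date(2023, 1, 12), "due_on": date(2023, 1, 10), "parent_income": 1_500_000}
row = (income_band(raw["parent_income"]), timeliness_label(raw["paid_on"], raw["due_on"]))
print(row)  # ('low', 'TERLAMBAT')
```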

Table 5. Training Data

Table 6. Classification Results
The results of the SPP payment data classification process with Microsoft Excel and RapidMiner are the same.

Table 7. Confusion Matrix of Testing Results (Microsoft Excel)