Weka - Data Mining Tool
Weka is a tool for big data and data mining. It is used for classification, experiments, and analysis over large data sets.
Installation Guide - Weka
Download Weka and follow the normal installation procedure. Once installation completes, you will get the following window, where you can begin your classification or experiment on different data sets with Weka.
Experiment (Iris Dataset):
In this experiment I use the Iris dataset with different algorithms to demonstrate classification using 10-fold cross-validation. The process is repeated 10 times to determine the results. (Note: the process is similar for all other datasets and algorithms.) For beginners: start Weka and click the Experimenter option.
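To make the "10 folds, 10 repetitions" setup concrete, here is a minimal sketch in plain Python (not Weka code) of how repeated k-fold cross-validation partitions a dataset. The function name `repeated_kfold` and the seed handling are assumptions made for illustration. The sketch also splits at random, without the class-stratified folds a real tool would normally use.

```python
import random

def repeated_kfold(n_instances, k=10, repetitions=10, seed=1):
    """Yield (repetition, fold, train_indices, test_indices) tuples."""
    for rep in range(repetitions):
        indices = list(range(n_instances))
        # A different shuffle per repetition gives different fold assignments.
        random.Random(seed + rep).shuffle(indices)
        for fold in range(k):
            test = indices[fold::k]  # every k-th shuffled instance goes to this fold
            test_set = set(test)
            train = [i for i in indices if i not in test_set]
            yield rep, fold, train, test

runs = list(repeated_kfold(150))  # the Iris dataset has 150 instances
print(len(runs))                          # 100 train/test runs in total
print(len(runs[0][2]), len(runs[0][3]))   # 135 training and 15 test instances
```

Each of the 100 runs trains on nine tenths of the data and tests on the remaining tenth, which is why the final figure is an average over many individual results.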
1. Using J48
Here I apply the J48 algorithm to the Iris dataset. The process is as follows.
There are three panels, starting with the Setup Panel:
· Click New to start a new experiment.
· Click Add new under Datasets to add a dataset, i.e. iris.arff.
· Click Add new under Algorithms to add an algorithm, i.e. J48.
· The experiment type is cross-validation by default.
· The task is classification by default.
We can also choose other experiment types, such as percentage split, switch from classification to regression, and set the number of repetitions under Iteration Control.
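For comparison with cross-validation, a percentage split simply holds out a fixed share of the data for testing. The following is a minimal plain-Python sketch, not Weka code. The function name `percentage_split` and the 66 percent training share are assumptions chosen for illustration.

```python
import random

def percentage_split(n_instances, train_percent=66.0, seed=1):
    """Shuffle instance indices and cut them into a train and a test part."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_train = round(n_instances * train_percent / 100)
    return indices[:n_train], indices[n_train:]

train, test = percentage_split(150)  # the Iris dataset has 150 instances
print(len(train), len(test))         # 99 51
```

Unlike cross-validation, each instance is used for testing at most once per split, so the estimate depends more heavily on which instances happen to land in the test part.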
Run Panel
· Click Start to run the configured algorithms on the datasets.
You can also click Stop to abort the process. The log window shows the start time, 23:14:09 (recorded when you click Start), and the finish time, also 23:14:09. It reports no errors, and the status reads Not running. Overall, J48 takes a negligible amount of time and produces the experiment (classification) results without any errors.
Analyse Panel:
· Click Experiment to load the last-run experiment.
· Select options, e.g. standard deviations, if you want them in your result.
· Click Perform Test to get the result summary (test output).
You can find the available results under the result list, and you can also save the output.
The significance level is 0.05 (5 percent) by default. In the output, v marks a result that is significantly better than the baseline, while * marks one that is significantly worse, based on that significance level.
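To show how the v and * markers can be derived, here is a rough plain-Python sketch of a corrected paired t-test over per-run accuracies. It is an illustration under stated assumptions, not Weka's own code: the function name `compare` is hypothetical, the test/train ratio of 1/9 assumes 10-fold cross-validation, the critical value 1.984 assumes a two-tailed test at 0.05 with 99 degrees of freedom (100 runs), and the accuracy lists are made-up numbers.

```python
import math
from statistics import mean, variance

def compare(baseline, other, test_train_ratio=1 / 9, t_critical=1.984):
    """Return 'v' (significantly better), '*' (significantly worse) or ' '."""
    diffs = [o - b for o, b in zip(other, baseline)]
    k = len(diffs)
    var = variance(diffs)  # sample variance of the paired differences
    if var == 0:
        # Identical differences everywhere: decide on the sign alone.
        avg = mean(diffs)
        return "v" if avg > 0 else "*" if avg < 0 else " "
    # The extra test/train term corrects for overlapping training sets.
    t = mean(diffs) / math.sqrt((1 / k + test_train_ratio) * var)
    if t > t_critical:
        return "v"
    if t < -t_critical:
        return "*"
    return " "

# Made-up per-run accuracies for 100 runs (10 repetitions x 10 folds).
baseline = [90.0 + (i % 3) for i in range(100)]
better = [b + 4.0 + (i % 2) for i, b in enumerate(baseline)]
worse = [b - 4.0 - (i % 2) for i, b in enumerate(baseline)]
similar = [b + (1.0 if i % 2 else -1.0) for i, b in enumerate(baseline)]

print(repr(compare(baseline, better)))   # 'v'
print(repr(compare(baseline, worse)))    # '*'
print(repr(compare(baseline, similar)))  # ' '
```

With only one algorithm in the experiment, as in the J48 run here, there is nothing to compare against, so neither marker appears. They become useful once a second algorithm is added.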
Evaluating J48 on the Iris dataset: we got an average of 95.33 percent correct using J48 on the Iris dataset. Because this is 10-fold cross-validation, to see the individual results we can save them to a CSV file from the Setup panel.
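Once the individual results are saved, they can be summarised outside Weka. The sketch below uses only Python's standard library and assumes the CSV has one row per run and fold, with columns named `Key_Run` and `Percent_correct`. Check the header of your own file, since these names are an assumption here. The sample rows are made up purely to keep the example self-contained.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

def mean_percent_correct_per_run(csv_file, run_col="Key_Run",
                                 value_col="Percent_correct"):
    """Average the per-fold accuracy within each repetition (run)."""
    per_run = defaultdict(list)
    for row in csv.DictReader(csv_file):
        per_run[int(row[run_col])].append(float(row[value_col]))
    return {run: mean(values) for run, values in sorted(per_run.items())}

# Made-up sample standing in for a saved results file.
sample = io.StringIO(
    "Key_Dataset,Key_Run,Key_Fold,Key_Scheme,Percent_correct\n"
    "iris,1,1,J48,100.0\n"
    "iris,1,2,J48,90.0\n"
    "iris,2,1,J48,95.0\n"
    "iris,2,2,J48,85.0\n"
)
result = mean_percent_correct_per_run(sample)
print(result)  # {1: 95.0, 2: 90.0}
```

For a real file, replace the in-memory sample with an opened file, for example `open("results.csv", newline="")`, where `results.csv` is a hypothetical file name.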