Showing posts with the label big data

Featured Post

Data Mining with Weka - Installation

Weka - Data Mining Tool
Weka is a tool for big data and data mining. It is used for various classification tasks, experiments, and analyses over large datasets.
Installation Guide - Weka
You can download Weka from here and follow the normal installation procedure. After installation completes, you will see the Weka GUI Chooser window, from which you can begin classification or experiments on different datasets with Weka.

Data Mining With Weka - Experiment (Iris Dataset) & J48

Weka - Data Mining Tool
Experiment (Iris Dataset): In this experiment I use the Iris dataset and different algorithms to demonstrate classification with 10-fold cross-validation; the whole process is repeated 10 times to determine the results. (Note: the procedure for all other datasets and algorithms is similar.) For beginners: start Weka and click the Experimenter option.
1. Using J48
Here I apply the J48 algorithm to the Iris dataset. The process is as follows. There are three panels, starting with the Setup panel:
· Click New to start a new experiment.
· Click Add new under Datasets to add a new dataset, i.e. Iris.arff.
· Click Add new under Algorithms to add a new algorithm, i.e. J48.
· ...
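The Experimenter setup described above (10-fold cross-validation repeated 10 times) produces 100 train/test runs per dataset and algorithm. The plain-Python sketch below shows how those splits arise; it is an illustration under stated assumptions, not Weka's implementation: the function name `repeated_kfold_splits` and the seeding scheme are hypothetical, and it uses plain (unstratified) folds.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold_splits(
    n_instances: int, n_folds: int = 10, n_repetitions: int = 10, seed: int = 1
) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (repetition, fold, train_indices, test_indices) for repeated k-fold CV.

    Hypothetical helper: each repetition shuffles the instance indices with a
    different seed, then splits them into n_folds roughly equal folds; every
    fold serves once as the test set while the remaining folds form the
    training set. Folds are NOT stratified by class.
    """
    for rep in range(n_repetitions):
        indices = list(range(n_instances))
        random.Random(seed + rep).shuffle(indices)  # new random order per repetition
        # Fold f takes every n_folds-th shuffled index starting at position f.
        folds = [indices[f::n_folds] for f in range(n_folds)]
        for f in range(n_folds):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield rep, f, train, test


if __name__ == "__main__":
    # The Iris dataset has 150 instances; 10 folds x 10 repetitions = 100 runs.
    splits = list(repeated_kfold_splits(150))
    print(len(splits))                           # 100
    print(len(splits[0][2]), len(splits[0][3]))  # 135 15
```

Each instance lands in exactly one test fold per repetition, so averaging over all 100 runs uses every instance for testing 10 times.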

Data Mining With Weka - Algorithms

Weka - Data Mining Tool
Algorithms: There are many algorithms in Weka for classification and experiments; some of the major and most widely used are the following:
· Decision tree (J48): NAME: weka.classifiers.trees.J48 SYNOPSIS: Class for generating a pruned or unpruned C4.5 decision tree.
· Naïve Bayes: NAME: weka.classifiers.bayes.NaiveBayes SYNOPSIS: Class for a Naive Bayes classifier using estimator classes. Numeric estimator precision values are chosen based on analysis of the training data. For this reason, the classifier is not an UpdateableClassifier (which in typical usage are initialized with zero training instances).
· KNN (IBk): NAME: weka.classifiers.lazy.IBk SYNOPSIS: K-nearest neighbours classifier. Can select appropriate value of K based on cross-validation. Can also do distance weighting.
· SVM (LibSVM): NAME: weka.classifiers.functions.LibSVM SYNOPSIS: A wrapper class for the libsvm tools (the li...
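To make the IBk synopsis concrete, here is a minimal k-nearest-neighbours classifier in plain Python with optional 1/distance vote weighting (one of the weighting schemes IBk offers). It is a sketch, not Weka's code: the function name `knn_predict`, the plain Euclidean distance without attribute normalisation, and the sample data are assumptions, and it does not select K by cross-validation.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def knn_predict(
    train: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
    inverse_distance: bool = False,
) -> str:
    """Predict the class of `query` from its k nearest training instances.

    Hypothetical helper using unnormalised Euclidean distance. With
    inverse_distance=True each neighbour's vote is weighted by 1/distance;
    otherwise every neighbour's vote counts 1.
    """
    # Sort training instances by distance to the query and keep the k closest.
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in train), key=lambda t: t[0]
    )[:k]
    votes: Dict[str, float] = defaultdict(float)
    for d, label in neighbours:
        if inverse_distance:
            # Guard against division by zero when the query matches an instance exactly.
            votes[label] += 1.0 / max(d, 1e-9)
        else:
            votes[label] += 1.0
    # The class with the largest (weighted) vote total wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical 2-D training data with two classes, "a" and "b".
    train = [((1.0, 1.0), "a"), ((1.2, 0.9), "a"),
             ((5.0, 5.0), "b"), ((5.1, 4.8), "b"), ((4.9, 5.2), "b")]
    query = (1.1, 1.0)
    print(knn_predict(train, query, k=3))                         # a
    print(knn_predict(train, query, k=5))                         # b (3 votes to 2)
    print(knn_predict(train, query, k=5, inverse_distance=True))  # a (close neighbours dominate)
```

The last two calls show why distance weighting matters: with a large K, unweighted voting lets distant neighbours outvote close ones.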

Data Mining With Weka - Description of Datasets

Weka - Data Mining Tool
Description of Datasets
Description of Adult dataset:
Name: Adult
Number of instances: 32561
Number of attributes: 15
Description of attributes:
· Age: Type: Numeric, Missing: 0, Distinct: 73
· Workclass: Type: Nominal, Missing: 0, Distinct: 9
· Fnlwgt: Type: Numeric, Missing: 0, Distinct: 21648
· Education: Type: Nominal, Missing: 0, Distinct: 16
· Education-num: Type: Numeric, Missing: 0, Distinct: 16
· Marital-status: Type: Nominal, Missing: 0, Distinct: 7
· Occupation: Type: Nominal, Missing: 0, Distinct: 15
· ...
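The per-attribute figures listed above (Type, Missing and Distinct) are the kind of summary Weka's Preprocess panel reports. The sketch below computes them for one column of raw text values in plain Python, assuming missing values are written as `?` as in ARFF files. It is not Weka's implementation: the function name `summarise_attribute` and the sample values are hypothetical, and it infers the type from the raw text, whereas an ARFF file declares each attribute's type in its header.

```python
from typing import Dict, List


def summarise_attribute(values: List[str]) -> Dict[str, object]:
    """Return the type, missing count and distinct count for one attribute column.

    Hypothetical helper: "?" (ignoring surrounding spaces) marks a missing value,
    as in ARFF files.
    """
    present = [v.strip() for v in values if v.strip() != "?"]
    missing = len(values) - len(present)

    def is_number(text: str) -> bool:
        try:
            float(text)
            return True
        except ValueError:
            return False

    # Treat the attribute as numeric only if every non-missing value parses as a number.
    kind = "Numeric" if present and all(is_number(v) for v in present) else "Nominal"
    # Count distinct non-missing values; numeric ones are compared as numbers,
    # so "39" and "39.0" count as the same value.
    distinct = len({float(v) for v in present}) if kind == "Numeric" else len(set(present))
    return {"Type": kind, "Missing": missing, "Distinct": distinct}


if __name__ == "__main__":
    print(summarise_attribute(["39", "50", "38", "39", "?"]))
    # {'Type': 'Numeric', 'Missing': 1, 'Distinct': 3}
    print(summarise_attribute(["State-gov", "Private", " Private", " ?"]))
    # {'Type': 'Nominal', 'Missing': 1, 'Distinct': 2}
```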

Data Mining With Weka - Datasets

Weka - Data Mining Tool
Data Sets
You can download the datasets from here (e.g. Adult, Iris, Zoo, etc., used in these posts) and save them in “.arff” format. The datasets contain numeric values, so I converted them into nominal values so that they can be processed by the algorithms used in Weka, using the following process:
· Click Open file to open the dataset file, e.g. Adult.arff.
· Select Choose > filters > unsupervised > attribute > NumericToNominal.
· Click All to apply the change to all attributes.
· Click Apply.
· Check the result: Type: Nominal.
· Click Save to save the result, e.g. Adult.arff.
Furthermore, we can find the number of attributes (15), the number of instances (32562), and the Relation name, which reflects the conversion to nominal. The same procedure is applied to all datasets; e.g. in Iris, we can find ...
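Conceptually, the NumericToNominal conversion described above turns each distinct numeric value of an attribute into a nominal label. The sketch below mimics that idea on a single column and builds the resulting ARFF `@attribute` declaration. It is a simplified illustration, not the filter's actual code: the function names `numeric_to_nominal` and `arff_attribute_line` are hypothetical, labels are sorted numerically, and writing whole numbers without a trailing ".0" is a formatting choice of this sketch.

```python
from typing import List, Optional, Tuple


def numeric_to_nominal(
    column: List[Optional[float]],
) -> Tuple[List[str], List[Optional[str]]]:
    """Convert a numeric column into nominal labels (hypothetical helper).

    Returns (domain, converted): domain is the numerically sorted list of
    distinct labels; converted holds each value's label, with None (missing)
    left as None.
    """
    def label(x: float) -> str:
        # Write whole numbers without a trailing ".0" (a choice of this sketch).
        return str(int(x)) if float(x).is_integer() else str(x)

    distinct = sorted({x for x in column if x is not None})
    domain = [label(x) for x in distinct]
    converted = [None if x is None else label(x) for x in column]
    return domain, converted


def arff_attribute_line(name: str, domain: List[str]) -> str:
    """Build an ARFF nominal attribute declaration, e.g. @attribute age {38,39}."""
    return "@attribute " + name + " {" + ",".join(domain) + "}"


if __name__ == "__main__":
    ages = [39.0, 50.0, 38.0, 39.0, None]  # hypothetical values; None means missing
    domain, converted = numeric_to_nominal(ages)
    print(arff_attribute_line("age", domain))  # @attribute age {38,39,50}
    print(converted)                           # ['39', '50', '38', '39', None]
```

Because every distinct value becomes its own label, a numeric attribute with many distinct values (such as Fnlwgt in Adult) yields a very large nominal domain.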

Yahoo! S4 - Analysis of a Distributed System

Analysis of a Distributed System - Yahoo! S4
S4 was designed in the context of a search engine (Yahoo! Search) to support data mining and machine learning algorithms, and it was inspired by the MapReduce model. That model makes it possible to parallelize and distribute batch processing tasks and operations across immense clusters with little or no human intervention for issues such as failover management. S4 is a low-latency, scalable stream processing engine that automatically processes the event flow at the given data rate. Unlike Hadoop, the popular MapReduce-based batch processing system (which typically operates on static data by scheduling batch jobs), S4 is a stream processing system. To process a stream on a MapReduce platform, by contrast, the input data must be partitioned into fixed-size segments, and the latency is proportional to the length of a segment plus the overhead required for segmentation and for initiating processing jobs; evidently this is a tradeoff between latency and segmenta...
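The latency argument above can be made concrete with a little arithmetic. Assuming events arrive uniformly and each fixed-size segment is processed only after it closes, an event arriving just after a segment opens waits the full segment length plus the fixed overhead, while one arriving just before it closes waits only the overhead. The sketch below computes these bounds; the function name `segment_latency`, the uniform-arrival model and the example numbers are hypothetical, processing time itself is ignored, and it models only the segmented batch approach, not S4.

```python
from typing import Dict


def segment_latency(segment_seconds: float, overhead_seconds: float) -> Dict[str, float]:
    """Latency bounds when a stream is processed in fixed-size segments.

    Hypothetical model: events arrive uniformly within a segment, and a batch
    job starts only once the segment closes, paying a fixed overhead for
    segmentation and job start-up (processing time itself is ignored).
    """
    return {
        "best": overhead_seconds,                           # arrived just before the segment closed
        "average": segment_seconds / 2 + overhead_seconds,  # uniform arrivals wait half a segment on average
        "worst": segment_seconds + overhead_seconds,        # arrived just after the segment opened
    }


if __name__ == "__main__":
    # Shorter segments cut latency, but more segments per hour means the fixed
    # overhead is paid more often: the tradeoff described in the text.
    for seg in (60.0, 10.0, 1.0):
        lat = segment_latency(seg, overhead_seconds=5.0)
        print(f"segment={seg}s worst={lat['worst']}s jobs/hour={3600 / seg}")
```

Once the segment is much shorter than the overhead, the overhead dominates the latency, which is the motivation for a true stream processing engine such as S4.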