Posts

Showing posts with the label S4

Featured Post

Data Mining with Weka - Installation

Weka - Data Mining Tool. Weka is a tool for big data and data mining. It is used for various classification tasks, experiments, and analyses over large data sets. Installation Guide - Weka: You can download Weka from here and follow the normal installation procedure. After installation completes you will see the following window, where you can begin your classification or experiment on different data sets with Weka.

Yahoo! S4 - Analysis of Distributed System

Analysis of Distributed System - Yahoo! S4. S4 was designed in the context of a search engine (the Yahoo! search engine) to support data mining and machine learning algorithms, and it was inspired by the MapReduce model. MapReduce makes it possible to parallelize and distribute batch processing tasks and operations across immense clusters with little or no human intervention over issues like failover management. S4 itself is a low-latency, scalable stream processing engine that handles a flow of events arriving at a given data rate. Unlike Hadoop (the popular batch processing system), S4 is not a batch system: MapReduce systems typically operate on static data by scheduling batch jobs. Processing a stream with MapReduce would instead require partitioning the input data into fixed-size segments to be processed by the MapReduce platform, where latency is proportional to the length of the segment plus the overhead required to perform the segmentation and initiate the processing jobs; it is a tradeoff between latency and segmenta
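The latency relationship described above can be sketched numerically. This is only an illustrative model, not anything taken from S4 or Hadoop: the function names, the fixed per-job overhead, and the proportionality constant of 1 are all assumptions made for the example.

```python
def segment_latency(segment_seconds: float, overhead_seconds: float) -> float:
    """Rough worst-case delay for an event processed in fixed-size segments.

    Assumed model: an event may wait for the whole segment to fill, and then
    for the segmentation and job start-up overhead (constant of 1 assumed).
    """
    return segment_seconds + overhead_seconds


def overhead_fraction(segment_seconds: float, overhead_seconds: float) -> float:
    """Share of the total delay that is spent on overhead rather than data."""
    return overhead_seconds / (segment_seconds + overhead_seconds)


if __name__ == "__main__":
    overhead = 2.0  # hypothetical per-job overhead in seconds
    for segment in (1.0, 10.0, 60.0):
        latency = segment_latency(segment, overhead)
        share = overhead_fraction(segment, overhead)
        print(f"segment={segment:5.1f}s  latency={latency:5.1f}s  overhead share={share:.2f}")
```

Under these assumptions, shrinking the segment lowers latency but makes the overhead a larger share of the work, while growing the segment does the opposite.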