Showing posts with the label latency

Featured Post

Data Mining with Weka - Installation

Weka - Data Mining Tool. Weka is a tool for big data and data mining. It is used for various classification tasks, experiments, and analyses over large data sets. Installation Guide - Weka: You can download Weka from here and follow the normal installation procedure. After completion you will see the following window, where you can begin your classification or experiment on different data sets with Weka.

Twitter Storm - Analysis of Distributed System

Analysis of Distributed System - Twitter Storm. Storm is a real-time distributed stream data processing system with fault-tolerant capabilities. Storm is used at Twitter, one of today's most popular social networking sites, to run various critical computations at large scale and in real time. Storm has a simple distributed system architecture and is built with features such as fault tolerance, low latency, efficiency, and easy human intervention, and it can be used for very complex distributed stream data processing. It uses a very simple topology for functions such as (tweet) word count at Twitter. Storm uses Nimbus and Zookeeper in its internal architecture, which handle supervision and the message flow between workers. Furthermore, a popular method for generating Storm topologies at Twitter is by using Summingbird. Summingbird is a general stream processing abstraction, which provides a separate logical planner tha
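The (tweet) word count mentioned in the excerpt above can be sketched in plain Python to show the shape of such a topology: a source emits tweets, one stage splits them into words, and another keeps running counts. This is only an illustration under stated assumptions; it does not use Storm's API, it runs in a single process, and the names `tweet_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Iterator


def tweet_spout(tweets: Iterable[str]) -> Iterator[str]:
    """Emit tweets one at a time, standing in for an unbounded source."""
    for tweet in tweets:
        yield tweet


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Split each tweet into lowercase words and emit them individually."""
    for tweet in stream:
        for word in tweet.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Keep a running count per word and emit the updated count each time."""
    counts: Counter[str] = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(tweets: Iterable[str]) -> dict[str, int]:
    """Wire the stages together (hypothetical helper) and keep the latest counts."""
    latest: dict[str, int] = {}
    for word, count in count_bolt(split_bolt(tweet_spout(tweets))):
        latest[word] = count
    return latest
```

Calling `run_topology(["storm is fast", "Storm is fault tolerant"])` processes the tweets one event at a time rather than as a batch, which is the property the excerpt attributes to stream processing.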

Yahoo! S4 - Analysis of Distributed System

Analysis of Distributed System - Yahoo! S4. S4 was designed in the context of a search engine (Yahoo! Search), which relies on data mining and machine learning algorithms, and it is inspired by the MapReduce model. It makes it possible to parallelize and distribute processing tasks and operations across immense clusters with little or no human intervention over issues like failover management. It is a low-latency, scalable stream processing engine in which events flow into the system at a given data rate. Unlike Hadoop (the popular batch processing system), S4 is a stream processing system; MapReduce systems typically operate on static data by scheduling batch jobs. Using a batch platform on a stream instead requires partitioning the input data into fixed-size segments to be processed by the MapReduce platform, where latency is proportional to the length of the segment plus the overhead required for segmentation and for initiating the processing jobs; it is a tradeoff between latency and segmenta
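The latency relationship described in the excerpt above can be illustrated with a small calculation. This is a minimal sketch, assuming latency is simply the segment length plus a fixed per-segment overhead (a proportionality constant of 1); the function names and the sample numbers are hypothetical and not taken from the S4 work.

```python
def estimated_latency(segment_seconds: float, overhead_seconds: float) -> float:
    """Latency of one segment: wait for the segment to fill, then pay the
    overhead of segmentation and job start-up (assumed constant)."""
    return segment_seconds + overhead_seconds


def overhead_share(segment_seconds: float, overhead_seconds: float) -> float:
    """Fraction of the total latency spent on overhead rather than on data."""
    return overhead_seconds / estimated_latency(segment_seconds, overhead_seconds)


if __name__ == "__main__":
    # Shrinking the segment lowers latency but raises the overhead share.
    for segment in (60.0, 8.0, 1.0):
        latency = estimated_latency(segment, 2.0)
        share = overhead_share(segment, 2.0)
        print(f"segment={segment:5.1f}s latency={latency:5.1f}s overhead share={share:.2f}")
```

Under these assumptions, smaller segments reduce latency while a larger fraction of each job goes to overhead, which is the tradeoff the excerpt points at.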