Weka - Data Mining Tool
Weka is a tool for big data and data mining. It is used to run various classifications, experiments, and analyses over large data sets.
Installation Guide - Weka
You can download Weka from here and follow the normal installation procedure. After completion you will get the following window, where you can begin your classification or experiment on different data sets with Weka.
Analysis of a Distributed System - Twitter Storm
Storm is a real-time, fault-tolerant distributed system for processing streams of data. It is used at Twitter, one of today's most popular social networking sites, to run various critical computations at large scale and in real time.
Storm has a simple distributed system architecture and is built around features such as fault tolerance, low latency, efficiency, and easy human intervention. It can be used for very complex distributed stream data processing, yet it expresses each job as a very simple topology, for example the (tweet) word count used at Twitter.
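To make the word count topology more concrete, here is a minimal sketch in plain Python. It does not use the Storm API; the names `tweet_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical and only mimic the usual shape of such a topology, where a source of tuples (spout) feeds a chain of processing steps (bolts). It also runs in a single process, whereas Storm would spread these steps across many machines.

```python
from collections import Counter
from typing import Iterable, Iterator


def tweet_spout(tweets: Iterable[str]) -> Iterator[str]:
    """Spout stand-in: the source of the stream, emitting one tweet at a time."""
    for tweet in tweets:
        yield tweet


def split_bolt(stream: Iterable[str]) -> Iterator[str]:
    """Bolt stand-in: split each incoming tweet into lowercase words."""
    for tweet in stream:
        for word in tweet.lower().split():
            yield word


def count_bolt(stream: Iterable[str]) -> Counter:
    """Bolt stand-in: keep a running count for every word received."""
    counts: Counter = Counter()
    for word in stream:
        counts[word] += 1
    return counts


def run_topology(tweets: Iterable[str]) -> Counter:
    """Wire the steps together: spout -> split bolt -> count bolt."""
    return count_bolt(split_bolt(tweet_spout(tweets)))


if __name__ == "__main__":
    sample = ["Storm is fast", "storm is fault tolerant"]
    print(run_topology(sample))
```

Because each step only consumes the output of the previous one, the same structure could in principle be parallelized by running several copies of each step, which is roughly what a real topology does.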
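Internally, Storm relies on Nimbus and ZooKeeper to coordinate the cluster. Nimbus is the master node that assigns work to Supervisors, the Supervisors start and monitor the worker processes, and the coordination state between them is kept in ZooKeeper.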
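The coordination pattern can be sketched roughly as follows. This is an illustration only, under loud assumptions: a plain Python dictionary stands in for ZooKeeper, the round-robin placement is a simplification and not Storm's actual scheduler, and `nimbus_assign` and `supervisor_poll` are hypothetical names. The point is that the master writes assignments to shared state and each supervisor reads its own share from there.

```python
from typing import Dict, List

# Stand-in for ZooKeeper: a shared key-value store holding cluster state.
StateStore = Dict[str, List[str]]


def nimbus_assign(tasks: List[str], supervisors: List[str], store: StateStore) -> None:
    """Nimbus role: spread tasks over supervisors round-robin and record the
    assignments in the shared store rather than contacting supervisors directly."""
    if not supervisors:
        raise ValueError("at least one supervisor is required")
    for name in supervisors:
        store[name] = []
    for i, task in enumerate(tasks):
        store[supervisors[i % len(supervisors)]].append(task)


def supervisor_poll(name: str, store: StateStore) -> List[str]:
    """Supervisor role: read this supervisor's assignments from the store and
    return the tasks it should run workers for."""
    return list(store.get(name, []))


if __name__ == "__main__":
    store: StateStore = {}
    nimbus_assign(["spout-1", "split-1", "count-1"], ["sup-a", "sup-b"], store)
    for sup in ("sup-a", "sup-b"):
        print(sup, supervisor_poll(sup, store))
```

Keeping the assignments in a separate store means a restarted master or supervisor can pick up the current state again instead of losing it.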
Furthermore, a popular method for generating Storm topologies at Twitter is Summingbird. Summingbird is a general stream processing abstraction that provides a separate logical planner, which can map a job to a variety of stream processing and batch processing systems.
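The idea of one logical job mapped onto different execution systems can be sketched as below. This is not Summingbird's API (Summingbird is a Scala library); `word_keys`, `run_batch`, and `run_streaming` are hypothetical names, and the two "planners" are just two Python functions that run the same job description in a batch style and in a streaming style.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List

# A logical job: how to turn one record into keys to be counted.
# It says nothing about whether it runs as a batch or as a stream.
LogicalJob = Callable[[str], List[str]]


def word_keys(record: str) -> List[str]:
    """Example logical job: one key per lowercase word in the record."""
    return record.lower().split()


def run_batch(job: LogicalJob, records: Iterable[str]) -> Counter:
    """Batch-style execution: consume the whole data set, return final totals."""
    totals: Counter = Counter()
    for record in records:
        totals.update(job(record))
    return totals


def run_streaming(job: LogicalJob, records: Iterable[str]) -> Iterator[Counter]:
    """Streaming-style execution: yield a snapshot of the running totals
    after each record arrives."""
    totals: Counter = Counter()
    for record in records:
        totals.update(job(record))
        yield Counter(totals)


if __name__ == "__main__":
    data = ["a b", "b c"]
    print(run_batch(word_keys, data))
    for snapshot in run_streaming(word_keys, data):
        print(snapshot)
```

In this sketch the last streaming snapshot equals the batch result, which is the property that lets the same job definition serve both kinds of system.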
Storm is a critical piece of infrastructure at Twitter that powers many of the real-time, data-driven decisions made there. The use of Storm at Twitter is expanding rapidly, which raises a number of potentially interesting directions for future work.
These include automatically optimizing the topology (intra-bolt parallelism and the packaging of tasks in executors) statically, and re-optimizing it dynamically at runtime without incurring a big performance impact. In addition, the Storm team at Twitter wants to improve the visualization tools, improve the reliability of certain parts (e.g., move the state stored on local disk on Nimbus to a more fault-tolerant system like HDFS), provide better integration of Storm with Hadoop, and potentially use Storm to monitor, react, and adapt itself to improve the configuration of running topologies.
Another interesting direction for upcoming work on Storm might be to support a declarative query paradigm that still allows easy extensibility.