
Dell PowerEdge C5220 Test Report page 4

WHAT WE TESTED
Hadoop
MapReduce
Dell PowerEdge C5220: Hadoop MapReduce Performance
Reuse or repurpose servers easily when workloads change. Because the server nodes
are hot-swappable, you no longer incur downtime from replacing the entire server
chassis.
Designed with power efficiency and maintainability in mind, the Dell PowerEdge
C5220 maximizes operating efficiency with a shared-infrastructure design. To learn
more about the Dell PowerEdge C5220 and the entire Dell PowerEdge C Series, visit
http://www.dell.com/us/enterprise/p/poweredge-cloud-servers.
To test the ability of the PowerEdge C5220 microserver to handle large data
processing tasks, we used Hadoop, specifically Cloudera Distribution Including Apache
Hadoop (CDH). Below, we briefly discuss Hadoop and the benchmark tool we used,
MapReduce benchmark (mrbench).
Hadoop, developed by the Apache Software Foundation, is an open-source
distributed computing framework that enables the analysis of large volumes of data
for specific purposes. Using Hadoop's framework, IT organizations and researchers can
build applications that tailor the data analysis to each company's specific needs, even
when the data is unstructured. Many different markets, among them finance, IT, and
retail, use Hadoop because it can handle heterogeneous data, both structured and
unstructured.
Hadoop can run across any number of machines with varied hardware. Its distributed
file system, the Hadoop Distributed File System (HDFS), spreads data across all
available hardware resources and replicates it to minimize loss if a hardware
malfunction occurs. The software can detect hardware failures and work around them
to keep data accessible without interruption. Because it runs on different hardware, a
Hadoop cluster is scalable and flexible: it can be expanded as a company and its data
grow. It is also cost-effective, because it lets companies make effective use of
commodity hardware.
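
To illustrate why replication keeps data readable after a hardware failure, the sketch
below models a toy cluster in Python. It is not HDFS code: the function names
place_blocks and readable_blocks, the node names, and the round-robin placement are
all assumptions made for illustration. Real HDFS uses a rack-aware placement policy,
and only the default replication factor of 3 is taken from HDFS itself.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    placement = {}
    for block in range(num_blocks):
        # Consecutive nodes, wrapping around, so replicas never share a node.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a healthy node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
    placement = place_blocks(8, nodes)
    # With three replicas per block, losing any two nodes leaves every block readable.
    print(readable_blocks(placement, {"node1", "node2"}))
```

In this toy model, data becomes unreadable only when every node holding a block's
replicas fails at once, which is the property the paragraph above describes.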
MapReduce is a framework within Hadoop that processes extremely large datasets in
parallel across the Hadoop cluster, greatly shortening overall processing time.
MapReduce breaks input data down into chunks to be processed across the cluster.
When an application runs on a Hadoop cluster, MapReduce performs "map" tasks that
process the chunks in parallel. Their output is then sent to "reduce" tasks that combine
the information into a final result. This allows faster data processing using multiple
nodes while still producing a single, comprehensive, accurate result.
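
The map and reduce steps described above can be sketched in a few lines of Python. This
is a single-machine illustration of the programming model, not Hadoop's API and not the
mrbench workload: the word-count task, the function names, and the use of a thread pool
to stand in for cluster nodes are all assumptions chosen for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_input(lines, num_chunks):
    """Break the input into chunks, one per map task."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the intermediate values by key so each reduce task sees one key."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines, num_chunks=3):
    chunks = split_input(lines, num_chunks)
    # The thread pool stands in for cluster nodes running map tasks in parallel.
    with ThreadPoolExecutor() as pool:
        mapped_outputs = list(pool.map(map_task, chunks))
    groups = shuffle(mapped_outputs)
    return dict(reduce_task(key, values) for key, values in groups.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(sample))
```

However the input is split into chunks, the reduce step yields the same totals, which
mirrors the point that parallel processing still produces a single result.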
A Principled Technologies test report 4
