Dell PowerEdge C5220 Test Report page 4


WHAT WE TESTED

Hadoop

MapReduce

Dell PowerEdge C5220: Hadoop MapReduce Performance



Reuse or repurpose servers easily when workloads change with hot-swap server nodes; you no longer need to incur downtime by replacing the entire server chassis.

Designed with power efficiency and maintainability in mind, the Dell PowerEdge C5220 maximizes operating efficiency with a shared-infrastructure design. To learn more about the Dell PowerEdge C5220 and the entire Dell PowerEdge C Series, visit http://www.dell.com/us/enterprise/p/poweredge-cloud-servers.

To test the ability of the PowerEdge C5220 microserver to handle large data processing tasks, we used Hadoop, specifically Cloudera Distribution Including Apache Hadoop (CDH). Below, we briefly discuss Hadoop and the benchmark tool we used, MapReduce benchmark (mrbench).

Hadoop, developed by the Apache Software Foundation, is an open-source distributed computing framework that enables the analysis of large volumes of data. Using Hadoop's framework, IT organizations and researchers can build applications that tailor data analysis to each company's specific needs, even when working with unstructured data. Many markets, among them finance, IT, and retail, use Hadoop because it can handle heterogeneous data, both structured and unstructured.

Hadoop can run across any number of machines using varied hardware. It spreads data across all available hardware resources using the Hadoop Distributed File System (HDFS) and replicates that data to minimize loss if a hardware malfunction occurs. The software can detect hardware failures and work around them to allow uninterrupted access to data. Because it runs on different hardware, a Hadoop cluster is scalable and flexible: it can be expanded to accommodate growing databases and companies. It is also cost-effective, because it allows companies to use commodity hardware effectively.
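The effect of replication can be illustrated with a toy model. The sketch below is not HDFS code and does not reproduce HDFS's actual placement policy; the round-robin placement, the node names, the replication factor of 3, and the function names `place_blocks` and `readable_blocks` are all assumptions made for illustration. It only shows why keeping several copies of each block on different machines lets data stay readable after a node fails.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes (simple round-robin).

    Hypothetical placement rule for illustration only; real HDFS uses a
    rack-aware policy that this sketch does not attempt to model.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + offset) % len(nodes)] for offset in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a healthy node."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


nodes = ["node1", "node2", "node3", "node4"]  # assumed four-node cluster
placement = place_blocks(num_blocks=6, nodes=nodes, replication=3)

# With one node down, every block still has a healthy replica.
after_one_failure = readable_blocks(placement, failed_nodes={"node2"})
print(len(after_one_failure), "of", len(placement), "blocks readable")
```

In this model a block becomes unreadable only when every node holding one of its replicas has failed, which is why a higher replication factor tolerates more simultaneous failures at the cost of more storage.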

MapReduce is a framework within Hadoop that processes extremely large datasets in parallel across the Hadoop cluster, greatly shortening overall processing time. MapReduce breaks input data into chunks to be processed across the cluster. When an application runs on a Hadoop cluster, MapReduce performs "map" tasks that process the chunks in parallel. The output of the map tasks is then sent to "reduce" tasks, which combine it into a final result. This allows faster data processing using multiple nodes while still producing a single, comprehensive, accurate result.
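The map and reduce steps described above can be sketched in plain Python using a word count, a common teaching example. This is a single-process illustration of the programming model, not Hadoop's API and not the workload used in this report. The function names, the sample input, and the intermediate grouping step (called `shuffle` here) are assumptions. On a real cluster the map tasks would run in parallel on different nodes; here they run one after another.

```python
from collections import defaultdict


def split_input(lines, num_chunks):
    """Break the input into chunks, one per map task."""
    return [lines[i::num_chunks] for i in range(num_chunks)]


def map_task(chunk):
    """Emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_outputs):
    """Group the values from all map tasks by key before reducing."""
    groups = defaultdict(list)
    for output in mapped_outputs:
        for key, value in output:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines, num_chunks=2):
    chunks = split_input(lines, num_chunks)
    # Sequential in this sketch; a cluster would run these map tasks in parallel.
    mapped = [map_task(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_task(key, values) for key, values in grouped.items())


lines = ["big data big cluster", "data node", "big node"]  # made-up sample input
counts = run_job(lines)
print(counts)
```

The number of chunks changes only how the work is divided, not the final counts, which mirrors the point that the job yields a single result regardless of how many nodes share the work.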

A Principled Technologies test report 4
