SlideShare a Scribd company logo
1 of 27
Download to read offline
Testing Hadoop jobs
    with MRUnit

 Boulder/Denver Hadoop Users Group
                        05.12.2010


                     © 2010 Eric Wendelin
Eric Wendelin
Hadooper @returnpath
Blog: eriwen.com
Twitter: @eriwen
What is MRUnit?

• Testing library for MapReduce
• Developed by Cloudera
• Easy integration between MapReduce
  and standard testing tools (e.g. JUnit)

  cloudera.com/hadoop-mrunit
Why do I need that?
Testing without MRUnit
• Write tests that create JobConf or
  Configuration   objects
 •   conf.set(‘mapred.job.tracker’, ‘local’)

• Developing new test input files stored
  alongside MapReduce test code
• Lots of work to validate output files
 • External file I/O makes tests slooooow
MRUnit makes testing
Hadoop jobs easier
Testing with MRUnit

• No external test input or output files
 • Programmatically specified
• Less test harness code (but also perhaps
  less control)
• Concise, fast tests
Example
class ExampleTest() {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

    @Before void setUp() {
      mapper = new Example.MyMapper()
      reducer = new Example.MyReducer()
      driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
Example
class ExampleTest() {
  private Example.MyMapper mapper
  private Example.MyReducer reducer
  private MapReduceDriver driver

    @Before void setUp() {
      mapper = new Example.MyMapper()
      reducer = new Example.MyReducer()
      driver = new MapReduceDriver(mapper, reducer)
    }

    @Test void testMapReduce() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
          .withOutput(new Text(‘c’), new Text(‘d’))
          .runTest()
    }
}
Test map and reduce
    separately
class ExampleTest() {
  private Example.MyMapper mapper
  private MapDriver driver

    @Before void setUp() {
       mapper = new Example.MyMapper()
       driver = new MapDriver(mapper)
     }

    @Test void testMap() {
      driver.withInput(new Text(‘a’), new Text(‘b’))
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
class ExampleTest() {
  private Example.MyReducer reducer
  private ReduceDriver driver

    @Before void setUp() {
       reducer = new Example.MyReducer()
       driver = new ReduceDriver(reducer)
     }

    @Test void testReduce() {
      driver.withInput(new Text(‘a’),
          [new Text(‘foo’), new Text(‘bar’)])
      driver.withOutput(new Text(‘c’), new Text(‘d’))
      driver.runTest()
    }
}
Counters!
driver.withInput(...)
driver.run()

def counters = driver.getCounters()

assertEquals(1, counters.findCounter
    (‘foo’, ‘bar’).getValue())
Verifying logging
def messages = []
def appender = [
    append: { messages.add(it) },
    requiresLayout: { false }
  ] as AppenderSkeleton
Logger.getRootLogger().addAppender(appender)

driver.runTest()

assertTrue messages.find {
    it.getLevel.toString() == ‘WARN’ &&
    it.getMessage().contains(‘My err’) }

Logger.getRootLogger().removeAppender(appender)
Cool stuff I haven’t
         tried...
• The   PipelineMapReduceDriver  - allows
  testing a series of MapReduce passes
 • Just call addMapReduce(mapper, reducer)
• Mock objects - MockReporter,
  MockInputSplit, and MockOutputCollector

• Test combiners with
  myMapReduceDriver.setCombiner(myCombiner)
Problems with MRUnit
Not useful for
streaming jobs
shell$ ./myMapper.py < test.input |
sort | ./myReducer.py > actual.out

shell$ diff expected.out actual.out
runTest()  does not
    give meaningful
information on failure
Better to use run() and
      then assert
driver.setInput(new Text(‘foo’),
    new Text(‘bar’))

def output = driver.run()

assertEquals ‘baz’, output[0].first
assertEquals ‘jy’, output[0].second
Documentation is
 severely lacking
runXxx()   calls setup()
called for new Hadoop
 API, but not old API
Tests are not executed
 in a distributed way
In Summary, MRUnit...

• Makes testing your Hadoop jobs easier
• Abstracts away a lot of the boilerplate test
  setup you need
• Has it’s problems
 • but they are outweighed by the benefits
?
cloudera.com/hadoop-mrunit


Blog: eriwen.com
Twitter: @eriwen
Email:
eric.wendelin@returnpath.net
                   © 2010 Eric Wendelin

More Related Content

What's hot

NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATAGauravBiswas9
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introductionsudhakara st
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)Prashant Gupta
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
 
A brief history of "big data"
A brief history of "big data"A brief history of "big data"
A brief history of "big data"Nicola Ferraro
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache HadoopAjit Koti
 
Comparison of Machine Learning Algorithms
Comparison of Machine Learning Algorithms Comparison of Machine Learning Algorithms
Comparison of Machine Learning Algorithms butest
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 

What's hot (20)

Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
NoSql
NoSqlNoSql
NoSql
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Consistency in NoSQL
Consistency in NoSQLConsistency in NoSQL
Consistency in NoSQL
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Apache Spark Introduction
Apache Spark IntroductionApache Spark Introduction
Apache Spark Introduction
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
 
A brief history of "big data"
A brief history of "big data"A brief history of "big data"
A brief history of "big data"
 
Hadoop technology
Hadoop technologyHadoop technology
Hadoop technology
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Comparison of Machine Learning Algorithms
Comparison of Machine Learning Algorithms Comparison of Machine Learning Algorithms
Comparison of Machine Learning Algorithms
 
SQOOP PPT
SQOOP PPTSQOOP PPT
SQOOP PPT
 
Big data unit i
Big data unit iBig data unit i
Big data unit i
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 

Viewers also liked

Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr AgileNCR2013
 
Groovy-er desktop applications with Griffon
Groovy-er desktop applications with GriffonGroovy-er desktop applications with Griffon
Groovy-er desktop applications with GriffonEric Wendelin
 
Javascript Stacktrace Ignite
Javascript Stacktrace IgniteJavascript Stacktrace Ignite
Javascript Stacktrace IgniteEric Wendelin
 
Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Eric Wendelin
 
JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!Eric Wendelin
 
Test your Javascript! v1.1
Test your Javascript! v1.1Test your Javascript! v1.1
Test your Javascript! v1.1Eric Wendelin
 
Error Detection and Correction
Error Detection and CorrectionError Detection and Correction
Error Detection and CorrectionTechiNerd
 
CRC Error coding technique
CRC Error coding techniqueCRC Error coding technique
CRC Error coding techniqueMantra VLSI
 
UNIT TESTING PPT
UNIT TESTING PPTUNIT TESTING PPT
UNIT TESTING PPTsuhasreddy1
 

Viewers also liked (14)

Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
Agile NCR 2013- Anirudh Bhatnagar - Hadoop unit testing agile ncr
 
Groovy-er desktop applications with Griffon
Groovy-er desktop applications with GriffonGroovy-er desktop applications with Griffon
Groovy-er desktop applications with Griffon
 
Javascript Stacktrace Ignite
Javascript Stacktrace IgniteJavascript Stacktrace Ignite
Javascript Stacktrace Ignite
 
Cn lec-06
Cn lec-06Cn lec-06
Cn lec-06
 
Intro to Hadoop
Intro to HadoopIntro to Hadoop
Intro to Hadoop
 
Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!Gradle 3.0: Unleash the Daemon!
Gradle 3.0: Unleash the Daemon!
 
Apache Avro and You
Apache Avro and YouApache Avro and You
Apache Avro and You
 
Piggybacking
PiggybackingPiggybacking
Piggybacking
 
JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!JavaScript + Jenkins = Winning!
JavaScript + Jenkins = Winning!
 
Test your Javascript! v1.1
Test your Javascript! v1.1Test your Javascript! v1.1
Test your Javascript! v1.1
 
Gradle by Example
Gradle by ExampleGradle by Example
Gradle by Example
 
Error Detection and Correction
Error Detection and CorrectionError Detection and Correction
Error Detection and Correction
 
CRC Error coding technique
CRC Error coding techniqueCRC Error coding technique
CRC Error coding technique
 
UNIT TESTING PPT
UNIT TESTING PPTUNIT TESTING PPT
UNIT TESTING PPT
 

Similar to Testing Hadoop jobs with MRUnit

An introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduceAn introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduceAnanth PackkilDurai
 
Testing multi outputformat based mapreduce
Testing multi outputformat based mapreduceTesting multi outputformat based mapreduce
Testing multi outputformat based mapreduceAshok Agarwal
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...MapR Technologies
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...CloudxLab
 
How and why i roll my own node.js framework
How and why i roll my own node.js frameworkHow and why i roll my own node.js framework
How and why i roll my own node.js frameworkBen Lin
 
Background Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRbBackground Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRbJuan Maiz
 
NetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience ReportNetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience ReportAnton Arhipov
 
AngularJS Testing Strategies
AngularJS Testing StrategiesAngularJS Testing Strategies
AngularJS Testing Strategiesnjpst8
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8Omar Bashir
 
Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13Jason Lotito
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good TestsTomek Kaczanowski
 
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...GeeksLab Odessa
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)Jerome Eteve
 
服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScriptQiangning Hong
 
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Jyotirmoy Sundi
 

Similar to Testing Hadoop jobs with MRUnit (20)

Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
 
An introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduceAn introduction to Test Driven Development on MapReduce
An introduction to Test Driven Development on MapReduce
 
Testing multi outputformat based mapreduce
Testing multi outputformat based mapreduceTesting multi outputformat based mapreduce
Testing multi outputformat based mapreduce
 
Shooting the Rapids
Shooting the RapidsShooting the Rapids
Shooting the Rapids
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...
 
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
Writing MapReduce Programs using Java | Big Data Hadoop Spark Tutorial | Clou...
 
How and why i roll my own node.js framework
How and why i roll my own node.js frameworkHow and why i roll my own node.js framework
How and why i roll my own node.js framework
 
Background Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRbBackground Jobs - Com BackgrounDRb
Background Jobs - Com BackgrounDRb
 
NetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience ReportNetBeans Plugin Development: JRebel Experience Report
NetBeans Plugin Development: JRebel Experience Report
 
AngularJS Testing Strategies
AngularJS Testing StrategiesAngularJS Testing Strategies
AngularJS Testing Strategies
 
Functional Programming in Java 8
Functional Programming in Java 8Functional Programming in Java 8
Functional Programming in Java 8
 
R console
R consoleR console
R console
 
Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13Load Testing with PHP and RedLine13
Load Testing with PHP and RedLine13
 
Testing in airflow
Testing in airflowTesting in airflow
Testing in airflow
 
33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests33rd Degree 2013, Bad Tests, Good Tests
33rd Degree 2013, Bad Tests, Good Tests
 
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
AI&BigData Lab.Руденко Петр. Automation and optimisation of machine learning ...
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript服务框架: Thrift & PasteScript
服务框架: Thrift & PasteScript
 
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
Cascading talk in Etsy (http://www.meetup.com/cascading/events/169390262/)
 

Recently uploaded

UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintMahmoud Rabie
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8DianaGray10
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsSeth Reyes
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 

Recently uploaded (20)

20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8UiPath Studio Web workshop series - Day 8
UiPath Studio Web workshop series - Day 8
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Computer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and HazardsComputer 10: Lesson 10 - Online Crimes and Hazards
Computer 10: Lesson 10 - Online Crimes and Hazards
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 

Testing Hadoop jobs with MRUnit

  • 1. Testing Hadoop jobs with MRUnit Boulder/Denver Hadoop Users Group 05.12.2010 © 2010 Eric Wendelin
  • 2. Eric Wendelin Hadooper @returnpath Blog: eriwen.com Twitter: @eriwen
  • 3. What is MRUnit? • Testing library for MapReduce • Developed by Cloudera • Easy integration between MapReduce and standard testing tools (e.g. JUnit) cloudera.com/hadoop-mrunit
  • 4. Why do I need that?
  • 5. Testing without MRUnit • Write tests that create JobConf or Configuration objects • conf.set(‘mapred.job.tracker’, ‘local’) • Developing new test input files stored alongside MapReduce test code • Lots of work to validate output files • External file I/O makes tests slooooow
  • 7. Testing with MRUnit • No external test input or output files • Programmatically specified • Less test harness code (but also perhaps less control) • Concise, fast tests
  • 8. Example class ExampleTest() { private Example.MyMapper mapper private Example.MyReducer reducer private MapReduceDriver driver @Before void setUp() { mapper = new Example.MyMapper() reducer = new Example.MyReducer() driver = new MapReduceDriver(mapper, reducer) } @Test void testMapReduce() { driver.withInput(new Text(‘a’), new Text(‘b’)) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 9. Example class ExampleTest() { private Example.MyMapper mapper private Example.MyReducer reducer private MapReduceDriver driver @Before void setUp() { mapper = new Example.MyMapper() reducer = new Example.MyReducer() driver = new MapReduceDriver(mapper, reducer) } @Test void testMapReduce() { driver.withInput(new Text(‘a’), new Text(‘b’)) .withOutput(new Text(‘c’), new Text(‘d’)) .runTest() } }
  • 10. Test map and reduce separately
  • 11. class ExampleTest() { private Example.MyMapper mapper private MapDriver driver @Before void setUp() { mapper = new Example.MyMapper() driver = new MapDriver(mapper) } @Test void testMap() { driver.withInput(new Text(‘a’), new Text(‘b’)) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 12. class ExampleTest() { private Example.MyReducer reducer private ReduceDriver driver @Before void setUp() { reducer = new Example.MyReducer() driver = new ReduceDriver(reducer) } @Test void testReduce() { driver.withInput(new Text(‘a’), [new Text(‘foo’), new Text(‘bar’)]) driver.withOutput(new Text(‘c’), new Text(‘d’)) driver.runTest() } }
  • 13. Counters! driver.withInput(...) driver.run() def counters = driver.getCounters() assertEquals(1, counters.findCounter (‘foo’, ‘bar’).getValue())
  • 14. Verifying logging def messages = [] def appender = [ append: { messages.add(it) }, requiresLayout: { false } ] as AppenderSkeleton Logger.getRootLogger().addAppender(appender) driver.runTest() assertTrue messages.find { it.getLevel.toString() == ‘WARN’ && it.getMessage().contains(‘My err’) } Logger.getRootLogger().removeAppender(appender)
  • 15. Cool stuff I haven’t tried... • The PipelineMapReduceDriver - allows testing a series of MapReduce passes • Just call addMapReduce(mapper, reducer) • Mock objects - MockReporter, MockInputSplit, and MockOutputCollector • Test combiners with myMapReduceDriver.setCombiner(myCombiner)
  • 18. shell$ ./myMapper.py < test.input | sort | ./myReducer.py > actual.out shell$ diff expected.out actual.out
  • 19. runTest() does not give meaningful information on failure
  • 20. Better to use run() and then assert
  • 21. driver.setInput(new Text(‘foo’), new Text(‘bar’)) def output = driver.run() assertEquals ‘baz’, output[0].first assertEquals ‘jy’, output[0].second
  • 23. runXxx() calls setup() called for new Hadoop API, but not old API
  • 24. Tests are not executed in a distributed way
  • 25. In Summary, MRUnit... • Makes testing your Hadoop jobs easier • Abstracts away a lot of the boilerplate test setup you need • Has it’s problems • but they are outweighed by the benefits
  • 26. ?

Editor's Notes