3. What is MRUnit?
• Testing library for MapReduce
• Developed by Cloudera
• Easy integration between MapReduce and standard testing tools (e.g. JUnit)
cloudera.com/hadoop-mrunit
5. Testing without MRUnit
• Write tests that create JobConf or Configuration objects
• Run jobs in local mode with conf.set("mapred.job.tracker", "local")
• Develop new test input files and store them alongside the MapReduce test code
• Lots of work to validate output files
• External file I/O makes tests slooooow
7. Testing with MRUnit
• No external test input or output files
• Inputs and expected outputs are specified programmatically
• Less test harness code (but perhaps also less control)
• Concise, fast tests
8. Example
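import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

// Assumes Example.MyMapper and Example.MyReducer use the old
// org.apache.hadoop.mapred API with Text keys and values throughout.
public class ExampleTest {
    private Example.MyMapper mapper;
    private Example.MyReducer reducer;
    private MapReduceDriver<Text, Text, Text, Text, Text, Text> driver;

    @Before
    public void setUp() {
        mapper = new Example.MyMapper();
        reducer = new Example.MyReducer();
        driver = new MapReduceDriver<>(mapper, reducer);
    }

    @Test
    public void testMapReduce() throws IOException {
        // One input pair goes through map, shuffle/sort, and reduce;
        // runTest() fails if the output differs from the expected pair.
        driver.withInput(new Text("a"), new Text("b"));
        driver.withOutput(new Text("c"), new Text("d"));
        driver.runTest();
    }
}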
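Conceptually, a MapReduceDriver test like the one above runs the mapper over the input, sorts and groups the intermediate pairs by key, runs the reducer, and checks the result against the expected outputs. The plain-Java sketch below models that flow in memory so it can run without Hadoop. MiniMapper, MiniReducer, and InMemoryMapReduce are hypothetical names, not MRUnit or Hadoop types, and a TreeMap stands in for the shuffle/sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-ins for a mapper and a reducer. These are NOT Hadoop's
// Mapper/Reducer types; "emit" plays the role of an output collector.
interface MiniMapper {
    void map(String key, String value, BiConsumer<String, String> emit);
}

interface MiniReducer {
    void reduce(String key, List<String> values, BiConsumer<String, String> emit);
}

final class InMemoryMapReduce {
    // Map every input pair, group intermediate values by key in sorted key
    // order (a stand-in for the shuffle/sort), then reduce each group.
    static List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> input,
                                               MiniMapper mapper, MiniReducer reducer) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> rec : input) {
            mapper.map(rec.getKey(), rec.getValue(),
                (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        List<Map.Entry<String, String>> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            reducer.reduce(g.getKey(), g.getValue(), (k, v) -> output.add(Map.entry(k, v)));
        }
        return output;
    }

    // Compare the actual output with the expected pairs, in order, and fail loudly.
    static void runTest(List<Map.Entry<String, String>> input, MiniMapper mapper,
                        MiniReducer reducer, List<Map.Entry<String, String>> expected) {
        List<Map.Entry<String, String>> actual = run(input, mapper, reducer);
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

For example, a word-count mapper paired with a summing reducer turns the single input pair ("1", "b a b") into [a=1, b=2], with no input or output files involved.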
9. Example
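import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.MapReduceDriver;
import org.junit.Before;
import org.junit.Test;

// Assumes Example.MyMapper and Example.MyReducer use the old
// org.apache.hadoop.mapred API with Text keys and values throughout.
public class ExampleTest {
    private Example.MyMapper mapper;
    private Example.MyReducer reducer;
    private MapReduceDriver<Text, Text, Text, Text, Text, Text> driver;

    @Before
    public void setUp() {
        mapper = new Example.MyMapper();
        reducer = new Example.MyReducer();
        driver = new MapReduceDriver<>(mapper, reducer);
    }

    @Test
    public void testMapReduce() throws IOException {
        // Each with* call returns the driver, so the calls can be chained.
        driver.withInput(new Text("a"), new Text("b"))
              .withOutput(new Text("c"), new Text("d"))
              .runTest();
    }
}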
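The chained form works because each with* call hands back a driver that the next call can be made on. The plain-Java sketch below shows that fluent-interface technique in isolation. FluentDriver is a hypothetical class, not MRUnit's MapReduceDriver, and its "job" is just a function applied to each input pair rather than a real map/reduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical driver illustrating a fluent interface: each with* method
// records its argument and returns this, so calls can be chained.
final class FluentDriver {
    private final UnaryOperator<Map.Entry<String, String>> job;
    private final List<Map.Entry<String, String>> inputs = new ArrayList<>();
    private final List<Map.Entry<String, String>> expected = new ArrayList<>();

    FluentDriver(UnaryOperator<Map.Entry<String, String>> job) {
        this.job = job;
    }

    FluentDriver withInput(String key, String value) {
        inputs.add(Map.entry(key, value));
        return this; // returning the same driver is what enables chaining
    }

    FluentDriver withOutput(String key, String value) {
        expected.add(Map.entry(key, value));
        return this;
    }

    // Apply the job to every recorded input and compare with the expected pairs.
    void runTest() {
        List<Map.Entry<String, String>> actual = new ArrayList<>();
        for (Map.Entry<String, String> rec : inputs) {
            actual.add(job.apply(rec));
        }
        if (!actual.equals(expected)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

Usage mirrors the slide: new FluentDriver(e -> Map.entry("c", "d")).withInput("a", "b").withOutput("c", "d").runTest() passes, while a mismatched expected pair makes runTest() throw.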
15. Cool stuff I haven’t tried...
• PipelineMapReduceDriver: tests a series of MapReduce passes
• Just call addMapReduce(mapper, reducer) for each pass
• Mock objects: MockReporter, MockInputSplit, and MockOutputCollector
• Test combiners with myMapReduceDriver.setCombiner(myCombiner)
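As a rough model of what a pipeline test and a combiner do, the plain-Java sketch below chains map/reduce passes so that each pass's output becomes the next pass's input, and optionally pre-aggregates map output with a combiner. MiniPipeline, MiniMapper, and MiniReducer are hypothetical; only the addMapReduce name comes from the slide, and the three-argument overload that takes a combiner is an illustrative addition, not MRUnit's API (on a MapReduceDriver the slide uses setCombiner).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical stand-ins, NOT Hadoop or MRUnit types.
interface MiniMapper {
    void map(String key, String value, BiConsumer<String, String> emit);
}

interface MiniReducer {
    void reduce(String key, List<String> values, BiConsumer<String, String> emit);
}

// Hypothetical pipeline: the output of each map/reduce pass feeds the next.
final class MiniPipeline {
    private final List<MiniMapper> mappers = new ArrayList<>();
    private final List<MiniReducer> combiners = new ArrayList<>(); // null means no combiner
    private final List<MiniReducer> reducers = new ArrayList<>();

    MiniPipeline addMapReduce(MiniMapper mapper, MiniReducer reducer) {
        return addMapReduce(mapper, null, reducer);
    }

    MiniPipeline addMapReduce(MiniMapper mapper, MiniReducer combiner, MiniReducer reducer) {
        mappers.add(mapper);
        combiners.add(combiner);
        reducers.add(reducer);
        return this;
    }

    List<Map.Entry<String, String>> run(List<Map.Entry<String, String>> input) {
        List<Map.Entry<String, String>> data = input;
        for (int i = 0; i < mappers.size(); i++) {
            // Group this pass's map output by key, in sorted key order.
            TreeMap<String, List<String>> groups = shuffle(data, mappers.get(i));
            MiniReducer combiner = combiners.get(i);
            if (combiner != null) {
                // Pre-aggregate each group, then regroup. The whole input is
                // treated as a single map task, which is a simplification.
                groups = shuffle(reduceAll(groups, combiner), (k, v, emit) -> emit.accept(k, v));
            }
            data = reduceAll(groups, reducers.get(i));
        }
        return data;
    }

    private static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> records,
                                                         MiniMapper mapper) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> rec : records) {
            mapper.map(rec.getKey(), rec.getValue(),
                (k, v) -> groups.computeIfAbsent(k, unused -> new ArrayList<>()).add(v));
        }
        return groups;
    }

    private static List<Map.Entry<String, String>> reduceAll(TreeMap<String, List<String>> groups,
                                                             MiniReducer reducer) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> g : groups.entrySet()) {
            reducer.reduce(g.getKey(), g.getValue(), (k, v) -> out.add(Map.entry(k, v)));
        }
        return out;
    }
}
```

A combiner is safe to use when the reduce logic gives the same result on partially aggregated groups, as summing counts does; that is why reusing a summing reducer as the combiner leaves word-count output unchanged. Chaining a word count with a pass that inverts (word, count) to (count, word) and joins the words turns ("1", "b a b c") into [1=a,c, 2=b].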
25. In Summary, MRUnit...
• Makes testing your Hadoop jobs easier
• Abstracts away much of the boilerplate test setup you would otherwise need
• Has its problems
• ...but they are outweighed by the benefits