California State University MapReduce Worksheet
09-Dec-21,6:00PM;#Hackatopia,Tribeca Film Hackathon: Code As A New Language For Content Creators Hackathon
28-Dec-21,7:00PM;#NYCHadoop,Hadoop-NYC Strata/Hadoop World Meetup at Google NYC
31-Dec-21,3:00PM;#Hackatopia,Artists, Developers, engineers, don’t miss this upcoming Boston hackathon
09-Jan-22,6:00PM;#Hackatopia,Soho Film Hackathon: Code As A New Language For Content Creators Hackathon
28-Jan-22,7:00PM;#NYCHadoop,HIVE-NYC Strata/Hadoop World Meetup at Google NYC
31-Jan-22,3:00PM;#Hack,Designers, Developers, ENGINEERS, don’t miss this upcoming Chicago hackathon
PART 1. WordCount without using the MapReduce framework
This is a WordCount-based problem: the goal is to find the number of lines containing a
specific search term. Write a Java program (Python is also acceptable, for Part 1 only) that does
not use Hadoop MapReduce and that:
a. Searches for all of the following strings in the input file containing tweet data (you can
provide the search terms as parameters or hardcode them): hackathon, Dec, Chicago, Java,
Engineers.
b. Accepts a small input file to be searched, containing lines of the form: Date,Time;Name,Tweet
This is the text file that you will be using as your input: HW 3 Input
c. Your code will search for all of the search strings in the input file and output the number of
lines that contained each search string (not the number of occurrences of a search string). A
line matches if the search string appears anywhere in it, including the Date field. The matching
is case-insensitive, i.e., when searching for the search string hackathon, all of the following
are matches: hackathon, Hackathon, hACKathon (and any other combination of upper- and lowercase
characters).
d. Your code should output the number of lines that contained each search string. Using the
input data above, the resulting counts will be:
Chicago 1
Dec 3
Java 0
Hackathon 4
Engineers 2
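A minimal Part 1 sketch in plain Java, assuming the whole line is searched as a case-insensitive substring and each term is counted at most once per line. The class name `LineSearchCount` and the command-line layout (input path first, optional search terms after it) are illustrative choices, not requirements from the assignment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class LineSearchCount {

    // Search terms hardcoded as item a allows; they can also be passed on the command line.
    static final List<String> DEFAULT_TERMS =
            Arrays.asList("hackathon", "Dec", "Chicago", "Java", "Engineers");

    /** For each term, counts how many lines contain it at least once (case-insensitive). */
    static Map<String, Integer> countMatchingLines(List<String> lines, List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.put(term, 0); // so terms with no matches still report 0
        }
        for (String line : lines) {
            String lowerLine = line.toLowerCase(Locale.ROOT);
            for (String term : terms) {
                // One contains() check per line: repeats within a line count only once.
                if (lowerLine.contains(term.toLowerCase(Locale.ROOT))) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the input file; optional args[1..]: search terms.
        if (args.length < 1) {
            System.err.println("usage: java LineSearchCount <input-file> [term ...]");
            System.exit(1);
        }
        List<String> terms = args.length > 1
                ? Arrays.asList(Arrays.copyOfRange(args, 1, args.length))
                : DEFAULT_TERMS;
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : countMatchingLines(lines, terms).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run as `java LineSearchCount <input-file>`: on the six sample lines it prints the same counts as the table above, in the order the terms are hardcoded and with their hardcoded spelling (`hackathon` rather than `Hackathon`).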
PART 2. WordCount using the MapReduce framework
Your program for Part 2:
a. Searches for all of the following strings in the input file containing tweet data (you can
provide the search terms as parameters when you run your program, or hardcode them):
hackathon, Dec, Chicago, Java, Engineers. The program will output the number of lines that
contain each of these words.
b. This program accepts the same small input file you used in Part 1 and searches it for the
search strings.
c. This program has a Mapper that searches the input file line by line to find matches. The
matching is case-insensitive (as in Part 1).
The mapper code should not do any summing or buffering (no storing data in a map or array).
The summing must happen in the reducer code.
d. This program also has a Reducer. The Reducer code takes as input the key-value pairs generated
by the map phase and outputs the number of lines that contained each search string. Using the
same input file, the resulting counts will be:
Chicago 1
Dec 3
Java 0
Hackathon 4
Engineers 2
e. Upload your homework.
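The mapper rule in item c can be sketched without Hadoop as a stateless function that handles one line and emits `(term, 1)` for each search string that line contains. This is a plain-Java simulation under that assumption: `WordCountMapperSketch` is a hypothetical name, and the `BiConsumer` callback stands in for the framework's output collector. In an actual Hadoop job, the same logic would sit in the `map` method of a subclass of `org.apache.hadoop.mapreduce.Mapper`, calling `context.write(...)` with `Text` and `IntWritable` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;

public class WordCountMapperSketch {

    static final List<String> TERMS =
            Arrays.asList("hackathon", "Dec", "Chicago", "Java", "Engineers");

    /**
     * Map step for a single input line: emits (term, 1) for every term the line contains.
     * No fields or collections persist between calls, so nothing is summed or buffered here.
     * The emit callback stands in (hypothetically) for Hadoop's context.write(key, value).
     */
    static void map(String line, BiConsumer<String, Integer> emit) {
        String lowerLine = line.toLowerCase(Locale.ROOT);
        for (String term : TERMS) {
            if (lowerLine.contains(term.toLowerCase(Locale.ROOT))) {
                emit.accept(term, 1); // one pair per matching line, not per occurrence
            }
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        map("31-Jan-22,3:00PM;#Hack,Designers, Developers, ENGINEERS, don't miss this upcoming Chicago hackathon",
                (key, value) -> emitted.add(key + "\t" + value));
        emitted.forEach(System.out::println);
    }
}
```

Because `map` keeps no state between lines, every pair goes straight to the output and all summing is left to the reduce phase, as item c requires.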
To receive full credit, please hand in all of the following items:
1. Your program (all .java files, compiled classes, and the jar file). Please include 3 Java files:
WordCount.java, WordCountMapper.java, and WordCountReducer.java, along with the corresponding
.class files and the .jar file.
Note: Your algorithm from Part 1 is not what you want to use in this MapReduce solution.
Think about the example covered in the book where temperatures (values) were grouped by year
(the key). These key-value pairs are guaranteed to arrive at the reducer(s) sorted by key. The
reducer can iterate through the values associated with a given key and process them. In the
MaxTemperature example, that processing was to select the maximum temperature by iterating
through the values. Think about what a reducer should do for a WordCount algorithm to be
efficient in a distributed system, and write your algorithm so that the summing happens in the reducer.
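The data flow described in the note above (map output grouped and sorted by key, then a reducer that walks each key's values) can be simulated in plain Java. In this sketch, `shuffle` uses a `TreeMap` to stand in for the framework's sort-and-group step, and `reduce` sums the values for one key where MaxTemperature would take their maximum. The class and method names are hypothetical, and the map output is hand-written from the six sample lines rather than produced by Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleReduceSketch {

    /** Simple key-value pair standing in for one (Text, IntWritable) mapper output record. */
    static final class Pair {
        final String key;
        final int value;
        Pair(String key, int value) { this.key = key; this.value = value; }
    }

    /** Pairs the mapper would emit for the six sample lines: one (term, 1) per matching line. */
    static List<Pair> sampleMapOutput() {
        String[] emittedKeys = {
            "hackathon", "Dec",                  // line 1
            "Dec",                               // line 2
            "hackathon", "Dec", "Engineers",     // line 3
            "hackathon",                         // line 4 (line 5 matches nothing)
            "hackathon", "Chicago", "Engineers"  // line 6
        };
        List<Pair> pairs = new ArrayList<>();
        for (String key : emittedKeys) {
            pairs.add(new Pair(key, 1));
        }
        return pairs;
    }

    /** Simulated shuffle: groups values by key; the TreeMap keeps keys in sorted order. */
    static TreeMap<String, List<Integer>> shuffle(List<Pair> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : mapOutput) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    /** WordCount reduce: sums the 1s for one key. key is unused; kept to mirror reduce(key, values). */
    static int reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v; // MaxTemperature would keep a running maximum here instead
        }
        return sum;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<Integer>> e : shuffle(sampleMapOutput()).entrySet()) {
            System.out.println(e.getKey() + " " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```

Running `main` prints `Chicago 1`, `Dec 3`, `Engineers 2`, and `hackathon 4` (uppercase keys sort before lowercase ones). One consequence worth checking: a search string that matches no line (Java in the sample data) is never emitted by the mapper, so it never reaches the reducer and this flow prints no `Java 0` row. Producing that row would need some extra handling outside the reduce step shown here.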