If you are processing a bunch of data, grouping it, joining it, filtering it, then you should probably be using pig.
So go download that, and get it all set up. You need:
- Java 1.6 (with JAVA_HOME set up)
- Hadoop (with HADOOP_HOME set up)
- pig (of course)
Put all the relevant stuff in your PATH too.
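For the environment setup, something like the following in your shell profile should do it. This is only a sketch: the install paths are hypothetical placeholders, and PIG_HOME is just a convenience variable I'm assuming here to keep the PATH line readable, so swap in wherever you actually unpacked things.

```shell
# Hypothetical install locations; substitute the paths from your own machine.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export HADOOP_HOME=/opt/hadoop
export PIG_HOME=/opt/pig   # assumed convenience variable, not a stated requirement

# Put the java, hadoop, and pig launchers on the PATH.
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin:$PATH"
```

Open a new shell afterwards (or source the profile) so the variables take effect.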
pig 101
So here's a simple pig script.