I’ve just added three blog posts I wrote during the Big Data bachelor course at Radboud University. As a master’s student I’m allowed to take one or two bachelor courses if there’s a good reason. Because no other course really goes into Spark, Hadoop and Scala, I figured it would be a nice addition to the Python-heavy curriculum. Not that I dislike Python, of course.
There are three posts in total:
Hadoop and the HDFS - an introduction to Hadoop and HDFS.
Spark - a look at a Kaggle competition data set in Spark.
The class project - a solo project about submitting code to a national research cluster and running queries against 1.73 billion web pages.
You can find the posts here: Big Data Series
I learnt a lot and finished the class project with a 9.5.