Big Data Series

I’ve just added three blog posts I wrote during the Big Data bachelor course at Radboud University. As a master’s student I’m allowed to take one or two bachelor courses if there’s a good reason. Because no other course really goes into Spark, Hadoop and Scala, I figured it would be a nice addition to my Python-heavy curriculum. Not that I dislike Python, of course.

There are three posts in total:

Hadoop and the HDFS - an introduction to Hadoop and the Hadoop Distributed File System (HDFS).
Spark - a look at a Kaggle competition data set in Spark.
The class project - a solo project about submitting code to a national research cluster and running queries against 1.73 billion web pages.

You can find the posts here: Big Data Series

I learnt a lot and finished the class project with a grade of 9.5.

Copyright © 2013 - 2020 All Rights Reserved.