Raoul-Gabriel Urma

Cambridge Spark

Workshop Abstract: This workshop provides a hands-on introduction to the Big Data ecosystem, Hadoop, and Apache Spark in practice. Through practical activities in Python, you will learn how to apply Apache Spark to a range of datasets to process and analyse data at scale. After taking this workshop you will be able to:
– Understand the challenges in the Big Data ecosystem
– Describe the fundamentals of the Hadoop ecosystem
– Use the core Spark RDD APIs to express data processing queries
– Understand how you can leverage cloud technologies such as Amazon EMR to process large data sets

What to bring to the workshop: laptop, Python 3.6, pyspark

Saturday MainStage

Talk Abstract: Making Sense of Big Data File Formats
Modern applications generate and manipulate a lot of data, and the growth rate of that data is staggering. Unfortunately, large datasets can be expensive to store and slow to process. In fact, memory speed has improved far more slowly than CPU speed. Thankfully, various file formats suited to big data systems can help. In this talk, you will learn about popular file formats for big data systems, with a focus on Parquet. Through live-coded examples in Python, you will learn the good, the bad, and the ugly, and how you can make use of Parquet in practice.

Bio: Raoul-Gabriel Urma is the director of Cambridge Spark, a leading learning community for data scientists and developers in the UK. He is also Chairman and co-founder of Cambridge Coding Academy, a growing community of young coders and pre-university students. Raoul is the author of the bestselling programming book “Java 8 in Action”, which has sold over 25,000 copies globally. He completed a PhD in Computer Science at the University of Cambridge and has delivered over 100 technical talks at international conferences. He has worked for Google, eBay, Oracle, and Goldman Sachs. He is also a Fellow of the Royal Society of Arts.

Saturday April 21st, 2018
9:00 am-5:00 pm
Data Science Festival Mainstage (Ballot ticket only)
CodeNode - 10 South Pl, London EC2M 7EB
Ballot tickets are now open. Due to the popularity of Data Science Festival events, we are now allocating event tickets via a random ballot. Registering here enters you into the ticket ballot for…
Friday April 20th, 2018
9:00 am-5:00 pm
Join us for a day of deep-dive learning at our 3.5-hour workshops, offering you a chance to take part in a combination of amazing and enlightening workshops with experts in their field. Lunch will be provided from 12:30 PM to 1:30 PM, giving you a chance to network and mingle with your…