April technical meetup [Cloudera]
Hadoop is a distributed platform, written in Java, for processing massive data sets on clusters; it was inspired by Google's MapReduce. On the morning of Saturday, April 12, Cloudera will hold a technical meetup together with SouJava to present, and answer questions about, this tool that is widely used in the Big Data era.
Note: If you choose the online registration, the streaming link will be sent by e-mail and posted on SouJava's Twitter 10 minutes before the broadcast.
- Talk: Introduction to Apache Hadoop – HDFS and Map/Reduce Fundamentals
- Description: This talk will introduce the concept of Map/Reduce, a programming paradigm that enables the parallel processing of extremely large data sets. We'll also introduce Hadoop's implementation of Map/Reduce, and HDFS, the distributed file system that's built into Hadoop to enable Map/Reduce. Nearly all of Hadoop is implemented in Java, and this talk will cover some of the details of writing a Map/Reduce job in Java.
- Speaker: Aaron Myers
- Mini-bio: Aaron T. Myers (aka ATM) is a Platform Software Engineer at Cloudera and an Apache Hadoop Committer/PMC Member at Apache. Aaron's work is primarily focused on HDFS, High Availability, and Hadoop Security. Prior to joining Cloudera, Aaron was a Software Engineer and VP of Engineering at Amie Street, where he worked on all components of the software stack, including operations, infrastructure, and customer-facing feature development. Aaron holds both an Sc.B. and Sc.M. in Computer Science from Brown University.
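To give a feel for the paradigm Aaron's talk covers, here is a minimal, Hadoop-free sketch of Map/Reduce in plain Java: a map phase turns each input line into (word, 1) pairs, and a reduce phase groups the pairs by key and sums the values. A real job would implement Hadoop's `Mapper` and `Reducer` classes and run distributed across a cluster; the class and method names below are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Shuffle + reduce phase: group pairs by key and sum the values.
    // (In Hadoop, the framework performs the grouping between the phases.)
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = List.of("big data on hadoop", "hadoop runs on clusters");
        List<Map.Entry<String, Integer>> pairs = input.stream()
                .flatMap(l -> map(l).stream())
                .collect(Collectors.toList());
        System.out.println(reduce(pairs));
        // Prints {big=1, clusters=1, data=1, hadoop=2, on=2, runs=1}
    }
}
```

Because map emits independent pairs per line and reduce only needs the pairs sharing a key, both phases parallelize naturally — which is exactly what Hadoop exploits at cluster scale.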
- Talk: Beyond Map/Reduce: Introduction to Apache Crunch and Apache Spark
- Description: Following Aaron's talk, Todd will introduce Apache Crunch and Apache Spark. These two projects are higher-level frameworks which allow the programmer to express complex distributed data processing tasks on Hadoop in a more concise and simple manner than writing raw MapReduce jobs. Additionally, Todd will introduce Spark Streaming, a processing system which can run data flows on real-time data as it arrives. He will cover some example use cases that show how Hadoop can be used in such applications as real-time streaming data processing, machine learning, and model building.
- Speaker: Todd Lipcon
- Mini-bio: Todd Lipcon is an engineer at Cloudera who works on Core Hadoop as well as the Cloudera Distribution for Hadoop. Todd is also active in other Apache projects and is always excited to hear about the interesting ways in which people are using Hadoop for large scale data analysis. Previously, Todd came to Cloudera from Amie Street, where he worked on infrastructure, operations, data mining, and product development. Prior to that, he interned at Google developing machine learning methods to detect credit-card fraud on AdWords and Google Checkout. Todd holds a BSc in Computer Science from Brown University, where he completed an honors thesis developing a new collaborative filtering algorithm for the Netflix Prize Competition.
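The "more concise than raw MapReduce" point of Todd's talk can be illustrated locally: Crunch's `PCollection` and Spark's RDD transformations let you chain operations declaratively instead of hand-writing map and reduce classes. As a rough, single-machine analogy (using Java Streams rather than either framework's actual API), the same word count collapses into one pipeline:

```java
import java.util.*;
import java.util.stream.*;

public class PipelineSketch {
    // One declarative pipeline, in the spirit of Crunch/Spark transformations:
    // split lines into words, then group and count — no explicit reduce class.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(l -> Arrays.stream(l.toLowerCase().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(
                        w -> w, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("spark and crunch", "crunch on hadoop")));
        // Prints {and=1, crunch=2, hadoop=1, on=1, spark=1}
    }
}
```

In Crunch or Spark the same chain of transformations would be planned into one or more distributed MapReduce (or Spark) stages behind the scenes, which is what makes the frameworks both concise and scalable.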