Spark Fundamentals II BDU

Enrollment in this course is by invitation only

About This Course

Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that require low-latency processing that a typical MapReduce program cannot provide, Spark is an alternative. Spark performs at speeds up to 100 times faster than MapReduce for iterative algorithms or interactive data mining. Spark provides in-memory cluster computing for lightning-fast speed and supports Java, Scala, and Python APIs for ease of development.

Spark combines SQL, streaming, and complex analytics seamlessly in the same application to handle a wide range of data processing scenarios. Spark runs on Hadoop, on Mesos, in standalone mode, or in the cloud. It can access diverse data sources such as HDFS, Cassandra, HBase, or S3.

Course Syllabus

After completing this course, you should be able to:

  • Describe what Spark is and why you would want to use it
  • Use Resilient Distributed Datasets (RDD) and DataFrame operations
  • Use Scala, Java, or Python to create and run a Spark application
  • Create applications using Spark SQL, MLlib, Spark Streaming, and GraphX
  • Configure, monitor, and tune Spark

Requirements

Course Staff

Henry L. Quach

Henry L. Quach is the Technical Curriculum Developer Lead for Big Data. He has been with IBM for 9 years, focusing on education development. Henry likes to dabble in a number of things, including having been part of the original team that developed and designed the concept for the IBM Open Badges program. He has a Bachelor of Science in Computer Science and a Master of Science in Software Engineering from San Jose State University.

Alan Barnes

TBD

Grading Scheme

The minimum passing mark for the course is 60%. The final test is worth 100% of the course mark, and you have 3 attempts to take it.

Frequently Asked Questions

What web browser should I use?

The Open edX platform works best with current versions of Chrome, Firefox or Safari, or with Internet Explorer version 9 and above.

See our list of supported browsers for the most up-to-date information.

  • Course Number: BD0212EN
  • Classes Start: Any Time, Self-Paced
  • Estimated Effort: 5 hours