Data Science Hands-On with Open Source Tools CognitiveClass

Enrollment in this course is by invitation only

About This Course

Get started with some of the most popular tools for collaborative data science on Cognitive Class Labs (also known as Data Scientist Workbench or BDU Labs), a free virtual lab environment that brings powerful open data science tools together so you can analyze, visualize, explore, and clean data, run models, and create apps. Prepare your data with OpenRefine, and do interactive data science with Jupyter and Zeppelin notebooks. If you are comfortable with R, run RStudio IDE on the cloud! This free cloud offering lets you work from anywhere without installing anything. Start your work at home in a browser, continue it on the train during your commute, and finish it at work! All you need is an internet connection; all of these open source tools are hosted on the cloud and available at your fingertips. Also check out Seahorse, a tool with a visual approach to programming that lets you build data science pipelines. Seahorse is powered by Apache Spark and allows non-programmers to write complex applications that may include machine learning algorithms.

Course Syllabus

  • Module 1 - Introducing Cognitive Class Labs (Data Scientist Workbench)
    • What is Cognitive Class Labs (Data Scientist Workbench)?
    • Data Scientist Workbench Account features
    • Creating a Data Scientist Workbench account
    • Managing data within My Data
    • Preparing data with OpenRefine
  • Module 2 - Introducing Jupyter Notebooks
    • What are Jupyter notebooks?
    • Getting started with Jupyter
    • Data and Notebooks in Jupyter
    • Sharing your Jupyter Notebooks and data
    • Apache Spark in Jupyter Notebooks
  • Module 3 - Introducing Zeppelin Notebooks
    • What are Zeppelin Notebooks?
    • Zeppelin for Scala
    • Getting started with Zeppelin
    • Managing your Interpreters in Zeppelin
    • Apache Spark in Zeppelin Notebooks
  • Module 4 - Introducing RStudio IDE
    • What is RStudio IDE?
    • Uploading files, Installing Packages and loading libraries in RStudio IDE
    • Getting started with RStudio IDE
    • RStudio Environment and History
    • Apache Spark in RStudio IDE
  • Module 5 - Introducing Seahorse
    • What is Seahorse?
    • A Glimpse of Seahorse's Features
    • Getting started with Seahorse on Cognitive Class Labs
    • Creating and uploading Seahorse Workflows on Cognitive Class Labs
    • Exporting and Cloning the Seahorse Examples on Cognitive Class Labs

General Information

  • This course is free.
  • It is self-paced.
  • It can be taken at any time.
  • It can be audited as many times as you wish.

Recommended skills prior to taking this course

  • None

Course Staff

Polong Lin, Data Science Bootcamp instructor

Polong Lin

Polong Lin is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists through BDU. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.

Dr. Saeed Aghabozorgi, Data Science Bootcamp instructor

Saeed Aghabozorgi

Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients’ ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.
  1. Course Number

  2. Classes Start

    Any Time, Self-Paced
  3. Estimated Effort

    4 hours