
Data Science Hands-On with Open Source Tools CognitiveClass

About This Course

Get started with some of the most popular tools for collaborative data science on Cognitive Class Labs (also known as Data Scientist Workbench or BDU Labs), a free virtual lab environment that brings powerful open data science tools together so you can analyze, visualize, explore, and clean data, run models, and create apps. Prepare your data with OpenRefine, and do interactive data science with Jupyter and Zeppelin notebooks. If you are comfortable with R, run RStudio IDE on the cloud! This free cloud offering lets you work wherever you are without installing anything. Start your work at home in a browser, continue it on the train during your commute, and finish it at work! All you need is an internet connection; all of these open source tools are hosted on the cloud, at your fingertips. Also check out Seahorse, a tool with a visual approach to programming that lets you build data science pipelines. Seahorse is powered by Apache Spark and allows non-programmers to write complex applications that may include machine learning algorithms.

Course Syllabus

  • Module 1 - Introducing Cognitive Class Labs (Data Scientist Workbench)
    • What is Cognitive Class Labs (Data Scientist Workbench)?
    • Data Scientist Workbench Account features
    • Creating a Data Scientist Workbench account
    • Managing data within My Data
  • Module 2 - Introducing Jupyter Notebooks
    • What are Jupyter notebooks?
    • Getting started with Jupyter
    • Data and Notebooks in Jupyter
    • Sharing your Jupyter Notebooks and data
    • Apache Spark in Jupyter Notebooks
  • Module 3 - Introducing Zeppelin Notebooks
    • What are Zeppelin Notebooks?
    • Zeppelin for Scala
    • Getting started with Zeppelin
    • Managing your Interpreters in Zeppelin
    • Apache Spark in Zeppelin Notebooks
  • Module 4 - Introducing RStudio IDE
    • What is RStudio IDE?
    • Uploading files, Installing Packages and loading libraries in RStudio IDE
    • Getting started with RStudio IDE
    • RStudio Environment and History
    • Apache Spark in RStudio IDE
  • General Information

    • This course is free.
    • It is self-paced.
    • It can be taken at any time.
    • It can be audited as many times as you wish.

    Recommended skills prior to taking this course

    • None

    Requirements

    • None

    Course Staff

    Polong Lin, Data Science Bootcamp instructor

    Polong Lin

    Polong Lin is a Data Scientist at IBM in Canada. Under the Emerging Technologies division, Polong is responsible for educating the next generation of data scientists. Polong is a regular speaker at conferences and meetups, and holds an M.Sc. in Cognitive Psychology.
    Dr. Saeed Aghabozorgi, Data Science Bootcamp instructor

    Saeed Aghabozorgi

    Saeed Aghabozorgi, PhD, is a Data Scientist at IBM with a track record of developing enterprise-level applications that substantially increase clients' ability to turn data into actionable knowledge. He is a researcher in the data mining field and an expert in developing advanced analytic methods, such as machine learning and statistical modelling, on large datasets.
  1. Course Number

    DS0105EN
  2. Classes Start

    Any Time, Self-Paced
  3. Estimated Effort

    4 hours