About This Course

From social media to news articles to machine logs, text data is everywhere. In this class, you'll learn how to do text analytics, also known as Information Extraction: how to extract structured data from text in order to derive valuable insights. You will learn about applications of information extraction in many domains, including social media analytics, healthcare analytics, and financial risk analysis, and about common tasks such as extracting entities and the relations between them, event extraction, and sentiment analysis. You'll then dive into "Declarative Information Extraction", a powerful approach to building high-performance, high-quality text analytics, and gain hands-on experience writing your own extractors with a tool called SystemT (available as IBM BigInsights Text Analytics).

Course Syllabus

  • Module 1 - Getting to know Information Extraction (IE)
  • Module 2 - Getting to know SystemT
  • Module 3 - IE with AQL
  • Module 4 - AQL Basics
  • Module 5 - Advanced AQL

Recommended skills prior to taking this course

  • None

Grading scheme

  • The minimum passing mark for the course is 60%. The review questions are worth 40% of the course mark and the final exam is worth 60%.
  • You have one attempt at the exam, with multiple attempts allowed per question.



Course Staff

Yunyao Li

Yunyao Li joined IBM Almaden Research Center in July 2007 after obtaining her Ph.D. in Computer Science & Engineering from the University of Michigan in April 2007. Before that, she was a graduate student in the Database Research Group, Department of Electrical Engineering and Computer Science, under the guidance of Professor H. V. Jagadish. Her primary research areas are Database Systems and Natural Language Processing. She is particularly interested in designing, developing, and analyzing large-scale systems that can improve the accessibility of information for a wide spectrum of users. Her current research in this direction involves a number of disciplines, most notably natural language processing, databases, human-computer interaction, information retrieval, and machine learning. Before starting her Ph.D. studies, she earned two master's degrees at the University of Michigan: an M.S.E. in Computer Science & Engineering and an M.S. in Information, from Computer Science and Engineering and the School of Information respectively. She attended Tsinghua University in Beijing, China, and graduated with dual degrees: a B.E. in Automation and a B.S. in Economics.

Laura Chiticariu

Laura Chiticariu is a Research Staff Member in the Scalable Natural Language Processing group at IBM Research-Almaden. Her primary research is in Database Systems and Natural Language Processing. She is one of the core members of SystemT, a declarative system for specifying NLP algorithms and executing them at scale. Her current research focuses on making information extraction systems transparent and easier to use, utilizing a range of techniques including data provenance, information integration, and machine learning. Laura has a Ph.D. in Computer Science from the University of California, Santa Cruz, and a B.S. in Computer Engineering with a major in Automation and Industrial Informatics from Politehnica University of Bucharest.

Marina Danilevsky

Marina Danilevsky is a Research Staff Member in the SystemT group at IBM Almaden Research Center in San Jose, California. She is interested in data mining, text mining, natural language processing, network ontologies, information networks, and other related areas. She holds a Ph.D. in Computer Science, awarded in 2014 by the University of Illinois at Urbana-Champaign (UIUC). Her research was in the area of Data Mining, supervised by Professor Jiawei Han. She previously received an M.S. in Computer Science from UIUC in 2011, and a B.S. in Mathematics from the University of Chicago in 2007.

Huaiyu Zhu

Huaiyu Zhu is a member of the Infrastructure for Intelligent Information Systems group at IBM Research-Almaden. His main research focus is on text analytics, natural language processing, machine learning, and statistical information processing.

  • Classes Start: Any Time, Self-Paced
  • Estimated Effort: 3 hours