Unstructured Data Analytics (95-865)

Many organizations need to analyze large amounts of data such as text, images, audio, and video to discover useful information. For example, a company may want to monitor how the public discusses its products in social media, or a forensics team may need to discover the contents of disk drives seized by law enforcement. A recurring issue is that, at the outset, we often do not know what structure is present in the data.

This course provides students with an understanding of common and emerging methods of organizing, summarizing, and analyzing large collections of this unstructured data (“unstructured data analytics”). While the focus is on algorithms and techniques, the course also provides an introduction to open-source software tools.

This is a 6-unit course. It is offered during the second half of the Fall (Mini-2) and Spring (Mini-4) semesters. Note that this course was previously titled “Text Analytics” but is now being updated to cover more than just text; image analysis will play a prominent role in the course.

Learning Objectives

By the end of the course, students are expected to have developed the following skills. Skills are assessed by the homework assignments and the final exam.

  • Recall and discuss common methods of conducting exploratory and predictive analysis of unstructured data;
  • Use search engines and common open-source software to carry out exploratory and predictive analysis; and
  • Apply unstructured data analysis techniques discussed in class to solve problems faced by governments and companies.

Soft Prerequisites

Comfort coding in a high-level programming language like Python, R, or MATLAB will be helpful. There will be assignments involving programming in Python.
