Integrated Data Science (Certificate)

The past decade has witnessed accelerating growth in the volume and complexity of data in many data-enabled science and engineering (DESE) fields. To maximize the discovery potential, we must employ advanced data analytics methods and algorithms, visualization techniques, and high-performance computing. These unprecedented and multi-faceted challenges make skills in advanced data analytics critical: statistics, data mining, machine learning, signal/image processing and visualization, data management, and programming are becoming essential in many areas of science and engineering research. These skills bridge several disciplines and push research frontiers, from the methods disciplines of computer science, electrical engineering, applied math, and statistics to domain disciplines across science and engineering. Furthermore, the non-academic professional sector already has high demand for data scientists and engineers with PhD/MS education. They are being recruited as high-level staff at national research labs and centers, as “data wizards” at non-profit organizations, within the financial sector, and in many industries including online retail, social media, healthcare, and pharmaceuticals. These professional sectors are looking for data-analytics experts who can not only answer questions quantitatively but also pose questions no one has yet identified; the same technical skills needed for data-enabled science and engineering research are in great demand in the growing number of data-intensive industries as well.

The certificate in Integrated Data Science aims to organize and recognize students in this area, integrating curriculum in methods and domain disciplines and offering students the wide range of education they need to succeed as data scientists inside and outside of academia. The graduate certificate curriculum aligns well with the Northwestern Data Science Initiative and allows for further expansion, as other units across the university develop and add courses to the curriculum.

How to apply

Enrolled PhD students in The Graduate School may pursue this certificate with the permission of their program. In order to petition to have a Graduate Certificate awarded and appear on the transcript, students must submit the Application for a Graduate Certificate once all Graduate Certificate requirements have been completed, but no later than the time that the student files for graduation (in the final quarter of study).

Who to contact

Please contact the program director, listed below, with questions about this program. You can also explore the Integrated Data-Driven Discovery in Earth and Astrophysical Sciences (IDEAS) program website for more information.

The following requirements are in addition to, or further elaborate upon, those requirements outlined in The Graduate School Policy Guide.

To complete the Integrated Data Science (IDS) Certificate requirements, students will take five courses including at least one course from group A, at least two courses from group B, at least one course from group C, and a fifth course from any group. The courses currently available in each curriculum group are described below, developed in connection to the NSF IDEAS traineeship.
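As a rough illustration of the distribution rule above, the following Python sketch checks whether a list of completed courses, each labeled by its curriculum group, would satisfy the stated minimums. The function name, constants, and input format are hypothetical and are not part of any official degree-audit tool; it also assumes that taking more than five courses does no harm.

```python
from collections import Counter

# Minimum number of courses per curriculum group, as stated in the
# requirement text (names below are illustrative assumptions).
GROUP_MINIMUMS = {"A": 1, "B": 2, "C": 1}
TOTAL_REQUIRED = 5


def meets_ids_requirements(course_groups):
    """Return True if the group labels satisfy the IDS distribution rule.

    course_groups: list of group letters ("A", "B", or "C"),
    one entry per completed course.
    """
    counts = Counter(course_groups)
    unknown = set(counts) - set(GROUP_MINIMUMS)
    if unknown:
        raise ValueError(f"Unknown curriculum group(s): {sorted(unknown)}")
    has_minimums = all(counts[g] >= m for g, m in GROUP_MINIMUMS.items())
    return has_minimums and sum(counts.values()) >= TOTAL_REQUIRED
```

For example, one course from group A, two from group B, and two from group C would pass, while one from A, one from B, and three from C would not, because group B needs at least two courses.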

Group A. Data Challenges in Domain Disciplines

DATA SCI 401: Data-Driven Research in Physics, Geophysics, and Astronomy
This course will integrate the domain-focused projects in P&A (Physics & Astronomy) and EPS (Earth and Planetary Sciences) and will be team-taught by one professor from P&A and one from EPS. It will cover one quarter of material spread over two quarters (winter and spring every year). It will focus on the science motivation and goals that unite three distinct research projects (LSST, aLIGO, and EarthScope) and on principles and methods of data analysis. Spreading the course over two quarters will allow alignment and further interdisciplinary integration with DATA SCI 422 and DATA SCI 423.

ESAM 395/BIOL_SCI 354: Quantitative Biology
This course covers some of the landmark results in quantitative biology. Students will learn the biology, mathematics, physics, and statistics needed to analyze data sets from a variety of studies, then re-analyze and re-plot a central result from each of these landmark papers. The papers will include studies in gene regulation, developmental biology, sequencing, and more. The course will also include an overview of coding and image analysis and an introduction to landmark insights in quantitative biology, random genetic processes, gene expression, cell adaptation, the cell cycle, developmental morphogens, and phylogenomics.

KPHD 540: Computational Social Science: Methods and Applications
This course is designed to prepare PhD students for computational social science (CSS) research. The skills covered include data acquisition, null model design and programming, and data mining for structured and unstructured data.

Group B. Core Data Analytics

DATA SCI 421: Integrated Data Analytics I (cross-listed as PHYS 441: Statistical Methods for Physicists and Astronomers)
DATA SCI 422: Integrated Data Analytics II (cross-listed as EPS 329: Mathematical Inverse Methods in Earth and Environmental Sciences)
DATA SCI 423: Integrated Data Analytics III (cross-listed as ELEC_ENG 475: Machine Learning: Foundations, Applications, and Algorithms)

Group C. Electives in Data Analytics

From the Department of Chemical and Biological Engineering:
  • Computational Biology: Principles and Applications (CHEM_ENG 379-0)

From the Department of Computer Science:

  • Design and Analysis of Algorithms (COMP_SCI 336)
  • Data Science (COMP_SCI 496)
  • Human-Centered Machine Learning (COMP_SCI 496)
Please note that Machine Learning (COMP_SCI 349) will still count as an elective if taken before Spring 2020. Data Management and Information Processing (COMP_SCI 317) is no longer offered; however, it will still count as an elective if taken before Fall 2019.

From the Department of Electrical and Computer Engineering:

  • Digital Image Processing (ELEC_ENG 420)
  • Deep Learning Foundations from Scratch (ELEC_ENG 435)
  • Probabilistic Graphical Models (ELEC_ENG/COMP_ENG 495)
  • Statistical Pattern Recognition (ELEC_ENG 433)
  • Deep Reinforcement Learning (ELEC_ENG 473)
  • Social Media Mining (COMP_ENG 510)
  • Geospatial Vision and Visualization (COMP_ENG 495)

From the Department of Earth and Planetary Sciences:

  • Geophysical Time Series Analysis (EARTH 327)

From the Department of Engineering Sciences and Applied Mathematics:

  • Models in Applied Mathematics (ES_APPM 421-1)
  • Numerical Methods for Random Processes (ES_APPM 448)

From the Department of Industrial Engineering and Management Sciences:

  • Statistical Methods for Data Mining (IEMS 304)

From the Department of Materials Science and Engineering:

  • Atomic Scale Computational Material Science (MAT_SCI 458)

From the Department of Physics and Astronomy:

  • Observational Astrophysics (ASTRON 421-0)

From the Department of Statistics:

  • Regression Analysis (STAT 350)
  • Introduction to Analysis of Financial Data (STAT 365)
  • Time Series Analysis (STAT 454)
  • Applied Bayesian Inference (STAT 457)
  • Advanced Topics: Theory of Data Mining (STAT 461)
  • Advanced Topics: Bayesian Statistics (STAT 461)