
Course Duration: approximately 40 hours

“The Enterprise Big Data Scientist (EBDS) course will teach participants advanced statistical and computational techniques to extract meaningful information from large datasets. The EBDS course provides a solid theoretical foundation for data scientists to develop models and algorithms.”


The Enterprise Big Data Scientist (EBDS®) course provides a solid theoretical foundation in data science and the tools to design and develop algorithms that extract value from massive datasets. In this course, you will get an in-depth overview of the theory and tools required for statistical learning, inference and modelling. With these tools, you will be able to design predictive models and algorithms and to develop the best algorithms to tackle business problems.

The Enterprise Big Data Scientist course provides an in-depth overview of statistical modelling and inference techniques for more complex enterprise problems. Participants will learn statistical and machine learning theory and will work through hands-on case studies and exercises in Python. The objective of these case studies is to learn how to apply algorithms to real-world situations in order to extract valuable information from large datasets.

Topics in this course include the bias-variance trade-off, the curse of dimensionality, multivariate regression, multivariate linear regression, confidence intervals, logistic regression, cross-validation techniques, regularization, tree-based methods, support vector machines, and unsupervised learning. Each topic is first discussed from a theoretical perspective (how it works) and is subsequently illustrated with practical examples in Python (how to apply it).

The Enterprise Big Data Scientist course is the third level of the Big Data Framework course curriculum and certification program, which is globally accredited by APMG-International. The curriculum provides a vendor-neutral and objective understanding of Big Data technologies, algorithms and processes.

The Enterprise Big Data Scientist qualification is a practitioner course for data analysts and aspiring data scientists who aim to obtain an in-depth understanding of data science. In this course, we focus heavily on the scientific aspects, so that every participant gains the knowledge and skills to design algorithmic solutions to enterprise problems. To become an effective data scientist, mastering statistical and computational techniques is crucial.

The course provides an overview of statistical and computational techniques, which are illustrated in Python. Although this certification does not test programming skills, participants will build familiarity with Python and Jupyter notebooks during this course. In the corresponding examination, the emphasis is on the correct application of the theoretical models. However, participants are required to understand the output of Python scripts in order to interpret results.

This course positions learners to successfully complete the Enterprise Big Data Scientist certification exam.

Course Objectives

The course objectives of the Enterprise Big Data Scientist program include an advanced understanding of statistical and computational techniques to extract meaningful information from large datasets. Moreover, an Enterprise Big Data Scientist must be able to optimize models for accuracy and select the best model to solve a business problem.

A certified Enterprise Big Data Scientist is proficient in the design of advanced algorithms for classification and clustering problems, and is able to select the best design to optimize solutions for different input data. (S)he understands the theoretical differences between advanced statistical and machine learning models and is able to explain them. Moreover, a certified Enterprise Big Data Scientist can build an appropriate model when confronted with a particular business problem.

This Enterprise Big Data Scientist course will prepare participants to:

  • Understand tools for statistical modelling of large datasets.
  • Understand the trade-off between model accuracy and model interpretability, including the Bias-Variance trade-off.
  • Understand fundamental statistical terminology and probability tools that are required in data science, including probability spaces, random variables, probability distributions, expectations, Bayes’ rule, and multivariate probability.
  • Apply parametric models to fit a statistical learning problem and assess the quality of these parametric models.
  • Apply multivariate linear regression models, including the analysis and statistical significance (confidence intervals) of the linear regression coefficients.
  • Understand how to incorporate binary and categorical data into linear regression problems.
  • Understand how to solve classification problems using the Bayes optimal classifier, logistic regression and Linear Discriminant Analysis (LDA).
  • Analyse the accuracy of different classification approaches using the confusion matrix, different types of error rates, and the Receiver Operating Characteristic (ROC) curve of a classifier.
  • Understand techniques for estimating the test error in statistical learning problems, including k-fold cross-validation and the bootstrap.
  • Apply tree-based methods for non-linear regression and classification, including tree-pruning techniques for optimal tree size selection.
  • Understand how to apply ensemble techniques for tree-based methods, including bagging, random forests and boosting.
  • Apply and understand non-linear classification techniques, including Support Vector Machines (SVM).
  • Understand unsupervised learning techniques based on principal components, including Principal Component Analysis (PCA) for dimensionality reduction.
  • Understand advanced unsupervised clustering techniques and how to apply them, including the k-means algorithm and hierarchical clustering.


This qualification is aimed at individuals who are involved in enterprise Big Data analysis, analytics or data science, and who aspire to become professional data scientists. Please note that this is an advanced course that requires significant effort to pass the exam and obtain the qualification.

The target audience of the Enterprise Big Data Scientist qualification therefore includes the following roles:

  • Data Analysts
  • Business Analysts
  • Business Data Analysts
  • Systems Analysts
  • Data Management Analysts
  • Business Analytics Consultants
  • Data Scientists
  • Data Modellers

Learning Materials

Participants in the Enterprise Big Data Scientist course will receive the following study materials:

  • 40 hours of instructor-led training and exercise facilitation
  • Learner Manual (excellent post-class reference)
  • Participation in unique exercises designed to apply concepts
  • Sample documents, templates, tools and techniques
  • Access to additional value-added resources and communities


The Enterprise Big Data Scientist is an advanced-level course that requires working experience in data analysis techniques. The Enterprise Big Data Analyst level is a mandatory prerequisite. Participants who hold a comparable qualification can request a waiver of this prerequisite from the Enterprise Big Data Framework Alliance Educational Board.


Passing the 150-minute examination, which consists of 80 complex multiple-choice questions, with a score of at least 65% leads to the Enterprise Big Data Scientist (EBDS) Certificate. The examination and certification process is administered by APMG-International on behalf of the Enterprise Big Data Framework Alliance.

Detailed Course Outline

Introduction to Big Data Science

  • Introduction to Data Science
  • Fundamental concepts in probability
  • Probability Distributions
  • Vectors and Matrices
  • Python Programming Review

Foundations of Statistical Learning

  • Bias-Variance Trade-off
  • Supervised Learning review – regression and classification
  • Curse of dimensionality
  • Parametric models
  • Assessing model quality
  • Training MSE and Test MSE

Multivariate Regression

  • Multivariate regression
  • Determining coefficients
  • Coefficient uncertainty and confidence intervals
  • Hypothesis testing on coefficients
  • Accuracy in multivariate regression

Advanced Classification Techniques

  • Bayes optimal classifier
  • Logistic Regression
  • Hypothesis testing in logistic regression
  • Logistic regression with multiple classes
  • Linear Discriminant Analysis
  • Univariate and Multivariate LDA

Model Accuracy and Errors

  • Classification Errors
  • Confusion Matrix
  • True Positives, False Positives
  • True Negatives, False Negatives
  • Classification Error Rates
  • Receiver Operating Characteristic (ROC) curves

Cross Validation and Feature Selection

  • Validation Set Approach
  • K-fold cross validation
  • The Bootstrap
  • Forward stepwise selection
  • Backward stepwise selection

Tree-Based Methods

  • Tree-based regression
  • Recursive splitting
  • Tree pruning
  • Classification trees
  • Ensemble Techniques
  • Bootstrap Aggregation
  • Random Forests
  • Boosting

Linear Classifiers

  • The hyperplane
  • Maximum Margin Hyperplanes
  • Support Vector Classifiers
  • Support Vector Machines
  • Non-Linear Boundaries
  • The Kernel Trick

Unsupervised Learning

  • Principal Component Analysis
  • K-Means Clustering
  • Hierarchical Clustering
  • Clustering Distances

Where can I take the examination?

The examinations are distributed through Accredited Training Organizations (ATOs). For more information about authorized training partners, please visit the website of APMG-International.
