Description
Introduction
- A Practical Example: What You Will Learn in This Course
- What Does the Course Cover
The Field of Data Science – The Various Data Science Disciplines
- Data Science and Business Buzzwords: Why are there so Many?
- What is the Difference between Analysis and Analytics
- Business Analytics, Data Analytics, and Data Science: An Introduction
- Continuing with BI, ML, and AI
- A Breakdown of our Data Science Infographic
The Field of Data Science – Connecting the Data Science Disciplines
- Applying Traditional Data, Big Data, BI, Traditional Data Science and ML
The Field of Data Science – The Benefits of Each Discipline
- The Reason Behind These Disciplines
The Field of Data Science – Popular Data Science Techniques
- Techniques for Working with Traditional Data
- Real Life Examples of Traditional Data
- Techniques for Working with Big Data
- Real Life Examples of Big Data
- Business Intelligence (BI) Techniques
- Real Life Examples of Business Intelligence (BI)
- Techniques for Working with Traditional Methods
- Real Life Examples of Traditional Methods
- Machine Learning (ML) Techniques
- Types of Machine Learning
- Real Life Examples of Machine Learning (ML)
The Field of Data Science – Popular Data Science Tools
- Necessary Programming Languages and Software Used in Data Science
The Field of Data Science – Careers in Data Science
- Finding the Job – What to Expect and What to Look for
The Field of Data Science – Debunking Common Misconceptions
- Debunking Common Misconceptions
Probability
- The Basic Probability Formula
- Computing Expected Values
- Frequency
- Events and Their Complements
Probability – Combinatorics
- Fundamentals of Combinatorics
- Permutations and How to Use Them
- Simple Operations with Factorials
- Solving Variations with Repetition
- Solving Variations without Repetition
- Solving Combinations
- Symmetry of Combinations
- Solving Combinations with Separate Sample Spaces
- Combinatorics in Real-Life: The Lottery
- A Recap of Combinatorics
- A Practical Example of Combinatorics
Probability – Bayesian Inference
- Sets and Events
- Ways Sets Can Interact
- Intersection of Sets
- Union of Sets
- Mutually Exclusive Sets
- Dependence and Independence of Sets
- The Conditional Probability Formula
- The Law of Total Probability
- The Additive Rule
- The Multiplication Law
- Bayes’ Law
- A Practical Example of Bayesian Inference
Probability – Distributions
- Fundamentals of Probability Distributions
- Types of Probability Distributions
- Characteristics of Discrete Distributions
- Discrete Distributions: The Uniform Distribution
- Discrete Distributions: The Bernoulli Distribution
- Discrete Distributions: The Binomial Distribution
- Discrete Distributions: The Poisson Distribution
- Characteristics of Continuous Distributions
- Continuous Distributions: The Normal Distribution
- Continuous Distributions: The Standard Normal Distribution
- Continuous Distributions: The Student’s T Distribution
- Continuous Distributions: The Chi-Squared Distribution
- Continuous Distributions: The Exponential Distribution
- Continuous Distributions: The Logistic Distribution
- A Practical Example of Probability Distributions
Probability – Probability in Other Fields
- Probability in Finance
- Probability in Statistics
- Probability in Data Science
Part 3: Statistics
- Population and Sample
Statistics – Descriptive Statistics
- Types of Data
- Levels of Measurement
- Categorical Variables – Visualization Techniques
- Numerical Variables – Frequency Distribution Table
- The Histogram
- Histogram Exercise
- Cross Tables and Scatter Plots
- Mean, Median and Mode
- Skewness
- Variance
- Standard Deviation and Coefficient of Variation
- Covariance
- Correlation Coefficient
Statistics – Practical Example: Descriptive Statistics
- Practical Example: Descriptive Statistics
Statistics – Inferential Statistics Fundamentals
- Introduction
- What is a Distribution
- The Normal Distribution
- The Standard Normal Distribution
- Central Limit Theorem
- Standard Error
- Estimators and Estimates
Statistics – Inferential Statistics: Confidence Intervals
- What are Confidence Intervals?
- Confidence Intervals; Population Variance Known; Z-score
- Confidence Interval Clarifications
- Student’s T Distribution
- Confidence Intervals; Population Variance Unknown; T-score
- Margin of Error
- Confidence Intervals. Two Means. Dependent Samples
- Confidence Intervals. Two Means. Independent Samples
Statistics – Practical Example: Inferential Statistics
- Practical Example: Inferential Statistics
Statistics – Hypothesis Testing
- Null vs Alternative Hypothesis
- Rejection Region and Significance Level
- Type I Error and Type II Error
- Test for the Mean. Population Variance Known
- p-value
- Test for the Mean. Population Variance Unknown
- Test for the Mean. Dependent Samples
- Test for the Mean. Independent Samples
Statistics – Practical Example: Hypothesis Testing
- Practical Example: Hypothesis Testing
Python – Introduction to Python
- Introduction to Programming
- Why Python?
- Why Jupyter?
- Installing Python and Jupyter
- Understanding Jupyter’s Interface – the Notebook Dashboard
- Prerequisites for Coding in the Jupyter Notebooks
Python – Variables and Data Types
- Variables
- Numbers and Boolean Values in Python
- Python Strings
Python – Basic Python Syntax
- Using Arithmetic Operators in Python
- The Double Equality Sign
- How to Reassign Values
- Add Comments
- Understanding Line Continuation
- Indexing Elements
- Structuring with Indentation
Python – Other Python Operators
- Comparison Operators
- Logical and Identity Operators
Python – Conditional Statements
- The IF Statement
- The ELSE Statement
- A Note on Boolean Values
Python – Python Functions
- Defining a Function in Python
- How to Create a Function with a Parameter
- Defining a Function in Python
- How to Use a Function within a Function
- Conditional Statements and Functions
- Functions Containing a Few Arguments
- Built-in Functions in Python
Python – Sequences
- Lists
- Using Methods
- List Slicing
- Tuples
- Dictionaries
Python – Iterations
- For Loops
- While Loops and Incrementing
- Lists with the range() Function
- Conditional Statements and Loops
- Conditional Statements, Functions, and Loops
- How to Iterate over Dictionaries
Python – Advanced Python Tools
- Object Oriented Programming
- Modules and Packages
- What is the Standard Library?
- Importing Modules in Python
Advanced Statistical Methods in Python
- Introduction to Regression Analysis
Advanced Statistical Methods – Linear Regression with StatsModels
- The Linear Regression Model
- Correlation vs Regression
- Geometrical Representation of the Linear Regression Model
- Python Packages Installation
- First Regression in Python
- Using Seaborn for Graphs
- How to Interpret the Regression Table
- Decomposition of Variability
- What is the OLS?
- R-Squared
Advanced Statistical Methods – Multiple Linear Regression with StatsModels
- Multiple Linear Regression
- Adjusted R-Squared
- Test for Significance of the Model (F-Test)
- OLS Assumptions
- Linearity
- No Endogeneity
- Normality and Homoscedasticity
- No Autocorrelation
- Dealing with Categorical Data – Dummy Variables
- Making Predictions with the Linear Regression
Advanced Statistical Methods – Linear Regression with sklearn
- What is sklearn and How is it Different from Other Packages
- How are we Going to Approach this Section?
- Simple Linear Regression with sklearn
- Simple Linear Regression with sklearn – A StatsModels-like Summary Table
- Multiple Linear Regression with sklearn
- Calculating the Adjusted R-Squared in sklearn
- Feature Selection (F-regression)
- Creating a Summary Table with P-values
- Feature Scaling (Standardization)
- Feature Selection through Standardization of Weights
- Predicting with the Standardized Coefficients
- Underfitting and Overfitting
- Train – Test Split Explained
Advanced Statistical Methods – Practical Example: Linear Regression
- Practical Example: Linear Regression
Advanced Statistical Methods – Logistic Regression
- Introduction to Logistic Regression
- A Simple Example in Python
- Logistic vs Logit Function
- Building a Logistic Regression
- An Invaluable Coding Tip
- Understanding Logistic Regression Tables
- What do the Odds Actually Mean
- Binary Predictors in a Logistic Regression
- Calculating the Accuracy of the Model
- Underfitting and Overfitting
- Testing the Model
Advanced Statistical Methods – Cluster Analysis
- Introduction to Cluster Analysis
- Some Examples of Clusters
- Difference between Classification and Clustering
- Math Prerequisites
Advanced Statistical Methods – K-Means Clustering
- K-Means Clustering
- A Simple Example of Clustering
- Clustering Categorical Data
- How to Choose the Number of Clusters
- Pros and Cons of K-Means Clustering
- To Standardize or not to Standardize
- Relationship between Clustering and Regression
- Market Segmentation with Cluster Analysis
- How is Clustering Useful?
Advanced Statistical Methods – Other Types of Clustering
- Types of Clustering
- Dendrogram
- Heatmaps
Part 6: Mathematics
- What is a Matrix?
- Scalars and Vectors
- Linear Algebra and Geometry
- Arrays in Python – A Convenient Way To Represent Matrices
- What is a Tensor?
- Addition and Subtraction of Matrices
- Errors when Adding Matrices
- Transpose of a Matrix
- Dot Product
- Dot Product of Matrices
- Why is Linear Algebra Useful?
Part 7: Deep Learning
- What to Expect from this Part?
Deep Learning – Introduction to Neural Networks
- Introduction to Neural Networks
- Training the Model
- Types of Machine Learning
- The Linear Model (Linear Algebraic Version)
- The Linear Model with Multiple Inputs
- The Linear Model with Multiple Inputs and Multiple Outputs
- Graphical Representation of Simple Neural Networks
- What is the Objective Function?
- Common Objective Functions: L2-norm Loss
- Common Objective Functions: Cross-Entropy Loss
- Optimization Algorithm: 1-Parameter Gradient Descent
Deep Learning – How to Build a Neural Network from Scratch with NumPy
- Basic NN Example
Deep Learning – TensorFlow 2.0: Introduction
- How to Install TensorFlow 2.0
- TensorFlow Outline and Comparison with Other Libraries
- TensorFlow 1 vs TensorFlow 2
- A Note on TensorFlow 2 Syntax
- Types of File Formats Supporting TensorFlow
- Outlining the Model with TensorFlow 2
- Interpreting the Result and Extracting the Weights and Bias
- Customizing a TensorFlow 2 Model
Deep Learning – Digging Deeper into NNs: Introducing Deep Neural Networks
- What is a Layer?
- What is a Deep Net?
- Digging into a Deep Net
- Non-Linearities and their Purpose
- Activation Functions
- Activation Functions: Softmax Activation
- Backpropagation
- Backpropagation Picture
Deep Learning – Overfitting
- What is Overfitting?
- Underfitting and Overfitting for Classification
- What is Validation?
- Training, Validation, and Test Datasets
- N-Fold Cross Validation
- Early Stopping or When to Stop Training
Deep Learning – Initialization
- What is Initialization?
- Types of Simple Initializations
- State-of-the-Art Method – (Xavier) Glorot Initialization
Deep Learning – Digging into Gradient Descent and Learning Rate Schedules
- Stochastic Gradient Descent
- Problems with Gradient Descent
- Momentum
- Learning Rate Schedules, or How to Choose the Optimal Learning Rate
- Learning Rate Schedules Visualized
- Adaptive Learning Rate Schedules (AdaGrad and RMSprop)
- Adam (Adaptive Moment Estimation)
Deep Learning – Preprocessing
- Preprocessing Introduction
- Types of Basic Preprocessing
- Standardization
- Preprocessing Categorical Data
- Binary and One-Hot Encoding
Deep Learning – Classifying on the MNIST Dataset
- MNIST: The Dataset
- MNIST: How to Tackle the MNIST
- MNIST: Importing the Relevant Packages and Loading the Data
- MNIST: Preprocess the Data – Create a Validation Set and Scale It
- MNIST: Preprocess the Data – Shuffle and Batch
- MNIST: Outline the Model
- MNIST: Select the Loss and the Optimizer
- MNIST: Learning
- MNIST: Testing the Model
Deep Learning – Business Case Example
- Business Case: Exploring the Dataset and Identifying Predictors
- Business Case: Outlining the Solution
- Business Case: Balancing the Dataset
- Business Case: Preprocessing the Data
- Business Case: Load the Preprocessed Data
- Business Case: Learning and Interpreting the Result
- Business Case: Setting an Early Stopping Mechanism
- Business Case: Testing the Model
Deep Learning – Conclusion
- Summary on What You’ve Learned
- What’s Further Out There in Terms of Machine Learning
- An Overview of CNNs
- An Overview of RNNs
- An Overview of non-NN Approaches
Appendix: Deep Learning – TensorFlow 1: Introduction
- How to Install TensorFlow 1
- TensorFlow Intro
- Actual Introduction to TensorFlow
- Types of File Formats Supporting Tensors
- Basic NN Example with TF: Inputs, Outputs, Targets, Weights, Biases
- Basic NN Example with TF: Loss Function and Gradient Descent
- Basic NN Example with TF: Model Output
Appendix: Deep Learning – TensorFlow 1: Classifying on the MNIST Dataset
- MNIST: What is the MNIST Dataset?
- MNIST: How to Tackle the MNIST
- MNIST: Relevant Packages
- MNIST: Model Outline
- MNIST: Loss and Optimization Algorithm
- Calculating the Accuracy of the Model
- MNIST: Batching and Early Stopping
- MNIST: Learning
- MNIST: Results and Testing
Appendix: Deep Learning – TensorFlow 1: Business Case
- Business Case: Getting Acquainted with the Dataset
- Business Case: Outlining the Solution
- The Importance of Working with a Balanced Dataset
- Business Case: Preprocessing
- Creating a Data Provider
- Business Case: Model Outline
- Business Case: Optimization
- Business Case: Interpretation
- Business Case: Testing the Model
- Business Case: A Comment on the Homework
Software Integration
- What are Data, Servers, Clients, Requests, and Responses
- What are Data Connectivity, APIs, and Endpoints?
- Taking a Closer Look at APIs
- Communication between Software Products through Text Files
- Software Integration – Explained
Case Study – What’s Next in the Course?
- Game Plan for this Python, SQL, and Tableau Business Exercise
- The Business Task
- Introducing the Data Set
Case Study – Preprocessing the ‘Absenteeism_data’
- What to Expect from the Following Sections?
- Importing the Absenteeism Data in Python
- Checking the Content of the Data Set
- Introduction to Terms with Multiple Meanings
- Using a Statistical Approach towards the Solution to the Exercise
- Dropping a Column from a DataFrame in Python
- Analyzing the Reasons for Absence
- Obtaining Dummies from a Single Feature
- More on Dummy Variables: A Statistical Perspective
- Classifying the Various Reasons for Absence
- Using .concat() in Python
- Reordering Columns in a Pandas DataFrame in Python
- Creating Checkpoints while Coding in Jupyter
- Analyzing the Dates from the Initial Data Set
- Extracting the Month Value from the “Date” Column
- Extracting the Day of the Week from the “Date” Column
- Analyzing Several “Straightforward” Columns for this Exercise
- Working on “Education”, “Children”, and “Pets”
- Final Remarks of this Section
Case Study – Applying Machine Learning to Create the ‘absenteeism_module’
- Exploring the Problem with a Machine Learning Mindset
- Creating the Targets for the Logistic Regression
- Selecting the Inputs for the Logistic Regression
- Standardizing the Data
- Splitting the Data for Training and Testing
- Fitting the Model and Assessing its Accuracy
- Creating a Summary Table with the Coefficients and Intercept
- Interpreting the Coefficients for Our Problem
- Standardizing only the Numerical Variables (Creating a Custom Scaler)
- Interpreting the Coefficients of the Logistic Regression
- Backward Elimination or How to Simplify Your Model
- Testing the Model We Created
- Saving the Model and Preparing it for Deployment
- Preparing the Deployment of the Model through a Module
Case Study – Loading the ‘absenteeism_module’
- Deploying the ‘absenteeism_module’
Case Study – Analyzing the Predicted Outputs in Tableau
- Analyzing Age vs Probability in Tableau
- Analyzing Reasons vs Probability in Tableau
- Analyzing Transportation Expense vs Probability in Tableau
Appendix – Additional Python Tools
- Using the .format() Method
- Iterating Over Range Objects
- Introduction to Nested For Loops
- Triple Nested For Loops
- List Comprehensions
- Anonymous (Lambda) Functions