Comprehensive Data Engineering program covering Python, Statistics, Database Essentials, Big Data, Data Wrangling, NumPy, and Pandas with intensive 1-month classroom/LVC training + 2 months of LIVE project mentoring and unlimited access to the Data Science Cloud Lab for hands-on practice.
(21,693 reviews)
Self Learning + Live Mentoring
Elite faculty from prestigious universities with deep research and coaching expertise
Personalized counselling for career enhancement in managerial roles
Focused on data science for decision making and on managing data science projects, with an essential technical overview
Techniques for scenarios with certainty, low uncertainty, and high uncertainty, from decision trees to Monte Carlo simulation
We’re dedicated to making our programs accessible. Pay in easy installments at 0% interest with no hidden costs.
Bajaj Finserv & ShopSe
31st December 2026
• What is Data Engineering?
• Data Engineering scope
• Data Ecosystem, Tools and platforms
• Core concepts of Data engineering
• Types of data sources
• Databases: SQL and Document DBs
• Managing Big data
• Data integrity basics
• Various aspects of data privacy
• Various data privacy frameworks and standards
• Industry-related norms in data integrity and privacy: a data engineering perspective
• Who is a data engineer?
• Various roles of data engineer
• Skills required for data engineering
• Data Engineer Collaboration with Data Scientist and other roles.
• Introduction to Python
• Installation of Python and IDE
• Python objects
• Python basic data types
• String functions part 1
• String functions part 2
• Python Operators
• IF Conditional statement, IF-ELSE
• NESTED IF
• Python Loops Basics, WHILE Statement
• BREAK and CONTINUE statements
• FOR statements
• Introduction to Packages in Python
• Datetime Package and Methods
• Basic Data Structures in Python
• Basics of List
• List methods
• Tuple: Object and methods
• Sets: Object and methods
• Dictionary: Object and methods
• Functions basics
• Function Parameter passing
• Lambda functions
• Map, reduce, filter functions
• Introduction to Statistics: Descriptive And Inferential Statistics
• a. Descriptive Statistics
• b. Inferential Statistics
• Basic Terms Of Statistics
• Types Of Data
• Random Sampling
• Sampling With Replacement And Without Replacement
• Cochran’s Minimum Sample Size
• Types of Sampling
• Simple Random Sampling
• Stratified Random Sampling
• Cluster Random Sampling
• Systematic Random Sampling
• Multistage Sampling
• Sampling Error
• Methods Of Collecting Data
• Exploratory Data Analysis Introduction
• Measures Of Central Tendency: Mean, Median And Mode
• Measures Of Dispersion: Range, Variance And Standard Deviation
• Data Distribution Plot: Histogram
• Normal Distribution & Properties
• Z Value / Standard Value
• Empirical Rule and Outliers
• Central Limit Theorem
• Normality Testing
• Skewness & Kurtosis
• Measures Of Distance: Euclidean, Manhattan And Minkowski Distance
• Covariance & Correlation
• Hypothesis Testing Introduction
• P-Value, Critical Region
• Types of Hypothesis Testing
• Hypothesis Testing Errors: Type I And Type II
• Two Sample Independent T-test
• Two Sample Related (Paired) T-test
• One-Way ANOVA Test
• Application of Hypothesis testing
• AWS Overview and Account Setup
• AWS IAM Users, Roles and Policies
• AWS S3 overview
• AWS EC2 overview
• AWS Lambda overview
• AWS Glue overview
• AWS Kinesis overview
• AWS DynamoDB overview
• AWS Athena overview
• AWS Redshift overview
• AWS Glue Crawler and setup
• ETL with AWS Glue
• Data Ingestion with AWS Glue
• AWS Kinesis overview and setup
• Data Streams with AWS Kinesis
• Data Ingestion from AWS S3 using AWS Kinesis
• AWS Redshift Overview
• Analyze data using AWS Redshift from warehouses, data lakes and operational DBs
• Develop applications using an AWS Redshift cluster
• AWS Redshift Federated Queries and Spectrum
• Azure Synapse setup
• Understanding Data control flow with ADF
• Data Pipelines with Azure Synapse
• Prepare and transform data with Azure Synapse Analytics
• Create Azure storage account
• Connect App to Azure Storage
• Azure Blob Storage
• Azure Data Factory Introduction
• Data transformation with Data Factory
• Data Wrangling with Data Factory
• Azure Databricks introduction
• Azure Databricks architecture
• Data Transformation with Databricks
• Creating a Relational Database
• Querying in and out of Relational Database
• ETL from RDS to Databricks
• Hands-on Project Case-study
• Setup Project Development Env
• Organization of Data Sources
• Azure/AWS services for Data Ingestion
• Data Extraction and Transformation
• Data Warehouse Introduction
• Database vs Data Warehouse
• Data Warehouse Architecture
• Data Lakehouse
• ETL (Extract, Transform, and Load)
• ETL vs ELT
• Star Schema and Snowflake Schema
• Data Mart Concepts
• Data Warehouse vs Data Mart: Know the Difference
• Data Lake Introduction and Architecture
• Data Warehouse vs Data Lake
• Python NumPy Package Introduction
• Array data structure, Operations
• Python Pandas package introduction
• Data structures: Series and DataFrame
• Importing data into Pandas DataFrame
• Data processing with Pandas
• Docker Introduction
• Docker vs. VM
• Hands-on: Running our first container
• Common commands (running, editing, stopping, copying and managing images)
• YAML (Basics)
• Publishing containers to DockerHub
• Kubernetes Orchestration of Containers
• Docker Swarm vs Kubernetes
• Data Orchestration Overview
• Apache Airflow Introduction
• Airflow Architecture
• Setting up Airflow
• TAG and DAG
• Creating Airflow Workflow
• Airflow Modular Structure
• Executing Airflow
• Setting Project Environment
• Data pipeline setup
• Hands-on: build scalable data pipelines
• Purpose of Version Control
• Popular Version control tools
• Git Distributed Version Control
• Terminologies
• Git Workflow
• Git Architecture
• Git Repo Introduction
• Create New Repo with Init command
• Copying existing repo
• Git user and remote node
• Git Status and rebase
• Review Repo History
• GitHub Cloud Remote Repo
• Code commits
• Pull, Fetch and conflicts resolution
• Pushing to Remote Repo
• Organize code with branches
• Checkout branch
• Merge branches
• Editing Commits
• Commit command Amend flag
• Git reset and revert
• Creating GitHub Account
• Local and Remote Repo
• Collaborating with other developers
• Database Overview
• Key concepts of database management
• Relational Database Management System
• CRUD operations
• Introduction to Databases
• Introduction to SQL
• SQL Commands
• MySQL Workbench installation
• Numeric, character, and date-time data types
• Primary key, Foreign key, Not null
• Unique, Check, default, Auto increment
• Create database
• Delete database
• Show and use databases
• Create table, Rename table
• Delete table, Delete table records
• Create new table from existing data types
• Insert into, Update records
• Alter table
• Inner Join, Outer Join
• Left Join, Right Join
• Self Join, Cross join
• Window functions: OVER, PARTITION BY, RANK
• Select, Select distinct
• Aliases, Where clause
• Relational operators, Logical operators
• Between, Order by, In
• Like, Limit, null/not null, group by
• Big Data Overview
• Five Vs of Big Data
• What is Big Data and Hadoop
• Introduction to Hadoop
• Components of Hadoop Ecosystem
• Big Data Analytics Introduction
• HDFS – Big Data Storage
• Distributed Processing with Map Reduce
• Mapping and reducing stages concepts
• Key Terms: Output Format, Partitioners, Combiners, Shuffle, and Sort
• PySpark Introduction
• Spark Configuration
• Resilient distributed datasets (RDD)
• Working with RDDs in PySpark
• Aggregating Data with Pair RDDs
• Introducing Spark SQL
• Spark SQL vs Hadoop Hive
• Working with Spark SQL Query Language
• Kafka architecture
• Kafka workflow
• Configuring Kafka cluster
• Operations
• Creating an HDFS cluster with containers
• Creating a PySpark cluster with containers
• Processing data on the HDFS cluster with the PySpark cluster
• Introduction to Business Intelligence & Introduction to Tableau
• Interface Tour, Data visualization: Pie chart, Column chart, Bar chart
• Tree Map, Line Chart
• Area chart, Combination Charts, Map
• Dashboards creation, Quick Filters
• Create Table Calculations
• Create Calculated Fields
• Create Custom Hierarchies
• Power BI Introduction
• Basic Visualizations
• Dashboard Creation
• Basic Data Cleaning
• Basic DAX functions
• Exploring Query Editor
• Data Cleansing and Manipulation:
• Creating Our Initial Project File
• Connecting to Our Data Source
• Editing Rows
• Changing Data Types
• Replacing Values
• Connecting to a CSV File
• Connecting to a Webpage
• Extracting Characters
• Splitting and Merging Columns
• Creating Conditional Columns
• Creating Columns from Examples
• Create Data Model