Guide to Data Engineering with real-time use case on Retail Banking

Data holds great value for organizations: access to raw data can help enterprises enhance customer-related services, improve business processes and product development, and make better decisions. Data can be structured, semi-structured or unstructured. In this upcoming series of blogs, I have picked retail banking as the use case and will present a step-by-step procedure to analyze, design, build and visualize (that is, transform raw data into information), which in turn helps in decision making.

What is this blog (or) series about?

I present retail banking as a use case and draw on my knowledge and experience to make the end-to-end (E2E) data engineering cycle easier to understand for beginner and intermediate-level readers.

What can you expect?

A detailed explanation of the source system: understanding the data and its granularity, the data model, and a discussion of data quality.
Designing a simple data warehouse from scratch.
Building pipelines (batch and near real-time).
Visualizing data using Power BI and answering questions through reports, charts, and graphs, which in turn create value for the bank.
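
To give a flavour of what a data quality discussion can look like in practice, here is a minimal sketch in Python. The `Transaction` fields and the four rules are hypothetical and chosen only for illustration; they are not the actual source system or rule set covered later in the series.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Transaction:
    # Hypothetical fields; one row per transaction (the grain of the data).
    transaction_id: str
    account_id: Optional[str]
    amount: float
    booking_date: date


def find_quality_issues(rows: list[Transaction]) -> list[str]:
    """Return a readable message for every rule violation found."""
    issues = []
    seen_ids = set()
    for row in rows:
        # Rule 1: transaction_id must be unique at this grain.
        if row.transaction_id in seen_ids:
            issues.append(f"{row.transaction_id}: duplicate transaction_id")
        seen_ids.add(row.transaction_id)
        # Rule 2: every transaction must reference an account.
        if not row.account_id:
            issues.append(f"{row.transaction_id}: missing account_id")
        # Rule 3: a zero amount is treated as suspicious (assumed rule).
        if row.amount == 0:
            issues.append(f"{row.transaction_id}: zero amount")
        # Rule 4: booking dates cannot lie in the future.
        if row.booking_date > date.today():
            issues.append(f"{row.transaction_id}: booking_date in the future")
    return issues


sample = [
    Transaction("T1", "A100", 250.0, date(2023, 1, 5)),
    Transaction("T2", None, 40.0, date(2023, 1, 6)),
    Transaction("T1", "A100", 250.0, date(2023, 1, 5)),
]
for issue in find_quality_issues(sample):
    print(issue)
```

For the sample rows, this reports a missing `account_id` on `T2` and a duplicate `T1`. In a real pipeline, checks like these would more likely run inside the warehouse or the pipeline tool than in a standalone script.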

Note – Since we are discussing the retail banking domain, data governance is an additional topic I would like to cover. Every data engineer should have a good understanding of data governance, which results in consistent and trusted information.

Below is the step-by-step approach planned:

Real-time use case in detail – What is the objective? The list of questions we would like to answer by designing, transforming and visualizing data.
Data Storage and Data Modelling
Batch Processing
Near Real Time Data Processing
Monitoring and Control - Data storage and data processing
Data Governance
Data Visualization
Other Tools for Data Processing and Visualization

Pre-requisites:

Willingness to learn and problem-solving skills.
Basic knowledge of data modelling, data analysis and data warehousing will make the series easier to follow.
If you tend to look at raw data as a source of knowledge (or) information, then this series is designed for you.

Tools:

Since I am certified and specialized in Microsoft cloud, I have built the complete series on Azure. I will be using the following tools:

Storage – Azure SQL
Batch Processing – Azure Data Factory
Near real-time
- Azure Streaming
- Apache Kafka
Azure Monitoring
Visualization – Power BI

Conclusion:

Data engineering involves designing and building data pipelines that transform data for the analytics or data science teams, who build ML or AI on top of it. A data engineer should stay one step ahead and consider data quality, volume, governance, error handling routines, pipeline monitoring and performance tuning, and take complete ownership of the data. The field is dynamic, with growing demand for real-time data processing and analytics, and big data and other data science tools are likely to see wider use in the coming years.

As stated above, a data engineer carries considerable ownership and responsibility, and should have good clarity on each topic. By the end of this series, you should have a complete understanding of these concepts. Think of the bigger picture while designing solutions, and feel good when you finally see the data being used in decision making.

All right, I'll see you there.
