A Guide to Data Engineering with a Real-Time Use Case in Retail Banking
Data holds great value for organizations. Access to raw data can help enterprises enhance customer-facing services, improve business processes and product development, and make better decisions. Data can be structured, semi-structured, or unstructured. In this upcoming blog series, I have picked retail banking as the use case. I will present, step by step, how to analyze, design, build, and visualize data (that is, transform raw data into information) to support decision-making.
What is this blog series about?
Using retail banking as the use case, I draw on my knowledge and experience to make the end-to-end (E2E) data engineering cycle easier to understand for beginner and intermediate readers.
What can you expect?
- A detailed explanation of the source system: understanding the data and its granularity, the data model, and data quality.
- Designing a simple data warehouse from scratch
- Building pipelines (Batch and near real-time)
- Visualizing data in Power BI and answering business questions through reports, charts, and graphs that create value for the bank.
Below is the planned step-by-step approach:
- Real-time use case in detail: the objective, and the list of questions we want to answer by designing, transforming, and visualizing the data
- Data Storage and Data Modelling
- Batch Processing
- Near Real-Time Data Processing
- Monitoring and Control of Data Storage and Data Processing
- Data Governance
- Data Visualization
- Other Tools for Data Processing and Visualization
Note: Since we are working in the retail banking domain, data governance is an additional topic I would like to cover. Every data engineer should have a good understanding of data governance, because it is what keeps information consistent and trusted.
What are the prerequisites?
- Willingness to learn and problem-solving skills.
- Basic knowledge of data modelling, data analysis, and data warehousing will make the series easier to follow.
- If you see raw data as a source of knowledge and information, this series is designed for you.
Since I am certified and specialized in the Microsoft cloud, the complete series is built on Azure. I will be using the following tools:
- Storage – Azure SQL
- Batch processing – Azure Data Factory
- Near real-time processing
  - Azure Streaming
  - Apache Kafka
- Monitoring – Azure Monitor
- Visualization – Power BI
Data engineering involves designing and building data pipelines that transform data for the analytics and data science teams who build ML and AI solutions on top of it. A data engineer should stay one step ahead: consider data quality, data volume, governance, error-handling routines, pipeline monitoring, and performance tuning, and take complete ownership of the data. The field is evolving quickly as organizations increasingly expect real-time analytics and processing, and big data and data science tools are likely to see ever wider use.
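To make "data quality" and "error-handling routines" a little more concrete, here is a minimal sketch in plain Python, not tied to any Azure service. It assumes a hypothetical batch of raw transaction rows with the fields `txn_id`, `account_id`, `txn_date`, and `amount` (illustrative names, not the schema used later in the series). Each row is parsed and validated, and bad rows are routed to a reject list with a reason instead of failing the whole batch.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


# Hypothetical typed record for one retail banking transaction (illustrative schema).
@dataclass
class Transaction:
    txn_id: str
    account_id: str
    txn_date: date
    amount: Decimal


def parse_row(row: dict) -> Transaction:
    """Convert one raw row into a Transaction; raise ValueError on bad data."""
    if not row.get("txn_id") or not row.get("account_id"):
        raise ValueError("missing txn_id or account_id")
    try:
        amount = Decimal(str(row.get("amount", "")))
    except InvalidOperation as exc:
        raise ValueError(f"invalid amount: {row.get('amount')!r}") from exc
    if not amount.is_finite():  # reject NaN / Infinity values
        raise ValueError(f"invalid amount: {row.get('amount')!r}")
    # date.fromisoformat raises ValueError for a malformed or missing date
    txn_date = date.fromisoformat(str(row.get("txn_date", "")))
    return Transaction(row["txn_id"], row["account_id"], txn_date, amount)


def validate_batch(rows):
    """Split a batch into clean transactions and (row, reason) rejects."""
    clean, rejected = [], []
    seen_ids = set()
    for row in rows:
        try:
            txn = parse_row(row)
            if txn.txn_id in seen_ids:  # simple duplicate check on the key
                raise ValueError(f"duplicate txn_id {txn.txn_id}")
        except ValueError as exc:
            rejected.append((row, str(exc)))
            continue
        seen_ids.add(txn.txn_id)
        clean.append(txn)
    return clean, rejected


# Usage with made-up sample rows
rows = [
    {"txn_id": "T1", "account_id": "A100", "txn_date": "2023-01-15", "amount": "250.00"},
    {"txn_id": "T2", "account_id": "A100", "txn_date": "2023-01-16", "amount": "abc"},
    {"txn_id": "T1", "account_id": "A200", "txn_date": "2023-01-17", "amount": "10.50"},
    {"txn_id": "T3", "account_id": "", "txn_date": "2023-01-18", "amount": "5"},
]
clean, rejected = validate_batch(rows)
print(len(clean), len(rejected))  # prints "1 3"
for row, reason in rejected:
    print(row.get("txn_id"), "->", reason)
```

Keeping rejected rows together with a reason, rather than silently dropping them, is what makes later monitoring and data governance possible: the rejects can be counted, alerted on, and traced back to the source system.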
As stated above, a data engineer carries broad ownership and responsibility and needs a clear understanding of each of these topics. By the end of this series, you should have a complete understanding of the end-to-end cycle. Think of the bigger picture when designing solutions, and enjoy seeing your data used in decision-making.
All right, I'll see you there.