Leveraging Teradata Vantage's Superior Performance for Real-Time Analytics

It seems today that every industry is trying to gain an in-depth understanding of its customers’ needs. Banks and financial institutions are no exception: they are leveraging data to derive insights about their business and their customers. But today, insights alone are not enough; they need to be timely and actionable as well.
 
Yes, it’s not just about analytics now, but real-time analytics. Teradata’s work implementing real-time credit scoring for a reputed bank in Turkey is one example of using real-time analytics to maximize revenue and minimize cost and time. As one of the largest banks in Turkey, with more than 700 branches across the country and assets estimated at over ₺90,410,000,000, our client was focused on optimizing its credit scoring pipeline to improve assessments and yield more precise scores, faster.
 
The end-to-end pipeline included creating variables on the Vantage SQL Engine and using Teradata’s performance to replicate the SAS-based credit scoring models on the Vantage Machine Learning Engine (MLE), without loss in any model performance measure and with scoring done in-database. Variables were created in real time from data coming from the bank’s data warehouse (DWH) and from Turkey’s Credit Bureau. We started by optimizing the Oracle-based PL/SQL queries and translating them to Teradata SQL using SQL-E functions, Ordered Analytics, analytic data set (ADS) creation tactics, UDFs and stored procedures. This process helped us create approximately 1,000 variables at run time for use in the machine learning model for credit score prediction.
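
To make the idea of ordered, per-inquiry variables concrete, here is a minimal sketch in plain Python. It is not the project's Teradata SQL: the rows, the column meanings (days past due) and the three derived variables are all invented for illustration, and the function only mimics what a SQL window partitioned by inquiry and ordered by date would compute.

```python
from collections import defaultdict
from datetime import date

# Hypothetical credit-bureau rows: (inquiry id, payment date, days past due).
# Identifiers and values are made up for this sketch.
rows = [
    ("INQ-1", date(2020, 1, 31), 0),
    ("INQ-1", date(2020, 2, 29), 15),
    ("INQ-1", date(2020, 3, 31), 40),
    ("INQ-1", date(2020, 4, 30), 5),
    ("INQ-2", date(2020, 3, 31), 0),
    ("INQ-2", date(2020, 4, 30), 0),
]


def windowed_features(rows, window=3):
    """Per inquiry, derive variables over the most recent `window` records.

    Plays the role of a SQL window (partition by inquiry, order by date),
    computed here in plain Python.
    """
    by_id = defaultdict(list)
    for inq, when, dpd in rows:
        by_id[inq].append((when, dpd))

    features = {}
    for inq, history in by_id.items():
        history.sort()  # order by date within the partition
        recent = [dpd for _, dpd in history[-window:]]
        features[inq] = {
            "max_dpd_recent": max(recent),
            "avg_dpd_recent": sum(recent) / len(recent),
            "n_late_recent": sum(1 for d in recent if d > 0),
        }
    return features


features = windowed_features(rows)
```

In the real pipeline this kind of logic ran in-database across roughly 1,000 variables; the sketch only shows the shape of one such calculation.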
 
Once the ADS creation process for training and scoring was in place, a Random Forest model was built using MLE after exploring other algorithms as options. Various parameters, training sets and techniques were tried before arriving at the final Random Forest model, which achieved an 80% default catch rate. The settings that proved statistically satisfactory were 100 trees, a maximum depth of 8 and a node size of 1, for a training set with a 22.19% default rate. Some oversampling and overfitting were deliberately applied to compensate for a low default rate.
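
For readers less familiar with oversampling and catch rate, the sketch below shows both in plain Python. It rests on assumptions not stated in the project write-up: that oversampling means randomly duplicating default records until a target default rate is reached, and that "default catch rate" means recall on the default class. The function names are hypothetical, and the model itself was built in MLE, not with this code.

```python
import math
import random


def oversample_defaults(labels, target_rate, seed=0):
    """Return row indices of a resampled training set in which defaults
    (label 1) make up at least `target_rate` of the rows.

    Assumption: defaults are duplicated at random; non-defaults are kept as-is.
    """
    if not 0 < target_rate < 1:
        raise ValueError("target_rate must be between 0 and 1")
    defaults = [i for i, y in enumerate(labels) if y == 1]
    non_defaults = [i for i, y in enumerate(labels) if y == 0]
    if not defaults:
        raise ValueError("no default records to oversample")

    rng = random.Random(seed)
    # Solve d / (d + n) >= target_rate for the required number of defaults d.
    needed = math.ceil(target_rate * len(non_defaults) / (1 - target_rate))
    extra = max(0, needed - len(defaults))
    resampled = defaults + non_defaults
    resampled += [rng.choice(defaults) for _ in range(extra)]
    rng.shuffle(resampled)
    return resampled


def catch_rate(y_true, y_pred):
    """Share of actual defaults that were flagged (recall on class 1)."""
    actual = sum(1 for t in y_true if t == 1)
    flagged = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return flagged / actual
```

As a usage example, `oversample_defaults([1] * 5 + [0] * 95, 0.2219)` lifts a 5% default rate to at least 22.19% by duplicating default rows, and `catch_rate` on a scored hold-out set gives the kind of figure quoted above as 80%.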
 
An end-to-end simulated real-time pipeline running in 5.25 seconds, which included creating the ADS variables and scoring one INQ_NUMBER, was handed over to the IT and business teams at the bank. This was a major win for the account team: it brought the entire process down from just under 30 minutes to 5.25 seconds, giving the bank’s credit loan department actionable insights and allowing it to make timely decisions.

(Author):
Arooha Hijazi

Combining her degree in Mathematics and her experience in IT, Arooha has successfully delivered business solutions for advanced analytics within industries such as Telecom and Finance. Arooha has worked on business problems such as churn prediction, credit scoring, market basket analysis, customer segmentation and recommendation engines, as well as various other predictive models using many statistical techniques. She is also the leader of Teradata's SAS focus group, as part of which she has conducted many workshops and worked on SAS collaborations for both local and international clients. She has also created content for SAS training and developed case studies and IPs using SAS technologies to develop analytical solutions.
