Project Portfolio

Data in Action: A Portfolio of Real-World Solutions

Welcome to my project portfolio! As a data scientist, I’m passionate about transforming complex data into clear, actionable insights that drive business success. My work isn’t just about building models; it’s about solving real-world problems, from mitigating financial risk to boosting operational efficiency.

Below is a selection of projects that showcase my experience in financial modeling, Natural Language Processing (NLP), computer vision, and business intelligence. Each one represents a unique challenge and a data-driven solution that delivered measurable results.


Financial Modeling & Risk Management

In the finance and insurance sectors, accurate risk assessment is paramount. These projects focus on using machine learning to build robust models that identify risk, prevent losses, and unlock new revenue opportunities.

1. Underwriting Model for Auto Loans

The goal was to develop a more nuanced underwriting model to classify auto loans by their long-term viability. I built and compared several survival models, including Accelerated Failure Time (AFT) and a Neural MTLR. The final production model, a Random Survival Forest (RSF), stratified customers into five distinct risk tiers.
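A minimal sketch of the tiering step, assuming the five tiers are cut at equal-population quantiles of the model's risk score. The cut rule, the function name assign_risk_tiers, and the sample scores are illustrative assumptions, not the production logic.

```python
import numpy as np

def assign_risk_tiers(risk_scores, n_tiers=5):
    """Map risk scores to tiers 1..n_tiers (1 = lowest risk).

    Assumption: tiers are equal-population quantile bins; real cut points may differ.
    """
    scores = np.asarray(risk_scores, dtype=float)
    # Interior quantile edges, e.g. the 20%, 40%, 60%, 80% points for five tiers.
    edges = np.quantile(scores, np.linspace(0, 1, n_tiers + 1)[1:-1])
    # np.digitize yields 0..n_tiers-1 for these edges; shift to 1..n_tiers.
    return np.digitize(scores, edges, right=True) + 1

# Hypothetical risk scores, e.g. predicted cumulative hazard from a survival model.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.75, 0.90, 0.95]
tiers = assign_risk_tiers(scores)
```

Quantile cuts keep the tiers roughly equal in size; fixed business thresholds would be an equally plausible choice.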

2. Predicting 30+ Day Delinquency

To get ahead of potential defaults, I created a model to predict the likelihood of a customer becoming 30 or more days delinquent. The solution was a stacked ensemble: Logistic Regression, Random Forest, and XGBoost models ran in parallel, and their outputs were fed into a final LightGBM model that produced the risk score.

3. Real-Time Insurance Fraud Detection

This project’s objective was to build and deploy a tool for identifying fraudulent insurance claims. After extensive feature engineering, I trained and tuned XGBoost and SVM models. The final XGBoost model was deployed as a web service using Flask and Docker.
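A minimal sketch of what a Flask scoring endpoint for such a service might look like. The route name, the feature names, and the score_claim placeholder are hypothetical; in a real deployment score_claim would call the trained XGBoost model instead of the toy rule used here.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; the real feature set is not described here.
FEATURES = ["claim_amount", "days_since_policy_start", "prior_claims"]

def score_claim(features):
    """Placeholder for model inference.

    Assumption: returns a fraud probability in [0, 1].
    This toy rule is NOT the trained model.
    """
    suspicious = features["claim_amount"] > 10_000 and features["prior_claims"] >= 2
    return 0.9 if suspicious else 0.1

@app.route("/predict", methods=["POST"])
def predict():
    # Tolerate a missing or malformed JSON body and report it as a 400.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    return jsonify({"fraud_probability": score_claim(payload)})
```

Inside a Docker image, an app like this would typically be started by a WSGI server or the flask run command rather than by calling app.run() in the module.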


NLP, Computer Vision & Content Moderation

From digitizing physical documents to keeping online platforms safe, these projects demonstrate my skills in processing and understanding unstructured text and image data.

1. Automated OCR System to Digitize Documents

To reduce manual data entry, I architected a multi-stage OCR pipeline to digitize documents automatically. The pipeline uses OpenCV for layout analysis, a U-Net to detect text blocks, a CRNN for text recognition, and a Bi-LSTM for named-entity recognition (NER), which tags the extracted text so it can be emitted as structured JSON.
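The last stage, turning NER tags into structured JSON, can be sketched in a few lines. This assumes the tagger emits BIO tags (B-TYPE, I-TYPE, O); the tagging scheme, the entity names, and the function bio_to_record are illustrative assumptions.

```python
import json

def bio_to_record(tokens, tags):
    """Group BIO-tagged tokens into {entity_type: [entity_text, ...]}.

    An I- tag that does not continue the open entity is ignored.
    """
    record = {}
    entity_type, words = None, []
    # A trailing "O" sentinel flushes the last open entity.
    for token, tag in zip(list(tokens) + [""], list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == entity_type:
            words.append(token)
            continue
        if entity_type is not None:
            record.setdefault(entity_type, []).append(" ".join(words))
        if tag.startswith("B-"):
            entity_type, words = tag[2:], [token]
        else:
            entity_type, words = None, []
    return record

# Hypothetical output of the recognition and NER stages for one text line.
tokens = ["Invoice", "INV-1042", "issued", "to", "Acme", "Corp", "on", "2021-03-15"]
tags = ["O", "B-INVOICE_ID", "O", "O", "B-ORG", "I-ORG", "O", "B-DATE"]

record = bio_to_record(tokens, tags)
print(json.dumps(record, indent=2))
```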

2. Spam & NSFW Detection for an Educational Forum

The task was to build a filter to block spam and inappropriate content. Using the Snorkel framework for weak supervision, I programmatically generated a large labeled dataset from heuristic labeling functions instead of hand-labeling posts. I then trained a Bi-LSTM text classification model with an attention mechanism to identify and flag harmful content.
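The idea behind weak supervision can be shown in plain Python: several noisy heuristics vote on each post, and the votes are combined into a training label. The heuristics below are hypothetical, and the simple majority vote is a stand-in; Snorkel itself combines labeling-function outputs with a learned label model rather than a raw vote.

```python
from collections import Counter

SPAM, OK, ABSTAIN = 1, 0, -1

# Hypothetical labeling functions: each votes SPAM, OK, or ABSTAIN.
def lf_contains_link(text):
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_money_words(text):
    phrases = ("free money", "win cash")
    return SPAM if any(p in text.lower() for p in phrases) else ABSTAIN

def lf_question(text):
    return OK if text.strip().endswith("?") else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_money_words, lf_question]

def weak_label(text):
    """Majority vote over non-abstaining functions; ties and no votes abstain."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]

posts = [
    "Win cash now at http://example.com",
    "How do I solve this integral?",
    "Thanks, that helped.",
]
labels = [weak_label(p) for p in posts]
```

Posts that end up as ABSTAIN are simply left out of the training set for the downstream classifier.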

3. NSFW Ad Pop-up Detection

Here, I replaced an outdated, rule-based system with a deep learning model to detect NSFW pop-up ads. I curated a custom dataset and fine-tuned a VGG16 model for image classification. The model is retrained monthly to adapt to new patterns.

4. Correcting Garbled Text from OCR & Web Scraping

Low-quality text from faulty OCR or web-crawling bugs can corrupt a dataset. To solve this, I fine-tuned an ALBERT model (a lightweight BERT variant) with a custom loss function to automatically detect and filter out poorly formed text.


Business Intelligence & Operational Efficiency

Data can unlock incredible efficiencies and provide a competitive edge. These projects focus on using analytics and predictive modeling to streamline operations and inform strategic decisions.

1. Predicting Effective Call Times for a Call Center

To improve call center productivity, I built a model to prioritize the daily call list. The solution was a stacked classifier: Logistic Regression and Gradient Boosting Machine base models feed a Random Forest meta-classifier, which generates a final propensity score for successful contact.
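This architecture maps directly onto scikit-learn's StackingClassifier. The sketch below is one way to express it, with synthetic data, assumed hyperparameters, and a hypothetical ranking step that sorts the day's customers by propensity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical call outcomes (1 = successful contact).
X, y = make_classification(n_samples=400, n_features=8, random_state=1)
X_train, X_today, y_train, _ = train_test_split(
    X, y, test_size=0.25, random_state=1
)

stack = StackingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("gbm", GradientBoostingClassifier(random_state=1)),
    ],
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=1),
    stack_method="predict_proba",  # meta-classifier sees base probabilities
    cv=5,  # base predictions for the meta-classifier are made out-of-fold
)
stack.fit(X_train, y_train)

# Propensity of successful contact for today's list, highest first.
propensity = stack.predict_proba(X_today)[:, 1]
call_order = propensity.argsort()[::-1]
```

Here call_order holds row indices into X_today, so the first entry is the customer the model considers most likely to pick up.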

2. Applied Data Analysis for Price Negotiation

I worked with the sales department to help them conduct smarter price negotiations. By combining more than three years of quotation records with raw material market data, I uncovered the underlying logic of historical pricing.

3. Predicting Building Energy Consumption

This project focused on predicting a building’s energy consumption from its type and local climate data. I used Python for exploratory data analysis and feature engineering, then trained a model to forecast energy usage.