My Portfolio

Experience.

Therapprove

June 2025 – Present

Data Science Intern

Developed machine learning models that match children with best-fit pediatric therapists using caregiver input, provider availability, and therapy specialization, reducing referral-to-response time from weeks to near-immediate.
Analyzed CRM and referral queue metadata in PostgreSQL to identify client bottlenecks by therapy type, location, and insurance coverage, surfacing trends that reduced average intake wait time and informed queue prioritization strategies.

Candid

May 2024 – December 2024

Data Science Intern

Developed and implemented scalable SQL-based ETL pipelines to process and standardize non-profit data from government publications (10M+ records), accelerating internal data delivery by 25% for downstream analytics and product teams.
Partnered with Data Services and API engineering teams to integrate cleaned and mastered datasets into public-facing APIs, enhancing data accessibility for 10K+ external users while maintaining backward compatibility.
Collaborated with cross-functional product and engineering teams to translate stakeholder requirements into an intuitive Power BI dashboard, enabling data-driven decisions across departments and contributing to a 15% increase in product adoption.

eProtons

February 2023 – July 2023

Data Science Intern

Reconfigured PostgreSQL indexing strategies on high-volume energy datasets used in forecasting models, improving query performance by 27% (validated via logs).
Engineered distributed data pipelines on AWS EMR with PySpark, orchestrating parallel data processing workflows that yielded a 5x speedup in large-scale analytics tasks.
Designed and evaluated an A/B test comparing flat-rate and dynamic pricing models across EV charging stations, using SQL and causal inference techniques to control for location-based confounders and uncovering a 12% lift in session completion.

Mukham

October 2022 – March 2023

Data Analyst

Spearheaded development of CNN-based facial recognition authentication models, cutting spoofing incidents by 50% across high-security endpoints.
Improved fraud detection by incorporating geolocation and time-series signals into predictive models, increasing precision by 35%.
Established an image processing pipeline for facial data, improving image quality for 80% of enrolled users and reducing the number of support tickets related to image failures.

Projects.

eBay Marketplace Product Strategy Analytics

This project analyzes eBay marketplace data using both the Browse and Marketplace Insights APIs to uncover sales patterns. It evaluates how listing formats, product variations, and keywords affect pricing and sales. After data cleaning and transformation, statistical modeling identifies optimal strategies for sellers, providing actionable insights on auction vs fixed-price performance and variation bundling impact.

A Real-Time Anomaly Detection Pipeline in Financial Transactions

This project demonstrates a real-time fraud detection pipeline using the PaySim dataset, where financial transactions are streamed via Kafka and processed by Apache Spark Structured Streaming. A pre-trained unsupervised model (IsolationForest) flags suspicious transactions in near real time, writing them to CSV. A Streamlit dashboard then displays these anomalies in a user-friendly interface with charts and tables, providing immediate insights. This setup can be deployed locally or on a cloud VM (e.g., GCP), showcasing an end-to-end solution that merges data engineering (Kafka, Spark) with data science (anomaly detection) and a modern web-based dashboard.
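To show why an IsolationForest can flag suspicious transactions without labels, here is a minimal from-scratch sketch in pure Python. It is not the project's code: the project loads a pre-trained model, whereas this reimplements the core idea (outliers are isolated by fewer random splits, so their average path length is shorter). The class and function names, the toy two-feature data, the tree count, and the subsample size are all hypothetical assumptions.

```python
import math
import random

def c(n):
    """Average path length of an unsuccessful BST search over n points (score normalizer)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Recursively partition points using a random feature and a random split value."""
    if len(points) <= 1 or depth >= max_depth:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([p for p in points if p[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split], depth + 1, max_depth, rng),
    }

def path_length(point, node, depth=0):
    """Depth at which the point reaches a leaf, plus c(size) for leaves left unsplit."""
    if "size" in node:
        return depth + c(node["size"])
    branch = "left" if point[node["dim"]] < node["split"] else "right"
    return path_length(point, node[branch], depth + 1)

class ToyIsolationForest:
    """Hypothetical minimal isolation forest; higher scores mean easier-to-isolate points."""

    def __init__(self, n_trees=100, sample_size=64, seed=0):
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)

    def fit(self, data):
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))
        self.trees = [
            build_tree(self.rng.sample(data, self.psi), 0, max_depth, self.rng)
            for _ in range(self.n_trees)
        ]
        return self

    def score(self, point):
        """Anomaly score in (0, 1]: 2 ** (-mean path length / c(psi))."""
        mean_path = sum(path_length(point, t) for t in self.trees) / self.n_trees
        return 2 ** (-mean_path / c(self.psi))

# Toy "transactions": (amount, hour of day), drawn around typical values.
data_rng = random.Random(42)
normal = [(data_rng.gauss(100, 10), data_rng.gauss(5, 1)) for _ in range(500)]
forest = ToyIsolationForest(seed=0).fit(normal)

typical = forest.score((100.0, 5.0))   # sits in the dense region
outlier = forest.score((900.0, 20.0))  # far outside the training data
print(f"typical={typical:.3f} outlier={outlier:.3f}")
```

In a streaming setup, a scoring function like this would be applied to each micro-batch, with transactions above a chosen score cutoff written out for the dashboard.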

TelConnect Customer Churn Prediction

This project develops a churn prediction pipeline using PySpark, SQL, and AdaBoost to analyze 7,043 customer records. Key features such as contract type, tenure, and payment method help identify high-risk customers. A cost-benefit analysis then guides retention strategy, aiming to reduce churn while keeping retention costs down.
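One common way to frame such a cost-benefit analysis is to choose the churn-probability threshold that minimizes total expected cost. The sketch below assumes hypothetical values for the retention-offer cost, the revenue lost per churned customer, and the offer's success rate; none of these come from the project, and the function names are illustrative.

```python
def expected_cost(probs, labels, threshold, offer_cost, churn_loss, save_rate):
    """Total cost of targeting every customer whose predicted churn probability >= threshold.

    Each targeted customer costs `offer_cost`; a targeted churner is retained with
    probability `save_rate`. Every untargeted churner costs `churn_loss`.
    """
    cost = 0.0
    for p, churned in zip(probs, labels):
        if p >= threshold:
            cost += offer_cost
            if churned:
                cost += (1 - save_rate) * churn_loss
        elif churned:
            cost += churn_loss
    return cost

def best_threshold(probs, labels, offer_cost=20.0, churn_loss=200.0, save_rate=0.5):
    """Scan thresholds 0.00..1.00 and return the first one with the lowest expected cost."""
    candidates = [i / 100 for i in range(101)]
    return min(
        candidates,
        key=lambda t: expected_cost(probs, labels, t, offer_cost, churn_loss, save_rate),
    )

# Toy predicted probabilities and true churn labels (1 = churned).
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
t = best_threshold(probs, labels)
print(t, expected_cost(probs, labels, t, 20.0, 200.0, 0.5))  # 0.31 380.0
```

With these toy numbers, targeting a churner costs 120 instead of 200, so it pays to lower the threshold until it would start including cheap-to-ignore non-churners.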

Intelligent Multiformat Document Summarization & Q&A

This project provides an intelligent text summarization and Q&A pipeline using Llama 3.1 (8B) hosted on GroqCloud. It can read and summarize large PDFs, DOCX files, and TXT documents in parallel while automatically redacting sensitive data via regex. The built-in Q&A feature allows you to ask questions about your uploaded document, returning relevant answers in real time. Everything runs inside a Streamlit web interface, making it easy to upload files, generate concise summaries, and obtain instant Q&A – all without needing to manage large model infrastructure locally.
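The regex-based redaction step could look like the following sketch. The project does not list which kinds of sensitive data it redacts, so the patterns below (emails, US-style phone numbers, SSN-like numbers) and the `[REDACTED_...]` placeholder format are assumptions for illustration.

```python
import re

# Hypothetical patterns; a real deployment would tune these to its documents.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text):
    """Replace every match of each pattern with a typed placeholder, in dict order."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

sample = "Contact jane.doe@example.com or 555-123-4567. SSN: 123-45-6789."
print(redact(sample))
# Contact [REDACTED_EMAIL] or [REDACTED_PHONE]. SSN: [REDACTED_SSN].
```

Running redaction before text is sent to the hosted model keeps the sensitive values out of the prompt entirely.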

Amazon Product Review Analysis

A comprehensive end-to-end pipeline for sentiment analysis on Amazon product reviews using both BERT and an LSTM model, followed by ad ranking logic to improve CTR. This project also demonstrates how to deploy the trained models to AWS (Lambda/SageMaker) for scalable, real-time inference.

Advanced Face Recognition

This project implements a face recognition pipeline that combines traditional and deep learning techniques. It uses HOG for face detection and aligns faces using 68 facial landmarks. A deep learning model then generates 128-dimensional embeddings in which images of the same person lie close together and images of different people lie far apart. Finally, a classifier such as a linear SVM identifies individuals from these embeddings, completing a robust and accurate face recognition system.
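The property that makes these embeddings useful is that identity can be read off distances. The sketch below swaps in a simple nearest-neighbor match with a distance threshold in place of the project's linear SVM, purely to illustrate that idea. The 0.6 threshold, the `identify` helper, and the 3-dimensional toy vectors (real embeddings are 128-d) are assumptions.

```python
import math

def identify(query, gallery, threshold=0.6):
    """Return the name of the closest enrolled embedding, or None if none is within threshold."""
    best_name, best_dist = None, float("inf")
    for name, embedding in gallery.items():
        d = math.dist(query, embedding)  # Euclidean distance between embeddings
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

# Toy 3-d "embeddings" standing in for 128-d vectors.
gallery = {"alice": [0.1, 0.2, 0.3], "bob": [0.9, 0.1, 0.4]}
print(identify([0.12, 0.18, 0.33], gallery))  # alice
print(identify([5.0, 5.0, 5.0], gallery))     # None: too far from everyone
```

A trained classifier such as a linear SVM replaces the fixed threshold with decision boundaries learned from several embeddings per person, which tends to handle variation in pose and lighting better.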

Google Ads Optimizer

This project optimizes Google Ads campaigns using machine learning and data analysis. By analyzing 50,000 ad interactions, it identifies factors influencing click-through rate (CTR) and uses gradient-boosted trees to predict ad clicks. A dynamic bidding system then adjusts bids based on the predicted click probabilities, improving targeting and ROI.
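A probability-driven bid adjustment can be as simple as scaling a base bid by how a predicted click probability compares with the campaign's average CTR, clamped to a safe range. The scaling rule, the `adjust_bid` helper, and the 0.5x to 2x bounds below are hypothetical and not taken from the project.

```python
def adjust_bid(base_bid, p_click, avg_ctr, min_mult=0.5, max_mult=2.0):
    """Scale the base bid by predicted CTR relative to average CTR, clamped to [min_mult, max_mult]."""
    multiplier = p_click / avg_ctr
    multiplier = max(min_mult, min(max_mult, multiplier))
    return round(base_bid * multiplier, 2)

print(adjust_bid(0.80, p_click=0.045, avg_ctr=0.03))  # 1.5x average CTR -> 1.2
print(adjust_bid(1.00, p_click=0.20, avg_ctr=0.03))   # capped at 2x -> 2.0
print(adjust_bid(1.00, p_click=0.001, avg_ctr=0.03))  # floored at 0.5x -> 0.5
```

Clamping keeps a single overconfident prediction from driving bids to extremes.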

CityLink RideShare Hub

A platform promoting economical, sustainable intercity travel by connecting drivers and passengers on shared routes. Built with HTML, CSS, JavaScript, Flask, and SQL, it enhances accessibility and fosters a community-driven approach to travel.

Meta-Kaggle Analysis

Analyzes Meta-Kaggle data to explore trends in data science competitions, submission patterns, and tool popularity. Visualizations highlight Kaggle’s evolving role as a learning platform, mirroring broader trends in the data science community.

NYC Airbnb Dashboard

Analyzes 48,000+ listings to uncover pricing, demand, and availability patterns. Interactive visualizations highlight borough-specific trends, room type distributions, and pricing strategies, providing actionable insights for hosts, investors, and policymakers.

Talking Texts: NLP in Novels

Leverages NLP techniques to analyze and decode character dialogues in classic novels. Explores dialogue patterns and linguistic styles of different authors, uncovering character interactions and narrative insights.

FIFA World Cup Sentiment Analysis

Analyzes Reddit comments about the 2022 World Cup using classical ML and transformer-based models such as RoBERTa. A fine-tuned RoBERTa model improved sentiment classification accuracy, showing how modern NLP can capture nuanced fan reactions.

Social Media Mining Projects

Documentation and results from analyzing Reddit data on fashion trends, video game culture, and sports sentiment. Applies analytical methods including sentiment analysis and topic modeling to reveal key community insights.

About Me.

Monisha Patro

Education.

Indiana University Bloomington

Vellore Institute of Technology

Skills

Languages & Tools

Statistics

Frameworks & Platforms

Libraries

ML & Modeling

Visualization & BI

GenAI

Contact.

University Email

Personal Email

Social Network

Address

City

Pin Code