About Me.

Monisha Patro

Hey, this is Monisha Patro!

I’m all about turning messy datasets into clear insights that actually help people. My background spans everything from building end-to-end analytics pipelines to discovering patterns in huge datasets, and I love using data to drive real-world impact. I’m especially drawn to data science and product analytics roles because I enjoy seeing how insights can shape a product’s success and create better experiences for everyone.

Feel free to explore my portfolio and reach out if you'd like to connect or collaborate—I’d love to hear from you!

Education.

Indiana University Bloomington

Master in Data Science.

August 2023 – May 2025

Vellore Institute of Technology

Bachelor in Computer Science and Engineering

June 2019 – May 2023

Experience.

Therapprove

June 2025 – Present

Data Science Intern

  • Developed machine learning models to match children with optimal pediatric therapists by leveraging caregiver input, provider availability, and therapy specialization, reducing referral-to-response time from weeks to near-immediate.
  • Analyzed CRM and referral queue metadata using Postgres to identify client bottlenecks by therapy type, location, and insurance coverage, surfacing trends that reduced average intake wait time and informed queue prioritization strategies.

Candid

May 2024 – December 2024

Data Science Intern

  • Developed and implemented scalable SQL-based ETL pipelines to process and standardize non-profit data from government publications (10M+ records), accelerating internal data delivery by 25% for downstream analytics and product teams.
  • Partnered with Data Services and API engineering teams to integrate cleaned and mastered datasets into public-facing APIs, enhancing data accessibility for 10K+ external users while maintaining backward compatibility.
  • Collaborated with cross-functional product and engineering teams to translate stakeholder requirements into an intuitive Power BI dashboard, enabling data-driven decisions across departments and contributing to a 15% increase in product adoption.

eProtons

February 2023 – July 2023

Data Science Intern

  • Reconfigured PostgreSQL indexing strategies on high-volume energy datasets used in forecasting models, improving query performance by 27% (validated via logs).
  • Engineered distributed data pipelines with PySpark on AWS EMR, orchestrating parallel data processing workflows and yielding a 5x acceleration in large-scale analytics tasks.
  • Designed and evaluated an A/B test comparing flat-rate and dynamic pricing models across EV charging stations, uncovering a 12% lift in session completion using SQL and Causal Inference techniques to control for location-based confounders.

Mukham

October 2022 – March 2023

Data Analyst

  • Spearheaded development of CNN-based facial recognition authentication models, cutting spoofing incidents by 50% across high-security endpoints.
  • Improved fraud detection performance by incorporating geolocation and time-series signals into predictive models, increasing precision by 35%.
  • Established an image processing pipeline for facial data, improving image quality for 80% of enrolled users and reducing the number of support tickets related to image failures.

Skills

Languages & Tools

Python, R, SQL, Git, PySpark, SparkSQL

Statistics

A/B Testing, Hypothesis Testing, ANOVA, Chi-Square, Causal Inference

Frameworks & Platforms

Apache Spark, Kafka, Airflow, AWS (EMR), GCP, Flask, Streamlit

Libraries

Scikit-learn, TensorFlow, PyTorch, Keras, HuggingFace, spaCy, NLTK

ML & Modeling

Linear Regression, Logistic Regression, Decision Trees, Random Forest, AdaBoost, K-Means, PCA, Gradient Boosting, SVM, CNN, LSTM, Time Series Forecasting, BERT, EDA, Time-Series Analytics, ETL Pipelines

Visualization & BI

Power BI, Tableau

GenAI

LangChain, OpenAI API, LlamaIndex, GroqCloud, Prompt Engineering, Vector Databases, RAG Pipelines, LLMs (BERT, GPT)

Projects.

eBay Marketplace Product Strategy Analytics

This project analyzes eBay marketplace data using both the Browse and Marketplace Insights APIs to uncover sales patterns. It evaluates how listing formats, product variations, and keywords affect pricing and sales. After data cleaning and transformation, statistical modeling identifies optimal strategies for sellers, providing actionable insights on auction vs fixed-price performance and variation bundling impact.

A Real-Time Anomaly Detection Pipeline in Financial Transactions

This project demonstrates a real-time fraud detection pipeline using the PaySim dataset, where financial transactions are streamed via Kafka and processed by Apache Spark Structured Streaming. A pre-trained unsupervised model (IsolationForest) flags suspicious transactions in near real time, writing them to CSV. A Streamlit dashboard then displays these anomalies in a user-friendly interface with charts and tables, providing immediate insights. This setup can be deployed locally or on a cloud VM (e.g., GCP), showcasing an end-to-end solution that merges data engineering (Kafka, Spark) with data science (anomaly detection) and a modern web-based dashboard.

TelConnect Customer Churn Prediction

This project develops a churn prediction pipeline using PySpark, SQL, and AdaBoost to analyze 7,043 customer records. Key features like contract type, tenure, and payment method help identify high-risk customers. A cost-benefit analysis optimizes retention strategies, reducing churn while minimizing costs, ensuring a data-driven approach to customer retention.
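The cost-benefit step can be illustrated with a small sketch. It is not the project's actual code: the cost figures, the `RetentionCosts` fields, and the function names are hypothetical, and it assumes churn probabilities have already been produced by a model such as AdaBoost. The idea is to pick the score threshold above which contacting customers gives the highest net benefit.

```python
from dataclasses import dataclass


@dataclass
class RetentionCosts:
    """Hypothetical business inputs; real values would come from the company."""
    offer_cost: float      # cost of sending one retention offer
    customer_value: float  # value kept when a would-be churner stays
    save_rate: float       # fraction of contacted churners who stay


def expected_profit(probs, churned, threshold, costs):
    """Net benefit of contacting every customer scored at or above threshold."""
    profit = 0.0
    for prob, did_churn in zip(probs, churned):
        if prob >= threshold:
            profit -= costs.offer_cost  # every contact costs money
            if did_churn:
                # Only true churners can be saved, and only some of them are.
                profit += costs.save_rate * costs.customer_value
    return profit


def best_threshold(probs, churned, costs, candidates):
    """Return the candidate threshold with the highest expected profit."""
    return max(candidates, key=lambda t: expected_profit(probs, churned, t, costs))


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.3, 0.1]            # model scores on a validation set
    churned = [True, True, False, False]    # observed outcomes
    costs = RetentionCosts(offer_cost=10, customer_value=100, save_rate=0.5)
    print(best_threshold(probs, churned, costs, [0.2, 0.5, 0.95]))
```

A lower threshold reaches more churners but spends more on customers who would have stayed anyway, which is why the threshold is chosen on profit rather than accuracy.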

Intelligent Multiformat Document Summarization & Q&A

This project provides an intelligent text summarization and Q&A pipeline using Llama 3.1 (8B) hosted on GroqCloud. It can read and summarize large PDFs, DOCX files, and TXT documents in parallel while automatically redacting sensitive data via regex. The built-in Q&A feature allows you to ask questions about your uploaded document, returning relevant answers in real time. Everything runs inside a Streamlit web interface, making it easy to upload files, generate concise summaries, and obtain instant Q&A – all without needing to manage large model infrastructure locally.
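As a rough illustration of regex-based redaction, the sketch below masks a few common patterns before text would be sent to a model. The pattern set, the placeholder labels, and the `redact` function name are assumptions made for this example; the project's real rules may differ and would likely cover more cases.

```python
import re

# Hypothetical patterns for illustration only; US-style formats are assumed.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Mail jane.doe@example.com or call 812-555-0100."))
```

Running redaction before summarization keeps sensitive strings out of both the prompt and the generated summary.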

Amazon Product Review Analysis

A comprehensive end-to-end pipeline for sentiment analysis on Amazon product reviews using both BERT and an LSTM model, followed by ad ranking logic to improve CTR. This project also demonstrates how to deploy the trained models to AWS (Lambda/SageMaker) for scalable, real-time inference.

Advanced Face Recognition

This project implements a face recognition pipeline by combining traditional and deep learning techniques. It uses HOG for face detection and aligns faces with 68 landmarks. A deep learning model generates 128-d embeddings, ensuring similar faces are closer while different faces remain distinct. Finally, a classifier like Linear SVM identifies individuals, creating a robust and accurate face recognition system.
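To show why embeddings make identification simple, here is a minimal sketch of the matching step. It swaps the Linear SVM mentioned above for a plain nearest-neighbor lookup by Euclidean distance, which is easier to show without a trained model. The `identify` function, the gallery layout, and the 0.6 distance cutoff are assumptions for illustration, and the example uses 3-d vectors in place of real 128-d embeddings.

```python
import math


def identify(embedding, gallery, max_distance=0.6):
    """Return the name of the closest known face, or None if none is close enough.

    gallery maps a person's name to one reference embedding (hypothetical layout).
    """
    best_name, best_dist = None, float("inf")
    for name, known in gallery.items():
        dist = math.dist(embedding, known)  # Euclidean distance, Python 3.8+
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None


if __name__ == "__main__":
    gallery = {"alice": [0.0, 0.0, 0.0], "bob": [1.0, 1.0, 1.0]}
    print(identify([0.1, 0.0, 0.0], gallery))  # close to alice's reference
    print(identify([5.0, 5.0, 5.0], gallery))  # far from every reference
```

A classifier such as an SVM generally scales better than this lookup when each person has many reference images, since it learns a decision boundary instead of comparing against every stored vector.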

Google Ads Optimizer

This project optimizes Google Ads campaigns using advanced ML and data analysis. By analyzing 50,000 ad interactions, it identifies factors influencing CTR and uses Gradient Boosted Trees to predict ad clicks. A dynamic bidding system adjusts bids based on probabilities, improving targeting and ROI.
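One simple way to turn predicted click probabilities into bids is sketched below. This is an assumed rule, not the project's documented logic: the bid is scaled by how the predicted probability compares with a baseline CTR, then clamped so a single prediction cannot push the bid too far. The `adjust_bid` name and all parameter defaults are hypothetical.

```python
def adjust_bid(base_bid, click_prob, baseline_ctr, min_mult=0.5, max_mult=2.0):
    """Scale base_bid by click_prob / baseline_ctr, clamped to [min_mult, max_mult]."""
    if baseline_ctr <= 0:
        raise ValueError("baseline_ctr must be positive")
    multiplier = click_prob / baseline_ctr
    # Clamp so extreme predictions cannot blow up or zero out the bid.
    multiplier = max(min_mult, min(max_mult, multiplier))
    return round(base_bid * multiplier, 2)


if __name__ == "__main__":
    # click_prob would come from the trained click model.
    print(adjust_bid(1.00, click_prob=0.06, baseline_ctr=0.04))
    print(adjust_bid(1.00, click_prob=0.20, baseline_ctr=0.04))
```

The clamp is a design choice that trades some responsiveness for budget safety; a production system would also account for conversion value and pacing.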

CityLink RideShare Hub

A platform promoting economical, sustainable intercity travel by connecting drivers and passengers on shared routes. Built with HTML, CSS, JavaScript, Flask, and SQL, it enhances accessibility and fosters a community-driven approach to travel.

Meta-Kaggle Analysis

Analyzes Meta-Kaggle data to explore trends in data science competitions, submission patterns, and tool popularity. Visualizations highlight Kaggle’s evolving role as a learning platform, mirroring broader trends in the data science community.

NYC Airbnb Dashboard

Analyzes 48,000+ listings to uncover pricing, demand, and availability patterns. Interactive visualizations highlight borough-specific trends, room type distributions, and pricing strategies, providing actionable insights for hosts, investors, and policymakers.

Talking Texts: NLP in Novels

Leverages NLP techniques to analyze and decode character dialogues in classic novels. Explores dialogue patterns and linguistic styles of different authors, uncovering character interactions and narrative insights.

FIFA World Cup Sentiment Analysis

Analyzes Reddit comments about the 2022 World Cup using ML and transformer-based models like RoBERTa. Fine-tuned RoBERTa improved sentiment classification accuracy, showcasing modern NLP’s power for capturing nuanced fan reactions.

Social Media Mining Projects

Documentation and results for analyzing Reddit data across fashion trends, video game culture, and sports sentiments. Involves advanced analytical methods including sentiment analysis and topic modeling, revealing key community insights.

Contact.

University Email

monpatro@iu.edu

Personal Email

monishaapatro@gmail.com

Address

Indiana University Bloomington, 107 S Indiana Ave

City

Bloomington

Pin Code

47405