- Data Scient
Hey, this is Monisha Patro!
I’m all about turning messy datasets into clear insights that actually help people. My background spans everything from building end-to-end analytics
pipelines to discovering patterns in huge datasets, and I love using data to drive real-world impact. I’m especially drawn to data science and product
analytics roles because I enjoy seeing how insights can shape a product’s success and create better experiences for everyone.
Feel free to explore my portfolio and reach out if you'd like to connect or collaborate. I’d love to hear from you!
Master's in Data Science
August 2023 – May 2025
Bachelor's in Computer Science and Engineering
June 2019 – May 2023
This project analyzes eBay marketplace data using both the Browse and Marketplace Insights APIs to uncover sales patterns. It evaluates how listing formats, product variations, and keywords affect pricing and sales. After data cleaning and transformation, statistical modeling identifies optimal strategies for sellers, providing actionable insights on auction vs fixed-price performance and variation bundling impact.
This project demonstrates a real-time fraud detection pipeline using the PaySim dataset, where financial transactions are streamed via Kafka and processed by Apache Spark Structured Streaming. A pre-trained unsupervised model (IsolationForest) flags suspicious transactions in near real time, writing them to CSV. A Streamlit dashboard then displays these anomalies in a user-friendly interface with charts and tables, providing immediate insights. This setup can be deployed locally or on a cloud VM (e.g., GCP), showcasing an end-to-end solution that merges data engineering (Kafka, Spark) with data science (anomaly detection) and a modern web-based dashboard.
This project develops a churn prediction pipeline using PySpark, SQL, and AdaBoost to analyze 7,043 customer records. Key features like contract type, tenure, and payment method help identify high-risk customers. A cost-benefit analysis optimizes retention strategies, reducing churn while minimizing costs, ensuring a data-driven approach to customer retention.
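To illustrate the cost-benefit idea, here is a minimal sketch of how a targeting threshold might be chosen from predicted churn probabilities. The function names, the offer cost, the value of a retained customer, and the offer success rate are all hypothetical placeholders, not figures from the project.

```python
def net_benefit(probs, churned, threshold,
                offer_cost=10.0, saved_value=100.0, success_rate=0.5):
    """Expected net value of sending a retention offer to every customer
    whose predicted churn probability is at or above `threshold`.

    All monetary values and the success rate are illustrative assumptions.
    """
    total = 0.0
    for p, y in zip(probs, churned):
        if p >= threshold:
            total -= offer_cost                      # every offer costs money
            if y:                                    # only true churners can be saved
                total += success_rate * saved_value
    return total


def best_threshold(probs, churned, candidates):
    """Pick the candidate threshold with the highest expected net benefit."""
    return max(candidates, key=lambda t: net_benefit(probs, churned, t))


# Toy data: predicted churn probabilities and actual churn labels.
probs = [0.9, 0.8, 0.3, 0.1]
churned = [1, 1, 0, 0]
chosen = best_threshold(probs, churned, [0.2, 0.5, 0.95])
```

On real data the labels would come from a held-out validation set, and the candidate thresholds would typically be a finer grid.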
This project provides an intelligent text summarization and Q&A pipeline using Llama 3.1 (8B) hosted on GroqCloud. It can read and summarize large PDFs, DOCX files, and TXT documents in parallel while automatically redacting sensitive data via regex. The built-in Q&A feature allows you to ask questions about your uploaded document, returning relevant answers in real time. Everything runs inside a Streamlit web interface, making it easy to upload files, generate concise summaries, and obtain instant Q&A – all without needing to manage large model infrastructure locally.
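As a rough illustration of the regex redaction step, the sketch below masks a few common kinds of sensitive data before text would be sent to a model. The patterns and the `redact` helper are assumptions made for this example; the project's actual patterns may differ.

```python
import re

# Hypothetical patterns for illustration only; real-world formats vary widely.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each match with a bracketed label such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Mail jane@example.com or call 555-123-4567."
cleaned = redact(sample)
```

Redacting before summarization means the sensitive values never leave the local process, which is the main reason to run this step first.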
A comprehensive end-to-end pipeline for sentiment analysis on Amazon product reviews using both BERT and an LSTM model, followed by ad ranking logic to improve CTR. This project also demonstrates how to deploy the trained models to AWS (Lambda/SageMaker) for scalable, real-time inference.
This project implements a face recognition pipeline by combining traditional and deep learning techniques. It uses HOG for face detection and aligns faces with 68 landmarks. A deep learning model generates 128-d embeddings, ensuring similar faces are closer while different faces remain distinct. Finally, a classifier like Linear SVM identifies individuals, creating a robust and accurate face recognition system.
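The idea that similar faces sit closer together in embedding space can be sketched with a simple distance comparison. Note that this example swaps in a nearest-neighbor lookup with a distance cutoff rather than the Linear SVM described above, and it uses toy 3-dimensional vectors in place of 128-d embeddings. The `nearest_identity` helper and the 0.6 cutoff are assumptions for illustration.

```python
import math


def nearest_identity(query, known, max_distance=0.6):
    """Return the label of the closest known embedding,
    or None if even the closest one is farther than `max_distance`.

    `known` maps a person's label to one reference embedding.
    """
    best_label, best_dist = None, float("inf")
    for label, embedding in known.items():
        d = math.dist(query, embedding)  # Euclidean distance
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= max_distance else None


# Toy embeddings standing in for real 128-d vectors.
known = {"alice": [0.0, 0.0, 0.0], "bob": [1.0, 1.0, 1.0]}
match = nearest_identity([0.1, 0.0, 0.0], known)
stranger = nearest_identity([5.0, 5.0, 5.0], known)
```

A trained classifier such as an SVM generally scales better than this lookup when each person has many reference images.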
This project optimizes Google Ads campaigns using advanced ML and data analysis. By analyzing 50,000 ad interactions, it identifies factors influencing CTR and uses Gradient Boosted Trees to predict ad clicks. A dynamic bidding system adjusts bids based on probabilities, improving targeting and ROI.
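One simple way to adjust bids from predicted click probabilities is to scale a base bid by how much more likely a click is than average, within fixed limits. The sketch below assumes that rule; the `adjust_bid` function, the baseline probability, and the bid limits are hypothetical and not taken from the project.

```python
def adjust_bid(base_bid, click_prob, baseline_prob,
               min_bid=0.10, max_bid=5.00):
    """Scale `base_bid` by the ratio of predicted to baseline click
    probability, then clamp the result to [min_bid, max_bid].

    The limits and the proportional rule are illustrative assumptions.
    """
    if baseline_prob <= 0:
        raise ValueError("baseline_prob must be positive")
    multiplier = click_prob / baseline_prob
    bid = base_bid * multiplier
    return round(min(max_bid, max(min_bid, bid)), 2)


# An impression predicted to be twice as clickable as average gets double the bid.
likely_click = adjust_bid(1.00, click_prob=0.10, baseline_prob=0.05)
unlikely_click = adjust_bid(1.00, click_prob=0.001, baseline_prob=0.05)
```

Clamping keeps a single overconfident prediction from producing an extreme bid.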
A platform promoting economical, sustainable intercity travel by connecting drivers and passengers on shared routes. Built with HTML, CSS, JavaScript, Flask, and SQL, it enhances accessibility and fosters a community-driven approach to travel.
Analyzes Meta-Kaggle data to explore trends in data science competitions, submission patterns, and tool popularity. Visualizations highlight Kaggle’s evolving role as a learning platform, mirroring broader trends in the data science community.
Analyzes 48,000+ listings to uncover pricing, demand, and availability patterns. Interactive visualizations highlight borough-specific trends, room type distributions, and pricing strategies, providing actionable insights for hosts, investors, and policymakers.
Leverages NLP techniques to analyze and decode character dialogues in classic novels. Explores dialogue patterns and linguistic styles of different authors, uncovering character interactions and narrative insights.
monpatro@iu.edu
monishaapatro@gmail.com
Indiana University Bloomington, 107 S Indiana Ave
Bloomington
47405