AI Engineer and Data Scientist focused on building scalable machine learning, MLOps, and generative AI solutions. I work across data pipelines, model development, deployment, and retrieval-augmented systems to turn complex data into reliable products.
Oct/2024 - Present
Dataside
MLOps Project: Developed an end-to-end Machine Learning solution covering preprocessing, feature engineering, model training, and artifact and metric logging with MLflow. The selected model was a Decision Tree Classifier, which achieved a ROC AUC of 1.0. For deployment, I containerized the application with Docker and exposed the prediction service through a REST API built with FastAPI.
Big Data Anomaly Detection Project: Developed an anomaly detection solution for one of the largest steel companies in the Americas, focused on reducing material losses in inventory. I processed approximately 2 million rows in a Big Data environment using Spark and Databricks, implementing an unsupervised Isolation Forest model with Spark ML and SynapseML. The model identified 156 anomalies in the analyzed dataset. For explainability, I trained a surrogate XGBRegressor model and applied SHAP to it to identify the variables most relevant to outlier classification. All experiments were tracked in MLflow to support reproducibility and pipeline governance.
Lubricant Oil Classification Model: Developed a Machine Learning model for a multinational company to classify lubricant oils into seven categories, emulating deterministic rules. The dataset was built with synthetic data generation, with approximately 2,000 samples per rule across around 516 rules. Through feature engineering and hyperparameter tuning, the model's ROC AUC improved from 87% in the previous version to 99.9%. The project included experiment tracking with MLflow, containerization with Docker, and model serving through a REST API built with FastAPI.
Aug/2022 - Oct/2024
Self-employed
RAG Project - Dom Rock: Developed, in partnership with Dom Rock, a RAG (Retrieval-Augmented Generation) solution for search and information retrieval across a base of approximately 50,000 multilingual scientific files on Alzheimer's disease. The solution employed open-source models such as Qwen and DeepSeek, optimized with Groq and LangChain, combining embedding models and vector databases to build robust NLP pipelines.
Independent Development: Acted as an independent developer focused on creating Artificial Intelligence solutions and high-performance, scalable systems. Developed robust backend systems using JavaScript, Java, and Python (Flask), ensuring efficient integration of services and APIs. Implemented NLP pipelines with open-source models such as Qwen and DeepSeek, optimized with Groq and LangChain, and built semantic search systems using embedding models and vector databases for complex data processing.