Machine Learning Portfolio

A collection of machine learning projects showcasing various algorithms, techniques, and real-world applications across computer vision, regression, and classification tasks.

4 Major Projects

  • Computer Vision: Image Classification
  • Regression: Predictive Modeling
  • Deep Learning: Neural Networks

German Road Signs Classification

Computer Vision CNN TensorFlow
View in Colab

Project Summary

Created a custom convolutional neural network to classify 43 classes of German traffic signs for a fictional self-driving car case study, GehirnWagen. Combining transfer learning with a custom architecture, the model achieved strong generalization on unseen data and perfect classification of STOP signs, a critical safety metric. This project simulates real-world constraints in autonomous driving perception.

Methods

  • Data Preprocessing: Image resizing (100×100), normalization, extensive augmentation, oversampling of underrepresented classes
  • Model Architecture: Custom CNN with 3 convolutional layers, batch normalization, dropout, and dense layers
  • Training: Adam optimizer, early stopping, learning rate scheduler, and model checkpointing
  • Evaluation: Accuracy, per-class F1, STOP-sign recall, and a full confusion matrix with performance thresholds
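
The preprocessing steps above can be sketched in NumPy. This is a minimal illustration of normalization plus oversampling of underrepresented classes, not the notebook's actual code; the function and array names are hypothetical:

```python
import numpy as np

def normalize_and_oversample(images, labels, rng=None):
    """Scale pixel values to [0, 1] and oversample every class
    up to the size of the largest class (sampling with replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    images = images.astype(np.float32) / 255.0  # normalization

    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    keep = []
    for cls in classes:
        idx = np.flatnonzero(labels == cls)
        extra = rng.choice(idx, size=target - idx.size, replace=True)
        keep.append(np.concatenate([idx, extra]))
    order = np.concatenate(keep)
    return images[order], labels[order]

# Toy example: 3 classes with 5, 2, and 1 samples of 100x100 RGB images
X = np.random.randint(0, 256, size=(8, 100, 100, 3), dtype=np.uint8)
y = np.array([0, 0, 0, 0, 0, 1, 1, 2])
Xb, yb = normalize_and_oversample(X, y)
# Every class now has 5 samples and every pixel lies in [0, 1]
```

In the real pipeline this would run after resizing and alongside augmentation, so the duplicated minority-class images are not exact copies.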

Results

Holdout Accuracy: 99.0%
STOP Sign Accuracy: 100.0%
R² Score: 0.9958
Confidence Thresholds: 98% (regulatory signs), 85% (warning signs)
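
The per-category confidence thresholds in the results can be illustrated with a small sketch. Only the threshold values come from the figures above; the function name, category lookup, and toy probabilities are hypothetical:

```python
import numpy as np

# Per-category confidence thresholds from the results above:
# regulatory signs (e.g. STOP) require 98%, warning signs 85%.
THRESHOLDS = {"regulatory": 0.98, "warning": 0.85}

def accept_prediction(softmax_probs, category):
    """Return (predicted_class, accepted). In a real pipeline the
    category would be looked up from the predicted class; a prediction
    below its category's threshold is flagged for fallback handling."""
    pred = int(np.argmax(softmax_probs))
    confidence = float(softmax_probs[pred])
    return pred, confidence >= THRESHOLDS[category]

probs = np.array([0.01, 0.97, 0.02])  # toy softmax output
print(accept_prediction(probs, "regulatory"))  # (1, False): 0.97 < 0.98
print(accept_prediction(probs, "warning"))     # (1, True): 0.97 >= 0.85
```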

Project Visualizations

Bike Rental Demand Prediction

Regression Neural Network TensorFlow
View in Colab

Project Summary

Trained and optimized a neural network to predict hourly bike rental demand in Washington D.C. using the Capital Bikeshare dataset. Extensive feature engineering accounted for COVID-19 lockdown periods, weather, seasonality, and commute hours. The model was evaluated using time-aware train/test splits and tuned with early stopping and dropout regularization.

Methods

  • Feature Engineering: Temporal breakdowns, lockdown indicators, commute flags, interaction terms
  • Architecture: 3-layer fully connected network with dropout (256 → 128 → 64 → 1)
  • Hyperparameter Tuning: Grid search over learning rate, layer size, and dropout with early stopping
  • Validation: Time series split to avoid leakage (Aug 2023 cutoff)
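
The commute-hour flag and the time-aware split can be sketched with pandas. The column names and the synthetic demand series are illustrative; only the August 2023 cutoff comes from the project description:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly demand frame standing in for the Bikeshare data.
hours = pd.date_range("2023-01-01", "2023-12-31 23:00", freq="h")
df = pd.DataFrame({
    "timestamp": hours,
    "rentals": np.random.default_rng(0).poisson(200, len(hours)),
})

# Feature engineering: temporal breakdowns and a weekday commute flag.
df["hour"] = df["timestamp"].dt.hour
df["dayofweek"] = df["timestamp"].dt.dayofweek
df["is_commute"] = (df["hour"].isin([7, 8, 9, 16, 17, 18])
                    & (df["dayofweek"] < 5)).astype(int)

# Time-aware split: rows before the cutoff train, rows after test,
# so no future observations leak into training.
cutoff = pd.Timestamp("2023-08-01")
train = df[df["timestamp"] < cutoff]
test = df[df["timestamp"] >= cutoff]
assert train["timestamp"].max() < test["timestamp"].min()
```

A chronological cutoff like this is the key design choice: a random shuffle would let the model train on hours that occur after its test hours, inflating the measured R².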

Results

R² Score: 0.9598
RMSE: 100.41
MAE: 68.03
Median Error: 42.94
Within 10%: 50.55%
Within 20%: 77.46%

Project Visualizations

Housing Price Prediction

Regression XGBoost Census Feature Engineering
View in Colab

Project Summary

Built an XGBoost regression model to predict housing prices in King County using property features and enriched U.S. Census data. The pipeline included outlier mitigation, engineered interaction terms (e.g., living area × grade), and economic indicators such as income, education, and housing statistics.

Methods

  • Feature Engineering: Interaction features, property age, and socioeconomic Census data
  • Model: XGBoost Regressor inside a Pipeline with preprocessing
  • Hyperparameter Tuning: Grid search with 5-fold cross-validation (120+ combinations)
  • Evaluation: RMSE, R², and absolute/percentage error comparisons
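
The pipeline-plus-grid-search setup described above can be sketched as follows. scikit-learn's GradientBoostingRegressor stands in for XGBRegressor here (the two share the same estimator interface), and the synthetic features and parameter grid are illustrative, not the notebook's:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the property + Census features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))          # e.g. sqft, grade, income, age
y = X @ np.array([3.0, 2.0, 1.5, 0.5]) + rng.normal(scale=0.1, size=200)

# Preprocessing and model live in one Pipeline, so the grid search
# cross-validates the whole chain (no leakage from scaling on test folds).
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", GradientBoostingRegressor(random_state=0))])
grid = GridSearchCV(pipe,
                    param_grid={"model__n_estimators": [50, 100],
                                "model__max_depth": [2, 3]},
                    cv=5, scoring="neg_root_mean_squared_error")
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

The `model__` prefix routes each grid parameter to the pipeline step it belongs to; the real project's grid spanned 120+ such combinations.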

Results

Train RMSE: $44,024
Train R²: 0.9805
Test RMSE: $93,997
Test R²: 0.9145
Top Feature: Bachelors+ % (2014)
Best Model: XGBoost Regressor

Project Visualizations

Bank Marketing Campaign Analysis

Classification Logistic Regression Data Analysis
View in Colab

Project Summary

Analyzed bank marketing campaign data to predict whether customers will subscribe to a term deposit. The project involved classification modeling to identify the factors that influence customer decisions and to optimize marketing strategies. Several machine learning algorithms were compared to maximize prediction accuracy and surface actionable business insights.

Methods

  • Data Preprocessing: Handling categorical variables, missing values, and class imbalance
  • Feature Engineering: Creating interaction terms and encoding categorical features
  • Model Comparison: Logistic Regression, SVM, Random Forest, and Gradient Boosting
  • Performance Metrics: Precision, Recall, F1-score, and ROC-AUC analysis
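
The model-comparison step can be sketched on synthetic imbalanced data. This is a minimal illustration with a subset of the models listed above (SVM omitted for brevity); the real campaign features and tuning are not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data: ~88% of customers do not subscribe.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.88], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
    "gb": GradientBoostingClassifier(random_state=0),
}
# Rank models by ROC-AUC, which is robust to the class imbalance.
scores = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
          for name, m in models.items()}
print(max(scores, key=scores.get), scores)
```

ROC-AUC is the natural headline metric here because plain accuracy would look good even for a model that predicts "no subscription" for everyone.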

Results

Accuracy: 91.2%
Precision: 0.847
Recall: 0.763
ROC-AUC: 0.923
F1-Score: 0.803
Best Model: Random Forest

Project Visualizations

Tools & Technologies

Programming Languages

Python SQL

ML Libraries

Scikit-learn TensorFlow XGBoost Pandas NumPy

Visualization

Matplotlib Seaborn Plotly