Machine Learning Projects

Throughout my final year, I've worked on a range of machine learning projects across different domains. These projects showcase my skills in data preprocessing, model development, evaluation, and interpretation.


Semester 1: Data Science Fundamentals

Linear and Polynomial Regression - Gold Price Analysis

Python, Pandas, Matplotlib, Time Series Analysis

This project analyzed historical gold price data (2004 to 2024) to identify patterns and trends. I preprocessed the data, fitted linear and polynomial regression models to it, and plotted each fit against the price series.

Key Findings: Trained the models on a fluctuating price series and produced a line of best fit for each (linear and polynomial).

The analysis used the XAU_1Month_data dataset and implemented linear and polynomial regression models to map the prices onto a simple 2D (x, y) plane.
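As a rough illustration of the two fits, here is a minimal sketch using NumPy's least-squares polynomial fitting. The price series is synthetic (the real XAU_1Month_data columns are not reproduced here), and the cubic degree and variable names are assumptions chosen for the example, not the values used in the project.

```python
import numpy as np

# Synthetic stand-in for monthly closing prices (assumed data, not the real dataset).
rng = np.random.default_rng(0)
months = np.arange(240, dtype=float)  # x axis: months since the first record
prices = 400 + 5.0 * months + 0.02 * months**2 + rng.normal(0, 40, months.size)

# Least-squares fits: degree 1 is linear regression, degree 3 is a cubic polynomial.
linear_coeffs = np.polyfit(months, prices, deg=1)
poly_coeffs = np.polyfit(months, prices, deg=3)

# Evaluate both fitted curves at every month.
linear_fit = np.polyval(linear_coeffs, months)
poly_fit = np.polyval(poly_coeffs, months)


def rmse(actual, predicted):
    """Root-mean-square error between the series and a fitted curve."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


linear_rmse = rmse(prices, linear_fit)
poly_rmse = rmse(prices, poly_fit)
print(f"linear RMSE: {linear_rmse:.1f}, cubic RMSE: {poly_rmse:.1f}")
```

Plotting `months` against `prices`, `linear_fit`, and `poly_fit` with Matplotlib would give the two best-fit curves on the same axes.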

Decision Tree for Weather Prediction

Decision Trees, Sklearn, Weather Forecasting, Feature Selection

Developed a decision tree model to predict weather patterns using the 'weather_forecast_data.csv' dataset. This project demonstrates my ability to implement classification algorithms for real-world prediction problems.

Key Achievements: Created an interpretable model that can predict weather conditions with significant accuracy based on meteorological features.

The implementation included pre-processing of weather data, feature selection to identify the most predictive variables, and visualization of the decision tree for interpretability. I also evaluated the model's performance using cross-validation and confusion matrices.
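The workflow described above could look roughly like the following scikit-learn sketch. It runs on a synthetic dataset from `make_classification` rather than 'weather_forecast_data.csv', and the tree depth, fold count, and split size are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the weather table (assumed: 5 numeric features, binary label).
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# A shallow tree stays readable when printed or plotted.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)

# 5-fold cross-validation on the training split.
cv_scores = cross_val_score(tree, X_train, y_train, cv=5)

# Fit on the full training split, then evaluate on the held-out test split.
tree.fit(X_train, y_train)
cm = confusion_matrix(y_test, tree.predict(X_test))

print("CV accuracy per fold:", cv_scores.round(3))
print("Confusion matrix:\n", cm)
print("Feature importances:", tree.feature_importances_.round(3))
print(export_text(tree))  # text rendering of the learned splits
```

The `feature_importances_` values give one simple basis for feature selection: features with an importance near zero contribute little to the tree's splits.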

Semester 2: Classical Machine Learning Algorithms

Large-Scale ML with GPU Acceleration

Naive Bayes, SVM, KNN, K-Means Clustering, CUDA, GPU Parallelization

Tackled the challenge of applying multiple ML algorithms to a large dataset, 'all_car_adverts.csv', containing approximately 800,000 rows and 32 columns. This project demonstrates my ability to handle big data and optimize computational resources for machine learning tasks.

Technical Achievement: Successfully leveraged GPU computing with CUDA cores to process the dataset, reducing training time from days to hours compared to CPU-only approaches.

I implemented and compared four different algorithms (Naive Bayes, Support Vector Machines, K-Nearest Neighbors, and K-Means Clustering) on this large-scale dataset. Each algorithm required specific optimizations to efficiently utilize GPU resources:

Naive Bayes: Implemented batch processing techniques to manage memory constraints while maintaining accuracy
SVM: Utilized CUDA-accelerated kernels to speed up complex matrix calculations essential for large datasets
KNN: Created custom distance computation methods optimized for parallel execution on GPU
K-Means: Developed specialized data partitioning strategies to enable efficient clustering of high-dimensional data
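To give a sense of what a parallel-friendly distance computation for KNN can look like, here is a minimal sketch that computes all pairwise distances for a batch with a single matrix product. It uses NumPy on the CPU so that it runs anywhere. It is not the project's GPU code, and the function names, batch size, and toy data are assumptions. The same array expressions are the kind that GPU array libraries such as CuPy can execute, which is why this formulation suits parallel hardware.

```python
import numpy as np


def pairwise_sq_dists(queries, points):
    """Squared Euclidean distance between every query row and every point row.

    Uses ||q - p||^2 = ||q||^2 + ||p||^2 - 2 q.p, so the heavy work is one matrix product.
    """
    q_norms = np.sum(queries**2, axis=1)[:, None]
    p_norms = np.sum(points**2, axis=1)[None, :]
    dists = q_norms + p_norms - 2.0 * queries @ points.T
    return np.maximum(dists, 0.0)  # clamp tiny negatives caused by rounding


def knn_predict(train_X, train_y, test_X, k=5, batch_size=1024):
    """Classify test rows by majority vote among the k nearest training rows.

    Labels are assumed to be non-negative integers. Queries are processed in
    batches so the distance matrix never exceeds batch_size x len(train_X).
    """
    preds = np.empty(test_X.shape[0], dtype=train_y.dtype)
    for start in range(0, test_X.shape[0], batch_size):
        batch = test_X[start:start + batch_size]
        dists = pairwise_sq_dists(batch, train_X)
        # argpartition finds the k smallest distances per row without a full sort.
        nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]
        for i, labels in enumerate(train_y[nearest]):
            preds[start + i] = np.bincount(labels).argmax()
    return preds


# Toy data: two well-separated clusters (assumed, for demonstration only).
rng = np.random.default_rng(0)
train_X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(6, 1, (200, 4))])
train_y = np.array([0] * 200 + [1] * 200)
test_X = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(6, 1, (10, 4))])

preds = knn_predict(train_X, train_y, test_X, k=5, batch_size=8)
print(preds)
```

Batching the queries bounds memory use, which matters at the scale of hundreds of thousands of rows where the full distance matrix would not fit in memory.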

The project required extensive tweaking of various GPU configurations, memory management, and algorithm-specific optimizations to achieve acceptable performance with such a large dataset.

Results Comparison: GPU-accelerated implementations achieved significant speedup compared to CPU versions while maintaining comparable accuracy. SVM in particular proved to be extremely computationally intensive.

Semester 2: Neural Network Projects

CNN for Sign Language Recognition

Convolutional Neural Networks, Image Classification, TensorFlow/Keras, Data Augmentation

Developed a Convolutional Neural Network (CNN) to recognize sign language gestures from images. This project demonstrates my understanding of deep learning techniques for computer vision problems.

The model architecture includes convolutional layers, pooling layers, and fully connected layers, designed to capture hierarchical features in the sign language images. I implemented techniques like dropout and batch normalization to prevent overfitting.
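To show what these layer types actually compute, here is a small NumPy sketch of a single convolution filter, a ReLU activation, max pooling, and dropout. It is an illustration of the operations only, not the Keras model from the project. The 28x28 image size, the edge-detecting kernel, and the dropout rate are assumptions picked for the example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1).

    This is cross-correlation, which is what deep learning libraries call convolution.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def relu(x):
    """Keep positive responses and zero out negative ones."""
    return np.maximum(x, 0.0)


def max_pool2d(feature_map, size=2):
    """Downsample by taking the maximum over non-overlapping size x size windows."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


def dropout(x, rate, rng):
    """Inverted dropout: zero a random fraction of values and rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


# Assumed toy input: a 28x28 image whose right half is bright (a vertical edge).
image = np.zeros((28, 28))
image[:, 14:] = 1.0

# A hand-written vertical-edge kernel; in a trained CNN these weights are learned.
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)

features = relu(conv2d_valid(image, kernel))  # shape (26, 26)
pooled = max_pool2d(features)                 # shape (13, 13)
dropped = dropout(pooled, rate=0.5, rng=np.random.default_rng(0))

print(features.shape, pooled.shape)
```

The filter responds only where the edge is, and pooling keeps that response while shrinking the feature map. Stacking several such stages is what lets later layers combine simple features into larger shapes.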

Results: Achieved over 95% accuracy on the test set, making the model practical for real-world applications.

The trained model was saved as 'asl_sign_language_model.h5' and can be deployed for real-time sign language translation. I also created a detailed explanation of the CNN architecture and its design choices in 'ExplanationofCNNModelArch.md'.

Recurrent Neural Networks for Sequence Analysis

RNN, LSTM, Sequence Prediction, NLP

Implemented Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks for sequence analysis tasks. This project demonstrates my ability to work with sequential data and time-dependent patterns.

The models were trained on Shakespearean text data for language processing tasks such as sentiment analysis and text generation. I experimented with different network architectures, temperatures, embedding techniques, and sequence lengths to optimize performance.
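The temperature setting mentioned above controls how adventurous the generated text is. The sketch below shows the usual way temperature is applied when sampling the next character from a model's output scores. The four-character vocabulary and the logit values are made up for the example and do not come from the trained model.

```python
import numpy as np


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled = scaled - scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()


def sample_next_index(logits, temperature, rng):
    """Draw one vocabulary index according to the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return int(rng.choice(len(probs), p=probs))


# Hypothetical vocabulary and scores, standing in for an LSTM's output at one step.
vocab = ["t", "h", "e", " "]
logits = np.array([2.0, 1.0, 0.5, 0.1])

cold = softmax_with_temperature(logits, temperature=0.2)  # close to always picking "t"
hot = softmax_with_temperature(logits, temperature=2.0)   # flatter, more varied output

rng = np.random.default_rng(0)
next_char = vocab[sample_next_index(logits, temperature=1.0, rng=rng)]

print("cold:", cold.round(3))
print("hot: ", hot.round(3))
print("sampled:", repr(next_char))
```

Low temperatures tend to produce repetitive but safe text, while high temperatures produce more varied text with more mistakes, so the value is typically tuned by reading the generated samples.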

Key Achievement: Successfully implemented an LSTM model that could generate coherent Shakespearean text passages after training on a large dataset of his writing.