Projects

Analysing Mutational Pathways of Skin in Cancerous Patients

Research Assistance, Thomas Jefferson University, Sidney Kimmel Cancer Center, Cancer Genomics and Bioinformatics Core, 2024

As a volunteer research assistant at Thomas Jefferson University, I contributed to the Cancer Genomics and Bioinformatics Core under the guidance of Professor Paolo Fortina. My work focused on developing and optimizing pipelines for post-sequencing genomic data analysis, with the ultimate goal of offering advanced analytical services to internal researchers and competing with external vendors. This involved comparing Illumina DRAGEN Apps with open-source software for secondary analysis, as well as implementing mutational signature analysis using SigProfiler, a tool based on non-negative matrix factorization (NMF), to identify patterns in DNA nucleotide changes and infer the causes of cancer. I was responsible for ensuring the robustness of the pipelines, translating between data models so that researchers could use the analysis software of their choice, and ensuring fast turnaround of secondary analysis.
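
To illustrate the idea behind NMF-based signature analysis, the sketch below factorizes a small non-negative matrix V (rows as mutation channels, columns as samples) into W (signatures) and H (exposures) using standard multiplicative updates. It is not SigProfiler's API or the Core's pipeline: the function names, the toy matrix, and the chosen rank are all made up for illustration, and steps such as choosing the number of signatures are omitted.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factorize V ~ W H with W, H >= 0 via multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n_rows)]
    return w, h


# Toy catalogue (made-up counts): 4 mutation channels x 3 samples.
catalogue = [
    [10.0, 0.0, 5.0],
    [8.0, 0.0, 4.0],
    [0.0, 6.0, 3.0],
    [0.0, 9.0, 4.5],
]
signatures, exposures = nmf(catalogue, rank=2)
```

Each column of `signatures` is a candidate mutation pattern, and each column of `exposures` says how strongly each pattern contributes to one sample.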

Diversity Enforcement in Ensemble Models

MSc Thesis, The University of Manchester, 2024

My MSc thesis explores the role of ensemble diversity in machine learning, focusing on the development and application of a unified theory of diversity in ensemble models. Ensemble learning combines the outputs of multiple models to improve performance on prediction tasks, and diversity among ensemble members is a critical factor in their success. This project investigates the bias-variance-diversity trade-off, reproduces key findings from prior research, and extends the Negative Correlation Learning (NCL) framework to a broader family of loss functions, specifically Bregman Divergences.
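
The role of diversity is easiest to see in the simplest case: squared loss with an averaging ensemble. There, the ensemble's error equals the average member error minus the spread of the members around the ensemble output (the ambiguity decomposition). The sketch below checks that identity on a single example; the function name and the numbers are illustrative, and the Bregman-divergence generalization studied in the thesis is not shown.

```python
def ambiguity_decomposition(predictions, target):
    """Split the squared error of an averaging ensemble into two terms.

    predictions: outputs of the ensemble members for one example.
    target: the true value for that example.
    Returns (ensemble_error, average_member_error, diversity).
    """
    m = len(predictions)
    ensemble = sum(predictions) / m
    ensemble_error = (ensemble - target) ** 2
    average_member_error = sum((p - target) ** 2 for p in predictions) / m
    # Diversity: how far members sit from the ensemble output, on average.
    diversity = sum((p - ensemble) ** 2 for p in predictions) / m
    return ensemble_error, average_member_error, diversity


ens_err, avg_err, div = ambiguity_decomposition([2.0, 4.0, 9.0], target=5.0)
# Identity: ensemble error = average member error - diversity
assert abs(ens_err - (avg_err - div)) < 1e-12
```

Because the diversity term is never negative, the averaged ensemble is never worse than its average member under squared loss, which is the intuition that methods such as NCL build on.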

Attributing Extremes in Weather

Research Investigation, Met Office, Informatics Lab, 2023

This project investigated the predictive potential of climate indices in explaining extreme weather outcomes across multiple meteorological variables, such as temperature, wind, and humidity. A climate index is a quantitative measure that summarizes key aspects of the climate system, such as sea surface temperature anomalies or atmospheric pressure patterns, and is often used to study and predict large-scale climate phenomena (e.g., El Niño-Southern Oscillation, North Atlantic Oscillation). The goal was to determine whether these indices could serve as reliable predictors for extreme weather events, contributing to improved long-range weather forecasting and climate research.

LSTM Architecture Variants for Named Entity Recognition (NER)

Undergraduate Final Project, The University of Manchester, 2022

Named Entity Recognition (NER) is a critical task in Natural Language Processing (NLP) that involves identifying and classifying entities such as names, dates, and locations in text. This project explored the use of LSTM-based architectures for NER, with a focus on improving performance through hyperparameter tuning, enriched text representations, and advanced model components like BiLSTM and Conditional Random Fields (CRF). Although the main aim was to explore and analyse architectural extensions rather than to maximise a benchmark score, the final model achieved an F1 score of 95.88% on the CoNLL-2003 English dataset, demonstrating the effectiveness of these techniques.
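
For readers unfamiliar with how NER is scored, the sketch below computes F1 at the entity level from BIO-tagged sequences, counting a prediction as correct only when both span boundaries and entity type match. This is an assumption about the scoring convention, not a description of the project's actual evaluation script; the function names and example tags are illustrative.

```python
def extract_spans(tags):
    """Return the set of (start, end, type) entity spans in a BIO sequence.

    Assumption of this sketch: an I- tag that does not continue an open
    span of the same type is ignored rather than starting a new entity.
    """
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final span
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return spans


def span_f1(gold_tags, pred_tags):
    """Entity-level F1: harmonic mean of span precision and recall."""
    gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
score = span_f1(gold, pred)  # one of two spans matches, so P = R = F1 = 0.5
```

Scoring whole spans rather than individual tokens means a partially tagged or mistyped entity earns no credit.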

Authorship Attribution Applied to Music

Natural Language Understanding Project, The University of Manchester, 2022

Authorship attribution is a multi-class classification problem aimed at identifying the author of a given piece of text from a predefined list of authors. This project explored the application of authorship attribution techniques to song lyrics, leveraging advanced neural network architectures such as LSTM, GRU, and Siamese networks. The goal was to classify lyrics by artist and investigate the potential for creating a music recommendation system based solely on lyrical content.