LSTM Architecture Variants for Named Entity Recognition (NER)
Completed at The University of Manchester, 2022
Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) that involves identifying and classifying entities such as names, dates, and locations in text. This project explored LSTM-based architectures for NER, focusing on improving performance through hyperparameter tuning, enriched text representations, and advanced model components such as bidirectional LSTMs (BiLSTM) and Conditional Random Fields (CRF). Although the primary goal was to explore and analyse architectural extensions rather than to maximise a single score, the best configuration reached an F1 score of 95.88% during development and 92.52% on the CoNLL-2003 English test set, demonstrating the effectiveness of these techniques.
Project Overview
- Duration: September 2021 – May 2022 (8 Months)
- Technologies and Tools: Python, PyTorch, GloVe embeddings, BiLSTM, CRF, Optuna (hyperparameter tuning), CNN (character-level embeddings), POS tagging
- Skills Developed: Deep learning for NLP, hyperparameter optimization, experimental design, model evaluation, and data preprocessing
Results and Insights
- Importance of Text Representations:
- The use of GloVe embeddings significantly improved model performance, highlighting the critical role of high-quality text representations in NLP tasks.
- Extending the model with CNN character-level embeddings and POS-tag embeddings further improved performance, yielding F1 scores of 95.88% and 95.84%, respectively. This underscores the value of incorporating additional linguistic features (see the model sketch after this list).
- Bidirectional Context and CRF Layers:
- The BiLSTM-CRF architecture (sketched after this list) outperformed baseline LSTM models, demonstrating the importance of capturing bidirectional context and leveraging CRF layers for sequence labeling tasks.
- The CRF layer, while not always improving performance in isolation, proved valuable when combined with other advanced components.
- Hyperparameter Optimization:
- Using Optuna for hyperparameter tuning led to a best F1 score of 95.55%, demonstrating the importance of systematic optimization (a minimal Optuna loop is sketched after this list).
- Competitive Performance:
- The final model achieved an F1 score of 92.52% on the test set, placing it within the range of state-of-the-art methods for the CoNLL-2003 dataset at the time.
- This performance validates the effectiveness of the chosen architecture and highlights the potential for further improvements through enriched text representations.
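As a concrete illustration of the points above, the sketch below combines the three input representations discussed (pre-trained GloVe word vectors, a character-level CNN, and POS-tag embeddings) and feeds their concatenation through a BiLSTM with a CRF output layer. This is a minimal reconstruction rather than the project's actual code: the class and parameter names, all dimensions, and the use of the third-party `pytorch-crf` package are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pip install pytorch-crf

class BiLSTMCRFTagger(nn.Module):
    """BiLSTM-CRF over GloVe + char-CNN + POS embeddings (illustrative sizes)."""

    def __init__(self, glove_vectors, num_chars, num_pos, num_tags,
                 char_dim=30, char_filters=30, pos_dim=25, hidden=256):
        super().__init__()
        # Word embeddings initialised from pre-trained GloVe vectors, fine-tuned.
        self.word_emb = nn.Embedding.from_pretrained(glove_vectors, freeze=False)
        # Character-level CNN: embed characters, convolve, max-pool per word.
        self.char_emb = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        # POS-tag embeddings add a coarse syntactic signal.
        self.pos_emb = nn.Embedding(num_pos, pos_dim, padding_idx=0)
        input_dim = glove_vectors.size(1) + char_filters + pos_dim
        self.lstm = nn.LSTM(input_dim, hidden // 2,
                            bidirectional=True, batch_first=True)
        self.proj = nn.Linear(hidden, num_tags)      # per-token emission scores
        self.crf = CRF(num_tags, batch_first=True)   # learned transition scores

    def _features(self, words, chars, pos):
        # chars has shape (batch, seq_len, max_word_len)
        B, T, L = chars.shape
        c = self.char_emb(chars).view(B * T, L, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(B, T, -1)
        x = torch.cat([self.word_emb(words), c, self.pos_emb(pos)], dim=-1)
        out, _ = self.lstm(x)
        return self.proj(out)

    def loss(self, words, chars, pos, tags, mask):
        # The CRF returns the log-likelihood of the gold tag path; negate it.
        return -self.crf(self._features(words, chars, pos), tags, mask=mask)

    def decode(self, words, chars, pos, mask):
        # Viterbi decoding over emission plus transition scores.
        return self.crf.decode(self._features(words, chars, pos), mask=mask)
```

The point of the CRF head is that it replaces a per-token softmax, so implausible tag transitions (e.g., `I-PER` immediately after `B-LOC`) are penalised jointly across the whole sequence rather than token by token.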
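The hyperparameter search can be illustrated with a short Optuna loop. In this hedged sketch, `train_and_evaluate` is a hypothetical stand-in for the project's training routine, and the search space is illustrative rather than the ranges actually used. The median pruner implements the early stopping of unpromising trials mentioned under the computational constraints below.

```python
import optuna

def objective(trial):
    # Illustrative search space; the real ranges and parameters are assumptions.
    params = {
        "lr": trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        "hidden": trial.suggest_categorical("hidden", [128, 256, 512]),
        "dropout": trial.suggest_float("dropout", 0.1, 0.5),
    }
    dev_f1 = 0.0
    for epoch in range(20):
        dev_f1 = train_and_evaluate(params, epoch)  # hypothetical helper
        trial.report(dev_f1, epoch)   # expose the intermediate score to the pruner
        if trial.should_prune():      # stop trials lagging the median so far
            raise optuna.TrialPruned()
    return dev_f1

study = optuna.create_study(
    direction="maximize",
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=3),
)
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```

A pruned trial costs only a few epochs instead of a full training run, which is what makes a search of this size tractable on limited hardware.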
Challenges and Limitations
- Computational Constraints: Balancing model complexity against training time was a key challenge. Pruning less promising trials early (as in the Optuna sketch above) and focusing on faster-converging models helped manage this issue.
- Text Representation Limitations: While GloVe embeddings provided strong baseline performance, the project revealed opportunities for further improvement through large-scale embeddings (e.g., BERT, ELMo) and domain-specific representations.
- Generalization Across Domains: The project focused on the CoNLL-2003 dataset (news domain), but future work could explore the model’s performance on datasets from other domains (e.g., biomedical, financial).
Future Work
- Integration of Large-Scale Embeddings: Incorporate large pre-trained contextual embeddings (e.g., BERT, ELMo) to further improve text representations and model performance (a minimal extraction sketch follows this list).
- Cross-Domain Evaluation: Test the model on datasets from different domains to assess its generalizability and adaptability.
- Exhaustive Hyperparameter Search: Explore combinations of dependent components (e.g., CNN and POS embeddings) to identify the optimal architecture for specific tasks.
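As a starting point for the first item above, contextual embeddings can be slotted in alongside (or in place of) the GloVe vectors. The sketch below uses the Hugging Face `transformers` library to extract one vector per token from `bert-base-cased`, averaging sub-word pieces; this is an assumed integration path, not part of the original project.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased").eval()

def contextual_embeddings(tokens):
    """Return one 768-d vector per input token, averaging sub-word pieces."""
    enc = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]    # (num_pieces, 768)
    word_ids = enc.word_ids()                        # maps pieces -> token index
    vectors = []
    for i in range(len(tokens)):
        pieces = [j for j, w in enumerate(word_ids) if w == i]
        vectors.append(hidden[pieces].mean(dim=0))
    return torch.stack(vectors)                      # (len(tokens), 768)

feats = contextual_embeddings(["EU", "rejects", "German", "call"])
print(feats.shape)  # torch.Size([4, 768])
```

These vectors could replace, or be concatenated with, the GloVe and character-CNN features in the earlier model sketch.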
Conclusion
This project successfully demonstrated the effectiveness of LSTM-based architectures for Named Entity Recognition, achieving competitive results on the CoNLL-2003 dataset. The work highlighted the critical role of text representations, bidirectional context, and systematic hyperparameter tuning in NLP tasks. The insights gained from this project provide a strong foundation for future research, particularly in the areas of enriched text representations and cross-domain generalization.