LSTM Architecture Variants for Named Entity Recognition (NER)

Completed at The University of Manchester, 2022

Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP): identifying and classifying entities such as names, dates, and locations in text. This project explored LSTM-based architectures for NER, focusing on improving performance through hyperparameter tuning, enriched text representations, and advanced model components such as bidirectional LSTMs (BiLSTM) and Conditional Random Fields (CRF). Although the primary goal was to explore and analyse architectural extensions rather than to maximise raw performance, the best configuration reached an F1 score of 95.88% on the CoNLL-2003 English dataset, demonstrating the effectiveness of these techniques.
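
To make the task concrete: CoNLL-2003 frames NER as sequence labeling, where the model predicts one tag per token marking the beginning (B-), inside (I-), or outside (O) of an entity of type PER, LOC, ORG, or MISC. A classic example sentence from the shared task, with tags shown in the BIO convention:

```python
# One (token, tag) pair per token; the model must predict the tag column.
sentence = [
    ("U.N.",     "B-ORG"),
    ("official", "O"),
    ("Ekeus",    "B-PER"),
    ("heads",    "O"),
    ("for",      "O"),
    ("Baghdad",  "B-LOC"),
    (".",        "O"),
]
```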

Project Overview

  • Duration: September 2021 – May 2022 (8 months)
  • Technologies and Tools: Python, PyTorch, GloVe embeddings, BiLSTM, CRF, Optuna (hyperparameter tuning), CNN (character-level embeddings), POS tagging
  • Skills Developed: Deep learning for NLP, hyperparameter optimization, experimental design, model evaluation, and data preprocessing

Results and Insights

  1. Importance of Text Representations:
    • The use of GloVe embeddings significantly improved model performance, highlighting the critical role of high-quality text representations in NLP tasks.
    • Extending the model with CNN character-level embeddings and POS tag embeddings further improved performance, yielding F1 scores of 95.88% and 95.84%, respectively. This underscores the value of incorporating additional linguistic features into the model (a sketch of this combined representation appears after this list).
  2. Bidirectional Context and CRF Layers:
    • The BiLSTM-CRF architecture outperformed baseline LSTM models, demonstrating the importance of capturing bidirectional context and leveraging CRF layers for sequence labeling tasks (see the BiLSTM-CRF sketch after this list).
    • The CRF layer, while not always improving performance in isolation, proved valuable when combined with other advanced components.
  3. Hyperparameter Optimization:
    • Using Optuna for hyperparameter tuning led to a best F1 score of 95.55%, showing the value of systematic optimization in reaching competitive results (a minimal tuning loop with early pruning is sketched after this list).
  4. Competitive Performance:
    • The final model achieved an F1 score of 92.52% on the test set, placing it within the range of state-of-the-art methods for the CoNLL-2003 dataset at the time.
    • This performance validates the effectiveness of the chosen architecture and highlights the potential for further improvements through enriched text representations.
  5. Computational Constraints: Balancing model complexity and training time was a key challenge. Pruning less promising trials early and focusing on faster-converging models helped manage this issue.
  6. Text Representation Limitations: While GloVe embeddings provided strong baseline performance, the project revealed opportunities for further improvement through large-scale embeddings (e.g., BERT, ELMo) and domain-specific representations.
  7. Generalization Across Domains: The project focused on the CoNLL-2003 dataset (news domain), but future work could explore the model’s performance on datasets from other domains (e.g., biomedical, financial).
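
The enriched token representation described in point 1 can be sketched as a small PyTorch module. This is a minimal illustration, not the project's actual code: the class name, embedding dimensions, and single kernel width are assumptions, following the common char-CNN design for NER.

```python
import torch
import torch.nn as nn

class TokenRepresentation(nn.Module):
    """Concatenates pretrained GloVe word embeddings with character-level
    CNN features and POS-tag embeddings (all dimensions are illustrative)."""

    def __init__(self, glove_weights, n_chars, n_pos,
                 char_dim=30, char_filters=30, pos_dim=25):
        super().__init__()
        # GloVe vectors loaded into a frozen embedding table.
        self.word_emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # 1-D convolution over each token's characters, max-pooled
        # into a fixed-size feature vector per token.
        self.char_cnn = nn.Conv1d(char_dim, char_filters, kernel_size=3, padding=1)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)

    def forward(self, words, chars, pos):
        # words: (batch, seq)   chars: (batch, seq, word_len)   pos: (batch, seq)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1).transpose(1, 2)
        c = torch.relu(self.char_cnn(c)).max(dim=2).values.view(b, s, -1)
        return torch.cat([self.word_emb(words), c, self.pos_emb(pos)], dim=-1)
```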
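
The BiLSTM-CRF from point 2 can then be sketched on top of that representation. The CRF layer here comes from the third-party pytorch-crf package, which is an assumption, since the report does not name a specific implementation:

```python
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class BiLSTMCRF(nn.Module):
    """BiLSTM encoder over token representations with a CRF output layer."""

    def __init__(self, input_dim, hidden_dim, num_tags):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def loss(self, x, tags, mask):
        e = self.emissions(self.lstm(x)[0])
        return -self.crf(e, tags, mask=mask)  # negative log-likelihood

    def decode(self, x, mask):
        e = self.emissions(self.lstm(x)[0])
        return self.crf.decode(e, mask=mask)  # Viterbi-best tag sequences
```

Unlike a per-token softmax, the CRF scores whole tag sequences, which lets it learn valid transitions (e.g., that I-PER cannot follow B-LOC).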

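The tuning loop from points 3 and 5 follows Optuna's standard report-and-prune pattern. The search ranges below are hypothetical, and train_and_eval is a placeholder for one epoch of training plus development-set F1 evaluation:

```python
import optuna

def objective(trial):
    # Hypothetical search space; the project's actual ranges are not shown.
    lr      = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    hidden  = trial.suggest_categorical("hidden_dim", [100, 200, 300])
    dropout = trial.suggest_float("dropout", 0.1, 0.5)

    best_f1 = 0.0
    for epoch in range(20):
        f1 = train_and_eval(lr, hidden, dropout, epoch)  # placeholder helper
        best_f1 = max(best_f1, f1)
        trial.report(f1, step=epoch)
        if trial.should_prune():  # abandon trials that lag behind the median
            raise optuna.TrialPruned()
    return best_f1

study = optuna.create_study(direction="maximize",
                            pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```
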
Future Work

  • Integration of Large-Scale Embeddings: Incorporate large pre-trained contextual embeddings (e.g., BERT, ELMo) to further improve text representations and model performance.
  • Cross-Domain Evaluation: Test the model on datasets from different domains to assess its generalizability and adaptability.
  • Exhaustive Hyperparameter Search: Explore combinations of interacting components (e.g., CNN and POS embeddings) to identify the optimal architecture for specific tasks.

Conclusion

This project successfully demonstrated the effectiveness of LSTM-based architectures for Named Entity Recognition, achieving competitive results on the CoNLL-2003 dataset. The work highlighted the critical role of text representations, bidirectional context, and systematic hyperparameter tuning in NLP tasks. The insights gained from this project provide a strong foundation for future research, particularly in the areas of enriched text representations and cross-domain generalization.