Semantic Similarity Detection Using Deep Learning and Feature Engineering

Built and evaluated several machine learning models to detect semantically duplicate questions in the Quora dataset. Compared traditional classifiers trained on hand-engineered similarity features against deep learning models, including a Siamese LSTM and Bilateral Multi-Perspective Matching (BiMPM). Explored GloVe and Word2Vec embeddings and developed a hybrid architecture that combines CNN, LSTM, and engineered features. The best model (fine-tuned GloVe embeddings with LSTM+CNN and the baseline features) achieved a log-loss of 0.21 on the Kaggle test set, outperforming the standard approaches tested. The project highlights the effectiveness of blending deep learning with traditional NLP features for semantic similarity tasks.
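As a rough illustration of two ideas mentioned above, the sketch below computes a couple of hand-engineered similarity features for a question pair and the binary log-loss used to score predictions. The specific features (`jaccard`, `len_diff`) and the helper names `tokenize`, `similarity_features`, and `log_loss` are assumptions made for this example; the summary does not list the project's actual feature set.

```python
import math


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, drop question marks, split on whitespace.
    return text.lower().replace("?", " ").split()


def similarity_features(q1, q2):
    # Illustrative features only; not the project's actual feature set.
    tokens1, tokens2 = tokenize(q1), tokenize(q2)
    set1, set2 = set(tokens1), set(tokens2)
    union = set1 | set2
    # Jaccard overlap of the two vocabularies (0.0 when both questions are empty).
    jaccard = len(set1 & set2) / len(union) if union else 0.0
    # Absolute difference in token counts.
    len_diff = abs(len(tokens1) - len(tokens2))
    return {"jaccard": jaccard, "len_diff": len_diff}


def log_loss(y_true, y_prob, eps=1e-15):
    # Mean binary cross-entropy; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


if __name__ == "__main__":
    print(similarity_features("How do I learn Python?", "How can I learn Python?"))
    print(log_loss([1, 0, 1], [0.9, 0.2, 0.7]))
```

Features like these would typically be fed to a traditional classifier or concatenated with the neural network's learned representation, which is one plausible reading of the hybrid design described above.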

Caption: Overview of the approach.

For details, view the report.