🧠 Technical Details: Pharma Email Analyzer

The Pharma Email Analyzer is powered by a transformer-based language model fine-tuned for regulatory entity extraction and post-classification flagging in pharmaceutical communications.

🔍 Model Selection

Development started from a pretrained BERT-based encoder from Hugging Face, preferably one pretrained on biomedical or scientific text (e.g., BioBERT or SciBERT), with the general-purpose distilbert-base-uncased as a lighter alternative. The domain-specific models align well with pharma language, including acronyms, compound names, and regulatory phrasing.
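As a rough illustration of how one of these encoders might be loaded for entity extraction, the sketch below uses the Hugging Face transformers Auto classes. It assumes the task is framed as token classification, which this page does not state explicitly. The label set is a hypothetical placeholder and not the project's actual annotation scheme. The checkpoint ids are the public Hub names for the models mentioned above.

```python
from typing import Dict, List

# Public Hugging Face Hub checkpoint ids for the encoders named above.
CANDIDATE_CHECKPOINTS: Dict[str, str] = {
    "BioBERT": "dmis-lab/biobert-base-cased-v1.1",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "DistilBERT": "distilbert-base-uncased",
}

# Hypothetical BIO label set, for illustration only (assumption).
LABELS: List[str] = ["O", "B-OWNER", "I-OWNER", "B-DUE_DATE", "I-DUE_DATE"]
ID2LABEL: Dict[int, str] = dict(enumerate(LABELS))
LABEL2ID: Dict[str, int] = {label: i for i, label in ID2LABEL.items()}


def load_token_classifier(name: str):
    """Load a tokenizer and encoder with a fresh token-classification head."""
    # Imported lazily so the mappings above can be inspected
    # without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    checkpoint = CANDIDATE_CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Calling `load_token_classifier("SciBERT")` downloads the checkpoint on first use. The classification head is randomly initialized, so the model is only useful after fine-tuning.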

Key selection criteria included:

🏗️ Fine-Tuning Pipeline

The model was fine-tuned on a curated dataset of anonymized regulatory emails, annotated with:

Training involved:

⚙️ Inference and Post-Processing

At runtime, the app:

  1. Accepts raw email text via a Flask interface
  2. Tokenizes and feeds it into the fine-tuned model
  3. Extracts structured entities with character offsets
  4. Applies rule-based logic to flag missing dependencies, overdue dates, or ambiguous ownership
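Steps 3 and 4 can be sketched without the model itself. The example below assumes the model emits one BIO tag per token and that token character offsets are available (with a Hugging Face fast tokenizer these could come from `return_offsets_mapping=True`). The labels (OWNER, DOCUMENT, DUE_DATE), the ISO date format, and the definition of "ambiguous ownership" as anything other than exactly one owner are all assumptions made for illustration, not the app's actual rules.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Entity:
    label: str  # hypothetical label such as "OWNER" or "DUE_DATE"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def merge_bio(text: str, tags: List[str],
              offsets: List[Tuple[int, int]]) -> List[Entity]:
    """Merge per-token BIO tags into entity spans with character offsets."""
    entities: List[Entity] = []
    current: Optional[Tuple[str, int, int]] = None  # (label, start, end)
    for tag, (start, end) in zip(tags, offsets):
        if tag.startswith("I-") and current and current[0] == tag[2:]:
            current = (current[0], current[1], end)  # extend the open span
            continue
        if current:  # any other tag closes the open span
            label, s, e = current
            entities.append(Entity(label, s, e, text[s:e]))
            current = None
        if tag.startswith("B-"):  # stray I- tags are dropped
            current = (tag[2:], start, end)
    if current:
        label, s, e = current
        entities.append(Entity(label, s, e, text[s:e]))
    return entities


def flag_issues(entities: List[Entity], today: date) -> List[str]:
    """Apply simple rules: ambiguous ownership and overdue dates."""
    flags: List[str] = []
    owners = [e for e in entities if e.label == "OWNER"]
    if len(owners) != 1:  # assumption: exactly one owner is unambiguous
        flags.append(f"ambiguous ownership: {len(owners)} owners found")
    for e in entities:
        if e.label != "DUE_DATE":
            continue
        try:
            due = date.fromisoformat(e.text)  # assumption: ISO dates
        except ValueError:
            continue  # unparseable dates are skipped, not flagged
        if due < today:
            flags.append(f"overdue: {e.text}")
    return flags


# Toy input: tags are hand-written here instead of predicted by a model.
text = "Maria to submit the stability report by 2024-01-15."
offsets = [m.span() for m in re.finditer(r"[\w-]+", text)]
tags = ["B-OWNER", "O", "O", "O", "B-DOCUMENT", "I-DOCUMENT", "O", "B-DUE_DATE"]

entities = merge_bio(text, tags, offsets)
flags = flag_issues(entities, today=date(2024, 2, 1))
```

Keeping the flagging rules separate from the model means they can be adjusted or extended (for example, with a missing-dependency check) without retraining.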

The output is rendered in a user-friendly format, with session-based history and example prompts to guide exploration.
