Architecture Stack

Tech Stack

Proposed tech stack for building the model architecture

Python (Backend) (Standard for LLMs)
- PyTorch (Deep Learning Framework)
- Hugging Face Transformers
- vLLM (Inference Engine)
React with TypeScript (Frontend)

To establish TEEs, the hardware would require one of the following:

NLTK (Text Processing and Tokenisation)
Hugging Face Datasets (Apache Arrow)
- Memory-mapped storage (stored in secure folders)
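
To illustrate what memory-mapped storage in a restricted folder could look like, here is a minimal sketch using only the Python standard library. It is not the Hugging Face Datasets API (which handles Arrow memory mapping internally); the function names, the file name, and the owner-only `0o700` permission are assumptions made for illustration, and the permission handling assumes a POSIX system.

```python
import mmap
import os
import tempfile


def write_private_file(directory: str, name: str, data: bytes) -> str:
    """Write data into a directory that only the owning user can access."""
    os.makedirs(directory, mode=0o700, exist_ok=True)
    # Re-apply the mode explicitly, because the process umask can alter it.
    os.chmod(directory, 0o700)
    path = os.path.join(directory, name)
    with open(path, "wb") as handle:
        handle.write(data)
    return path


def read_slice(path: str, start: int, length: int) -> bytes:
    """Read a byte range through a memory map instead of loading the whole file."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[start:start + length]


if __name__ == "__main__":
    # Hypothetical location and content, used only for this demonstration.
    store = os.path.join(tempfile.mkdtemp(), "secure_store")
    stored = write_private_file(store, "policies.bin", b"access control policy")
    print(read_slice(stored, 0, 6))
```

Only the requested byte range is copied into Python memory, which is the property that makes memory mapping attractive for corpora larger than RAM.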

Clinical-Longformer
- 4096 tokens
- More closely tuned to clinical text
BigBird
- 4096 tokens
- Ideal for understanding long texts
- 128 million parameters
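
Both models above accept at most 4096 tokens, so a longer policy would have to be split before encoding. The following is a minimal sketch of one common approach, overlapping sliding windows, assuming the document has already been tokenised into integer IDs. The function name `sliding_windows` and the default `stride` of 512 are illustrative assumptions, not values from this project; in practice the budget would sit slightly below 4096 to leave room for special tokens.

```python
from typing import List


def sliding_windows(token_ids: List[int], max_len: int = 4096,
                    stride: int = 512) -> List[List[int]]:
    """Split token_ids into windows of at most max_len tokens.

    Consecutive windows share `stride` tokens, so text near a window
    boundary appears with context on both sides.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    if not token_ids:
        return []

    step = max_len - stride
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # the last window already reaches the end of the document
        start += step
    return windows
```

A document that already fits within `max_len` comes back as a single window, so the same call can be used for short and long policies alike.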

DeBERTa-v3 (Outperforms BERT, RoBERTa, and earlier DeBERTa versions)
- Base: ~184 million parameters
- Large: ~435 million parameters
- XLarge: ~750 million parameters
- XXLarge: ~1.5 billion parameters
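
As a rough aid for comparing these sizes, the sketch below converts a parameter count into the memory needed to hold the weights alone. It uses the parameter counts exactly as listed above; the helper name `weight_memory_gb` and the choice of 2 bytes per parameter (16-bit weights) are assumptions, and the estimate deliberately ignores activations, optimizer state, and framework overhead.

```python
# Parameter counts copied from the list above (approximate figures).
PARAM_COUNTS = {
    "base": 184_000_000,
    "large": 435_000_000,
    "xlarge": 750_000_000,
    "xxlarge": 1_500_000_000,
}


def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Return the size of the weights in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        print(f"{name}: {weight_memory_gb(count):.2f} GB in 16-bit, "
              f"{weight_memory_gb(count, 4):.2f} GB in 32-bit")
```

Training typically needs several times this figure, so the numbers are best read as a lower bound for inference.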

Pinecone/Milvus/pgvector (Vector)
- Zilliz Cloud (Milvus Serverless option - reduces maintenance)
SQLite/PostgreSQL (Relational)
LangChain/LlamaIndex (Orchestrator)

How the model will operate, align itself, and expand its components

Policy Extraction
- Crawler
- Document Parser
- Removal of irrelevant metadata
List Aggregation and Handling
- Lists are merged into paragraphs
- Long lists are split but include context statements and labels
Semantic Segmentation
- Algorithm that breaks the document into segments
- Context is preserved by keeping cliques of related sentences together
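
The list handling and segmentation steps above can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the project's algorithm: all function names, the `max_items` limit, and the `min_overlap` threshold are hypothetical, and the segmentation uses a greedy word-overlap heuristic between adjacent sentences as a stand-in for clique detection over related sentences.

```python
import re
from typing import List, Set


def merge_list(intro: str, items: List[str]) -> str:
    """Merge a short list into one paragraph that keeps its introductory statement."""
    cleaned = [item.strip().rstrip(".;,") for item in items if item.strip()]
    return f"{intro.strip().rstrip(':')}: " + "; ".join(cleaned) + "."


def split_long_list(intro: str, items: List[str], label: str,
                    max_items: int = 5) -> List[str]:
    """Split a long list into chunks that each repeat the label and intro as context."""
    total = (len(items) + max_items - 1) // max_items
    chunks = []
    for part, start in enumerate(range(0, len(items), max_items), start=1):
        body = merge_list(intro, items[start:start + max_items])
        chunks.append(f"[{label}, part {part} of {total}] {body}")
    return chunks


def _words(sentence: str) -> Set[str]:
    """Lower-cased alphanumeric tokens of a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def segment(sentences: List[str], min_overlap: float = 0.2) -> List[List[str]]:
    """Greedy segmentation: open a new segment when the Jaccard word overlap
    with the previous sentence falls below min_overlap."""
    segments: List[List[str]] = []
    for sentence in sentences:
        if segments:
            previous, current = _words(segments[-1][-1]), _words(sentence)
            union = previous | current
            overlap = len(previous & current) / len(union) if union else 0.0
            if overlap >= min_overlap:
                segments[-1].append(sentence)
                continue
        segments.append([sentence])
    return segments
```

Repeating the label and introductory statement in every chunk costs a few tokens but means each chunk can be classified without access to its siblings. A production version would likely replace the word-overlap measure with embedding similarity.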

Domain-Specific Embeddings
- Subword Embeddings
- Trained on a large corpus of domain-specific security policies
Hierarchical Multi-Label Classification
- Two-stage model: top level and bottom level
- Top stage predicts the high-level ISO domain
- Bottom stage predicts low-level classes for fine-grained attributes
Class Imbalance Handling
- Data Augmentation
- Class Weighting
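
A minimal sketch of how class weighting and the two-stage prediction above might fit together, written in plain Python so it runs without a deep learning framework. The function names, the 0.5 threshold, and the domain and attribute labels in the usage example are hypothetical. The weight formula (negatives divided by positives for each label) is one common convention, and it is the kind of value PyTorch's `BCEWithLogitsLoss` accepts through its `pos_weight` argument.

```python
from typing import Dict, List


def positive_class_weights(label_rows: List[List[int]]) -> List[float]:
    """Per-label weight = negatives / positives, so rare labels count more in the loss.

    label_rows is a multi-hot matrix: one row per document, one column per label.
    Labels with no positive example fall back to a neutral weight of 1.0.
    """
    total = len(label_rows)
    weights = []
    for column in range(len(label_rows[0])):
        positives = sum(row[column] for row in label_rows)
        weights.append((total - positives) / positives if positives else 1.0)
    return weights


def hierarchical_predict(domain_scores: Dict[str, float],
                         attribute_scores: Dict[str, float],
                         children: Dict[str, List[str]],
                         threshold: float = 0.5) -> Dict[str, List[str]]:
    """Stage 1 keeps domains scoring at or above the threshold; stage 2 keeps
    only attributes that belong to a kept domain and also pass the threshold."""
    result: Dict[str, List[str]] = {}
    for domain, score in domain_scores.items():
        if score < threshold:
            continue
        result[domain] = [
            attribute for attribute in children.get(domain, [])
            if attribute_scores.get(attribute, 0.0) >= threshold
        ]
    return result


if __name__ == "__main__":
    # Hypothetical labels and scores, for illustration only.
    hierarchy = {"Access control": ["MFA", "Password rotation"],
                 "Cryptography": ["Key management"]}
    print(hierarchical_predict(
        {"Access control": 0.9, "Cryptography": 0.2},
        {"MFA": 0.8, "Password rotation": 0.3, "Key management": 0.95},
        hierarchy,
    ))
```

Gating the bottom stage on the top stage means a confident attribute score cannot surface under a domain the model has rejected, which keeps the output consistent with the label hierarchy.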

NIST AI Risk Management Framework
- Governs the development and deployment life cycle
- Must continuously incorporate trustworthiness, transparency, and risk mitigation
Trusted Execution Environments
- Hardware-enforced isolation
Threat Modeling
- Threat model testing before deployment
  - OWASP
  - Microsoft Threat Modeling Tool