Live on This Site
Parameters: 2.92B
Tokens Trained: ~3B
Context Window: 256
Best PPL (Chat): 22.2
Architecture: Wave Field V4
Attention: O(n log n) FFT
Currently Training
Parameters: 2.92B
Target Tokens: 10B
Context Window: 256
Status: Training...
Once complete, this model will replace the current one, trained on roughly 3x more tokens.
Scaling Roadmap
256 Context · 10B tokens
Base pretraining + chat tuning
$280 — TRAINING NOW
Agentic API · Tool Use
Function calling, ReAct agent framework
NEXT UP
2K Context · +2B tokens
Multi-paragraph understanding
~$150
8K Context · +2B tokens
Full document comprehension
~$250
32K Context · +3B tokens
Textbooks & long papers
~$500
128K Context · +3B tokens
Entire books & codebases
~$700
2M Context · +5B tokens
Library-level understanding
~$1,000
Cost Comparison: Wave Field vs Frontier
Wave Field
$2.9K
3B · 2M context · 22.8B tokens
Standard Transformer
$500K+
Same params · max 128K context
Wave Field's O(n log n) FFT attention is ~95,000x cheaper than standard O(n²) attention at 2M context. Native long-context — no tricks, no approximations.
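The headline ratio falls straight out of the two complexity classes. A back-of-the-envelope check (constant factors ignored, so the real-world figure is implementation-dependent):

```python
import math

n = 2_000_000                  # 2M-token context window
quadratic_ops = n * n          # O(n^2): one score per token pair
fft_ops = n * math.log2(n)     # O(n log n): FFT-based mixing

# Ratio of the two, ignoring constants
print(f"~{quadratic_ops / fft_ops:,.0f}x")  # prints ~95,549x
```

So at n = 2M the asymptotic gap alone is about 95,000x, matching the figure above.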
Why It's So Cheap
Standard transformers need O(n²) compute and memory for attention: at 2M tokens, that's a 4-trillion-entry attention matrix per layer, far more than any current hardware can hold.
Wave Field replaces this with physics-based wave interference computed via FFT in O(n log n). Memory is O(n + G) instead of O(n²). This means:
• 128K context attention: 1 GB vs 128 GB
• 2M context attention: 16 GB vs 32 TB
• Total training to 2M: ~$2,900 vs impossible
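Wave Field's exact formulation isn't spelled out here, but the core idea, global token mixing as a convolution evaluated in frequency space, can be sketched in a few lines. The kernel shape and names below are illustrative assumptions, not the model's actual parameterization:

```python
import numpy as np

def fft_mix(x, kernel):
    """Global token mixing via circular convolution in O(n log n).

    x:      (n, d) token embeddings
    kernel: (n,)   mixing kernel over sequence positions
            (a stand-in for a learned wave/interference kernel)

    A standard attention layer would materialize an (n, n) score
    matrix; here the same all-to-all interaction is an elementwise
    product of spectra, by the convolution theorem.
    """
    n = x.shape[0]
    Xf = np.fft.rfft(x, axis=0)          # spectrum over the sequence axis
    Kf = np.fft.rfft(kernel)[:, None]    # kernel spectrum, broadcast over d
    return np.fft.irfft(Xf * Kf, n=n, axis=0)  # back to token space

# Usage: 256 tokens, 64-dim embeddings
x = np.random.randn(256, 64)
k = np.random.randn(256)
y = fft_mix(x, k)
print(y.shape)  # (256, 64)
```

The cost is two forward FFTs and one inverse FFT, i.e. O(n log n) time and O(n) memory in sequence length, which is where the GB-vs-TB gap in the bullets above comes from.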
Support the Project
Built from scratch by an independent researcher. Every dollar goes directly to GPU compute for scaling.
Follow & Support on X