ContextWormhole
Teleport transformer models beyond their context limits, to inputs of arbitrary length
Over the past few weeks, I’ve been deliberately experimenting with small language models and knowledge distillation. I kept running into the short context lengths of these small models, so I built ContextWormhole to sidestep the problem entirely. Both the library and the CLI can be installed via pip.
Transformer models still impose strict context limits; once you exceed the position-embedding table, generation breaks. ContextWormhole addresses that constraint without altering the underlying model. Install with pip install contextwormhole and wrap any Hugging Face causal model (GPT-2, Phi, etc.) to process long inputs by recycling position IDs.
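The position-ID recycling idea can be illustrated with a toy example: positions beyond the model’s trained range are wrapped back into it, so the position-embedding table is never indexed out of bounds. This is a conceptual sketch of the technique, not ContextWormhole’s internals:

```python
# Conceptual sketch: map token positions beyond the trained range
# back into [0, max_trained_len) so the position-embedding table
# is never indexed out of bounds. Not the library's actual code.

def recycle_position_ids(seq_len, max_trained_len=1024):
    """Return one position ID per token, wrapped into the trained range."""
    return [i % max_trained_len for i in range(seq_len)]

# A 2500-token input for a model trained on 1024 positions:
ids = recycle_position_ids(2500, max_trained_len=1024)
print(ids[0], ids[1023], ids[1024], ids[2499])  # 0 1023 0 451
```

Every position ID stays below the trained maximum, so the embedding lookup always succeeds regardless of input length.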
The library implements three options: Sliding Window for overlapping chunks, Hierarchical Context for chunk-then-summary recursion, and Attention Sink that retains the initial prompt plus the most recent window. Each appears as a decorator in Python or a flag in the CLI, so switching methods requires no architectural changes.
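To make the strategies concrete, here is a minimal, library-independent sketch of how sliding window and attention sink select which tokens a fixed-context model actually sees (the function names and plain-list “tokenization” are illustrative, not ContextWormhole’s API; hierarchical additionally summarizes each chunk, which requires the model itself):

```python
# Illustrative chunk-selection logic, operating on a plain list of
# tokens. These helpers are a sketch, not ContextWormhole's API.

def sliding_window_chunks(tokens, window=512, overlap=128):
    """Overlapping chunks: each window shares `overlap` tokens with the next."""
    step = window - overlap
    return [tokens[i:i + window]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

def attention_sink_view(tokens, sink_tokens=32, window=512):
    """Keep the first `sink_tokens` plus the most recent `window` tokens."""
    if len(tokens) <= sink_tokens + window:
        return tokens
    return tokens[:sink_tokens] + tokens[-window:]

tokens = list(range(2000))
chunks = sliding_window_chunks(tokens)   # 5 overlapping 512-token windows
view = attention_sink_view(tokens)       # 32 sink tokens + last 512 tokens
print(len(chunks), len(view))
```

The overlap keeps each chunk’s start anchored in text the previous chunk already saw, while the sink view preserves the original prompt so generation stays grounded.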
Benchmarks on DistilGPT-2 (CPU) show that all three strategies handle 10K-token inputs. Hierarchical is the fastest (~1.7 s, ~60 MB); Attention Sink yields the highest coherence score; Sliding Window offers strict continuity at a higher memory cost.
It’s easy to use, and it lets me apply a limited-context LLM to extended use cases. Now I can have as long a context window as I need in small models.
from contextwormhole import ContextWormholeModel, ExtendedContextConfig
# Create a configuration optimized for long contexts
config = ExtendedContextConfig(
    max_training_length=2048,
    window_size=512,
    overlap=128,
    chunk_size=512,
    summary_length=128,
    sink_tokens=32,
    temperature=0.8,
    verbose=True,
)
# Initialize the model with our configuration
model = ContextWormholeModel("gpt2", **config.__dict__)
# Generate text using different strategies
result1 = model.sliding_window_generate(extremely_long_document, max_new_tokens=50)
result2 = model.hierarchical_generate(extremely_long_document, max_new_tokens=50)
result3 = model.attention_sink_generate(extremely_long_document, max_new_tokens=50)

Download it via pip today, and see GitHub for the technical docs. You can also use it with a single CLI command, as follows. Please leave feedback as an issue.
contextwormhole --model gpt2 --input text.txt --strategy attention_sink --sink-tokens 2000 


https://github.com/mbhatt1/ContextWormhole