ContextWormhole
Teleport transformer models beyond their context limits, to inputs of arbitrary length
Over the past few weeks, I’ve been deliberately experimenting with small language models and knowledge distillation. I kept running into the short context lengths of these small models, so I built ContextWormhole to sidestep the problem entirely. Both the library and the CLI can be installed via pip.
Transformer models still impose strict context limits; once you exceed the position-embedding table, generation breaks. ContextWormhole addresses that constraint without altering the underlying model. Install with pip install contextwormhole and wrap any Hugging Face causal model (GPT-2, Phi, etc.) to process long inputs by recycling position IDs.
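The position-ID recycling idea can be illustrated with a toy example: positions beyond the model’s trained range are wrapped back into it, so the position-embedding table is never indexed out of bounds. This is a conceptual sketch of the technique, not ContextWormhole’s internals:

```python
# Conceptual sketch: map token positions beyond the trained range
# back into [0, max_trained_len) so the position-embedding table
# is never indexed out of bounds. Not the library's actual code.

def recycle_position_ids(seq_len, max_trained_len=1024):
    """Return one position ID per token, wrapped into the trained range."""
    return [i % max_trained_len for i in range(seq_len)]

# A 2500-token input for a model trained on 1024 positions:
ids = recycle_position_ids(2500, max_trained_len=1024)
print(ids[0], ids[1023], ids[1024], ids[2499])  # 0 1023 0 451
```

Every position ID stays below the trained maximum, so the embedding lookup always succeeds regardless of input length.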
The library implements three options: Sliding Window for overlapping chunks, Hierarchical Context for chunk-then-summary recursion, and Attention Sink that retains the initial prompt plus the most recent window. Each appears as a decorator in Python or a flag in the CLI, so switching methods requires no architectural changes.
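To make the strategies concrete, here is a minimal, library-independent sketch of how sliding window and attention sink select which tokens a fixed-context model actually sees (the function names and plain-list “tokenization” are illustrative, not ContextWormhole’s API; hierarchical additionally summarizes each chunk, which requires the model itself):

```python
# Illustrative chunk-selection logic, operating on a plain list of
# tokens. These helpers are a sketch, not ContextWormhole's API.

def sliding_window_chunks(tokens, window=512, overlap=128):
    """Overlapping chunks: each window shares `overlap` tokens with the next."""
    step = window - overlap
    return [tokens[i:i + window]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

def attention_sink_view(tokens, sink_tokens=32, window=512):
    """Keep the first `sink_tokens` plus the most recent `window` tokens."""
    if len(tokens) <= sink_tokens + window:
        return tokens
    return tokens[:sink_tokens] + tokens[-window:]

tokens = list(range(2000))
chunks = sliding_window_chunks(tokens)   # 5 overlapping 512-token windows
view = attention_sink_view(tokens)       # 32 sink tokens + last 512 tokens
print(len(chunks), len(view))
```

The overlap keeps each chunk’s start anchored in text the previous chunk already saw, while the sink view preserves the original prompt so generation stays grounded.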
Benchmarks on DistilGPT-2 (CPU) show that all three strategies handle 10K-token inputs. Hierarchical is the fastest (~1.7 s, ~60 MB); Attention Sink yields the highest coherence score; Sliding Window offers strict continuity at a higher memory cost.
It’s easy to use, and it lets me apply a limited-context LLM to extended use cases. Now I can have as long a context window as I need in small models.
from contextwormhole import ContextWormholeModel, ExtendedContextConfig
# Create a configuration optimized for long contexts
config = ExtendedContextConfig(
    max_training_length=2048,
    window_size=512,
    overlap=128,
    chunk_size=512,
    summary_length=128,
    sink_tokens=32,
    temperature=0.8,
    verbose=True,
)
# Initialize the model with our configuration
model = ContextWormholeModel("gpt2", **config.__dict__)
# Generate text using different strategies
result1 = model.sliding_window_generate(extremely_long_document, max_new_tokens=50)
result2 = model.hierarchical_generate(extremely_long_document, max_new_tokens=50)
result3 = model.attention_sink_generate(extremely_long_document, max_new_tokens=50)

Download it via pip today, and see GitHub for the technical docs. You can also use it with a single CLI command, as follows. Please leave feedback as an issue.
contextwormhole --model gpt2 --input text.txt --strategy attention_sink --sink-tokens 2000 


https://github.com/mbhatt1/ContextWormhole