Asynchronous Intelligence

The following are some thoughts on asynchronous compute, AI inference, how it all connects, and why better tooling matters.

The year is 2026, and we can convert computational power on a server to useful, general-purpose intelligence. This motivates an important question: how much intelligence can we get per unit time?

Let's define a fake unit of intelligence called the tel, and let's assume a model has intelligence M measured in tel/tok (the intelligence per token). The intelligence per second P (tel/s) of a synchronous agent harness serving this model is the server inference speed I (tok/s) multiplied by the model intelligence M (tel/tok). In a combined statement, P = IM, with P in tel/s.

We observe that in a synchronous system, we're twice bottlenecked by factors largely outside our control: base model capability and inference speed. Frustrating, but we're not ready to give up quite yet. While we're twiddling our thumbs waiting for our server to stream more tokens, we think to ourselves that surely we can do some more computation, perhaps in parallel!

Generally, we have a single hot path, such as a user-agent interaction. While this is happening, we want several parallel agents all asynchronous to each other that can independently perform computation and continually integrate their computation (converted to intelligence) into the hot path.
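To make the shape of this concrete, here is a minimal sketch using Python's asyncio. It is an illustration under stated assumptions, not a description of any particular product: the names background_agent, hot_path, and main are hypothetical, and the sleep calls stand in for real inference. Background agents push their findings onto a queue, and the hot path drains that queue between its own steps without ever blocking on it.

```python
import asyncio


async def background_agent(agent_id: int, results: asyncio.Queue) -> None:
    # Hypothetical stand-in for an independent agent doing its own inference.
    await asyncio.sleep(0.01 * (agent_id + 1))
    await results.put(f"finding from agent {agent_id}")


async def hot_path(steps: int, results: asyncio.Queue) -> list[str]:
    context: list[str] = []
    for step in range(steps):
        # Hypothetical stand-in for streaming tokens on the hot path.
        await asyncio.sleep(0.02)
        # Integrate whatever background work has finished, without blocking.
        while not results.empty():
            context.append(results.get_nowait())
        context.append(f"hot-path step {step}")
    return context


async def main(n_agents: int = 3, steps: int = 5) -> list[str]:
    results: asyncio.Queue = asyncio.Queue()
    # Start the parallel agents; they run concurrently with the hot path.
    agents = [
        asyncio.create_task(background_agent(i, results))
        for i in range(n_agents)
    ]
    context = await hot_path(steps, results)
    await asyncio.gather(*agents)
    return context


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

The design choice worth noting is that the hot path never awaits the queue: it only picks up results that are already there, so slow background work cannot stall the user-facing interaction.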

In this asynchronous system, we can define two more parameters: n, the number of parallel agents, and α, the fraction of asynchronous computation that can be usefully integrated into the hot path.

Now, our new statement is P = (αn)(IM). This is great: whenever αn exceeds 1, the asynchronous system beats the synchronous baseline P = IM. We can scale n to very high numbers, and we can keep developing better asynchronous infrastructure that pushes α closer to 1. By adding better tooling, we're able to overcome inference bottlenecks and access dramatically more intelligence.
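As a quick worked example, the two statements can be written as a pair of small Python functions. The function names and every number below are assumptions chosen for illustration; the tel is a fake unit, so only the ratios mean anything.

```python
def sync_throughput(inference_speed: float, model_intelligence: float) -> float:
    """P = I * M, in tel/s, for a synchronous harness."""
    return inference_speed * model_intelligence


def async_throughput(
    n_agents: int,
    alpha: float,
    inference_speed: float,
    model_intelligence: float,
) -> float:
    """P = (alpha * n) * (I * M), in tel/s, for n parallel agents."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a fraction and must lie in [0, 1]")
    if n_agents < 0:
        raise ValueError("n_agents must be non-negative")
    return alpha * n_agents * sync_throughput(inference_speed, model_intelligence)


if __name__ == "__main__":
    # Illustrative values only: I = 100 tok/s, M = 0.5 tel/tok.
    baseline = sync_throughput(100.0, 0.5)
    # 20 parallel agents, a quarter of whose work is usefully integrated.
    parallel = async_throughput(20, 0.25, 100.0, 0.5)
    print(f"synchronous:  {baseline} tel/s")
    print(f"asynchronous: {parallel} tel/s")
    print(f"speedup:      {parallel / baseline}x")
```

With these made-up inputs the speedup is simply αn, which is why both levers matter: adding agents raises n, and better integration tooling raises α.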

We're early in the asynchronous intelligence era. In the not-too-distant future, it's reasonable to envision >99% of intelligence resulting from asynchronous computation integrated together.

This has been the vision of Asymmetric since Day 1, and we're building research-driven products at the bleeding edge of this future. Our journey started with Policies, asynchronous guardrails defined by natural language instructions. We then developed Memories, an adaptive memory system that uses asynchronous computation to review past agent interactions, extract useful learnings and user-specific knowledge, and integrate them into future inference. Our latest feature, Adaption, continually and asynchronously fine-tunes the most recalled memories into low-rank adapters (LoRAs) that are used at inference time, resulting in models that keep getting better as they're used more.

If you're excited by asynchronous intelligence, send us an email or book an onboarding call.
