Highlights from the InfiniEdge AI Community Call and Release 2.1
- Tina Tsou
- Sep 29, 2025
- 5 min read
In late September the InfiniEdge AI community gathered online and in person for a wide‑ranging update on Release 2.1 and a series of technical presentations. Below is a concise recap of the announcements and discussions.


Release 2.1: strengthening the platform
Edge Data Agent (EDA) and evaluation suites. Tina Tsou summarized progress across eight workstreams. A notable highlight was the Edge Data Agent, which converts a customer’s data into an on‑demand agent service that runs locally. Data remains on‑premise and code is generated just in time to process it. Release 2.1 introduced a comprehensive evaluation suite and expanded benchmark datasets; the team found that local agents now outperform general‑purpose LLMs on tasks like invoice matching and SQL reconstruction. Future releases will tackle larger real‑time data sources and more complex retrieval questions.
SPEAR and runtime enhancements. SPEAR is the unified runtime for deploying agents in the cloud or at the edge. The new SPEAR metadata server acts as a scheduling and coordination hub, managing task registration, worker lifecycle and resource allocation. The platform supports multiple runtimes (process‑based and WebAssembly), and operators can choose to run SPEAR in local or distributed mode.
Shifu and device integration. The Shifu gateway brings IoT devices into the InfiniEdge AI platform. Release 2.1 focused on bug fixes and dependency updates, preparing for a future Shifu SDK. Shifu remains lightweight and self‑healing, allowing new devices to integrate without changing core code.
Security & attestation. A major emphasis of Release 2.1 is hardware‑rooted trust. The team implemented a software Trusted Platform Module (TPM) on macOS to provide hardware‑like trust anchors even on devices without a built‑in TPM. A new proof‑of‑residency mechanism cryptographically binds the metrics agent running at the edge to the device’s TPM; all metrics are signed by a TPM‑resident key whose certificate chain flows from an attestation key (AK) to a permanent endorsement key (EK). They also added proof‑of‑geofencing, signing the device’s geographic region so the collector can verify location before accepting data.
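The signing flow described above can be sketched in a few lines. This is a minimal illustration, not the team's implementation: it uses an HMAC with an in-memory key as a stand-in for the TPM-resident key, and the function and key names are hypothetical. In a real deployment the signature would come from a key that never leaves the (software or hardware) TPM, with a certificate chain running from the attestation key (AK) to the endorsement key (EK).

```python
import hashlib
import hmac
import json
import time

# Stand-in for a key that never leaves the TPM; purely illustrative.
TPM_RESIDENT_KEY = b"simulated-tpm-key"

def sign_metrics(metrics: dict, region: str) -> dict:
    """Sign a metrics payload together with the device's region claim."""
    body = {
        "metrics": metrics,
        "region": region,        # signed geographic region (proof of geofencing)
        "ts": int(time.time()),  # timestamp, checked for freshness downstream
    }
    canonical = json.dumps(body, sort_keys=True).encode()
    sig = hmac.new(TPM_RESIDENT_KEY, canonical, hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify_metrics(envelope: dict) -> bool:
    """Collector-side check that the payload was signed by the device key."""
    canonical = json.dumps(envelope["body"], sort_keys=True).encode()
    expected = hmac.new(TPM_RESIDENT_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["sig"])
```

Because the region travels inside the signed body, any attempt to relabel a device's location invalidates the signature, which is the core of the proof‑of‑geofencing idea.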
From bearer tokens to hardware‑rooted trust
Ramki Krishnan examined the identity problem in distributed workloads. He outlined three phases: a Phase 0 world still dominated by IP‑based identity; a Phase 1 characterised by workload identities through SPIFFE/SPIRE; and a Phase 2 vision where workload, host and user identities are unified by hardware attestation and geofencing. Traditional IP‑address‑based controls are expensive and easily spoofed via VPNs, while long‑lived API keys and bearer tokens lack cryptographic binding.
The bearer‑token problem
Bearer tokens are used to secure inference applications, secret stores and even bootstrap systems. Krishnan noted that breaches such as the Okta incident have shown how HTTP HAR files can expose tokens, and Kubernetes bootstrap tokens can be stolen via server‑side request forgery. IP‑based geofencing can also be bypassed using VPNs.
Proof‑of‑residency and proof‑of‑geofencing
The proposed solution replaces bearer tokens with two cryptographically bound proofs: a proof of residency, which binds the workload to the device's TPM, and a proof of geofencing, which attests to the device's geographic region.
Implementation details
Krishnan’s team built an end‑to‑end flow using a metrics agent, gateway and collector. The metrics agent (e.g., an OpenTelemetry agent) runs on a sovereign or edge cloud and signs metrics with a TPM‑derived application key; it also includes a workload‑geo‑ID HTTP header signed by the TPM. The gateway terminates TLS connections and enforces policy by verifying that the signature originates from an allowed host and that the timestamp is within an acceptable window. The collector performs similar checks at the payload level, including replay protection via nonces. The team was able to implement the entire workflow using a software TPM on macOS so developers can run and test the system on their laptops.
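The gateway and collector policies described above reduce to a few checks. The sketch below is illustrative only: the host allowlist, window size and function names are assumptions, and a production system would verify the TPM signature chain rather than a host string.

```python
import time

ALLOWED_HOSTS = {"edge-node-01"}  # hypothetical gateway allowlist
MAX_SKEW_SECONDS = 300            # accept timestamps within a five-minute window
_seen_nonces = set()              # collector-side replay protection

def gateway_accepts(signer_host: str, ts: int, now=None) -> bool:
    """Gateway policy: signer must be an allowed host and the timestamp fresh."""
    now = int(time.time()) if now is None else now
    return signer_host in ALLOWED_HOSTS and abs(now - ts) <= MAX_SKEW_SECONDS

def collector_accepts(nonce: str) -> bool:
    """Collector policy: reject any nonce that has already been seen."""
    if nonce in _seen_nonces:
        return False
    _seen_nonces.add(nonce)
    return True
```

The two layers are deliberately redundant: the gateway filters at the connection level while the collector repeats freshness and replay checks on the payload itself, so a compromised gateway cannot silently admit stale or replayed data.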
MobileLLM-R1: efficient large‑language‑model training for the edge
Zechun Liu from Meta presented MobileLLM-R1, a family of small language models designed for resource‑limited devices. He emphasised token efficiency: where popular models such as Qwen3 train on ~36 trillion tokens, MobileLLM-R1 uses roughly 2 trillion tokens and a carefully balanced data mix, yet achieves strong reasoning performance. The model has already been downloaded thousands of times and received media coverage.
Training strategy
Pre‑training: The team selected informative datasets and balanced coding, math and general‑knowledge sources through a leave‑one‑out analysis. Removing certain datasets, such as the web‑education corpus FineWeb‑Edu, significantly degrades performance, illustrating the importance of cross‑domain data. Domain‑specific data improves its own capability (e.g., math data boosts math benchmarks), but cross‑domain transfer also occurs; for instance, the StarCoder coding dataset benefits math tasks.
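The leave‑one‑out analysis can be summarised as: retrain (or re‑evaluate) with each dataset removed and attribute the score drop to that dataset. The sketch below is a schematic under assumed names; `evaluate` is a hypothetical callable standing in for a full train‑and‑benchmark run.

```python
def leave_one_out_impact(datasets, evaluate):
    """Estimate each dataset's contribution as the benchmark-score drop
    observed when that dataset alone is removed from the training mix.

    `evaluate` is a hypothetical stand-in for training on a mix of
    datasets and returning a benchmark score; in practice each call
    is an expensive training run.
    """
    baseline = evaluate(frozenset(datasets))
    return {d: baseline - evaluate(frozenset(datasets) - {d}) for d in datasets}
```

A large positive impact for a dataset (as reported for the web‑education corpus) signals that it cannot be dropped without hurting performance, even on benchmarks outside its own domain.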
Data‑mix ratios: Instead of uniform sampling, the researchers used an influence score to measure each sample’s impact on validation loss. They aggregated sample‑level scores to dataset‑level weights and used those weights to compute dynamic sampling ratios. Models trained with these ratios achieved lower perplexity than those trained with uniform sampling.
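The aggregation step can be sketched as follows. This is a minimal illustration, not the exact scheme from the talk: the choice of "mean of positive per-sample scores, normalised across datasets" is an assumption made for clarity.

```python
def sampling_ratios(sample_scores: dict) -> dict:
    """Turn per-sample influence scores into dataset-level sampling ratios.

    sample_scores maps dataset name -> list of per-sample influence scores
    (a sample's estimated impact on validation loss; higher = more helpful).
    The aggregate used here (mean of positive scores, then normalised) is
    an illustrative choice.
    """
    weights = {}
    for dataset, scores in sample_scores.items():
        weights[dataset] = sum(max(s, 0.0) for s in scores) / len(scores)
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}
```

Datasets whose samples consistently reduce validation loss end up oversampled relative to uniform mixing, which is the mechanism behind the reported perplexity gains.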
Mid‑training (knowledge compression): A short mid‑training phase (around 5 % of total tokens) compresses the model’s knowledge by iteratively selecting high‑value samples and discarding those with negative or zero influence scores. The process converges when most samples offer no further improvement, narrowing the distribution of influence scores.
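The iterative selection loop reduces to repeatedly discarding non‑positive‑influence samples until a round removes nothing. The sketch below uses a hypothetical `score(sample, kept)` callable in place of the real influence estimator.

```python
def compress_corpus(samples, score, max_rounds=10):
    """Iteratively discard samples with non-positive influence scores.

    `score(sample, kept)` re-estimates a sample's influence given the
    currently kept set (a hypothetical stand-in for the real estimator).
    The loop converges when a round discards nothing further, mirroring
    the narrowing distribution of influence scores described in the talk.
    """
    kept = list(samples)
    for _ in range(max_rounds):
        filtered = [s for s in kept if score(s, kept) > 0]
        if len(filtered) == len(kept):  # converged: nothing left to discard
            break
        kept = filtered
    return kept
```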
Post‑training: Instruction tuning occurs in two stages. The first stage uses general instruction datasets, while the second focuses on math, coding and science reasoning. Mixing math and science data boosts both math and coding benchmarks, whereas mixing math and coding data can hurt general‑knowledge performance. Two‑stage training produces slightly better results than a single stage.
Results and lessons on RL vs SFT
The MobileLLM-R1 models match or outperform larger open‑source models despite using far fewer training tokens. For instance, the 950M‑parameter MobileLLM-R1 nearly matches the performance of the 0.6B Qwen3 model while training on only 4.2 trillion tokens. In the small‑model regime, Liu found that reinforcement learning (RL) on reasoning traces provides some improvement but remains unstable; training the small model directly on supervised fine‑tuning (SFT) data distilled from a large teacher yields much higher accuracy. Extending RL for more steps did not produce further gains.
Edge Data Agent demo and future directions
Qi Wang returned to discuss the Edge Data Agent in more depth. He described data processing as a three‑layer hierarchy: raw data, combined with a user's query, becomes information; with sufficient context, information becomes knowledge. EDA's role is to handle the information layer, converting on‑prem data and user queries into relevant information and, eventually, knowledge. Qi noted that the project aims to address cases where local context and privacy demands make cloud‑only solutions inadequate. The team plans to support human‑in‑the‑loop data collection and labeling and to build agents that understand the device deployment environment.
Coding‑agent demo
As a proof of concept, Qi demonstrated a coding agent implemented in a single HTML file. Users can download the file, paste in their API key and run a complete coding assistant in the browser, with no installation required. The agent uses context (e.g., a URL for the MobileLLM-R1 paper) and a user query to generate inference code. Qi showed how the same approach could be used to build and deploy a FastMCP server by asking a large language model to create the necessary Python code and wrapping it into a batch file. He envisions pairing the coding agent with a retrieval agent, one that fetches and structures local data, so developers can build powerful on‑edge applications.
Looking ahead
The InfiniEdge AI community call highlighted how diverse research areas (secure identity, token‑efficient language models and on‑prem data agents) are converging to enable privacy‑preserving AI at the edge. Release 2.1 lays the groundwork with improved runtimes and hardware‑rooted trust. Krishnan's work shows a path toward eliminating bearer tokens using proofs of residency and geofencing. Liu's MobileLLM-R1 demonstrates that small models can achieve competitive reasoning performance with careful data selection and training strategies. Qi's EDA demo suggests a future where agents assemble code and knowledge from local data, without requiring heavy infrastructure.
The community plans further integration among workstreams, combining secure identity with efficient models and edge data agents, and encourages collaboration to accelerate progress.


