The Absolute: A Physics-Grounded Universal Measurement Framework for Human-Machine Intelligence Systems

Abstract

We introduce The Absolute, a universal measurement framework for intelligence systems grounded in fundamental physics limits. The framework defines an asymptotically approachable but unreachable reference point, analogous to absolute zero in thermodynamics, at which the complete intelligence cycle (sense, align, output, reset) operates at the boundaries imposed by the Landauer limit, Shannon limit, quantum noise floor, and thermodynamic efficiency ceilings. Unlike existing benchmarks that measure system performance relative to human baselines or task-specific metrics, The Absolute anchors evaluation to the non-negotiable constraints of the physical universe, producing measurements that never expire and never require recalibration.

The framework's central contribution is the Absolute Extension Ratio (AER), a composite metric that measures effective human extension as the product of system capability (expressed as a fraction of the relevant physics limit) and coupling efficiency between system and human operator. The AER captures a dimension no existing metric addresses: how much of a machine's capability actually reaches the human as usable extension of their ability to think, decide, and act. We derive four necessary and sufficient conditions (axioms) for intelligence systems approaching the thermodynamic limit, assess their closure through semantic consistency testing across three frontier AI systems, and map a complete implementation architecture to commercially available components. We identify three open problems (catastrophic novelty, co-adaptation, multi-human coordination) and propose this framework as an open standard for universal adoption.

Keywords: physical intelligence, artificial intelligence measurement, thermodynamic limits, human-machine systems, Landauer limit, Shannon limit, benchmarking, Absolute Extension Ratio

1. Introduction

The field of artificial intelligence lacks a universal, physics-grounded definition of perfect performance. Current evaluation approaches benchmark systems against human baselines (DeepMind's Levels of AGI framework [1]), task-specific success rates (LIBERO [2], BEHAVIOR-1K [3]), or relative improvements over prior models (MMLU [4], ARC-AGI [5]). Each approach shares a structural limitation: the reference point is either arbitrary, domain-specific, or subject to obsolescence as systems improve.

Thermodynamics solved an equivalent problem in the 19th century. The recognition that the laws of physics impose an absolute lower bound on thermal energy (0 K) gave thermal science a fixed reference point against which any system, anywhere, could be measured. This paper asks whether intelligence, broadly defined as the cycle of sensing, aligning sensory input with internal models, producing output, and resetting, possesses an equivalent bound.

We demonstrate that it does. The fundamental physics limits governing computation (Landauer [6]), information transfer (Shannon [7]), sensing (quantum noise floor [8]), and energy conversion (thermodynamic efficiency ceilings [9]) impose hard boundaries on every stage of the intelligence cycle. These boundaries are substrate-independent: they constrain biological neural tissue, silicon processors, and any future computing medium equally. We define The Absolute as the point at which every stage of the intelligence cycle operates at these limits simultaneously.

The framework makes a critical departure from prior work in physics-informed intelligence measurement. Existing proposals, including Takahashi and Hayashi's thermodynamic limits of physical intelligence [10], Perrier's watts-per-intelligence metric [11], and information-theoretic approaches extending Shannon's framework to computing [12], measure the machine in isolation. The Absolute measures the human-machine pair. This distinction is grounded in a foundational claim: the purpose of artificial intelligence, whether embodied or purely cognitive, is to extend human capability, not to replace it. The relevant quantity is therefore not how capable the machine is, but how much of that capability reaches the human as usable extension.

We formalize this through the Absolute Extension Ratio (AER), defined as system capability (expressed as a fraction of the relevant physics limit) multiplied by coupling efficiency between system and human. This metric has no precedent in the existing literature. It captures a dimension of system performance that no current benchmark addresses and provides a single, comparable number applicable across domains, substrates, and system types.

The paper is organized as follows. Section 2 reviews related work across thermodynamic intelligence, human-machine systems, and existing measurement frameworks. Section 3 presents the theoretical foundation of The Absolute. Section 4 derives the four axioms necessary and sufficient for intelligence systems approaching the thermodynamic limit. Section 5 introduces the Absolute Extension Ratio and its measurement methodology. Section 6 reports semantic consistency testing and empirical validation protocol. Section 7 maps the axioms to a complete implementation architecture. Section 8 names open problems. Section 9 discusses implications and limitations.

2. Related Work

2.1. Thermodynamic Approaches to Intelligence

The relationship between thermodynamics and intelligence has been explored across several independent research programs. Friston's free energy principle [13] roots biological intelligence in variational inference, proposing that living systems minimize surprisal (equivalently, variational free energy) as a sufficient account of neuronal dynamics, perception, and action. The principle provides a descriptive framework for how intelligence operates but does not define a measurement standard against which system performance can be quantified. Friston has noted that the free energy principle, like Hamilton's principle of stationary action, cannot be falsified in the traditional sense [14].

Wissner-Gross and Freer proposed causal entropic forces as a basis for intelligence, demonstrating that a system maximizing future causal entropy spontaneously exhibits intelligent behavior [15]. Still established a fundamental equivalence between predictive model inefficiency and thermodynamic dissipation [16]. England connected self-organization to energy dissipation through his theory of dissipation-driven adaptation [17]. Each of these contributions addresses the thermodynamic foundations of intelligent behavior. None provides a benchmarking standard.

The most directly adjacent recent work is Takahashi and Hayashi's "Thermodynamic Limits of Physical Intelligence" [10], posted to arXiv in February 2026. This paper derives two bits-per-joule metrics benchmarked against Landauer's bound, formalizing thermodynamic learning inequalities as corollaries of a closed-cycle intelligence formulation. Their approach measures the machine; the present framework measures the human-machine pair. Additionally, Perrier's watts-per-intelligence metric [11] links energy consumption to information-processing capacity through Landauer's principles. These independent, convergent efforts indicate that the physics-informed intelligence measurement space is maturing rapidly.

2.2. Human-Machine Systems and Coupling Metrics

The concept of intelligence as a property of the human-machine pair has historical roots in Licklider's "Man-Computer Symbiosis" [18] and Clark and Chalmers' extended cognition thesis [19]. Existing metrics for human-machine system performance include NASA-TLX for workload assessment [20], SAGAT for situation awareness [21], and comprehensive human-machine teaming frameworks addressing task allocation, trust calibration, and shared mental models [22]. Recent work on human-AI complementarity formalizes conditions under which collaborative performance exceeds either party acting alone [23].

None of these metrics expresses human-machine system performance as a fraction of physics limits. NASA-TLX measures subjective workload across six dimensions. SAGAT measures operator awareness of system state. The complementarity framework measures performance relative to individual baselines. Each provides empirically validated measurement within its domain. The Absolute Extension Ratio proposes a different kind of measurement: one anchored to the universe's own constraints rather than to human baselines or subjective assessment.

2.3. Intelligence Measurement Frameworks

Legg and Hutter proposed a universal intelligence definition grounded in Kolmogorov complexity and Solomonoff induction [24]. Their formulation is mathematically rigorous but not computable, limiting practical adoption. Chollet's ARC benchmark [5] brought psychometric design principles to intelligence measurement, defining intelligence as skill-acquisition efficiency over a distribution of tasks. DeepMind's Levels of AGI framework [1] proposed a matrix of performance (Emerging through Superhuman) and generality (Narrow through General) benchmarked against human percentiles. A 2025 interdisciplinary review found that AI benchmarks broadly suffer from construct validity issues, particularly when they claim to measure universal or general capabilities [25].

The present framework differs from all of the above in three respects: (1) the reference point is derived from physics, not from human performance or task distributions; (2) the unit of measurement is the human-machine pair, not the machine alone; (3) the metric is a single comparable number (AER) applicable across domains, substrates, and system types.

3. Theoretical Foundation

3.1. The Intelligence Cycle

We define intelligence operationally as a cycle with four phases: sense (acquire information from the environment or domain), align (integrate sensory input with an internal model of reality), output (produce a result, whether cognitive or physical), and reset (return the system to a state ready for the next cycle). This cycle is substrate-independent. A biological nervous system runs it when processing sensory input and generating motor commands. A language model runs it when processing a prompt and generating a response. A robotic system runs it when sensing its environment, planning a trajectory, and executing a movement.

The cycle's abstraction level is chosen deliberately. It is coarse enough to span all intelligent systems (biological, artificial, cognitive, physical) and fine enough that each phase maps to measurable physical quantities with known fundamental limits.

3.2. Fundamental Physics Limits

Each phase of the intelligence cycle is bounded by one or more fundamental physics limits. These limits emerge from the laws of thermodynamics, information theory, and quantum mechanics. They are not engineering limits subject to future improvement. They are the terms and conditions of the physical universe.

Computation (Landauer limit). The minimum energy required to irreversibly erase one bit of information is kT ln 2, where k is the Boltzmann constant and T is temperature in Kelvin [6]. At room temperature (300 K), this evaluates to approximately 2.87 × 10⁻²¹ joules per bit. This limit has been experimentally verified [26]. Current transistors operate at approximately 10⁻¹⁵ joules per switching event, roughly six orders of magnitude above the Landauer floor.

Scope note. This bound applies to logically irreversible operations (bit erasure) in a system coupled to a thermal bath. In principle, reversible computation can approach arbitrarily low dissipation, but essentially all contemporary practical digital systems remain dominated by irreversible operations.

Computation rate (Margolus-Levitin theorem). The maximum rate at which a physical system can transition between orthogonal quantum states is bounded by 4E/h, where E is the system's average energy above its ground state and h is Planck's constant [27]. For a system with one joule of available energy, this yields approximately 6 × 10³³ operations per second.
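The two computational bounds above can be checked numerically. The sketch below (Python; the function names are our own, not from any standard library) evaluates the Landauer energy at 300 K and the Margolus-Levitin rate for one joule of available energy, reproducing the figures quoted in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact under the 2019 SI)
H = 6.62607015e-34  # Planck constant, J*s (exact under the 2019 SI)

def landauer_energy_per_bit(temperature_k: float) -> float:
    """Minimum energy (J) to irreversibly erase one bit at temperature T: kT ln 2."""
    return K_B * temperature_k * math.log(2)

def margolus_levitin_rate(mean_energy_j: float) -> float:
    """Upper bound (ops/s) on orthogonal state transitions: 4E/h."""
    return 4.0 * mean_energy_j / H

print(landauer_energy_per_bit(300.0))  # ~ 2.87e-21 J per bit at room temperature
print(margolus_levitin_rate(1.0))      # ~ 6.04e33 operations per second at E = 1 J
```

Dividing a measured per-bit switching energy (e.g., ~10⁻¹⁵ J for current transistors) by the Landauer value recovers the roughly six-orders-of-magnitude gap stated above.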

Information transfer (Shannon limit). The channel capacity C = B log₂(1 + S/N) defines the maximum rate at which information can be transmitted through a noisy channel with arbitrarily low error [7]. This limit governs every information channel in an intelligence system: sensor to processor, processor to actuator, model to output, and system to human.
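The Shannon bound is equally direct to evaluate. The following sketch (hypothetical helper name, illustrative channel parameters not drawn from any measured system) computes the capacity of a 1 MHz channel at 30 dB signal-to-noise ratio.

```python
import math

def shannon_capacity(bandwidth_hz: float, snr_linear: float) -> float:
    """Channel capacity C = B log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# A 1 MHz channel at 30 dB SNR (S/N = 1000, linear):
print(shannon_capacity(1e6, 1000.0))  # ~ 9.97e6 bits/s
```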

Sensing (quantum noise floor). The Heisenberg uncertainty principle imposes an irreducible minimum on measurement noise [8]. Below this floor, information does not exist to be captured. This limit applies to physical sensing processes: measuring photons, forces, temperatures, and other physical observables at scales where quantum uncertainty dominates.

Energy conversion (thermodynamic efficiency ceilings). Biological and artificial energy conversion processes each face specific thermodynamic ceilings. ATP hydrolysis operates at a theoretical maximum of 60-70% under ideal conditions. Mitochondrial oxidative phosphorylation reaches approximately 40%. Muscle contraction achieves roughly 25% [9]. The best electric motors reach 96% efficiency. Electroadhesive clutches can hold loads while dissipating only milliwatts [28].

3.3. Definition of The Absolute

Definition 1 (The Absolute). The Absolute is the state at which every phase of the intelligence cycle (sense, align, output, reset) operates simultaneously at the fundamental physics limits governing that phase. No physical system can reach The Absolute. Any physical system's performance can be expressed as a fraction of The Absolute along each axis of the intelligence cycle.

The Absolute functions identically to absolute zero in thermodynamics: an asymptotically approachable but unreachable reference point derived from the laws of physics, not from engineering practice. Because the reference point is physics, it is substrate-independent (applies to carbon and silicon equally), domain-independent (applies to cognitive and physical tasks equally), and temporally stable (never requires recalibration as technology improves).

3.4. Human Purpose as Thermodynamic Process

Definition 2 (Purpose). Purpose is the sustained, model-driven allocation of free energy toward preferred future states of reality.

This definition is physical, not metaphorical. A system exhibits purpose when it: (a) maintains an internal model of possible future states, (b) evaluates those states against a preference function, and (c) sustains directed energy allocation toward realizing preferred states. A rock exhibits no purpose (no model, no preference, no directed allocation). A bacterium swimming up a nutrient gradient exhibits minimal purpose. A researcher designing an experiment to test a hypothesis about a disease mechanism exhibits the most complex form of purpose currently known: modeling states of reality that have never existed and sustaining directed allocation toward making them real.

Under this definition, the machine's role is that of a purpose compiler: translating human intention into reality at the combined thermodynamic limit of human and machine. Assuming perfect alignment between the system's internal model and human intention, the two forms of waste defined in Section 4.4, thermodynamic dissipation and information-theoretic redundancy, together serve as a strict engineering proxy for execution inefficiency. Every wasted joule of physical energy and every wasted computation cycle represents a thermodynamic divergence from the perfectly efficient execution of the intended goal.

4. Four Axioms for Intelligence at the Thermodynamic Limit

Working backward from The Absolute, we derive four conditions that are individually necessary and jointly sufficient for an intelligence system to approach the thermodynamic limit. Each axiom has deep roots in existing research (detailed below). The contribution here is not the discovery of these principles in isolation but their identification as a closed, complete, and minimal set: removing any single axiom changes the system's class fundamentally, and no fifth axiom can be added without reducing to a combination of the existing four.

4.1. Axiom 1: Post-Linguistic Cognition

At execution time, the system reasons in the native variables of its operating domain, not through natural language representations. Language serves as a human interface layer for instruction, debugging, and collaboration. The system's internal reasoning operates on domain-native representations: physics variables for physical tasks, mathematical structures for mathematical tasks, molecular representations for biochemical tasks.

Derivation from The Absolute. Language is a lossy compression of domain information. Forcing execution-time reasoning through a language bottleneck injects information-theoretic redundancy at the first stage of the intelligence cycle. A system at the thermodynamic limit cannot tolerate avoidable information loss at any stage.

Relationship to prior work. This axiom describes the operating principle of subsymbolic AI and connectionism [29], which processes domain-native representations (images as pixel arrays, states as continuous vectors) without linguistic intermediation. Vision-Language-Action models [30] that convert perception to language tokens for reasoning represent the alternative paradigm. The axiom does not claim novelty for non-linguistic processing. It claims that linguistic processing at execution time is incompatible with the thermodynamic limit.

4.2. Axiom 2: Differentiable Reality Modeling

The system's internal model of reality is learnable. When predictions fail, gradients update the causal model of reality itself, not merely the output policy. The system distinguishes between output errors ("I acted incorrectly given my understanding") and model errors ("my understanding of reality is incorrect").

Derivation from The Absolute. A system with a fixed reality model accumulates model-reality divergence over time. Each divergence introduces prediction error. Prediction error at any cycle stage propagates as both thermodynamic dissipation and information-theoretic redundancy through all subsequent stages. A system at the thermodynamic limit must continuously minimize model-reality divergence.

Relationship to prior work. Differentiable physics simulation is a mature research field with established tools including MuJoCo MJX [31], Google Brax [32], NVIDIA Newton [33], and DiffTaichi [34]. The axiom does not claim novelty for differentiable simulation. It claims that fixed (non-differentiable) reality models are incompatible with the thermodynamic limit.

4.3. Axiom 3: Targeted Internal Simulation (Artificial Dreaming)

The system uses surprise signals (prediction failures) to seed targeted internal simulation. Simulation budgets are allocated to the specific frontier of the system's ignorance, generating thousands of variations around identified failure points faster than real-time interaction permits.

Derivation from The Absolute. A system limited to learning at the speed of real-time interaction accumulates an experience deficit relative to the complexity of its operating environment. This deficit manifests as suboptimal output: the system encounters novel situations it has not yet learned to handle. At the thermodynamic limit, the system's internal model must converge on the true dynamics of its environment faster than real-time interaction alone permits.

Relationship to prior work. Artificial curiosity [35], world models [36], the Dreamer series [37], LeCun's JEPA architecture [38], and curiosity-driven exploration [39] all implement variants of internal simulation driven by prediction error or information gain. The metaphor of "dreaming" is deliberately evocative but the mechanism is well-established. The axiom's contribution is positional: identifying this mechanism as necessary for approaching the thermodynamic limit and specifying that simulation must be targeted (seeded by gradient signals from Axiom 2), not random.

4.4. Axiom 4: Zero-Entropy Execution

The system's primary cost function combines two distinct quantities: thermodynamic entropy production (ΔS, measured in J/K, capturing energy wasted as heat) and information-theoretic redundancy (excess bits processed beyond the task's minimum description length, measured in bits). Both represent waste; they are measured independently and on different scales. Under the assumption of perfect human-machine alignment, every wasted joule of physical energy and every wasted computation cycle acts as a proxy for execution inefficiency.

Derivation from The Absolute. The Absolute is defined as operation at fundamental physics limits. Entropy production above the thermodynamic minimum is, by definition, the distance from those limits. Therefore, minimizing entropy production is equivalent to approaching The Absolute.

Relationship to prior work. Efficiency optimization is a universal engineering concern, and cross-entropy loss functions are standard in deep learning. Prigogine's analysis of entropy production in irreversible processes [40] provides historical context for thermodynamic cost analysis. The minimum entropy production theorem applies to linear near-equilibrium regimes; the present axiom invokes entropy minimization as an engineering design objective for systems operating far from equilibrium, not as a claim that such systems naturally evolve toward minimum dissipation. The axiom's contribution is framing entropy as the primary cost function (not a secondary optimization target) and connecting it explicitly to the human purpose framework: entropy production is the primary engineering proxy for execution inefficiency, conditional on the alignment assumption stated in Section 3.4.

4.5. Axiom Closure

The four axioms form a closed dependency loop. Differentiable Reality Modeling (Axiom 2) provides the gradient signal that Targeted Internal Simulation (Axiom 3) uses to allocate simulation budgets. Axiom 3 provides the sample efficiency that Zero-Entropy Execution (Axiom 4) requires to minimize wasted computation. Axiom 4 provides the optimization target that Post-Linguistic Cognition (Axiom 1) enables, because neither thermodynamic dissipation nor information-theoretic redundancy can be minimized in a system reasoning through a lossy abstraction layer. Axiom 1 provides the representational foundation that Axiom 2 requires, because gradients cannot flow through language token representations of physical or mathematical reality.

Removing any single axiom does not merely degrade performance; it changes the system's class, preventing it from reaching the thermodynamic limit on any axis. Section 6 details the specific failure mode produced by each axiom's removal.

4.6. Falsifiability

The framework can be falsified by any of the following: (1) demonstrating that a system violating one or more axioms achieves performance at the thermodynamic limit; (2) demonstrating that the stated physics limits are incorrect; (3) identifying a fifth independent condition necessary for approaching the thermodynamic limit that does not reduce to a combination of the four axioms; (4) demonstrating that the axiom dependency loop contains a broken link (i.e., removing one axiom does not change the system's class).

5. The Absolute Extension Ratio

5.1. Motivation

Existing benchmarks measure machine performance in isolation. A surgical robot's sensing resolution, a language model's accuracy on reasoning tasks, and a robotic manipulator's grasp success rate are all properties of the machine alone. The Absolute proposes that the relevant unit of measurement is not the machine but the human-machine pair, grounded in the claim that the purpose of artificial intelligence is to extend human capability.

A system at 90% of the physics limit with 50% coupling efficiency extends the human less than a system at 60% of the physics limit with 95% coupling efficiency. No existing metric captures this relationship.

5.2. Definition

Definition 3 (Absolute Extension Ratio). The AER is the product of system capability C (expressed as a fraction of the relevant physics limit, where C ∈ [0, 1]) and coupling efficiency η (expressed as a fraction of lossless transfer between system and human, where η ∈ [0, 1]):

AER = C × η

Requirement (Capability reporting). The physics limit used to compute C is determined by the measurement axis: (i) energy efficiency per computation → Landauer limit (kT ln 2 per irreversible bit erasure); (ii) information throughput → Shannon channel capacity; (iii) computation rate → Margolus-Levitin bound; (iv) sensing resolution → quantum noise floor; (v) energy conversion → thermodynamic ceiling of the conversion process. Every reported C value MUST include: the axis, the chosen physics limit, and the environmental parameters required by that limit (at minimum temperature T for Landauer; bandwidth B and noise power for Shannon; mean energy above ground state for Margolus-Levitin).

Operational definition of capability C. The phrase “relevant physics limit” is phase-specific. Because different phases of the intelligence cycle are bounded by different limits (Section 3.2), capability is measured as a vector over phases and reported with attribution:
C⃗ = (C_sense, C_align, C_output, C_reset), with each component in [0, 1]. A component value of zero indicates that the corresponding phase is non-functional; in this case AER = 0 and the system is reported as “below measurement threshold” rather than assigned an AER_dB value.
Unless explicitly stated otherwise, the scalar capability used in the AER is the bottleneck value C = min(C_sense, C_align, C_output, C_reset).

Each component is computed against the appropriate bound for that phase: for computation-heavy alignment and reset steps performed on irreversible digital hardware, C_align and C_reset may be benchmarked against the Landauer limit per irreversible bit erasure; for sensing and communication, C_sense is benchmarked against the Shannon capacity of the sensing/IO channel under its measured noise conditions; and for embodied output, C_output is benchmarked against the best-known thermodynamic/mechanical efficiency ceiling for the task (e.g., ideal mechanical work for the required state change divided by measured electrical energy delivered to actuation, with appropriate efficiency bounds). Total-system energy must therefore be decomposed by phase; it is not physically meaningful to benchmark macroscopic actuation losses directly against an informational erasure bound.
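Under the bottleneck convention stated above (C = min over phases), the scalar AER computation can be sketched as follows. All names (CapabilityVector, aer) and all numeric values are hypothetical illustrations, not measurements of any real system.

```python
from dataclasses import dataclass

@dataclass
class CapabilityVector:
    """Per-phase capability fractions, each in [0, 1] (Section 5.2)."""
    sense: float
    align: float
    output: float
    reset: float

    def bottleneck(self) -> float:
        """Default scalar capability: C = min over the four phases."""
        return min(self.sense, self.align, self.output, self.reset)

def aer(capability: CapabilityVector, coupling: float) -> float:
    """Absolute Extension Ratio: AER = C x eta, both in [0, 1] (Definition 3)."""
    c = capability.bottleneck()
    if c == 0.0:
        # A non-functional phase: AER = 0, reported as
        # "below measurement threshold" rather than as an AER_dB value.
        return 0.0
    return c * coupling

# Invented values for illustration only:
c_vec = CapabilityVector(sense=1e-6, align=3e-7, output=2e-6, reset=5e-7)
print(aer(c_vec, coupling=0.4))  # bottleneck C = 3e-7, so AER = C x eta ~ 1.2e-7
```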

The full measurement of a human-machine system is the AER vector across all active axes: perceptual extension, cognitive extension, physical extension, temporal coupling, and intentional coupling. A purely cognitive system (e.g., a research AI) activates the cognitive and coupling axes. A purely physical system (e.g., a robotic manipulator) activates the physical and coupling axes. A unified system activates all axes. At The Absolute, every active element of the vector equals 1.0.

Reporting convention. Because current systems operate many orders of magnitude below The Absolute on the capability axis, raw AER values cluster near zero and provide limited visual discrimination. For engineering comparison, we define a logarithmic reporting transform:
AER_dB = 10 log10(AER / AER_ref)
where AER_ref is a fixed reference value chosen once per measurement domain: the AER of a stated baseline system measured under a stated standard task, frozen at the time the domain is established. Let C_ref ∈ (0, 1] be the baseline system’s capability fraction and η_ref ∈ (0, 1] be its coupling efficiency; then AER_ref = C_ref × η_ref. The underlying Definition 3 (AER = C × η) remains the canonical metric; AER_dB is a reporting transform applied after measurement, analogous to expressing power in decibels rather than watts.

AER_dB values are comparable only within a single measurement domain sharing the same AER_ref. Cross-domain comparison requires either a universal AER_ref (which sacrifices domain-specific discrimination) or explicit cross-calibration between domain baselines. The universality property described in Section 5.4 applies to the canonical AER = C × η, whose reference point is physics. AER_dB is a domain-local reporting convenience. In addition, AER_dB comparisons are valid only when the compared systems share the same measurement axis and therefore the same underlying physics limit (e.g., Landauer-to-Landauer within compute, Carnot-to-Carnot within actuation).
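The reporting transform itself is a one-line computation. The sketch below (hypothetical helper name aer_db; the baseline value is invented for illustration) applies the AER_dB formula above and shows the +10 dB-per-decade behavior it implies.

```python
import math

def aer_db(aer_value: float, aer_ref: float) -> float:
    """Logarithmic reporting transform: AER_dB = 10 log10(AER / AER_ref).

    Undefined at AER = 0; such systems are reported as
    "below measurement threshold" instead (Section 5.2).
    """
    if aer_value <= 0.0 or aer_ref <= 0.0:
        raise ValueError("AER_dB is undefined for non-positive AER values")
    return 10.0 * math.log10(aer_value / aer_ref)

# With an invented domain baseline AER_ref = 1.2e-7:
print(aer_db(1.2e-6, 1.2e-7))  # a 10x AER improvement is +10 dB
print(aer_db(1.2e-7, 1.2e-7))  # the baseline system itself sits at 0 dB
```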

5.3. Coupling Dimensions

Coupling efficiency η decomposes into three measurable dimensions:

Informational coupling (η_i). The fraction of system output that reaches the human's cognitive process in a form the human can integrate. A system producing results the human cannot interpret has η_i approaching zero regardless of result quality.

Temporal coupling (η_t). The degree of synchronization between machine speed and human biological time scales. A system operating in microseconds is not extending a human who requires 200 milliseconds to form a motor intention unless it synchronizes with the human's rhythm.

Intentional coupling (η_p). The degree to which the system is coupled to the human's purpose rather than merely to explicit commands. At the limit, the system evolves from something the human operates to something the human extends through. The critical constraint: the system never optimizes by removing the human from the loop.

The composite coupling efficiency is η = f(η_i, η_t, η_p), where f is a combination function whose specific form is an open research question. We propose the geometric mean as a starting point: η = (η_i × η_t × η_p)^(1/3), which ensures that a zero in any dimension drives the composite to zero. Alternative formulations (weighted arithmetic mean, minimum function) are discussed in Section 8.

Mandatory vector reporting. Any AER report must include the full coupling vector (η_i, η_t, η_p) alongside the composite η and the scalar AER. The dimension with the lowest value must be explicitly named as the limiting dimension.

Constraints on alternative aggregators. Any function f proposed as an alternative to the geometric mean must satisfy: (1) Monotonicity (non-decreasing in each argument); (2) Boundedness ([0,1]^3 → [0,1]); (3) Zero propagation (f=0 when any argument is 0, unless that dimension is explicitly omitted as inapplicable for the domain rather than set to zero); (4) Attribution (given η and the vector, it must be possible to identify the binding constraint dimension). Unless explicitly stated otherwise in an AER report, η is computed using the geometric mean defined above.
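A minimal sketch of the proposed geometric-mean aggregator and the mandatory limiting-dimension attribution follows; the function names are ours and the input values are invented for illustration. Note how a zero in any dimension propagates to the composite, as constraint (3) requires.

```python
def coupling_geometric_mean(eta_i: float, eta_t: float, eta_p: float) -> float:
    """Composite coupling eta = (eta_i * eta_t * eta_p)^(1/3); zero propagates."""
    for eta in (eta_i, eta_t, eta_p):
        if not 0.0 <= eta <= 1.0:
            raise ValueError("each coupling dimension must lie in [0, 1]")
    return (eta_i * eta_t * eta_p) ** (1.0 / 3.0)

def limiting_dimension(eta_i: float, eta_t: float, eta_p: float) -> str:
    """Name the binding-constraint dimension, as AER reports must (Section 5.3)."""
    dims = {"informational": eta_i, "temporal": eta_t, "intentional": eta_p}
    return min(dims, key=dims.get)

print(coupling_geometric_mean(0.9, 0.8, 0.1))  # ~ 0.416
print(limiting_dimension(0.9, 0.8, 0.1))       # "intentional" is the limiting dimension
```

The geometric mean satisfies all four constraints by construction: it is monotone and bounded, a zero argument drives the composite to zero, and the smallest component identifies the binding constraint.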

5.4. Properties of the AER

The AER has several properties that distinguish it from existing metrics. (1) Substrate-independence: applicable to cognitive, physical, and hybrid systems. (2) Temporal stability: because the reference point is physics, the metric never requires recalibration. (3) Discriminative at practical scales: while system capability as a raw fraction of the applicable phase-specific limit (often Landauer for irreversible digital computation) clusters near zero for all current systems (typically 10⁻⁶ to 10⁻⁹), the AER_dB reporting transform defined in Section 5.2 converts these raw values into a logarithmic scale where a 10× AER improvement corresponds to +10 dB. This makes differences between systems at similar capability levels immediately visible. The multiplicative structure of AER = C × η further ensures that coupling efficiency differences produce meaningful discrimination even when capability fractions are close. (4) Captures non-obvious tradeoffs: a simpler, less capable system with superior coupling may produce a higher AER than a more powerful system with poor human integration.

6. Validation

6.1. Semantic Consistency Testing

The logical coherence of the four axioms was subjected to semantic and structural examination by three independent frontier AI systems: Claude (Anthropic), ChatGPT (OpenAI), and Gemini (Google DeepMind). Each system received the same instruction: identify a fifth independent axiom that does not reduce to a combination of the existing four, or produce a counterexample demonstrating that a system violating one or more axioms could nevertheless achieve performance at the thermodynamic limit.

6.2. Results

All three systems, through different reasoning paths, converged on the same conclusion: the axiom set is complete and closed. No system produced a fifth axiom that was not reducible to a combination of existing axioms. No system produced a counterexample.

One result merits specific attention. Gemini, working through the derivation independently and without having been provided the fourth axiom, added Zero-Entropy Execution to the axiom set before being told it existed. The system documented a real-time reasoning trace showing a paradigm shift as it arrived at this conclusion. This constitutes generative convergence: a different system, without access to the fourth axiom, arriving at the same structure through its own reasoning process. The epistemic weight of this result is bounded by the shared-training-data limitation discussed in Section 6.3.

6.3. Limitations of Validation

We are explicit about what this validation demonstrates and what it does not. Three AI systems sharing overlapping training data and architectural assumptions constitute a test of logical consistency and completeness within the reasoning frameworks available to current AI. This is not independent scientific replication. The systems' convergence is evidence that the axiom set is internally coherent and that no obvious gap exists. It is not empirical proof that systems implementing the axioms outperform systems that do not. Such proof requires building integrated systems under the framework and physically measuring their energy dissipation, as detailed in the empirical validation protocol below.

6.4. Axiom Removal Tests

Each axiom was independently removed and the resulting system class analyzed. Removal of Axiom 1 (Post-Linguistic Cognition) produces a system that reasons through a language bottleneck, injecting lossy compression noise into every gradient, every simulation, and every output. Removal of Axiom 2 (Differentiable Reality Modeling) produces a system unable to distinguish model errors from output errors, forcing the dreaming system (Axiom 3) to simulate randomly rather than targeting ignorance. Removal of Axiom 3 (Targeted Internal Simulation) bounds learning speed to real-time interaction, creating an experience deficit proportional to environmental complexity. Removal of Axiom 4 (Zero-Entropy Execution) eliminates the optimization target that defines approach toward The Absolute. In each case, the system does not merely degrade in performance; it changes class: it becomes structurally incapable of reaching the thermodynamic limit on any axis.

6.5. Empirical Validation Protocol

True validation of the framework requires physical implementation and thermodynamic measurement. The required protocol consists of: (1) constructing a hybrid neuromorphic-CMOS testbed executing an embodied continuous-control task; (2) performing the task using a standard discrete-time linguistic reasoning architecture; (3) performing the same task using an architecture strictly adhering to Axioms 1–4; and (4) directly measuring the total dissipated heat Q (in joules) using a high-precision closed calorimeter.

Under the assumption of an isothermal environment at absolute temperature T, the thermodynamic entropy production is estimated as ΔS ≈ Q/T; if temperature varies materially during the trial, compute ΔS = ∫ dQ/T(t) instead. Temperature variation is material if the maximum excursion during the trial exceeds 5 K or 2% of the mean absolute temperature, whichever is larger; below this threshold, the isothermal approximation introduces error below 2%.

The framework is empirically supported if the axiom-compliant system achieves both: (a) a statistically significant reduction in dissipated energy per task completion (validating the efficiency axis), and (b) equal or superior human-machine pair task performance measured independently (validating that efficiency gains do not reduce capability or coupling). Anti-gamesmanship constraint: energy reductions count only when accompanied by equal-or-better task performance; a system that fails the task (including 'do nothing' or powered-off behavior) is scored as zero successful tasks and is excluded from efficiency comparison.

This protocol constitutes partial validation of the efficiency axis in one embodied domain. Full validation of the framework's domain-independence and coupling claims additionally requires: (i) a human operator performing the task through the system, with η_i, η_t, and η_p measured per Section 5.3; (ii) the AER computed for both systems and its ranking shown to be consistent with the independent pair-performance ranking; and (iii) replication across at least two domains (one embodied, one cognitive).
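The entropy estimation step, including the materiality rule for choosing between the isothermal approximation and the full integral, can be sketched from sampled calorimeter traces. The function name and sampling convention are our own; a real protocol would follow the calorimeter vendor's acquisition format.

```python
def entropy_production(heat_watts, temps_kelvin, dt):
    """Estimate thermodynamic entropy production from calorimeter traces.

    heat_watts:   sampled heat-dissipation rate dQ/dt [W]
    temps_kelvin: simultaneously sampled absolute temperature [K]
    dt:           sampling interval [s]

    Applies the materiality rule of Section 6.5: if the temperature
    excursion stays below max(5 K, 2% of mean T), use the isothermal
    approximation dS ≈ Q/T; otherwise integrate dS = ∫ dQ / T(t).
    """
    mean_t = sum(temps_kelvin) / len(temps_kelvin)
    excursion = max(temps_kelvin) - min(temps_kelvin)
    threshold = max(5.0, 0.02 * mean_t)
    q_total = sum(p * dt for p in heat_watts)  # total Q in joules
    if excursion <= threshold:
        return q_total / mean_t                # isothermal: dS ≈ Q/T
    # Riemann-sum form of dS = ∫ dQ / T(t)
    return sum(p * dt / t for p, t in zip(heat_watts, temps_kelvin))
```

For example, a trial dissipating a constant 2 W for 10 s at a steady 300 K yields Q = 20 J and ΔS ≈ 20/300 ≈ 0.067 J/K via the isothermal branch.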

7. Implementation Architecture

To demonstrate that the axiom framework produces actionable engineering specifications, we map each axiom to commercially available or near-term components for a physical embodiment (the most constrained case, as it requires both cognitive and physical subsystems). The same axioms applied to a purely cognitive system (e.g., a research AI) would involve only the computational and informational components. Each component selection was derived from the axioms: the physics limits defined the selection criteria, and commercially available hardware was evaluated against those criteria.

For Axiom 1, the sensing layer includes event-based vision (Prophesee IMX636/GenX320, sub-150 microsecond latency), high-resolution tactile sensing (Meta DIGIT 360, over 18 sensing features per fingertip; Carnegie Mellon AnySkin, distributed tactile sensing), and force-torque sensing with reduced parasitic dynamics (Bota Systems PixONE, internal cable routing). For Axiom 2, differentiable physics simulation is provided by GPU-accelerated engines such as MuJoCo MJX and LAAS-CNRS contact gradient solvers. For Axiom 3, the world model uses frameworks like Meta's V-JEPA 2 for latent-space prediction paired with structured state-space models like Graph Mamba for sequence modeling. For Axiom 4, the actuation layer includes CubeMars AK80-64 actuators (up to 141.2 Nm/kg maximum torque density), Maxwell DuraBlue supercapacitors for regenerative energy capture, electroadhesive clutches for low-power static holding, and near-term technologies like Clone Robotics Myofiber artificial muscles. This architecture represents a mix of commercially shipping hardware, open-source software frameworks, and advanced research prototypes available as of February 2026.

8. Open Problems

The framework has three identified gaps that require resolution before it can claim completeness, together with two open methodological questions.

Catastrophic novelty. The four axioms handle gradual learning through continuous gradient updates and targeted simulation. They provide no explicit theory of graceful degradation when the system encounters genuine discontinuities: phenomena so far outside the model's experience that gradient signals are meaningless. Examples include phase transitions in materials, novel pathogens with no structural analog, and physical situations representing true distributional shifts. A system that performs near the thermodynamic limit under known conditions but fails catastrophically under novel conditions is not reliably extending human capability.

Co-adaptation. The framework defines The Absolute from the machine side and measures coupling to the human. But coupling is bidirectional. The human must learn to trust and integrate with the system. The human's mental model of the system, trust calibration, and willingness to rely on the system in high-stakes situations are human-side variables that affect the AER but are not addressed by the four axioms. The coupling is optimizable from the machine side. The human side of the coupling is a learning curve the framework currently has no mechanism to accelerate.

Multi-human coordination. The framework defines a single human-system pair. Work is often collaborative: multiple surgeons in an operating room, a research team sharing a computational system, a construction crew operating coordinated machinery. When multiple humans with different intentions couple to the same system or to interacting systems, the coupling problem becomes combinatorial. The Absolute for a team is not the sum of The Absolutes for each individual. Its definition remains open.

AER combination function. The specific form of f(η_i, η_t, η_p) for combining the three coupling dimensions is proposed as a geometric mean but not derived from first principles. Alternative formulations (weighted arithmetic mean, minimum function, domain-specific weightings) may prove more appropriate for specific application domains. Empirical work comparing the discriminative power and predictive validity of alternative formulations is needed.
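The behavioral differences between the candidate aggregators are easy to see on a single coupling vector. This is an illustrative comparison under equal weights, not a recommendation; the weighted variant's weights are hypothetical.

```python
def geometric(v):
    # Default aggregator: multiplicative penalty for any weak dimension.
    return (v[0] * v[1] * v[2]) ** (1.0 / 3.0)

def arithmetic(v, w=(1.0 / 3, 1.0 / 3, 1.0 / 3)):
    # Weighted arithmetic mean (equal weights here, purely illustrative).
    return sum(wi * vi for wi, vi in zip(w, v))

def minimum(v):
    # Scores only the binding constraint dimension.
    return min(v)

# A system with one weak coupling dimension:
v = (0.9, 0.9, 0.1)
geometric(v)    # ≈ 0.43 — weak dimension penalized multiplicatively
arithmetic(v)   # ≈ 0.63 — weak dimension masked by the strong ones
minimum(v)      # 0.1    — strong dimensions ignored entirely
```

All three satisfy the constraints of Section 5.3 (monotonicity, boundedness, zero propagation, attribution via the vector); they differ in how sharply a single weak dimension is punished, which is exactly the empirical question left open above.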

Practical discriminative power. A physicist would correctly note that expressing system capability as a raw fraction of the applicable phase-specific limit (often Landauer for irreversible digital computation) places all current systems near zero (10⁻⁶ to 10⁻⁹), providing limited discrimination between practical alternatives. The AER_dB reporting transform (Section 5.2) addresses this for system comparison by converting raw AER values to a logarithmic scale referenced to a domain-specific baseline. The choice of baseline system and standard task for each measurement domain remains an open standardization question.

9. Discussion

Industry relevance (near-term). Even before full standardization, the framework is immediately usable as a reporting template: (1) measure and report energy-per-task and heat-per-task for embodied systems; (2) report coupling as an explicit vector (η_i, η_t, η_p) so teams can isolate whether failures come from information transfer, latency synchronization, or intent alignment; and (3) report which physics limit and environmental parameters were used for each capability axis so results are reproducible across labs. This supports practical engineering decisions (sensor/interface design, latency budgeting, and energy optimization) without requiring computation of an exact theoretical minimum for open-world tasks; in those cases, C may be reported as a bounded estimate under a stated task specification, and η remains the primary discriminator of human extension.
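The three reporting requirements above can be captured in a minimal record. All field names and values here are hypothetical illustrations of the template, not part of the specification.

```python
# Minimal AER report record following the Section 9 template.
# Every field name and numeric value below is illustrative only.
aer_report = {
    "task": "peg-in-hole, stated task spec v1 (hypothetical)",
    "energy_per_task_J": 412.0,          # measured electrical energy
    "heat_per_task_J": 355.0,            # calorimeter-measured dissipation
    "coupling": {                        # reported as a full vector,
        "eta_i": 0.71,                   # never only a bare composite
        "eta_t": 0.44,
        "eta_p": 0.83,
    },
    "limiting_dimension": "eta_t",       # lowest coupling value, named
    "capability_limit": "Landauer, 300 K",  # physics limit + parameters
    "capability_fraction_C": 2.4e-7,     # bounded estimate per stated spec
}
# Composite eta via the default geometric mean, then AER = C * eta.
c = aer_report["coupling"]
eta = (c["eta_i"] * c["eta_t"] * c["eta_p"]) ** (1.0 / 3.0)
aer_report["AER"] = aer_report["capability_fraction_C"] * eta
```

A record of this shape lets a second lab reproduce the measurement: the limit and its environmental parameters are explicit, and the coupling vector isolates which dimension to engineer against.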

The Absolute framework makes a specific and testable claim: that the performance of intelligence systems should be measured relative to the fundamental limits of the physical universe, and that the unit of measurement should be the human-machine pair. The Absolute Extension Ratio operationalizes this claim as a single comparable metric.

The framework's four axioms synthesize existing research directions (subsymbolic AI, differentiable simulation, world models, efficiency optimization) into a minimal complete set. We claim synthesis, not discovery, for the individual axioms. The contribution is the argument that these four conditions form a closed, necessary, and sufficient set for approaching the thermodynamic limit, and that this set generates actionable engineering specifications that converge across independent derivations.

The AER metric, to our knowledge, has no precedent in the published literature. No existing benchmark expresses human-machine system performance as a fraction of physics limits. This specific formulation fills a gap that has been independently identified by multiple research groups: DeepMind acknowledged the need for a measurement standard but explicitly stated they did not propose one [1]; the AI benchmarking community has documented widespread construct validity concerns [25]; and the convergence of multiple independent groups on physics-informed intelligence measurement in 2025-2026 underscores the timeliness of this contribution.

The framework is published as an open standard with no patents, no licenses, and no restrictions. The specification, axiom derivations, component selections, measurement methodology, and open problems are released in full. The rationale is consistent with the framework's own principles: if the purpose of this technology is to extend human capability universally, then restricting access to the measurement standard that evaluates it would contradict the framework's foundational claim.

We explicitly acknowledge the absence of an integrated demonstrator. The framework exists as a complete architecture specification with commercially available components. Integration remains an engineering problem of substantial difficulty and cost. We make no claim that integration is trivial and recognize this as the primary barrier between specification and empirical validation.

10. Conclusion

We have introduced The Absolute, a universal measurement framework for intelligence systems grounded in fundamental physics limits. The framework defines a fixed reference point, analogous to absolute zero, against which any intelligence system can be evaluated regardless of substrate, domain, or embodiment. We have derived four necessary and sufficient axioms for systems approaching the thermodynamic limit, argued their closure through semantic consistency testing and ablation analysis, and introduced the Absolute Extension Ratio as a novel composite metric for human-machine system performance.

The framework proposes that the fundamental question for intelligence systems is not how capable is this machine, but how much of this machine's capability reaches the human as usable extension. The AER provides a methodology for answering that question against a reference point that never moves.

References

[1] Morris, M. R., et al. Levels of AGI: Operationalizing progress on the path to AGI. arXiv:2311.02462 (2023).

[2] Liu, B., et al. LIBERO: Benchmarking knowledge transfer for lifelong robot learning. NeurIPS (2023).

[3] Li, C., et al. BEHAVIOR-1K: A benchmark for embodied AI with 1,000 everyday activities. CoRL (2023).

[4] Hendrycks, D., et al. Measuring massive multitask language understanding. ICLR (2021).

[5] Chollet, F. On the measure of intelligence. arXiv:1911.01547 (2019).

[6] Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 5(3), 183-191 (1961).

[7] Shannon, C. E. A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379-423 (1948).

[8] Heisenberg, W. Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Z. Phys. 43, 172-198 (1927).

[9] Nelson, D. L. & Cox, M. M. Lehninger Principles of Biochemistry. 8th ed. W.H. Freeman (2021).

[10] Takahashi, R. & Hayashi, M. Thermodynamic limits of physical intelligence. arXiv:2602.05463 (2026).

[11] Perrier, T. Watts-per-intelligence: Part I. arXiv:2504.05328 (2025).

[12] Back to bits: Extending Shannon's communication performance framework to computing. arXiv:2508.05621 (2025).

[13] Friston, K. The free-energy principle: A unified brain theory? Nature Reviews Neuroscience 11, 127-138 (2010).

[14] Friston, K. Am I self-conscious? Frontiers in Psychology 9, 579 (2018).

[15] Wissner-Gross, A. D. & Freer, C. E. Causal entropic forces. Physical Review Letters 110, 168702 (2013).

[16] Still, S., et al. Thermodynamics of prediction. Physical Review Letters 109, 120604 (2012).

[17] England, J. L. Statistical physics of self-replication. J. Chemical Physics 139, 121923 (2013).

[18] Licklider, J. C. R. Man-computer symbiosis. IRE Trans. Human Factors in Electronics 1(1), 4-11 (1960).

[19] Clark, A. & Chalmers, D. The extended mind. Analysis 58(1), 7-19 (1998).

[20] Hart, S. G. & Staveland, L. E. Development of NASA-TLX. Advances in Psychology 52, 139-183 (1988).

[21] Endsley, M. R. Toward a theory of situation awareness in dynamic systems. Human Factors 37(1), 32-64 (1995).

[22] O'Neill, T., et al. Advancing human-machine teaming: Concepts, challenges, and applications. arXiv:2503.16518 (2025).

[23] Complementarity in human-AI collaboration: Concept, sources, and evidence. European J. Information Systems (2025).

[24] Legg, S. & Hutter, M. Universal intelligence: A definition of machine intelligence. Minds and Machines 17(4), 391-444 (2007).

[25] Can we trust AI benchmarks? An interdisciplinary review of current issues in AI evaluation. arXiv:2502.06559 (2025).

[26] Bérut, A., et al. Experimental verification of Landauer's principle. Nature 483, 187-189 (2012).

[27] Margolus, N. & Levitin, L. B. The maximum speed of dynamical evolution. Physica D 120(1-2), 188-195 (1998).

[28] Diller, S., et al. A lightweight, low-power electroadhesive clutch and spring for exoskeleton actuation. ICRA (2016).

[29] Rumelhart, D. E., McClelland, J. L., & PDP Research Group. Parallel Distributed Processing. MIT Press (1986).

[30] Brohan, A., et al. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv:2307.15818 (2023).

[31] Todorov, E., Erez, T., & Tassa, Y. MuJoCo: A physics engine for model-based control. IROS (2012).

[32] Freeman, C. D., et al. Brax: A differentiable physics engine for large scale rigid body simulation. NeurIPS (2021).

[33] NVIDIA. Announcing Newton: An open-source physics engine for robotics simulation (2025).

[34] Hu, Y., et al. DiffTaichi: Differentiable programming for physical simulation. ICLR (2020).

[35] Schmidhuber, J. Formal theory of creativity, fun, and intrinsic motivation. IEEE Trans. Autonomous Mental Development 2(3), 230-247 (2010).

[36] Ha, D. & Schmidhuber, J. World models. arXiv:1803.10122 (2018).

[37] Hafner, D., et al. Mastering diverse control tasks through world models. Nature (2025).

[38] LeCun, Y. A path towards autonomous machine intelligence. OpenReview (2022).

[39] Pathak, D., et al. Curiosity-driven exploration by self-supervised prediction. ICML (2017).

[40] Prigogine, I. Introduction to Thermodynamics of Irreversible Processes. Wiley (1967).