Compute-in-Memory (CIM/PIM): A Comprehensive State-of-the-Art Survey

Where compute-in-memory actually stands in 2026 — silicon-validated efficiency (TSMC 89–254 TOPS/W, Tsinghua/IBM/NeuRRAM Nature papers), the accelerator-not-CPU-replacement reality, two-tier commercialization (Witmem's 10M+ shipments, Anker Thus, d-Matrix, EnCharge), five technical routes compared, and the ADC/precision/toolchain bottlenecks.

Tags: Compute-in-Memory, CIM, PIM, AI Hardware, ReRAM, NOR Flash, SRAM, AI Accelerator, Edge AI, Research

Disclaimer: This article aggregates publicly reported research and industry figures. Many efficiency numbers are macro-level peaks rather than full-chip measurements, and some startup figures are vendor-reported or simulated. The reasoning involved AI-assisted generation, has not undergone peer review, and may contain errors. Data are current as of June 2026.

Five Questions Up Front

Before the full report, five quick questions to frame where compute-in-memory actually stands.

1. Has the efficiency hypothesis been validated?

Conclusion: Yes, with solid empirical proof, but the advantage has strict boundaries. Compute-in-memory targets the von Neumann architecture's core problem: data movement consumes too large a share of power (up to 90%). This hypothesis has been repeatedly validated at the device and circuit level:

  • Academic silicon measurements: top teams at Tsinghua, Stanford/UCSD, and IBM have published silicon-measured chips in Nature showing roughly 10–100× energy-efficiency gains over conventional GPUs like the NVIDIA V100.
  • Industry data: TSMC's fully-digital SRAM CIM macro hit a striking 254 TOPS/W (5nm), and analog ReRAM reached the 78.4 TOPS/W level.
  • Boundary reminder: most of these dazzling figures are under specific AI-inference / matrix (MAC) workloads, and many are "macro peaks." Once you add the full chip's ADC/DAC, on-chip interconnect, and control logic, real end-to-end efficiency drops sharply.

2. Does it have the potential to fully replace the CPU market?

Conclusion: Not at all. It is positioned as an "accelerator," competing for the GPU/NPU inference share.

  • Can't replace the CPU: compute-in-memory is physically customized for multiply-accumulate (MAC) and matrix operations. It is extremely poor at the CPU's strengths: branch-heavy, control-flow-complex code, high-precision floating point, and frequent random reads/writes (e.g., running an OS or database).
  • Coexistence: in reality, nearly every compute-in-memory chip (Witmem, Houmo, etc.) needs an attached or built-in CPU (e.g., RISC-V) to handle general computing and control flow.
  • The real target: what it can genuinely "eat into" is the GPU/NPU share of AI inference (especially power-sensitive on-device and edge), not the CPU.

3. How far have experiments and commercialization gone?

Conclusion: Commercialization has split into two tiers — on-device is genuinely in mass production, while high-compute/cloud is still in early ramp-up.

  • On-device / low compute (mature mass production): the clearest path. Led by Witmem (NOR Flash), whose WTM2101 is the world's first compute-in-memory SoC to reach million-scale mass production, with 10M+ units shipped, used in TWS earphones, smartwatches, and more.
  • High compute / edge and data center (early ramp-up): gaining momentum. The digital SRAM route is moving faster (Houmo's M50, d-Matrix's Corsair are released or shipping); the high-compute analog route and emerging memories (ReRAM, charge-domain analog) are mostly still taping out, testing, or in initial shipment, and need another year or two to prove reliability at scale.

4. Where are the bottlenecks?

Conclusion: Mainly stuck on analog overhead, device physical limits, and the software ecosystem.

  • ADC/DAC overhead (the most fatal hardware weakness): in the analog route, the ADC eats up enormous area (~30%–81%) and power (~50%–58%); pushing for higher precision makes ADC energy rise exponentially.
  • Precision and physical non-idealities: analog devices have inherent noise, resistance drift, yield, and variation issues that limit precision — which is exactly why many high-compute players have switched to a "fully digital" route.
  • The "compilation wall" and missing ecosystem (the biggest software weakness): each vendor uses its own custom programming interface with no unified standard. Compared with NVIDIA's unbreakable CUDA ecosystem, compute-in-memory's software toolchain is highly immature and model-migration cost is too high.
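The precision bullet above can be made concrete with a small Monte Carlo sketch. It assumes each stored weight is disturbed by multiplicative Gaussian noise, a crude stand-in for device-to-device variation; the function names, array size, and value ranges are illustrative assumptions, and real arrays also suffer drift, IR drop, and ADC quantization, which this sketch ignores.

```python
import random


def noisy_dot(weights, inputs, sigma, rng):
    """Dot product with each stored weight scaled by (1 + Gaussian noise).

    sigma is the assumed relative standard deviation of the stored weight.
    """
    return sum(w * (1.0 + rng.gauss(0.0, sigma)) * x
               for w, x in zip(weights, inputs))


def mean_relative_error(sigma, n=256, trials=200, seed=0):
    """Average |noisy - exact| / exact over random length-n dot products."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        # Hypothetical value ranges; positive so the exact result is nonzero.
        weights = [rng.uniform(0.1, 1.0) for _ in range(n)]
        inputs = [rng.uniform(0.0, 1.0) for _ in range(n)]
        exact = sum(w * x for w, x in zip(weights, inputs))
        total += abs(noisy_dot(weights, inputs, sigma, rng) - exact) / exact
    return total / trials


for sigma in (0.0, 0.01, 0.1):
    print(f"sigma={sigma}: mean relative error {mean_relative_error(sigma):.5f}")
```

Under this simple model the output error grows in proportion to the per-device variation, which is one way to see why tighter device control or a digital datapath is needed as target precision rises.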

5. Anker as the flagship case — are there any technical papers?

Conclusion: Anker has not published any compute-in-memory technical paper. Per the report's deep dive:

  • Technology source: Anker's earphone (Thus™ A1 chip) uses a "model-defines-chip" joint-development model. Its underlying core compute-in-memory IP is actually supplied by its vendor Witmem (NOR Flash route).
  • Anker's role: Anker mainly leads model design and architectural requirements (even developing its own scheduling OS), while the actual chip implementation is handed to Witmem. This model lets a terminal brand quickly capture the low-power dividend of compute-in-memory without building a full-stack chip team.

TL;DR

  • The efficiency hypothesis has been empirically confirmed, but within strict bounds: On the narrow workload of AI inference / matrix operations, the energy-efficiency advantage of compute-in-memory has been repeatedly validated in silicon by both academia (Tsinghua, Nature 2020; Stanford/UCSD NeuRRAM, Nature 2022; IBM, Nature 2023) and industry (TSMC's ISSCC series, Witmem, Houmo, d-Matrix, EnCharge). SRAM digital CIM macros have reached 89–254 TOPS/W and analog ReRAM has reached the ~78.4 TOPS/W level, delivering roughly 10–100× energy-efficiency gains over GPU/CPU. But these are mostly macro-level peaks; full-chip end-to-end usable efficiency must be discounted significantly.
  • It will not replace the CPU; it is an accelerator / complementary path: Compute-in-memory is essentially an accelerator customized for multiply-accumulate (MAC) / matrix multiplication. It is unsuited to general-purpose computing that is branch-heavy, control-flow-complex, or high-precision floating-point. It threatens the GPU/NPU share in AI inference (especially edge and on-device), not the CPU's position in general-purpose computing. The fact that nearly every CIM SoC still needs a paired CPU is the proof.
  • Commercialization has landed in two tiers: On-device, low-compute parts have genuinely reached mass production (Witmem's WTM2101 has shipped over 10 million units; Anker's Thus is in earphones). High-compute / data-center parts remain in early ramp-up (Houmo M50 and d-Matrix Corsair have shipped; Yizhu/EnCharge/Axelera are taping out or in initial shipment). The main bottlenecks are analog precision, ADC/DAC overhead, device non-idealities, and the immaturity of software toolchains and ecosystems.

Key Findings

1. The Anker case is confirmed: On April 22, 2026, Anker unveiled the world's first "neural-network compute-in-memory AI audio chip, Thus™." The chip is based on NOR Flash and natively supports a model of roughly 4 million parameters; internal testing showed up to ~150× peak AI compute improvement over conventional Bluetooth earphone chips. According to Leiphone, in the version named Thus™ A1, "conventional Bluetooth chips have overall compute of about 30M FLOPS, while Thus™ A1 reaches 5G FLOPS, an improvement of about 150×, and not at the cost of power consumption." The chip was jointly developed by Anker and Witmem (Zhicun) over three years under the "model-defines-chip" model. It debuts in the soundcore (renamed "Anker Audio") Liberty 5 Pro / Pro Max earphones and earned a Guinness certification for "clearest call earphones." Founder Yang Meng has consistently championed compute-in-memory across multiple interviews (e.g., Episode 10 of Shi Talks Chips' "Big Names Talk Chips," titled "Large models can't solve NVIDIA's problem"). His core arguments are that the von Neumann "divide-and-conquer" approach wastes over 90% of energy on data movement, and that compute-in-memory is closer to the human brain and is the ideal architecture for on-device large models; he is also bullish on "bionic algorithms / inference-while-learning" replacing the static Transformer.

2. Five technical routes: near-memory computing (HBM-PIM/AiM — Samsung, SK Hynix); SRAM digital CIM (Houmo, d-Matrix, Axelera, PIMCHIP); SRAM analog / charge-domain CIM (EnCharge); DRAM in-/near-memory (Samsung, SK Hynix, Syntiant); and non-volatile-memory CIM (NOR Flash: Witmem, Mythic, Anker; ReRAM/memristor: Tsinghua, Yizhu, TetraMem; PCM: IBM; MRAM: TensorChip/Samsung).

3. Representative measured efficiency data: TSMC fully-digital SRAM CIM macro — 89 TOPS/W at 22nm, 254 TOPS/W at 5nm; Tsinghua's memristor CNN (Nature 2020) — roughly 100× the efficiency of an NVIDIA V100; IBM's PCM analog chip (Nature 2023) — 12.4 TOPS/W chip-sustained performance, ~14× higher efficiency than digital approaches; Witmem's WTM2101 in a 40nm process — compute/efficiency equivalent to 6–10× of a 12nm part; Houmo H30 — 7.3 TOPS/W full-SoC efficiency, 15 TOPS/W for the AI core; EnCharge EN100 — claimed >40 TOPS/W, with test chips at >150 TOPS/W (8-bit).

4. Main bottlenecks: In analog CIM, the ADC accounts for roughly 30% of area and ~50% of power in ReRAM designs (some reports as high as 81% area, 58% energy), and its energy rises exponentially with precision; device non-idealities (memristor drift, yield, variation); analog precision (noise-to-signal ratio can reach 90%); and missing software toolchains / programming models — the lack of a unified programming interface has fragmented the ecosystem (the "compilation wall").

Details

1. Efficiency Validation: Has the Core Hypothesis Been Empirically Proven?

Conclusion: Yes — but one must strictly distinguish "macro peak" from "full-chip usable efficiency," and the advantage holds only on AI matrix-operation workloads.

The core physical principle of compute-in-memory is to perform vector-matrix multiplication, which is built from multiply-accumulate (MAC) operations, directly inside the memory array. Witmem CEO Wang Shaodi's explanation is the most intuitive (per his Zhidx talk transcript): analog compute-in-memory uses Ohm's law (voltage × conductance = current) and Kirchhoff's current law (currents on a shared wire add up), so a single read operation on a memory array completes millions of parallel multiply-adds in one pass; "the commonly used vector-matrix multiplication … needs only one memory read operation to complete multiplication and addition of millions of parameters. With a traditional GPU architecture, for million-scale multiply-add computation, the memory reads alone would exceed 50,000." He also states that "99% of artificial intelligence is matrix multiplication." Intel research shows that at 7nm, data-movement power reaches 35 pJ/bit, accounting for 63.7% of total power; in high-compute AI applications, data movement consumes roughly 90% of the time and power. This is the "memory wall / power wall," and it is the source of compute-in-memory's efficiency advantage.
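The Ohm's-law and Kirchhoff's-law description above can be written out as a minimal sketch of an ideal crossbar read. It assumes a noiseless array with no wire resistance and no ADC; the function name and the 3×2 example values are hypothetical and do not come from any vendor's design.

```python
def crossbar_vmm(voltages, conductances):
    """Ideal analog crossbar read: returns one output current per column.

    voltages:     row input voltages V_i in volts (length = number of rows)
    conductances: G[i][j] in siemens, rows x cols, encoding the weights
    Ohm's law gives each cell current V_i * G[i][j]; Kirchhoff's current
    law sums the cell currents that share a column wire.
    """
    rows = len(conductances)
    cols = len(conductances[0])
    if len(voltages) != rows:
        raise ValueError("one input voltage is needed per row")
    currents = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            currents[j] += voltages[i] * conductances[i][j]  # I = V * G
    return currents


# Hypothetical 3x2 array: 3 input rows, 2 output columns.
V = [0.1, 0.2, 0.3]            # volts
G = [[1e-6, 2e-6],
     [3e-6, 4e-6],
     [5e-6, 6e-6]]             # siemens
I = crossbar_vmm(V, G)
print(I)                       # column currents in amperes
```

In hardware the two loops do not exist: every cell conducts at once, so the whole product arrives in a single read, which is the point of the quoted explanation.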

Landmark academic results (measured in silicon):

  • Tsinghua University (Wu Huaqiang and Gao Bin's team), Nature 577, 641–646 (2020): "Fully hardware-implemented memristor convolutional neural network." The first fully hardware implementation of a CNN using eight 1T1R memristor arrays of 2,048 cells each, achieving software-level accuracy and nearly 100× the energy efficiency of an NVIDIA V100.
  • Stanford/UCSD/Tsinghua (Weier Wan, H.-S. Philip Wong, Gert Cauwenberghs, et al.), NeuRRAM, Nature 608, 504–512 (2022): 48 cores, 3 million RRAM devices, with a further 2× efficiency improvement over the previous best RRAM-CIM chip; MNIST 99.0%, CIFAR-10 85.7%, Google speech commands 84.7% — accuracy comparable to 4-bit software models. It proved that "high efficiency + high flexibility + high accuracy" can be achieved simultaneously.
  • IBM (S. Ambrogio et al., Nature 620, 768–775, 2023): "An analog-AI chip for energy-efficient speech recognition and transcription." 14nm, integrating 35 million PCM devices across 34 tiles; the paper states it "can achieve up to 12.4 tera-operations per second per watt (TOPS/W) chip-sustained performance," with speech-recognition accuracy comparable to a fully-digital chip but "more than 14 times as energy efficient." Its HERMES core (JSSC 2022) reaches 1.59 TOPS/mm².
  • TSMC's ISSCC series (fully-digital SRAM CIM): 89 TOPS/W at 22nm (ISSCC 2021), 254 TOPS/W at 5nm (ISSCC 2022), 6163 TOPS/W/b at 4nm (ISSCC 2023); plus a 96Kb dual-mode gain-cell CIM at 16nm reaching 73.3–163.3 TOPS/W and 33.2–91.2 TFLOPS/W (ISSCC 2024). On the analog ReRAM side, a Tsinghua/Taiwan team's fully-integrated analog ReRAM chip (ISSCC 2020) reached 78.4 TOPS/W for fully-parallel MAC.

Benchmark against traditional architectures: The figures above generally show 10–100× efficiency gains over CPU/GPU. But three qualifications are essential: (1) TOPS/W is mostly a macro-level peak — full-chip efficiency drops significantly once ADC/DAC, on-chip interconnect, SRAM buffering, and control logic are added; (2) the advantage holds only on MAC-heavy AI inference workloads; (3) the high TOPS/W of analog routes often comes at the cost of precision (INT4/INT8). So "the hypothesis is validated" should be read as "repeatedly confirmed at the device/circuit level on the specific workload of AI inference," not as "a universal victory in general-purpose computing."
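Qualification (1) can be illustrated with back-of-envelope arithmetic. The sketch below assumes the CIM macro delivers all of the chip's throughput while drawing only a share of its power, so full-chip TOPS/W equals macro TOPS/W times that share. The function names and the 10% share are hypothetical; the 15 and 7.3 TOPS/W inputs are the Houmo H30 AI-core and full-SoC figures quoted in this report.

```python
def full_chip_tops_per_watt(macro_tops_per_watt, macro_power_share):
    """Full-chip efficiency when the macro draws `macro_power_share` of
    total chip power and provides all of the throughput (an assumption)."""
    if not 0.0 < macro_power_share <= 1.0:
        raise ValueError("macro_power_share must be in (0, 1]")
    return macro_tops_per_watt * macro_power_share


def implied_core_power_share(core_tops_per_watt, chip_tops_per_watt):
    """Inverse view: the power share implied by a core/full-chip figure pair."""
    return chip_tops_per_watt / core_tops_per_watt


# Houmo H30 figures quoted in this report: 15 TOPS/W core, 7.3 TOPS/W full SoC.
share = implied_core_power_share(15.0, 7.3)
print(f"implied AI-core share of SoC power: {share:.0%}")

# Hypothetical: a 254 TOPS/W macro that accounts for 10% of chip power.
print(f"{full_chip_tops_per_watt(254.0, 0.10):.1f} TOPS/W at chip level")
```

The model ignores utilization and precision differences, so it is only a rough guide to how far a macro-level peak can fall once the rest of the chip is counted.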

2. Replacement Potential: A Complement/Accelerator, or a CPU Replacement?

Conclusion: Clearly an accelerator / complementary path. It threatens the GPU/NPU share of AI inference and cannot replace the CPU's general-purpose computing.

  • Suitable workloads: AI inference, matrix multiplication / convolution, the prefill (compute-intensive) stage of Transformers, CNN/LSTM, vector-matrix multiplication.
  • Unsuitable workloads: branch-heavy, control-flow-complex general-purpose computing that needs frequent random reads/writes and high-precision floating point (operating systems, database transactions, general scalar processing). This is precisely where CPUs excel and where compute-in-memory cannot replace them structurally.
  • Architectural evidence: Nearly every compute-in-memory SoC is "paired with a CPU." Witmem's WTM2101 pairs its compute-in-memory array with a RISC-V CPU for non-matrix operations; Houmo's H30 uses the Tianshu IPU but still needs a host CPU. This reflects the industry consensus: CIM serves as the accelerator, while the CPU handles control and general-purpose compute.
  • Impact on market structure: (1) On-device / edge AI inference — CIM has the best chance of replacing existing low-power NPUs/DSPs (wearables, TWS, IoT, autonomous driving, AI PCs); (2) data-center inference — d-Matrix, EnCharge, and Yizhu target the GPU's inference share (not training), competing on the TCO and energy efficiency of the "inference era"; (3) training — still essentially GPU territory; analog device precision and writability are insufficient to support training. Yang Meng, d-Matrix CEO Sid Sheth, and others all stress "inference," not "training," as CIM's main battlefield. The decoding (memory-bound) stage benefits from bandwidth, while the prefill (compute-bound) stage benefits from compute density, so CIM's advantage is more pronounced in prefill.
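The prefill/decoding distinction in the last bullet follows from arithmetic intensity, sketched below in roofline style for a conventional memory-plus-compute design. It assumes a dense layer where every weight is fetched once per pass and contributes one multiply and one add per token, and it ignores activation and KV-cache traffic. The function names and the 100 TOPS / 0.1 TB/s accelerator are hypothetical.

```python
def ops_per_weight_byte(tokens_in_flight, bytes_per_weight=1.0):
    """Arithmetic intensity of a dense layer in ops per byte of weights read.

    Each weight does 2 ops (multiply + add) per token and is fetched once
    per pass; bytes_per_weight=1.0 corresponds to INT8 weights.
    """
    return 2.0 * tokens_in_flight / bytes_per_weight


def bound(intensity, peak_tops, bandwidth_tb_per_s):
    """Roofline test against the machine balance point (ops per byte)."""
    balance = peak_tops / bandwidth_tb_per_s  # (1e12 ops/s) / (1e12 B/s)
    return "compute-bound" if intensity >= balance else "memory-bound"


# Hypothetical accelerator: 100 TOPS peak, 0.1 TB/s memory bandwidth.
decode = ops_per_weight_byte(tokens_in_flight=1)      # one token per step
prefill = ops_per_weight_byte(tokens_in_flight=1024)  # whole prompt at once
print("decode :", decode, bound(decode, 100.0, 0.1))
print("prefill:", prefill, bound(prefill, 100.0, 0.1))
```

Because decoding reuses each fetched weight only once per step, it is limited by how fast weights can be delivered, while prefill reuses each weight across the whole prompt and is limited by MAC throughput.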

3. Experimental and Commercialization Progress

On-device, low compute (genuinely in mass production):

  • Witmem (Zhicun): Based on analog NOR Flash compute-in-memory. In March 2022 it mass-produced the WTM2101, the world's first commercial compute-in-memory SoC and the first compute-in-memory chip to reach million-scale commercial mass production (40nm, a tiny 2.6 × 3.2 mm WLCSP package, ~50 GOPS compute, current draw as low as 5µA, 1.8MB of parameters, paired with RISC-V). It has shipped over 10 million units, used in TWS earphones, hearing aids, AR glasses, smartwatches, and smart home devices, with customers including Huawei, Xiaomi, and vivo. The WTM-8 series targets 4–32 TOPS (real-time 4K-8K video processing); Wang Shaodi says its compute is 800–1000× that of the second generation, reaching tens of TOPS, and it has taped out.
  • Anker Thus (see the case-study section).
  • Mythic (NOR Flash analog CIM): M1076 single chip at 25 TOPS, MP10304 PCIe card at 100 TOPS. After an architectural reset in 2024–2025, it raised an oversubscribed $125M round led by DCVC in December 2025 and pivoted to "full-stack inference," claiming 100× the efficiency of top GPUs; it is partnering with Honda on an automotive-grade SoC.
  • Syntiant: DRAM-based near-memory + analog, the NDP series (NDP101/NDP200), sub-milliwatt always-on voice/sensing, claiming a 100× efficiency improvement (NDP200 claimed at 6.4 GOP/s @ 1mW).
  • Reexen (Jiutian Ruixin): Pioneered "sense-store-compute integration" (ASP+ADC+CIM); founder Liu Hongjie holds a PhD in neuromorphic engineering from ETH Zürich. The ADA100 is in mass production and has shipped over a million units (KWS at ~150µA); the ADA200 series reaches 20 TOPS/W. The company raised a Series A of roughly RMB 100M+ in June 2022 (co-led by Vertex/Weihao Chuangxin and Pudong Kechuang).

High compute / edge and data center (early ramp-up):

  • Houmo Intelligence (founder Wu Qiang, Princeton PhD): Digital SRAM CIM. In May 2023 it released China's first compute-in-memory autonomous-driving chip, the Hongtu H30 (12nm, 256 TOPS @ INT8, typical power 35W, full-SoC efficiency 7.3 TOPS/W, AI core 15 TOPS/W, ResNet-50 batch=1/8 at 8700/10300 FPS respectively). In 2024 the M30 (100 TOPS @ 12W) landed in a China Mobile all-in-one device, and at MWC 2024 it demonstrated a 7-billion-parameter on-device LLM at 15–20 tokens/s. In July 2025 it released the Manjie M50 (160 TOPS @ INT8, 100 TFLOPS @ bFP16, up to 48GB, typical power 10W, a 7B model at 25+ tokens/s, supporting up to roughly 100-billion parameters), with the second-generation Tianxuan architecture plus the Houmo Avenue toolchain (CUDA front-end compatible, supporting SIMD/SIMT); it has partnered with Lenovo, iFlytek, and China Mobile. Funding: a strategic round of several hundred million RMB in July 2024 from the China Mobile industry-chain fund and others, with an earlier Pre-A+ from Matrix Partners, Qiming, Lake Bleu, the Lenovo sub-fund, and others.
  • d-Matrix: Digital in-memory computing (DIMC). In November 2024 (SC24) it released and shipped the Corsair chip — officially claiming a single server running Llama3 8B at "60,000 tokens/second at 1 ms/token," 2400 TFLOPs INT8 per card, 150 TB/s bandwidth, and "up to 10x faster interactive speed, 3x better performance per TCO, and 3x greater energy efficiency." In November 2025 it closed a $275M Series C at a $2B valuation (cumulative funding $450M), with participation from Temasek, Qualcomm (QIA), Microsoft M12, and others; it is partnering with Alchip on 3D DRAM (3DIMC, the Raptor chip, claimed to be 10× faster than HBM4).
  • EnCharge AI (Princeton's Naveen Verma; charge-domain analog SRAM CIM): uses capacitive coupling rather than device current to address analog noise and scalability. In 2025 it released the EN100; per EE Times, based on capacitive analog compute-in-memory, it "can achieve a power efficiency above 40 TOPS/W"; the M.2 version offers "200+ TOPS of AI compute power in an 8.25W power envelope," and the four-chip PCIe version offers roughly 1 PetaOPS @ 40W with 128GB LPDDR, claiming up to ~20× better performance-per-watt than competitors. It raised a $100M+ Series B (led by Tiger Global) in February 2025, cumulatively over $144M, originating from DARPA / DoD funding.
  • Axelera AI (Netherlands, an imec spin-out; digital in-memory D-IMC): the Metis AIPU (12nm, 214 TOPS @ INT8, 15 TOPS/W) has shipped as PCIe/M.2 cards; in October 2025 it released Europa (629 TOPS @ INT8, shipping in 1H 2026); cumulative funding over $200M (including a Samsung investment and a €61.6M EuroHPC grant to develop the Titania chiplet).
  • Yizhu Technology (founder Xiong Dapeng): China's first high-compute AI chip based on fully-digital ReRAM compute-in-memory, originating from a Tsinghua team, targeting cloud inference / data centers; its single board targets a breakthrough past 1000 TOPS, with roughly 10× energy-efficiency at 28nm, and its partner CanYuan Semiconductor has mass-produced 28nm ReRAM. Its angel round was led by Legend Star, CAS Star, and Huixin, followed by further funding of several hundred million RMB.
  • TensorChip (Qianxin): Pioneered "reconfigurable compute-in-memory" (fusing compute-in-memory with reconfigurable computing); founded by Chen Wei / Geng Yunchuan (Tsinghua), claiming >10–100 TOPS/W efficiency and >1000–4000 TOPS per card; closed tens-of-millions-RMB funding in 2022, with a prototype chip already running.
  • PIMCHIP: Based on SRAM compute-in-memory; the team comes from Tsinghua and elsewhere; the PIMCHIP-S200 brought compute-in-memory to 28nm for the first time, targeting ultra-low-power always-on voice / edge.
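As a sanity check on the figures in this list, the sketch below divides quoted throughput by quoted power. It uses only numbers stated above; the dictionary and function names are hypothetical. Note that it mixes peak throughput with "typical" power, and that "200+ TOPS" is a lower bound, so the ratios are rough and not directly comparable across vendors.

```python
# (quoted TOPS, quoted watts), taken from the bullets above.
QUOTED = {
    "Houmo H30 (256 TOPS @ INT8, 35 W typical)": (256.0, 35.0),
    "Houmo M30 (100 TOPS, 12 W)": (100.0, 12.0),
    "Houmo M50 (160 TOPS @ INT8, 10 W typical)": (160.0, 10.0),
    "EnCharge EN100 M.2 (200+ TOPS, 8.25 W)": (200.0, 8.25),
    "EnCharge EN100 PCIe (~1000 TOPS, 40 W)": (1000.0, 40.0),
}


def tops_per_watt(tops, watts):
    """Device-level efficiency implied by a throughput/power pair."""
    return tops / watts


ratios = {name: round(tops_per_watt(t, w), 1) for name, (t, w) in QUOTED.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio} TOPS/W")
```

The H30 result reproduces the 7.3 TOPS/W full-SoC figure quoted above, and the EN100 ratios land well below the ">40 TOPS/W" chip-level claim, which is consistent with the gap between macro-level and device-level efficiency discussed earlier.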

Near-memory computing (led by large vendors):

  • Samsung HBM-PIM (Aquabolt-XL): The industry's first HBM-PIM in 2021, embedding compute engines into the HBM2 die, claiming a 2× performance improvement and 70% lower power; also LPDDR5-PIM and MRAM-PIM.
  • SK Hynix GDDR6-AiM (Accelerator-in-Memory): Announced at ISSCC 2022, at 1.25V (below the usual 1.35V); paired with a CPU/GPU it runs specific computations 16× faster with up to 80% lower power; the AiMX accelerator card targets generative AI / LLMs. In late 2024 Samsung and SK Hynix jointly pushed LPDDR6-PIM standardization (JEDEC), targeting on-device AI.

4. Full-Coverage Comparison of Technical Routes

Route | Representative vendors | Maturity | Strengths | Weaknesses
--- | --- | --- | --- | ---
Near-memory computing / PNM (HBM-PIM/AiM) | Samsung, SK Hynix, Intel, IBM | Prototype / small-scale deployment | Compatible with existing architectures, bandwidth gains, easy to adopt | Still memory-compute separated, limited efficiency gain (within tens of ×), needs JEDEC standards
SRAM digital CIM | Houmo, d-Matrix, Axelera, PIMCHIP, TSMC | In mass production / shipping | Digital lossless precision, fully CMOS-compatible, refreshable weights, supports high compute | SRAM is volatile (loses data on power-off), low density, large area, leakage power
SRAM analog / charge-domain CIM | EnCharge | Taping out / initial shipment | Very high efficiency (>40–150 TOPS/W), capacitive coupling avoids device noise | Limited analog precision, ADC overhead, requires hardware-software co-design
DRAM in-/near-memory | Samsung, SK Hynix, Syntiant | Prototype / partial mass production | Large capacity, bandwidth-friendly | Process limits logic, refresh power, limited precision
NOR Flash CIM | Witmem, Mythic, Anker | Mass production | Non-volatile, mature process, ultra-low power, low cost | Low precision (≤8-bit), small compute, slow writes, suited to on-device
ReRAM/memristor CIM | Tsinghua, Yizhu, TetraMem | Lab → early mass production | Non-volatile, high density, large efficiency potential, CMOS-compatible | Device drift/variation/yield, analog precision, immature supply chain
PCM CIM | IBM | Lab prototype | Multi-level storage, non-volatile, IBM full-stack validation | High write power, drift, mass-production cost
MRAM CIM | TensorChip, Samsung | Lab / IP | High endurance, non-volatile, fast | Hard multi-level storage, small resistance window, cost

5. The Anker Case in Depth

In 2023, benchmarking against Huawei's 2012 Labs, Anker established its "2023 Lab" and simultaneously launched CIM chip development. The trigger was its audio research institute's finding that once an AI noise-reduction model grew to roughly 1.x million parameters, battery life on conventional Bluetooth chips fell to only about an hour, which was "basically unusable." The root cause was "the fundamental conflict between the characteristics of audio algorithms and the computing architecture of traditional chips." In Yang Meng's words: "At the time no company was building compute-in-memory chips for our application domain, and we had no chip-design capability. After surveying all the relevant chip companies, we chose to join hands with a compute-in-memory company that, like us, follows first principles."

Supplier confirmation: Multiple media outlets (ifanr, Leiphone, Sina Finance, 52audio, TechWeb) confirmed at the product launch on May 22, 2026, that Thus™ A1 was jointly developed by Anker and Witmem (note: at the April technical briefing it was only described as a "joint-development model" without naming the supplier; the May launch named Witmem). This is a classic "model-defines-chip" case — Anker led the model design and architectural requirements, and Witmem did the chip implementation. Notably, Witmem's mass-production route is analog NOR Flash compute-in-memory, which aligns closely with the description of Thus being NOR-Flash-based, further corroborating the supplier relationship.

Yang Meng's compute-in-memory thesis (synthesized from the Shi Talks Chips interview, CSDN interview analyses, and Sina/Leiphone reports; the video-interview points are paraphrased summaries, while the launch quotes are direct citations):

  1. Von Neumann = divide-and-conquer, separating compute from storage; "most of the power is wasted in the movement process," which is an obstacle to the development of modern computing; about 90% of AI compute energy comes from data movement. He uses the metaphor of "a couple living on two banks of a river" for the traditional architecture, and says Thus "moves the compute directly to where the model lives, so the model never has to move house again."
  2. The Transformer is a transitional state; future AI will develop in a bionic direction, imitating the human brain.
  3. Compute-in-memory is the ideal architecture for on-device large models, avoiding data movement and being closer to the human brain.
  4. Bionic algorithms / inference-while-learning: the human brain reasons while continuously learning and self-training; future algorithms will keep evolving during operation, no longer needing a separate training phase (unlike today's static large models).
  5. Strategic closed loop: "Since you must run large models, you must customize the chip; since you customize the chip, you must build the operating system" — the chip solves compute and power, and the OS (developed in parallel by the "2023 Lab") solves privacy and real-time scheduling.

Anker's 2025 R&D spend was RMB 2.893 billion (+37.2% YoY) on revenue of RMB 30.5 billion; Yang Meng says there is "no upper limit on investment in advanced fields." Thus is a three-year chip platform, and the second generation has begun full in-house development. Anker has not published a compute-in-memory technical paper; its core compute-in-memory IP comes from Witmem. Yang Meng also admits he "regrets not starting digital compute-in-memory a year earlier," because at that time the track was not yet hot and talent was easy to find.

6. Bottleneck Analysis

  1. ADC/DAC overhead (most fatal for analog routes): In ReRAM CIM, the ADC accounts for roughly 30% of area and ~50% of power, with some reports as high as 58% energy / 81% area, and its energy rises exponentially with precision; multiplexed ADCs add latency, and crossbar wire resistance causes IR drop that limits array size. Academia is mitigating this with ADC-less designs, low-bit partial-sum quantization, and memristor-adaptive ADCs (some work claims a 40–64× reduction in ADC energy).
  2. Analog precision and device non-idealities: the noise-to-signal ratio can reach 90% and device skew can exceed 60%; memristor drift, retention degradation, IR drop, device-to-device variation, and yield. These require model-driven calibration, non-ideality-aware training, and chip-in-the-loop fine-tuning to compensate. This is exactly why Yizhu, Houmo, Axelera, d-Matrix, and EnCharge have all chosen "fully digital" or "charge-domain" approaches to preserve precision.
  3. Process integration difficulty: the supply chains for emerging memories (ReRAM/PCM/MRAM) are immature; only a few players such as TSMC and CanYuan Semiconductor have achieved 28nm ReRAM mass production.
  4. Missing software toolchains and programming models: each vendor builds on a custom programming interface, with no unified standard, a fragmented ecosystem, and upper-layer software that cannot interoperate — this is the key "compilation wall" blocking large-scale adoption. Houmo built the CUDA-front-end-compatible "Houmo Avenue," Axelera built the Voyager SDK, and Witmem developed its own toolchain, but the gap with the CUDA ecosystem is obvious.
  5. Ecosystem and compatibility: poor compatibility with existing architectures, requiring model retraining/quantization, with high migration cost for customers.
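The claim in item 1 that ADC energy rises exponentially with precision can be sketched with the Walden figure of merit, under which energy per conversion is FoM × 2^bits. This is a common first-order model, not the method of any chip cited here; the function name and the 10 fJ per conversion-step value are assumed placeholders, and thermal-noise-limited converters tend to scale even more steeply with resolution.

```python
def adc_energy_per_conversion_fj(bits, fom_fj_per_step=10.0):
    """Energy per conversion in femtojoules under the Walden model.

    E = FoM * 2**bits. The default FoM is an assumed placeholder, not a
    measured value for any specific ADC.
    """
    if bits < 1:
        raise ValueError("bits must be at least 1")
    return fom_fj_per_step * (2 ** bits)


energies = {b: adc_energy_per_conversion_fj(b) for b in (4, 6, 8, 10)}
for b, e in energies.items():
    print(f"{b}-bit: {e:.0f} fJ per conversion")

# Going from 4-bit to 8-bit partial sums costs 2**4 = 16x per conversion.
print(energies[8] / energies[4])
```

This is why low-bit partial-sum quantization and ADC-less designs are attractive: each bit removed from the converter roughly halves its energy under this model.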

Recommendations

For readers tracking or positioning around compute-in-memory:

  1. Distinguish the two tracks with very different maturity: "on-device low compute" and "high compute." The on-device track (TWS, wearables, IoT, AI PC) can already be adopted in mass production; Witmem, Anker, Syntiant, and Reexen have clear, low-risk paths. If you build consumer-electronics on-device AI, you can evaluate a NOR Flash CIM solution now. High compute / data center is still early; 2026–2027 is the volume-validation window.

  2. Track three key benchmarks/thresholds to judge the technology inflection point: (a) whether full-chip (not macro) usable efficiency stably reaches more than 10× a GPU's with lossless precision; (b) whether a unified compute-in-memory software stack / compilation standard emerges (the marker of the ecosystem moving from fragmentation to convergence); (c) whether ReRAM/PCM yield and drift meet automotive-grade / data-center reliability requirements. Any breakthrough would significantly change the landscape.

  3. Investment/partnership priority: digital SRAM CIM (Houmo, d-Matrix, Axelera) is the most stable in precision and manufacturability and is the most realistic high-compute route today; charge-domain analog (EnCharge) has the highest efficiency but needs to prove precision at scale; high-compute ReRAM (Yizhu) has the greatest potential but the highest risk. For on-device, Witmem is the first choice (validated shipping).

  4. Lessons from the Anker model: "model-defines-chip" plus joint development with a specialized compute-in-memory company is a low-risk path for a terminal brand to enter compute-in-memory, gaining differentiation without building a full-stack chip team; but mind the cadence of accumulating capability (joint development for gen one, fully in-house for gen two) and the dependence on a single supplier.

  5. Beware marketing rhetoric: vendor-advertised TOPS/W is mostly peak/macro data; claims of "replacing GPU/CPU" and "100× efficiency" require asking about workload type, precision, and whether the figure is full-chip measured. Do not bet on compute-in-memory for training in the short term.

Caveats

  • The "calibration" problem with TOPS/W data: Most efficiency figures cited in this report are macro peaks or measurements under specific batch/precision conditions; full-chip end-to-end usable efficiency will be significantly below peak. For example, IBM's Nature 2023 chip reports up to 12.4 TOPS/W chip-sustained performance, but its full-system sustained estimate is lower. Direct cross-vendor comparison requires caution.
  • Anker Thus's "<1ms inference latency" could not be independently confirmed: 5 GFLOPS, 150×, NOR Flash, 4-mega parameters, and the Witmem supplier are all confirmed by multiple sources, but the "<1ms latency" mentioned in some secondhand reports was not verified in authoritative sources and should be treated as unconfirmed.
  • Some Chinese startups' compute/efficiency figures are vendor-reported or simulated (e.g., Yizhu's "1000 TOPS," TensorChip's ">1000–4000 TOPS"), not yet validated by independent third-party silicon measurement; valuations are mostly undisclosed (Houmo and Reexen disclosed only round sizes such as "several hundred million RMB" / "RMB 100M-level").
  • Yang Meng's interview views are his personal judgments and commercial positions (e.g., "the Transformer is a transitional state," "bionic algorithms replace training") — forward-looking predictions rather than realized facts, and some content comes from secondhand summaries of video interviews, with a possible bias toward endorsing his own route.
  • Houmo's paper count: only one is clearly identified — ISSCC 2024 (the "lightning-type" mixed digital-analog CIM co-authored with Southeast University / Peking University); the CEO claims the team published more than 30 international top-tier journal papers over two years, which could not be verified item by item.
  • Compute-in-memory remains a rapidly evolving field. The data in this report are current as of June 2026; new tape-outs / funding / standards progress may refresh it quickly.