Derya Unutmaz, MD avatar

Derya Unutmaz, MD

@DeryaTR_

8/16/2025, 10:47:46 PM

I asked GPT-5 pro to define 5 levels of AGI, with year by year milestones. In my opinion this was the best definition of AGI levels of any model!

L1 Broad Assistant (target: this year)

•Capability: Pass a 50‑domain open‑book task battery with ≥ 85 percent exact‑match or judged correctness. Includes code generation plus live tool use (search, spreadsheets, Python, simple databases).

L2 Reliable Operator (target: 2026)

•Capability: Owns multi‑day projects with sparse supervision. Toolchain includes code repos, CI, vector DBs, LIMS or ELN, web RPA, and structured memory.

L3 Autonomous Researcher (target: early to mid‑2027)

•Capability: Turns ambiguous goals into testable hypotheses, runs simulations or wet‑lab automation through APIs, designs experiments, updates priors, and publishes reports with reproducible code and data.

L4 Strategic Orchestrator (target: late 2027)

•Capability: Sets multi‑month roadmaps, allocates budget, hires or routes to contractors, negotiates with vendors, and prioritizes across programs. Produces competitive strategy and risk analysis with scenario planning.

L5 Open‑World Innovator (target: 2028)

•Capability: Superhuman synthesis and innovation across science, engineering, product, and policy. Generates original theories and architectures, runs cross‑domain programs, and can found and operate complex organizations under guardrails.

Year‑by‑year milestones

•2025 (L1 achieved)
•Ship Broad Assistant that passes your Phase A suite.
•Stand up secure tool gateways and immutable logging.
•Run limited Phase B pilots on internal projects.
•2026 (reach L2)
•Expand to multi‑day agents with structured memory and approvals.
•Achieve the 95 percent reliability target and sub‑incident rate.
•Begin Phase C pilots with capped external impact.
•Early to mid‑2027 (reach L3)
•Add experiment design and execution loops with safe lab integrations.
•Demonstrate weeks‑long autonomy with milestone reviews and robust recovery.
•Publish red‑team results and mitigation coverage.
•Late 2027 (reach L4)
•Demonstrate portfolio orchestration, budgeting, and vendor negotiations.
•Integrate formal methods on safety‑critical components.
•Establish independent governance board with shutdown authority.
•2028 (reach L5)
•Pass open‑world trials with superhuman innovation metrics and negligible safety incidents.
•Deploy multi‑layer tripwires and constitutional control with live oversight.
•Publish public safety and performance reports.
Share
Explore

TwitterXDownload

v1.2.1

The fastest and most reliable Twitter video downloader. Free to use, no registration required.

© 2024 TwitterXDownload All rights reserved.