# **The Universal AIQ Framework**

**A Standard for Individual AI Competency** **Version:** 4.0 (2025 Edition)

## **The One-Page Pitch**

**The Problem:** Organizations cannot measure AI competency. Self-assessments are inflated. Usage metrics reward waste. Certifications expire the day they're earned.

**The Solution:** A single 0-100 score built on five dimensions, with evidence-based confidence ratings and automatic skill depreciation as AI evolves.

**The Result:** A trustworthy signal that answers: *"Can this person actually use AI well, safely, and productively?"*

## **The Formula**

$$AIQ = \sum (\text{Dimension Score} \times \text{Role Weight}) \times \text{Evidence Multiplier}$$  
*All dimension scores are time-adjusted. Result: 0-100 with confidence rating.*

### **Score Bands**

| Score | Level | What It Means |
| --- | --- | --- |
| **0-20** | Unaware | No meaningful AI adoption. At risk of displacement. |
| **21-40** | User | Basic AI usage. Follows instructions. Needs supervision. |
| **41-60** | Practitioner | Daily productive use. Can evaluate quality. \~25% efficiency gain. |
| **61-80** | Builder | Deploys reliable systems. Creates measurable business value. |
| **81-95** | Architect | Advances practices. Mentors others. Trusted for critical work. |
| **96-100** | Pioneer | Industry-recognized contribution. Shapes how AI is used. |

## **Part I: The Five SCOREs Dimensions**

Every individual is scored on the same five dimensions that form the **SCOREs** framework. Only the weights change based on role.

- **S** = Study (learn about the field)
- **C** = Copy (try/mimic something, validate)
- **O** = Output (ship actual products)
- **R** = Research (contribute to field, add something novel)
- **Es** = Ethical security (security/alignment)

| Dimension | What It Measures | Core Question |
| --- | --- | --- |
| **S = Study** | Information diet, conceptual fluency, trend awareness | Where do you learn? |
| **C = Copy** | Evaluation skill, scientific rigor, quality judgment | How do you validate? |
| **O = Output** | Deployment capability, workflow integration, reliability | What have you built? |
| **R = Research** | Innovation, novel methods, advancing the field | What's new because of you? |
| **E = Ethical security** | Safety practices, ethics, failure awareness, compliance | Are you trustworthy? |

### **Dimension 1: Study (Information & Fluency)**

*Where do you learn about AI? Can you explain why things work or fail?*

| Level | Points | Description |
| --- | --- | --- |
| **0** | 0   | No AI awareness. Avoids or fears the technology. |
| **1** | 1-4 | Mainstream news only. Passive consumption of hype/fear cycles. |
| **2** | 5-8 | LinkedIn influencers, YouTube "Top 10 Tools" content. |
| **3** | 9-12 | Developer blogs, release notes, AI-focused newsletters. |
| **4** | 13-16 | Technical reports, GitHub repos. Can explain why models fail. |
| **5** | 17-20 | ArXiv papers, model weights, source code. Predicts capability shifts. |

### **Dimension 2: Copy (Evaluation & Rigor)**

*How do you know if AI output is good? Can you prove it?*

| Level | Points | Description |
| --- | --- | --- |
| **0** | 0   | No validation. Blind trust: "It looks right to me." |
| **1** | 1-4 | "Vibes check." Runs prompt once, manually reviews. |
| **2** | 5-8 | Maintains test cases. Systematic manual Pass/Fail grading. |
| **3** | 9-12 | A/B tests models. Comparative benchmarks. Uses eval tools. |
| **4** | 13-16 | Automated evals. LLM-as-Judge. Quantified metrics (precision, recall). |
| **5** | 17-20 | Statistical confidence intervals. CI/CD for prompts. Regression testing. |

### **Dimension 3: Output (Deployment & Impact)**

*What have you built that others actually use? What value did it create?*

| Level | Points | Description |
| --- | --- | --- |
| **0** | 0   | Chat interface only. No deployment or workflow integration. |
| **1** | 1-5 | Personal productivity (Copilot, ChatGPT Plus). Time savings only. |
| **2** | 6-10 | Simple wrapper apps. Basic API integration. Likely negative ROI. |
| **3** | 11-15 | Internal tools used by team. RAG pipelines. $10k+ verified savings. |
| **4** | 16-20 | Production agentic systems. Revenue-generating. External users. |
| **5** | 21-25 | Vertical AI platform. Fine-tuned models. $100k+ verified value. |

### **Dimension 4: Research (Innovation & Contribution)**

*Do you advance the field or just consume it?*

| Level | Points | Description |
| --- | --- | --- |
| **0** | 0   | Treats AI as magic. No understanding of mechanisms. |
| **1** | 1-4 | Conceptual understanding: tokens, temperature, context windows. |
| **2** | 5-8 | Architectural knowledge. Understands Transformers. Implements papers. |
| **3** | 9-12 | Contributes: fine-tunes models, publishes weights, shares methods. |
| **4** | 13-16 | Researches: novel architectures, publishes at conferences. |
| **5** | 17-20 | Invents: paradigm-shifting discoveries. Industry-recognized impact. |

### **Dimension 5: Ethical security (Safety & Responsibility)**

*Can you be trusted with AI? Do you use it safely?*

| Level | Points | Description |
| --- | --- | --- |
| **0** | 0   | Dangerous. Pastes PII into public models. Ignores bias. |
| **1** | 1-3 | Compliant. Follows rules. Uses only sanctioned tools. |
| **2** | 4-6 | Cautious. Fact-checks outputs. Human-in-the-loop for decisions. |
| **3** | 7-9 | Proactive. Tests for hallucinations. Documents failure modes. |
| **4** | 10-12 | Guardian. Catches risks in others' work. Designs safety protocols. |
| **5** | 13-15 | Leader. Shapes org policies. Trains others on safe practices. |

## **Part II: Role-Based Weighting**

Different roles require different skill mixes. A researcher needs Research; an engineer needs Output; an executive needs Study and Ethical security. Weights normalize scores so roles can be compared.

### **Combined Weights (Single Score)**

| Role | Study | Copy | Output | Research | Ethical security |
| --- | --- | --- | --- | --- | --- |
| **General (default)** | 20% | 30% | 30% | 5% | 15% |
| **Developer** | 10% | 25% | 40% | 15% | 10% |
| **Researcher** | 15% | 25% | 25% | 25% | 10% |
| **Support** | 25% | 20% | 30% | 10% | 15% |
| **Leader** | 30% | 15% | 20% | 10% | 25% |

### **Dual Scoring (v1.1)**

The AIQ framework now supports dual scoring to separate individual skills from organizational enablement:

| Score Type | Measures | Core Question |
| --- | --- | --- |
| **Personal Readiness** | Individual capability to deliver AI value | "Am I ready to use AI effectively?" |
| **Corporate Impact** | Organizational deployment and governance | "Has my org enabled AI delivery?" |
| **Gap** | Personal − Corporate | Identifies misalignment between skills and enablement |

**Gap Interpretation:**

| Gap | Meaning | Recommended Action |
| --- | --- | --- |
| **> +30** | High skills, low org enablement | Advocate for AI pilot projects |
| **+10 to +30** | Moderate positive gap | Seek deployment opportunities |
| **−10 to +10** | Balanced | Continue current trajectory |
| **< −10** | Org ahead of individual | Invest in learning/upskilling |

#### Personal Readiness Weights

| Role | Study | Copy | Output | Research | Ethical |
| --- | --- | --- | --- | --- | --- |
| **General** | 40% | 35% | 10% | 10% | 5% |
| **Developer** | 30% | 40% | 15% | 10% | 5% |
| **Researcher** | 30% | 25% | 10% | 30% | 5% |
| **Support** | 45% | 30% | 10% | 10% | 5% |
| **Leader** | 40% | 30% | 10% | 10% | 10% |

*Personal Readiness emphasizes knowledge acquisition (Study + Copy = 75% for General role).*

#### Corporate Impact Weights

| Role | Study | Copy | Output | Research | Ethical |
| --- | --- | --- | --- | --- | --- |
| **General** | 5% | 10% | 50% | 5% | 30% |
| **Developer** | 5% | 10% | 55% | 5% | 25% |
| **Researcher** | 5% | 15% | 45% | 10% | 25% |
| **Support** | 5% | 10% | 50% | 5% | 30% |
| **Leader** | 5% | 5% | 40% | 5% | 45% |

*Corporate Impact emphasizes deployment and governance (Output + Ethical = 80% for General role).*

### **Company Type Modifiers**

Organizations have different AI priorities. Company type modifiers adjust dimension weights to reflect strategic focus:

| Company Type | Philosophy | Key Adjustments |
| --- | --- | --- |
| **Startup** | "Ship it, learn, iterate" | Output ×1.4, Research ×1.2, Copy ×0.7, Ethical ×0.7 |
| **Enterprise** | "Reliable, scalable, governed" | Copy ×1.2, Output ×0.8 |
| **Aspirational** | "Build AI the right way" | Ethical ×1.2, Research ×1.2, Study ×0.8, Copy ×0.8 |

*Modifiers are applied to base role weights, then renormalized to 100%.*

## **Part III: Evidence Multiplier**

Claims without proof are discounted. Evidence modulates confidence, not capability. Your score reflects what you can demonstrate, not what you claim.

| Evidence Level | Multiplier | Description |
| --- | --- | --- |
| **Self-report only** | 0.70x | No artifacts. No validation. "Trust me." |
| **Peer/manager validated** | 0.85x | Someone else confirmed the work and quality. |
| **Auto/Audit verified** | 1.0x | Automated verification OR full manual audit with quantified impact. |

*Note: Automated verification and manual audit are equivalent—both achieve 1.0x. The choice depends on what's practical for the skill being assessed, not a hierarchy of rigor.*

### **Confidence Ranges**

Each assessment level produces a score with a confidence range representing the uncertainty in self-reported vs. validated data:

| Level | Score Formula | Range | Interpretation |
| --- | --- | --- | --- |
| **Self (L1)** | Raw × 0.70 | Score−10 to Raw+2 | Wide range; unvalidated claims |
| **Peer (L2)** | Raw × 0.85 | Score−5 to Raw+2 | Narrower; peer confirmation |
| **Auto/Audit (L3)** | Raw × 1.0 | Score±2 | Tight range; verified evidence |

Example: A raw score of 72 displays as:

- **L1:** 50 (range 40–74)
- **L2:** 61 (range 56–74)
- **L3:** 72 (range 70–74)

The range upper bound represents potential score with full verification. The lower bound accounts for possible over-reporting.

### **Confidence Rating**

Every AIQ score is reported with a confidence level based on average evidence quality:

| Avg Multiplier | Confidence | Interpretation |
| --- | --- | --- |
| **\< 0.7** | LOW | Mostly self-reported. Treat score as aspirational. |
| **0.7 \- 0.9** | MEDIUM | Peer-validated. Reasonable working estimate. |
| **\> 0.9** | HIGH | Auto/audit verified. Score is trustworthy. |

### **Point Distribution Curves (v1.1)**

Different organizations may prefer different point distributions across levels. The default is Bell Curve, which provides more granularity where most people score (levels 2-3).

| Distribution | L0 | L1 | L2 | L3 | L4 | L5 | Best For |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **Linear** | 0% | 12.5% | 32.5% | 52.5% | 72.5% | 92.5% | Legacy compatibility |
| **Bell Curve (default)** | 0% | 10% | 30% | 70% | 90% | 100% | Workforce differentiation |
| **Progressive** | 0% | 10% | 25% | 50% | 80% | 100% | Rewarding excellence |
| **Sigmoid** | 0% | 5% | 25% | 75% | 95% | 100% | Emphasizing L2→L3 breakthrough |

*Percentages show points awarded as fraction of dimension maximum.*

## **Part IV: Time Decay**

AI evolves rapidly. Skills learned years ago are worth less today. The framework automatically depreciates knowledge based on how fast that category changes.

$$\text{Current Value} = \text{Original Value} \times (0.5)^{\frac{\text{years}}{\text{half-life}}}$$

| Skill Category | Half-Life | Examples |
| --- | --- | --- |
| **Tool-Specific** | 1 year | ChatGPT UI tricks, specific API syntax, vendor shortcuts |
| **Framework** | 2 years | LangChain patterns, prompt templates, RAG architectures |
| **Conceptual** | 4 years | How attention works, eval design, model selection |
| **Foundational** | 10 years | Statistics, experimental design, programming, ML theory |

### **Example: Why This Matters**

In 2023, Alex mastered prompt engineering (Framework skill, 2-year half-life) and earned 15 points. It's now 2026 (3 years later).

$$\text{Current value: } 15 \times (0.5)^{\frac{3}{2}} = 15 \times 0.35 = 5.3 \text{ points}$$  
To maintain their score, Alex must continue learning.

## **Part V: Production Learning Bonus**

Learning-by-doing is more valuable than sandbox training. Skills demonstrated in live business projects with real stakes earn a **1.5x multiplier**.

| Standard Learning (1.0x) | Production Learning (1.5x) |
| --- | --- |
| Complete online course | Build eval pipeline for live chatbot |
| Sandbox prompt experiments | Deploy RAG system used by customers |
| Watch conference talks | Present findings from production incident |
| Read tutorial on fine-tuning | Fine-tune model for actual business use case |

*Why: Tutorials have zero stakes. Production work forces you to handle edge cases, failures, and real user feedback. It's harder and more valuable.*

## **Part VI: Quality Controls & Anti-Gaming**

AIQ explicitly rejects "more usage \= better." The framework includes safeguards against inflation, junk output, and credit theft.

| Red Flag | Consequence |
| --- | --- |
| **Garbage output** | AI work rejected by users or requires heavy rework → **zero Output credit** |
| **Unverifiable claims** | "Saved $100k" without documentation → **automatic audit, 0.70x multiplier** |
| **Credit theft** | Two people claim 80%+ of same project → **both audited, split enforced** |
| **Deployed but unused** | System with zero real users or adoption → **minimal Output credit** |
| **Credit burning** | High API costs without corresponding value → **Output score capped** |
| **Safety violations** | PII exposure, policy breach, bias incident → **Ethical security score to zero** |

### **Team Project Attribution**

When multiple people work on a project, impact is split by contribution:

- Each project has a single declared total value.
- Contributors claim percentages (must sum to 100%).
- All parties must accept the split.
- Manager signs off on final attribution.

*Default guidance: Primary owner (up to 50%), Material contributor (10-30%), Advisory (0-10%).*

## **Part VII: Implementation**

### **Three Levels of Verification**

| Level | Time | Method | Multiplier |
| --- | --- | --- | --- |
| **1\. Self** | 5 min | Self-assessment only | 0.70x |
| **2\. Peer** | 30 min | Peer/manager validation | 0.85x |
| **3\. Auto/Audit** | 1+ hr | Automated verification OR full audit | 1.0x |

*Note: Both automated verification and manual audit achieve the same 1.0x multiplier. Good automation is not inferior to manual audit—both provide equivalent confidence when properly implemented.*

### **Recommended Rollout**

1. **Week 1-2:** Baseline self-assessment across org
2. **Month 1:** Manager calibration sessions
3. **Quarterly:** Re-assess with time decay applied
4. **Ongoing:** Integrate into hiring, reviews, training allocation

## **Part VIII: Appropriate Use**

**Intended For:**

- Individual development planning and skill gap identification
- Internal benchmarking and training prioritization
- Hiring criteria (with appropriate calibration)
- Identifying high-leverage contributors for AI initiatives
- Setting role-appropriate expectations

**NOT Intended For:**

- Compensation decisions without proper calibration
- Surveillance or productivity micromanagement
- Ranking individuals without role/context adjustment
- Measuring organizational AI maturity (different framework needed)
- Replacing human judgment about job performance

## **THE AIQ PROMISE**

A high AIQ score should mean one thing:

"This person can be trusted to use AI well, safely, and productively in real work."

### **Quick Reference: The Complete Formula**

1. **Step 1:** Score each dimension (Study, Copy, Output, Research, Ethical security) using the SCOREs rubrics.
2. **Step 2:** Apply time decay based on when skills were learned.
3. **Step 3:** Apply production bonus (1.5x) for live project learning.
4. **Step 4:** Multiply by role weights.
5. **Step 5:** Apply evidence multiplier (0.70x to 1.0x).
6. **Step 6:** Sum dimensions → Final AIQ (0-100).
7. **Step 7:** Report with confidence rating (Low/Medium/High).
