The Mathematics of "We’re Not Sure"
Traditional scoring forces false certainty. Fuzzy mathematics lets us represent what we actually know: sometimes the answer is clear, sometimes it's not, and knowing the difference matters for decisions.
Why Fuzzy Logic?
Consider a simple question: Is this candidate’s response “good”?
In classical logic, the answer must be yes or no. True or false. 1 or 0.
But that’s not how evaluation actually works. A response might be:
- Mostly good with some weak spots
- Good in ways that some evaluators value but not others
- Good if you interpret it one way, less good if you interpret it another
- Unclear enough that reasonable people would disagree
Forcing a binary answer loses information. Forcing a single number (7.3 out of 10) creates false precision — it looks exact but hides the uncertainty underneath.
Fuzzy logic provides mathematical tools for representing partial truths and genuine uncertainty. Instead of “good” or “bad,” we can represent “73% confident it’s good, 12% confident it’s bad, 15% uncertain.”
This isn’t vagueness — it’s precision about imprecision. We’re being mathematically rigorous about the limits of what we know.
The Traditional Scoring Problem
Let’s see why traditional scoring fails with a concrete example.
Three evaluation models assess a candidate’s response:
| Model | Score (0–100) |
|---|---|
| Semantic Similarity | 82 |
| Reasoning Depth | 61 |
| Relevance | 78 |
Traditional approach: Average the scores. (82 + 61 + 78) / 3 = 73.7
The candidate gets a 74. Looks precise. But what does it mean?
The models significantly disagree. Semantic Similarity sees a strong response (82). Reasoning Depth sees a weak response (61). That’s a 21-point gap — not minor noise, but meaningfully different evaluations.
The average hides this disagreement. Someone looking at “74” has no idea that the score is contested. They might treat it with the same confidence as a score where all models agreed at 74.
Now consider a different candidate:
| Model | Score (0–100) |
|---|---|
| Semantic Similarity | 75 |
| Reasoning Depth | 73 |
| Relevance | 74 |
Average: 74
Same final score. But completely different reliability. This 74 is solid — all models agree. The previous 74 is shaky — models disagree significantly.
Traditional scoring can’t distinguish these cases. Fuzzy scoring can.
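A quick way to see the difference is to compute dispersion alongside the mean. This sketch uses Python's statistics module on the two score sets above; the variable names are ours.

```python
# Two candidates with the same average but very different model agreement.
from statistics import mean, pstdev

contested = [82, 61, 78]  # first candidate: models disagree
solid = [75, 73, 74]      # second candidate: models agree

for label, scores in (("contested", contested), ("solid", solid)):
    print(f"{label}: mean={mean(scores):.1f}, std_dev={pstdev(scores):.1f}")

# contested: mean=73.7, std_dev=9.1
# solid: mean=74.0, std_dev=0.8
```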
Introducing TR-q-ROFNs
LayersRank uses Type-Reduced q-Rung Orthopair Fuzzy Numbers (TR-q-ROFNs) to represent evaluation outcomes.
Don’t let the name intimidate you. The concept is intuitive once you see it.
The Three Components
Every evaluation has three components:
Truth (T)
The degree to which evidence supports a positive assessment. Range: 0 to 1. Higher T = stronger evidence the candidate performed well.
Falsity (F)
The degree to which evidence supports a negative assessment. Range: 0 to 1. Higher F = stronger evidence the candidate performed poorly.
Refusal (R)
The degree of uncertainty, indeterminacy, or disagreement. Range: 0 to 1. Higher R = more uncertainty about the evaluation.
The Pythagorean Constraint
These three components must satisfy:
Constraint
T² + F² + R² = 1
This constraint ensures they trade off against each other. Strong evidence for positive (high T) leaves less room for uncertainty (R must be lower). Strong negative evidence (high F) also reduces uncertainty. High uncertainty (high R) means the evidence doesn’t clearly point either direction.
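In code, the constraint means R is fully determined once T and F are fixed. A minimal sketch (the function name is ours):

```python
import math

def refusal(t: float, f: float, q: int = 2) -> float:
    """Refusal degree implied by truth t and falsity f under
    t**q + f**q + r**q = 1 (q=2 gives the Pythagorean constraint)."""
    slack = 1.0 - t**q - f**q
    if slack < 0:
        raise ValueError("t and f already violate the constraint")
    return slack ** (1.0 / q)

print(refusal(0.70, 0.25))  # ~0.67: modest evidence leaves lots of room for R
print(refusal(0.95, 0.10))  # ~0.30: strong positive evidence squeezes R down
```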
Why “Orthopair” and “q-Rung”?
“Orthopair” means we separately track positive evidence (T) and negative evidence (F) rather than collapsing them into a single scale. This matters because “no positive evidence” is different from “strong negative evidence.”
“q-Rung” refers to the mathematical generalization parameter. With q=2 (which LayersRank uses), we get the Pythagorean constraint above. Higher q values allow more extreme combinations of T and F.
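A one-line check shows how raising q widens the feasible region: the pair (T, F) = (0.9, 0.6) is inadmissible for q = 1 and q = 2 but becomes valid at q = 3.

```python
# Feasibility of (T, F) = (0.9, 0.6) under the q-rung constraint T**q + F**q <= 1
for q in (1, 2, 3):
    t, f = 0.9, 0.6
    print(f"q={q}: {t**q + f**q:.3f} <= 1 is {t**q + f**q <= 1}")
# q=1: 1.500 <= 1 is False
# q=2: 1.170 <= 1 is False
# q=3: 0.945 <= 1 is True
```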
Why “Type-Reduced”?
Type-2 fuzzy sets model uncertainty about the fuzzy membership values themselves — uncertainty about uncertainty. Type-Reduction is a process that converts these complex representations into actionable values.
For LayersRank, this means we can model disagreement between evaluation models (uncertainty about the evaluation) and then reduce it to scores and confidence levels you can actually use for decisions.
The Pipeline
From Model Scores to Fuzzy Numbers
Here’s how we convert raw model scores into fuzzy numbers.
Normalize Model Outputs
Each model produces a score on its native scale. We normalize to 0–1.
| Model | Raw Score | Normalized |
|---|---|---|
| Semantic Similarity | 0.82 | 0.82 |
| Reasoning Depth | 6.1/10 | 0.61 |
| Relevance | 0.78 | 0.78 |
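Normalization itself is a simple linear rescale. The sketch below assumes each model publishes its scale bounds; the bounds shown are taken from the table.

```python
def normalize(raw: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Linearly rescale a raw model score to [0, 1]."""
    return (raw - lo) / (hi - lo)

normalized = {
    "semantic_similarity": normalize(0.82),              # already on 0-1
    "reasoning_depth": normalize(6.1, lo=0.0, hi=10.0),  # native 0-10 scale
    "relevance": normalize(0.78),
}
print(normalized)
# {'semantic_similarity': 0.82, 'reasoning_depth': 0.61, 'relevance': 0.78}
```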
Calculate Agreement
We measure how much models agree using variance or similar dispersion metrics.
Low variance = high agreement. High variance = low agreement.
Scores: [0.82, 0.61, 0.78]
Mean: 0.737
Variance: 0.0083
Std Dev: 0.091
This is moderate disagreement — not extreme, but not tight agreement either.
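Here is the same computation in code. The document does not define the agreement_factor mapping, so the cap below is a hypothetical choice for illustration only.

```python
from statistics import mean, pstdev

scores = [0.82, 0.61, 0.78]
mu, sigma = mean(scores), pstdev(scores)
print(f"mean={mu:.3f}, variance={sigma**2:.4f}, std_dev={sigma:.3f}")
# mean=0.737, variance=0.0083, std_dev=0.091

# Hypothetical mapping from dispersion to an agreement factor in [0, 1]:
# full agreement at sigma = 0, none at or beyond SIGMA_CAP (an assumption).
SIGMA_CAP = 0.25
agreement_factor = max(0.0, 1.0 - sigma / SIGMA_CAP)
print(f"agreement_factor={agreement_factor:.2f}")  # 0.64 under this assumption
```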
Derive T, F, R
We map the normalized scores and agreement into fuzzy components:
Truth (T): Based on central tendency, weighted by agreement
T = mean_score × agreement_factor
T ≈ 0.70
Falsity (F): Based on negative evidence signals
F = (1 - mean_score) × agreement_factor
F ≈ 0.25
Refusal (R): Derived from disagreement
R = √(1 - T² - F²)
R = √(1 - 0.49 - 0.0625) = √0.4475
R ≈ 0.67
This indicates substantial uncertainty, which is appropriate given the model disagreement.
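Putting the three mappings together, a minimal sketch (the agreement factor is not specified in the text; 0.95 is chosen so that T matches the ≈ 0.70 above):

```python
import math

def to_fuzzy(mean_score: float, agreement: float) -> tuple[float, float, float]:
    """Map mean score and agreement factor to (T, F, R) using the
    formulas above. A sketch, not the production calibration."""
    t = mean_score * agreement
    f = (1.0 - mean_score) * agreement
    r = math.sqrt(max(0.0, 1.0 - t**2 - f**2))
    return t, f, r

t, f, r = to_fuzzy(0.737, 0.95)
print(f"T={t:.2f}, F={f:.2f}, R={r:.2f}")  # T=0.70, F=0.25, R=0.67
```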
Convert to Score and Confidence
The final score derives from T and F, while confidence derives from R:
Score = f(T, F, R) → calibrated to expected distributions
Confidence = 1 - R → 0.33 or 33%
T, F, and R together determine both the score and its reliability. The interval width also derives from R — higher R means wider interval.
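A hedged sketch of this final conversion. The balance-of-evidence score and the interval scale constant are assumptions, since the text only says the score is calibrated to expected distributions.

```python
def finalize(t: float, f: float, r: float) -> str:
    """Convert (T, F, R) to 'score ± interval, confidence'.
    The 100-point balance-of-evidence score and the 20 * R interval
    half-width are illustrative assumptions, not the real calibration."""
    score = round(100 * t / (t + f)) if (t + f) > 0 else 50
    half_width = round(20 * r)
    confidence = 1.0 - r
    return f"{score} ± {half_width}, {confidence:.0%} confidence"

print(finalize(0.70, 0.25, 0.67))  # 74 ± 13, 33% confidence
```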
Worked Example
When Models Agree: High Confidence
Candidate answers a system design question. Four models evaluate.
| Model | Raw Output | Interpretation |
|---|---|---|
| Semantic | 0.81 | Strong conceptual match |
| Lexical | 0.72 | Good terminology |
| Reasoning | 0.78 | Solid logical depth |
| Relevance | 0.84 | Directly addresses question |
Agreement Analysis
Mean: 0.7875
Std Dev: 0.044
Low std dev = high agreement
Fuzzy Components
T ≈ 0.82 (strong positive)
F ≈ 0.06 (weak negative)
R ≈ 0.15 (low uncertainty)
Final Output
79 ± 3, 85% confidence
The candidate scored 79. We’re quite confident about it. The true score is almost certainly between 76 and 82.
Worked Example
When Models Disagree: Handling Uncertainty
Same question, but the candidate’s response is ambiguous.
| Model | Raw Output | Interpretation |
|---|---|---|
| Semantic | 0.86 | Strong keyword match |
| Lexical | 0.79 | Good terminology |
| Reasoning | 0.52 | Shallow logic, lacks depth |
| Relevance | 0.71 | Partially addresses question |
Agreement Analysis
Mean: 0.72
Std Dev: 0.13
High std dev = low agreement
Fuzzy Components
T ≈ 0.65 (moderate positive)
F ≈ 0.20 (some negative)
R ≈ 0.45 (substantial uncertainty)
Initial Output
72 ± 9, 55% confidence
Our best guess is 72, but we’re not very confident. The true score could reasonably be anywhere from 63 to 81. Interpret with caution.
Adaptive Follow-Up Triggered
Because R (0.45) exceeds our threshold (0.25), the system generates a follow-up question:
“You mentioned several design considerations. Can you walk through your reasoning for how you’d handle failure scenarios?”
The candidate responds with more depth. Reasoning model now scores 0.71 instead of 0.52. Agreement improves. R drops to 0.18.
Revised Output
76 ± 4, 82% confidence
The follow-up resolved the ambiguity. We now have a reliable assessment.
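The control flow for this loop is simple. In the sketch below, evaluate and ask_follow_up are injected callables standing in for the real pipeline, and the follow-up cap is an assumption to guarantee termination.

```python
R_THRESHOLD = 0.25   # from the text: a follow-up fires when R exceeds 0.25
MAX_FOLLOW_UPS = 2   # assumed cap so the loop always terminates

def assess(response, evaluate, ask_follow_up):
    """evaluate(response) -> (T, F, R); ask_follow_up(response) -> str.
    Re-evaluates after each clarification until R is acceptable."""
    t, f, r = evaluate(response)
    for _ in range(MAX_FOLLOW_UPS):
        if r <= R_THRESHOLD:
            break
        response = response + "\n" + ask_follow_up(response)
        t, f, r = evaluate(response)
    return t, f, r
```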
Impact
Why This Matters for Hiring
Better Decision-Making
When confidence is high, you can act decisively. When confidence is low, you know to gather more information or weight the score less heavily. A traditional “74” gives you nothing to work with. A “74 ± 9, 55% confidence” tells you exactly how much to trust it.
Fairer Evaluation
Ambiguous responses get flagged rather than arbitrarily scored. The candidate gets a chance to clarify via adaptive follow-up rather than being penalized for one unclear answer. This is especially important for candidates whose communication style differs from the training data.
Auditability
Every score has a mathematical derivation. T, F, and R values are logged. Model scores are recorded. If someone asks “why did this candidate get 76?”, you can trace through the exact calculation. This matters for compliance, candidate feedback, and continuous improvement.
Technical Details
For those who want the precise mathematics.
The q-ROFS Definition
A q-rung orthopair fuzzy set A on universe X is:
Formal Definition
A = {⟨x, T_A(x), F_A(x)⟩ | x ∈ X}
Where:
T_A: X → [0,1] (truth membership function)
F_A: X → [0,1] (falsity membership function)
Constraint:
(T_A(x))^q + (F_A(x))^q ≤ 1
For q = 2 (Pythagorean fuzzy sets):
T² + F² ≤ 1
Refusal degree:
R = √(1 - T² - F²)
Type-Reduction
For Type-2 fuzzy sets where membership values are themselves fuzzy, type-reduction converts to Type-1 via centroid or other methods.
In LayersRank, the “Type-2” aspect comes from model disagreement — we have uncertainty about the evaluation itself. Type-reduction aggregates the multiple model perspectives into a single fuzzy number for each response.
Aggregation Operators
To combine multiple fuzzy evaluations (across questions, across dimensions), we use generalized aggregation operators that preserve the fuzzy structure:
Weighted Aggregation
For two TR-q-ROFNs:
α₁ = (T₁, F₁, R₁)
α₂ = (T₂, F₂, R₂)
Weighted average (with w₁ + w₂ = 1):
T_agg = w₁T₁ + w₂T₂
F_agg = w₁F₁ + w₂F₂
R_agg = √(1 - T_agg² - F_agg²)
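As a sketch, generalized to n inputs (weights assumed to sum to 1):

```python
import math

def aggregate(fuzzy_numbers, weights):
    """Weighted average of TR-q-ROFNs (T, F, R) per the formulas above.
    Assumes weights sum to 1; R is recomputed from the aggregated T, F."""
    t = sum(w * tn for (tn, _, _), w in zip(fuzzy_numbers, weights))
    f = sum(w * fn for (_, fn, _), w in zip(fuzzy_numbers, weights))
    r = math.sqrt(max(0.0, 1.0 - t**2 - f**2))
    return t, f, r

# Combining the two worked examples above with equal weight:
print(aggregate([(0.82, 0.06, 0.15), (0.65, 0.20, 0.45)], [0.5, 0.5]))
# (0.735, 0.13, ~0.67)
```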
More sophisticated operators (Einstein, Hamacher) can provide different aggregation behaviors for specific use cases.
Further Reading
Foundational Papers
- Zadeh, L.A. (1965). Fuzzy sets. Information and Control.
- Yager, R.R. (2017). Generalized orthopair fuzzy sets. IEEE Transactions on Fuzzy Systems.
- Atanassov, K.T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems.
Applications to Decision-Making
- Liu, P. & Wang, P. (2018). Multiple-attribute decision-making based on q-rung orthopair fuzzy aggregation operators.
- Wei, G. et al. (2019). Multiple attribute decision making with q-rung orthopair fuzzy information.
Mathematics That Serves Decisions
The fuzzy framework isn't an academic exercise; it's the foundation for hiring decisions you can trust and defend. See how it works in practice.