The Mathematics of "We’re Not Sure"
Traditional scoring forces false certainty. Fuzzy mathematics lets us represent what we actually know: sometimes the answer is clear, sometimes it's not, and knowing the difference matters for decisions.
Why Fuzzy Logic?
Consider a simple question: Is this candidate’s response “good”?
In classical logic, the answer must be yes or no. True or false. 1 or 0.
But that’s not how evaluation actually works. A response might be:
- Mostly good with some weak spots
- Good in ways that some evaluators value but not others
- Good if you interpret it one way, less good if you interpret it another
- Unclear enough that reasonable people would disagree
Forcing a binary answer loses information. Forcing a single number (7.3 out of 10) creates false precision — it looks exact but hides the uncertainty underneath.
Fuzzy logic provides mathematical tools for representing partial truths and genuine uncertainty. Instead of “good” or “bad,” we can represent “73% confident it’s good, 12% confident it’s bad, 15% uncertain.”
This isn’t vagueness — it’s precision about imprecision. We’re being mathematically rigorous about the limits of what we know.
The Traditional Scoring Problem
Let’s see why traditional scoring fails with a concrete example.
Three evaluation models assess a candidate’s response:
| Model | Score (0–100) |
|---|---|
| Semantic Similarity | 82 |
| Reasoning Depth | 61 |
| Relevance | 78 |
Traditional approach: Average the scores. (82 + 61 + 78) / 3 = 73.7
The candidate gets a 74. Looks precise. But what does it mean?
The models significantly disagree. Semantic Similarity sees a strong response (82). Reasoning Depth sees a weak response (61). That’s a 21-point gap — not minor noise, but meaningfully different evaluations.
The average hides this disagreement. Someone looking at “74” has no idea that the score is contested. They might treat it with the same confidence as a score where all models agreed at 74.
Now consider a different candidate:
| Model | Score (0–100) |
|---|---|
| Semantic Similarity | 75 |
| Reasoning Depth | 73 |
| Relevance | 74 |
Average: 74
Same final score. But completely different reliability. This 74 is solid — all models agree. The previous 74 is shaky — models disagree significantly.
Traditional scoring can’t distinguish these cases. Fuzzy scoring can.
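A quick way to see the difference is to compute dispersion alongside the mean. This sketch uses Python's statistics module on the two score sets above; the variable names are ours.

```python
# Two candidates with the same average but very different model agreement.
from statistics import mean, pstdev

contested = [82, 61, 78]  # first candidate: models disagree
solid = [75, 73, 74]      # second candidate: models agree

for label, scores in (("contested", contested), ("solid", solid)):
    print(f"{label}: mean={mean(scores):.1f}, std_dev={pstdev(scores):.1f}")

# contested: mean=73.7, std_dev=9.1
# solid: mean=74.0, std_dev=0.8
```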
Introducing TR-q-ROFNs
LayersRank uses Type-Reduced q-Rung Orthopair Fuzzy Numbers (TR-q-ROFNs) to represent evaluation outcomes.
Don’t let the name intimidate you. The concept is intuitive once you see it.
The Three Components
Every evaluation has three components:
Truth (T)
The degree to which evidence supports a positive assessment. Range: 0 to 1. Higher T = stronger evidence the candidate performed well.
Falsity (F)
The degree to which evidence supports a negative assessment. Range: 0 to 1. Higher F = stronger evidence the candidate performed poorly.
Refusal (R)
The degree of uncertainty, indeterminacy, or disagreement. Range: 0 to 1. Higher R = more uncertainty about the evaluation.
The Pythagorean Constraint
These three components must satisfy:
Constraint
T² + F² + R² = 1
This constraint ensures they trade off against each other. Strong evidence for positive (high T) leaves less room for uncertainty (R must be lower). Strong negative evidence (high F) also reduces uncertainty. High uncertainty (high R) means the evidence doesn’t clearly point either direction.
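In code, the constraint means R is fully determined once T and F are fixed. A minimal sketch (the function name is ours):

```python
import math

def refusal(t: float, f: float, q: int = 2) -> float:
    """Refusal degree implied by truth t and falsity f under
    t**q + f**q + r**q = 1 (q=2 gives the Pythagorean constraint)."""
    slack = 1.0 - t**q - f**q
    if slack < 0:
        raise ValueError("t and f already violate the constraint")
    return slack ** (1.0 / q)

print(refusal(0.70, 0.25))  # ~0.67: modest evidence leaves lots of room for R
print(refusal(0.95, 0.10))  # ~0.30: strong positive evidence squeezes R down
```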
Why “Orthopair” and “q-Rung”?
“Orthopair” means we separately track positive evidence (T) and negative evidence (F) rather than collapsing them into a single scale. This matters because “no positive evidence” is different from “strong negative evidence.”
“q-Rung” refers to the mathematical generalization parameter. With q=2 (which LayersRank uses), we get the Pythagorean constraint above. Higher q values allow more extreme combinations of T and F.
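A one-line check shows how raising q widens the feasible region: the pair (T, F) = (0.9, 0.6) is inadmissible for q = 1 and q = 2 but becomes valid at q = 3.

```python
# Feasibility of (T, F) = (0.9, 0.6) under the q-rung constraint T**q + F**q <= 1
for q in (1, 2, 3):
    t, f = 0.9, 0.6
    print(f"q={q}: {t**q + f**q:.3f} <= 1 is {t**q + f**q <= 1}")
# q=1: 1.500 <= 1 is False
# q=2: 1.170 <= 1 is False
# q=3: 0.945 <= 1 is True
```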
Why “Type-Reduced”?
Type-2 fuzzy sets model uncertainty about the fuzzy membership values themselves — uncertainty about uncertainty. Type-Reduction is a process that converts these complex representations into actionable values.
For LayersRank, this means we can model disagreement between evaluation models (uncertainty about the evaluation) and then reduce it to scores and confidence levels you can actually use for decisions.
The Pipeline
From Model Scores to Fuzzy Numbers
Here’s how we convert raw model scores into fuzzy numbers.
Normalize Model Outputs
Each model produces a score on its native scale. We normalize to 0–1.
| Model | Raw Score | Normalized |
|---|---|---|
| Semantic Similarity | 0.82 | 0.82 |
| Reasoning Depth | 6.1/10 | 0.61 |
| Relevance | 0.78 | 0.78 |
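Normalization itself is a simple linear rescale. The sketch below assumes each model publishes its scale bounds; the bounds shown are taken from the table.

```python
def normalize(raw: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Linearly rescale a raw model score to [0, 1]."""
    return (raw - lo) / (hi - lo)

normalized = {
    "semantic_similarity": normalize(0.82),              # already on 0-1
    "reasoning_depth": normalize(6.1, lo=0.0, hi=10.0),  # native 0-10 scale
    "relevance": normalize(0.78),
}
print(normalized)
# {'semantic_similarity': 0.82, 'reasoning_depth': 0.61, 'relevance': 0.78}
```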
Calculate Agreement
We measure how much models agree using variance or similar dispersion metrics.
Low variance = high agreement. High variance = low agreement.
Scores: [0.82, 0.61, 0.78]
Mean: 0.737
Variance: 0.0083
Std Dev: 0.091
This is moderate disagreement — not extreme, but not tight agreement either.
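Here is the same computation in code. The document does not define the agreement_factor mapping, so the cap below is a hypothetical choice for illustration only.

```python
from statistics import mean, pstdev

scores = [0.82, 0.61, 0.78]
mu, sigma = mean(scores), pstdev(scores)
print(f"mean={mu:.3f}, variance={sigma**2:.4f}, std_dev={sigma:.3f}")
# mean=0.737, variance=0.0083, std_dev=0.091

# Hypothetical mapping from dispersion to an agreement factor in [0, 1]:
# full agreement at sigma = 0, none at or beyond SIGMA_CAP (an assumption).
SIGMA_CAP = 0.25
agreement_factor = max(0.0, 1.0 - sigma / SIGMA_CAP)
print(f"agreement_factor={agreement_factor:.2f}")  # 0.64 under this assumption
```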
Derive T, F, R
We map the normalized scores and agreement into fuzzy components:
Truth (T): Based on central tendency, weighted by agreement
T = mean_score × agreement_factor
T ≈ 0.70
Falsity (F): Based on negative evidence signals
F = (1 - mean_score) × agreement_factor
F ≈ 0.25
Refusal (R): Derived from disagreement
R = √(1 - T² - F²)
R = √(1 - 0.49 - 0.0625) = √0.4475
R ≈ 0.67
This indicates substantial uncertainty, which is appropriate given the model disagreement.
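Putting the three mappings together, a minimal sketch (the agreement factor is not specified in the text; 0.95 is chosen so that T matches the ≈ 0.70 above):

```python
import math

def to_fuzzy(mean_score: float, agreement: float) -> tuple[float, float, float]:
    """Map mean score and agreement factor to (T, F, R) using the
    formulas above. A sketch, not the production calibration."""
    t = mean_score * agreement
    f = (1.0 - mean_score) * agreement
    r = math.sqrt(max(0.0, 1.0 - t**2 - f**2))
    return t, f, r

t, f, r = to_fuzzy(0.737, 0.95)
print(f"T={t:.2f}, F={f:.2f}, R={r:.2f}")  # T=0.70, F=0.25, R=0.67
```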
Convert to Score and Confidence
The final score derives from T and F, while confidence derives from R:
Score = f(T, F, R) → calibrated to expected distributions
Confidence = 1 - R → 0.33 or 33%
T, F, and R together determine both the score and its reliability. The interval width also derives from R — higher R means wider interval.
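A hedged sketch of this final conversion. The balance-of-evidence score and the interval scale constant are assumptions, since the text only says the score is calibrated to expected distributions.

```python
def finalize(t: float, f: float, r: float) -> str:
    """Convert (T, F, R) to 'score ± interval, confidence'.
    The 100-point balance-of-evidence score and the 20 * R interval
    half-width are illustrative assumptions, not the real calibration."""
    score = round(100 * t / (t + f)) if (t + f) > 0 else 50
    half_width = round(20 * r)
    confidence = 1.0 - r
    return f"{score} ± {half_width}, {confidence:.0%} confidence"

print(finalize(0.70, 0.25, 0.67))  # 74 ± 13, 33% confidence
```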
Worked Example
When Models Agree: High Confidence
Candidate answers a system design question. Four models evaluate.
| Model | Raw Output | Interpretation |
|---|---|---|
| Semantic | 0.81 | Strong conceptual match |
| Lexical | 0.72 | Good terminology |
| Reasoning | 0.78 | Solid logical depth |
| Relevance | 0.84 | Directly addresses question |
Agreement Analysis
Mean: 0.7875
Std Dev: 0.044
Low std dev = high agreement
Fuzzy Components
T ≈ 0.82 (strong positive)
F ≈ 0.06 (weak negative)
R ≈ 0.15 (low uncertainty)
Final Output
79 ± 3, 85% confidence
The candidate scored 79. We’re quite confident about it. The true score is almost certainly between 76 and 82.
Worked Example
When Models Disagree: Handling Uncertainty
Same question, but the candidate’s response is ambiguous.
| Model | Raw Output | Interpretation |
|---|---|---|
| Semantic | 0.86 | Strong keyword match |
| Lexical | 0.79 | Good terminology |
| Reasoning | 0.52 | Shallow logic, lacks depth |
| Relevance | 0.71 | Partially addresses question |
Agreement Analysis
Mean: 0.72
Std Dev: 0.13
High std dev = low agreement
Fuzzy Components
T ≈ 0.65 (moderate positive)
F ≈ 0.20 (some negative)
R ≈ 0.45 (substantial uncertainty)
Initial Output
72 ± 9, 55% confidence
Our best guess is 72, but we’re not very confident. The true score could reasonably be anywhere from 63 to 81. Interpret with caution.
Adaptive Follow-Up Triggered
Because R (0.45) exceeds our threshold (0.25), the system generates a follow-up question:
“You mentioned several design considerations. Can you walk through your reasoning for how you’d handle failure scenarios?”
The candidate responds with more depth. Reasoning model now scores 0.71 instead of 0.52. Agreement improves. R drops to 0.18.
Revised Output
76 ± 4, 82% confidence
The follow-up resolved the ambiguity. We now have a reliable assessment.
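The control flow for this loop is simple. In the sketch below, evaluate and ask_follow_up are injected callables standing in for the real pipeline, and the follow-up cap is an assumption to guarantee termination.

```python
R_THRESHOLD = 0.25   # from the text: a follow-up fires when R exceeds 0.25
MAX_FOLLOW_UPS = 2   # assumed cap so the loop always terminates

def assess(response, evaluate, ask_follow_up):
    """evaluate(response) -> (T, F, R); ask_follow_up(response) -> str.
    Re-evaluates after each clarification until R is acceptable."""
    t, f, r = evaluate(response)
    for _ in range(MAX_FOLLOW_UPS):
        if r <= R_THRESHOLD:
            break
        response = response + "\n" + ask_follow_up(response)
        t, f, r = evaluate(response)
    return t, f, r
```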
Impact
Why This Matters for Hiring
Better Decision-Making
When confidence is high, you can act decisively. When confidence is low, you know to gather more information or weight the score less heavily. A traditional “74” gives you nothing to work with. A “74 ± 9, 55% confidence” tells you exactly how much to trust it.
Fairer Evaluation
Ambiguous responses get flagged rather than arbitrarily scored. The candidate gets a chance to clarify via adaptive follow-up rather than being penalized for one unclear answer. This is especially important for candidates whose communication style differs from the training data.
Auditability
Every score has a mathematical derivation. T, F, and R values are logged. Model scores are recorded. If someone asks “why did this candidate get 76?”, you can trace through the exact calculation. This matters for compliance, candidate feedback, and continuous improvement.
Technical Details
For those who want the precise mathematics.
The q-ROFS Definition
A q-rung orthopair fuzzy set A on universe X is:
Formal Definition
A = {⟨x, T_A(x), F_A(x)⟩ | x ∈ X}
Where:
T_A: X → [0,1] (truth membership function)
F_A: X → [0,1] (falsity membership function)
Constraint:
(T_A(x))^q + (F_A(x))^q ≤ 1
For q = 2 (Pythagorean fuzzy sets):
T² + F² ≤ 1
Refusal degree:
R = √(1 - T² - F²)
Type-Reduction
For Type-2 fuzzy sets where membership values are themselves fuzzy, type-reduction converts to Type-1 via centroid or other methods.
In LayersRank, the “Type-2” aspect comes from model disagreement — we have uncertainty about the evaluation itself. Type-reduction aggregates the multiple model perspectives into a single fuzzy number for each response.
Aggregation Operators
To combine multiple fuzzy evaluations (across questions, across dimensions), we use generalized aggregation operators that preserve the fuzzy structure:
Weighted Aggregation
For two TR-q-ROFNs:
α₁ = (T₁, F₁, R₁)
α₂ = (T₂, F₂, R₂)
Weighted average (with w₁ + w₂ = 1):
T_agg = w₁T₁ + w₂T₂
F_agg = w₁F₁ + w₂F₂
R_agg = √(1 - T_agg² - F_agg²)
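As a sketch, generalized to n inputs (weights assumed to sum to 1):

```python
import math

def aggregate(fuzzy_numbers, weights):
    """Weighted average of TR-q-ROFNs (T, F, R) per the formulas above.
    Assumes weights sum to 1; R is recomputed from the aggregated T, F."""
    t = sum(w * tn for (tn, _, _), w in zip(fuzzy_numbers, weights))
    f = sum(w * fn for (_, fn, _), w in zip(fuzzy_numbers, weights))
    r = math.sqrt(max(0.0, 1.0 - t**2 - f**2))
    return t, f, r

# Combining the two worked examples above with equal weight:
print(aggregate([(0.82, 0.06, 0.15), (0.65, 0.20, 0.45)], [0.5, 0.5]))
# (0.735, 0.13, ~0.67)
```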
More sophisticated operators (Einstein, Hamacher) can provide different aggregation behaviors for specific use cases.
Further Reading
Foundational Papers
- Zadeh, L.A. (1965). Fuzzy sets. Information and Control.
- Yager, R.R. (2017). Generalized orthopair fuzzy sets. IEEE Transactions on Fuzzy Systems.
- Atanassov, K.T. (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems.
Applications to Decision-Making
- Liu, P. & Wang, P. (2018). Multiple-attribute decision-making based on q-rung orthopair fuzzy aggregation operators.
- Wei, G. et al. (2019). Multiple attribute decision making with q-rung orthopair fuzzy information.
Mathematics That Serves Decisions
The fuzzy framework isn't an academic exercise; it's the foundation for hiring decisions you can trust and defend. See how it works in practice.