By David Champion
How Naitur Score Helps Psychiatric Researchers Automate Questionnaire Scoring
Naitur Score helps psychological and psychiatric researchers extract scoring rules from questionnaire PDFs, match them to participant datasets, and compute validated scores with measurable accuracy.

Psychological research often depends on validated questionnaires: instruments like the PHQ-9, GAD-7, MADRS, and other clinical scales. These tools are essential, but the work behind them is often manual and time-consuming.
Researchers have to extract scoring rules from published questionnaire documents, identify the right columns in participant datasets, account for reverse-coded items, calculate subscales, and validate results against expected score ranges. It is repetitive work, but it is also high-stakes. A small scoring error can affect the quality of analysis, participant classification, or downstream clinical decisions.
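To make those manual steps concrete, here is a minimal Python sketch of the arithmetic involved: range-checking each response, reverse-coding, subscale sums, and a check on the total. The `MeasureSpec` class, its field names, and the column names are hypothetical illustrations, not Naitur Score's actual schema. The PHQ-9 facts used (nine items scored 0 to 3, total 0 to 27) are standard for that instrument; the four-item scale is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MeasureSpec:
    """Hypothetical scoring spec; field names are illustrative only."""
    items: list[str]
    min_value: int
    max_value: int
    reverse_items: set[str] = field(default_factory=set)
    subscales: dict[str, list[str]] = field(default_factory=dict)


def score(spec: MeasureSpec, responses: dict[str, int]) -> dict[str, int]:
    """Return the total and any subscale scores for one participant."""
    coded = {}
    for item in spec.items:
        value = responses[item]
        if not spec.min_value <= value <= spec.max_value:
            raise ValueError(f"{item}: {value} is outside the response range")
        # Reverse-coding mirrors the value within the response range.
        if item in spec.reverse_items:
            value = spec.min_value + spec.max_value - value
        coded[item] = value

    scores = {"total": sum(coded.values())}
    for name, members in spec.subscales.items():
        scores[name] = sum(coded[m] for m in members)

    # Guard: the total must fall inside the range the measure allows.
    low = spec.min_value * len(spec.items)
    high = spec.max_value * len(spec.items)
    if not low <= scores["total"] <= high:
        raise ValueError("total score is outside the expected range")
    return scores


# PHQ-9: nine items scored 0-3, no reverse-coded items, total range 0-27.
phq9 = MeasureSpec(items=[f"phq9_{i}" for i in range(1, 10)], min_value=0, max_value=3)
phq9_scores = score(phq9, {f"phq9_{i}": 1 for i in range(1, 10)})

# An invented four-item scale with one reverse-coded item and two subscales.
demo = MeasureSpec(
    items=["q1", "q2", "q3", "q4"],
    min_value=1,
    max_value=5,
    reverse_items={"q2"},
    subscales={"a": ["q1", "q2"], "b": ["q3", "q4"]},
)
demo_scores = score(demo, {"q1": 5, "q2": 1, "q3": 3, "q4": 2})
```

Even in this small form, a single missed reverse-coded item would shift the total without raising any error, which is the kind of silent mistake that makes manual scoring risky.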
Naitur Score was built to make that process faster, more reliable, and easier to validate.
Instead of treating AI as a black box, Naitur Score breaks the scoring workflow into clear steps: extracting measure structure from PDFs, matching questionnaire items to dataset columns, inferring response values, validating columns, detecting multiple measures in a dataset, and identifying participant IDs. Each step is tested independently so performance can be measured, improved, and trusted.
During development, Naitur evaluated 15 AI models from 7 providers across more than 7,400 API calls and 23.5 million tokens. The result was a tiered architecture: fast, lower-cost models handle the simpler structured tasks, while a premium reasoning model is reserved for the hardest part, extracting scoring logic from arbitrary questionnaire PDFs.
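One way to picture a tiered setup like this is a simple routing table from task to model tier. The sketch below is an assumption-laden illustration: the task identifiers are hypothetical names derived from the workflow steps described earlier, and it assumes every step other than PDF extraction counts as a "simpler structured task". It does not name any real model or provider.

```python
from enum import Enum


class Tier(Enum):
    FAST = "fast"        # lower-cost model for structured tasks
    PREMIUM = "premium"  # reasoning model for the hardest task


# Hypothetical task names; the assignment of the non-PDF steps
# to the fast tier is an assumption made for this sketch.
TASK_TIERS = {
    "extract_measure_structure": Tier.PREMIUM,
    "match_items_to_columns": Tier.FAST,
    "infer_response_values": Tier.FAST,
    "validate_columns": Tier.FAST,
    "detect_multiple_measures": Tier.FAST,
    "identify_participant_id": Tier.FAST,
}


def tier_for(task: str) -> Tier:
    """Look up the tier for a task, failing loudly on unknown task names."""
    try:
        return TASK_TIERS[task]
    except KeyError:
        raise ValueError(f"no tier configured for task {task!r}") from None
```

Keeping the routing in plain data means a task can be moved to a different tier after re-evaluation without touching the code that calls the models.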
This matters because clinical research cannot rely on “usually correct.” Naitur’s evaluation process defines clear accuracy thresholds, uses holdout validation to prevent overfitting, and continuously adds new edge cases as the system encounters real-world usage.
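The general shape of threshold-based evaluation with a holdout set can be sketched in a few lines of Python. Everything here is illustrative: the test cases, the toy `looks_like_id` predictor standing in for a model call, and the threshold values are assumptions, since the summary does not state Naitur's actual thresholds or test data.

```python
import random


def split_holdout(cases, holdout_fraction=0.2, seed=0):
    """Shuffle reproducibly and set aside cases that are never used for tuning."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * holdout_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict, cases):
    """Fraction of (input, expected) pairs the predictor gets right."""
    correct = sum(1 for inputs, expected in cases if predict(inputs) == expected)
    return correct / len(cases)


def passes(predict, cases, threshold):
    """A step passes only if it meets its accuracy threshold."""
    return accuracy(predict, cases) >= threshold


# Toy cases for one step: does this column name hold participant IDs?
CASES = [
    ("participant_id", True), ("subject_id", True), ("ID", True),
    ("record_id", True), ("pid", True), ("phq9_1", False),
    ("gad7_3", False), ("age", False), ("madrs_total", False),
    ("idea_rating", False),
]


def looks_like_id(column_name):
    """Toy stand-in for a model call; it deliberately misses 'pid'."""
    return "id" in column_name.lower().split("_")


dev_cases, holdout_cases = split_holdout(CASES)
overall = accuracy(looks_like_id, CASES)
```

A missed case such as `pid` is the kind of edge case that would be added to the suite, so that later changes are measured against it.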
Privacy is also central to the design. Individual participant responses are not sent to an LLM during scoring. AI is used for preparation work, such as understanding questionnaire structure and matching columns, while final participant-level scoring is deterministic and based on validated formulas.
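The separation described here can be illustrated with a short Python sketch in which the preparation stage yields only metadata and the scoring stage is plain arithmetic over rows. The `prep` dictionary, its keys, the column names, and the sample data are all hypothetical; how Naitur Score actually represents this hand-off is not specified in this summary.

```python
import csv
import io

# Assumed output of the AI-assisted preparation stage: metadata only,
# with no participant responses in it (hypothetical shape).
prep = {
    "id_column": "record_id",
    "item_columns": ["Q1", "Q2", "Q3"],
}


def score_rows(rows, prep):
    """Sum the mapped item columns per participant; no model call involved."""
    return {
        row[prep["id_column"]]: sum(int(row[col]) for col in prep["item_columns"])
        for row in rows
    }


# Invented sample data standing in for a participant dataset.
DATA = "record_id,Q1,Q2,Q3\nP01,0,1,2\nP02,3,3,1\n"
rows = list(csv.DictReader(io.StringIO(DATA)))
totals = score_rows(rows, prep)
```

Because `score_rows` depends only on its inputs, running it twice on the same data gives the same totals, which is what makes the participant-level results auditable.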
For researchers, the benefit is straightforward: fewer hours spent on manual scoring, fewer opportunities for human error, and a more consistent way to work with validated psychological measures at scale.
Naitur Score is currently focused on psychiatric and psychological research workflows where accuracy, auditability, and privacy matter. As the system expands, each new measure and edge case strengthens the platform’s evaluation suite and improves future performance.
For a deeper look at the evaluation process, model testing, privacy architecture, and technical results, read David Champion’s full breakdown here: https://davidchampion.substack.com/p/naitur-scoring-engine-for-psychiatric
