Supplementary Material for: Evaluating Hymenoptera Venom Allergy Severity: A Data-Centric Comparison of Grading Instruments

posted on 2024-02-10, 09:47 authored by Kačar M., Košnik M.
Introduction: While a consensus seems to have been reached on the definition of anaphylaxis, there is no universal instrument for scoring allergic reaction severity, despite more than 30 having been proposed at the time of writing. This severely hampers comparison of data between studies. While scales have been compared with regard to their utility in grading food-related reactions, no such comparisons have been made for Hymenoptera venom-associated reactions.
Methods: A retrospective analysis compared the severity of reactions in 104 participants with suspected Hymenoptera venom allergy. Six grading instruments were applied to each reaction and evaluated against the NIAID/FAAN anaphylaxis criteria. Sensitivity, specificity, and the area under the receiver operating characteristic curve (ROC AUC) for identifying anaphylaxis were calculated. Severity scales were simplified into "mild," "moderate," and "severe" categories. The most common severity grade across the five scales was determined using a custom function to establish a consensus severity grade.
Results: The most common culprit insects were honeybees (49.0%). Among the 88 participants with generalized reactions, the highest proportion had involvement of four organ systems. The scales showed high specificity for detecting anaphylaxis, especially when using higher grades of the Mueller, WAO, and Dribin scales. The diagnostic yields (AUC) varied, with the WAO scale having the highest AUC (0.94) for grades 3, 4, and 5. Spearman correlation analysis showed the strongest correlations between the Brown and Dribin, Ring & Messmer and Dribin, and Ring & Messmer and Reisman scales. The lowest correlations were observed with the Mueller scale when paired with the WAO, Reisman, and Dribin scales. An inter-rater reliability analysis showed substantial agreement between scales with the same number of grading levels. The agreement was highest for the Brown and Dribin scales, indicating a strong consistency in reaction severity classification across different instruments.
Conclusion: While all instruments were effective in stratifying reactions, they showed limitations in differentiating milder phenotypes. The Brown and Dribin scales stood out for their high agreement with the consensus score and sensitivity in identifying anaphylaxis. Our findings suggest that adopting either of these scales could significantly unify the reporting of allergic reactions. We believe the format of an instrument should be tailored to its intended purpose, with clinical decision aids being simpler and research tools being more detailed.
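The consensus grade and the sensitivity/specificity calculations described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' custom function: the names `consensus_grade` and `sensitivity_specificity` are hypothetical, and the tie-breaking rule (ties resolve to the more severe grade) is an assumption, since the abstract does not state how ties were handled.

```python
from collections import Counter

# Simplified severity categories, ordered from least to most severe.
SEVERITY_ORDER = ["mild", "moderate", "severe"]


def consensus_grade(grades):
    """Return the most common simplified grade across scales.

    Assumption (not stated in the source): when two or more grades are
    equally common, the more severe one is chosen.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    unknown = set(grades) - set(SEVERITY_ORDER)
    if unknown:
        raise ValueError(f"unknown grade(s): {sorted(unknown)}")
    counts = Counter(grades)
    top = max(counts.values())
    # Keep only the grades that reach the top count, in severity order.
    tied = [g for g in SEVERITY_ORDER if counts[g] == top]
    return tied[-1]


def sensitivity_specificity(predicted, reference):
    """Compute sensitivity and specificity of binary calls against a reference.

    `predicted`: True where a scale's grade threshold flags anaphylaxis.
    `reference`: True where the reference criteria are met.
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For example, `consensus_grade(["mild", "moderate", "moderate", "severe", "moderate"])` yields `"moderate"`. A different tie-breaking rule (for instance, preferring the milder grade) would change the consensus only for reactions on which the scales split evenly.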


