Supporting Clinical Decision Making: Semantics Based Classification of Medical Referral Letters


  • Ian Wilson, Computing and Mathematical Sciences, University of South Wales, United Kingdom



Keywords: Clinical and Health Information, Classification, Decision Systems, Natural Language Processing, Support Vector Machines, Semantic Indexing


This study aims to develop a Natural Language Processing based decision support system built from a repository of knowledge drawn from referral letters written between primary care doctors and specialist medical consultants. The system translates pre-processed referral letters into a semantic matrix of document vectors and a set of vocabulary features, based solely on the words used within each referral letter. It applies a one-versus-rest heuristic using a Support Vector Machine (SVM) to convert a multinomial classification problem into individual binary classifications, and each document is matched to its probabilistic best-fit specialism. A dataset of 111,700 examples was sourced from National Health Service Wales. The system achieves an accuracy of 91.8% across 29 medical specialities. Accuracy increases to 97.4% and 99%, respectively, when the one or two nearest neighbours to the best fit are also included, providing a basis for informing the decision making of a medical professional. The study demonstrates the efficacy of using referral letters to allow for classification into specialisms and subsequent allocation of specialist care. The approach taken in this study does not require added ontologies and is readily extendable. The system offers support to medical professionals, particularly within training scenarios or where access to opinion may be in short supply.
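The one-versus-rest scheme and the "best fit plus nearest neighbours" accuracy figures described above can be illustrated with a small sketch. It is not the study's implementation: bag-of-words counts stand in for the semantic matrix, a hand-rolled linear SVM (hinge loss fitted by subgradient descent, no bias term) stands in for the actual classifier, raw margin scores replace calibrated probabilities, and the toy letters, function names and hyperparameters are all hypothetical.

```python
from collections import Counter

def vectorise(text, vocab):
    """Map a letter to a bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def score(w, x):
    """Signed distance-like score of document vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary_svm(X, y, epochs=50, lr=0.1, lam=0.01):
    """Linear SVM (hinge loss + L2 penalty) via subgradient descent.
    Illustrative hyperparameters; no bias term is fitted."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            margin = target * score(w, x)
            w = [wi * (1 - lr * lam) for wi in w]      # L2 shrinkage
            if margin < 1:                              # hinge loss is active
                w = [wi + lr * target * xi for wi, xi in zip(w, x)]
    return w

def train_one_vs_rest(X, labels):
    """Fit one binary classifier per specialism: that specialism vs. the rest."""
    return {
        c: train_binary_svm(X, [1 if label == c else -1 for label in labels])
        for c in sorted(set(labels))
    }

def rank_specialisms(models, x):
    """Order specialisms from best fit to worst fit for one document."""
    scores = {c: score(w, x) for c, w in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

def top_k_accuracy(models, X, labels, k):
    """Share of documents whose true specialism is among the k best fits."""
    hits = sum(label in rank_specialisms(models, x)[:k]
               for x, label in zip(X, labels))
    return hits / len(labels)

# Toy, hypothetical referral snippets (not drawn from the study's dataset).
letters = [
    ("chest pain palpitations", "cardiology"),
    ("palpitations breathless exertion", "cardiology"),
    ("itchy rash forearms", "dermatology"),
    ("eczema rash scalp", "dermatology"),
    ("blurred vision eye", "ophthalmology"),
    ("cataract vision glare", "ophthalmology"),
]
texts, labels = zip(*letters)
vocab = sorted({word for text in texts for word in text.lower().split()})
X = [vectorise(text, vocab) for text in texts]

models = train_one_vs_rest(X, labels)

# Words outside the vocabulary (here "on") are simply ignored.
ranking = rank_specialisms(models, vectorise("itchy rash on scalp", vocab))
print("best fit:", ranking[0])
print("best fit plus one neighbour:", ranking[:2])
print("top-1 accuracy on toy data:", top_k_accuracy(models, X, labels, k=1))
```

Reporting the two or three highest-ranked specialisms rather than only the best fit mirrors how the abstract's 97.4% and 99% figures are framed: the true specialism counts as found if it appears anywhere in the shortlist shown to the clinician.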







How to Cite

Ian Wilson. (2023). Supporting Clinical Decision Making: Semantics Based Classification of Medical Referral Letters. Data Science: Journal of Computing and Applied Informatics, 7(1), 24-34.