Automating medical coding with the ICD Classification API

The healthcare industry is working to scale its digital efforts, especially to meet rising customer expectations. While the industry is already struggling with interoperability, the need for data visibility and better customer engagement is greater than ever. Digital consumers, used to Uber and Amazon, expect the same experience from healthcare providers.

Given changing customer expectations and technological advances, the healthcare industry has turned to APIs to address its specific needs. Integrating APIs with Artificial Intelligence (AI) has brought a further level of innovation.

Continuing our efforts to make AI easy to adopt, we just launched an API platform with a library of AI APIs built for healthcare and wellness industry processes. In the health sector, APIs such as Face2BMI, ICD Classification and Onboarding have been used to drive customer satisfaction through simpler processes and shorter wait times.

ICD Classification API

Any medical condition or disease can be represented in a standard way using the International Classification of Diseases (ICD). Traditionally, classifying a given disease into its ICD code requires specialists who understand the terminology and possess medical expertise; we have researched and developed an AI model that does the same.

The ICD Classification API can automate medical coding to ICD-9/10/11 for a medical diagnosis supplied as text. The pre-trained AI model is built using deep learning and is available as a plug-and-play API. You send diagnosis text through the API, and the system responds with 5 possible ICD codes that closely match the diagnosis description provided, each with a confidence score.

How is it built?

The primary objective of this API is to assign labels to disease/diagnosis information. We develop it using a transfer learning model on top of the BERT architecture.

The pretrained model takes ICD descriptions as input and encodes each one into a vector. During inference, we feed the diagnosis information to the model, encode the text, and find the ICD codes with the maximum cosine similarity.

ICD API functionality


The input at training time is the ICD description; at inference time it is the diagnosis/discharge text.


During the testing phase, the text data is pre-processed with regex cleaning, case handling, and abbreviation mapping using dictionaries that hold multiple abbreviations along with their full forms.
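The pre-processing steps described above might look roughly like the following sketch. It is only an illustration: the `preprocess` function, the regex, and the entries in `ABBREVIATIONS` are assumptions made for this example, not the actual cleaning rules or dictionary used by the API.

```python
import re

# Hypothetical abbreviation dictionary for illustration only;
# the dictionary actually used by the API is not published in this post.
ABBREVIATIONS = {
    "htn": "hypertension",
    "dm": "diabetes mellitus",
    "mi": "myocardial infarction",
    "copd": "chronic obstructive pulmonary disease",
}


def preprocess(text, abbreviations=None):
    """Lowercase the text, strip noise with a regex, and expand abbreviations."""
    if abbreviations is None:
        abbreviations = ABBREVIATIONS
    text = text.lower()                        # case handling
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # regex cleaning: symbols become spaces
    tokens = text.split()                      # splitting also collapses extra whitespace
    # Replace each token that is a known abbreviation with its full form.
    expanded = [abbreviations.get(token, token) for token in tokens]
    return " ".join(expanded)


if __name__ == "__main__":
    print(preprocess("Pt. has HTN & DM!!"))
```

A real dictionary would likely need to handle abbreviations with several possible expansions, which this simple one-to-one lookup does not attempt.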

Embedding Layer (Transfer Learning)

The pretrained BERT model is used for embedding: it encodes the ICD descriptions and the input text into vectors.

Cosine Similarity

Using cosine similarity, the vector of the input text is compared with the vectors of all ICD descriptions, and the ICD codes whose descriptions are most similar are returned.
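As a minimal sketch of this matching step, the snippet below ranks ICD codes by cosine similarity to a query vector and keeps the top results. The function names and the tiny 3-dimensional vectors are assumptions made for illustration; in the real system the vectors would come from the BERT encoder and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k_codes(query_vec, icd_vectors, k=5):
    """Return the k (code, score) pairs with the highest cosine similarity."""
    scored = [
        (code, cosine_similarity(query_vec, vec))
        for code, vec in icd_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for encoded ICD descriptions (not real embeddings).
    icd_vectors = {
        "I10": [0.9, 0.1, 0.0],
        "E11.9": [0.1, 0.9, 0.0],
        "J44.9": [0.0, 0.1, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # toy vector standing in for the encoded diagnosis text
    for code, score in top_k_codes(query, icd_vectors, k=2):
        print(code, round(score, 3))
```

Because the ICD description vectors do not change between requests, they could be computed once and reused, so only the incoming diagnosis text needs encoding at inference time.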

Output flow of ICD Classification API

Where it is being used

  • Life & Health Insurance - The API can be used in medical insurance to automate and standardize medical coding to ICD-10 using the diagnosis provided.
  • Analyzing diagnostic reports - It can help find the ICD codes for digitized diagnostic reports.


The ICD Classification API works as a simple “text in, text out” interface. Currently, the API identifies the top 5 ICD codes for a given medical condition. Potential applications are in industries such as insurance, healthcare and wellness, where the API can be easily integrated into existing infrastructure.

You can easily get hands-on experience with all our APIs on our API platform. You can test the APIs with the sample documents provided, or with your own documents too!

Happy exploring!

Ketaki Joshi


Driving outreach and messaging to the Insurance community about using AI driven products to gain growth and efficiency. Exploring and developing opportunities for collaboration.
Sachin Kumar Pandey
