Digitizing KYC extraction through AI APIs

It has been a decade of new strides in ‘digital transformation’ initiatives across the financial services industry. Along with continuously introducing new products and services, organizations are looking for ways to enhance customer experiences and stay ahead of the curve. However, between new launches, overlapping applications, and the push for seamless customer experiences, building scalable ‘platforms’ in a short span proved to be a challenge. Experts then turned to APIs, which make scaling and integration easier.

While the use of APIs is nothing new, the integration of Artificial Intelligence (AI) with APIs is now helping organizations achieve a ‘platform at scale’ approach, rather than building one-off applications.

Continuing our efforts to make AI easy to adopt, we launched a library of ‘AI’ APIs to meet such specific needs. This blog covers one of the more interesting APIs from the library: the ‘KYC extraction’ API.

KYC Extraction API by Arya.ai

For accurate information retrieval from Indian KYC documents, the KYC Extraction API captures information from the ID proofs that must be submitted as part of KYC regulation. The module identifies ID documents such as the PAN card, Aadhaar card, Voter ID, Passport and Driving License, and extracts information according to the document type.

How is it built

The main steps of this solution are to:

  1. Extract the text and important features from the image/PDF document
    a. State-of-the-art OCR technology extracts all the text from the document
    b. Important image features are also extracted from the document
    c. This step works well on documents of good quality in terms of
    resolution, lighting conditions and other quality metrics

  2. Classify the document among valid KYC documents
    a. This is an important step towards extracting any document-specific entity
    b. Various localised features of each type of document are used to classify the document

  3. Recognise the relevant entities for the corresponding type of document
    a. Each KYC document has different types of entities, and the structure of
    entities on the document varies from one type of KYC document to another in terms of
    placement, presence of header, format, etc.
    b. Once it is classified as a valid KYC document, the document is passed to Arya’s state-of-the-art entity extraction module
    c. Arya’s entity extraction module is specifically trained to detect and recognise the entities corresponding to the respective KYC document

Combining all the modules, Arya provides a robust, efficient and accurate API to recognize, classify and extract information from KYC documents.
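
To make the flow concrete, here is a minimal Python sketch of the classify-then-extract idea. It is not Arya's implementation: it assumes the OCR text is already available as a string, and it swaps the trained classifier and entity extraction module for simple keyword cues and regular expressions. All names (`DOC_CUES`, `ENTITY_PATTERNS`, `process`) are hypothetical and exist only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword cues per document type. A trained classifier would use
# learned text and image features; these rules are only a stand-in.
DOC_CUES = {
    "PAN": ["income tax department", "permanent account number"],
    "AADHAAR": ["unique identification authority", "aadhaar"],
    "VOTER_ID": ["election commission of india"],
    "PASSPORT": ["republic of india", "passport"],
    "DRIVING_LICENSE": ["driving licence", "driving license"],
}

# Hypothetical per-type entity patterns (illustrative, far from exhaustive).
ENTITY_PATTERNS = {
    "PAN": {"pan_number": r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"},
    "AADHAAR": {"aadhaar_number": r"\b\d{4}\s?\d{4}\s?\d{4}\b"},
}


@dataclass
class KycResult:
    doc_type: str
    entities: dict = field(default_factory=dict)


def classify(ocr_text: str) -> str:
    """Step 2: pick the document type whose cues appear most often."""
    lowered = ocr_text.lower()
    scores = {
        doc: sum(cue in lowered for cue in cues)
        for doc, cues in DOC_CUES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "UNKNOWN"


def extract_entities(doc_type: str, ocr_text: str) -> dict:
    """Step 3: run only the patterns that belong to the classified type."""
    found = {}
    for name, pattern in ENTITY_PATTERNS.get(doc_type, {}).items():
        match = re.search(pattern, ocr_text)
        if match:
            found[name] = match.group(0)
    return found


def process(ocr_text: str) -> KycResult:
    """Chain classification and type-specific extraction on OCR output (step 1)."""
    doc_type = classify(ocr_text)
    return KycResult(doc_type, extract_entities(doc_type, ocr_text))


# Made-up OCR text resembling a PAN card.
sample = "INCOME TAX DEPARTMENT GOVT. OF INDIA\nPermanent Account Number\nABCDE1234F"
result = process(sample)
print(result.doc_type, result.entities)
```

The point of the sketch is the ordering: classification comes first because it decides which entity patterns are even worth looking for.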

Arya.ai's KYC API workflow


A. Inputs:

The API/module expects a KYC document issued by the Indian government or a state government. It should be one of five KYC document types (Aadhaar, PAN, Driving License, Voter ID, Passport) issued by the concerned official Indian organizations.

B. Outputs:

The API/Module -

  • Classifies the document as one of the five (Aadhaar/PAN/Driving License/Voter ID/Passport) KYC documents
  • Returns the relevant entities extracted from the KYC document

Arya.ai KYC API functionality
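
For readers integrating such an API, the input/output contract above could be handled on the client side roughly as follows. This is a sketch under assumptions: the JSON shape, the field names `doc_type` and `entities`, and the string values of the five document types are all hypothetical and are not taken from Arya's API documentation.

```python
import json
from enum import Enum
from typing import Dict, Tuple


class DocType(str, Enum):
    # The five document types named in this post; the string values are assumed.
    AADHAAR = "aadhaar"
    PAN = "pan"
    DRIVING_LICENSE = "driving_license"
    VOTER_ID = "voter_id"
    PASSPORT = "passport"


def parse_kyc_response(raw: str) -> Tuple[DocType, Dict[str, str]]:
    """Parse a (hypothetical) JSON response into a document type and entities.

    Raises ValueError if the document type is not one of the five supported.
    """
    payload = json.loads(raw)
    doc_type = DocType(payload["doc_type"])
    entities = {key: str(value) for key, value in payload.get("entities", {}).items()}
    return doc_type, entities


# Made-up response body, for illustration only.
sample_response = (
    '{"doc_type": "pan", '
    '"entities": {"name": "A. Kumar", "pan_number": "ABCDE1234F"}}'
)
doc_type, entities = parse_kyc_response(sample_response)
print(doc_type.value, entities)
```

Modelling the document type as an enum means an unexpected classification fails loudly at the boundary instead of flowing into downstream workflows.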

Where it is being used currently

The KYC API is currently being used at one of the biggest private-sector banks in India. It is also being used for different types of products and services that require validation of KYC documents as per the guidelines of regulatory bodies.

Below are some of the use cases:

  1. Account Opening
  2. Service Request
  3. Loan Processing
  4. Credit Card issuance


The motivation to build the KYC API was to provide a robust solution for all KYC-related validations and approvals. Most workflows in the banking, financial services and insurance industries, including Account Opening, Customer Request, Anti Money Laundering etc., have to be strictly compliant with KYC norms. Automation was greatly needed to speed up these processes. Some of the challenges encountered while developing the module included the following:

  1. Documents could belong to various domains, e.g. identity proof, residence proof, etc.
  2. Recognizing the document and extracting document-specific entities

Demo of Arya.ai's KYC API

You can easily get hands-on experience with the KYC API and other APIs in our library at https://api.arya.ai/. You can test the APIs with the sample documents provided, or with your own documents!

Happy browsing!

Ketaki Joshi

Driving outreach and messaging to the Insurance community about using AI driven products to gain growth and efficiency. Exploring and developing opportunities for collaboration.
Deepak Labh

Sr. Research Scientist