The Golive R&D department has undertaken a project to automate the multimodal classification of Turkish documents and thereby improve companies' digital document storage processes. The project is titled "Multi-Modal Automatic Document Classification Algorithm Based on Text and Image Based Classifiers in Turkish Language". It aims to classify scanned documents faster and more accurately by combining text-based and image-based classification. Figure 1 shows an example document.
What is Multimodal Classification?
Multimodal classification is the process of categorizing documents based on the different types of data (modalities) they contain. It is used when documents contain several kinds of data, such as text, images, audio, or video. By analyzing more than one modality, multimodal document classification aims to achieve a more comprehensive and detailed classification outcome than a single modality would give.
What is Our Method?
To increase accuracy, two separate models were developed: a text-based classification model and an image-based classification model.
The text-based model uses Optical Character Recognition (OCR) to obtain the textual content of the documents. OCR converts the text in a document captured with a device such as a scanner or camera into machine-readable characters, which makes the text editable, searchable, and storable.
The image-based model was developed using a pre-trained deep learning model and convolutional neural networks. It uses a dataset of documents scanned and photographed with different devices and at different resolutions.
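As a rough illustration of what can happen to the text once OCR has extracted it, the sketch below turns a string into word counts that a classifier could consume. The Turkish-aware lowercasing, the tokenizer, the function names, and the sample vocabulary are all assumptions made for this example; the project's actual text features are not described here.

```python
import re
from collections import Counter


def turkish_lower(text):
    # Python's str.lower() maps "I" to "i" and "İ" to "i" plus a combining dot.
    # Turkish needs "I" -> "ı" and "İ" -> "i", so map both before lowering.
    return text.replace("İ", "i").replace("I", "ı").lower()


def tokenize(text):
    # Keep runs of letters (including Turkish ones); drop digits and punctuation.
    return re.findall(r"[^\W\d_]+", turkish_lower(text))


def bag_of_words(text, vocabulary):
    # Count how often each vocabulary word occurs in the OCR output.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


# Hypothetical OCR output and vocabulary, for illustration only.
sample_text = "Fatura tarihi: fatura no 42"
sample_vector = bag_of_words(sample_text, ["fatura", "tarihi", "sözleşme"])
```

The explicit handling of "I" and "İ" matters because the default lowercasing would otherwise split one Turkish word into two different tokens depending on how it was capitalized in the scan.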
Integration of Models
The features obtained from the image-based and text-based models were combined using the XGBoost algorithm to perform the final classification. Figure 2 shows the schema of the resulting model.
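The idea of combining the two feature sets can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: it assumes the features are joined by simple concatenation, the feature values and class names are invented toy data, and a small nearest-centroid classifier stands in for XGBoost so that the example runs with the Python standard library alone.

```python
from collections import defaultdict
from math import dist


def fuse(text_features, image_features):
    # Feature-level fusion (assumed here): one combined vector per document.
    return list(text_features) + list(image_features)


class NearestCentroid:
    """Tiny stand-in classifier; the project uses XGBoost at this step."""

    def fit(self, vectors, labels):
        grouped = defaultdict(list)
        for vector, label in zip(vectors, labels):
            grouped[label].append(vector)
        # Mean of each class's fused vectors, dimension by dimension.
        self.centroids = {
            label: [sum(column) / len(rows) for column in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, vector):
        # Pick the class whose centroid is closest in Euclidean distance.
        return min(self.centroids, key=lambda label: dist(vector, self.centroids[label]))


# Hypothetical toy data: 2 text features and 2 image features per document.
train = [
    (fuse([0.9, 0.1], [0.8, 0.2]), "invoice"),
    (fuse([0.8, 0.2], [0.7, 0.1]), "invoice"),
    (fuse([0.1, 0.9], [0.2, 0.9]), "contract"),
    (fuse([0.2, 0.8], [0.1, 0.8]), "contract"),
]
model = NearestCentroid().fit([v for v, _ in train], [label for _, label in train])
prediction = model.predict(fuse([0.85, 0.15], [0.75, 0.2]))
```

With the xgboost package installed, a gradient-boosted classifier such as `xgboost.XGBClassifier` would take the place of the stand-in, trained on the same fused vectors.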
Success Results
The multimodal classification model achieved a classification success rate of 98%. This result highlights the quality of the data analysis and classification solution the project provides.
Academic Contribution
The results of the project were compiled into an article that was published, and the team took part in the relevant conference. In this way, the project was introduced to and shared with the academic community.
In summary, the multimodal document classification algorithm developed by the Golive R&D department represents significant progress in the automatic classification of Turkish documents and creates value for businesses by improving their digital document storage processes. The project is an example of Golive's innovation capacity and technological leadership.