

Vertesia Inc., maker of a unified low-code platform for developing and deploying custom generative artificial intelligence applications, today announced the launch of a new semantic document preparation service it says will increase the reliability of AI applications and speed up development.
Vertesia provides a cloud-based application programming interface service that prepares underlying data for use by generative AI models, with the goal of improving output accuracy. According to the company's own research, up to 50% of the development time spent on generative AI applications is dedicated to document preparation.
The new semantic document preparation service is designed to ease this process and provide developers with rich context for large language models to work with, which Vertesia claims can "eliminate" generative AI hallucinations.
A hallucination is an error in which an LLM confidently generates an incorrect or false answer. The causes can be numerous, including training data issues such as incomplete or noisy data, inherent model limitations, and difficulty understanding nuanced language or context.
“The two concerns we hear most from enterprise leaders are consistent: 95% accuracy isn’t good enough and data preparation is a costly, time-consuming challenge,” said Chief Revenue Officer Chris McLaughlin. “Our Semantic DocPrep service was built to solve both — giving developers a set of APIs to automate document preparation and significantly improve the accuracy and relevancy of LLM outputs.”
The company said its preparation service can convert even the most complex documents, such as reports and regulatory filings, into richly structured, semantically tagged XML without rewriting or altering the source. Because the process preserves the original structure, relationships and context, the company says, the LLM can understand the document without misinterpreting the information, which greatly increases the accuracy of responses.
This document transformation method is designed for developers building custom generative AI applications and retrieval-augmented generation pipelines, also known as RAG, which are used to enhance the accuracy of generative AI apps with real-time data.
The company said its data transformation engine deconstructs documents at the page level and selects the most appropriate AI model based on the content: dense text, tabular data, images or a mix. Depending on the page, it applies LLMs, optical character recognition or vision models. This hybrid approach avoids rewrites, maintaining consistency, preserving the original text and generating high-fidelity XML output.
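The page-level routing described above can be sketched as a simple dispatch function. This is an illustrative assumption of how such routing might work, not Vertesia's actual implementation; the page representation and route names are hypothetical.

```python
# Hypothetical sketch of page-level content routing. The heuristics and
# route names below are illustrative assumptions, not Vertesia's code.

def classify_page(page: dict) -> str:
    """Pick a processing route based on what a page contains."""
    if page.get("images"):
        return "vision_model"      # scanned pages or figures go to a vision model
    if page.get("tables"):
        return "ocr_table"         # tabular data needs structure-aware extraction
    return "llm_text"              # dense prose goes to a text LLM

def route_document(pages: list[dict]) -> list[str]:
    """Assign each page of a deconstructed document to a model route."""
    return [classify_page(p) for p in pages]

pages = [
    {"text": "Annual report narrative..."},
    {"text": "Revenue by region", "tables": 1},
    {"images": 2},
]
print(route_document(pages))  # ['llm_text', 'ocr_table', 'vision_model']
```

Routing each page independently is what lets a hybrid engine mix model types within a single document while keeping the original text untouched.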
The service is accessible via an API, which can be integrated directly into a development pipeline. This allows developers to send documents for preparation and receive XML outputs ready for chunking, indexing and model ingestion. No setup or model training is required.
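A pipeline integration along these lines might look like the following minimal sketch. The endpoint URL, auth header and response shape are placeholder assumptions for illustration; consult Vertesia's API documentation for the real interface.

```python
# Hypothetical client flow for a document-preparation API followed by
# naive chunking for RAG ingestion. Endpoint and auth are placeholders.
import requests

API_URL = "https://api.example.com/v1/docprep"  # placeholder, not a real endpoint

def prepare_document(path: str, api_key: str) -> str:
    """Upload a source document and return semantically tagged XML."""
    with open(path, "rb") as f:
        resp = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
        )
    resp.raise_for_status()
    return resp.text  # XML ready for chunking and indexing

def chunk_xml(xml: str, max_chars: int = 2000) -> list[str]:
    """Split prepared XML into fixed-size chunks for embedding/indexing."""
    return [xml[i:i + max_chars] for i in range(0, len(xml), max_chars)]
```

In practice a RAG pipeline would chunk along the XML's semantic boundaries rather than at fixed character offsets; the fixed-size splitter here just stands in for that step.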
The new Semantic DocPrep service is part of the company's existing platform, which provides infrastructure for organizations looking to build, deploy and manage custom generative AI applications and agents at scale.