I am excited to announce that KnowledgeLake has obtained patent-pending status for our recently developed intellectual property covering intelligent content classification, information extraction, and automatic training via supervised learning. Our new approach draws on several recent innovations in machine learning and computer vision to solve these problems more efficiently and robustly.
The first invention listed in the patent is our approach to document clustering. KnowledgeLake starts by treating documents as pictures rather than as text on paper. We apply technology similar to that used in facial recognition to the document classification problem. This lets us identify and categorize similar content in milliseconds, so downstream processes can act on categorized content intelligently and efficiently without waiting for full-page OCR, which can take over 5 seconds per page.
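To make the idea of grouping documents by appearance concrete, here is a minimal sketch. It is not KnowledgeLake's patented method: it swaps in a generic technique, average hashing, and assumes each page has already been reduced to a tiny grayscale thumbnail. The names `average_hash`, `hamming_distance`, `cluster_by_appearance`, and the `max_distance` threshold are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: each page is already a tiny grayscale thumbnail (values 0-255).
Grid = Sequence[Sequence[int]]


def average_hash(thumbnail: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    pixels = [value for row in thumbnail for value in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for value in pixels:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where the two hashes differ."""
    return bin(a ^ b).count("1")


def cluster_by_appearance(pages: Dict[str, Grid], max_distance: int = 2) -> List[List[str]]:
    """Greedy grouping: a page joins the first cluster whose representative
    hash is within max_distance bits; otherwise it starts a new cluster."""
    clusters: List[Tuple[int, List[str]]] = []
    for name, thumbnail in pages.items():
        page_hash = average_hash(thumbnail)
        for representative, members in clusters:
            if hamming_distance(representative, page_hash) <= max_distance:
                members.append(name)
                break
        else:
            clusters.append((page_hash, [name]))
    return [members for _, members in clusters]


# Hypothetical 4x4 thumbnails: two scans of one layout and one different form.
sample_pages = {
    "invoice_a": [[255] * 4, [0] * 4, [255] * 4, [0, 0, 255, 255]],
    "invoice_b": [[250] * 4, [10] * 4, [250] * 4, [10, 10, 250, 240]],
    "form_a": [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]],
}
groups = cluster_by_appearance(sample_pages)
```

The point of the sketch is only that comparing compact image signatures involves no text recognition at all, which is why this kind of grouping can run before, or instead of, full-page OCR.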
The second invention in our patent is our approach to document classification and extraction. Most vendors in the capture industry classify documents and extract information from them in the same way: they OCR the page and then search the resulting text with regular expressions. The first drawback of this common approach is that it requires an expert to author complex and cumbersome regular expressions. The core reliance on OCR is also a major drawback, as OCR is prone to accuracy issues, especially when the document is of mediocre quality.
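For readers unfamiliar with the common approach described above, the following sketch shows what regular-expression extraction over OCR text typically looks like and how easily it breaks. The pattern, the `extract_invoice_number` helper, and the sample strings are illustrative assumptions, not taken from any vendor's product.

```python
import re
from typing import Optional

# Hypothetical hand-authored rule: find an invoice number in OCR output.
INVOICE_NUMBER = re.compile(
    r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)",
    re.IGNORECASE,
)


def extract_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the captured invoice number, or None if the rule does not match."""
    match = INVOICE_NUMBER.search(ocr_text)
    return match.group(1) if match else None


# A clean scan: the OCR text is exactly what the rule expects.
clean_result = extract_invoice_number("Invoice No: INV-1042\nTotal due: 310.00")

# A mediocre scan: OCR misreads "I" as "l" and "o" as "0", so the rule misses.
noisy_result = extract_invoice_number("lnvoice N0: INV-1042\nTotal due: 310.00")
```

Here `clean_result` is the invoice number while `noisy_result` is `None`: two misread characters in the label are enough to defeat the rule, and every new document layout needs someone to write and maintain another pattern like this one.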
Our approach differs in that it requires no engineering expertise to train the system. End users can teach the system to classify and extract information from documents themselves. Rolling out a new workload takes only a few hours, versus weeks or months with other vendors. Our use of image-based identification lets us match patterns on documents quickly, without the burden of full-page OCR, and with higher accuracy, especially on low-quality documents.
While we have already built an experience that lets users quickly and easily teach the system, we also wanted our platform to be able to learn on its own. The goal of supervised learning is to recognize patterns and create heuristics that lead to improved business efficiencies. KnowledgeLake’s supervised learning runs in the background, observing users during their workday.
As the system identifies patterns in the content and in the behavior of users, it automatically adds this knowledge to the training set, which removes the need for human input on future versions of the document. This ensures that every customer’s environment evolves with changing content and improves ROI for all our customers.
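As a rough illustration of this kind of background learning loop, the sketch below records the labels users assign to a recurring document layout and starts predicting a label once it has been confirmed often enough. It is a simplified assumption of how such a loop could work, not a description of KnowledgeLake's implementation; `FeedbackTrainer`, `layout_key`, and `min_confirmations` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class FeedbackTrainer:
    """Hypothetical training set that grows from observed user decisions."""

    def __init__(self, min_confirmations: int = 3) -> None:
        self.min_confirmations = min_confirmations
        # Maps a layout identifier to the labels users have assigned to it.
        self._observations: DefaultDict[str, Counter] = defaultdict(Counter)

    def observe(self, layout_key: str, user_label: str) -> None:
        """Record one user decision for a document with this layout."""
        self._observations[layout_key][user_label] += 1

    def predict(self, layout_key: str) -> Optional[str]:
        """Return the most frequent label once it has enough confirmations;
        return None while a human still needs to decide."""
        counts = self._observations.get(layout_key)
        if not counts:
            return None
        label, seen = counts.most_common(1)[0]
        return label if seen >= self.min_confirmations else None


trainer = FeedbackTrainer(min_confirmations=3)
trainer.observe("layout-17", "invoice")
trainer.observe("layout-17", "invoice")
before = trainer.predict("layout-17")  # still below the threshold
trainer.observe("layout-17", "invoice")
after = trainer.predict("layout-17")   # threshold reached
```

The threshold is the design choice worth noting: requiring several consistent confirmations before acting on its own keeps a single user mistake from becoming learned behavior.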
Stay tuned, as we’ll have additional announcements coming soon!