By: Chris Caplinger
This final part of a three-part series on imaging and workflow using SharePoint Products and Technologies briefly covers the required parts list for an imaging solution built on SharePoint. The first part of the series focused on “What is Imaging”, explaining the technology in a nutshell. The second part explained “Why” you would want to use a SharePoint Document Library as the repository for an imaging system. If you want to get up-to-date on this series, you can find part one here, and part two here.
Capture
Before you can store images in a SharePoint Document Library, you have to capture the image. I find it simplest to break the capture of document images down into three basic categories: desktop scanning, production/high-speed document scanning, and fax capture.
For desktop scanning, Microsoft does provide an application with MODI (Microsoft Office Document Imaging), which is available with Microsoft Office 2003 Professional. The devices used with MODI are typically TWAIN-compliant scanners that operate at very low speeds but are small and compact enough to fit on the occasional user's desktop.
Production/High Speed Document Scanning is usually accomplished with scanners that have automatic document feeders (ADF), capturing pages at rates up to 140 pages per minute. These scanners can scan pages in simplex or duplex operation using virtually any size of paper that can fit into a file cabinet. Microsoft does not provide a program that can operate these scanners, but there are many third-party solutions that handle this task.
Fax capture is also a niche where Microsoft depends on third-party vendors. These vendors deliver solutions that manage several fax lines on a server, converting the faxed pages into multi-page documents that can be uploaded to the server.
Indexing
After an image is captured, it must be “Indexed” before it is sent to a document library. Indexing is the process of assigning metadata to the captured documents so that they can be easily retrieved at a later time. There are many methods for indexing documents, including key from image, bar-coding, and optical/intelligent character recognition:
Key from image is essentially manual indexing, and probably the most popular form of indexing captured documents. During this process users typically look at the captured images on a screen and manually key in the indexing values needed for metadata searching once the document is stored in a SharePoint Document Library.
Bar-coding is often used in conjunction with, and sometimes as a replacement for, key from image. When images contain bar codes with metadata, most production capture software can read the bar codes and automatically populate metadata fields. Sometimes these bar codes are simply used to indicate the beginning of a new document in a large batch scanning operation. Alternatively, they can be used to notify the indexing software of the type of document that is about to be scanned.
Optical/Intelligent Character Recognition is the process of recognizing typed or handwritten data on images and converting this information to metadata. This process is often misunderstood and does NOT always provide a good replacement for manual indexing. It can, however, with the right software and implementation team, save time when you are scanning large volumes of very similar forms.
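To make the separator idea concrete, here is a minimal Python sketch of how a batch of scanned pages might be grouped into documents using bar code values. It is only an illustration: the `DOC:` prefix, the `Document` class and the `split_batch` function are hypothetical names chosen for this example, and it assumes the capture software has already read each page's bar code (or found none).

```python
from dataclasses import dataclass, field

# Hypothetical convention: a bar code starting with this prefix marks the
# first page of a new document and carries the document type after it.
SEPARATOR_PREFIX = "DOC:"


@dataclass
class Document:
    doc_type: str
    pages: list = field(default_factory=list)


def split_batch(pages):
    """Group scanned pages into documents.

    `pages` is a list of (page_name, barcode_value_or_None) tuples in scan
    order. A separator bar code starts a new document; every other page is
    appended to the document currently being built.
    """
    documents = []
    current = None
    for page_name, barcode in pages:
        if barcode is not None and barcode.startswith(SEPARATOR_PREFIX):
            # New document: the type is whatever follows the prefix.
            current = Document(doc_type=barcode[len(SEPARATOR_PREFIX):])
            documents.append(current)
        elif current is None:
            # Pages scanned before any separator go into an untyped document.
            current = Document(doc_type="UNKNOWN")
            documents.append(current)
        current.pages.append(page_name)
    return documents


batch = [
    ("page1.tif", "DOC:INVOICE"),
    ("page2.tif", None),
    ("page3.tif", "DOC:PO"),
    ("page4.tif", None),
]
for doc in split_batch(batch):
    print(doc.doc_type, doc.pages)
```

In this sketch the separator page is kept as the first page of its document; a real capture product may instead discard dedicated separator sheets.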
SharePoint Products and Technologies include no capability for production indexing of captured documents. Instead they depend on ISVs (Independent Software Vendors) to provide this functionality. If you are scanning with MODI, you can download a free product here to aid with this. Production capture and fax solutions should provide this functionality out of the box.
Release
After paper documents have been captured and indexed, they need to be moved to a SharePoint document library. I usually call this releasing, but you could call it uploading, saving, or something else entirely. Out of the box with SharePoint Products and Technologies, you can only upload images using the capabilities available with a web browser. By writing some code you can use the SharePoint APIs or FrontPage RPC to build applications that release images to document libraries. Perhaps the simplest way to get scanned images into SharePoint Document Libraries is to just purchase a solution from a Microsoft ISV Partner. These solutions can typically scan, index and release all within one application.
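As a rough idea of what "writing some code" for release can look like, the Python sketch below prepares an upload request for a scanned image. Note that it does not use the SharePoint APIs or FrontPage RPC mentioned above; it swaps in a plain HTTP PUT, which assumes the server accepts WebDAV-style uploads to the library URL. The server address, library path and the `build_release_request` function are hypothetical, and authentication and the setting of column values are left out entirely.

```python
import urllib.parse
import urllib.request


def build_release_request(library_url, file_name, image_bytes,
                          content_type="image/tiff"):
    """Build (but do not send) an HTTP PUT request for one image file.

    Assumption: the document library accepts WebDAV-style PUT uploads at
    <library_url>/<file_name>. This is a sketch, not the SharePoint API.
    """
    # Percent-encode the file name so spaces and similar characters are safe.
    target = library_url.rstrip("/") + "/" + urllib.parse.quote(file_name)
    return urllib.request.Request(
        target,
        data=image_bytes,
        headers={"Content-Type": content_type},
        method="PUT",
    )


# Hypothetical library URL and a placeholder payload for illustration only.
request = build_release_request(
    "http://sharepoint.example.com/sites/imaging/Invoices",
    "invoice 001.tif",
    b"II*\x00",
)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
# (requires a reachable server and suitable credentials).
```

Building the request separately from sending it keeps the sketch testable without a server, and leaves room to add whatever authentication your environment requires.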
Document Searching
Once documents are released and stored in SharePoint Document Libraries, users will typically want to search for these documents by the column data associated with the list item. To my knowledge, the only solution for searching specific column data using logical search operators and wildcards is a web part our company provides in KnowledgeLake Imaging.
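To illustrate what searching column data with logical operators and wildcards means in practice, here is a small Python sketch that filters in-memory records. It is not how the KnowledgeLake web part or SharePoint itself works; the column names, the sample items and the `search` function are all made up for this example.

```python
from fnmatch import fnmatchcase


def matches(item, column, pattern):
    """Case-insensitive wildcard match (* and ?) of one column value."""
    value = str(item.get(column, ""))
    return fnmatchcase(value.lower(), pattern.lower())


def search(items, all_of=(), any_of=()):
    """Return items matching every (column, pattern) pair in `all_of` (AND)
    and, if `any_of` is non-empty, at least one pair in `any_of` (OR)."""
    results = []
    for item in items:
        and_ok = all(matches(item, col, pat) for col, pat in all_of)
        or_ok = not any_of or any(matches(item, col, pat) for col, pat in any_of)
        if and_ok and or_ok:
            results.append(item)
    return results


# Hypothetical list items with their column data.
items = [
    {"Name": "inv-001.tif", "Vendor": "Acme Corp", "DocType": "Invoice"},
    {"Name": "inv-002.tif", "Vendor": "Acme Supply", "DocType": "Credit Memo"},
    {"Name": "po-001.tif", "Vendor": "Globex", "DocType": "Purchase Order"},
]

# Vendor starts with "acme" AND (DocType is "invoice" OR starts with "purchase").
hits = search(
    items,
    all_of=[("Vendor", "acme*")],
    any_of=[("DocType", "invoice"), ("DocType", "purchase*")],
)
print([item["Name"] for item in hits])
```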
Another possibility for searching document images is to full-text OCR the image and use the out-of-the-box searching capabilities within either SharePoint Portal Server 2003 or Windows SharePoint Services. Many capture applications can full-text OCR into both TIFF and PDF formats, or IFilters from either Microsoft or third parties can be used.
Image Retrieval and Viewing
Now that your documents are safely stored in SharePoint, you will want to retrieve and view them. More than likely your images will be stored in either TIFF or PDF format, although in the future XPS could become a viable storage medium for multi-page image files. To view PDF files you simply need to install the free Acrobat Reader, but making changes and saving them back is not possible without Adobe Acrobat Professional or a competing product. These products also cannot save PDF files directly back to document libraries without the aid of KnowledgeLake Connect Professional.
TIFF files present similar problems. Although you can use MODI to view a TIFF or MDI document directly from your browser, you will need to download this free product in order to save it directly back to the document library.
Workflow
Workflow within SharePoint could itself fill at least one entire article, so I’m going to limit the discussion to the basic capabilities of workflow and some suggestions for purchasing a workflow solution. First, workflow is considered to be part of an entire document imaging solution because once you’ve invested the money to capture and index documents, you will likely want to put those images in front of the right person at the right time in order to maximize your return on investment.
When purchasing a workflow solution, these are some of the features that I would recommend looking for:
Well, what good would it be to talk about workflow and SharePoint right now without at least mentioning the Office Workflow that was announced at PDC a couple weeks back? This product is for real, but a couple things to keep in mind: 1) it won’t be released until late next year, with widespread adoption sometime in 2007; 2) it is built for Office documents, not images. The return on investment with imaging and workflow is huge, and you should start considering this today instead of waiting a couple years. I do think however that you should talk to your vendor about interoperability with Office Workflow before making an investment.
Chris Caplinger is the CTO of KnowledgeLake, a Microsoft Gold partner and provider of SharePoint-based Imaging and Workflow solutions. www.knowledgelake.com
About KnowledgeLake
KnowledgeLake, Inc., headquartered in St. Louis, Missouri, is the market
leader in developing document imaging and capture products for Microsoft
SharePoint-based Enterprise Content Management (ECM) solutions. Built
on the Microsoft .NET platform, KnowledgeLake enables organizations of
any size to leverage SharePoint for solving today’s complex document
problems. KnowledgeLake document imaging and capture products extend
the native capabilities of SharePoint with a powerful suite of
applications that allow users to capture large quantities of mission
critical documents and transform them into easily consumable business
information. The founders of KnowledgeLake have decades of experience
architecting, selling and implementing ECM products and solutions. To
find out more, visit the KnowledgeLake website at:
www.knowledgelake.com
The names of actual companies and products mentioned herein may be the trademarks of their respective owners. For more information: public.relations@knowledgelake.com 314-898-0500