Following last week’s post on alternatives to InputAccel/Captiva, this week’s post covers some of our top tips for scanning into Documentum. As mentioned in the previous article, these points relate to scanning in general and not just InputAccel/Captiva. For our Alfresco and SharePoint readers, all of the tips below are just as relevant. We are happy to have Millennia, our outsource scanning partner, co-write this article with TSG.
- Garbage In = Garbage Out – Like a copier, a scanner can only “clean up” so much. Bad paper, carbon copies, and faxes can all result in lousy images. Prep time before scanning, as well as pushing for clean paper, will result in a cleaner scanning process and cleaner images.
- PDF is better than TIFF – Documentum users probably already understand this, but sometimes users are distracted by the TIFF versus PDF decision. PDF is a more capable format for most clients. TIFF can present issues depending on which viewer the browser is configured to use. TIFF does work better for OCR engines, but converting from TIFF to PDF after OCR is fairly common.
- Image Size – Size of images can dramatically affect image system performance. Pushing 600 dpi images over the network can result in sluggish overall performance. In general:
  - 300 dpi (dots per inch) for scanning – best for OCR
  - 200 dpi for viewing/storage
  - Black and white is much smaller than color
  - Despeckle is great for cleaning up images
  - Grey is the worst – users should understand that, given compression techniques, the biggest image is not all black or all white. The biggest image is one white dot, one black dot, one white dot…
- Running a scanner for hours at a time is a difficult job – Right up there with copier attendant, running a scanner can be boring and difficult work. Clients should either:
  - Consider outsourcing. TSG typically partners with Millennia.
  - Get Creative – this isn’t the type of job a knowledge worker would want to do every day. Think of interns, students and other part-time workers to keep the cost down without burning out the people doing the scanning.
- Barcodes versus OCR – In setting up scanning solutions for clients, we routinely get asked if the software can read a character field somewhere on the document. While OCR is great, users should understand that OCR accuracy depends on the quality of the ink and paper and a host of other factors. For that reason, wherever possible we recommend barcodes: near-100% recognition, check digits, and all the components for an accurate read. Use them on pre-printed forms that will be scanned, cover sheets and separator pages whenever possible. We typically use simple barcodes rather than trying to embed too much (ex: a 2D barcode).
- Barcode Stickers – For a previous client, the elegant way to recognize invoices and expense reports before scanning was not a separator page but pre-printed barcode stickers affixed to the paper. The scanning solution could easily separate the documents in the batch and use the barcode to link back to the Accounts Payable or other system, with no indexing needed in Captiva or whichever scanning solution was in use. The stickers also gave us a control over the physical paper.
- Desktop Scanning – As mentioned in the previous post, scanners don’t have to be big. For small volumes and distributed paper sources, desktop scanners can make for simpler solutions.
- Drop Off Scanning – Sometimes the person who knows the most about how to index the document isn’t the person running the scanner. Rather than making the scanning solution find the right person to index, one elegant solution for a client had the knowledge worker index the document in Documentum with no content, had the system print out a barcoded cover sheet, and had the knowledge worker drop off the packet for scanning. In this manner, the scan operator just had to scan the document, and the application would store the content in Documentum with the attributes already attached. In most cases the system would leave the cover page on the document (a good summary of what is contained) for both the physical paper and the image.
- Capture Native PDF – Why have someone print out a document and mail it to the scanning solution, only to have it scanned back into PDF? At TSG, we always email PDF invoices and never mail them. We are not sure whether our clients print them for payment, but the idea is to allow email or internet capture of documents. Internet capture also lets the other party index some of the attributes (Invoice Amount, PO… for an Accounts Payable application).
- Two $20,000 scanners are better than one $40,000 scanner – Given the throughput (80,000 pages a day), the money is better spent on two scanners (one serving as a back-up) than on a single higher-end scanner. Millennia is currently recommending the Kodak i620 scanner (100 PPM) for about $21,000. TSG likes a variety of scanners in the $7,000 range for lower throughput.
- Never rely only on structured folder names and structured file names as your method of storing the scanned documents. Always apply metadata to the image files, preferably metadata taken from a source list. For example, all invoices should be matched to the source accounting data such as PO number and Vendor ID.
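To put rough numbers on the image size tip above, here is a minimal sketch of the uncompressed size of a single scanned page. The letter-size page (8.5 x 11 inches) and the bit depths (1 bit per pixel for black and white, 8 for greyscale, 24 for color) are assumptions for illustration only. Compressed files will be smaller, but the ratios between the settings show why 600 dpi color is so much heavier than 200 dpi black and white.

```python
# Assumed bit depths per pixel for each scan mode (illustrative values).
BITS_PER_PIXEL = {"black_and_white": 1, "greyscale": 8, "color": 24}


def raw_page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Return the uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Print the raw size in megabytes for each resolution and mode.
for dpi in (200, 300, 600):
    for mode, bits in BITS_PER_PIXEL.items():
        size_mb = raw_page_bytes(dpi, bits) / 1_000_000
        print(f"{dpi} dpi {mode}: {size_mb:.1f} MB uncompressed")
```

Doubling the resolution quadruples the pixel count, so a 600 dpi page carries nine times the raw data of the same page at 200 dpi before compression is applied.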
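The check digits mentioned in the barcode tip above can be sketched as follows. This example uses the well-known Luhn (mod 10) scheme purely as an illustration; it is an assumption, not necessarily the scheme built into any particular barcode symbology or used on the projects described here. The function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> str:
    """Compute the Luhn (mod 10) check digit for a string of digits."""
    total = 0
    # Walk right to left, doubling every other digit starting with the rightmost.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def is_valid_code(code: str) -> bool:
    """Return True when the last digit of code matches its Luhn check digit."""
    if not code.isdigit() or len(code) < 2:
        return False
    return luhn_check_digit(code[:-1]) == code[-1]


# Hypothetical document number to be printed on a cover sheet or sticker.
document_number = "7992739871"
barcode_value = document_number + luhn_check_digit(document_number)
print(barcode_value, is_valid_code(barcode_value))
```

A value that was misread by one digit fails `is_valid_code`, so the scanning application can reject it instead of linking the image to the wrong record.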
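As a rough illustration of matching scanned images to a source list, the sketch below looks up each scanned document's barcode value in a list of accounting records and attaches the PO number and Vendor ID as metadata. All field names (`invoice_no`, `po_number`, `vendor_id`, `barcode`, `file`) and the sample values are hypothetical. A real solution would read the source list from the accounting system and write the attributes into the repository.

```python
def attach_metadata(scanned_docs, source_rows):
    """Match scanned documents to source records by invoice number.

    Returns (matched, unmatched): matched is a list of metadata dicts,
    unmatched is a list of file names that need manual review.
    """
    rows_by_invoice = {row["invoice_no"]: row for row in source_rows}
    matched, unmatched = [], []
    for doc in scanned_docs:
        row = rows_by_invoice.get(doc["barcode"])
        if row is None:
            # No source record: do not guess, send to manual indexing.
            unmatched.append(doc["file"])
            continue
        matched.append({
            "file": doc["file"],
            "invoice_no": row["invoice_no"],
            "po_number": row["po_number"],
            "vendor_id": row["vendor_id"],
        })
    return matched, unmatched


# Hypothetical sample data standing in for the accounting source list.
source_list = [
    {"invoice_no": "INV-1001", "po_number": "PO-77", "vendor_id": "V-12"},
    {"invoice_no": "INV-1002", "po_number": "PO-78", "vendor_id": "V-40"},
]
scanned = [
    {"file": "scan_0001.pdf", "barcode": "INV-1001"},
    {"file": "scan_0002.pdf", "barcode": "INV-9999"},
]

matched_docs, needs_review = attach_metadata(scanned, source_list)
print(matched_docs)
print(needs_review)
```

Keeping the unmatched files in a separate review list means a missing or misread barcode becomes a visible exception instead of an image stored with no attributes.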
If you have any other thoughts, please comment below or contact us for additional detail.