Right after I graduated from college, I was, in retrospect, one of those ambitious, creative kids with a degree in IT. I truly believed in the well-known saying that “a good UNIX admin never does something more than twice.” When I was first hired at Hawaii Microfilm Service (HMS), the person I was replacing explained what I was there to do. To be honest, the job wasn’t really IT-related as most people would perceive it. Several companies, primarily banks, used HMS to convert paper and microfilm documents into a smaller, more convenient digital format. The microfilm was converted with a big projection scanner that rolled through the film and took digital pictures, which were saved to a swappable hard drive. Aside from the sequential numbering of the images, which reset to zero at the start of every roll, the organization of the digital media was chaos. The paper document scanning was only a slight improvement on that. I don’t recall the exact hardware or software that was used, but I do remember that it wasn’t standard consumer-grade equipment. When I started, my primary duty was to run the paper document scanners and burn the finished product to DVD. Not exactly “Digital Product Manager/IT Manager,” right?
I generally like to work efficiently and put that saying into practice when I can. Every computer attached to the various scanners had something I was surprised they weren’t using: a Network Interface Card. I tapped back into my high school days, when I took the Cisco Networking Academy STEM class and served as the Student Group Lead for wiring up the campus. I ran network cable, complying as best I could with the National Fire Protection Association (NFPA) National Electrical Code (NEC) standards of the time. Granted, it wasn’t the best setup, since I was using consumer-grade network switches I had gotten while working at CompUSA. My manager said they didn’t need the switches or any changes because everything was working fine the way it was. Once the network was up and I could access every scanner computer and move files over it, the fun began. The main beast was an old computer sitting on a shelf, covered in dust, that hadn’t been powered on in who knows how long.
The beautiful thing about GNU/Linux is that you can do some pretty awesome modern things on some really old hardware. On Dusty, the computer from the shelf, I installed CentOS GNU/Linux and reallocated some of the hardware between the computers. I set up Dusty as a Samba (SMB) server with a bunch of hard drives and multiple DVD burners. The idea was for it to receive or pull the digital copies from all the scanners, organize them, create a DVD image, burn that image, and then archive it using 7-Zip at its highest compression setting.
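The original scripts on Dusty aren’t reproduced here, but the image, burn, and archive steps can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions: the share paths, the `/dev/dvd` device node, the job name, and the choice of `genisoimage`, `growisofs`, and `7z` as the tools are all hypothetical stand-ins. It only builds the command lines, so it is safe to run anywhere; passing each list to `subprocess.run(cmd, check=True)` would actually execute them.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical locations; the real share and archive paths are not recorded.
SHARE_ROOT = PurePosixPath("/srv/scans")
ARCHIVE_ROOT = PurePosixPath("/srv/archive")
DVD_DEVICE = "/dev/dvd"  # assumed device node for the burner


def volume_label(job_name: str) -> str:
    """ISO 9660 volume IDs are limited to 32 characters."""
    return job_name.upper()[:32]


def pipeline_commands(job_name: str) -> list[list[str]]:
    """Build (but do not run) the image, burn, and archive commands for one job."""
    job_dir = SHARE_ROOT / job_name
    iso_path = ARCHIVE_ROOT / f"{job_name}.iso"
    archive_path = ARCHIVE_ROOT / f"{job_name}.7z"
    return [
        # Master a DVD image from the job folder, with Joliet (-J) and
        # Rock Ridge (-R) extensions and a volume label (-V).
        ["genisoimage", "-o", str(iso_path), "-J", "-R",
         "-V", volume_label(job_name), str(job_dir)],
        # Burn the finished image to a blank DVD.
        ["growisofs", "-dvd-compat", "-Z", f"{DVD_DEVICE}={iso_path}"],
        # Archive the image with 7-Zip's highest preset (-mx=9, "Ultra").
        ["7z", "a", "-t7z", "-mx=9", str(archive_path), str(iso_path)],
    ]


if __name__ == "__main__":
    # Print the commands for a hypothetical job folder instead of running them.
    for cmd in pipeline_commands("bank_a_job_0001"):
        print(shlex.join(cmd))
```

Keeping each step as a separate command makes it easy to stop after imaging (for example, if no blank disc is loaded) without losing the archive step.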
This is where I learned why Dad was so adamant about teaching me discipline. I had to teach the other workers to do things a little differently and, most importantly, explain the benefits they would get from doing it that way. I directed the microfilm scanner operators to start saving their output to the SMB server, in uniquely named folders that identified which job each scan was for. The benefit? They no longer had to sneakernet hard drives to me, and I no longer had to sort through all the drives for the different jobs or worry about damage from handling. The only reason I trusted the microfilm scanners to write directly to the SMB server was that the scanning software was nice enough to provide a checksum file for verifying data integrity. Then came the paper document scanners, where I learned what it means to automate oneself out of a job.
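Checking a job folder against its checksum file before processing it could look something like this minimal Python sketch. It assumes the checksum file uses MD5 in the common `md5sum` layout (`<hex digest>  <file name>`); the actual algorithm and format the scanning software produced aren’t recorded here, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


def file_md5(path: Path, chunk_size: int = 1 << 20) -> Optional[str]:
    """Return the MD5 hex digest of a file, or None if it is missing."""
    if not path.is_file():
        return None
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks (1 MiB by default) so large scans never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(manifest_text: str,
                    digest_of: Callable[[str], Optional[str]]) -> list[str]:
    """List manifest entries that are malformed, missing, or fail the check.

    Assumes md5sum-style lines: '<hex digest>  <file name>'. A leading '*'
    on the name (md5sum's binary-mode marker) is ignored.
    """
    failures = []
    for line in manifest_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            failures.append(line.strip())  # malformed entry
            continue
        expected, name = parts[0].lower(), parts[1].lstrip("*")
        if digest_of(name) != expected:
            failures.append(name)
    return failures


def verify_job_folder(manifest: Path) -> list[str]:
    """Check each file named in a manifest, relative to the manifest's folder."""
    return find_mismatches(manifest.read_text(),
                           lambda name: file_md5(manifest.parent / name))
```

Separating the parsing from the file hashing keeps the comparison logic easy to test without touching the share; an empty result from `verify_job_folder` means the folder matched its checksum file.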
There were three paper document scanners, but only one person running them. That didn’t seem efficient at all. After getting more familiar with the scanning software (I don’t remember its name), I learned that many of its functions weren’t being used to their potential. The main one was its Optical Character Recognition (OCR) capability. The software could designate specific scanning areas for certain types of information. You could also create forms, which are templates of scanning areas on a document. Best of all, it could auto-detect which form to use based on those scanning areas. Why is that important? Formal documents generally follow a standard positional layout for margins, titles, page numbers, and headers, so one form can be reused for every document of that type. When the software detected that a scan was finished, it automatically created a proprietary executable and a disk image. I wasn’t a fan of the executable, but I did configure the software to create the disk image so the SMB server could copy it over and mount it. The disk image contained an index file listing every OCR-detected field, its data type, and which Tagged Image File Format (TIFF) image it came from. Using a little bit of Perl, I extracted that information to drive the new standardized naming and labeling convention and to automate the data entry for each scan job. As a result, all three scanners could run at the same time, with the script filling in the information I had previously been entering by hand.
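The Perl script itself isn’t preserved, so here is a hypothetical Python sketch of the same idea: read the OCR index, group the detected fields by TIFF image, and derive a standardized name for each image. The index layout (tab-separated field name, data type, value, and TIFF file), the field names `client`, `account`, and `doc_date`, and the naming convention are all assumptions for illustration, not the real format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical field names used to build the standardized name, in order.
NAME_FIELDS = ("client", "account", "doc_date")


def read_index(index_text: str) -> dict[str, dict[str, str]]:
    """Group OCR-detected field values by the TIFF image they came from.

    Assumed layout, one field per line, tab-separated:
    field name, data type, value, TIFF file name.
    """
    fields_by_image: dict[str, dict[str, str]] = defaultdict(dict)
    # QUOTE_NONE keeps any quote characters in OCR text as literal data.
    reader = csv.reader(io.StringIO(index_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        field, _data_type, value, tiff = (cell.strip() for cell in row)
        fields_by_image[tiff][field] = value
    return dict(fields_by_image)


def standard_name(tiff: str, fields: dict[str, str]) -> str:
    """Build a hypothetical name of the form client_account_date_stem.tif.

    Missing fields become 'UNKNOWN' so incomplete OCR results stand out.
    """
    parts = [fields.get(key, "UNKNOWN").replace(" ", "-") for key in NAME_FIELDS]
    stem = tiff.rsplit(".", 1)[0]
    return "_".join(parts + [stem]) + ".tif"


def rename_plan(index_text: str) -> dict[str, str]:
    """Map each original TIFF name to its proposed standardized name."""
    return {tiff: standard_name(tiff, fields)
            for tiff, fields in read_index(index_text).items()}
```

Falling back to `UNKNOWN` rather than skipping an image means an incomplete OCR read still gets a predictable name that is easy to spot and fix by hand, instead of silently producing a misleading one.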