
Android on hold

Since the Android git repository is offline, I'm having to find something else to occupy my time.

I've always wanted to learn more about OpenCV, so I'm working on a new project that uses it. At work we have piles and piles of paper forms that were filled out by hand. Through some trial and error I've already determined that both OpenCV and PIL can clean up the scanned copies enough for Tesseract OCR to read the printed portion of the forms. That doesn't capture any of the handwritten data, but it does let me identify what type of form each one is.
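
The post doesn't say which cleanup steps did the trick. A common one before Tesseract is global binarization with Otsu's method, which OpenCV offers through `cv2.threshold()` with the `cv2.THRESH_OTSU` flag. As a dependency-free sketch of what that computes (assuming an 8-bit grayscale page given as a list of rows of ints; `otsu_threshold` and `binarize` are hypothetical helper names), the threshold is the intensity that best separates dark ink from light paper:

```python
def otsu_threshold(gray_rows):
    """Return the threshold t (0-255) that maximizes between-class variance.

    Assumed input: a list of rows, each a list of ints in 0..255.
    Pixels <= t are treated as ink, pixels > t as paper.
    """
    # Build a 256-bin histogram of pixel intensities.
    hist = [0] * 256
    total = 0
    for row in gray_rows:
        for v in row:
            hist[v] += 1
            total += 1
    if total == 0:
        raise ValueError("empty image")

    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg = 0  # number of pixels at or below the candidate threshold
    sum_bg = 0     # sum of intensities at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray_rows):
    """Map pixels to 0 (ink) or 255 (paper) using the Otsu threshold."""
    t = otsu_threshold(gray_rows)
    return [[0 if v <= t else 255 for v in row] for row in gray_rows]
```

In practice you'd let OpenCV do this on the whole scan; the pure-Python loop only shows the idea and would be slow on a full page.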

Running OCR on an entire page takes a while, so it would be better to crop out smaller regions of interest and OCR only those. One interesting challenge is that some forms are portrait while others are landscape. I'd also like to handle forms that were fed into the scanner upside down.
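
One way to approach the cropping, sketched under a few assumptions: each region of interest is described as fractions of the page (the `TITLE_ROI` table and `Box` type are hypothetical), the page's orientation is taken from its pixel dimensions (assuming the scan preserves the page's aspect), and an upside-down scan is handled by mirroring the box through the page center, which is what a 180-degree rotation does. Detecting that a page is upside down is a separate step, for example OCRing a region both ways and keeping whichever reads better; here it's simply passed in as a flag.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Pixel box in (left, upper, right, lower) order."""
    left: int
    top: int
    right: int
    bottom: int


def orientation(width: int, height: int) -> str:
    """Classify a scanned page by its pixel dimensions."""
    return "landscape" if width > height else "portrait"


def roi_to_pixels(roi, width, height, upside_down=False):
    """Convert a fractional ROI (x0, y0, x1, y1 in 0..1) to a pixel Box.

    A page rotated 180 degrees moves point (x, y) to (1 - x, 1 - y), so the
    box's corners swap and mirror in both axes.
    """
    x0, y0, x1, y1 = roi
    if upside_down:
        x0, y0, x1, y1 = 1 - x1, 1 - y1, 1 - x0, 1 - y0
    return Box(round(x0 * width), round(y0 * height),
               round(x1 * width), round(y1 * height))


# Hypothetical ROI table: where a form's printed title might sit,
# with one entry per page orientation.
TITLE_ROI = {
    "portrait": (0.10, 0.02, 0.90, 0.12),
    "landscape": (0.20, 0.03, 0.80, 0.15),
}

if __name__ == "__main__":
    w, h = 1000, 2000  # hypothetical scan size in pixels
    page = orientation(w, h)
    print(page, roi_to_pixels(TITLE_ROI[page], w, h))
    print(page, roi_to_pixels(TITLE_ROI[page], w, h, upside_down=True))
```

With PIL, the resulting `Box` can be passed straight to `Image.crop()`, which takes a (left, upper, right, lower) tuple; for an upside-down page the crop would also need `crop.rotate(180)` before OCR. With OpenCV's NumPy images the equivalent crop is `img[box.top:box.bottom, box.left:box.right]`.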


Popular posts from this blog

Python and libpuzzle

As much as I've dogged on Python in the past (significant whitespace, really?), I've got to admit that it's got some cool features too. For example, I'm playing with libpuzzle (a library for visually comparing images). It has a command-line utility plus C and PHP APIs. Unfortunately, the CLI utility doesn't let you dump the raw comparison vector, and it's a PITA to write C just to play with a library. Python's native "ctypes" to the rescue!

    from ctypes import *

    class PuzzleCvec(Structure):
        _fields_ = [("sizeof_vec", c_size_t),
                    ("vec", c_char_p)]

    class PuzzleCompressedCvec(Structure):
        _fields_ = [("sizeof_compressed_vec", c_size_t),
                    ("vec", c_char_p)]

    class PuzzleContext(Structure):
        _fields_ = [("puzzle_max_width", c_uint),
                    ("puzzle_max_height", c_uint),
                    ("puzzle_lambdas", c_uint),
                    ...
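
One caveat when dumping the raw vector with the structures in that excerpt: ctypes reads a `c_char_p` field back as a NUL-terminated byte string, and the comparison vector is a run of small signed values that can include zero, so reading `vec` that way may cut it short. A minimal sketch of the alternative, assuming `vec` points to signed chars: declare it as `POINTER(c_byte)` and use `sizeof_vec` as the length. The struct is filled by hand with stand-in data here, since loading libpuzzle itself isn't shown; `dump_cvec` is a hypothetical helper, and the field layout is taken from the excerpt rather than checked against the library's header.

```python
from ctypes import POINTER, Structure, c_byte, c_size_t, cast


class PuzzleCvec(Structure):
    # Assumed layout: element count followed by a pointer to signed chars.
    _fields_ = [("sizeof_vec", c_size_t),
                ("vec", POINTER(c_byte))]


def dump_cvec(cvec):
    """Return the raw comparison vector as a list of Python ints.

    Slicing a ctypes pointer with an explicit length reads exactly
    sizeof_vec elements, zeros included.
    """
    return cvec.vec[:cvec.sizeof_vec]


if __name__ == "__main__":
    # Stand-in data; with the real library, libpuzzle would fill the struct.
    raw = (c_byte * 6)(1, 0, -2, 0, 2, -1)
    cvec = PuzzleCvec(sizeof_vec=len(raw), vec=cast(raw, POINTER(c_byte)))
    print(dump_cvec(cvec))
```

The `raw` array has to stay referenced for as long as the struct points at it; ctypes won't keep it alive through the cast pointer.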

Mass updating AWS Lambda Log Group retention

AWS Lambda and I have a love/hate relationship. There is much about Lambda to like, but there are also some very sharp edges operationally. One of the cool things is that you get a new CloudWatch Log Group for every new Lambda function without any effort on your part. Less cool is that it defaults to unlimited retention, so those logs never expire. If you haven't yet followed Yan Cui's advice, then you can use some Bash/CLI magic to fix retention on your existing Log Groups.

First, get a list of all your default Lambda log groups:

    aws logs describe-log-groups --log-group-name-prefix "/aws/lambda" | grep logGroupName | cut -d : -f 2 | cut -d \" -f 2 > /tmp/lambda_logs

Read that into a Bash array:

    readarray -t log_groups < /tmp/lambda_logs

Then, add a 7-day retention policy to all those log groups:

    for i in "${log_groups[@]}"; do aws logs put-retention-policy --log-group-name "$i" --retention-in-days 7; done

It's a hack, but if you're going to put in th...
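
For comparison, a sketch of the same fix in Python with boto3 (an assumption on my part; the post itself uses the CLI). It walks the `describe_log_groups` paginator instead of parsing JSON with `grep` and `cut`, and unlike the CLI loop it skips groups that already have a retention setting, since only groups left at "never expire" lack a `retentionInDays` key. `set_lambda_log_retention` is a hypothetical name, and the client is passed in so the logic can be exercised without AWS credentials.

```python
def set_lambda_log_retention(logs_client, days=7, prefix="/aws/lambda"):
    """Apply a retention policy to every log group under `prefix` that has none.

    logs_client: a CloudWatch Logs client, e.g. boto3.client("logs").
    Returns the names of the log groups that were updated.
    """
    updated = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page["logGroups"]:
            # Groups that never expire have no retentionInDays key; leave
            # groups that already have a policy alone.
            if "retentionInDays" in group:
                continue
            logs_client.put_retention_policy(
                logGroupName=group["logGroupName"], retentionInDays=days
            )
            updated.append(group["logGroupName"])
    return updated
```

With credentials configured as for the CLI, `set_lambda_log_retention(boto3.client("logs"))` performs the same 7-day update; drop the `retentionInDays` check if you want to overwrite existing policies the way the Bash loop does.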

If it builds, ship it...

I've finally gotten a working build. It turns out what I really needed was to get up @ 0630, take a shower, and just plug away. Anyhow, the source is up on GitHub: https://github.com/coredog64. I still need to get it to build an image for the Streak before I can actually test it.