This is a tall order. The policy statements for the company I work for arrive in PDF format. I am looking for a way to extract specific blocks of data (in the form of tables, but they are poorly structured) from within a PDF and export them to an Excel spreadsheet or SQL. Are there any document management systems or tools able to do this on a large scale? Maybe a program that can scan a page and convert it to a spreadsheet?
Most OCR software can be configured to "scan" from an image or PDF (that's one way PDF documents are converted into editable text). What you do with that text afterwards is another story. You could start by looking at VBA and try to locate your data based on tokens (recurring marker strings, such as a table heading) within the scanned documents.
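To illustrate the token idea, here is a minimal sketch in Python rather than VBA. It assumes the OCR step has already produced plain text, that each table sits between a start token and an end token (the strings `"COVERAGE SUMMARY"` and `"END OF SCHEDULE"` and the sample data are hypothetical), and that columns are separated by runs of two or more spaces, which depends entirely on how your OCR tool lays out the text.

```python
import csv
import io
import re


def extract_block(text, start_token, end_token):
    """Return non-empty lines between the first line containing start_token
    and the next line containing end_token (both marker lines excluded)."""
    lines = text.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if start_token in line)
    except StopIteration:
        return []  # start token not found in this document
    block = []
    for line in lines[start + 1:]:
        if end_token in line:
            break
        if line.strip():
            block.append(line)
    return block


def split_columns(line):
    # Assumption: the OCR output separates columns with 2+ spaces.
    return re.split(r"\s{2,}", line.strip())


def block_to_csv(rows):
    # Write rows as CSV text; csv.writer quotes values that contain commas.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical OCR output for one page of a policy statement.
sample = """Policy No. 12345
COVERAGE SUMMARY
Coverage        Limit       Premium
Building        500,000     1,200
Contents        100,000     300
END OF SCHEDULE
Page 1 of 3"""

rows = [split_columns(line)
        for line in extract_block(sample, "COVERAGE SUMMARY", "END OF SCHEDULE")]
print(block_to_csv(rows))
```

The resulting CSV opens directly in Excel; for SQL, the same `rows` list could be passed to something like `sqlite3`'s `executemany`. Poorly structured tables will likely need per-template tokens and cleanup rules, so treat this as a starting point rather than a general solution.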