A new version (v11) of Able2Extract’s PDF conversion software is now available. In February last year we reviewed version 10 and showed how it helps with our focus area of converting PDFs into USABLE Excel documents. Although our focus is Excel, we have also looked a bit at working with PDFs in general, and at PDF to Word as well as PDF to Excel conversions.
With the version 11 review we have focused a bit more on the new features, as the old features seem to be as effective as ever (see a summary at the end of the article).
Some highlighted features (in order of importance to Excel users) are:
- OCR improvements – scanned PDFs are now searchable
When you receive statements and the like in a scanned PDF format, it is useful to be able to search the PDF to find the page or pages that interest you.
But a scanned document is stored as an image rather than as identifiable text. With the enhanced OCR you can search the ‘image’ for key text, e.g. product numbers. As shown below, we can specify the OCR options, and just above them you will see the ‘Search Text’ option.
This will give you more control over what you need to import from PDF to Excel.
- PDF annotations – give you the ability to leave comments on your PDFs and collaborate with your team
As noted above this has been improved. We particularly like and use the ability to add a watermark to a PDF. This could be a note that the PDF is a sample, or perhaps a link to where it is used in the Excel process.
- PDF redaction – allows you to black out and erase sensitive information
As with annotations, it is sometimes useful to send a PDF as an example, but it might contain data that you don’t want the viewer to see.
As shown below, you can easily ‘delete’ data in a PDF that you need to send to other users.
- Improved PDF editing – users can now insert images and vector shapes inside their PDFs
Another feature that is useful for Excel users comes into play once you have done your conversion (PDF to Excel) to get the numbers out and perhaps done some additional work (created a chart and done some analysis). You may want to insert the results of your analysis (e.g. the chart) back into the PDF to make it easier to understand. This is easy with version 11.
The new version has added a number of new features that will help Excel users.
If you regularly need to get PDFs into Excel, you should consider investing in Able2Extract. With the time savings and improved accuracy, it will pay for itself on the first conversion!
Below is a summary of what Able2Extract can do (and has always been able to do!).
Details from version 10 – these still work in version 11 and in most cases have been improved
Below is a summary of the version 10 review, condensed for ease of reading. If you want to see the full version, go to the version 10 (Feb 2016) review.
Custom Conversions from PDF to Excel
Given the number of ways PDFs are created, you will often have to customise the conversion process to take into account things like wrapped text and merged cells. The steps are the same, but instead of choosing automatic conversion you must choose the CUSTOM conversion.
Fixing Column Issues in PDFs (merged cells, character alignment)
A common issue with PDF to Excel is how the conversion treats merged cells and characters that don’t appear in nice neat columns. Note below that the green lines show where Able2Extract wants to split the columns (this is what will happen if you choose the default conversion).
The first red arrow shows the result of merged cells. The heading spans multiple columns, but because the information below it sits in separate columns, the heading is going to be cut: you will have part of the heading in one cell and the rest in another (which is not fun to correct).
Another issue is the alignment. The second arrow shows what looks like a single column, but the text headings are left aligned and the numbers are right aligned, almost, but not quite, creating separate columns. As can be seen with the green lines, this is going to split it.
No problem for Able2Extract. We can erase the selected column lines by choosing the ‘Erase Column Line’ option.
By clicking on the relevant lines we can remove them from the conversion process. So as shown below, by looking at the document and adjusting the green lines for issues with merged cells and character alignment, the preview shows all the information in the correct columns.
So less data clean up for you!
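To make the effect of erasing a column line concrete, here is a minimal Python sketch. It is only an illustration and not how Able2Extract works internally: the function `assign_columns`, the x positions and the sample words are all hypothetical. Each word lands in a column according to the boundary lines to its left, so removing a boundary keeps a split heading together.

```python
from bisect import bisect_right


def assign_columns(words, boundaries):
    """Place each (x, text) word into a column based on vertical boundary lines.

    `boundaries` holds the x positions of the column lines (made-up units).
    This is an illustrative model, not Able2Extract's actual algorithm.
    """
    boundaries = sorted(boundaries)
    columns = [[] for _ in range(len(boundaries) + 1)]
    for x, text in sorted(words):
        # The column index is the number of boundary lines at or left of the word.
        columns[bisect_right(boundaries, x)].append(text)
    return [" ".join(col) for col in columns]


# A heading that spans two columns, followed by a separate value column.
words = [(10, "Total"), (60, "Revenue"), (140, "2016")]

default_split = assign_columns(words, [50, 120])  # heading is cut in two
after_erasing = assign_columns(words, [120])      # line at x=50 erased

print(default_split)
print(after_erasing)
```

With both lines in place the heading is cut into two cells; with the line at 50 erased, "Total Revenue" stays in one cell.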
Fixing row issues on PDF conversion (wrap text type issues)
What about PDFs where it looks like wrap text has been used in cells and you want to keep the text in a single cell? Below, this questionnaire has blocks of data that we want to include in a single cell.
However, as noted below, a conversion by default will treat each new line as a new row.
There are a number of options to get around this.
Firstly, if you click on row settings (1 in the image below), you get some extra options.
A very useful option is (2), which allows you to use any horizontal lines in the document to specify the rows that should be together in a cell. This is very useful if the PDF was created from Word tables where you can see the horizontal lines. It may also guide you in how to create PDFs from Excel. Perhaps you can ask the person creating the PDF to include the grid lines so that they can help the conversion process.
In this case we don’t have lines, so we are going to use (3) manual row editing.
Much like when we changed the columns, we have chosen to Erase Row Lines. As shown below, if we group the data, Excel will have the imported data from the PDF in a single cell, which is easier to work with.
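If a file has already been converted with the default settings and every wrapped line has become its own row, a similar grouping can be approximated afterwards. The Python sketch below is a clean-up heuristic of our own, not a feature of Able2Extract. It assumes that a row with an empty first cell continues the row above; the function name `merge_wrapped_rows` and the sample data are hypothetical.

```python
def merge_wrapped_rows(rows):
    """Merge continuation rows into the row above.

    Assumption (ours, for illustration): a row whose first cell is empty
    is a wrapped continuation of the previous row.
    """
    merged = []
    for row in rows:
        if merged and not row[0].strip():
            prev = merged[-1]
            # Append each non-empty continuation cell to the cell above it.
            merged[-1] = [
                (a + " " + b.strip()).strip() if b.strip() else a
                for a, b in zip(prev, row)
            ]
        else:
            merged.append([cell.strip() for cell in row])
    return merged


# Rows as they might look after a default conversion of wrapped text.
rows = [
    ["Q1", "How often do you", "Weekly"],
    ["", "convert PDFs to Excel?", ""],
    ["Q2", "Which version", "11"],
    ["", "do you use?", ""],
]

for row in merge_wrapped_rows(rows):
    print(row)
```

The heuristic only holds when the first column is always filled in for a genuine new row, so check a few rows by eye before relying on it.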
What about scanned PDFs?
Thankfully, most PDFs are generated straight from a computer system, so they are perfectly straight and easier to get into Excel. But what about scanned documents? Using the inbuilt OCR (Optical Character Recognition), Able2Extract can convert even these PDFs.
Below is a scan with lots of colours and other markings. Also note that the page has been scanned slightly skewed.
This is where OCR comes into play. In this case we need to help the software out a bit. By clicking the Area button, I have specified exactly where the table is.
When we run the Custom convert, we can specify which columns we want, and in this case I have asked it to identify the rows by using the horizontal lines in the document. As shown below, it has done a good job of using the document’s horizontal lines.
The end result is something like this. Given all the extra colours and lines drawn onto the document we think this is very impressive.
One way we found to make the conversion even better is to use some imaging software to ‘straighten’ the scan first. This makes it easier to line up the columns and rows, and the conversion is better.
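For readers who want to know how far a scan needs to be rotated, the angle can be worked out from two points on a line that should be horizontal, such as the two ends of a table rule. The Python sketch below shows only the arithmetic; the pixel coordinates are made up, the helper names are hypothetical, and the rotation itself would still be done in the imaging software of your choice.

```python
import math


def skew_angle_degrees(p1, p2):
    """Angle of the line p1 -> p2 relative to horizontal, in degrees."""
    (x1, y1), (x2, y2) = p1, p2
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def rotate_point(point, angle_degrees, origin):
    """Rotate `point` around `origin` by `angle_degrees` using the standard rotation formula."""
    t = math.radians(angle_degrees)
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return (
        origin[0] + dx * math.cos(t) - dy * math.sin(t),
        origin[1] + dx * math.sin(t) + dy * math.cos(t),
    )


# Hypothetical pixel positions of the two ends of a table rule in the scan.
left_end = (100.0, 200.0)
right_end = (900.0, 214.0)

angle = skew_angle_degrees(left_end, right_end)
# Rotating by the negative of the measured angle levels the rule.
straightened = rotate_point(right_end, -angle, left_end)

print(round(angle, 2))
print(round(straightened[1], 2))
```

After the correction both ends of the rule share the same vertical position, which is what makes the column and row lines easier to place.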