
Machine Learning for Receipts

Automation is a key part of what we do: finding new ways to reduce the time spent on repetitive tasks allows you to spend more time running your business. Recently we have been looking at ways to streamline the receipt tagging process, and today we will start beta testing a new receipt analysis tool.

The Receipt Analyser uses Optical Character Recognition (OCR) coupled with Machine Learning (ML) algorithms to detect patterns in receipts and extract key data such as supplier name, date and total amount.

How does it work?

Once enabled, the Receipt Analyser adds a small wand icon to the Receipt Hub preview screen.

image

When you tap this icon the receipt will be submitted for review, and within a few seconds any matches will be returned. Currently the Receipt Analyser is optimised to work on thermal till receipts and looks for the following data points:

  • Supplier name
  • Date of purchase
  • Total amount

You can show the raw matches by clicking on the link in the top right of the analyser output box. A supplier name match will occur when the supplier name returned exactly matches the name of the supplier already saved on your account. If the text does not precisely match, the analyser will switch to “learning mode” and will record the supplier that you proceed to manually match the receipt to. That way when similar text is extracted in future the analyser will know which supplier to automatically match to.

image

Receipt Analyser limitations

The Receipt Analyser is based on OCR and Machine Learning, and therefore accurate extraction depends on a number of variables.

1. Document type

The Machine Learning module is trained to work with thermal till receipts. Invoices and multi-page documents will see varying degrees of success. The analyser can extract text from images (png, jpg) and PDF files.

2. Image quality

High resolution, high contrast receipts with fewer distortions will make it easier for the OCR process to accurately extract the text from the receipt image. Reduction of background noise and cropping as close as possible to the actual receipt will deliver optimum results.

3. Supplier name matching

The analyser looks for an exact supplier name match. If an exact match is not identified, the analyser will record the supplier to which the user manually matches the receipt and will apply that same rule automatically in future.

Beta programme requirements

Initially we are looking for a relatively small number of participants to test the Receipt Analyser.

If you are routinely processing receipts in QuickFile you can request access to the programme by going to Help >> Additional Services >> Beta Features and entering the code "ML0001". This particular feature requires a power user subscription to activate.

6 Likes

Hi @Glenn, I applied to join the Beta programme for this new feature of “Machine Learning for Receipts” which you actioned.

However, I have come to realise that I will be unable to contribute much feedback on the subject. The main reason is that receipts are hardly ever raised using the “Receipt Hub” method. The usual method is via “Bank Tagging Rules” after uploading transactions, which is preferred because there is less chance of input error. The “Receipt Hub” is therefore only used for attaching a receipt document to an already issued receipt.

The only occasion on which the “Receipt Hub” is used for processing receipts is for payments made by cash, and thermal till receipts, which the Receipt Analyser is optimised for, are not commonly issued for cash sales.

I have had a go at testing the facility on some PDQ thermal till receipts that did not require receipts raising, just to see how the module works, but these were in euros as I am based in France at present.

PDQ thermal receipts issued in France do not use a decimal point but a decimal comma, which is generally the decimal separator used throughout Europe. I have found the analyser hardly ever identified any data, not even the date. The date not being identified may be due to the format here being slightly different, “LE 14/01/20 A 13:16”, with the amount appearing as “13,50 EUR”.
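To make the formatting difference concrete, here is a minimal Python sketch of how the two strings quoted above could be normalised before matching. It is only an illustration, not how the Receipt Analyser works: the function names and regular expressions are hypothetical, and thousands separators are deliberately not handled.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical patterns for the two formats quoted in the post.
# A dd/mm/yy date, optionally followed by "A hh:mm".
DATE_RE = re.compile(r"\b(\d{2}/\d{2}/\d{2})\b(?:\s+A\s+(\d{2}:\d{2}))?")
# Digits, a decimal point OR decimal comma, then exactly two digits.
# Thousands separators (e.g. "1.234,50") are not handled here.
AMOUNT_RE = re.compile(r"\b(\d+)[.,](\d{2})\b")


def parse_receipt_date(text):
    """Return a datetime from text such as 'LE 14/01/20 A 13:16', or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day_part, time_part = match.groups()
    if time_part:
        return datetime.strptime(f"{day_part} {time_part}", "%d/%m/%y %H:%M")
    return datetime.strptime(day_part, "%d/%m/%y")


def parse_amount(text):
    """Return a Decimal from text such as '13,50 EUR' or '13.50', or None."""
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    whole, cents = match.groups()
    return Decimal(f"{whole}.{cents}")
```

Calling `parse_receipt_date("LE 14/01/20 A 13:16")` and `parse_amount("13,50 EUR")` would read both values from the example receipt, with the date interpreted as day/month/year.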

Not very useful for my purposes, but hopefully this tool will find a use for someone.

Thanks for your feedback, Alan.

Yes, I guess in that case it will be quite limited. The ML training was optimised for decimal point based separation between pounds and pence, so I think any other formatting will present a problem.

Regarding using the Receipt Hub just for matching to existing records, we plan eventually to start pre-processing receipts which may allow us to instantly suggest matching items based on the amount.

It’s very much in the early stages for now, but in time it should improve.

1 Like

Hi @Glenn, happy to help test - I’ve a couple of months’ worth of uploads to go through and ‘match’ (circa 40 receipts) this week, so this is great timing! :wink:

Appreciate that it’s still early days/subject to change etc. but could you possibly expand the blog post a little on the detail of how you are technically implementing the OCR function?

Specifically I’m thinking of how the data processing and storing functions are currently working/planned to work in the future.

I’m assuming for example you’ve not hand coded your own OCR tool but (more sensibly!) are leveraging a 3rd party function and/or an API :thinking: Google’s Tesseract perhaps?

  • Is the initial processing being done locally on/within QuickFile’s infrastructure? (E.g. if an existing match is found, is it processed internally)

  • What (if anything) is passed out to the API/3rd Party? (E.g. assuming there isn’t an existing supplier entry for ‘someexampleservice’, does this then poll an external source for similar matches?)

  • Learning mode ‘auto matches’ - do you see this OCR vs Supplier matching list being per QuickFile account or a single QuickFile community list? (E.g. if I assign ‘The Bessemer Hotel’ to ‘a.n.othersupplier’ what effect would that have on my future matches and other users who also use that supplier)

To be clear I’m genuinely interested in the possibilities of additional automation - especially if the handful of widgets that get purchased on a regular basis could be auto matched with confidence to the correct supplier/invoice etc…saves me cringing at the thought of all those I’ve got to do this week. :grimacing:

I’m sure you’ve already got it in hand but more clarity around visibility of sensitive purchases and GDPR’s offsite data processing elements is probably wise too before a wider release, just so users who possibly have strict(er) data controls can review before activation.

Personally I’ve no issue with Google/Amazon etc. having a clearer picture of the shampoos and clipper blades that we buy throughout the year…especially so as they deliver most of them! :grin: but that may not be true for all QuickFile users.

Right, off to hit the ‘hub’!

John.

Right now we’re using Azure Form Recognizer which accepts the file as an input and returns a bunch of structured data. We then decide which parts are useful in terms of receipt processing. If we get a good match on the supplier name and the amount, the remaining fields can often be bound from supplier defaults.

We’ve coupled the above with some extra pattern recognition, so that if an exact supplier match is not possible, we instead record the blob of text suggested by Form Recognizer as the supplier name and link it to the supplier ID manually selected in the Receipt Hub. In essence it doesn’t need to be exact, just consistent, and then the matching will do its thing. In theory the matching efficiency should improve over time.

It’s not a silver bullet, and the huge divergence in receipt / invoice layouts limits the ability of current ML technology to work flawlessly. It should however prove to be useful, particularly if you tend to process high volumes of receipts from a small pool of suppliers.
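As a rough illustration of the "exact match, otherwise learn" behaviour described above, here is a minimal Python sketch. It is not QuickFile's implementation: the `SupplierMatcher` class, its method names and the supplier IDs are hypothetical, and normalising case and whitespace on the learned text is an added assumption. The thread also does not say whether learned links are stored per account or shared, so the sketch simply keeps them in one in-memory dictionary.

```python
class SupplierMatcher:
    """Hypothetical sketch: exact supplier match with a learned fallback."""

    def __init__(self, suppliers):
        # suppliers: mapping of saved supplier name -> supplier ID
        self.by_name = dict(suppliers)
        # learned: extracted text blob -> supplier ID chosen manually
        self.learned = {}

    @staticmethod
    def _normalise(text):
        # Assumption: collapse whitespace and ignore case so that
        # consistently extracted text maps to the same key.
        return " ".join(text.split()).casefold()

    def match(self, extracted_text):
        """Return a supplier ID, or None if nothing is known yet."""
        # 1. Exact match against the supplier names saved on the account.
        if extracted_text in self.by_name:
            return self.by_name[extracted_text]
        # 2. Otherwise fall back to text learned from earlier manual matches.
        return self.learned.get(self._normalise(extracted_text))

    def learn(self, extracted_text, supplier_id):
        """Record the supplier the user manually matched the receipt to."""
        self.learned[self._normalise(extracted_text)] = supplier_id
```

In use, a first call to `match()` with unfamiliar text returns `None`; after `learn()` records the user's manual choice, the same text resolves to that supplier on later receipts.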

2 Likes

I had a minor problem with a Poundstretcher receipt. The receipt could be read, but Poundstretcher prints this sequence of lines:

  1. Total to Pay
  2. Cash tendered
  3. Change

The machine learning picked up the change instead of the Total to Pay. Something to watch out for!
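One way to guard against this kind of mix-up, sketched below in Python, is to choose the amount by its label rather than by its position on the receipt. This is purely illustrative and not how the Receipt Analyser works; the label lists and the `pick_total` function are hypothetical.

```python
import re
from decimal import Decimal

# An amount with two decimal places at the end of a line.
AMOUNT_RE = re.compile(r"(\d+\.\d{2})\s*$")
# Hypothetical label lists; a real receipt set would need many more.
TOTAL_LABELS = ("total to pay", "total")
IGNORED_LABELS = ("change", "cash tendered", "subtotal", "sub total")


def pick_total(lines):
    """Return the amount on the first line labelled as a total, else None."""
    for line in lines:
        label = line.casefold()
        # Skip lines that carry an amount but are known not to be the total.
        if any(word in label for word in IGNORED_LABELS):
            continue
        if any(word in label for word in TOTAL_LABELS):
            match = AMOUNT_RE.search(line)
            if match:
                return Decimal(match.group(1))
    return None
```

Given OCR lines in the order listed above, with made-up amounts on each, `pick_total` would return the figure on the "Total to Pay" line and ignore the "Cash tendered" and "Change" lines.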

Poundstretcher Example.pdf (129.2 KB)

Hi Glenn,
I just want to give a little feedback about this feature. First of all, I really like it. The software still has problems catching everything correctly, but I am pretty sure you are still in the testing process. Just one little thing, because it sounds a bit funny. Today I renewed my power subscription and uploaded the receipt to QuickFile (I saved the receipt from my account as a PDF on my computer and uploaded it to QuickFile via email). I thought I had to test your software with your own invoice :upside_down_face:. Your scanner caught the invoice pretty well; the only thing that was wrong was the amount. For the amount I paid, your software took your VAT reg. number. I thought it might be interesting for you to know about it. It was not a big thing: I changed the amount and it was fine.
Oh, I forgot. I took a screenshot if you are interested, but I did not want to upload it here because it contains too much private data. I can private message you if you want the screenshot.

2 Likes

Invoice type layouts are more of a challenge as the ML was trained primarily for receipt formats. That said, I’ve seen reasonable accuracy when it comes to testing invoices.

It’s interesting that it picked up the VAT number. We’ll do a few tests to determine why that is.

1 Like