Some PDFs with diacritic characters not showing in Receipt Hub

I am importing PDFs into the Receipt Hub via a Dropbox integration. A small number of PDFs display error messages, such as the one below:

Here’s the offending PDF ‘MINT S.C.’ in the RH view:

Curiously, the ‘SOHOSTEL’ one is showing properly.

Both files open properly on my Mac with Preview.

Also, when I download the offending PDFs from the Imported folder on Dropbox, export them as PDFs from Preview (however weird that may sound) and remove the prefix that QuickFile added to the file name, they work fine.

The PDFs have been OCR’d and are searchable, so maybe the problem is that they contain some offending weird characters inside of them.

My first suspicion is the comma in the file name, although we will try to replicate this and provide a fix.

EDIT:

BAGINSKI_+6-5-2015_2015-05-08.pdf - FAILS
BAGIŃSKI_+6-5-2015_2015-05-08.pdf - WORKS

Oddly it doesn’t like the encoding on the accented N. The QF link doesn’t accent the N and therefore the file does not load. When I access the file directly from our storage provider, the N is accented.

You have other files there with accented Ó which translate fine, not sure why the Ń is treated differently?

EDIT 2:

Unless we change our column type to nvarchar we won’t be able to support files with diacritic characters in the name. Personally I don’t think it’s wise to do that but instead to replace the diacritic characters before uploading. This is something we’ll look into.
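As a rough illustration of the "replace the diacritic characters before uploading" idea, here is a minimal Python sketch. It is not the actual QuickFile implementation; the function name `strip_diacritics` is made up for this example, and it assumes that dropping Unicode combining marks after decomposition is an acceptable way to replace accented letters.

```python
import unicodedata


def strip_diacritics(filename: str) -> str:
    """Return the file name with accents removed (hypothetical helper)."""
    # NFD splits an accented letter into its base letter followed by
    # one or more combining marks, e.g. "Ń" becomes "N" + U+0301.
    decomposed = unicodedata.normalize("NFD", filename)
    # Keep everything except the combining marks themselves.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("BAGI\u0143SKI_+6-5-2015_2015-05-08.pdf"))
# prints BAGINSKI_+6-5-2015_2015-05-08.pdf
```

Letters that have no Unicode decomposition, such as Ł, pass through this sketch unchanged, so a real fix would probably need an explicit replacement table for those as well.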

In the meantime, if you can, I would try to avoid using diacritic characters in your file names.

Thank you for troubleshooting the problem so quickly.

Meanwhile, wouldn’t it be a good idea to:

  1. Post this information on the appropriate help page.
  2. Implement a filename check on upload so that files with names containing potentially offending characters are rejected.
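For what it’s worth, the filename check suggested in point 2 could look something like the Python sketch below. The function names and the "ASCII only" rule are assumptions made for illustration; the thread does not say which characters QuickFile would actually want to reject.

```python
def find_offending_chars(filename: str) -> list:
    """List the distinct non-ASCII characters in a file name, in order."""
    offending = []
    for ch in filename:
        # Assumption: anything outside 7-bit ASCII counts as "potentially offending".
        if ord(ch) > 127 and ch not in offending:
            offending.append(ch)
    return offending


def is_safe_filename(filename: str) -> bool:
    """True when the name contains no characters the check would reject."""
    return not find_offending_chars(filename)


name = "BAGI\u0143SKI_+6-5-2015_2015-05-08.pdf"
if not is_safe_filename(name):
    print("Rejected, offending characters:", find_offending_chars(name))
```

Returning the list of characters, rather than just a yes/no answer, would let the upload page tell the user exactly what to rename.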

We’re looking to get a fix in place by today.

These issues can take time to fully test and resolve, particularly given that files are imported into QF from multiple sources, not just the upload box.

Ó is within the range of ISO-8859-1 (which covers the Basic Latin and Latin-1 Supplement Unicode blocks) and Windows Cp1252; Ń isn’t, so you would need proper Unicode support for that.
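The difference between the two letters is easy to check in Python. This is only a small demonstration of the encoding ranges, not a reproduction of how QuickFile stores file names; the helper `fits_in` is a name invented for this sketch.

```python
import unicodedata


def fits_in(ch: str, encoding: str) -> bool:
    """True if the character can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# U+00D3 is Ó and U+0143 is Ń, written as escapes to avoid editor encoding issues.
for ch in ("\u00d3", "\u0143"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "iso-8859-1:", fits_in(ch, "iso-8859-1"),
        "cp1252:", fits_in(ch, "cp1252"),
    )
# prints:
# U+00D3 LATIN CAPITAL LETTER O WITH ACUTE iso-8859-1: True cp1252: True
# U+0143 LATIN CAPITAL LETTER N WITH ACUTE iso-8859-1: False cp1252: False
```

Both characters encode without trouble as UTF-8 or UTF-16, which is why a Unicode-capable column type would accept them.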

We’ve just implemented a fix to parse out any diacritic characters in file names. This will work when uploading files in the following places:

  • Receipt Hub
  • Document Manager
  • Directly attached to purchase invoices
  • Directly attached to sales invoices
  • Emailed to receipts@quickfile.co.uk

We’ve also put in a fix for Dropbox but this needs to be deployed separately.

Thank you for the immediate response. As I use Dropbox, when can I expect the fix to be deployed?

With any luck, mid-afternoon.