Hi Eric
Maybe you are trying to solve the wrong problem.
I speak as an expert in trying to solve the wrong problem - I do it all the time.
Maybe the problem is not, "How do I extract tabulated numeric data from a highly formatted text file?"
Maybe the real problem is, "How do we get all utility companies to agree to provide bill data in a standard format that can be used by any business?"
They must already have the data in a table, so it is probably easier for them to send you an extract of your data from that table than to format it into a bill. Maybe you should ask?
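To illustrate why a table extract would be easier to work with, here is a minimal sketch in Python. It assumes the utility could send the bill data as a CSV file. The column names and sample values are hypothetical, made up purely for illustration, and a real extract would need its own field mapping.

```python
import csv
import io

# Hypothetical extract: the column names and values below are invented
# for illustration and do not come from any real utility company.
SAMPLE_EXTRACT = """account_id,meter_id,period_start,period_end,usage_kwh,amount
A-100,M-1,2023-01-01,2023-01-31,420.5,63.08
A-100,M-2,2023-01-01,2023-01-31,180.0,27.00
"""


def load_bill_rows(text):
    """Parse a CSV extract into a list of dicts, converting numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["usage_kwh"] = float(row["usage_kwh"])
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


rows = load_bill_rows(SAMPLE_EXTRACT)
total_amount = sum(r["amount"] for r in rows)
```

With data in this shape there are no page breaks, charts, or repeated headers to work around, so no model training is needed to read it.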
Guy
------------------------------
Guy Boswell
Care for the Family
Newport
02920810800
------------------------------
Original Message:
Sent: Feb 15, 2023 11:00 PM
From: Eric Leible
Subject: AI Builder Best Practices for Document Processing & Multipage Tables
Hi Everyone,
I'm trying to train my model to extract data from utility bill invoices. The invoices in this collection are all from the same utility company and can run anywhere from 2 to 12 pages. The first and last pages usually do not contain any data that needs to be extracted, but not always. The pages in between usually have one or two tables per page, and these tables hold the data that needs to be extracted. Pages with two tables usually have bar charts and text between the two tables. The tables on every page have exactly the same headers. I've selected the table, or both tables, on each page and tagged the data accordingly. I've used the "Table continues on next page" feature to ensure that all the data from each page ties to the same table tag. I've used 16 invoices and I'm still below 50% accuracy. I'm hoping someone can share some solutions or best practices to help increase my scores. If increasing my sample size is the answer, please let me know, but I'm hoping there are other best practices that might help as well.
Thanks
------------------------------
Eric Leible
Specialist - Service Management Analytics
6365440979
------------------------------