A paper written in early 1998, at a time when Yahoo! and AltaVista dominated the search engine market, proposed using machine learning techniques to “propagate” rankings of website importance from well-ranked pages through to the pages they linked to in a recursive manner.1 This approach represented a significant improvement on the search technology used by the incumbents at the time, which was largely based on independent analysis of page content.
To prove the thesis, though, the authors had the problem of writing such a search engine. As they described in the paper, “To test the utility of PageRank for search, we built a web search engine called Google”. Thus was born one of the most valuable enterprises in the world today, from the idea of applying a better algorithm to a problem of information retrieval.
Bringing invoice handling into the 21st Century
What has this got to do with the world of purchase orders, invoices, and payments?
Currently, even the most advanced firms’ systems are a world away from the automated elegance of Google. Invoices are input into a simple ERP system and paid out after extensive manual review and according to rigid classification rules. The scope for human error increases the likelihood of duplicate payments, and the slow manual processes ultimately lead to late payments for suppliers and additional expenses for buyers.
It doesn’t have to be this way. Imagine that instead of being processed by timestamp, invoices were processed by an automatically calculated ‘PayRank’: a ranking that took into account the likelihood that each invoice needed a manual review. Invoices below a certain rank would be paid automatically and instantly. Invoices above a certain rank might need more attention, as the system may have flagged them due to a large monetary value or as representing a likely duplicate payment. Manual oversight of such an automated process would instill confidence and ensure transparency whilst maximizing efficiency gains.
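As a rough illustration of the triage idea, the sketch below scores each invoice and splits the batch at a threshold. Everything in it is an assumption made for illustration: the `Invoice` fields, the `pay_rank` formula, the `amount_cap` of 10,000 and the `threshold` of 0.5 are hypothetical, and the `duplicate_likelihood` value is presumed to come from some separate upstream model.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    invoice_id: str
    amount: float
    # Hypothetical input in [0, 1], assumed to be produced by another model.
    duplicate_likelihood: float


def pay_rank(invoice: Invoice, amount_cap: float = 10_000.0) -> float:
    """Illustrative score in [0, 1]; higher means more likely to need review."""
    # Large amounts raise the score, capped at 1.0 (assumed rule).
    amount_risk = min(invoice.amount / amount_cap, 1.0)
    # Take the worse of the two risk signals.
    return max(amount_risk, invoice.duplicate_likelihood)


def triage(invoices, threshold: float = 0.5):
    """Split invoices into (auto_pay, manual_review), lowest score first."""
    auto_pay, manual_review = [], []
    for inv in sorted(invoices, key=pay_rank):
        if pay_rank(inv) >= threshold:
            manual_review.append(inv)
        else:
            auto_pay.append(inv)
    return auto_pay, manual_review
```

In this sketch a small, clearly unique invoice falls below the threshold and is paid straight away, while a very large invoice or a probable duplicate is queued for a human reviewer. A real scoring function would be learned from historical data rather than hand-written.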
Mimicking the mind
Such a system sounds useful in theory, but can a machine really detect duplicate payments, beyond simple pattern-matching on amounts? What about fraudulent invoices?
Within the last few years, computers have been trained to learn very robust models of language via a technique known as deep neural networks, whose pattern of information processing loosely resembles that of the human brain. This technique has been used to recognise and classify text automatically and with high accuracy, for example through Google’s Tesseract OCR. In addition to all of the now common tasks they perform, such systems could be trained to recognize and make decisions based on data in scanned invoices. They would be capable of accurately checking millions of invoices before a human worker has finished making their morning cup of coffee.
But, of course, invoices are not spreadsheets with all the data neatly arranged in fully standardized rows and columns. So, any system which is going to analyse invoices accurately for potential problems needs to be able to understand what is called unstructured text – information which isn’t ordered in a predefined way.
Machine learning has the answer here too. For example, word2vec models, trained with neural networks on massive amounts of text, have been able to correctly identify all sorts of concepts. These models look for words which are frequently used together and create a network of interrelated words or ‘clusters’. Such systems have been able to identify everything from the names of the capital cities of major countries all the way down to the names of suppliers of hydraulic actuators from business directories. In these models, every word is mapped to a machine-readable format (a vector of numbers) and can therefore be clustered together in a virtual space of ‘word meaning’.
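To make the idea of a ‘space of word meaning’ a little more concrete, here is a minimal sketch of how closeness between word vectors can be measured with cosine similarity. The three-dimensional vectors are invented for illustration and are not output from any trained model; real word2vec vectors are learned from text and typically have hundreds of dimensions. The `nearest` helper is likewise hypothetical.

```python
import math

# Toy vectors, made up for this example (not learned from data).
vectors = {
    "actuator": [0.9, 0.8, 0.1],
    "valve": [0.8, 0.9, 0.2],
    "pump": [0.7, 0.6, 0.3],
    "london": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word, k=2):
    """Return the k words whose vectors point most nearly the same way."""
    scored = [
        (cosine_similarity(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return [other for _, other in sorted(scored, reverse=True)[:k]]
```

With these toy numbers, the engineering terms sit close to one another and far from the place name, which is the sense in which related words form a ‘cluster’.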
Let’s take a simple example like the word “piston”.2