Astra: Fraud Detection Software

This post is going to be short, since I legally cannot get into the details.

Some fuel companies negotiate deals with trucking companies, paying them cash back based on how much fuel they buy from that fuel company. Purchases are tracked by issuing unique cards with which the fuel is bought.

Lately, criminal syndicates have been cloning those cards to game the system and get cash back for money they didn’t spend.

Sifting through the transaction history for suspicious activity was slow: fraud was often identified only a month later, giving criminals a massive head start in evading investigators.

I was hired to write software to quickly identify suspicious transactions, enabling investigators to nip fraud sprees in the bud, saving [fuel-company] significantly more money.

Astra reads masterlists from [fuel-company] that list every transaction made in the region with cashback cards. It then builds a database of the transactions associated with each card (each card being tied to a vehicle registration number), performs fraud analysis, and tallies up all the money involved.
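A minimal sketch of that indexing and tallying step, not Astra's actual implementation: the `Transaction` fields and the function names are hypothetical, and the fraud analysis itself is deliberately left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    # Hypothetical field names; the real masterlist schema is not public.
    card_id: str
    registration: str   # vehicle registration number tied to the card
    amount: Decimal     # money involved in the transaction


def group_by_card(transactions):
    """Index transactions by card so each card's history can be inspected."""
    by_card = defaultdict(list)
    for tx in transactions:
        by_card[tx.card_id].append(tx)
    return dict(by_card)


def total_per_card(by_card):
    """Tally the money involved on each card."""
    return {
        card: sum((tx.amount for tx in txs), Decimal("0"))
        for card, txs in by_card.items()
    }
```

Amounts are kept as `Decimal` rather than `float` so that tallies over many transactions do not accumulate rounding error.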

The user can select any number of cards or vehicle registration numbers and see every transaction flagged as suspicious for that selection. For each of those transactions, the user can see every suspicious event tied to it, along with the amounts of money involved.

The user can export the findings and amounts for viewing in Excel, if needed.

The more interesting parts about Astra are that it turned out to be better at detecting subtle cases of fraud than human investigators, and that it needed to be rather robust.

The masterlists were Excel files, and they were quite often formatted differently than the original spec required, or just haphazardly.

For example, some transactions would have American date formats and others British, while others still were not actual “date” values at all (Excel stores dates as a count of days from a fixed point in history, much like UNIX time counts seconds) but plain text. That text had to be parsed into a date, coping with varying delimiter characters, and then have its regional formatting inferred. Entire columns of information would appear and disappear from month to month, and some columns would be renamed or simply moved to a later or earlier position in the Excel file than before. Numeric values would use commas as the decimal separator, then switch to periods, then back to commas. In some months, significant portions of the data would be censored without warning.
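The date problem can be sketched roughly as follows. This is an illustration under stated assumptions, not Astra's code: all function names are hypothetical, the workbook is assumed to use Excel's default 1900 date system, text dates are assumed to put the year last, and all text dates in one batch are assumed to share a regional convention.

```python
import re
from datetime import date, datetime, timedelta

# Excel's 1900 date system counts days, with serial 1 being 1900-01-01.
# Using 1899-12-30 as the epoch compensates for Excel's phantom
# 1900-02-29, so results are correct from 1900-03-01 onward.
EXCEL_EPOCH = datetime(1899, 12, 30)


def from_excel_serial(serial):
    """Convert an Excel day count (1900 date system) to a date."""
    return (EXCEL_EPOCH + timedelta(days=float(serial))).date()


def split_date_text(text):
    """Split '03/04/2021', '3-4-21' or '03.04.2021' into three integers."""
    parts = [int(p) for p in re.split(r"\D+", text.strip()) if p]
    if len(parts) != 3:
        raise ValueError(f"unrecognised date text: {text!r}")
    return tuple(parts)


def infer_day_first(texts):
    """Return True (British), False (American), or None if undecidable.

    A field greater than 12 cannot be a month, so it settles the order.
    Conflicting or absent evidence yields None rather than a guess.
    """
    votes = set()
    for text in texts:
        first, second, _ = split_date_text(text)
        if first > 12:
            votes.add(True)
        elif second > 12:
            votes.add(False)
    return votes.pop() if len(votes) == 1 else None


def parse_date_cell(value, day_first):
    """Turn a real date, an Excel serial, or date text into a date."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, (int, float)):
        return from_excel_serial(value)
    first, second, year = split_date_text(str(value))
    if year < 100:
        year += 2000  # assumption: two-digit years are in the 2000s
    day, month = (first, second) if day_first else (second, first)
    return date(year, month, day)
```

When `infer_day_first` returns `None`, the batch is genuinely ambiguous (every day and month is 12 or less, or the evidence conflicts), and flagging it for review is safer than silently picking a convention.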

Making Astra suitably robust against the ever-changing format of the masterlists was, by far, harder than detecting the actual fraud.
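One way to absorb renamed or rearranged columns and shifting decimal separators is to look columns up by normalised header name instead of by position, and to normalise numbers before parsing them. The sketch below is only an illustration: the alias table and every name in it are hypothetical, and it assumes amounts never contain thousands separators.

```python
from decimal import Decimal

# Hypothetical aliases; the real masterlist headers are not public.
COLUMN_ALIASES = {
    "card_id": {"card", "card no", "card number"},
    "registration": {"reg", "reg no", "registration"},
    "amount": {"amount", "value", "total"},
}


def normalise(header):
    """Lower-case a header and collapse underscores, dots and spacing."""
    text = str(header).lower().replace("_", " ").replace(".", " ")
    return " ".join(text.split())


def map_columns(header_row):
    """Return {canonical_name: column_index}, whatever the column order."""
    mapping = {}
    for index, header in enumerate(header_row):
        key = normalise(header)
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases and canonical not in mapping:
                mapping[canonical] = index
    return mapping


def missing_columns(mapping):
    """List the expected columns that this month's file does not have."""
    return sorted(set(COLUMN_ALIASES) - set(mapping))


def parse_amount(text):
    """Parse '1234,56' or '1234.56' into a Decimal.

    Assumes no thousands separators: '1,234.56' raises an error
    instead of being silently misread.
    """
    cleaned = str(text).strip().replace(" ", "").replace(",", ".")
    return Decimal(cleaned)
```

Reporting missing columns up front lets the tool degrade gracefully, skipping only the checks that need the absent data instead of failing on the whole file.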

On the more entertaining side, during a demo presentation to [fuel-company], Astra spontaneously pointed out a bug in [fuel-company]’s existing data-processing pipeline. It’s since been fixed.