Big data analytics in audit

2 October, 2014 author: Jacques de Swart

The application of big data in auditing is a hot topic (see, for example, Accountant 2013, “Large audit firms are investing heavily in big data”). In this article, I want to share my view of what big data means in this context and how to position it in auditing.

Big data is hard to define. Earlier this year, one of my master’s students, Bas Jansen, made an inventory of over 20 publicly available definitions of big data. Some highlights were: “If your personal laptop can handle the data on an Excel spreadsheet, it is not big.” (Siraj Dato, 2014), “If you know what questions to ask of your transactional cash register data, which fits nicely into a relational database, you probably don’t have a big data problem. If you’re storing this same data and also an array of weather, social and other data to try to find trends that might impact sales, you probably do.” (Matt Asay, 2013), “Big data refers to things you do on a large scale that are not possible on a small scale. (…) Big data is about correlation, not causation. It’s the what, not the how.” (Mayer-Schönberger & Cukier, 2013). IBM uses four Vs to characterize big data: Volume, Velocity, Variety and Veracity, where the last refers to the uncertainty of data. My own preferred, very short definition, which I used in an article in the Accountant, reads: “Explorative analysis of literally large data sets from heterogeneous sources”.

Positioning big data analytics in auditing is even harder, because the range of powerful applications of big data analytics in auditing is huge and unstructured. The International Standards on Auditing (ISAs) are not of much help here, since they lag behind big data developments at a considerable distance. ISA 520 on Analytical Procedures even states that an auditor should limit himself to confirmatory analysis. This sits uneasily with big data: the definitions may vary widely, but an explorative nature is the element that most of them have in common. The final attainment levels for the auditing curriculum as set by the Commissie Eindtermen Accountantsopleidingen (CEA) do not give much guidance in this context either. The CEA has positioned data analysis under Mathematics & Statistics as one of the auxiliary specialties in Section 3.4.6.4. The chi-squared test is the most technical term in this section. The good news for auditors who are afraid of statistics is that this test only needs to be understood at Level 1, the most superficial of the three levels.

My colleagues Barbara Majoor and Jan Wille and I recently introduced the Push-left principle to structure the use of big data analytics in auditing (MAB, 2013). In short, we state that the biggest challenge auditors face is to replace some existing Evidence Gathering Activities (EGAs) with data-analytic techniques instead of adding these techniques as nice-to-haves. Adding means that no budgets are reserved for data analytics, that the auditor is not forced to derive audit comfort from data analytics, and, consequently, that auditing will not exploit the big data opportunity. However, given the current state of the ISAs, we understand that auditors are afraid to skip EGAs that make their audits ISA compliant in favor of EGAs that are – to put it mildly – not encouraged by the ISAs.

To meet this challenge, we state that an auditor should strive for three objectives when applying big data. First, big data analytics should control audit risk in a quantitative manner. In May 2013, four weeks before Hans Blokdijk died, the Limperg Institute organized a symposium to honor him as “(pro)motor of statistical auditing”. In his final speech he claimed: “It is inevitable that auditors will be forced to quantify their audit risk by means of statistics.” If the data to be audited can be reconciled against a reference data source that is electronically available, all misstatements can be reported. This means that – potentially after corrections – audit risk, which is the risk that the auditor misses material misstatements, is zero. Where no such electronic reference source is available, statistical models may be used to quantify audit risk in a statistically sound manner.
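The two situations in the paragraph above can be illustrated with a minimal Python sketch. It is not the method from the MAB article; the function names, the transaction-id keys and the rounding tolerance are all hypothetical. The first function reconciles every record against an electronic reference source and reports all differences. The second and third cover the case without a reference source, using a deliberately simple model: items are sampled at random from a large population, and a sample without errors is taken as evidence that the true error rate lies below a tolerable rate. Under that assumption the remaining sampling risk is (1 - tolerable rate) ** n.

```python
from math import ceil, log


def reconcile(ledger, reference, tolerance=0.005):
    """Compare every booked amount with the reference source, keyed by transaction id.

    Returns {id: (booked, expected)} for each record that is missing on either
    side or whose amounts differ by more than `tolerance` (an assumed rounding margin).
    """
    misstatements = {}
    for key in ledger.keys() | reference.keys():
        booked = ledger.get(key)
        expected = reference.get(key)
        if booked is None or expected is None or abs(booked - expected) > tolerance:
            misstatements[key] = (booked, expected)
    return misstatements


def risk_after_clean_sample(tolerable_rate, n):
    """Chance of drawing n error-free items if the true error rate equals tolerable_rate."""
    return (1 - tolerable_rate) ** n


def sample_size_zero_errors(tolerable_rate, audit_risk):
    """Smallest n for which an error-free sample keeps the risk at or below audit_risk."""
    if not (0 < tolerable_rate < 1 and 0 < audit_risk < 1):
        raise ValueError("tolerable_rate and audit_risk must lie strictly between 0 and 1")
    return ceil(log(audit_risk) / log(1 - tolerable_rate))


# Illustrative data only: three booked invoices against a reference extract.
ledger = {"INV-1": 100.0, "INV-2": 250.0, "INV-3": 75.5}
reference = {"INV-1": 100.0, "INV-2": 205.0, "INV-4": 10.0}

for key, (booked, expected) in sorted(reconcile(ledger, reference).items()):
    print(key, "booked:", booked, "reference:", expected)

# Without a reference source: 5% tolerable error rate, 5% accepted risk.
print(sample_size_zero_errors(0.05, 0.05))  # 59
```

In the reconciliation case every record is checked, so nothing is left to sampling. In the sampling case the risk is made explicit as a number the auditor chooses beforehand. A real engagement would typically need a richer model, for example one that allows for errors found in the sample or that weights items by monetary value.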

Secondly, big data analytics should help to improve the audit process. More and more internal and external data can be made available in electronic form. An audit that does not even consider using this data cannot be guaranteed to be as efficient as possible. A necessary condition for this claim to hold is that whenever data analytics are used, they replace a control activity that would have been applied without data analytics. This also holds when the auditee already uses data analytics as part of its internal control framework. Within an ISA-based audit approach, it should first be investigated whether comfort can be gained from reviewing these data analytics.

Finally, big data analytics should create insights to improve the auditee’s business. Next to the primary goal of any audit approach, giving assurance on financial statements, using data analytics to create insights that help the auditee improve its own business is the secondary goal. In practice it often happens that the auditor’s data analytics are transferred to the auditee to become part of the auditee’s internal control framework.

If auditors forced themselves to use big data analytics that meet all three of these objectives in every audit, I am sure that the audit profession would give itself a boost in relevance for society and “emerge as a new kind of professional, the data scientist, who combines the skills of software programmer, statistician and storyteller/artist to extract the nuggets of gold hidden under mountains of data.” (The Economist, 2011).