Basics of using AI in businesses
Trust is built on four pillars
The four pillars Shapiro outlined in his report “The Automated Actuarial” are quality, resilience, integrity, and effectiveness. To this end, companies should consider the following points:
- Data quality
The less control a company has over its data, the more important it is to have a clear assessment of where the data came from, how it was created, and how good it is. This is particularly true for the many “new” types of data from telematics and the IoT, but also for external data from open sources, such as images or texts.
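Such an assessment can be supported by automated checks. The following is a minimal sketch of a data-quality report for an incoming dataset; the column names (`trip_id`, `speed_kmh`) and the telematics framing are illustrative assumptions, not part of any specific platform.

```python
# Sketch of basic data-quality checks on an incoming (e.g. telematics) dataset.
# Column names and the example data are purely illustrative.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Summarise completeness, duplicates, and value ranges of a dataset."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        # Share of missing values per column:
        "missing_share": df.isna().mean().round(3).to_dict(),
        # Min/max per numeric column, to spot implausible values:
        "numeric_ranges": {
            col: (float(df[col].min()), float(df[col].max()))
            for col in df.select_dtypes("number").columns
        },
    }

trips = pd.DataFrame({
    "trip_id": [1, 2, 2, 3],
    "speed_kmh": [48.0, 130.5, 130.5, None],
})
report = quality_report(trips)
print(report["duplicate_rows"])  # one fully duplicated row
```

In practice such a report would be run on every data delivery and compared against agreed thresholds, so that quality problems surface before the data reaches a model.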
- Resilience of the process
Companies should ask themselves: How resilient is my overall analytical process? Do I only have a one-off laboratory process, or is my analysis designed for the long term? Here it is important to ensure governance and security along the entire analytical process chain – from the data to the automated decision-making.
- Integrity of the analysis
It is crucial to ensure the integrity of the data analysis. For this, companies must be able to document and explain their processes and their choice of methods. Do they fit the question? Are they mathematically appropriate?
- Effectiveness
Does the analytics do what it should? Are its statements reliable? Are its results non-discriminatory?
- Auditability required
These points can already be implemented very well with a consistent, high-performance analytics platform – not least because even the question of data quality can turn out very differently depending on whether it is asked in the context of analytics and machine learning or of classic reporting.
In relation to the overall process, there should be a special focus on consistency from the data through to the decision. Too many workarounds and tool discontinuities lead to manual steps, shadow systems, and thus to governance that is hardly controllable. Auditability plays an important role here: Can it be proven who made which decision, for whom, on the basis of which data, which model version, and which business rules? And was the data allowed to be used for this purpose? Automatic documentation, transparent options for comparing algorithms, and support for effective and agile teamwork (keyword: DataOps) round out the capabilities required for the four pillars of trust.
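What such an audit trail has to capture can be made concrete in a few lines. The following is a minimal sketch of an audit record for one automated decision; all identifiers (`model_version`, `rule_set`, the customer ID) are hypothetical, and a real platform would persist such records to an immutable, access-controlled store.

```python
# Sketch of an audit record answering: who decided what, for whom,
# on which data, with which model version and business rules?
import json
import hashlib
from datetime import datetime, timezone

def audit_record(customer_id, decision, inputs, model_version, rule_set):
    """Capture the who/what/why of one automated decision for later auditing."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "customer_id": customer_id,
        "decision": decision,
        "model_version": model_version,
        "rule_set": rule_set,
        # Hash of the input data, so the exact basis of the decision can be
        # verified later without storing sensitive raw values in the log:
        "input_hash": hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()
        ).hexdigest(),
    }

rec = audit_record(
    customer_id="C-1001",
    decision="approve",
    inputs={"age": 42, "claims": 0},
    model_version="risk-model-1.3.0",   # hypothetical version tag
    rule_set="underwriting-v7",         # hypothetical rule-set identifier
)
print(rec["decision"])
```

The point is not the specific fields but that every automated decision leaves a complete, reproducible trace linking data, model, and rules.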
- Algorithms also need transparency
Explainability and transparency, therefore, relate to the entire analytical process. But what about the “black box” of machine learning algorithms? There too, transparency must be guaranteed by an analytical platform.
The good news: algorithms aren’t quite that opaque. Even if no easily understandable set of rules can be derived, it is still possible – regardless of the specific method – to investigate which factors are decisive in the algorithmic decision. The research field that deals with this kind of explainability is called “Fairness, Accountability, and Transparency in Machine Learning”, or FAT ML for short.
- The ideal model: interpretable & fair
Learned models should not only perform well, but also be interpretable and fair. What exactly does that mean? The following questions have to be answered:
- Are there anomalies in the training data that the machine learning model has inevitably adopted? Whoever selects or controls the data has a significant influence on the relationships that can be learned.
- Are important relationships properly represented in the data? Can the learned model be relied upon? The model may work well on the training data – but are the learned relationships general enough to transfer to new data?
- Is the model non-discriminatory? For example, an insurance company is not allowed to use the characteristic “gender” or strongly correlated characteristics for rating. There can be more or less strong structural and cultural biases in data, which are then algorithmically learned and even reinforced.
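The last point is subtle: simply dropping the protected attribute is not enough if another feature acts as a proxy for it. The following is a minimal sketch of such a proxy check on a toy dataset; the feature names (`occupation_code`, `mileage`) and the data-generating assumptions are purely illustrative.

```python
# Sketch of a proxy check: even if "gender" is excluded from the model,
# a strongly correlated feature can smuggle the information back in.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
gender = rng.integers(0, 2, n)  # protected attribute, excluded from the model

# Hypothetical rating feature that happens to correlate strongly with gender:
occupation_code = gender * 0.8 + rng.normal(0, 0.3, n)
# Genuinely neutral feature, independent of gender:
mileage = rng.normal(12000, 3000, n)

def proxy_correlation(feature, protected):
    """Pearson correlation between a candidate feature and a protected attribute."""
    return float(np.corrcoef(feature, protected)[0, 1])

print(proxy_correlation(occupation_code, gender) > 0.7)  # True: strong proxy
print(abs(proxy_correlation(mileage, gender)) < 0.1)     # True: no proxy
```

A correlation check like this is only a first screen – more thorough fairness audits compare model outcomes across groups – but it already catches the most obvious proxies before a model goes into production.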
It is important that the diagnostic procedures used to explain how algorithms arrive at their decisions are independent of the specific machine learning algorithm (model-agnostic). Partial dependence (PD) is a good example of what such diagnostics can look like.
- Influence of characteristics
For a medical diagnostic application, a model was trained to determine, on the basis of patient characteristics, the probability that a patient has the flu.
A PD plot shows the functional relationship between an individual model input (e.g. fever, general condition, or rash) and the model’s predictions: how the predictions depend on the values of the input variable of interest, while the influence of all other characteristics is averaged out. Fever, for example, might have a strongly positive influence on the predicted likelihood of flu, the patient’s general condition a slightly negative one, while rash is largely irrelevant.
- The algorithm is not an excuse
Anyone who uses AI to automate processes and decisions has to deal with the ethical aspects described – for moral, regulatory, and practical reasons. After all, no company wants bad results to have a negative impact on its image. Explainability and transparency relate to the entire analytical process, not just to a machine learning algorithm that automates a decision.
But the much-maligned machine learning algorithms are not a black box that is sealed forever. A justification for the consequences of using AI can never be: “The algorithm made me do it.” Only trust and transparency remove barriers to the use of AI – to the benefit of consumers, legislators, and the companies that use data analysis.