Did you hear of the self-driving Uber car that hit and killed a woman in Arizona? On another occasion, a facial recognition solution profiled an innocent man of color as a criminal in New Jersey, and Amazon’s AI-powered recruitment tool displayed bias against female candidates.

Clearly, artificial intelligence makes mistakes. Significant, even life-altering mistakes. So, how can we still get the benefits of AI while eliminating these types of errors? One option is letting human experts train, evaluate, and monitor AI business solutions after deployment. This concept is called human-in-the-loop (HITL) machine learning. Gartner predicts that in some industries, HITL AI solutions will comprise around 30% of all automation offerings by 2025.

We talked to our AI expert, Maksym Bochok, to understand how humans fit in the loop, which benefits they bring, and how to organize this process.

Human in the loop definition and benefits

To err is human, to really foul things up takes a computer.

- Commonly attributed to the American biologist Paul R. Ehrlich

This quote is more relevant now than ever before. With AI handling critical applications, the margin for error is getting slimmer. And machines are not perfect. They build their understanding of a task from the training data they receive, and they can make erroneous assumptions.

This brings us to the definition of human-in-the-loop machine learning.

Human in the loop means integrating human employees into the machine learning pipeline so that they can continuously train and validate models. This includes all people who work with models and their training data.

How human-in-the-loop adds value to your machine learning algorithms

Maintains a high level of precision. This is particularly important for domains that can’t tolerate errors. For example, when manufacturing critical equipment for an aircraft, we want automation and speed, but we can’t jeopardize safety. HITL is beneficial in less critical applications as well. For example, large consultancy companies that rely heavily on AI to check documents for regulatory compliance use human-in-the-loop machine learning to validate their natural language processing algorithms.
Eliminates bias. Machine learning models can become biased during training. Moreover, they can acquire bias after deployment, as they continue to learn. Human employees can detect and eliminate this phenomenon at early stages by correcting the algorithm accordingly.
Ensures transparency. ML algorithms evaluate thousands or even millions of parameters to make a final decision, and that decision often can’t be easily explained. With HITL, there is a human who understands how the algorithms work and can justify the decisions they make. This contributes to explainable AI. For instance, when a person applies for a loan and is denied, they might ask a loan officer to explain the reasoning behind the rejection and what the applicant can do to increase their chances next time.
Opens employment opportunities. We often hear about AI stealing people’s jobs. Machine learning with a human in the loop provides an example of how the technology can create new vacancies. Just look at the Indian data annotators market.

The role of humans in the AI pipeline

Maksym explains how humans can be a part of the AI pipeline to enhance its ability to make predictions. Machine learning models are trained in either a supervised or an unsupervised mode. In the case of supervised learning, people can perform the following tasks:

Labeling and annotation. A human employee labels the training dataset. Depending on the required expertise, this can be a domain expert or any employee with proper training.
Re-engineering the model. If needed, ML engineers and programmers can make adjustments to the algorithm to make sure it can get the best out of the provided dataset.
Training and re-training. Employees feed the model with the annotated data, view the output, make corrections, add more data if possible, and re-train the model.
Monitoring the model’s performance after deployment. The human in the loop machine learning lifecycle doesn’t stop after deploying the AI solution on the client’s premises. ML engineers continue to monitor its performance with the client’s consent and make adjustments to the model when required through selective verification of its output. The cases obtained through selective verification will augment the initial training dataset to improve the algorithm’s performance.

In unsupervised machine learning, algorithms take unlabeled data as input and find structure on their own. In this case, humans do not annotate the dataset and don’t interfere much in the initial training. But they can significantly enrich the model by performing the fourth task above: monitoring the model’s performance after deployment.
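
To make the loop described above more concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than a method taken from Maksym or from any particular library: the model is a toy one-dimensional threshold classifier, and `human_label` is a hypothetical stand-in for a real annotator. It shows the cycle of training on labeled data, sending the least certain cases to a human, and re-training on the augmented dataset.

```python
from statistics import mean


def train(labeled):
    """Fit a toy 1-D classifier: the threshold is the midpoint of the class means."""
    zeros = [x for x, y in labeled if y == 0]
    ones = [x for x, y in labeled if y == 1]
    return (mean(zeros) + mean(ones)) / 2


def predict(threshold, x):
    """Return (label, margin). A small margin means the model is less certain."""
    return (1 if x >= threshold else 0), abs(x - threshold)


def human_label(x):
    """Hypothetical human reviewer. Here the 'true' boundary is assumed to be 5.0."""
    return 1 if x >= 5.0 else 0


def hitl_loop(labeled, pool, rounds, per_round):
    """Alternate between training and human review of the least certain cases."""
    labeled, pool = list(labeled), list(pool)
    threshold = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        # Selective verification: least certain cases go to the human first.
        pool.sort(key=lambda x: predict(threshold, x)[1])
        to_review, pool = pool[:per_round], pool[per_round:]
        labeled += [(x, human_label(x)) for x in to_review]
        # Re-train on the augmented dataset.
        threshold = train(labeled)
    return threshold, labeled


seed_data = [(0.0, 0), (6.0, 1)]  # small initial set labeled by humans
unlabeled_pool = [2.0, 3.5, 4.0, 4.5, 5.5, 7.0, 8.0]
final_threshold, final_labeled = hitl_loop(seed_data, unlabeled_pool, rounds=3, per_round=2)
print(f"threshold after human review: {final_threshold:.2f}")
```

In a real project the toy classifier would be replaced by the actual model and `human_label` by an annotation tool or review queue, but the shape of the loop stays the same.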

When human in the loop machine learning is an absolute necessity

Maksym believes that the human in the loop approach is beneficial for most machine learning use cases. AI solutions are impressive at making optimal predictions when trained on extensive datasets, while humans can recognize patterns from a limited supply of low-quality data samples. Combining both capabilities can create a powerful system. Even though in some applications ML models can do well with limited human intervention, there are cases where a full-blown human in the loop system is a must:

When any mistake by the algorithm can be very costly, such as in medical diagnosis.
When the data you need to properly train the algorithm is scarce. More relevant, high-quality training data usually leads to better model performance. With the help of post-production model monitoring, you can augment the training data with relevant samples, giving the model more examples to learn from.
In the case of one-shot or few-shot learning, when an algorithm that was trained on hundreds or even thousands of samples to classify some objects has to learn to identify a newly added class from only one or a few training samples.
In heavily regulated industries where it is essential to explain how the algorithm reached its conclusions. For example, when doctors use AI to suggest personalized cancer treatments, they need to justify this treatment plan to the patient.

When looking at the type of data that ML algorithms process, HITL AI would be essential for computer vision applications and natural language processing (NLP), especially when it comes to sentiment analysis of a text that might contain sarcasm. HITL is less important for tabular data and time series analysis.

Tips on enhancing artificial intelligence with human in the loop practices

Maksym offers the following tips on how to successfully implement the human in the loop approach in machine learning:

When monitoring and analyzing an algorithm’s performance after deployment, no matter how good the human in the loop system is, human participants will not be able to pay attention to every input the algorithm processes and every output it generates. Choose your cases wisely. Use selective verification to pick the cases that are worthy of your attention. Maksym suggests these approaches to smart case selection:
- Based on confidence levels. For example, an algorithm needs to classify every input image either as a cat or a dog. The images that receive a confidence split of around 48/52 or anything similar are the ones that confuse the algorithm. They need to be properly labeled and used to re-train the model.
- Random verification of “trivial” cases. Let’s assume that only one out of ten cases holds valuable information when it comes to an algorithm’s performance. An example of such a case is when the model is overconfident about a wrong prediction. You should definitely consider this case, but you also need to randomly select one out of the remaining nine cases to make sure the algorithm doesn’t grow overconfident in its wrong predictions or develop bias.
When analyzing the cases you picked in the previous step, don’t limit yourself to the final result. Instead of looking at the output of the final set of neurons in neural networks, check the previous layer, like in the image below, and analyze the distribution of distances between a wrong prediction and the closest correct predictions the algorithm makes.
Encourage the algorithm’s end users to give feedback on its performance. Construct feedback forms and make them available to everyone, so that users can convey any concerns they may have.
Keep augmenting the training dataset iteratively using data points from the previous steps. This way, you will be sure that your algorithm remains relevant even when some changes take place at the client’s operations.
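
The two case selection approaches from the first tip can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_for_review`, the `band` of 0.05 around a 50/50 split, and the 10% spot-check rate are hypothetical choices, not values recommended in the article. Each prediction is assumed to be a pair of an item ID and the model’s probability for one of two classes.

```python
import random


def select_for_review(predictions, band=0.05, spot_check_rate=0.1, rng=None):
    """Pick cases for human review.

    predictions: list of (item_id, probability of class "cat") for a
    binary cat/dog classifier. Returns (uncertain, spot_checks).
    """
    rng = rng or random.Random()
    # 1. Confidence-based selection: probabilities close to 0.5 (e.g. 48/52).
    uncertain = [item for item, p in predictions if abs(p - 0.5) <= band]
    # 2. Random verification of "trivial" cases the model is confident about.
    confident = [item for item, p in predictions if abs(p - 0.5) > band]
    # Review at least one confident case whenever any exist.
    k = min(len(confident), max(1, round(len(confident) * spot_check_rate)))
    spot_checks = rng.sample(confident, k)
    return uncertain, spot_checks


example_predictions = [
    ("img_a", 0.48), ("img_b", 0.52), ("img_c", 0.97),
    ("img_d", 0.03), ("img_e", 0.88), ("img_f", 0.10),
]
uncertain, spot_checks = select_for_review(example_predictions, rng=random.Random(0))
print("needs labeling:", uncertain)
print("random spot checks:", spot_checks)
```

The items returned by both lists would then be labeled by a human and added to the training dataset, as described in the last tip.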

Off-the-shelf HITL-enabled AI tools

There are some ready-made human in the loop machine learning tools that allow you to label training datasets and verify the outcome. However, you might not be able to implement the tips above with these standardized tools. Here are a few human in the loop tool examples:

Google Cloud HITL

This solution offers a workflow and a user interface (UI) that people can use to label, review, and edit the data extracted from documents. The client company can either use its own employees as labelers or hire Google’s HITL workforce to accomplish the task.

The tool has certain UI features to streamline labelers’ workflow and filter the output based on the confidence threshold. It also allows companies to manage their labelers' pool.

Amazon Augmented AI (Amazon A2I)

This human in the loop artificial intelligence tool allows people to review low-confidence and random ML predictions. Unlike Google Cloud HITL, which only operates on text, Amazon A2I can complement Amazon Rekognition to analyze images and validate results. It can also help review tabular data.

If a client is not happy with the supplied A2I workflow, they can develop their own approach with SageMaker or a similar tool.

DataRobot Humble AI

Humble AI permits people to specify a set of rules that ML models have to apply while making predictions. Every rule includes a condition and a corresponding action. Currently, there are three actions:

No operation, when humans just monitor the corresponding condition without interfering
Overriding prediction, when people can replace the model’s output with a different value
Returning error, simply discarding the prediction altogether
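
The condition-and-action idea can be illustrated with a small Python sketch. Note that this is not DataRobot’s API: the `Rule` class, the `apply_rules` function, and the action names are hypothetical, written only to show how the three kinds of actions listed above might behave.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class PredictionRejected(Exception):
    """Raised when a rule with the "error" action discards a prediction."""


@dataclass
class Rule:
    name: str
    condition: Callable[[dict, float], bool]  # (features, prediction) -> triggered?
    action: str                               # "no_op", "override", or "error"
    override_value: Optional[float] = None


def apply_rules(features, prediction, rules, log):
    """Apply rules in order. Triggered rule names are appended to `log`."""
    for rule in rules:
        if not rule.condition(features, prediction):
            continue
        log.append(rule.name)
        if rule.action == "override":
            return rule.override_value       # replace the model's output
        if rule.action == "error":
            raise PredictionRejected(rule.name)  # discard the prediction
        # "no_op": the trigger is only recorded so humans can monitor it
    return prediction


rules = [
    Rule("uncertain_prediction", lambda f, p: 0.4 < p < 0.6, "no_op"),
    Rule("negative_income", lambda f, p: f["income"] < 0, "error"),
    Rule("probability_too_high", lambda f, p: p > 1.0, "override", 1.0),
]

triggered = []
result = apply_rules({"income": 52000}, 0.55, rules, triggered)
print(result, triggered)
```

Returning on the first override is a design choice of this sketch. A real rule engine could just as well keep evaluating the remaining rules.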

So, is machine learning with a human in the loop the best approach for you?

Employing the human-in-the-loop AI approach improves the accuracy, transparency, and quality of predictions. Because of the human intervention, it also increases the cost and time needed to complete the task. On the positive side, it creates employment opportunities.

Despite the obvious benefits of HITL AI, there are applications where human-out-of-the-loop is a preferred approach because of the risks associated with certain activities. Think of autonomous weapon development and deployment.

If you feel like your ML algorithms can use a human in the loop, but you are not sure how to balance operational costs and the desired accuracy and explainability, reach out to machine learning consultants. They will work with you to find the right fit. If human-in-the-loop machine learning is not the optimal solution in your case, there are other ML tricks that can help you overcome the problem of training data scarcity:

Transfer learning, when you fine-tune pre-trained models with your own data
Semi-supervised learning, when you use a large unlabeled dataset together with a small number of labeled samples
Self-supervised learning, when the algorithm derives its own training signal from unlabeled data, for example, by masking a random part of the training sample in each batch and trying to predict it
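
As an illustration of the semi-supervised option, here is a minimal Python sketch of self-training (also called pseudo-labeling), which is one common form of semi-supervised learning. The nearest-centroid model on one-dimensional data, the `margin` parameter, and the function names are assumptions made for this example only. The sketch assumes at least two classes in the labeled data.

```python
from statistics import mean


def centroids(labeled):
    """Compute the mean value of each class from (value, label) pairs."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Grow the labeled set with pseudo-labels the model is confident about."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in unlabeled:
            dists = sorted((abs(x - c), y) for y, c in cents.items())
            # Accept a pseudo-label only if the nearest centroid is clearly
            # closer than the second nearest one.
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((x, dists[0][1]))
            else:
                rest.append(x)
        if not confident:
            break
        labeled += confident
        unlabeled = rest
    return labeled, unlabeled


small_labeled_set = [(0.0, "a"), (10.0, "b")]
large_unlabeled_set = [1.0, 2.0, 4.9, 8.0, 9.5]
labeled, still_unlabeled = self_train(small_labeled_set, large_unlabeled_set)
print("pseudo-labeled samples:", len(labeled) - len(small_labeled_set))
print("left without a label:", still_unlabeled)
```

Samples that remain unlabeled are the ambiguous ones, so they are natural candidates for human review if some human involvement is still affordable.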

Are you considering improving your ML model’s accuracy and explainability? Get in touch! ITRex AI experts will study your situation and devise an optimal human in the loop approach to address your needs.