Tailoring AI models with ‘black-box forgetting’
Artificial Intelligence (AI) has come a long way in the past decade. The vastly improved capabilities of large-scale pre-trained models such as the vision-language model CLIP and ChatGPT allow them to perform reasonably well across a wide range of tasks, but this versatility sometimes comes at a cost.
Not only is training and operating such models extremely energy- and time-intensive, but their generalist ‘jack-of-all-trades’ capabilities can be counter-productive in many practical applications, reducing the accuracy of responses. To overcome this challenge, researchers from the Tokyo University of Science (TUS) have been working on a new method to make AI models more efficient while also improving privacy.
Dubbed ‘black-box forgetting’, the methodology iteratively optimises the text prompts presented to a black-box vision-language classifier so that the model selectively ‘forgets’ some of the classes it can recognise.
“In practical applications, the classification of all kinds of object classes is rarely required,” said TUS Associate Professor, Go Irie. “For example, in an autonomous driving system, it would be sufficient to recognise limited classes of objects, such as cars, pedestrians and traffic signs. We would not need to recognise food, furniture or animal species. Retaining the classes that do not need to be recognised may decrease overall classification accuracy, as well as cause operational disadvantages such as the waste of computational resources and the risk of information leakage.”
Although some methods for selective forgetting in pre-trained models do exist, they assume a white-box setting in which the user has access to the model’s internal parameters and architecture. More often than not, users deal with black boxes: they cannot access the model itself or most of its internal information, only its inputs and outputs.
The researchers therefore developed a derivative-free optimisation strategy that does not require access to the model’s internals, extending a method known as CMA-ES (Covariance Matrix Adaptation Evolution Strategy) and using CLIP as the target model. The algorithm samples candidate prompts, feeds them to the model, evaluates the results via predefined objective functions, and updates a multivariate normal distribution based on the calculated scores.
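To make this loop concrete, the sketch below shows how a derivative-free, CMA-ES-style search over a latent prompt context might be driven using the off-the-shelf `cma` Python package. It is a minimal illustration, not the authors’ implementation: the `query_black_box` function, the objective (push accuracy down on classes to forget while keeping it up on classes to retain), and all dimensions and constants are stand-ins.

```python
import numpy as np
import cma  # pip install cma -- off-the-shelf CMA-ES optimiser

# Toy stand-in for the black-box model. In practice this would be an API call
# that scores a candidate prompt context and returns classification accuracy
# on the classes to forget and on the classes to retain; here it is simulated.
rng = np.random.default_rng(0)
_hidden_forget_direction = rng.normal(size=32)  # purely illustrative

def query_black_box(context):
    """Hypothetical query. Returns (acc_forget, acc_retain), each in [0, 1]."""
    score = np.clip(context @ _hidden_forget_direction, -50, 50)
    acc_forget = 1.0 / (1.0 + np.exp(score))               # falls as we move along the hidden direction
    acc_retain = 1.0 / (1.0 + 0.1 * np.sum(context ** 2))  # degrades if the context drifts too far
    return acc_forget, acc_retain

def objective(context):
    # Lower is better: penalise accuracy on the 'forget' classes,
    # reward accuracy on the 'retain' classes.
    acc_forget, acc_retain = query_black_box(np.asarray(context))
    return acc_forget - acc_retain

es = cma.CMAEvolutionStrategy(32 * [0.0], 0.5)  # 32-dim latent prompt context, initial step size 0.5
for _ in range(50):                             # a few generations for the demo
    candidates = es.ask()                       # sample candidate contexts from the search distribution
    es.tell(candidates, [objective(c) for c in candidates])  # update the distribution from the scores
print("best objective value:", es.result.fbest)
```

The key point of the sketch is that only input-output queries to the model are needed: no gradients, weights or architecture details are touched.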
The algorithm wasn’t perfect, though: the performance of such optimisation techniques deteriorates quickly as the number of parameters grows. To solve this, the researchers developed a new parametrisation technique called ‘latent context sharing’, which decomposes the latent context derived from a prompt into smaller elements that are either ‘unique’ to a single prompt token or ‘shared’ between multiple tokens. Optimising these smaller units reduces the dimensionality of the problem, making it more tractable for the algorithm.
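The snippet below is a hedged sketch of that idea under simple assumptions: each token’s latent context is assembled from a small token-unique part concatenated with a part shared by every token, so the optimiser searches fewer values than a full per-token parametrisation would require. The sizes and the concatenation scheme are illustrative only and may differ from the paper’s actual decomposition.

```python
import numpy as np

# Illustrative sizes: 4 prompt tokens, each with a 16-dim latent context
# built from an 8-dim token-unique part and an 8-dim part shared by all tokens.
N_TOKENS, UNIQUE_DIM, SHARED_DIM, TOKEN_DIM = 4, 8, 8, 16

def assemble_contexts(unique, shared):
    """unique: (N_TOKENS, UNIQUE_DIM) token-specific latents.
    shared: (SHARED_DIM,) latents reused by every token.
    Returns (N_TOKENS, TOKEN_DIM) latent contexts for the prompt tokens."""
    tiled_shared = np.tile(shared, (N_TOKENS, 1))           # same shared part for each token
    return np.concatenate([unique, tiled_shared], axis=1)   # (N_TOKENS, UNIQUE_DIM + SHARED_DIM)

unique = np.zeros((N_TOKENS, UNIQUE_DIM))
shared = np.zeros(SHARED_DIM)
contexts = assemble_contexts(unique, shared)                 # shape (4, 16)

# The optimiser only has to search this many values...
n_params_shared = N_TOKENS * UNIQUE_DIM + SHARED_DIM         # 4*8 + 8 = 40
# ...instead of the full, unshared parametrisation:
n_params_full = N_TOKENS * TOKEN_DIM                         # 4*16   = 64
print(n_params_shared, "parameters with sharing vs", n_params_full, "without")
```

Even in this toy setting the shared components shrink the search space, which is what keeps the derivative-free optimisation manageable as prompts grow longer.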
Irie and his team validated their approach on several benchmark image classification datasets, trying to get CLIP to ‘forget’ 40% of the classes in a given dataset. According to the team, this is the first study aimed at having a pre-trained vision-language model fail to recognise specific classes under black-box conditions. Measured against ‘reasonable performance baselines’, the results were promising, the researchers claimed.
So, what implications could this have for AI and machine learning? According to the researchers, it could help large-scale models perform better at specialised tasks, broadening their applicability. The method could also help prevent image-generation models from producing undesirable content, or help tackle privacy issues.
“If a service provider is asked to remove certain information from a model, this can be accomplished by retraining the model from scratch after removing the problematic samples from the training data,” said Irie. “However, retraining a large-scale model consumes enormous amounts of energy. Selective forgetting, or so-called machine unlearning, may provide an efficient solution to this problem.”