Leaner large language models could enable efficient local use on phones and laptops
Researchers have introduced a technique for compressing a large language model's reams of data, which could increase privacy, save energy and lower costs. The new algorithm works by trimming redundancies and reducing the precision of an LLM's layers of information. This type of leaner LLM could be stored and accessed locally on a device like a phone or laptop and could provide performance nearly as accurate and nuanced as an uncompressed version.