According to Meta, Galactica can “summarize academic papers, solve math problems, generate Wiki articles, write scientific code, annotate molecules and proteins, and more.” But soon after its launch, it was fairly easy for outsiders to prompt the model to provide “scientific research” on the benefits of homophobia, anti-Semitism, suicide, eating glass, being white, or being a man. Meanwhile, prompts about AIDS or racism were blocked. Charming!
As my colleague Will Douglas Heaven writes in his story about the debacle: “Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models.”
Not only was Galactica’s launch premature, but it also shows how insufficient AI researchers’ efforts to make large language models safer have been.
Meta might have been confident that Galactica outperformed competitors at generating scientific-sounding content. But its own testing of the model for bias and truthfulness should have deterred the company from releasing it into the wild.
One common way researchers aim to make large language models less likely to spit out toxic content is to filter out certain keywords. But it’s hard to create a filter that can capture all the nuanced ways humans can be unpleasant. The company would have saved itself a world of trouble if it had subjected Galactica to more adversarial testing, in which researchers would have tried to get it to regurgitate as many different biased outcomes as possible.
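To make the limitation concrete, here is a minimal sketch (in Python, with a made-up blocklist and example strings, not anything from Galactica’s actual safety system) of how a naive keyword filter works and why it misses toxicity that avoids the exact blocked terms:

```python
# Minimal sketch of a naive keyword filter, as described above.
# The blocklist and test strings are hypothetical illustrations.
BLOCKLIST = {"slur_a", "slur_b", "kill"}

def passes_filter(text: str) -> bool:
    """Return True if no blocked keyword appears in the text."""
    tokens = (token.strip(".,!?").lower() for token in text.split())
    return not any(token in BLOCKLIST for token in tokens)

# Exact matches get caught...
print(passes_filter("slur_a is acceptable"))  # False: blocked keyword found

# ...but toxic claims dressed up in polite, "scientific" language sail through,
# because no individual word is on the list.
print(passes_filter("Studies demonstrate that group X is inferior."))  # True
```

This is exactly the gap adversarial testing is meant to probe: instead of checking words, red-teamers search for whole prompts and phrasings that slip biased or harmful output past the filter.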
Meta’s researchers measured the model for biases and truthfulness, and while it performed slightly better than competitors such as GPT-3 and Meta’s own OPT model, it did provide a lot of biased or incorrect answers. And there are several other limitations too. The model is trained on scientific resources that are open access, but many scientific papers and textbooks are restricted behind paywalls. This inevitably leads Galactica to rely on sketchier, secondary sources.
Galactica also seems to be an example of something we don’t really need AI to do. It doesn’t seem as if it would even achieve Meta’s stated goal of helping scientists work more quickly. In fact, it would require them to put in a lot of extra effort to verify whether the information from the model was accurate or not.
It’s really disappointing (yet entirely unsurprising) to see big AI labs, which should know better, hype up such flawed technologies. We know that language models have a tendency to reproduce prejudice and assert falsehoods as facts. We know they can “hallucinate,” or make up content, such as wiki articles about the history of bears in space. But the debacle was useful for one thing, at least. It reminded us that the only thing large language models “know” for certain is how words and sentences are formed. Everything else is guesswork.