The A.I. Thread of OMG We're Being Replaced

GIMP can also be modded/skinned to look a lot like Photoshop.

As posted below, this was just fake and/or uninformed internet outrage. It took me 3 minutes of reading the new terms to understand everything in their clarification (they did not change the terms, just clarified it for low-attention users).

Of all companies leveraging AI heavily, Adobe might actually be among the most ethical (which granted, it’s not much, but it’s something).

Hallucinations are inevitable.

https://arxiv.org/pdf/2401.11817

I don’t completely understand this paper, so maybe someone else can chime in, but at first blush, it looks like they are starting from established results in computability theory. For any formal system (computer) there is a true statement in that system that cannot be proved (output). An LLM is a formal system, therefore there are “true” statements which it cannot say.

Which is technically correct, but doesn’t seem to be terribly relevant to the field of AI, since it applies to all formal systems, which would include any computer or human attempt to apply a rigorous, formal approach to truth.

They also include an empirical study in which various LLMs are tasked with enumerating possible token sets, and they all fail, but it seems like this failure is unrelated to their proof.

If anyone wants to dig deeper and correct or clarify my impression, that would be welcome.

Man, computer scientists have it easy now that they are completely free of the shackles of science, posting working papers without any peer review and getting attention based on the abstract rather than the quality of the contents.

There are a bunch of archives that scientists post to. For example, my papers are always put out first on https://www.biorxiv.org/ before or at the same time as we submit them for review in journals. Due to the speed of the review process, some papers are on the archive for 12-18 months before they are published, despite the fact that there are (at best) minor revisions. Peer review takes FOREVER, and longer now than it used to.

Actually, how does peer review work? I know the idea behind it but what’s the actual business of it? Other than a paper that’s groundbreaking, do scientists have a set amount of time each period to devote to running peer review experiments? Or if your finding is just too boring, it never gets reviewed?

It does say on the first page that it’s a preprint that is under review. Either way an interesting read.

Peer review is mediated through the journals. You submit a paper, and the editor tries to select 3 (or 4; in my discipline they now frequently get 3 scientists and a statistician, which is a good thing) people to review your paper who should have reasonable prior knowledge and experience to read your manuscript and provide feedback. This is done anonymously. The reviewers provide feedback to the authors. Then one of three things happens:
1) Enough of the reviewers dislike the work, with significant enough problems, that it is rejected.
2) Reviewers ask for clarification, additional experiments, etc., but conditional on those, the paper is accepted (this is the most “normal” outcome).
3) The paper is accepted as-is (I’ve never seen this happen on a first go. It’s like the reviewers feel the need to be incredibly picky, or need to ask for an impossible experiment).
Sometimes only 2 out of 3 reviewers will think the work can go forward, and the editor will make a judgement call.

This is supposed to happen as professional courtesy. You are called on to review, and you are reviewed. Many people have (paradoxically?) had less time to review as managing a lab during the pandemic was quite hard, and doing review work on top of that when everyone was pushing hard to get something done was sometimes asking a lot. So sometimes it could be weeks or months before a paper was actually received by reviewers, and even longer (up to 6 months) before you got feedback.

Whatever field you’re in is frequently small enough that one can often guess who the reviewer is by the sorts of questions they ask, especially if they are working on a very similar problem. “Why didn’t you do X?” (X being the approach they themselves are using) is a frequent question, and you need to carefully justify why you have not used someone else’s unpublished method. Indeed, reviewers can try to tank your paper so you don’t get to publish before they do.

That’s kinda the basics as far as I understand it. I’m not (and never will be) the principal investigator, so I may have an imperfect understanding compared to some of my peers - but I’ve been on the receiving end of the process a ton of times.

I was just making a general comment about how in machine learning speed has become the most important thing. No one bothers to prove (mathematically / statistically) anything about their approaches; it’s just ‘we thought of this, then applied it to this dataset, and the results look pretty good’. It’s the academic equivalent of throwing a million things at a wall (as the field is huge now and everyone wants to be the person who writes the next big ML paper) to see what handful stick.

I am not a computer scientist, I am in economics / statistics, but looking at their paper my impression is that they are showing something formally that is obvious (i.e. that no LLM will ever perform perfectly) and not of practical import. What matters isn’t that an LLM can hallucinate, as hallucination is defined in this literature, but whether it happens at a frequency that is practically meaningful to users. Anyway, happy to be proven wrong, but that’s just my impression.

Thanks for posting the article, I don’t mean anything I write here to suggest it shouldn’t be shared.

So, there are differences between the different areas within computer science + things may have changed since I was active, but with those caveats - conferences are/were the thing in Computer Science. Journals were interesting if you had some larger results that you wanted to elaborate more formally, but usually not first priority. The reason being that the field moves so fast, that if you don’t also publish fast, you might not get to publish at all.

So, at least back then, lead time on reviews tended to be short - typically ~4 months. Papers tended to go from one conference, get reviewed, and be resubmitted to the next. If you were fast, you could probably do up to 3 in a year, and if you couldn’t publish it in one of the 3-4 big conferences, that was probably a sign that your results weren’t that good to begin with.

Acceptance was also not the norm. Any major conference would probably receive 4-5 times more paper proposals than it was able to accept, so even papers that would be considered “the best” in one conference might have been rejected in a prior conference.

Honestly, it’s the same in industry. I think part of the reason is that this raw empiricism is the easiest thing to measure. Nobody really cares why it works as long as the numbers go up. Of course, sometimes that means things go very wrong in ways that might have been foreseen if they’d understood what was going on behind the outputs.

Which I’m astonished has lasted this long! Academic careers are really hard work, and nobody wants to spend their limited time reviewing papers. There’s no incentive.

Peer review in its formal, ‘journal mediated’ form is often put on a bit of a pedestal. In practice it’s unrewarded work, often gamed by all sides, and rarely a helpful bar to pass. Most papers out there are full of wrong central claims, and that includes the peer reviewed ones.

The peer review that actually happens and is useful is when people try to build on other people’s ideas. If the ideas work, people cite the original authors and build on that work. If they don’t, they get forgotten. That’s the peer review that is actually valuable, but it’s slow, informal, and requires immersion in the literature and culture of the field to know what has passed this ‘peer review’ and what hasn’t.

I think they are just proving that LLMs are not universal computers, which I don’t think is an interesting result. Tying it to “hallucinations” seems to be hype. In particular, they don’t seem to consider that an LLM coupled with a universal computer could be a universal computer.

It kind of reminded me of Minsky “proving” that neural nets were useless because perceptrons could only handle linearly separable functions. Of course, this paper won’t have the impact Minsky’s did.

Having read it again, it seems that they are starting with the assumption that LLMs are black-box functions:

We put a single assumption on LLMs: an LLM is a function h which, given any finite-length input string s = w_{0:q−1}, outputs a string of tokens h(s) that completes s within a finite time.

Definition 2 (Large Language Model). Let S be a computable set of all the finite-length strings of alphabet A and (s_0, s_1, …) be an one-to-one enumeration of all the elements in S. A large language model, denoted h, is a function that completes the input string s ∈ S using the function’s predicted tokens h(s), in a finite time. Function h is attained procedurally using a set of training samples of input-completion pairs.

I.e., an LLM is any function that completes a token sequence. There doesn’t seem to be any requirement that an LLM even contains a neural network of any kind. It sounds a lot like a Turing machine that halts on every input.
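
To check my own reading, here’s a toy Python rendering of that definition. The names and the deliberately dumb example are mine, not the paper’s:

```python
from typing import Callable

# How I read Definition 2: an "LLM" is just ANY function that maps a finite
# input string to a completion in finite time. Nothing here requires a neural
# network, training data, or probabilities.
LLM = Callable[[str], str]  # h: S -> S

def echo_completer(s: str) -> str:
    """A deliberately dumb completer. It still satisfies the definition as I
    read it, which is the point: the result applies to any total completion
    function, not just neural nets."""
    return s + " and so on."

h: LLM = echo_completer
print(h("All elephants are mammals,"))
```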

It appears they are proving that any computable algorithm will “hallucinate”. But the definition of “hallucinate” depends on some ill-defined “ground truth function”:

Hallucination is in essence an erroneous output produced by an LLM. Without getting tangled in the difficult problem of formalising “correctness” in our real world, we define hallucination in a formal world where all we care about is a computable ground truth function f on S. In other words, in our formal world, f is the ideal function that produces correct completion f(s) for any input string (or prompt, question, query, etc.)

For example, f could be a function that answers “true” for factual statements and “false” for non-factual ones.

This is where I’m getting stuck: it appears they are defining “truth” as a function built so that it negates at least one response from each possible LLM. E.g., if GPT-4 says “All elephants are mammals”, then we define truth as “It is not true that all elephants are mammals.” (Section 4, Table 1)

But that would be ridiculous, so I’m misunderstanding something.
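
My best guess at what Section 4 is doing is a plain diagonal argument. A toy sketch of how I understand it (my own code and naming, very possibly not what the paper actually does):

```python
# Toy diagonalization: enumerate candidate "LLMs" h_0, h_1, ... and prompts
# s_0, s_1, ..., then define the ground truth on s_i as whatever h_i did NOT
# answer. By construction every h_i gives a "wrong" answer on s_i, i.e. it
# "hallucinates" with respect to this particular ground truth.

prompts = ["s0", "s1", "s2"]  # stand-ins for an enumeration of all strings

candidate_llms = [
    lambda s: "true",                           # h_0 always says "true"
    lambda s: "false",                          # h_1 always says "false"
    lambda s: "true" if "0" in s else "false",  # h_2 actually looks at the prompt
]

def diagonal_ground_truth(i: int) -> str:
    """'Truth' on prompt s_i is defined as the negation of h_i's own answer."""
    return "false" if candidate_llms[i](prompts[i]) == "true" else "true"

for i, h in enumerate(candidate_llms):
    # every model disagrees with the ground truth on its own diagonal prompt
    assert h(prompts[i]) != diagonal_ground_truth(i)
```

If that reading is right, the “truth” isn’t an independent notion of factual correctness; it’s a function built specifically to contradict each model somewhere, which is why the result holds for any computable completer.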

And that might be Altman’s real skill — beguiling people that don’t really build things with his instinctual ability to tell them exactly what they want to hear, to get in exactly the room he needs to be in at exactly the right time to connect exactly the right people, all without ever having to actually do anything or create anything.

And the dagger

OpenAI’s growth is stalling, with Alex Kantrowitz reporting that user growth has effectively come to a halt based on a recent release claiming that ChatGPT had 100 million users a couple of weeks ago, the exact same number that the company claimed to have in November 2023. ChatGPT is a damned expensive product to operate, with the company burning through capital at a prodigious rate, and while OpenAI is aggressively monetizing it (both to consumers and to businesses), it’s far from crossing the break-even rubicon.

These people are afraid of OpenAI potentially creating a computer that can think for itself at a time they should be far more concerned about the career-manipulator and con artist that’s running the company. Sam Altman is dangerous to artificial intelligence not because he’s building artificial general intelligence — a kind of AI that meets or surpasses human cognitive capabilities, and potentially mimics human sentience, a bit like Data from Star Trek: The Next Generation — but because Sam Altman’s focus is on what Sam Altman can build to grow the power and influence of Sam Altman.

A fucking men.

Interesting - pretty early in the hype-cycle for these kind of articles to be dropping…

Potentially tracks with some of what I have seen locally with companies throwing money at AI projects and infrastructure before they even have defined use cases.

Also, Broadcom announced Q1 numbers, smashing expectations - ~40% YoY revenue growth (~12% excluding VMware contributions).

Relevant to this thread as they play in AI with their own chips, plus one tidbit on the investor call was that for every $1B spent on AI GPUs, there is a 25%-35% pull-through on AI and DC networking for Broadcom.
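
Back-of-the-envelope on that ratio (my arithmetic, not theirs): $1B of AI GPU spend × 25%-35% ≈ $250M-$350M of AI/DC networking revenue pulled through to Broadcom.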

They also announced a 10-for-1 share split.

Wall Street said, “Yes, we’ll have some of that.”


I mean, that’s what we call a bubble.

I mean, it is like the dotcom bubble.

There is actual useful technology behind all of this. We just have a lot of deep-pocketed corporations and investors all trying to outspend one another in a race to grab a piece of the AI-powered future, spending way more money than this tech is currently worth and betting that paying for the promised future is a good wager, rather than paying for what these models are currently capable of, which is pretty narrow.

Whereas in the dotcom boom, you had companies and investors buying anything “web” and spending way too much money on stuff that wasn’t fully capable of doing what was promised at the time, and didn’t have the userbase to sustain that sort of spending, etc.

It’ll probably crash the market in a year or so. (Me being cynical)

Ooh nice, I have a chunk of Broadcom in my portfolio. Off to check Etrade!