• 1 Post
  • 18 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • I’d have to see that in action before I pass judgement, but given LLMs’ predilection for hallucination and the vagaries of how humans report tech faults, I would be surprised if it was significantly more accurate or effective than a human. After all, if it’s working out whether there’s a known issue, then it’s essentially not much beyond a script at that point - and in that case, do you want to trade the unpredictability of what an LLM might recommend against something (human or otherwise) that will follow the script?

    Even if an LLM were an effective level-0 helpdesk, it would still need to overcome the user’s cultural expectation (in many places) that they can pick up the phone and speak to somebody about their problem. Having done that job a long, long time ago, I can say that diagnosing tech problems for people who don’t understand tech can be a fairly complex process. You have to work through their lack of understanding and lack of technical language. You sometimes have to pick up on cues in their hesitations, their frustrated tone of voice, and so on.

    I’m sure an LLM could synthesize that experience 80% of the time, but depending on the tech you’re dealing with, you could be missing some pretty major stuff in the other 20% - especially if the LLM gives bad instructions, or closes a ticket without escalating it. So you then need to pay someone to monitor the LLM and watch what it’s doing - at which point you’ve hired your level-1 tech again anyway.

  • I just want to jump in here on the whole “tonnes of factual errors” thing…

    A lot of the allegations about the accuracy of their data basically came down to arguments about the statistical validity of their testing methodology: the Labs guy claimed their methods were super good, while other content creators claimed their methods were better.

    My opinion is that all of these benchmarking content creators who base their content on rigorous “testing” are full of their own hot air.

    None of them are sampling and testing in enough volume to be able to point to any given number and say that it is the metric for a given model of hardware. So the value reduces to: this particular device performed better or worse than these other devices, at this point in time, doing a comparable test, on our specific hardware, with our specific software installation, using the electricity supply we have, at the ambient temperatures we tested at.

    It’s marginally useful as a general product-buying comparison - and in my opinion only to a limited degree, because they just aren’t testing in enough volume to get past the lottery of tolerances this gear is released under. Anyone claiming that it’s the performance number to expect is just full of it. Benchmarking presents itself as having scientific objectivity, but there are way too many variables between any given test run, and none of these folks isolate them before putting their videos up.

    Should LTT have been better about not putting up numbers they could have known were wrong? Sure! Should they have corrected sooner and more clearly once they knew they were wrong? Absolutely! Does anybody have a perfect testing methodology that produces reliable metrics? Ahhh, I’m not so sure. Was it a really bitchy beat-up at the time from someone with an axe to grind? In my opinion, hell yes.
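
    The small-sample argument is easy to see with a toy simulation (all numbers here are invented for illustration, not real benchmark data): two samples of the “same” card whose true performance differs only by silicon-lottery tolerances, each benchmarked a handful of times with run-to-run noise from thermals, background tasks, and so on.

    ```python
    import random
    import statistics

    random.seed(42)  # fixed seed so the sketch is reproducible

    def run_benchmark(true_fps: float, run_noise: float = 3.0) -> float:
        """One benchmark run: the card's true average FPS plus
        run-to-run noise (thermals, background tasks, power, etc.)."""
        return random.gauss(true_fps, run_noise)

    # Hypothetical figures: two samples of the same GPU model,
    # one with slightly better silicon than the other.
    card_a_true = 100.0
    card_b_true = 101.5

    # A reviewer typically runs each card only a few times.
    a_runs = [run_benchmark(card_a_true) for _ in range(3)]
    b_runs = [run_benchmark(card_b_true) for _ in range(3)]

    print(f"Card A mean: {statistics.mean(a_runs):.1f} FPS")
    print(f"Card B mean: {statistics.mean(b_runs):.1f} FPS")

    # With three runs per card and one card per model, the observed gap
    # can easily be bigger, smaller, or the reverse of the true 1.5 FPS
    # difference - the noise swamps the signal, which is why a single
    # reviewer's number is not "the" metric for the model.
    ```

    With per-run noise on the same order as the true difference between samples, only a much larger number of runs across many samples would let anyone quote a single figure as representative of the model.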