Very much smart people

RmDebArc_5@piefed.zip · 3 days ago

Very much smart people

breg@discuss.tchncs.de · 3 days ago

Scrubbles@poptalk.scrubbles.tech · 3 days ago

The majority of “AI Experts” online that I’ve seen are business majors.

Then a ton of junior/mid software engineers who have use the OpenAI API.

Finally are the very very few technical people who have interacted with models directly, maybe even trained some models. Coded directly against them. And even then I don’t think many of them truly understand what’s going on in there.

Hell, I’ve been training models and using ML directly for a decade and I barely know what’s going on in there. Don’t worry I get the image, just calling out how frighteningly few actually understand it, yet so many swear they know AI super well

waigl@lemmy.world · edit-2 3 days ago

And even then I don’t think many of them truly understand what’s going on in there.

That’s just the thing about neural networks: Nobody actually understands what’s going on there. We’ve put an abstraction layer over how we do things that we know we will never be able to pierce.

notabot@piefed.social · 3 days ago

I’d argue we know exactly what’s going on in there, we just don’t necessarily, know for any particular model why it’s going on in there.

GreenMartian@lemmy.dbzer0.com · 3 days ago

But, more importantly, who is going on in there?

Klear@quokk.au · 3 days ago

And how is it going in there?

GreenMartian@lemmy.dbzer0.com · 3 days ago

Not bad. How’s it going with you?

jqubed@lemmy.world · 3 days ago

That’s what we’re trying to find out! We’re trying to find out who killed him, and where, and with what! Tim Curry in Clue shouting the above text

TropicalDingdong@lemmy.world · 3 days ago

Gigasser@lemmy.world · 3 days ago

The real question is where it’s going on?

Nightwatch Admin@feddit.nl · 3 days ago

Excellent opportunity for a “that’s what she said” joke.

limelight79@lemmy.world · edit-2 3 days ago

I have a masters degree in statistics. This comment reminded me of a fellow statistics grad student that could not explain what a p-value was. I have no idea how he qualified for a graduate level statistics program without knowing what a p-value was, but he was there. I’m not saying I’m God’s gift to statistics, but a p-value is a pretty basic concept in statistics.

Next semester, he was gone. Transferred to another school and changed to major in Artificial Intelligence.

I wonder how he’s doing…

Fushuan [he/him]@lemmy.blahaj.zone · 2 days ago

I have a bachelor’s and master’s in computer science, specialised in data manipulation and ML.

The problem with AI is that you don’t really need to understand the math behind it to work with it, even with training. Who cares how the distribution of the net affects results and information retention? who cares how stochastic gradient descent really works? You get a network crafted by professionals that gets X input parameters, which modify the network’s capacity in a way that’s given to you, explained, and you just press play in the script that trains stuff.

It’s the fact that you only need to care about input data quality and quantity and some input parameters that freaking anyone can work with AI.

All the thinking on the NN is given to you, all the tools to work with training the NN are given to you.

I even worked with darknet and Yolo and did my due diligence to learn Yolov4, how it condensed info and all that, but I really didn’t need to for the given use case. Most of the work was labelling private data and cleaning it thoroughly. Then, playing with some Params to see how the final results worked, how the model over fitted…

That’s the issue with people building AI models, their work is more technical that that of “prompt engineers” (😫), but not much.

Poik@pawb.social · 2 days ago

When you’re working at the algorithm level, you get funny looks… Even if it gets to state of the art results, who cares because you can throw more electricity and data at it instead.

I worked specifically on low data algorithms, so my work was particularly frowned upon by modern ai scientists.

I’m not doxxing myself, but unpublished work of mine got published in parallel as Prototypical Networks in 2017. And everyone laughed (<- exaggeration) at me researching RBFs which were considered defunct. (I still think they’re an untapped optimization.)

sp3ctr4l@lemmy.dbzer0.com · edit-2 3 days ago

Ding ding ding.

It all became basically magic, blind trial and error roughly ten years ago, with AlexNet.

After AlexNet, everything became increasingly more and more black box and opaque to even the actual PhD level people crafting and testing these things.

Since then, it has basically been ‘throw all existing information of any kind at the model’ to train it better, and then a bunch of basically slapdash optimization attempts which work for largely ‘i dont know’ reasons.

Meanwhile, we could be pouring even 1% of the money going toward LLMs snd convolutional network derived models… into other paradigms, such as maybe trying to actually emulate real brains and real neuronal networks… but nope, everyone is piling into basically one approach.

Thats not to say research on other paradigms is nonexistent, but it is barely existant in comparison.

SkyeStarfall@lemmy.blahaj.zone · edit-2 3 days ago

Il’ll give you the point regarding LLMs… but conventional neural networks? Nah. They’ve been used for a reason, and generally been very successful where other methods have failed. And there very much are investments into stuff with real brains or analog brain-like structures… it’s just that it’s far more difficult, especially as have very little idea on how real brains work.

A big issue regarding digitally emulating real brain structures is that it’s very computationally expensive. Real brains work using chemistry, after all. Not something that’s easy to simulate. Though there is research in this are, but that research is mostly to understand brains more, not for any practical purpose, from what I know. But also, this won’t solve the black box problem.

Neural networks are great at what they do, being a sort of universal statistics optimization process (to a degree, no free lunch etc.). They solved problems that failed to be solved before, that now are considered mundane. Like, would anyone really think it would be possible to have your phone be able to detect what it was you took a picture of 15 years ago? That was considered to be practically impossible. Take this xkcd from a decade ago, for example https://xkcd.com/1425/

In addition, there are avenues that are being explored such as “Explainable AI” and so on. The field is more varied and interesting than most people realize. And, yes, genuinely useful. And not every neural network is a massive large scale one, many are small-scale and specialized.

sp3ctr4l@lemmy.dbzer0.com · 3 days ago

I take your critiques in stride, yes, you are more correct than I am, I was a bit sloppy.

Corrections appreciated =D

SkyeStarfall@lemmy.blahaj.zone · 3 days ago

Hopefully I don’t appear as too much of a know-it-all 😭 I often end up rambling too much lmao

It’s just always fun to talk about one’s field ^^ or stuff adjacent to it

sp3ctr4l@lemmy.dbzer0.com · edit-2 3 days ago

Oh no no no, being an actual subject matter expert or at least having more precise and detailed knowledge and or explanations is always welcome imo.

You’re talking to an(other?) autist who loves data dumping walls of text about things they actually know something about, lol.

Really, I appreciate constructive critiques or corrections.

How else would one learn things?

Keep oneself in check?

Today you have helped me verify that at least some amount of metacognition is still working inside of this particular blob of wetware, hahaja!

EDIT:

One motto I actually do try to live by, from the Matrix:

Temet Nosce.

Know Thyself.

… and a large part of that is knowing ‘that I know nothing’.

Aceticon@lemmy.dbzer0.com · 2 days ago

Way back in the 90s when Neural Networks were at their very beginning and starting to be used in things like postal code recognition for automated mail sorting, it was already the case that the experts did not know why it worked, including why certain topologies worked better than others at certain things, and we’re talking about networks with less than a thousand neurons.

No wonder that “add shit and see what happens” is still the way the area “advances”.

catch22@programming.dev · 3 days ago

Feature Visualization How neural networks build up their understanding of images

https://distill.pub/2017/feature-visualization/

mrmacduggan@lemmy.ml · 3 days ago

This method is definitely a great way to achieve some degree of explainability for images, but it is based on the assumption that nearby pixels will have correllated meanings. When AI is making connections between far-away features, or worse, in a feature space that cannot be readily visualized like images can, it can be very hard to decouple the nonlinear outputs into singular linear features. While AI explainability has come a long way in the last few years, the decision-making processes of AI are so different from human thought that even when it can “show its work” by showing which neurons contributed to the final result, it doesn’t necessarily make any intuitive sense to us.

For example, an image-identification AI might identify subtle lens blur data to determine the brand of camera that took a photograph, and then use that data to make an educated guess about which country the image was taken in. It’s a valid path of reasoning. But it would take a lot of effort for a human analyst to notice that the AI is using this process to slightly improve its chances of getting the image identification correct, and there are millions of such derived features that combine in unexpected ways, some logical and some irrationally overfitting to the training data.

ZMoney@lemmy.world · 2 days ago

deleted by creator

skisnow@lemmy.ca · 3 days ago

I’ve given up attending AI conferences, events and meetups in my city for this exact reason. Show up for a talk called something like “Advances in AI” or “Inside AI” by a supposed guru from an AI company, get a 3 hour PowerPoint telling you to stop making PowerPoints by hand and start using ChatGPT to do it, concluding with a sales pitch for their 2-day course on how to get rich creating Kindle ebooks en masse

Scrubbles@poptalk.scrubbles.tech · 3 days ago

Even the dev oriented ones are painfully like this too. Why would you make your own when you subscribe to ours instead? Just sign away all of your data and call this API which will probably change in a month, you’ll be so happy!

expr@programming.dev · 3 days ago

Yeah, I’ve trained a number of models (as part of actual CS research, before all of this LLM bullshit), and while I certainly understand the concepts behind training neural networks, I couldn’t tell you the first thing about what a model I trained is doing. That’s the whole thing about the black box approach.

Also why it’s so absurd when “AI” gurus claim they “fixed” an issue in their model that resulted in output they didn’t want.

No, no you didn’t.

Scrubbles@poptalk.scrubbles.tech · 3 days ago

Love this because I completely agree. “We fixed it and it no longer does the bad thing”. Uh no, incorrect, unless you literally went through your entire dataset and stripped out every single occurrence of the thing and retrained it, then no there is no way that you 100% “fixed” it

ragas@lemmy.ml · edit-2 3 days ago

I mean I don’t know for sure but I think they often just code program logic in to filter for some requests that they do not want.

My evidence for that is that I can trigger some “I cannot help you with that” responses by asking completely normal things that just use the wrong word.

Scrubbles@poptalk.scrubbles.tech · 3 days ago

It’s not 100%, and you’re more or less just asking the LLM to behave, and filtering the response through another non-perfect model after that which is trying to decide if it’s malicious or not. It’s not standard coding in that it’s a boolean returned - it’s a probability that what the user asked is appropriate according to another model. If the probability is over a threshold then it rejects.

ragas@lemmy.ml · 3 days ago

I once trained an AI in Matlab to spell my name.

I alternate between feeling so dumb because that is all that my model could do and feeling so smart because I actually understand the basics of what is happening with AI.

Amberskin@europe.pub · 3 days ago

I made a cat detector using Octave. Just ‘detected’ cats in small monochrome bitmaps, but hey, I felt like Neo for a while!

NιƙƙιDιɱҽʂ@lemmy.world · 3 days ago

I made a neural net from scratch with my own neural net library that could identify cats from dogs 60% of the time. Better than a coin flip, baybeee!

monotremata@lemmy.ca · edit-2 3 days ago

I made a neural net from scratch with my own neural net library and trained it on generating the next move in a game of Go, based on thousands of games from an online Go forum.

It never even got close to learning the rules.

In retrospect, “thousands of games” was nowhere near enough training data for such a complex task, and if we had had enough training data, we never could have processed all of it, since all we were using was a ca. 2004 laptop machine with no GPU. So we just really overreached with that project. But still, it was a really pathetic showing.

Edit: I switched from “I” to “we” here because I was working with a classmate, but we did use my code. She did a lot of the heavy lifting in getting the games parsed into a form where the network could train on it, though.

JandroDelSol@lemmy.world · 3 days ago

business majors are the worst i swear to god

ѕєχυαℓ ρσℓутσρє@lemmy.sdf.org · 3 days ago

They are literally what’s causing the fall of our society.

Dogiedog64@lemmy.world · 3 days ago

Objectively, per Ed Zitron.

Scrubbles@poptalk.scrubbles.tech · 3 days ago

Didn’t you know? Being adept at business immediately makes you an expert in many science and engineering fields!

kboy101222@sh.itjust.works · 3 days ago

adept

I think you’re giving them a little too much credit there

ragas@lemmy.ml · 3 days ago

My wife is a business major.

I always tell her that the enemy is in my bed.

(I have no clue why she does not think that this is funny. ;))

GreenShimada@lemmy.world · 3 days ago

I have personally told coworkers that if they train a custom GPT, they should put “AI expert” on their resume as it’s more than 99% of people have done - and 99% of those people didn’t do anything more than tricked ChatGPT into doing something naughty once a year ago and now consider themselves “prompt engineers.”

Scrubbles@poptalk.scrubbles.tech · 3 days ago

Absolutely agree there

FauxLiving@lemmy.world · 3 days ago

Hell, I’ve been training models and using ML directly for a decade and I barely know what’s going on in there.

Outside of low dimensional toy models, I don’t think we’re capable of understanding what’s happening. Even in academia, work on the ability to reliably understand trained networks is still in its infancy.

sobchak@programming.dev · 2 days ago

I remember studying “Probably Approximately Correct” learning and such, and it was a pretty cool way of building axioms, theorems, and proofs to bound and reason about ML models. To my knowledge, there isn’t really anything like it for large networks; maybe someday.

Aceticon@lemmy.dbzer0.com · 2 days ago

Which is funny considering that Neural Networks have been a thing since the 90s.

Poik@pawb.social · 2 days ago

… 1957

Perceptrons. The math dates back to the 40s, but '57 marks the first artificial neural network.

Also 35 years is infancy in science, or at least teenage, as we see from deep learning’s growing pains right now. Visualizations of neural network responses and reverse engineering neural networks to understand how they tick predate 2010 at least. Deep Dream was actually built off an idea of network inversion visualizations, and that’s ten years old now.

Treczoks@lemmy.world · 3 days ago

NONE of them knows what’s going on inside.

We are right back in the age of alchemy, where people talking latin and greek threw more or less things together to see what happens, all the while claiming to trying to make gold to keep the cash flowing.

stinky@redlemmy.com · 3 days ago

The image feels like “Those who know 😀 Those who don’t know 😬”

TropicalDingdong@lemmy.world · 3 days ago

And the number of us who build these models from scratch, from the ground up, even fewer.

blazeknave@lemmy.world · 3 days ago

I’ve been selling it even longer than that and I refuse to use the word expert.

make -j8@lemmy.world · 3 days ago

Hot take : Adding “Prompt expert” to a resume is like adding “professional Googler”

Echo Dot@feddit.uk · 2 days ago

There used to be some skill involved in getting search engines to give you the right results, these days not so much but originally you did have to inject the right kind of search terms and a lot of people couldn’t work that out.

Many years ago back before Google became so dominant I had a co-worker who could not get her head around the idea that you didn’t in fact have to ask a search engine in the form of a question with a question mark on the end. It used to be somewhat of a skill.

hansolo@lemmy.today · 2 days ago

This is actually very true. I did always object to knowing that Boolean operators work in Google coming to be called “Dorking.” I amassed a sizeable MP3 collection in the early oughts thanks to searching “.mp3” and finding people’s public folders filled with their CD rips. Just out there, freely hanging the internet wind.

These days SEO optimization has rendered Google itself borderline useless, and IIRC they removed some operators from use at some point. I have to use DDG, Brave and Leta searching Google if I want to find anything that’s not just a URL for an obvious thing. And half the time none of that works anyway and I can’t even find things I’ve found previously.

slowcakes@programming.dev · 2 days ago

You actually do “file:mp3”, this is how found most of my course literature without ending up in a bunch off spam sites.

hansolo@lemmy.today · 2 days ago

I used Filetype:mp3 but I’ve noticed it really doesn’t really work anymore.

BluesF@lemmy.world · 2 days ago

SEO optimization

Man they should really incorporate optimization into the initialism

hansolo@lemmy.today · 2 days ago

They optimized how optimized it was ;)

VeryFrugal@sh.itjust.works · edit-2 2 days ago

I’d trust the latter any day.

pyre@lemmy.world · 2 days ago

the latter just means IT expert

brucethemoose@lemmy.world · 3 days ago

It was the same with crypto TBH. It was a neat niche research interest until pyramid schemers with euphemisms for titles got involved.

UnderpantsWeevil@lemmy.world · 3 days ago

With crypto, it was largely MLM scammers who started pumping it (futily, for the most part) until Ross Ulrich and the Silk Road leveraged it for black market sales.

Then Bitcoin, specifically, took off as a means of subverting bank regulations on financial transactions. This encouraged more big-ticket speculators to enter the market, leading to the JP Morgan sponsorship of Etherium (NFTs were a big part of this scam).

There’s a whole historical pedigree to each major crypto offering. Solana, for instance, is tied up in Howard Lutnick’s play at crypto through Cantor Fitzgerald.

brucethemoose@lemmy.world · edit-2 3 days ago

Interesting.

I guess AI isn’t so dissimilar, with major ‘sects’ having major billionaire/corporate backers, sometimes aiming for specific niches.

Anthropic was rather infamously funded by FTX. Deepseek came from a quant trading (and to my memory, crypto mining) firm, and there’s loose evidence the Chinese govt is ‘helping’ all its firms with data (or that they’re sharing it with each other under the table, somehow). Many say Zuckerberg open-sourced llama to ‘poison the well’ over OpenAI going closed.

FauxLiving@lemmy.world · 3 days ago

Silk Road and other black market vendors existed well before the scams started. You could mail order drugs online when bitcoin was under $1, the first bubble pushed the price to $30 before crashing to sub-$1 again. THEN the scams and market manipulation took off.

Later people forked the project to create new chains in order to run rug pulls and other modern crypto scams.

UnderpantsWeevil@lemmy.world · 3 days ago

Silk Road and other black market vendors existed well before the scams started

Silk Road was launched in 2011, the same year of the first big Mt. Gox crypto heist (now largely recognized as an inside job).

Crypto scams are as old as Bitcoin itself.

EightBitBlood@lemmy.world · 2 days ago

Except no, because Bitcoin started in 2009. What OP said above is 100% accurate. Others that were interested in early crypto and lived through it like me experienced the same. The scams didn’t start until crypto had value a couple years in.

UnderpantsWeevil@lemmy.world · 2 days ago

Again, look up the history of Mt. Gox.

It was only that largest of the scam sights.

EightBitBlood@lemmy.world · 2 days ago

My dude. That’s like saying, “look up the history of Wallstreet” implying the MOMENT it was active, it was full of scams.

MT. Gox SLOWLY became a site to trade crypto as if it were a security. Just like Wallstreet. When that started working and became valuable, then the scams started.

The fact it worked for CRYPTO at all was by complete ACCIDENT too. As proven by the REAL FULL name of Mt Gox: “Magic the Gathering Online eXchange.”

You wanna believe there were devious plans to scam Bitcoin at MtGox from the beginning despite it originally being a place to trade Magic Cards?

There’s also basics economics. Bitcoin was worth less than a dollar at that point. Who is going to create complex technical scams for pennies?

No one. That’s why the scams started when the pennies turned into dollars. Crime is only going to Crime when there’s profit to be made.

UnderpantsWeevil@lemmy.world · 2 days ago

That’s like saying, “look up the history of Wallstreet” implying the MOMENT it was active, it was full of scams.

You seriously might want to look up the history of Wall Street.

alt_xa_23@lemmy.world · 3 days ago

Don’t forget that the development of Ethereum was funded in large part by Peter Thiel

Knock_Knock_Lemmy_In@lemmy.world · 3 days ago

Theil is buying now but he’s a late arrival.

Anthony Di Iorio wrote: “Ethereum was founded by Vitalik Buterin, Myself, Charles Hoskinson, Mihai Alisie & Amir Chetrit (the initial 5) in December 2013. Joseph Lubin, Gavin Wood, & Jeffrey Wilcke were added in early 2014 as founders.”

Danquebec@sh.itjust.works · 2 days ago

Even I know what this is and I don’t have a background in AI/ML.

killeronthecorner@lemmy.world · 3 days ago

This image is clearly of my hands with an elastic band at the back of class two decades ago

da_cow (she/her)@feddit.org · edit-2 3 days ago

Yeah but why am I arguing with them?

killeronthecorner@lemmy.world · 3 days ago

Maybe it’s because they were stretching.

Avicenna@lemmy.world · 3 days ago

Wait till you talk to LinkedIn people interested in Quantum Physics

Krudler@lemmy.world · 3 days ago

My favorite story was going on a date with a woman, who by rights was very bright. She had a PhD and went on and on about quantum this and that. We were heading to the live music stage and talking a long L-shaped gravel path… I chirped “shall we hypotenuse it across the feild?” She replied “what’s a hypotenuse?”

ddh@lemmy.sdf.org · 3 days ago

The Venn diagram of LinkedIn people who post about Quantum Physics and those who post about Deepak Chopra is almost a circle.

Avicenna@lemmy.world · 2 days ago

Seems like one more person that I didn’t need to know existed but now I do, thanks

Infernal_pizza@lemmy.dbzer0.com · 3 days ago

OK but what actually is this image?

SatyrSack@lemmy.sdf.org · 3 days ago

Basic model of a neural net. The post is implying that you’re arguing with bots.

https://en.wikipedia.org/wiki/Neural_network_(machine_learning)

CookieOfFortune@lemmy.world · 3 days ago

Wouldn’t a bot recognize this though?

SatyrSack@lemmy.sdf.org · 3 days ago

A bot might, but this post is pointing out how common it is for people who consider themselves AI experts to not recognize this diagram that is basically part of AI 101

driving_crooner@lemmy.eco.br · 3 days ago

They’re not saying that the bots are asking what the image is, but users (may be bots or not) that sell themselves as AI/ML experts.

lobut@lemmy.ca · 3 days ago

https://youtu.be/7YqEZrP5t1g

they’re just robots Morty!

maniclucky@lemmy.world · 3 days ago

Would you recognize if someone made a block diagram of your brain?

General_Effort@lemmy.world · 2 days ago

The post is implying that you’re arguing with bots.

Maybe, but it also might be suggesting that people are not fundamentally different.

Asetru@feddit.org · 3 days ago

Illustration of a neural network.

Gladaed@feddit.org · 3 days ago

The simplest neural network (simplified). You input a set of properties(first column). Then you weightedly add all of them a number of times(with DIFFERENT weights)(first set of lines). Then you apply a non-linearity to it, e.g. 0 if negative, keep the same otherwise(not shown).

You repeat this with potentially different numbers of outputs any number of times.

Then do this again, but so that your number of outputs is the dimension of your desired output. E.g. 2 if you want the sum of the inputs and their product computed(which is a fun exercise!). You may want to skip the non-linearity here or do something special™

Poik@pawb.social · 2 days ago

Simplest multilayer perceptron*.

A neural network can be made with only one hidden layer (and still, mathematically proven, be able to output any possible function result, just not as easily trained, and with a much higher number of neurons).

Gladaed@feddit.org · edit-2 2 days ago

The one shown is actually single layer. Input, FC hidden layer, output. Edit: can’t count to fucking two, can I now. You are right.

Poik@pawb.social · edit-2 2 days ago

It’s good. Thanks for correcting yourself. :3

The graphs struck me as weird when learning as I expected the input and output nodes to be neuron layers as well… Which they are, but not in the same way. So I frequently miscounted myself while learning, sleep deprived in the back of the classroom. ^^;;

nialv7@lemmy.world · 2 days ago

Multilayer perceptron

Zwiebel@feddit.org · edit-2 3 days ago

To elaborate: the dots are the simulated neurons, the lines the links between neurons. The pictured neural net has four inputs (on the left) leading to the first layer, where each neuron makes a decision based on the input it recieves and a predefined threshold, and then passes its answer on to the second layer, which then connects to the two outputs on the right

Bakkoda@sh.itjust.works · 3 days ago

Logic.

Fermion@mander.xyz · 3 days ago

Many player cat’s cradle

sandywarhole@lemmy.zip · 3 days ago

isn’t this the Trial of the Sekhemas in PoE2?

ArtVandelay@lemmy.world · 2 days ago

As a data scientist who also plays POE2, I laughed at this a lot longer than I should have

AdrianTheFrog@lemmy.world · 3 days ago

Probably bc they forgot the bias nodes

(/s but really I don’t understand why no one ever includes them in these diagrams)

zr0@lemmy.dbzer0.com · 3 days ago

Same as if you’d ask a crypto bro how a blockchain actually works. All those self proclaimed Data Scientists who were able to use pytorch once successfully by following a tutorial, just don’t want to die.

NovaSel@lemmy.world · 2 days ago

Something to do with Large Language Models?

NateNate60@lemmy.world · 2 days ago

It’s a neural network diagram

lightnegative@lemmy.world · 2 days ago

import tensorflow as tf

nroth@lemmy.world · 2 days ago

That particular network could never put up a good argument. At best, it might estimate, or predict numbers or 1-2 discrete binary states.

oddlyqueer@lemmy.ml · 2 days ago

Nalivai@lemmy.world · 2 days ago

That’s what it actually does, people are just this good at anthropomorphising

Whelks_chance@lemmy.world · 3 days ago

I’ve never had it well explained why there are (for example , in this case) two intermediary steps, and 6 blobs in each. That much has been a dark art, at least in the “intro to blah blah” blogposts.

OhNoMoreLemmy@lemmy.ml · 3 days ago

Probably because there’s no good reason.

At least one intermediate layer is needed to make it expressive enough to fit any data, but if you make it wide enough (increasing the blobs) you don’t need more layers.

At that point you then start tuning it /adjusting the number of layers and how wide they are until it works well on data it’s not seen before.

At the end, you’re just like “huh I guess two hidden layers with a width of 6 was enough.”

Whelks_chance@lemmy.world · 3 days ago

All seems pretty random, and not very scientific. Why not try 5 layers, or 50, 500? A million nodes? It’s just a bit arbitrary.

Honytawk@feddit.nl · 2 days ago

It is random, at least while it is learning. It would have most likely tried 5 layers, or even 50.

But the point is to simplify it enough while still working the way it should. And when maximizing the efficiency, you generally get only a handful of efficient ways your problem can be solved.

OhNoMoreLemmy@lemmy.ml · 3 days ago

In practice it’s very systematic for small networks. You perform a search over a range of values until you find what works. We know the optimisation gets harder the deeper a network is so you probably won’t go over 3 hidden layers on tabular data (although if you really care about performance on tabular data you would use something that wasn’t a neural network).

But yes, fundamentally, it’s arbitrary. For each dataset a different architecture might work better, and no one has a good strategy for picking it.

Poik@pawb.social · 2 days ago

There are ways to estimate a little more accurately, but the amount of fine tuning that is guesswork and brute force searching is too damn high…