• Whelks_chance@lemmy.world · 3 days ago

    I’ve never had it well explained why there are (for example, in this case) two intermediary steps, and 6 blobs in each. That much has been a dark art, at least in the “intro to blah blah” blog posts.

    • OhNoMoreLemmy@lemmy.ml · 3 days ago

      Probably because there’s no good reason.

      At least one intermediate layer is needed to make it expressive enough to fit any data, but if you make it wide enough (increasing the blobs) you don’t need more layers.

      At that point you start tuning it, adjusting the number of layers and how wide they are, until it works well on data it hasn’t seen before.

      At the end, you’re just like “huh I guess two hidden layers with a width of 6 was enough.”
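      (Not from the thread, just a minimal sketch of that end result: a network with two hidden layers of width 6, assuming scikit-learn’s MLPClassifier and a toy dataset.)

```python
# Minimal sketch: "two hidden layers with a width of 6" as an MLPClassifier.
# The dataset and the other hyperparameter values here are illustrative assumptions.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# hidden_layer_sizes=(6, 6) is the "two hidden layers, width 6" architecture;
# widening or deepening it is exactly the tuning described above.
clf = MLPClassifier(hidden_layer_sizes=(6, 6), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```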

      • Whelks_chance@lemmy.world · 3 days ago

        All seems pretty random, and not very scientific. Why not try 5 layers, or 50, 500? A million nodes? It’s just a bit arbitrary.

        • Honytawk@feddit.nl · 3 days ago

          It is random, at least while it is learning. It would most likely have tried 5 layers, or even 50.

          But the point is to simplify it enough while still having it work the way it should. And when you’re maximizing efficiency, there are generally only a handful of efficient ways your problem can be solved.

        • OhNoMoreLemmy@lemmy.ml · 3 days ago

          In practice it’s very systematic for small networks. You perform a search over a range of values until you find what works. We know the optimisation gets harder the deeper a network is, so you probably won’t go over 3 hidden layers on tabular data (although if you really care about performance on tabular data, you would use something that wasn’t a neural network).

          But yes, fundamentally, it’s arbitrary. For each dataset a different architecture might work better, and no one has a good strategy for picking it.
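          (As a rough sketch of that kind of search, not the commenter’s actual workflow: grid-search a few depth/width combinations with cross-validation. The specific depths, widths, and dataset below are assumed for illustration, using scikit-learn.)

```python
# Rough sketch of a small architecture search: try 1-3 hidden layers at a few
# widths and keep whichever scores best across cross-validation folds.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

param_grid = {
    "hidden_layer_sizes": [
        (width,) * depth        # e.g. (6,), (6, 6), (6, 6, 6), (32,), ...
        for depth in (1, 2, 3)  # rarely worth going deeper on tabular data
        for width in (6, 32, 128)
    ]
}

search = GridSearchCV(
    MLPClassifier(max_iter=2000, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)
print("best architecture:", search.best_params_["hidden_layer_sizes"])
print("cross-validated accuracy:", round(search.best_score_, 3))
```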

          • Poik@pawb.social · 2 days ago

            There are ways to estimate it a little more accurately, but the amount of fine-tuning that is guesswork and brute-force searching is too damn high…