• 7 Posts
  • 699 Comments
Joined 1 year ago
Cake day: June 9th, 2023




  • Generally, the term Markov chain is used to describe a model with a few dozen weights, while the "large" in large language model refers to having millions or billions of weights. The fundamental principle of operation is exactly the same; they just differ in scale.
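
    For reference, a word-level (bigram) Markov chain can be sketched in a few lines of Python: count how often each word follows each other word, then normalize those counts into probabilities. The toy corpus and the function names are made up for illustration, and this shows only the Markov-chain side of the comparison, not how a transformer computes its probabilities.

```python
from collections import Counter, defaultdict


def train_bigram(words):
    """Count how often each word follows each other word (a word-level Markov chain)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw follow-counts for `prev` into probabilities that sum to 1.

    An unseen word has no followers, so an empty dict is returned.
    """
    followers = counts[prev]
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()}


# Toy corpus invented for this example.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(next_word_probs(model, "the"))  # "cat" follows "the" 2 of 3 times, "mat" 1 of 3
```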

    Word embeddings are when you associate a mathematical vector with each word as a way of grouping similar words so that they are weighted together. I don’t think anyone would argue that the general public can even solve a mathematical matrix, much less that they can only comprehend a stool by going down a row in a matrix to get the mathematical similarity between a stool, a chair, a bench, a floor, and a cat.

    Subtracting vectors from each other can give you a lot of things, but not the actual meaning of the concept represented by a word.
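
    To make the embedding description concrete, here is a minimal sketch using invented three-dimensional vectors. Similarity between embeddings is commonly measured with cosine similarity, and "subtracting" two words is plain element-wise vector subtraction. The vectors and helper names are hypothetical; real embeddings are learned from data and have hundreds of dimensions.

```python
import math

# Hypothetical 3-dimensional "embeddings"; the numbers are invented for
# illustration. Real models learn vectors with hundreds of dimensions.
EMBEDDINGS = {
    "stool": [0.9, 0.8, 0.1],
    "chair": [1.0, 0.9, 0.2],
    "bench": [0.8, 0.9, 0.3],
    "floor": [0.2, 0.6, 0.9],
    "cat": [-0.5, 0.1, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def subtract(a, b):
    """Element-wise vector difference a - b."""
    return [x - y for x, y in zip(a, b)]


def ranked_neighbors(word):
    """All other words, sorted from most to least similar to `word`."""
    target = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return sorted(others, key=lambda w: cosine_similarity(target, EMBEDDINGS[w]), reverse=True)


print(ranked_neighbors("stool"))  # -> ['chair', 'bench', 'floor', 'cat']
print([round(x, 2) for x in subtract(EMBEDDINGS["chair"], EMBEDDINGS["stool"])])  # -> [0.1, 0.1, 0.1]
```

    The output is a ranking and a short list of numbers; nothing in the computation refers to what a stool is.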


  • To note the obvious, a large language model is by definition, at its core, a mathematical formula and a massive collection of values from zero to one which, when combined, give a weighted average of the percentage that word B follows word A, crossed with another weighted-average word cloud given as the input ‘context’.
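
    In transformer-based LLMs, the numbers between zero and one that express how likely each word is to come next are typically produced at the very end by a softmax, which turns one raw score (logit) per vocabulary entry into a probability distribution. A minimal sketch, with invented scores for three candidate words:

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities in (0, 1) that sum to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical raw scores a model might assign to candidate next words
# after some context; the numbers are invented for illustration.
logits = {"chair": 2.0, "stool": 1.0, "cat": 0.0}
probs = softmax(logits)
print(max(probs, key=probs.get))  # -> chair
```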

    A neuron in machine learning terms is a matrix (i.e. a table) of numbers between zero and one. By contrast, a single human neuron is a biomechanical machine with literally hundreds of trillions of moving parts that dwarfs any machine humanity has ever built in terms of complexity. And that is just one of the 86 billion neurons in an average human brain.
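
    For the machine-learning side of that comparison, a single artificial neuron in a typical network reduces to the short computation below: a weighted sum of its inputs plus a bias, passed through a nonlinearity. This is a sketch with invented inputs and weights; the sigmoid is just one common choice of nonlinearity, and learned weights are in general arbitrary real numbers rather than being limited to between zero and one.

```python
import math


def artificial_neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    passed through a nonlinearity (here the logistic sigmoid)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs and learned parameters, invented for illustration.
print(artificial_neuron([0.5, -1.0, 2.0], [0.8, 0.1, -0.4], bias=0.2))
```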

    LLMs and organic brains are completely different in design, complexity, and function, and to treat them as closely related, much less synonymous, betrays a complete lack of understanding of how one or both of them fundamentally function.

    We do not teach a kindergartner how to write by having them read for thousands of years until they recognize the exact mathematical odds that string of letters B comes after string A and is followed by string C x percent of the time. Indeed, humans don’t naturally compose sentences one word at a time from the beginning; instead, they start with the key concepts they wish to express and then fill in the phrasing and grammar.

    We also would not expect that increasing from hundreds of years of reading text to thousands would improve things, and the fact that this is the primary way we’ve seen progress in LLMs in the last half decade is yet another example of why animal learning and a word cloud are very different things.

    For us, a word actually correlates to a concept of what that word represents. We might make mistakes and misunderstand which concept a given word maps to in a given language, but we do generally expect it to correlate to something. To us, a chair is an object made to sit on, and not just the string of letters that comes after the word ‘the’ in .0021798 percent of cases, weighted against the .0092814 percent of cases related to the collection of strings being used as the ‘context’.

    Do I believe there is something about human thought that is intrinsically impossible for a mathematical program to replicate? Probably not. But this is not that, and it is nowhere close to that on a fundamental level. It’s comparing apples to airplanes and saying that this apple will soon inevitably take anyone who touches it to Paris because they’re both objects you can touch.


  • Like, say, treating a program that shows you the next most likely word to follow the previous one on the internet as if it were capable of understanding a sentence beyond ‘this is the most likely string of words to follow the given input on the internet’. Boy, it sure is a good thing no one would ever do something so brainless as that in the current wave of hype.

    It’s also definitely because autocompletes have made massive progress recently, and not just because we’ve fed simpler and simpler transformers more and more data, to the point that we’ve run out of new text on the internet to feed them. We definitely shouldn’t expect the field as a whole to be valued at what it was, say, back in 2018, when there were about the same number of practical uses and the focus was on better programs instead of just throwing more training data at it and calling that progress that will continue to grow rapidly, even though the amount of said data is very much finite.


  • Except when it comes to LLMs, the fact that the technology fundamentally operates by probabilistically stringing together the next most likely word in the sentence, based on how frequently those words appeared in the training data, is a fundamental limitation of the technology.
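
    The "next most likely word" loop can be sketched as follows, using a small hand-written probability table in place of a trained model. Picking the single most likely word at each step is called greedy decoding; deployed chatbots usually sample from the distribution instead, which is what the `sampler` variant does. The table, the `"end"` marker, and the function names are all hypothetical.

```python
import random

# Hypothetical next-word probability table (invented numbers), standing in
# for whatever model produces "how likely is each word to come next".
NEXT_WORD = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "chair": 0.3, "end": 0.2},
    "a": {"chair": 0.7, "cat": 0.3},
    "cat": {"sat": 0.8, "end": 0.2},
    "chair": {"end": 1.0},
    "sat": {"end": 1.0},
}


def generate(table, pick, start="<start>", max_words=10):
    """Build a sentence one word at a time, letting `pick` choose each next word."""
    words, current = [], start
    for _ in range(max_words):
        followers = table.get(current)
        if not followers:
            break
        current = pick(followers)
        if current == "end":
            break
        words.append(current)
    return " ".join(words)


def greedy(followers):
    """Always take the single most likely next word."""
    return max(followers, key=followers.get)


def sampler(seed):
    """Draw the next word at random, weighted by its probability."""
    rng = random.Random(seed)
    return lambda followers: rng.choices(list(followers), weights=list(followers.values()))[0]


print(generate(NEXT_WORD, greedy))  # -> the cat sat
print(generate(NEXT_WORD, sampler(seed=0)))  # varies with the seed
```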

    So long as a model has no regard for the actual, you know, meaning of the word, it definitionally cannot create a truly meaningful sentence. Instead, to get coherent output, the system must be fed training data that closely mirrors the context. This is why groups like OpenAI have met with so much success by simplifying the algorithm while progressively scraping more and more of the internet into said systems.

    I would argue that a similar inherent technological limitation also applies to image generation. Until a generative model can both model a four-dimensional space and conceptually understand everything it has created in that space, a generated image can only be as meaningful as the parts it has regurgitated from the work of the tens of thousands of people who do those things effortlessly.

    This is not required to create images that can pass as human-made, but it is required to create ones that are truly meaningful on their own merits and not just on the merits of the material they were created from. Nothing I have seen from experts in the field indicates that we have found even a theoretical pathway to get there from here, much less that we are inevitably progressing along that path.

    With further instruction, mathematical models will almost certainly get closer to mimicking the desired parts of the data they were trained on, but it is important to understand that this is not a pathway to any actual conceptual understanding of the subject.





  • Personally, I tend to be hesitant about relaxing the dual-means-of-egress rule completely when I’ve seen buildings in Vancouver use two sets of stairs interwoven in the same stairwell to achieve the same effect with only a 30% or so increase in floor space. Even if it’s statistically not much help, knowing there are two ways out of the building in an emergency does have an advantage, and I’m not convinced the rule is actually as much of a factor in the proliferation of double-loaded corridors as their simply being the cheapest way to build.

    Otherwise, I’m definitely a big fan of the suggestions, especially more interconnections between buildings.



  • The actual radioactivity matters quite a bit for how safe it is, and that depends heavily on the amount and thorium density of the specific item in question. Radiation levels from Tokaimura down to a banana cover quite a large range, and I assumed in good faith that if someone was going out of their way to find and mention thorium on Unix socks, of all places, they probably knew the basics and could check where on that spectrum the item falls. Having an excuse to play with a detector is most of the fun, after all.




  • Please explain to me how any of this child-level explanation of the stock market is obfuscation, or, again, how you think the market cap, a purely theoretical number, could possibly be redistributed to employees beyond what the company already does to some extent. Finally, explain why it applies in this case, with a company whose stock price is based purely on speculation about what it could do in the future and not on anything its employees are currently doing.

    Also, from your comment that share price is literally the only measure of a company’s value, I take it you follow the theory that value directly equals the amount of money paid for something, which seems inherently contradictory to this entire conversation.



  • Technically, they don’t even make the actual graphics cards; they just design them and then outsource manufacturing to TSMC.

    But don’t you know that doesn’t matter, because by 2028 every single company in the world is going to need a data center filled with tens of thousands of AI accelerators turning its own scrape of the internet into a chatbot, and so one of the companies that makes those accelerators is definitely going to have as much business as the companies that make half of everyone’s phones or computer software. /s


  • Firstly, it shows the price of an individual share multiplied by the number of shares, not the value of the company as a whole. Secondly, in this case Nvidia’s share price is based on what the company may be able to expand into in the future, not what it currently does. Thirdly, where would this representative percentage come from? If it’s from issuing new stock to employees: A, Nvidia already does that a lot; B, creating new stock doesn’t practically depend on overall market cap, so why is it relevant; and C, would employees also be punished for destroying the valuation if it turns out that not every company actually needs a data center full of several thousand AI accelerators scraping the internet to make unique chatbots, and Nvidia’s market cap falls back down to what it would be based on how much money the company actually makes?
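
    The market cap arithmetic from the first point is a single multiplication: the most recent price of one share times the number of shares outstanding. The figures below are invented round numbers chosen to land on the three trillion under discussion; they are not Nvidia’s actual share count or price.

```python
def market_cap(share_price, shares_outstanding):
    """Market capitalization: the last traded price of one share times the
    number of shares outstanding. It is a valuation, not money anyone holds."""
    return share_price * shares_outstanding


# Hypothetical figures, not Nvidia's actual numbers.
price = 120.0            # last traded price per share, in dollars
shares = 25_000_000_000  # shares outstanding
print(f"${market_cap(price, shares):,.0f}")  # -> $3,000,000,000,000
```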

    Again, Nvidia primarily makes chip designs for outsourced fabrication, not market cap; that three trillion isn’t revenue for Nvidia. In your painting example, market cap would be like two unrelated billionaires betting 10 billion on whether that painter will succeed in selling a hundred different paintings at 1m each in the next six months. The painter might have an easier time, say, getting a loan from a bank for new supplies if they can point to the billionaires betting so much on them, but you know it’s not like the painter was actually paid the 10 billion that makes up the bet, right? So it’s kind of weird to say that the painter’s work as a whole is definitely worth that 10 billion bet.


  • I’m saying that while a company’s market capitalization is a real number that can tell you things about a company, it’s not like anyone involved has three trillion actual dollars. The company doesn’t see any of that money directly unless it issues more stock, which would dilute the current stock, though there are some other ways for a company to use it to its advantage. Investors might be able to get a small percentage of that by selling, but only because someone else bought in with an equal amount of money, and a large sale will drive down the price.
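
    The dilution point can be illustrated with simple arithmetic: issuing new shares raises money for the company but shrinks the fraction of the company that each existing share represents. All figures below are invented for illustration.

```python
def ownership_after_issuance(my_shares, shares_outstanding, new_shares):
    """Fraction of the company an existing holder owns before and after the
    company issues `new_shares` additional shares."""
    before = my_shares / shares_outstanding
    after = my_shares / (shares_outstanding + new_shares)
    return before, after


# Hypothetical numbers: a holder with 1,000 of 1,000,000 shares, and the
# company issues 250,000 new shares (e.g. to employees or in an offering).
before, after = ownership_after_issuance(1_000, 1_000_000, 250_000)
print(f"{before:.3%} -> {after:.3%}")  # -> 0.100% -> 0.080%
```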

    More to the point, the valuations people are giving Nvidia don’t have much to do with what the company actually produces and puts out into the world today, but rather with the assumption that it can turn its current leadership position in AI accelerator chip design into massive growth in the future, when every company needs a large data center or two to train its own individual LLMs.

    An individual stock’s price is driven primarily by what people think that individual stock certificate can be sold for in the future, and it is affected by things like how many people are trying to sell. Adding all of those certificates up at the current market price doesn’t actually give anyone involved much information, nor does it reflect the actual quality, quantity, materials, or labor that go into the things a company puts out into the world, in this case branded computer chip blueprints.

    Now, there are a lot of competing theories about how to measure labor’s value, but ‘my work is only as valuable as the speculative amount my organization as a whole might theoretically sell for in the future, provided no one tries to undercut anyone else’ isn’t one of the more popular ones.