Word2vec
I wrote this program in 8th grade.
This program finds vectors that correspond to the meanings of the words it’s given. It takes in all the words and symbols from the corpus, splits them up, records the relative positions between the words, and then strongly compresses that information by training a neural network on it. The lower the blue line gets during training, the more compressed the data is.

Once the neural network is trained on this information, different parts of the network store relevant information for different words. The incredible part is that this word-specific information actually describes the coordinates of a meaningful point in an abstract space - not a normal 2D or 3D space as we are used to, but a much more complex 100D space. It is basically impossible to visualize anything in a 100-dimensional space, but many of the same rules and mathematics we are used to in lower-dimensional space still work when we go to 100D. For example, we can still find the distance between two points the same way we would in 2D space, by applying the Pythagorean theorem.

These 100D points are interesting because words whose points are closer to each other have similar meanings. And it gets even better: when this method is applied to much larger collections of text, you can actually do literal arithmetic on the points the words correspond to. Google’s paper describing the word2vec system gives the example of vector(“King”) - vector(“Man”) + vector(“Woman”) giving a vector close to vector(“Queen”).
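That Pythagorean-theorem idea can be sketched in a few lines of Python. The vectors below are made-up toy examples, not real word2vec output - the point is just that the exact same formula works in 2D and in 100D:

```python
import math

# Euclidean distance generalizes the Pythagorean theorem to any
# number of dimensions: square each coordinate difference, sum them,
# and take the square root.
def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Works in 2D: a 3-4-5 right triangle.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# Works identically in 100D (toy vectors, not real embeddings).
a = [0.1] * 100
b = [0.2] * 100
print(euclidean_distance(a, b))  # ≈ 1.0
```

The same element-wise idea is behind the King/Man/Woman arithmetic: subtracting and adding vectors just means subtracting and adding them coordinate by coordinate.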
By now it would probably be a good idea to say that ‘vector’ and ‘point’ mean mostly the same thing in this context. Sometimes in the field of machine learning it’s helpful to look at the distance between two points in terms of the angle between them as seen from the origin, and other times it’s helpful to rotate the points by an angle around the origin, so the word ‘vector’ is often used.

I find this system really interesting because it is one of the shortest ways I could find to automatically quantify the meanings of words. I first learned how this process works in 8th grade, and now (11th grade as I write this) I understand much more about it. The idea of turning words into vectors is a very powerful one, and it gave me a very helpful initial intuition about how LLMs work.

Inside the code, there is a variable named ‘corpus’ whose contents I would recommend changing before rerunning the program.
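The angle-based view of similarity mentioned above is usually computed as cosine similarity: the cosine of the angle between two vectors as seen from the origin. A minimal sketch, again with toy vectors rather than real word2vec output:

```python
import math

# Cosine similarity: 1.0 means the vectors point in the same
# direction, 0.0 means they are perpendicular, -1.0 means opposite.
# Note that only the angle matters - vector length is divided out.
def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [2, 0]))  # 1.0 (same direction, different lengths)
print(cosine_similarity([1, 0], [0, 1]))  # 0.0 (perpendicular)
```

This is why "close" word vectors are often compared by angle rather than straight-line distance: two vectors can point the same way even if one is much longer than the other.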