![[Tensorflow Playground Decomposition.png]]
A neural network is a way to hierarchically decompose complex input data into related features, then recompose them into meaningful abstractions.
Each layer is a map from its inputs to activations for particular features; lifting the input into a higher-dimensional feature space like this can make the structure of a complex input space easier to separate.
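As a toy illustration of this idea, here is a minimal NumPy sketch of a two-layer network, with random weights standing in for learned parameters. The first layer lifts a 2-D input into an 8-D feature space; the second recomposes those features into a smaller set of abstractions:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    # One layer: affine map into a new feature space + ReLU nonlinearity
    return np.maximum(0.0, x @ w + b)

x  = rng.normal(size=(1, 2))                          # 2-D raw input
h1 = layer(x,  rng.normal(size=(2, 8)), np.zeros(8))  # lift into 8-D feature space
h2 = layer(h1, rng.normal(size=(8, 4)), np.zeros(4))  # recompose into 4 abstractions
print(x.shape, h1.shape, h2.shape)                    # (1, 2) (1, 8) (1, 4)
```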
## Multi-Head Attention
Let's break down how Multi-Head Attention can help interpret the sentence: "A cat wanders into a room and kneels down to drink from a bowl of milk."
### Setting the Stage
Suppose our model has 3 attention heads. Each head looks at the same sentence but focuses on different relationships between the words; a toy setup is sketched after the list.
1. **Head 1** might specialize in capturing the subject-verb-object relationships.
2. **Head 2** could be looking for contextual or semantic meaning, like understanding that "room" and "bowl" are places or containers.
3. **Head 3** might focus on the actions, identifying sequential or cause-effect relationships.
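A minimal sketch of this setup, assuming a 12-dimensional toy model split across 3 heads of 4 dimensions each. The token embeddings and per-head Q/K/V projections below are random stand-ins for the learned parameters that, in a trained model, are what make each head specialize:

```python
import numpy as np

rng = np.random.default_rng(0)

tokens = "A cat wanders into a room and kneels down to drink from a bowl of milk".split()
d_model, n_heads = 12, 3
d_head = d_model // n_heads   # each head works in its own 4-dim subspace

# Random stand-ins for learned token embeddings and per-head projections.
E  = rng.normal(size=(len(tokens), d_model))
Wq = rng.normal(size=(n_heads, d_model, d_head))
Wk = rng.normal(size=(n_heads, d_model, d_head))
Wv = rng.normal(size=(n_heads, d_model, d_head))
print(E.shape, Wq.shape)      # (16, 12) (3, 12, 4)
```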
### Computing Attention
For simplicity, let's focus on the word "drink" and see how each head attends to the rest of the sentence (a numeric sketch follows the list).
1. **Head 1 (Subject-Verb-Object)**
- The query for "drink" will yield high attention scores for "cat" (subject) and "milk" (object), as they are directly related to the action of drinking.
2. **Head 2 (Context/Semantics)**
- This head might give higher weights to "room" and "bowl," understanding that these are locations where the action can take place.
3. **Head 3 (Actions/Sequences)**
- This head may link "wanders," "kneels down," and "drink" as a sequence of actions. It recognizes that "kneels down" likely precedes "drink."
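Continuing the toy setup above, each head computes scaled dot-product attention for "drink" against every token. With random weights the attention pattern is arbitrary; in a trained model, these weights are what would produce the specializations described in the list:

```python
def softmax(z):
    z = z - z.max()                  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

i = tokens.index("drink")
head_outputs = []
for h in range(n_heads):
    q = E[i] @ Wq[h]                            # query vector for "drink" in head h
    K, V = E @ Wk[h], E @ Wv[h]                 # keys/values for every token
    weights = softmax(K @ q / np.sqrt(d_head))  # scaled dot-product attention
    head_outputs.append(weights @ V)            # weighted sum of value vectors
    top = np.argsort(weights)[::-1][:3]
    print(f"head {h} weights most:", [tokens[t] for t in top])
```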
### Aggregating Outputs
After each head computes its weighted sum over all the words for "drink," the outputs are concatenated and passed through a linear layer to integrate the insights from all heads.
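Continuing the same sketch, the aggregation step is just a concatenation of the three 4-dim head outputs followed by a final linear map (here a random stand-in for the learned output projection W_O):

```python
# Concatenate the per-head outputs (3 heads x 4 dims -> 12 dims) and mix
# them back into model space with the output projection.
Wo = rng.normal(size=(d_model, d_model))
concat = np.concatenate(head_outputs)   # shape (12,)
drink_repr = concat @ Wo                # integrated representation of "drink"
print(drink_repr.shape)                 # (12,)
```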
### Intuitive Understanding
Imagine three detectives are investigating what the word "drink" is doing in this sentence:
1. **Detective 1** notes who is doing the drinking and what is being drunk: "It's the cat, and it's milk."
2. **Detective 2** identifies where this action could be relevant: "Hmm, it's in a room, and there's a bowl involved."
3. **Detective 3** looks at the sequence of actions: "First, the cat wanders, then kneels, and finally drinks."
All their observations are then combined to give a thorough understanding of the role "drink" plays in the sentence.
By attending to different aspects of the sentence, Multi-Head Attention gives the model a richer understanding of the text, which is especially useful for tasks like translation, summarization, and question answering. In effect, the mechanism lets the model form multiple "hypotheses" about each word and then integrate them into a coherent representation.