Context-Free Grammars (CFGs)
Also called constituency grammar or phrase structure grammar.
Starting unit: words
Like noun, determiner, preposition and verb.
Words combine into phrases
Phrases can combine into bigger phrases
Phrase structure organizes words into nested constituents.
Some examples of grammars:
- noun phrases: Det N or Det (Adj) N (PP)
- prepositional phrases: P NP
- etc
With such a grammar and a lexicon, we can parse fragments like “the cuddly cat by the door”.
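As a concrete illustration (not from the original notes), here is a minimal sketch of exactly this using NLTK’s `CFG` and `ChartParser`, with a tiny hand-written lexicon:

```python
import nltk

# A tiny grammar encoding the rules above (NP -> Det (Adj) N (PP), PP -> P NP)
# together with a small hand-written lexicon.
grammar = nltk.CFG.fromstring("""
    NP -> Det N | Det Adj N | Det N PP | Det Adj N PP
    PP -> P NP
    Det -> 'the'
    Adj -> 'cuddly'
    N -> 'cat' | 'door'
    P -> 'by'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cuddly cat by the door".split()):
    tree.pretty_print()  # prints the nested constituent structure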
Dependency structure
Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.
An arc points from a word (the head) to another word that depends on it.
Why are we interested in sentence structure? Humans put words together to express complex meanings, and listeners need to work out what modifies (attaches to) what. A model likewise needs to understand sentence structure in order to interpret language correctly.
Prepositional phrase attachment ambiguity
Different languages have their own characteristic ambiguities.
For example, “San Jose cops kill man with knife” (did the cops use a knife, or was the man carrying one?) and “Scientists count whales from space” (is the counting done from space, or are the whales there?).
This problem gets much more complex as sentences grow. A key parsing decision is how we “attach” the various constituents.
Catalan numbers: $C_n = \frac{(2n)!}{(n+1)!\,n!}$, an exponentially growing series that arises in many tree-like contexts. The number of possible attachment parses grows with the number of prepositional phrases: a sentence with $n$ PPs has on the order of $C_n$ parses.
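A quick, illustrative sanity check of that growth, using the equivalent form $C_n = \binom{2n}{n}/(n+1)$:

```python
from math import comb

# Catalan numbers: C_n = (2n)! / ((n+1)! n!) = C(2n, n) / (n + 1)
for n in range(1, 9):
    print(n, comb(2 * n, n) // (n + 1))
# 1 1 | 2 2 | 3 5 | 4 14 | 5 42 | 6 132 | 7 429 | 8 1430
```

Already at eight PPs there are 1430 possible attachment structures, and the ratio between consecutive terms approaches 4, hence the exponential growth.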
Coordination scope ambiguity
Like “Doctor: No heart, cognitive issues” (no [heart or cognitive] issues, or no heart, [but] cognitive issues?).
Adjectival or Adverbial Modifier Ambiguity
Like “Students get first hand job experience” (intended bracketing: [first-hand] [job experience]).
Verb phrase (VP) attachment ambiguity
Skip this example.
Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations (“arrows”) called dependencies.
An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate).
Usually, dependencies form a tree (a connected, acyclic, single-rooted graph).
We add a fake ROOT so every word is a dependent of precisely one other node.
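One common concrete encoding (an illustrative sketch with a hypothetical example sentence) stores, for each word, the index of its head, with index 0 reserved for the fake ROOT:

```python
# A dependency parse as a head array: heads[i] is the head of word i+1,
# and 0 denotes the fake ROOT, so every word depends on exactly one node.
sentence = ["I", "ate", "fish"]   # hypothetical example
heads = [2, 0, 2]                 # "I" <- "ate", "ate" <- ROOT, "fish" <- "ate"

# Recover the (head, dependent) arcs:
arcs = [(h, d + 1) for d, h in enumerate(heads)]
print(arcs)                       # [(2, 1), (0, 2), (2, 3)]
```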
Treebank
The term treebank was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. This is because both syntactic and semantic structure are commonly represented compositionally as a tree structure. The term parsed corpus is often used interchangeably with the term treebank, with the emphasis on the primacy of sentences rather than trees. (from Wikipedia)
At first glance, building a treebank seems much slower and less useful than writing a grammar by hand.
Advantages:
- Reusability of the labor
- Broad coverage, not just a few intuitions
- Frequencies and distributional information
- A way to evaluate NLP systems
Dependency parsing
- A sentence is parsed by choosing, for each word, which other word (including ROOT) it is a dependent of
- Usually some constraints:
    - Only one word is a dependent of ROOT
    - No cycles: A → B, B → A
- These constraints make the dependencies a tree
- A final issue is whether arcs can cross (be non-projective) or not; a validity check is sketched below
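These constraints are easy to check on the head-array encoding from above; a minimal sketch (the function name is my own):

```python
def is_valid_tree(heads):
    """Check the tree constraints above on a head array (0 = ROOT)."""
    # Only one word is a dependent of ROOT.
    if heads.count(0) != 1:
        return False
    # No cycles: following heads from any word must reach ROOT.
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:          # revisited a word => cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_tree([2, 0, 2]))       # True
print(is_valid_tree([2, 1, 0]))       # False: cycle between words 1 and 2
```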
Projectivity
- Definition of a projective parse: There are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
- Dependencies corresponding to a CFG tree must be projective
    - I.e., by forming dependencies by taking one child of each category as head
- Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
- You can’t easily get the semantics of certain constructions right without these non-projective dependencies
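The “no crossing arcs” definition above translates directly into code; a small illustrative sketch over (head, dependent) arcs, with word positions as integers:

```python
def is_projective(arcs):
    """True iff no two dependency arcs cross when drawn above the sentence."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (lo1, hi1) in enumerate(spans):
        for lo2, hi2 in spans[i + 1:]:
            # Two arcs cross iff their spans interleave (partial overlap).
            if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
                return False
    return True

print(is_projective([(2, 1), (0, 2), (2, 3)]))   # True: nested arcs
print(is_projective([(0, 2), (1, 3)]))           # False: the arcs cross
```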
Methods of Dependency Parsing
- Dynamic programming
- Graph algorithms
- Constraint satisfaction
- “Transition-based parsing” or “deterministic dependency parsing”
The last one is introduced below.
Transition-based parsing
- A stack $\sigma$ starts with a ROOT in it
- A buffer $\beta$ starts with the input sentence
- A set of dependency arcs $A$ starts with nothing in it
- A set of actions (in the arc-standard system: SHIFT, LEFT-ARC, and RIGHT-ARC; see the sketch below)
The biggest problem: how do we choose the next action?
Machine learning! Each action is predicted by a discriminative classifier (e.g., a softmax classifier) over the legal moves. This gives very fast, linear-time parsing.
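A minimal sketch of the arc-standard loop (the classifier is abstracted as a hypothetical `predict_action` callable, and dependency labels are omitted for brevity):

```python
def parse(words, predict_action):
    stack = ["ROOT"]        # sigma: starts with ROOT
    buffer = list(words)    # beta: starts with the input sentence
    arcs = []               # A: set of (head, dependent) arcs, starts empty

    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer, arcs)  # classifier picks a legal move
        if action == "SHIFT":
            stack.append(buffer.pop(0))               # move next word onto the stack
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)                       # second-from-top depends on top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            dep = stack.pop()                         # top depends on second-from-top
            arcs.append((stack[-1], dep))
    return arcs
```

Each word is shifted exactly once and reduced exactly once, so a sentence of $n$ words takes $2n$ transitions; that is where the linear-time claim comes from.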
Evaluation of Dependency Parsing
- Unlabeled Attachment Score (UAS): a word is scored correct if it is attached to the right head, regardless of the label
- Labeled Attachment Score (LAS): both the head and the dependency label must be right
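A small sketch of these metrics, assuming gold and predicted parses are given as per-word (head, label) pairs (a hypothetical format):

```python
def attachment_scores(gold, pred):
    """gold/pred: one (head_index, label) pair per word, in sentence order."""
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)        # head + label
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]   # third label is wrong
print(attachment_scores(gold, pred))              # (1.0, 0.666...)
```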