Context-Free Grammars (CFGs)
Also called constituency grammar or phrase structure grammar.
Starting unit: words
Like noun, determiner, preposition and verb.
Words combine into phrases
Phrases can combine into bigger phrases
Phrase structure organizes words into nested constituents.
Some examples of grammars:
- noun phrases: Det N or Det (Adj) N (PP)
- prepositional phrases: P NP
- etc
With such a grammar and a lexicon, we can parse fragments like “the cuddly cat by the door”.
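As a concrete illustration (not from the original notes), here is a minimal sketch of exactly this using NLTK’s `CFG` and `ChartParser`, with a tiny hand-written lexicon:

```python
import nltk

# A tiny grammar encoding the rules above (NP -> Det (Adj) N (PP), PP -> P NP)
# together with a small hand-written lexicon.
grammar = nltk.CFG.fromstring("""
    NP -> Det N | Det Adj N | Det N PP | Det Adj N PP
    PP -> P NP
    Det -> 'the'
    Adj -> 'cuddly'
    N -> 'cat' | 'door'
    P -> 'by'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cuddly cat by the door".split()):
    tree.pretty_print()  # prints the nested constituent structure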
Dependency structure
Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.
An arc points from a word (the head) to another word that depends on it.
Why are we interested in sentence structure? Humans put words together to express complex meanings, and listeners need to work out what modifies (attaches to) what. A model likewise needs to understand sentence structure in order to interpret language correctly.
Prepositional phrase attachment ambiguity
Different languages have their own characteristic ambiguities.
For example, “San Jose cops kill man with knife” (did the cops use a knife, or was the man carrying one?) and “Scientists count whales from space” (is the counting done from space, or are the whales there?).
This problem gets much more complex as sentences grow. A key parsing decision is how we “attach” the various constituents.
Catalan numbers: $C_n = \frac{(2n)!}{(n+1)!\,n!}$, an exponentially growing series that arises in many tree-like contexts. The number of possible attachment parses grows with the number of prepositional phrases: a sentence with $n$ PPs has on the order of $C_n$ parses.
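A quick, illustrative sanity check of that growth, using the equivalent form $C_n = \binom{2n}{n}/(n+1)$:

```python
from math import comb

# Catalan numbers: C_n = (2n)! / ((n+1)! n!) = C(2n, n) / (n + 1)
for n in range(1, 9):
    print(n, comb(2 * n, n) // (n + 1))
# 1 1 | 2 2 | 3 5 | 4 14 | 5 42 | 6 132 | 7 429 | 8 1430
```

Already at eight PPs there are 1430 possible attachment structures, and the ratio between consecutive terms approaches 4, hence the exponential growth.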
Coordination scope ambiguity
Like “Doctor: No heart, cognitive issues” (no [heart or cognitive] issues, or no heart, [but] cognitive issues?).
Adjectival or Adverbial Modifier Ambiguity
Like “Students get first hand job experience” (intended bracketing: [first-hand] [job experience]).
Verb phrase (VP) attachment ambiguity
Skip this example.
Dependency Grammar and Dependency Structure
Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations (“arrows”) called dependencies.
An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate).
Usually, dependencies form a tree (a connected, acyclic, single-rooted graph).
We add a fake ROOT so every word is a dependent of precisely one other node.
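One common concrete encoding (an illustrative sketch with a hypothetical example sentence) stores, for each word, the index of its head, with index 0 reserved for the fake ROOT:

```python
# A dependency parse as a head array: heads[i] is the head of word i+1,
# and 0 denotes the fake ROOT, so every word depends on exactly one node.
sentence = ["I", "ate", "fish"]   # hypothetical example
heads = [2, 0, 2]                 # "I" <- "ate", "ate" <- ROOT, "fish" <- "ate"

# Recover the (head, dependent) arcs:
arcs = [(h, d + 1) for d, h in enumerate(heads)]
print(arcs)                       # [(2, 1), (0, 2), (2, 3)]
```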
Treebank
The term treebank was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. This is because both syntactic and semantic structure are commonly represented compositionally as a tree structure. The term parsed corpus is often used interchangeably with the term treebank, with the emphasis on the primacy of sentences rather than trees. (from Wikipedia)
At first glance, building a treebank seems much slower and less useful than writing a grammar by hand.
Advantages:
- Reusability of the labor
- Broad coverage, not just a few intuitions
- Frequencies and distributional information
- A way to evaluate NLP systems
Dependency parsing
- A sentence is parsed by choosing, for each word, which other word (including ROOT) it is a dependent of
- Usually some constraints:
    - Only one word is a dependent of ROOT
    - No cycles: A → B, B → A
- These constraints make the dependencies a tree
- A final issue is whether arcs can cross (be non-projective) or not; a validity check is sketched below
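These constraints are easy to check on the head-array encoding from above; a minimal sketch (the function name is my own):

```python
def is_valid_tree(heads):
    """Check the tree constraints above on a head array (0 = ROOT)."""
    # Only one word is a dependent of ROOT.
    if heads.count(0) != 1:
        return False
    # No cycles: following heads from any word must reach ROOT.
    for start in range(1, len(heads) + 1):
        seen, node = set(), start
        while node != 0:
            if node in seen:          # revisited a word => cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

print(is_valid_tree([2, 0, 2]))       # True
print(is_valid_tree([2, 1, 0]))       # False: cycle between words 1 and 2
```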
Projectivity
- Definition of a projective parse: There are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
- Dependencies corresponding to a CFG tree must be projective
    - I.e., by forming dependencies by taking one child of each category as head
- Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
- You can’t easily get the semantics of certain constructions right without these non-projective dependencies
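The “no crossing arcs” definition above translates directly into code; a small illustrative sketch over (head, dependent) arcs, with word positions as integers:

```python
def is_projective(arcs):
    """True iff no two dependency arcs cross when drawn above the sentence."""
    spans = [(min(h, d), max(h, d)) for h, d in arcs]
    for i, (lo1, hi1) in enumerate(spans):
        for lo2, hi2 in spans[i + 1:]:
            # Two arcs cross iff their spans interleave (partial overlap).
            if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
                return False
    return True

print(is_projective([(2, 1), (0, 2), (2, 3)]))   # True: nested arcs
print(is_projective([(0, 2), (1, 3)]))           # False: the arcs cross
```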
Methods of Dependency Parsing
- Dynamic programming
- Graph algorithms
- Constraint satisfaction
- “Transition-based parsing” or “deterministic dependency parsing”
The last one is introduced below.
Transition-based parsing
- A stack $\sigma$ starts with a ROOT in it
- A buffer $\beta$ starts with the input sentence
- A set of dependency arcs $A$ starts with nothing in it
- A set of actions (in the arc-standard system: SHIFT, LEFT-ARC, and RIGHT-ARC; see the sketch below)
The biggest problem: how do we choose the next action?
Machine learning! Each action is predicted by a discriminative classifier (e.g., a softmax classifier) over the legal moves. This gives very fast, linear-time parsing.
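A minimal sketch of the arc-standard loop (the classifier is abstracted as a hypothetical `predict_action` callable, and dependency labels are omitted for brevity):

```python
def parse(words, predict_action):
    stack = ["ROOT"]        # sigma: starts with ROOT
    buffer = list(words)    # beta: starts with the input sentence
    arcs = []               # A: set of (head, dependent) arcs, starts empty

    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer, arcs)  # classifier picks a legal move
        if action == "SHIFT":
            stack.append(buffer.pop(0))               # move next word onto the stack
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)                       # second-from-top depends on top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            dep = stack.pop()                         # top depends on second-from-top
            arcs.append((stack[-1], dep))
    return arcs
```

Each word is shifted exactly once and reduced exactly once, so a sentence of $n$ words takes $2n$ transitions; that is where the linear-time claim comes from.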
Evaluation of Dependency Parsing
- Unlabeled Attachment Score (UAS): a word is scored correct if it is attached to the right head, regardless of the label
- Labeled Attachment Score (LAS): both the head and the dependency label must be right
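A small sketch of these metrics, assuming gold and predicted parses are given as per-word (head, label) pairs (a hypothetical format):

```python
def attachment_scores(gold, pred):
    """gold/pred: one (head_index, label) pair per word, in sentence order."""
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)  # head only
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)        # head + label
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]   # third label is wrong
print(attachment_scores(gold, pred))              # (1.0, 0.666...)
```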