CS224N Lecture 4: Syntactic Structure and Dependency Parsing

Published on 2023-01-16


Context-Free Grammars (CFGs)

Also called constituency grammar or phrase structure grammar.

Starting unit: words

Like nouns, determiners, prepositions, and verbs.

Words combine into phrases

Phrases can combine into bigger phrases

Phrase structure organizes words into nested constituents.

Some example grammar rules:

  • noun phrases: Det N or Det (Adj) N (PP)
  • prepositional phrases: P NP
  • etc

With a grammar like the one above and a lexicon, we can easily parse fragments like “the cuddly cat by the door”.
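
As a rough illustration (my own sketch, not the lecture’s code), a toy version of this grammar and lexicon can be written with NLTK and used to parse the fragment:

```python
import nltk

# Toy grammar and lexicon roughly matching the rules above.
# The first nonterminal (NP) is taken as the start symbol.
grammar = nltk.CFG.fromstring("""
NP  -> Det N | Det Adj N | Det N PP | Det Adj N PP
PP  -> P NP
Det -> 'the'
Adj -> 'cuddly'
N   -> 'cat' | 'door'
P   -> 'by'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the cuddly cat by the door".split()):
    tree.pretty_print()
```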

Dependency structure

Dependency structure shows which words depend on (modify, attach to, or are arguments of) which other words.

An arc points from a word to another word that depends on it.

Why are we interested in sentence structure? Humans put words together to express complex meanings, and listeners need to work out what modifies (attaches to) what. A model needs to understand sentence structure in order to interpret language correctly.

Prepositional phrase attachment ambiguity

Different languages have their own ambiguities.

Like “San Jose cops kill man with knife” (does “with knife” attach to “man” or to “kill”?) and “Scientists count whales from space” (does “from space” attach to “whales” or to “count”?).

This problem gets much more complex. A key parsing decision is how we “attach” various constituents.

Catalan number: $C_n = \frac{(2n)!}{(n + 1)!\, n!}$. The number of parses a sentence has is determined by the number of prepositional phrases it contains, and it grows as this exponentially growing series.
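
For a concrete sense of the growth, a quick sketch computing the first few Catalan numbers:

```python
from math import comb

def catalan(n: int) -> int:
    # C_n = (2n)! / ((n+1)! n!) = comb(2n, n) / (n + 1)
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(1, 8)])  # [1, 2, 5, 14, 42, 132, 429]
```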

Coordination scope ambiguity

Like “Doctor: No heart, cognitive issues” (no “heart issues” and no “cognitive issues”, or “no heart” plus “cognitive issues”?).

Adjectival or Adverbial Modifier Ambiguity

Like “Students get first hand job experience” (does “first” combine with “hand”, or does “hand” combine with “job”?).

Verb Phrase (VP) attachment ambiguity

Skip this example.

Dependency Grammar and Dependency Structure

Dependency syntax postulates that syntactic structure consists of relations between lexical items, normally binary asymmetric relations (”arrows”) called dependencies.

An arrow connects a head (governor, superior, regent) with a dependent (modifier, inferior, subordinate).

Usually, dependencies form a tree (a connected, acyclic, single-rooted graph).

We add a fake ROOT so every word is a dependent of precisely one other node.
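
A tiny illustrative example (my own, assuming a head → dependent arc representation): the sentence “the cat sat” with a fake ROOT at position 0.

```python
# Each arc points from a head to one of its dependents; index 0 is the fake ROOT.
words = ["ROOT", "the", "cat", "sat"]
arcs = [
    (0, 3),  # ROOT -> sat   (the main verb is the only dependent of ROOT)
    (3, 2),  # sat  -> cat   (subject)
    (2, 1),  # cat  -> the   (determiner)
]
# Every real word appears exactly once as a dependent, so the arcs form a tree.
assert sorted(dep for _, dep in arcs) == list(range(1, len(words)))
```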

Treebank

The term treebank was coined by linguist Geoffrey Leech in the 1980s, by analogy to other repositories such as a seedbank or bloodbank. This is because both syntactic and semantic structure are commonly represented compositionally as a tree structure. The term parsed corpus is often used interchangeably with the term treebank, with the emphasis on the primacy of sentences rather than trees. (From Wikipedia)

Starting off, building a treebank seems a lot slower and less useful than writing a grammar (by hand).

Advantages:

  • Reusability of the labor
  • Broad coverage, not just a few intuitions
  • Frequencies and distributional information
  • A way to evaluate NLP systems

Dependency parsing

  • A sentence is parsed by choosing for each word what other word (including ROOT) it is a dependent of
  • Usually some constraints:
    • Only one word is a dependent of ROOT
    • Don’t want cycles: A → B, B → A
  • This makes the dependencies a tree (a small sketch of these checks follows this list)
  • Final issue is whether arrows can cross (be non-projective) or not
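
A minimal sketch of these constraints as a check; the function name and the (head, dependent) representation are my own assumptions, not lecture code.

```python
def is_valid_tree(arcs, n_words):
    heads = {}
    for head, dep in arcs:
        if dep in heads:
            return False                      # a word with two heads
        heads[dep] = head
    if sorted(heads) != list(range(1, n_words + 1)):
        return False                          # some word has no head
    if sum(1 for h in heads.values() if h == 0) != 1:
        return False                          # ROOT must have exactly one dependent
    for dep in heads:                         # follow heads upward; must reach ROOT
        seen = set()
        while dep != 0:
            if dep in seen:
                return False                  # cycle, e.g. A -> B, B -> A
            seen.add(dep)
            dep = heads[dep]
    return True

print(is_valid_tree([(0, 3), (3, 2), (2, 1)], 3))  # True
print(is_valid_tree([(0, 3), (1, 2), (2, 1)], 3))  # False: 1 and 2 form a cycle
```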

Projectivity

  • Definition of a projective parse: there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
  • Dependencies corresponding to a CFG tree must be projective, i.e., when dependencies are formed by taking one child of each category as the head
  • Most syntactic structure is projective like this, but dependency theory normally does allow non-projective structures to account for displaced constituents
  • You can’t easily get the semantics of certain constructions right without these non-projective dependencies
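
A small sketch of the crossing-arcs definition, using the same (head, dependent) representation as above (an assumed helper, not lecture code):

```python
def is_projective(arcs):
    # A parse is projective if no two arcs cross when drawn above the words.
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            # Two arcs cross if exactly one endpoint of one arc lies strictly
            # inside the span of the other.
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True

print(is_projective([(0, 3), (3, 2), (2, 1)]))          # True
print(is_projective([(0, 2), (2, 4), (4, 1), (2, 3)]))  # False: (0, 2) crosses (4, 1)
```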

Methods of Dependency Parsing

  • Dynamic programming
  • Graph algorithms
  • Constraint satisfaction
  • “Transition-based parsing” or “deterministic dependency parsing”

The last of these, transition-based parsing, is introduced below.

Transition-based parsing

  • A stack $\sigma$ starts with a ROOT in it
  • A buffer $\beta$ starts with the input sentence
  • A set of dependency arcs $A$ starts with nothing in it
  • A set of actions (in the arc-standard system: Shift, Left-Arc, and Right-Arc)

The biggest problem: how do we choose the next action?

Machine learning! Each action is predicted by a discriminative classifier (e.g., a softmax classifier) over the legal moves. This gives very fast, linear-time parsing.
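
A minimal sketch of the arc-standard version of this loop; the names and the toy policy below are my own, standing in for the learned classifier.

```python
def parse(words, predict):
    stack = ["ROOT"]          # sigma: starts with ROOT
    buffer = list(words)      # beta: starts with the input sentence
    arcs = []                 # A: set of dependency arcs, starts empty

    while buffer or len(stack) > 1:
        action = predict(stack, buffer)        # SHIFT, LEFT-ARC, or RIGHT-ARC
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            dep = stack.pop(-2)                # second-from-top is the dependent
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC":
            dep = stack.pop()                  # top of stack is the dependent
            arcs.append((stack[-1], dep))
    return arcs

# A placeholder "classifier": shift while there is input, otherwise RIGHT-ARC.
silly_policy = lambda stack, buffer: "SHIFT" if buffer else "RIGHT-ARC"
print(parse(["the", "cat", "sat"], silly_policy))
```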

Evaluation of Dependency Parsing

  • Unlabeled Attachment Score (UAS): a dependency is counted as correct if its head is correct, regardless of the label
  • Labeled Attachment Score (LAS): a dependency is correct only if both its head and its label are correct
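
A toy sketch of how the two scores differ (my own example, not from the lecture):

```python
# Each parse is a list of (head_index, label) per word.
gold = [(2, "det"), (0, "root"), (2, "punct")]
pred = [(2, "det"), (0, "root"), (2, "dep")]   # last head right, label wrong

uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"UAS={uas:.2f}  LAS={las:.2f}")          # UAS=1.00  LAS=0.67
```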