288 lines
10 KiB
Text
288 lines
10 KiB
Text
.. Copyright (C) 2001-2018 NLTK Project
|
|
.. For license information, see LICENSE.TXT
|
|
|
|
========
|
|
FrameNet
|
|
========
|
|
|
|
The FrameNet corpus is a lexical database of English that is both human-
|
|
and machine-readable, based on annotating examples of how words are used
|
|
in actual texts. FrameNet is based on a theory of meaning called Frame
|
|
Semantics, deriving from the work of Charles J. Fillmore and colleagues.
|
|
The basic idea is straightforward: that the meanings of most words can
|
|
best be understood on the basis of a semantic frame: a description of a
|
|
type of event, relation, or entity and the participants in it. For
|
|
example, the concept of cooking typically involves a person doing the
|
|
cooking (Cook), the food that is to be cooked (Food), something to hold
|
|
the food while cooking (Container) and a source of heat
|
|
(Heating_instrument). In the FrameNet project, this is represented as a
|
|
frame called Apply_heat, and the Cook, Food, Heating_instrument and
|
|
Container are called frame elements (FEs). Words that evoke this frame,
|
|
such as fry, bake, boil, and broil, are called lexical units (LUs) of
|
|
the Apply_heat frame. The job of FrameNet is to define the frames
|
|
and to annotate sentences to show how the FEs fit syntactically around
|
|
the word that evokes the frame.
|
|
|
|
------
|
|
Frames
|
|
------
|
|
|
|
A Frame is a script-like conceptual structure that describes a
|
|
particular type of situation, object, or event along with the
|
|
participants and props that are needed for that Frame. For
|
|
example, the "Apply_heat" frame describes a common situation
|
|
involving a Cook, some Food, and a Heating_Instrument, and is
|
|
evoked by words such as bake, blanch, boil, broil, brown,
|
|
simmer, steam, etc.
|
|
|
|
We call the roles of a Frame "frame elements" (FEs) and the
|
|
frame-evoking words are called "lexical units" (LUs).
|
|
|
|
FrameNet includes relations between Frames. Several types of
|
|
relations are defined, of which the most important are:
|
|
|
|
- Inheritance: An IS-A relation. The child frame is a subtype
|
|
of the parent frame, and each FE in the parent is bound to
|
|
a corresponding FE in the child. An example is the
|
|
"Revenge" frame which inherits from the
|
|
"Rewards_and_punishments" frame.
|
|
|
|
- Using: The child frame presupposes the parent frame as
|
|
background, e.g the "Speed" frame "uses" (or presupposes)
|
|
the "Motion" frame; however, not all parent FEs need to be
|
|
bound to child FEs.
|
|
|
|
- Subframe: The child frame is a subevent of a complex event
|
|
represented by the parent, e.g. the "Criminal_process" frame
|
|
has subframes of "Arrest", "Arraignment", "Trial", and
|
|
"Sentencing".
|
|
|
|
- Perspective_on: The child frame provides a particular
|
|
perspective on an un-perspectivized parent frame. A pair of
|
|
examples consists of the "Hiring" and "Get_a_job" frames,
|
|
which perspectivize the "Employment_start" frame from the
|
|
Employer's and the Employee's point of view, respectively.
|
|
|
|
To get a list of all of the Frames in FrameNet, you can use the
|
|
`frames()` function. If you supply a regular expression pattern to the
|
|
`frames()` function, you will get a list of all Frames whose names match
|
|
that pattern:
|
|
|
|
>>> from pprint import pprint
|
|
>>> from operator import itemgetter
|
|
>>> from nltk.corpus import framenet as fn
|
|
>>> from nltk.corpus.reader.framenet import PrettyList
|
|
>>> x = fn.frames(r'(?i)crim')
|
|
>>> x.sort(key=itemgetter('ID'))
|
|
>>> x
|
|
[<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
|
|
>>> PrettyList(sorted(x, key=itemgetter('ID')))
|
|
[<frame ID=200 name=Criminal_process>, <frame ID=500 name=Criminal_investigation>, ...]
|
|
|
|
To get the details of a particular Frame, you can use the `frame()`
|
|
function passing in the frame number:
|
|
|
|
>>> from pprint import pprint
|
|
>>> from nltk.corpus import framenet as fn
|
|
>>> f = fn.frame(202)
|
|
>>> f.ID
|
|
202
|
|
>>> f.name
|
|
'Arrest'
|
|
>>> f.definition # doctest: +ELLIPSIS
|
|
"Authorities charge a Suspect, who is under suspicion of having committed a crime..."
|
|
>>> len(f.lexUnit)
|
|
11
|
|
>>> pprint(sorted([x for x in f.FE]))
|
|
['Authorities',
|
|
'Charges',
|
|
'Co-participant',
|
|
'Manner',
|
|
'Means',
|
|
'Offense',
|
|
'Place',
|
|
'Purpose',
|
|
'Source_of_legal_authority',
|
|
'Suspect',
|
|
'Time',
|
|
'Type']
|
|
>>> pprint(f.frameRelations)
|
|
[<Parent=Intentionally_affect -- Inheritance -> Child=Arrest>, <Complex=Criminal_process -- Subframe -> Component=Arrest>, ...]
|
|
|
|
The `frame()` function shown above returns a dict object containing
|
|
detailed information about the Frame. See the documentation on the
|
|
`frame()` function for the specifics.
|
|
|
|
You can also search for Frames by their Lexical Units (LUs). The
|
|
`frames_by_lemma()` function returns a list of all frames that contain
|
|
LUs in which the 'name' attribute of the LU matchs the given regular
|
|
expression. Note that LU names are composed of "lemma.POS", where the
|
|
"lemma" part can be made up of either a single lexeme (e.g. 'run') or
|
|
multiple lexemes (e.g. 'a little') (see below).
|
|
|
|
>>> PrettyList(sorted(fn.frames_by_lemma(r'(?i)a little'), key=itemgetter('ID'))) # doctest: +ELLIPSIS
|
|
[<frame ID=189 name=Quanti...>, <frame ID=2001 name=Degree>]
|
|
|
|
-------------
|
|
Lexical Units
|
|
-------------
|
|
|
|
A lexical unit (LU) is a pairing of a word with a meaning. For
|
|
example, the "Apply_heat" Frame describes a common situation
|
|
involving a Cook, some Food, and a Heating Instrument, and is
|
|
_evoked_ by words such as bake, blanch, boil, broil, brown,
|
|
simmer, steam, etc. These frame-evoking words are the LUs in the
|
|
Apply_heat frame. Each sense of a polysemous word is a different
|
|
LU.
|
|
|
|
We have used the word "word" in talking about LUs. The reality
|
|
is actually rather complex. When we say that the word "bake" is
|
|
polysemous, we mean that the lemma "bake.v" (which has the
|
|
word-forms "bake", "bakes", "baked", and "baking") is linked to
|
|
three different frames:
|
|
|
|
- Apply_heat: "Michelle baked the potatoes for 45 minutes."
|
|
|
|
- Cooking_creation: "Michelle baked her mother a cake for her birthday."
|
|
|
|
- Absorb_heat: "The potatoes have to bake for more than 30 minutes."
|
|
|
|
These constitute three different LUs, with different
|
|
definitions.
|
|
|
|
Multiword expressions such as "given name" and hyphenated words
|
|
like "shut-eye" can also be LUs. Idiomatic phrases such as
|
|
"middle of nowhere" and "give the slip (to)" are also defined as
|
|
LUs in the appropriate frames ("Isolated_places" and "Evading",
|
|
respectively), and their internal structure is not analyzed.
|
|
|
|
Framenet provides multiple annotated examples of each sense of a
|
|
word (i.e. each LU). Moreover, the set of examples
|
|
(approximately 20 per LU) illustrates all of the combinatorial
|
|
possibilities of the lexical unit.
|
|
|
|
Each LU is linked to a Frame, and hence to the other words which
|
|
evoke that Frame. This makes the FrameNet database similar to a
|
|
thesaurus, grouping together semantically similar words.
|
|
|
|
In the simplest case, frame-evoking words are verbs such as
|
|
"fried" in:
|
|
|
|
"Matilde fried the catfish in a heavy iron skillet."
|
|
|
|
Sometimes event nouns may evoke a Frame. For example,
|
|
"reduction" evokes "Cause_change_of_scalar_position" in:
|
|
|
|
"...the reduction of debt levels to $665 million from $2.6 billion."
|
|
|
|
Adjectives may also evoke a Frame. For example, "asleep" may
|
|
evoke the "Sleep" frame as in:
|
|
|
|
"They were asleep for hours."
|
|
|
|
Many common nouns, such as artifacts like "hat" or "tower",
|
|
typically serve as dependents rather than clearly evoking their
|
|
own frames.
|
|
|
|
Details for a specific lexical unit can be obtained using this class's
|
|
`lus()` function, which takes an optional regular expression
|
|
pattern that will be matched against the name of the lexical unit:
|
|
|
|
>>> from pprint import pprint
|
|
>>> PrettyList(sorted(fn.lus(r'(?i)a little'), key=itemgetter('ID')))
|
|
[<lu ID=14733 name=a little.n>, <lu ID=14743 name=a little.adv>, ...]
|
|
|
|
You can obtain detailed information on a particular LU by calling the
|
|
`lu()` function and passing in an LU's 'ID' number:
|
|
|
|
>>> from pprint import pprint
|
|
>>> from nltk.corpus import framenet as fn
|
|
>>> fn.lu(256).name
|
|
'foresee.v'
|
|
>>> fn.lu(256).definition
|
|
'COD: be aware of beforehand; predict.'
|
|
>>> fn.lu(256).frame.name
|
|
'Expectation'
|
|
>>> fn.lu(256).lexemes[0].name
|
|
'foresee'
|
|
|
|
Note that LU names take the form of a dotted string (e.g. "run.v" or "a
|
|
little.adv") in which a lemma preceeds the "." and a part of speech
|
|
(POS) follows the dot. The lemma may be composed of a single lexeme
|
|
(e.g. "run") or of multiple lexemes (e.g. "a little"). The list of
|
|
POSs used in the LUs is:
|
|
|
|
v - verb
|
|
n - noun
|
|
a - adjective
|
|
adv - adverb
|
|
prep - preposition
|
|
num - numbers
|
|
intj - interjection
|
|
art - article
|
|
c - conjunction
|
|
scon - subordinating conjunction
|
|
|
|
For more detailed information about the info that is contained in the
|
|
dict that is returned by the `lu()` function, see the documentation on
|
|
the `lu()` function.
|
|
|
|
-------------------
|
|
Annotated Documents
|
|
-------------------
|
|
|
|
The FrameNet corpus contains a small set of annotated documents. A list
|
|
of these documents can be obtained by calling the `docs()` function:
|
|
|
|
>>> from pprint import pprint
|
|
>>> from nltk.corpus import framenet as fn
|
|
>>> d = fn.docs('BellRinging')[0]
|
|
>>> d.corpname
|
|
'PropBank'
|
|
>>> d.sentence[49] # doctest: +ELLIPSIS
|
|
full-text sentence (...) in BellRinging:
|
|
<BLANKLINE>
|
|
<BLANKLINE>
|
|
[POS] 17 tags
|
|
<BLANKLINE>
|
|
[POS_tagset] PENN
|
|
<BLANKLINE>
|
|
[text] + [annotationSet]
|
|
<BLANKLINE>
|
|
`` I live in hopes that the ringers themselves will be drawn into
|
|
***** ******* *****
|
|
Desir Cause_t Cause
|
|
[1] [3] [2]
|
|
<BLANKLINE>
|
|
that fuller life .
|
|
******
|
|
Comple
|
|
[4]
|
|
(Desir=Desiring, Cause_t=Cause_to_make_noise, Cause=Cause_motion, Comple=Completeness)
|
|
<BLANKLINE>
|
|
|
|
>>> d.sentence[49].annotationSet[1] # doctest: +ELLIPSIS
|
|
annotation set (...):
|
|
<BLANKLINE>
|
|
[status] MANUAL
|
|
<BLANKLINE>
|
|
[LU] (6605) hope.n in Desiring
|
|
<BLANKLINE>
|
|
[frame] (366) Desiring
|
|
<BLANKLINE>
|
|
[GF] 2 relations
|
|
<BLANKLINE>
|
|
[PT] 2 phrases
|
|
<BLANKLINE>
|
|
[text] + [Target] + [FE] + [Noun]
|
|
<BLANKLINE>
|
|
`` I live in hopes that the ringers themselves will be drawn into
|
|
- ^^^^ ^^ ***** ----------------------------------------------
|
|
E supp su Event
|
|
<BLANKLINE>
|
|
that fuller life .
|
|
-----------------
|
|
<BLANKLINE>
|
|
(E=Experiencer, su=supp)
|
|
<BLANKLINE>
|
|
<BLANKLINE>
|