Annotation of Semantic Roles
Paola Monachesi
In collaboration with
Gerwert Stevens and Jantine Trapman
Utrecht University
Overview
• Semantic roles in linguistic literature
• Annotation of semantic roles
– FrameNet
– PropBank
• Merging approaches
• Annotation in the D-coi project
• Automatic Semantic Role Labeling
• Hands-on: annotation of the 1984 English-Romanian corpus
2
Semantic Roles
A general introduction
• Semantic roles capture the relationship
between a predicate and syntactic constituents
• Semantic roles assign meaning to syntactic
constituents
• Linking theory: interaction between syntax and
semantics
• How can roles be inferred from syntax?
3
Generic semantic roles:
characteristics
• Fixed set
• Roles are atomic
• Each verbal argument is assigned only
one role
• Roles are uniquely assigned
• Roles are non-relational
4
Fillmore 1968
• Nine roles: agent, experiencer, instrument, object, source, goal, location, time and path
• Direct relation between roles and
grammatical functions
• Small set of roles not sufficient
• Frame semantics -> FrameNet
5
Jackendoff 1990
• Four roles: theme, source, goal, agent
• Meaning represented by conceptual
structure based on conceptual
constituents
• Relation between syntactic constituent
and conceptual constituent
6
Dowty 1991
• Thematic roles as prototypical concepts
• Two proto-roles: Proto-Agent and Proto-Patient
• Each proto role characterized by
properties
• Flexible system
7
Levin 1993
• Syntactic frames reflect the semantics of
verbs
• Verb classes based on syntactic frames
which are meaning preserving
• VerbNet (Kipper et al. 2000)
• PropBank (Palmer et al. 2005)
8
Verb specific roles
• Situation Semantics in HPSG (Pollard and
Sag 1987)
• Frame semantics (Fillmore 1968)
• No fixed set of roles
• Role sets specific to:
– Verb
– Concept of a given verb
9
Semantic role
assignment
Some emerging projects as basis:
• Proposition Bank (Kingsbury et al. 2002)
• FrameNet (Johnson et al. 2002)
10
PropBank
• Semantic layer of Penn Treebank
• Goal: consistent argument labeling for
automatic extraction of relational data.
• Set of semantic roles related to the
accompanying syntactic realizations.
11
PropBank
Arg0   external argument (proto-Agent)
Arg1   internal argument (proto-Patient)
Arg2   indirect object / beneficiary / instrument / attribute / end state
Arg3   start point / beneficiary / instrument / attribute
Arg4   end point
ArgA   external causer
12
Additional tags (ArgMs)
• ArgM-TMP: Temporal marker (when?)
• ArgM-LOC: Location (where?)
• ArgM-DIR: Direction (where to?)
• ArgM-MNR: Manner (how?)
• Etc.
13
PropBank
• Framefiles are developed on the basis of
the individual verbs.
• All the possible roles are spelled out.
• The framefile includes all the possible
senses of a word.
14
Framefiles
Mary left the room
Mary left her daughter-in-law her pearls in her will
Frameset leave.01 “move away from”:
Arg0: entity leaving
Arg1: place left
Frameset leave.02 “give”:
Arg0: giver
Arg1: thing given
Arg2: beneficiary
15
PropBank Frame File Example
Roleset give.01 "transfer"
Roles:
  Arg0: Giver
  Arg1: Thing given
  Arg2: entity given to
Example:
[The executives]arg0 gave [the chefs]arg2 [a standing ovation]arg1
16
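To make the structure of such a roleset concrete, here is a minimal sketch of how it could be represented programmatically. The class, factory method and field names are illustrative, not part of PropBank itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative container for a PropBank-style roleset; names are hypothetical.
record Roleset(String id, String gloss, Map<String, String> roles) {

    static Roleset give01() {
        Map<String, String> roles = new LinkedHashMap<>();
        roles.put("Arg0", "giver");
        roles.put("Arg1", "thing given");
        roles.put("Arg2", "entity given to");
        return new Roleset("give.01", "transfer", roles);
    }

    public static void main(String[] args) {
        Roleset give = give01();
        // give.01 (transfer): {Arg0=giver, Arg1=thing given, Arg2=entity given to}
        System.out.println(give.id() + " (" + give.gloss() + "): " + give.roles());
    }
}
```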
FrameNet
• http://framenet.icsi.berkeley.edu
• Lexicon-building project
• Corpus based
• Words grouped in semantic classes which represent prototypical situations (frames)
17
FrameNet
• 8,900 lexical units
• 625 semantic frames
• 135,000 annotated sentences
18
FrameNet
• Three components:
• Frame ontology
• Set of annotated sentences
• Set of lexical entries
19
FrameNet
• Lexical units
• Frame ontology
• Frame:
– Definition
– List of frame elements
– Set of lexical units (Frame Evoking Elements)
• Corpus of example sentences
20
FrameNet
Example:
Leave evokes Departing:
Definition:
“An object (the Theme) moves away from a Source.
The Source may be expressed or it may be
understood from context, but its existence is always
implied by the departing word itself.”
21
FrameNet
Frame elements:
Source, Theme, Area, Depictive, Distance,
Manner, Goal etc.
Example sentence:
[Theme We all] left [Source the school] [Time at four
o’clock].
22
FrameNet frame Example
Frame: Giving
Lexical units: give.v, give_out.v, hand in.v, hand.v, hand_out.v,
hand_over.v, pass.v,
Frame elements:
Donor:
The person that begins in possession of the Theme and
causes it to be in the possession of the Recipient.
Recipient:
The entity that ends up in possession of the Theme
Theme:
The object that changes ownership
Example:
[300 euro]theme was given [to John]recipient [by his mother]donor
23
Comparing approaches
Differences in
– Methodology
– Construction
– Structure
24
Comparing approaches
FrameNet
[Buyer Chuck] bought [Goods a car] [Seller from Jerry] [Payment for $1000].
[Seller Jerry] sold [Goods a car] [Buyer to Chuck] [Payment for $1000].

PropBank
[Arg0 Chuck] bought [Arg1 a car] [Arg2 from Jerry] [Arg3 for $1000].
[Arg0 Jerry] sold [Arg1 a car] [Arg2 to Chuck] [Arg3 for $1000].
25
FrameNet: methodology
• Frame by frame basis
• Choose semantic frame
• Define the frame
• Define its participants (frame elements)
• List lexical predicates which invoke a frame
• Find relevant sentences in a corpus
• Annotate each frame element in the sentence
26
PropBank: methodology
• Examine relevant sentences from corpus containing
verb under consideration;
• Group verbs into major senses;
• Semantic roles assigned on a verb by verb basis.
• Frame files created on the basis of all possible senses
of a predicate.
• Attempt to label semantically related verbs
consistently
• Less emphasis on the definition of the semantics of the
class
• Creates the basis for training statistical systems.
27
PropBank vs. FrameNet
• PB: classification based on word senses
(corpus driven)
• FN: classification based on semantic classes
(concepts driven)
28
Comparing approaches
PropBank
– Word senses
– Shallow layering
– Restricted set of argument labels
– Reflecting syntactic relations

FrameNet
– Concepts
– Deep hierarchy
– Exhaustive list of frame elements
– Semantic roles
29
Semantic roles and NLP
• Semantic roles help to answer questions like
"Who?", "When?", "What?", "Where?", "Why?",
etc. in NLP applications.
• Semantic role labeling (SRL) is useful in a
range of applications such as:
– Question Answering
– Machine translation
– Information extraction
• Projects have emerged in which corpora are
annotated with semantic roles
30
Semantic roles and corpora
• Can the PB and FN methodologies be
adopted for the annotation of corpora in
different languages?
• What changes are necessary?
• FN: Salsa project (Erk and Pado 2004) Spanish
FrameNet (Subirats and Petruck, 2003) and Japanese
FrameNet (Ohara et al., 2004)
• PB: Arabic PB, Spanish PB
31
Dutch Corpus Initiative
(D-coi)
• Pilot of 50 M words, written language
• September 2005 - December 2006
• Blueprint for 500 MW corpus
– Schemes
– Protocols
– Procedures
– Testing adequacy & practicability
32
STEVIN
• Dutch-Flemish cooperation
• 2004 – 2009
• 8.5 M euro
• Goals:
  – Realization of an adequate digital language infrastructure for Dutch
  – Research within the area of LST
  – Train new experts, exchange knowledge, stimulate demand
33
STEVIN
Priority list of needed facilities
– In Speech Technology
• Speech and multimodal corpora
• Text corpora
• Tools and data
– In Language Technology
• Dutch corpora
• Electronic lexica
• Aligned parallel corpora
34
D-coi
• Applications:
  – Information extraction
  – QA
  – Document classification
  – Automatic abstracting
  – Linguistic research
35
D-coi
• Various annotation layers:
– PoS
– Lemmatization
– Syntax
– (Semantics)
36
Semantic annotation
• Current projects focus mainly on English
• Need for a Dutch scheme
• Role assignment, temporal and spatial
annotation
• +/- 3000 words
• Utrecht University: role assignment
37
Integration in D-coi
• Separate annotation levels
• One comprehensive scheme for
semantic annotations
• Integration with other annotation layers
38
Several options
Option 1: Dutch FrameNet
+ Exploit SALSA results
– Construction of new frames necessary
– Not a very transparent annotation
– Difficult in use for annotators
39
Several options
Option 2: Dutch PropBank
+ Transparent annotation
+ At least semi-automatic
- No classification within frame ontology
40
Several options
Option 3: Union of FrameNet and PropBank
• FrameNet – conceptual structure
• PropBank – role assignment
41
D-coi: semantic role
assignment
Reconcile:
• PropBank approach, which is corpus based and
syntactically driven.
• FrameNet approach, which is semantically driven
and based on a network of relations between
frames.
• Necessity to make the annotation process
automatic.
• Necessity to have a transparent annotation
for annotators and users.
42
Questions
• Is it possible to merge FN frames with PB role
labels (manual or semi-automatic)?
• To what extent can we use existing resources?
• Can we extend existing resources? Should we
include language specific features in the
original source?
• Is it possible to extend the merged resources by
exploiting the best features of both?
43
Three pilot studies
• The Communication frame
• The Transitive_action frame
• The adjunct middle in Dutch
44
The Communication
frame
• Aims:
– Convert FN frames to a simpler form
– Make PB argument labels more uniform
• Assume Levin’s classes and diathesis
alternations
• Construct one role set for verbs that
share the same class
45
The Communication
frame
• Test: Communication and daughter frames
• Example from Communication_noise:
PropBank                       FrameNet
Arg0: speaker, communicator    Speaker
Arg1: utterance                Message
Arg2: hearer                   Addressee
46
The Transitive_action
frame
• Definition: "This frame characterizes, at a very
abstract level, an Agent or Cause affecting a Patient."
• More abstract, more challenging
• 29 daughter frames
• Five frames investigated
47
The Transitive_action
frame
Example from Cause_harm:
PropBank                                  FrameNet
Arg0: agent, hitter – animate only!       Agent
Arg1: thing hit                           Victim, Body_part
Arg2: instrument, thing hit by/with       Instrument
Arg3: intensifier of action               Degree
48
The Transitive_action
frame
• Classification sometimes not straightforward
• Role sets can be very specific
• Be careful not to create too general role sets
Arg0: V-er                    Arg0: entity causing harm
Arg1: thing being V-ed        Arg1: thing being harmed
Arg2: instrument              Arg2: instrument
Arg3: pieces                  Arg3: pieces
49
The adjunct middle
Object middle
(1) De winkel verkocht zijn laatste roman helemaal niet.
    'The store didn't sell his last novel at all.'
(2) Zijn laatste roman verkocht helemaal niet.
    'His last novel didn't sell at all.'
Adjunct middle
(3) Men zit lekker op deze stoel.
    'One sits comfortably on this chair.'
(4) Deze stoel zit lekker.
    'This chair sits comfortably.'
50
The adjunct middle
(1) Deze stoel zit lekker.
    'This chair sits comfortably.'
(2) Deze zee vaart rustig.
    'This sea sails peacefully.'
(3) Regenweer wandelt niet gezellig.
    'Rainy weather does not walk pleasantly.'
51
Middles in FrameNet
a. [Goods Zijn laatste roman] verkocht helemaal niet (CNI: Seller).
   'His last novel didn't sell at all.'
b. [Location Deze stoel] zit lekker (CNI: Agent).
   'This chair sits comfortably.'
c. [Area De zee] vaart rustig (CNI: Driver).
   'The sea sails peacefully.'
d. [Depictive? Regenweer] wandelt niet prettig (CNI: Self_mover).
   'Rainy weather does not walk pleasantly.'
52
Middles in PropBank
a. [Arg1 Zijn laatste roman] verkocht [ArgM-MNR helemaal niet].
   'His last novel didn't sell at all.'
b. [? Deze stoel] zit [ArgM-MNR lekker].
‘This chair sits comfortably.’
c. [? De zee] vaart [ArgM-MNR rustig].
‘The sea sails peacefully.’
d. [? Regenweer] wandelt [ArgM-MNR niet prettig].
‘Rainy weather does not walk pleasantly.’
53
Observations
• FrameNet: more specific role labels,
semantically driven
• PropBank: less specific, syntactically
driven
• Both approaches have their own problems
• Merging might provide a solution → language-specific
problems need to be addressed
54
Omega
http://omega.isi.edu
• 120,000-node terminological ontology
• It includes:
• Wordnet
• Mikrokosmos (conceptual resource)
• FrameNet and PropBank are included to assign frame
information to each word sense of the predicate.
• Link between the frames and the word senses is
created manually as well as the alignment between
FrameNet and PropBank
• Omega seems to align while we merge
55
Concepts vs. word senses
[Diagram: Omega links Concepts (Mikrokosmos), Word Senses (WordNet, PropBank) and Semantic Frames (FrameNet).]
56
Alignment
• Linking schemes
• Schemes stay separate modules
• Problem when modified
57
Merging
• Implies alignment
• Integrates one scheme into another
• Integrates two schemes into a third,
new scheme
58
How to proceed
• Omega can be used.
• Possibility to use the link with WN and its
Dutch equivalent to automatically
translate the word senses.
• PB methodology can be employed to
automatically assign roles to various
predicates.
59
Semantic annotation in D-coi
Considerations
• Can existing methodologies be adopted?
– PropBank
– FrameNet
• Our choice: a combination of both
(Monachesi and Trapman 2006)
• But for the time being: PropBank
60
Automatic SRL
• Manual annotation of a large corpus such
as D-Coi is too expensive
• Is automatic semantic role labeling
feasible?
61
Automatic SRL
• Classification algorithms
• Mapping between set of features and set
of classes
• Two phases:
– Training phase
– Evaluation phase
62
Classification algorithms
• Probability Estimation (Gildea and
Jurafsky, 2002)
• Assignment of FrameNet roles
• 65% Precision
• 61% Recall
63
Classification algorithms
• Support Vector Machines (SVMs) (Vapnik,
1995)
• Binary classifier, problem for SRL
• Lower classification speed
– Solution: filter out instances with high
probability of being null
64
Classification algorithms
• Memory based learning (MBL) (Daelemans
et al., 2004)
• Learning component: training examples
stored in memory
• Performance component: similarity based
65
MBL
• Instances are loaded in memory
• Instance: vector with feature-value pairs
and class assignment
• Unseen examples compared with training
data
• Distance metric used for comparison
• k-nearest neighbors algorithm
66
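As a rough illustration of the memory-based idea (storing instances and classifying by nearest neighbours), here is a minimal k-NN sketch with a simple overlap distance over symbolic features. It is not TiMBL's implementation, and all names are illustrative.

```java
import java.util.*;

// Minimal memory-based (k-NN) classifier sketch: learning = storing instances,
// classification = majority vote among the k nearest stored instances.
class MemoryBasedClassifier {
    record Instance(String[] features, String label) {}

    private final List<Instance> memory = new ArrayList<>();
    private final int k;

    MemoryBasedClassifier(int k) { this.k = k; }

    void train(String[] features, String label) {
        memory.add(new Instance(features, label));   // just store the example
    }

    String classify(String[] features) {
        // Rank stored instances by overlap distance (number of mismatching features).
        List<Instance> nearest = memory.stream()
                .sorted(Comparator.comparingInt(i -> distance(i.features(), features)))
                .limit(k)
                .toList();
        Map<String, Long> votes = new HashMap<>();
        for (Instance i : nearest) votes.merge(i.label(), 1L, Long::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    private static int distance(String[] a, String[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) if (!a[i].equals(b[i])) d++;
        return d;
    }
}
```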
Automatic SRL
• Previous research on automatic SRL showed
encouraging results
– Best published results for PropBank labeling of an English corpus:
84% precision, 75% recall and 79 F-score (Pradhan et al., 2005)
• Generally, machine learning methods are used,
which requires training data
67
Automatic SRL in a Dutch corpus
• Main Problem:
– There is no Dutch annotated corpus available that
can be used as training data
• Solution:
– Create new training data semi-automatically
(bootstrapping) by using a rule-based tagger on
unannotated data (dependency structures)
– Manually correct output of rule-based tagger
68
SRL approach
• Define mapping between dependency structures and PropBank
• Implement mapping in a rule-based automatic argument tagger
• Manually correct tagger output
• Use manually corrected corpus as input for a memory based classifier (TiMBL)
69
Dependency structures
John geeft het boek aan Marie
“John gives the book to Marie”
[Dependency tree (SMAIN): SU/name "John", HD/verb "geeft", OBJ1/NP "het boek", OBJ2/PP (HD/prep "aan", OBJ1/noun "Marie")]
70
Augmenting dependency nodes
with PropBank labels
John geeft het boek aan Marie
“John gives the book to Marie”
[Dependency tree (SMAIN) with PropBank labels: SU/name "John" = Arg0, HD/verb "geeft" = PRED, OBJ1/NP "het boek" = Arg1, OBJ2/PP "aan Marie" = Arg2]
71
Basic mapping
PropBank label    Dependency category
Arg0 ... Argn     Complement
ArgM-xxx          Modifier
Predicate         Head
72
Mapping numbered arguments
A mapping for subject and object complements:

Dependency label          Thematic role            PropBank label
SU (Subject)              Agent                    Arg0
OBJ1 (Direct object)      Patient                  Arg1
OBJ2 (Indirect object)    Instrument / Attribute   Arg2

"No consistent generalizations can be made across verbs for the higher
numbered arguments" (Palmer et al. 2005)
73
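A minimal sketch of how this core mapping could be encoded; the dependency and PropBank labels come from the table above, while the class and method names are illustrative.

```java
import java.util.Map;

// Core mapping from Alpino-style dependency labels to PropBank argument labels,
// as listed in the table above. Class and method names are illustrative.
class BasicArgumentMapping {
    static final Map<String, String> CORE = Map.of(
        "su",   "Arg0",   // subject         -> proto-Agent
        "obj1", "Arg1",   // direct object   -> proto-Patient
        "obj2", "Arg2"    // indirect object -> instrument / attribute
    );

    static String mapDependencyLabel(String depLabel) {
        return CORE.getOrDefault(depLabel.toLowerCase(), null);  // null = no rule applies
    }

    public static void main(String[] args) {
        System.out.println(mapDependencyLabel("SU"));    // Arg0
        System.out.println(mapDependencyLabel("OBJ2"));  // Arg2
    }
}
```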
Mapping numbered arguments
Heuristically mapping higher numbered arguments
• Mapping complements to numbered arguments higher than Arg2 is difficult
• Complements that are candidate arguments are:
  – PREDs (purpose clauses)
  – VCs (verbal complements)
  – MEs (complements indicating a quantity)
  – PCs (prepositional complements)
• These complements are mapped to the first available numbered argument
74
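The "first available numbered argument" heuristic could be rendered roughly as follows; the method name and the way already-assigned labels are passed in are assumptions for illustration (cf. the example on the next slide, where Arg0 is already taken and the PC therefore receives Arg1).

```java
import java.util.Set;

// Return the first ArgN not yet assigned for this predicate; illustrative sketch
// of the heuristic described above (PropBank numbered arguments run Arg0..Arg5).
class HigherArgumentHeuristic {
    static String firstAvailableNumberedArg(Set<String> alreadyAssigned) {
        for (int n = 1; n <= 5; n++) {
            String candidate = "Arg" + n;
            if (!alreadyAssigned.contains(candidate)) return candidate;
        }
        return null;                                  // no numbered slot left
    }

    public static void main(String[] args) {
        // Subject already labeled Arg0, so a candidate complement gets Arg1.
        System.out.println(firstAvailableNumberedArg(Set.of("Arg0")));  // Arg1
    }
}
```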
Heuristic mapping example
Ik denk aan je
[Dependency tree (SMAIN): SU/pron "Ik" = Arg0, HD/verb "denk" = PRED, PC/pp "aan je" = Arg1 (first available numbered argument)]
75
Mapping modifiers
PropBank label   Description            Corresponding dependency nodes
ArgM-NEG         Negation markers       niet, nooit, geen, nergens
ArgM-REC         Reciprocals            mezelf, zichzelf, etc.
ArgM-LOC         Locative modifiers     Nodes with dependency label LD
ArgM-PNC         Purpose clauses        "om te" clause (clabel OTI)
ArgM-PRD         Predication markers    Nodes with dependency label PREDM
76
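A rough sketch of how these modifier rules could be expressed in code; the lemma sets and parameter names are illustrative, and only the rules listed in the table above are covered.

```java
import java.util.Set;

// Rule-of-thumb modifier mapping following the table above; illustrative only.
class ModifierMapping {
    private static final Set<String> NEGATION   = Set.of("niet", "nooit", "geen", "nergens");
    private static final Set<String> RECIPROCAL = Set.of("mezelf", "zichzelf");

    // depLabel: dependency label of the node; lemma: its (head) lemma; clabel: clause label.
    static String mapModifier(String depLabel, String lemma, String clabel) {
        if (NEGATION.contains(lemma))   return "ArgM-NEG";
        if (RECIPROCAL.contains(lemma)) return "ArgM-REC";
        if ("LD".equals(depLabel))      return "ArgM-LOC";
        if ("OTI".equals(clabel))       return "ArgM-PNC";  // "om te" purpose clause
        if ("PREDM".equals(depLabel))   return "ArgM-PRD";
        return null;                                         // no modifier rule applies
    }
}
```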
XARA overview
• Mapping is implemented in XARA: XML-based Automatic
Role-labeler for Alpino Trees (Stevens 2006, 2007)
• XARA performs automatic annotation of XML files based
on a set of rules
• Purpose of XARA is to create training data for a learning
system
• XARA is written in Java
• Rule definitions are based on XPath queries
  – Rules consist of an XPath expression and a target label
  – XPath expressions are used to select nodes in an XML file
77
XARA annotation process
(./node[@rel='su'], Arg0)
(./node[@rel='obj1'], Arg1)
(./node[@rel='obj2'], Arg2)

[Dependency tree (SMAIN) after rule application: SU/name "John" → Arg0, HD/verb "geeft" → PRED, OBJ1/NP "het boek" → Arg1, OBJ2/PP "aan Marie" → Arg2]
78
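To give an impression of how such XPath-based rules can be applied, below is a minimal sketch using the standard javax.xml XPath API over an Alpino-style XML fragment (one <node> element per dependency node, with a rel attribute as in the rules above). This is not XARA's actual code; the rule list, toy XML and output format are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Minimal XPath-rule annotator sketch (not XARA itself): each rule pairs an
// XPath expression with a target PropBank label, mirroring the rules above.
public class XPathRuleAnnotator {
    static final Map<String, String> RULES = Map.of(
        "//node[@rel='su']",   "Arg0",
        "//node[@rel='obj1']", "Arg1",
        "//node[@rel='obj2']", "Arg2");

    public static void main(String[] args) throws Exception {
        // Toy Alpino-style fragment for "John geeft het boek aan Marie".
        String xml = """
            <node rel="top"><node rel="su" word="John"/><node rel="hd" word="geeft"/>
            <node rel="obj1" word="het boek"/><node rel="obj2" word="aan Marie"/></node>""";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        var xpath = XPathFactory.newInstance().newXPath();
        for (var rule : RULES.entrySet()) {
            NodeList hits = (NodeList) xpath.evaluate(rule.getKey(), doc, XPathConstants.NODESET);
            for (int i = 0; i < hits.getLength(); i++) {
                Element node = (Element) hits.item(i);
                node.setAttribute("pb", rule.getValue());     // attach the PropBank label
                System.out.println(node.getAttribute("word") + " -> " + rule.getValue());
            }
        }
    }
}
```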
XARA’s reusability
• Rules are based on XPath expressions, as
a result:
– XARA can be adapted to any XML-based
treebank
– Creating rule definitions does not require
programming skills
• XARA is not restricted to a specific set of
role labels
79
Evaluation of XARA
Precision   Recall    F-Score
65.11%      45.83%    53.80%

• Relatively low recall score due to the fact that XARA's rules
cover only a subset of PropBank argument labels
80
Manual correction
• Sentences from a corpus annotated by
XARA were manually corrected
• Correction was done in accordance with
the PropBank guidelines
• The manually corrected corpus can be
used to train a semantic role classifier
81
Consequences
• Adapt PB guidelines to Dutch
• Extend guidelines if needed
• Dutch PB frameindex?
82
Guidelines
• PB guidelines largely applicable to Dutch
without problems (Trapman and Monachesi
2006)
• More linguistic research/background needed
about the interpretation of modifiers
• Differences mainly caused by different tree structures:
  – D-coi: dependency structure
  – Penn Treebank: constituent structure
• Structural issue: traces
83
Traces
• General rule: traces do not get any label
- Passives:
  [Arg1 Degene die sterft], wordt *trace* [Arg2 erflater] [PRED genoemd].
- Conjunctions:
  [Arg0 Jaap] [PRED leest] [Arg1 een boek] en [Arg0 Piet] *trace* [Arg1 een magazine].
84
Traces
- Wh-questions:
  [Arg1 Wat] kunt [Arg0 u] *trace* [PRED doen] [Arg2 om de luchtkwaliteit in uw woning te verbeteren]?
- Relative clauses:
  Daarnaast moet er regionaal extra aandacht komen voor [Arg0 kinderen] [Arg0 die] *trace* [Arg1 tot een risicogroep] [PRED behoren].
85
Annotation tools
• CLaRK:
http://www.bultreebank.org/clark/index.html
• Salto:
http://www.coli.uni-saarland.de/projects/salsa/
• TrEd:
http://ufal.mff.cuni.cz/~pajas/tred/
86
Methodology
• Partly automatic annotation: Arg0, Arg1 and some
modifiers
• Manual correction based on “Dutch” PB guidelines
– Check automatic annotation
– Add remaining labels
• Support from PB frame files (English):
– Partial setup Dutch frame index
– check role set when uncertain about argument structure
– check verb sense
87
Result
• Semantic layer with labeled predicates,
arguments and modifiers
• 2,088 sentences:
  – 1,773 NL
  – 315 VL
• 12,147 labels (NL):
  – 3,066 PRED labels (= verbs)
  – 5,271 arguments
  – 3,810 modifiers
88
Example
89
Annotation problems
• Ellipsis:
Indien u toch mocht besluiten [naar *trace* en in Angola te reizen],
wordt aangeraden ...
90
Annotation problems
• Ellipsis:
De man komt dichterbij en *trace* zegt: ...
91
Annotation problems
• Syntactic errors, e.g. wrong PPattachment
• One annotator
• English frame files
92
Automatic SRL classification
• Automatic SRL in earlier research is based on classification algorithms, e.g.:
  – Support Vector Machines (SVMs)
  – Decision Trees
  – Maximum Entropy Models
  – Memory Based Learning (MBL) (Daelemans et al. 2004)
• In semantic role classification text chunks are described by a set of features
  – e.g.: phrase type, POS-tag
• Text chunks are assigned a semantic role based on their feature set
93
Semantic role classification
• Classification is a two step process:
– Training the classifier on training data
– Applying the trained classifier to unseen (test) data
• Previous research focused on English training
data based on constituent structures
• This approach is based on dependency
structures from a Dutch corpus (Stevens (2006),
Stevens, Monachesi and van den Bosch (2007), Monachesi, Stevens
and Trapman (2007))
94
Classification approach
• Approach based on earlier research by van den
Bosch et al:
– Predicates are paired with candidate arguments
– (predicate features,argument features) pairs are
called instances
– Instances are classified into a set of PropBank labels
and “null” labels
95
TiMBL
• TiMBL (Tilburg Memory Based Learner) is used for classification
  – MBL is a descendant of the classical k-Nearest Neighbor (k-NN) approach
  – Adapted to NLP applications by the ILK research group at Tilburg University
96
Features used
Predicate features:
1. Predicate's root form
2. Predicate's voice (active/passive)
Argument features:
3. Argument's part-of-speech tag
4. Argument's c-label
5. Argument's d-label
6. Argument's position (before/after predicate)
7. Argument's relation head word
8. Head word POS-tag
9. c-label pattern of argument
10. d-label pattern of argument
11. c-/d-label combined
97
An example instance
[Dependency tree (SMAIN): SU/name "John" = Arg0, HD/verb "geeft" = PRED, OBJ1/NP "het boek" = Arg1, OBJ2/PP "aan Marie" = Arg2]

geef,active,#,SU,name,before,John,verb,name*verb*NP*PP,SU*HD*OBJ1*OBJ2,SU*name,Arg0
98
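The sketch below shows roughly how such a comma-separated instance line could be assembled from the feature values of the previous slide; the ordering follows the example string, while the class and method names are illustrative.

```java
import java.util.List;

// Assemble a TiMBL-style comma-separated training instance from the eleven
// feature values plus the class label, as in the example above (sketch only).
class InstanceWriter {
    static String toTimblLine(List<String> features, String label) {
        return String.join(",", features) + "," + label;
    }

    public static void main(String[] args) {
        // Feature values for the Arg0 candidate "John" of predicate "geeft"
        // (order follows the example instance on this slide).
        List<String> features = List.of("geef", "active", "#", "SU", "name",
                "before", "John", "verb", "name*verb*NP*PP",
                "SU*HD*OBJ1*OBJ2", "SU*name");
        System.out.println(toTimblLine(features, "Arg0"));
        // geef,active,#,SU,name,before,John,verb,name*verb*NP*PP,SU*HD*OBJ1*OBJ2,SU*name,Arg0
    }
}
```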
Training procedure
• TiMBL with default parameters, parameter
optimization to prevent overfitting
• Relatively little training data was available:
– 12,113 instances extracted from 2,395 sentences
– 3066 verbs, 5271 arguments, 3810 modifiers
• Leave One Out (LOO) method to overcome data
sparsity problem
– Every data item in turn is selected once as a test
item, classifier is trained on remaining items
99
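For illustration, a leave-one-out loop can be sketched as below, assuming some classify(train, trainLabels, item) function such as the memory-based sketch shown earlier; this is a fragment, not TiMBL's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Leave-one-out evaluation sketch: each instance is tested once against a
// classifier trained on all remaining instances (illustrative only).
class LeaveOneOut {
    interface Classifier {
        String classify(List<String[]> train, List<String> trainLabels, String[] item);
    }

    static double accuracy(List<String[]> items, List<String> labels, Classifier clf) {
        int correct = 0;
        for (int i = 0; i < items.size(); i++) {
            List<String[]> train = new ArrayList<>(items);
            List<String> trainLabels = new ArrayList<>(labels);
            train.remove(i);            // hold out instance i as the test item
            trainLabels.remove(i);
            if (clf.classify(train, trainLabels, items.get(i)).equals(labels.get(i))) {
                correct++;
            }
        }
        return (double) correct / items.size();
    }
}
```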
Evaluation measures
• Measures commonly used in information extraction are used:
  – Precision: proportion of instances labeled with a non-null label that were labeled correctly
  – Recall: proportion of instances correctly labeled with a non-null label out of all non-null instances
  – F-Score: harmonic mean of precision and recall: 2 · precision · recall / (precision + recall)
100
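From counts of proposed and correct non-null labels these measures are straightforward to compute; a small sketch with illustrative placeholder counts:

```java
// Precision, recall and F-score over non-null labels, following the
// definitions above. The counts below are illustrative placeholders.
class SrlScores {
    public static void main(String[] args) {
        int proposed = 1000;   // instances given a non-null label by the system
        int correct  = 650;    // of those, correctly labeled
        int gold     = 950;    // instances that should carry a non-null label

        double precision = (double) correct / proposed;
        double recall    = (double) correct / gold;
        double fScore    = 2 * precision * recall / (precision + recall);

        System.out.printf("P=%.2f%% R=%.2f%% F=%.2f%n",
                100 * precision, 100 * recall, 100 * fScore);
    }
}
```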
Evaluation of the TiMBL classifier

Precision   Recall    F-Score
70.27%      70.59%    70.43%
101

Label       Precision   Recall    F (β=1)
Arg0        90.44%      86.82%    88.59
Arg1        87.80%      84.63%    86.18
Arg2        63.34%      59.10%    61.15
Arg3        21.21%      19.18%    20.14
Arg4        54.05%      54.05%    54.05
ArgM-ADV    54.98%      51.85%    53.37
ArgM-CAU    47.24%      43.26%    45.16
ArgM-DIR    36.36%      33.33%    34.78
ArgM-DIS    74.27%      70.71%    72.45
ArgM-EXT    29.89%      28.57%    29.21
ArgM-LOC    57.95%      54.53%    56.19
ArgM-MNR    52.07%      47.57%    49.72
ArgM-NEG    68.00%      65.38%    66.67
ArgM-PNC    68.61%      64.83%    66.67
ArgM-PRD    45.45%      40.63%    42.90
ArgM-REC    86.15%      84.85%    85.50
ArgM-TMP    55.95%      53.29%    54.58
102
Comparison with CoNLL-05
systems
• CoNLL = Conference on Computational Natural
Language Learning
• Shared task: “competition” between automatic
PropBank role labeling systems
• CoNLL shared task 2005
– Best performing system reached an F-Score of 80
– Seven systems reached an F-Score in the 75-78
range, seven more in the 70-75 range
– Five systems reached an F-Score between 65 and 70
• Dependency structure (Hacioglu, 2004): F-Score 84.6
103
Future work
• Further work is needed to improve performance:
  – Larger training corpus
  – Improvements on the feature set
  – Optimizations of algorithmic parameters
  – Experimentation with different learning algorithms (e.g. SVMs)
104
Conclusions
• Dependency structures prove to be a quite valuable
resource both for rule-based and for learning systems.
• Automatic SRL in a Dutch corpus is feasible given the
currently available resources
• Current system shows encouraging results, still many
improvements are possible
• Adapting PB guidelines to Dutch not problematic.
• Follow-up project: SONAR 500 million word corpus, 1
million semantically annotated
105