Technical Report List of technical reports October 2014 Computer Laboratory

Technical Report
UCAM-CL-TR
ISSN 1476-2986
Computer Laboratory
List of technical reports
October 2014
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/
Technical reports published by the University of Cambridge
Computer Laboratory are freely available via the Internet:
http://www.cl.cam.ac.uk/techreports/
Series editor:
UCAM-CL-TR-1
M.F. Challis:
The JACKDAW database package
October 1974, 15 pages, PDF
Abstract: This report describes a general database package which has been implemented in BCPL on an IBM 370/165 at the University of Cambridge. One current application is the provision of an administrative database for the Computing Service.
Entries within a database may include (in addition to primitive fields such as ‘salary’ and ‘address’) links to other entries: each link represents a relationship between two entries and is always two-way.
Generality is achieved by including within each database class definitions which define the structure of the entries within it; these definitions may be interrogated by program.
The major part of the package presents a procedural interface between an application program and an existing database, enabling entries and their fields to be created, interrogated, updated and deleted. The creation of a new database (or modification of an existing one) by specifying the class definitions is handled by a separate program.
The first part of the report describes the database structure and this is followed by an illustration of the procedural interface. Finally, some of the implementation techniques used to ensure integrity of the database are described.

UCAM-CL-TR-2
J. Larmouth:
Scheduling for a share of the machine
October 1974, 29 pages, PDF
Abstract: This paper describes the mechanism used to schedule jobs and control machine use on the IBM 370/165 at Cambridge University, England. The same algorithm is currently being used in part at the University of Bradford and implementations are in progress or under study for a number of other British Universities.
The system provides computer management with a simple tool for controlling machine use. The managerial decision allocates a share of the total machine resources to each user of the system, either directly, or via a hierarchical allocation scheme. The system then undertakes to vary the turnaround of user jobs to ensure that those decisions are effective, no matter what sort of work the user is doing.
At the user end of the system we have great flexibility in the way in which he uses the resources he has received, allowing him to get a rapid turnaround for those (large or small) jobs which require it, and a slower turnaround for other jobs. Provided he does not work at a rate exceeding that appropriate to his share of the machine, he can request, for every job he submits, the ‘deadline’ by which he wants it running, and the system will usually succeed in running his job at about the requested time – rarely later, and only occasionally sooner.
Every job in the machine has its own ‘deadline’, and the machine is not underloaded. Within limits, each user can request his jobs back when he wants them, and the system keeps his use to within the share of the machine he has been given. The approach is believed to be an original one and to have a number of advantages over more conventional scheduling and controlling algorithms.

UCAM-CL-TR-3
A.J.M. Stoneley:
A replacement for the OS/360 disc space management routines
April 1975, 7 pages, PDF
Abstract: In the interest of efficiency, the IBM disc space management routines (Dadsm) have been completely replaced in the Cambridge 370/165.
A large reduction in the disc traffic has been achieved by keeping the lists of free tracks in a more compact form and by keeping lists of free VTOC blocks. The real time taken in a typical transaction has been reduced by a factor of twenty.
By writing the code in a more appropriate form than the original, the size has been decreased by a factor of five, thus making it more reasonable to keep it permanently resident. The cpu requirement has decreased from 5% to 0.5% of the total time during normal service.
The new system is very much safer than the old in the face of total system crashes. The old system gave little attention to the consequences of being stopped in mid-flight, and it was common to discover an area of disc allocated to two files. This no longer happens.
UCAM-CL-TR-4
A.J.M. Stoneley:
The dynamic creation of I/O paths under OS/360-MVT
April 1975, 16 pages, PDF
Abstract: In a large computer it is often desirable and convenient for an ordinary program to be able to establish for itself a logical connection to a peripheral device. This ability is normally provided through a routine within the operating system which may be called by any user program at any time. OS/360 lacks such a routine. For the batch job, peripheral connections can only be made through the job control language and this cannot be done dynamically at run-time. In the restricted context of TSO (IBM’s terminal system) a routine for establishing peripheral connections does exist, but it is extremely inefficient and difficult to use.
This paper describes how a suitable routine was written and grafted into the operating system of the Cambridge 370/165.

UCAM-CL-TR-5
P. Hazel, A.J.M. Stoneley:
Parrot – A replacement for TCAM
April 1976, 25 pages, PDF
Abstract: The terminal driving software and hardware for the Cambridge TSO (Phoenix) system are described. TCAM and the IBM communications controller were replaced by a locally written software system and a PDP-11 complex. This provided greater flexibility, reliability, efficiency and a better “end-user” interface than was possible under a standard IBM system.

UCAM-CL-TR-6
Andrew D. Birrell:
System programming in a high level language
125 pages, paper copy
PhD thesis (Trinity College, December 1977)

UCAM-CL-TR-7
Andrew Hopper:
Local area computer communications network
192 pages, paper copy
PhD thesis (Trinity Hall, April 1978)

UCAM-CL-TR-9
Douglas John Cook:
Evaluation of a protection system
181 pages, paper copy
PhD thesis (Gonville & Caius College, April 1978)

UCAM-CL-TR-10
Mark Theodore Pezarro:
Prediction oriented description of database systems
190 pages, paper copy
PhD thesis (Darwin College, October 1978)

UCAM-CL-TR-11
Branimir Konstatinov Boguraev:
Automatic resolution of linguistic ambiguities
222 pages, PDF
PhD thesis (Trinity College, August 1979)
Abstract: The thesis describes the design, implementation and testing of a natural language analysis system capable of performing the task of generating paraphrases in a highly ambiguous environment. The emphasis is on incorporating strong semantic judgement in an augmented transition network grammar: the system provides a framework for examining the relationship between syntax and semantics in the process of text analysis, especially while treating the related phenomena of lexical and structural ambiguity. Word-sense selection is based on global analysis of context within a semantically well-formed unit, with primary emphasis on the verb choice. In building structures representing text meaning, the analyser relies not on screening through many alternative structures – intermediate, syntactic or partial semantic – but on dynamically constructing only the valid ones. The two tasks of sense selection and structure building are procedurally linked by the application of semantic routines derived from Y. Wilks’ preference semantics, which are invoked at certain well chosen points of the syntactic constituent analysis – this delimits the scope of their action and provides context for a particular disambiguation technique. The hierarchical process of sentence analysis is reflected in the hierarchical organisation of application of these semantic routines – this allows the efficient coordination of various disambiguation techniques, and the reduction of syntactic backtracking, non-determinism in the grammar, and semantic parallelism. The final result of the analysis process is a dependency structure providing a meaning representation of the input text with labelled components centred on the main verb element, each characterised in terms of semantic primitives and expressing both the meaning of a constituent and its function in the overall textual unit. The representation serves as an input to the generator, organised around the same underlying principle as the analyser – the verb is central to the clause. Currently the generator works in paraphrase mode, but is specifically designed so that with minimum effort and virtually no change in the program control structure and code it could be switched over to perform translation.
The thesis discusses the rationale for the approach adopted, comparing it with others, describes the system and its machine implementation, and presents experimental results.

UCAM-CL-TR-12
M.R.A. Oakley, P. Hazel:
HASP “IBM 1130” multileaving remote job entry protocol with extensions as used on the University of Cambridge IBM 370/165
September 1979, 28 pages, paper copy

UCAM-CL-TR-13
Philip Hazel:
Resource allocation and job scheduling
41 pages, PDF
Abstract: The mechanisms for sharing the resources of the Cambridge IBM 370/165 computer system among many individual users are described. File store is treated separately from other resources such as central processor and channel time. In both cases, flexible systems that provide incentives to thrifty behaviour are used. The method of allocating resources directly to users rather than in a hierarchical manner via faculties and departments is described, and its social acceptability is discussed.

UCAM-CL-TR-14
J.S. Powers:
Store to store swapping for TSO under OS/MVT
June 1980, 28 pages, PDF
Abstract: A system of store-to-store swapping incorporated into TSO on the Cambridge IBM 370/165 is described. Unoccupied store in the dynamic area is used as the first stage of a two-stage backing store for swapping time-sharing sessions; a fixed-head disc provides the second stage. The performance and costs of the system are evaluated.

UCAM-CL-TR-15
I.D. Wilson:
The implementation of BCPL on a Z80 based microcomputer
68 pages, PDF
Abstract: The main aim of this project was to achieve as full an implementation as possible of BCPL on a floppy disc based microcomputer, running CP/M or CDOS (the two being essentially compatible). On the face of it there seemed so many limiting factors that, when the project was started, it was not at all clear which one (if any) would become a final stumbling block. As it happened, the major problems that cropped up could be programmed round, or altered in such a way as to make them soluble.
The main body of the work splits comfortably into three sections, and the writer hopes, in covering each section separately, to be able to show how the whole project fits together into the finished implementation.

UCAM-CL-TR-16
Jeremy Dion:
Reliable storage in a local network
129 pages, paper copy
PhD thesis (Darwin College, February 1981)

UCAM-CL-TR-17
B.K. Boguraev, K. Spärck Jones, J.I. Tait:
Three papers on parsing
1982, 22 pages, paper copy

UCAM-CL-TR-18
Burkard Wördenweber:
Automatic mesh generation of 2 & 3 dimensional curvilinear manifolds
November 1981, 128 pages, paper copy
PhD thesis (St John’s College, November 1981)

UCAM-CL-TR-19
Arthur William Sebright Cater:
Analysis and inference for English
September 1981, 223 pages, paper copy
PhD thesis (Queens’ College, September 1981)
UCAM-CL-TR-20
Avra Cohn, Robin Milner:
On using Edinburgh LCF to prove the correctness of a parsing algorithm
February 1982, 23 pages, PDF
Abstract: The methodology of Edinburgh LCF, a mechanized interactive proof system, is illustrated through a problem suggested by Gloess – the proof of a simple parsing algorithm. The paper is self-contained, giving only the relevant details of the LCF proof system. It is shown how tactics may be composed in LCF to yield a strategy which is appropriate for the parser problem but which is also of a generally useful form. Also illustrated is a general mechanized method of deriving structural induction rules within the system.

UCAM-CL-TR-21
A. Cohn:
The correctness of a precedence parsing algorithm in LCF
April 1982, 38 pages, PDF
Abstract: This paper describes the proof in the LCF system of a correctness property of a precedence parsing algorithm. The work is an extension of a simpler parser and proof by Cohn and Milner (Cohn & Milner 1982). Relevant aspects of the LCF system are presented as needed. In this paper, we emphasize (i) that although the current proof is much more complex than the earlier one, many of the same metalanguage strategies and aids developed for the first proof are used in this proof, and (ii) that (in both cases) a general strategy for doing some limited forward search is incorporated neatly into the overall goal-oriented proof framework.

UCAM-CL-TR-22
M. Robson:
Constraints in CODD
18 pages, PDF
Abstract: The paper describes the implementation of the data structuring concepts of domains, intra-tuple constraints and referential constraints in the relational DBMS CODD. All of these constraints capture some of the semantics of the database’s application.
Each class of constraint is described briefly and it is shown how each of them is specified. The constraints are stored in the database giving a centralised data model, which contains descriptions of procedures as well as of statistic structures. Some extensions to the notion of referential constraint are proposed and it is shown how generalisation hierarchies can be expressed as sets of referential constraints. It is shown how the stored data model is used in enforcement of the constraints.

UCAM-CL-TR-23
J.I. Tait:
Two papers about the scrabble summarising system
12 pages, PDF
Abstract: This report contains two papers which describe parts of the Scrabble English summarizing system. The first, “Topic identification techniques for predictive language analyzers” has been accepted as a short communication for the 9th International Conference on Computational Linguistics, in Prague. The second, “General summaries using a predictive language analyser” is an extended version of a discussion paper which will be presented at the European Conference on Artificial Intelligence in Paris. Both conferences will take place during July 1982.
The [second] paper describes a computer system capable of producing coherent summaries of English texts even when they contain sections which the system has not understood completely. The system employs an analysis phase which is not dissimilar to a script applier together with a rather more sophisticated summariser than previous systems. Some deficiencies of earlier systems are pointed out, and ways in which the current implementation overcomes them are discussed.

UCAM-CL-TR-24
B.K. Boguraev, K. Spärck Jones:
Steps towards natural language to data language translation using general semantic information
March 1982, 8 pages, paper copy

UCAM-CL-TR-25
Hiyan Alshawi:
A clustering technique for semantic network processing
May 1982, 9 pages, paper copy
UCAM-CL-TR-26
Brian James Knight:
Portable system software for personal computers on a network
204 pages, paper copy
PhD thesis (Churchill College, April 1982)

UCAM-CL-TR-27
Martyn Alan Johnson:
Exception handling in domain based systems
129 pages, paper copy
PhD thesis (Churchill College, September 1981)

UCAM-CL-TR-28
D.C.J. Matthews:
Poly report
August 1982, 17 pages, PDF
Abstract: Poly was designed to provide a programming system with the same flexibility as a dynamically typed language but without the run-time overheads. The type system, based on that of Russell, allows polymorphic operations to be used to manipulate abstract objects, but with all the type checking being done at compile-time. Types may be passed explicitly or by inference as parameters to procedures, and may be returned from procedures. Overloading of names and generic types can be simulated by using the general procedure mechanism. Despite the generality of the language, or perhaps because of it, the type system is very simple, consisting of only three classes of object. There is an exception mechanism, similar to that of CLU, and the exceptions raised in a procedure are considered as part of its ‘type’. The construction of abstract objects and hiding of internal details of the representation come naturally out of the type system.

UCAM-CL-TR-29
D.C.J. Matthews:
Introduction to Poly
May 1982, 24 pages, PDF
Abstract: This report is a tutorial introduction to the programming language Poly. It describes how to write and run programs in Poly using the VAX/UNIX implementation. Examples given include polymorphic list functions, a double precision integer package and a subrange type constructor.

UCAM-CL-TR-30
John Wilkes:
A portable BCPL library
October 1982, 31 pages, paper copy

UCAM-CL-TR-31
J. Fairbairn:
Ponder and its type system
November 1982, 42 pages, PDF
Abstract: This note describes the programming language “Ponder”, which is designed according to the principles of referential transparency and “orthogonality” as in [vWijngaarden 75]. Ponder is designed to be simple, being functional with normal order semantics. It is intended for writing large programmes, and to be easily tailored to a particular application. It has a simple but powerful polymorphic type system.
The main objective of this note is to describe the type system of Ponder. As with the whole of the language design, the smallest possible number of primitives is built in to the type system. Hence for example, unions and pairs are not built in, but can be constructed from other primitives.

UCAM-CL-TR-32
B.K. Boguraev, K. Spärck Jones:
How to drive a database front end using general semantic information
November 1982, 20 pages, paper copy

UCAM-CL-TR-33
John A. Carroll:
An island parsing interpreter for Augmented Transition Networks
October 1982, 50 pages, PDF
Abstract: This paper describes the implementation of an ‘island parsing’ interpreter for an Augmented Transition Network (ATN). The interpreter provides more complete coverage of Woods’ original ATN formalism than his later island parsing implementation; it is written in LISP and has been modestly tested.
UCAM-CL-TR-34
Larry Paulson:
Recent developments in LCF: examples of structural induction
January 1983, 15 pages, paper copy

UCAM-CL-TR-35
Larry Paulson:
Rewriting in Cambridge LCF
February 1983, 32 pages, DVI
Abstract: Many automatic theorem-provers rely on rewriting. Using theorems as rewrite rules helps to simplify the subgoals that arise during a proof.
LCF is an interactive theorem-prover intended for reasoning about computation. Its implementation of rewriting is presented in detail. LCF provides a family of rewriting functions, and operators to combine them. A succession of functions is described, from pattern matching primitives to the rewriting tool that performs most inferences in LCF proofs.
The design is highly modular. Each function performs a basic, specific task, such as recognizing a certain form of tautology. Each operator implements one method of building a rewriting function from simpler ones. These pieces can be put together in numerous ways, yielding a variety of rewriting strategies.
The approach involves programming with higher-order functions. Rewriting functions are data values, produced by computation on other rewriting functions. The code is in daily use at Cambridge, demonstrating the practical use of functional programming.

UCAM-CL-TR-36
Lawrence Paulson:
The revised logic PPLAMBDA
A reference manual
March 1983, 28 pages, PDF
Abstract: PPLAMBDA is the logic used in the Cambridge LCF proof assistant. It allows Natural Deduction proofs about computation, in Scott’s theory of partial orderings. The logic’s syntax, axioms, primitive inference rules, derived inference rules and standard lemmas are described as are the LCF functions for building and taking apart PPLAMBDA formulas.
PPLAMBDA’s rule of fixed-point induction admits a wide class of inductions, particularly where flat or finite types are involved. The user can express and prove these type properties in PPLAMBDA. The induction rule accepts a list of theorems, stating type properties to consider when deciding to admit an induction.

UCAM-CL-TR-37
Christopher Gray Girling:
Representation and authentication on computer networks
154 pages, paper copy
PhD thesis (Queens’ College, April 1983)

UCAM-CL-TR-38
Mike Gray:
Views and imprecise information in databases
119 pages, paper copy
PhD thesis (November 1982)

UCAM-CL-TR-39
Lawrence Paulson:
Tactics and tacticals in Cambridge LCF
July 1983, 26 pages, paper copy

UCAM-CL-TR-40
W. Stoye:
The SKIM microprogrammer’s guide
October 1983, 33 pages, paper copy

UCAM-CL-TR-41
Mike Gordon:
LCF LSM, A system for specifying and verifying hardware
47 pages, paper copy

UCAM-CL-TR-42
Mike Gordon:
Proving a computer correct with the LCF LSM hardware verification system
49 pages, paper copy
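The combinator style summarised in the UCAM-CL-TR-35 abstract (rewriting functions treated as data values and built from simpler ones by operators) can be illustrated with a minimal Python sketch. This is not LCF code and does not reproduce the report's functions: the nested-tuple term encoding, the two sample rules, and the names `orelse`, `then`, `repeat` and `bottom_up` are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional

Term = Any                                   # atoms are str, compound terms are tuples
Rewriter = Callable[[Term], Optional[Term]]  # returns None when it does not apply


def add_zero(t: Term) -> Optional[Term]:
    """Hypothetical rule: ('+', x, '0') rewrites to x."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "+" and t[2] == "0":
        return t[1]
    return None


def mul_one(t: Term) -> Optional[Term]:
    """Hypothetical rule: ('*', x, '1') rewrites to x."""
    if isinstance(t, tuple) and len(t) == 3 and t[0] == "*" and t[2] == "1":
        return t[1]
    return None


def orelse(r1: Rewriter, r2: Rewriter) -> Rewriter:
    """Try r1; if it fails, try r2."""
    def run(t: Term) -> Optional[Term]:
        out = r1(t)
        return out if out is not None else r2(t)
    return run


def then(r1: Rewriter, r2: Rewriter) -> Rewriter:
    """Apply r1, then r2 to its result; fail if either fails."""
    def run(t: Term) -> Optional[Term]:
        mid = r1(t)
        return None if mid is None else r2(mid)
    return run


def repeat(r: Rewriter) -> Callable[[Term], Term]:
    """Apply r until it fails; always returns a term."""
    def run(t: Term) -> Term:
        while True:
            out = r(t)
            if out is None:
                return t
            t = out
    return run


def bottom_up(r: Rewriter) -> Callable[[Term], Term]:
    """Rewrite subterms first, then the root. Adequate for rules whose
    result is an already-rewritten subterm, as with the two rules above."""
    def run(t: Term) -> Term:
        if isinstance(t, tuple):
            t = (t[0],) + tuple(run(a) for a in t[1:])
        return repeat(r)(t)
    return run


simplify = bottom_up(orelse(add_zero, mul_one))
```

Here `simplify` is itself a value computed from other rewriting functions; swapping `orelse` for `then`, or changing the traversal, yields a different strategy without touching the individual rules.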
UCAM-CL-TR-43
Ian Malcom Leslie:
Extending the local area network
71 pages, paper copy
PhD thesis (Darwin College, February 1983)
Abstract: This dissertation is concerned with the development of a large computer network which has many properties associated with local area computer networks, including high bandwidth and lower error rates. The network is made up of component local area networks, specifically Cambridge rings, which are connected either through local ring-ring bridges or through a high capacity satellite link. In order to take advantage of the characteristics of the resulting network, the protocols used are the same simple protocols as those used on a single Cambridge ring. This in turn allows many applications, which might have been thought of as local area network applications, to run on the larger network.
Much of this work is concerned with an interconnection strategy which allows hosts of different component networks to communicate in a flexible manner without building an extra internetwork layer into protocol hierarchy. The strategy arrived at is neither a datagram approach nor a system of concatenated error and flow controlled virtual circuits. Rather, it is a lightweight virtual circuit approach which preserves the order of blocks sent on a circuit, but which makes no other guarantees about the delivery of these blocks. An extra internetwork protocol layer is avoided by modifying the system used on a single Cambridge ring which binds service names to addresses so that it now binds service names to routes across the network.

UCAM-CL-TR-44
Lawrence Paulson:
Structural induction in LCF
November 1983, 35 pages, paper copy

UCAM-CL-TR-45
Karen Spärck Jones:
Compound noun interpretation problems
July 1983, 16 pages, PDF
Abstract: This paper discusses the problems of compound noun interpretation in the context of automatic language processing. Given that compound processing implies identifying the senses of the words involved, determining their bracketing, and establishing their underlying semantic relations, the paper illustrates the need, even in comparatively favourable cases, for inference using pragmatic information. This has consequences for language processor architectures and, even more, for speech processors.

UCAM-CL-TR-46
Nicholas Henry Garnett:
Intelligent network interfaces
140 pages, paper copy
PhD thesis (Trinity College, May 1983)

UCAM-CL-TR-47
John Irving Tait:
Automatic summarising of English texts
137 pages, PDF
PhD thesis (Wolfson College, December 1982)
Abstract: This thesis describes a computer program called Scrabble which can summarise short English texts. It uses large bodies of predictions about the likely contents of texts about particular topics to identify the commonplace material in an input text. Pre-specified summary templates, each associated with a different topic are used to condense the commonplace material in the input. Filled-in summary templates are then used to form a framework into which unexpected material in the input may be fitted, allowing unexpected material to appear in output summary texts in an essentially unreduced form. The system’s summaries are in English.
The program is based on technology not dissimilar to a script applier. However, Scrabble represents a significant advance over previous script-based summarising systems. It is much less likely to produce misleading summaries of an input text than some previous systems and can operate with less information about the subject domain of the input than others.
These improvements are achieved by the use of three main novel ideas. First, the system incorporates a new method for identifying the idea or topics of an input text. Second, it allows a section of text to have more than one topic at a time, or at least a composite topic which may be dealt with by the computer program simultaneously applying the text predictions associated with more than one simple topic. Third, Scrabble incorporates new mechanisms for the incorporation of unexpected material in the input into its output summary texts. The incorporation of such material in the output summary is motivated by the view that it is precisely unexpected material which is likely to form the most salient matter in the input text.
The performance of the system is illustrated by means of a number of example input texts and their Scrabble summaries.
UCAM-CL-TR-48
Hiyan Alshawi:
A mechanism for the accumulation and application of context in text processing
November 1983, 17 pages, PDF
Abstract: The paper describes a mechanism for the representation and application of context information for automatic natural language processing systems. Context information is gathered gradually during the reading of the text, and the mechanism gives a way of combining the effect of several different types of context factors. Context factors can be managed independently, while still allowing efficient access to entities in focus. The mechanism is claimed to be more general than the global focus mechanism used by Grosz for discourse understanding. Context affects the interpretation process by choosing the results, and restricting the processing, of a number of important language interpretation operations, including lexical disambiguation and reference resolution. The types of context factors that have been implemented in an experimental system are described, and examples of the application of context are given.

UCAM-CL-TR-49
David Charles James Matthews:
Programming language design with polymorphism
143 pages, paper copy
PhD thesis (Wolfson College, 1983)

UCAM-CL-TR-50
Lawrence Paulson:
Verifying the unification algorithm in LCF
March 1984, 28 pages, PDF
Abstract: Manna and Waldinger’s theory of substitutions and unification has been verified using the Cambridge LCF theorem prover. A proof of the monotonicity of substitution is presented in detail, as an example of interaction with LCF. Translating the theory into LCF’s domain-theoretic logic is largely straightforward. Well-founded induction on a complex ordering is translated into nested structural inductions. Correctness of unification is expressed using predicates for such properties as idempotence and most-generality. The verification is presented as a series of lemmas. The LCF proofs are compared with the original ones, and with other approaches. It appears difficult to find a logic that is both simple and flexible, especially for proving termination.

UCAM-CL-TR-51
Glynn Winskel, Kim Guldstrand Larsen:
Using information systems to solve recursive domain equations effectively
41 pages, paper copy

UCAM-CL-TR-52
Steven Temple:
The design of a ring communication network
132 pages, PDF
PhD thesis (Corpus Christi College, January 1984)
Abstract: This dissertation describes the design of a high speed local area network. Local networks have been in use now for over a decade and there is a proliferation of different systems, experimental ones which are not widely used and commercial ones installed in hundreds of locations. For a new network design to be of interest from the research point of view it must have a feature or features which set it apart from existing networks and make it an improvement over existing systems. In the case of the network described, the research was started to produce a network which was considerably faster than current designs, but which retained a high degree of generality.
As the research progressed, other features were considered, such as ways to reduce the cost of the network and the ability to carry data traffic of many different types. The emphasis on high speed is still present but other aspects were considered and are discussed in the dissertation. The network has been named the Cambridge Fast Ring and the network hardware is currently being implemented as an integrated circuit at the University of Cambridge Computer Laboratory.
The aim of the dissertation is to describe the background to the design and the decisions which were made during the design process, as well as the design itself. The dissertation starts with a survey of the uses of local area networks and examines some established networks in detail. It then proceeds by examining the characteristics of a current network installation to assess what is required of the network in that and similar applications. The major design considerations for a high speed network controller are then discussed and a design is presented. Finally, the design of computer interfaces and protocols for the network is discussed.

UCAM-CL-TR-53
Jon Fairbairn:
A new type-checker for a functional language
16 pages, paper copy
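To make the properties named in the UCAM-CL-TR-50 abstract concrete, the following Python sketch shows a common textbook first-order unification procedure with an occurs check, and checks that the substitution it produces behaves idempotently. It is not Manna and Waldinger's formulation and has nothing to do with the LCF proof itself; the term encoding (variables as strings beginning with `?`, compound terms as tuples) and all function names are assumptions made for illustration.

```python
def is_var(t):
    """Variables are strings beginning with '?' (an illustrative convention)."""
    return isinstance(t, str) and t.startswith("?")


def subst(s, t):
    """Apply substitution s (a dict from variables to terms) to t, resolving chains."""
    if is_var(t):
        return subst(s, s[t]) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(s, a) for a in t[1:])
    return t


def occurs(v, t, s):
    """True if variable v occurs in t once s has been applied."""
    t = subst(s, t)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(a, b, s=None):
    """Return a substitution unifying a and b, or None if there is none."""
    s = {} if s is None else s
    a, b = subst(s, a), subst(s, b)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


left = ("f", "?x", ("g", "?y"))
right = ("f", ("g", "?y"), "?x")
theta = unify(left, right)
```

With these definitions, `subst(theta, left)` and `subst(theta, right)` are the same term, and applying `theta` a second time changes nothing, which is the idempotence property in miniature. Termination of `unify` is exactly the kind of argument the abstract describes as hard to formalise.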
UCAM-CL-TR-54
Lawrence Paulson:
Lessons learned from LCF
August 1984, 16 pages, paper copy
UCAM-CL-TR-55
Ben Moszkowski:
Executing temporal logic programs
August 1984, 27 pages, paper copy
UCAM-CL-TR-56
William Stoye:
A new scheme for writing functional
operating systems
30 pages, paper copy
UCAM-CL-TR-57
Lawrence C. Paulson:
Constructing recursion operators in
intuitionistic type theory
October 1984, 46 pages, PDF
Abstract: Martin-Löf’s Intuitionistic Theory of Types is becoming popular for formal reasoning about computer programs. To handle recursion schemes other than primitive recursion, a theory of well-founded relations is presented. Using primitive recursion over higher types, induction and recursion are formally derived for a large class of well-founded relations. Included are < on natural numbers, and relations formed by inverse images, addition, multiplication, and exponentiation of other relations. The constructions are given in full detail to allow their use in theorem provers for Type Theory, such as Nuprl. The theory is compared with work in the field of ordinal recursion over higher types.
UCAM-CL-TR-58
Glynn Winskel:
Categories of models for concurrency
35 pages, paper copy
UCAM-CL-TR-59
Glynn Winskel:
On the composition and
decomposition of assertions
35 pages, paper copy
UCAM-CL-TR-60
Hiyan Alshawi:
Memory and context mechanisms for
automatic text processing
192 pages, paper copy
PhD thesis (Trinity Hall, December 1983)
UCAM-CL-TR-61
Karen Spärck Jones:
User models and expert systems
December 1984, 44 pages, paper copy
UCAM-CL-TR-62
Michael Robson:
Constraint enforcement in a
relational database management
system
106 pages, paper copy
PhD thesis (St John’s College, March 1984)
UCAM-CL-TR-63
David C.J. Matthews:
Poly manual
February 1985, 46 pages, paper copy
UCAM-CL-TR-64
Branimir K. Boguraev, Karen Spärck Jones:
A framework for inference in
natural language front ends to
databases
February 1985, 73 pages, paper copy
UCAM-CL-TR-65
Mark Tillotson:
Introduction to the programming
language “Ponder”
May 1985, 57 pages, paper copy
UCAM-CL-TR-66
M.J.C. Gordon, J. Herbert:
A formal hardware verification
methodology and its application to a
network interface chip
35 pages, paper copy
UCAM-CL-TR-67
Lawrence C. Paulson:
Natural deduction theorem proving
via higher-order resolution
May 1985, 19 pages, paper copy
UCAM-CL-TR-68
Mike Gordon:
HOL
A machine oriented formulation of
higher order logic
July 1985, 52 pages, paper copy
UCAM-CL-TR-69
Lawrence C. Paulson:
Proving termination of normalization
functions for conditional expressions
June 1985, 16 pages, PDF
Abstract: Boyer and Moore have discussed a recursive function that puts conditional expressions into normal form. It is difficult to prove that this function terminates on all inputs. Three termination proofs are compared: (1) using a measure function, (2) in domain theory using LCF, (3) showing that its “recursion relation”, defined by the pattern of recursive calls, is well-founded. The last two proofs are essentially the same though conducted in markedly different logical frameworks. An obviously total variant of the normalize function is presented as the ‘computational meaning’ of those two proofs.
A related function makes nested recursive calls. The three termination proofs become more complex: termination and correctness must be proved simultaneously. The recursion relation approach seems flexible enough to handle subtle termination proofs where previously domain theory seemed essential.
UCAM-CL-TR-70
Kenneth Graham Hamilton:
A remote procedure call system
109 pages, paper copy
PhD thesis (Wolfson College, December 1984)
UCAM-CL-TR-71
Ben Moszkowski:
Executing temporal logic programs
August 1985, 96 pages, paper copy
UCAM-CL-TR-72
W.F. Clocksin:
Logic programming and the
specification of circuits
13 pages, paper copy
UCAM-CL-TR-73
Daniel Hammond Craft:
Resource management in a
distributed computing system
116 pages, paper copy
PhD thesis (St John’s College, March 1985)
UCAM-CL-TR-74
Mike Gordon:
Hardware verification by formal proof
6 pages, paper copy
UCAM-CL-TR-75
Jon Fairbairn:
Design and implementation of a
simple typed language based on the
lambda-calculus
May 1985, 107 pages, PDF
PhD thesis (Gonville & Caius College, December 1984)
Abstract: Despite the work of Landin and others as long ago as 1966, almost all recent programming languages are large and difficult to understand. This thesis is a re-examination of the possibility of designing and implementing a small but practical language based on very few primitive constructs.
The text records the syntax and informal semantics of a new language called Ponder. The most notable features of the work are a powerful type-system and an efficient implementation of normal order reduction.
In contrast to Landin’s ISWIM, Ponder is statically typed, an expedient that increases the simplicity of the language by removing the requirement that operations must be defined for incorrect arguments. The type system is a powerful extension of Milner’s polymorphic type system for ML in that it allows local quantification of types. This extension has the advantage that types that would otherwise need to be primitive may be defined.
The criteria for the well-typedness of Ponder programmes are presented in the form of a natural deduction system in terms of a relation of generality between types. A new type checking algorithm derived from these rules is proposed.
Ponder is built on the λ-calculus without the need for additional computation rules. In spite of this abstract foundation an efficient implementation based on Hughes’ super-combinator approach is described. Some evidence of the speed of Ponder programmes is included.
The same strictures have been applied to the design of the syntax of Ponder, which, rather than having many pre-defined clauses, allows the addition of new constructs by the use of a simple extension mechanism.
UCAM-CL-TR-76
R.C.B. Cooper, K.G. Hamilton:
Preserving abstraction in concurrent
programming
16 pages, paper copy
UCAM-CL-TR-77
Mike Gordon:
Why higher-order logic is a
good formalisation for specifying and
verifying hardware
27 pages, paper copy
UCAM-CL-TR-78
Glynn Winskel:
A complete proof system for SCCS
with modal assertions
23 pages, paper copy
UCAM-CL-TR-79
Glynn Winskel:
Petri nets, algebras and morphisms
38 pages, PDF
Abstract: It is shown how a category of Petri nets can be viewed as a subcategory of two sorted algebras over multisets. This casts Petri nets in a familiar framework and provides a useful idea of morphism on nets different from the conventional definition – the morphisms here respect the behaviour of nets. The categorical constructions which result provide a useful way to synthesise nets and reason about nets in terms of their components; for example various forms of parallel composition of Petri nets arise naturally from the product in the category. This abstract setting makes plain a useful functor from the category of Petri nets to a category of spaces of invariants and provides insight into the generalisations of the basic definition of Petri nets – for instance the coloured and higher level nets of Kurt Jensen arise through a simple modification of the sorts of the algebras underlying nets. Further it provides a smooth formal relation with other models of concurrency such as Milner’s Calculus of Communicating Systems (CCS) and Hoare’s Communicating Sequential Processes (CSP).
UCAM-CL-TR-80
Lawrence C. Paulson:
Interactive theorem proving with
Cambridge LCF
A user’s manual
November 1985, 140 pages, paper copy
UCAM-CL-TR-81
William Robert Stoye:
The implementation of functional
languages using custom hardware
December 1985, 151 pages, PDF
PhD thesis (Magdalene College, May 1985)
Abstract: In recent years functional programmers have
produced a great many good ideas but few results.
While the use of functional languages has been enthusiastically advocated, few real application areas have
been tackled and so the functional programmer’s views
and ideas are met with suspicion.
The prime cause of this state of affairs is the lack of
widely available, solid implementations of functional
languages. This in turn stems from two major causes:
(1) Our understanding of implementation techniques
was very poor only a few years ago, and so any implementation that is “mature” is also likely to be unusably slow. (2) While functional languages are excellent for expressing algorithms, there is still considerable
debate in the functional programming community over
the way in which input and output operations should
be represented to the programmer. Without clear guiding principles implementors have tended to produce ad hoc, inadequate solutions.
My research is concerned with strengthening the
case for functional programming. To this end I constructed a specialised processor, called SKIM, which
could evaluate functional programs quickly. This allowed experimentation with various implementation
methods, and provided a high performance implementation with which to experiment with writing large
functional programs.
This thesis describes the resulting work and includes
the following new results: (1) Details of a practical
Turner-style combinator reduction implementation featuring greatly improved storage use compared with previous methods. (2) An implementation of Kennaway’s
director string idea that further enhances performance
and increases understanding of a variety of reduction
strategies. (3) Comprehensive suggestions concerning
the representation of input, output, and nondeterministic tasks using functional languages, and the writing
of operating systems. Details of the implementation of
these suggestions developed on SKIM. (4) A number of
observations concerning functional programming in general based on considerable practical experience.
UCAM-CL-TR-82
Lawrence C. Paulson:
Natural deduction proof as
higher-order resolution
December 1985, 25 pages, PDF
Abstract: An interactive theorem prover, Isabelle, is under development. In LCF, each inference rule is represented by one function for forwards proof and another
(a tactic) for backwards proof. In Isabelle, each inference rule is represented by a Horn clause. Resolution
gives both forwards and backwards proof, supporting
a large class of logics. Isabelle has been used to prove
theorems in Martin-Löf’s Constructive Type Theory.
Quantifiers pose several difficulties: substitution,
bound variables, Skolemization. Isabelle’s representation of logical syntax is the typed lambda-calculus, requiring higher-order unification. It may have potential
for logic programming. Depth-first search using inference rules constitutes a higher-order Prolog.
UCAM-CL-TR-83
Ian David Wilson:
Operating system design for
large personal workstations
203 pages, paper copy
PhD thesis (Darwin College, July 1985)
UCAM-CL-TR-84
Martin Richards:
BSPL:
a language for describing the
behaviour of synchronous hardware
April 1986, 56 pages, paper copy
UCAM-CL-TR-85
Glynn Winskel:
Category theory and models for
parallel computation
April 1986, 16 pages, PDF
Abstract: This report will illustrate two uses of category
theory: Firstly the use of category theory to define semantics in a particular model. How semantic constructions can often be seen as categorical ones, and, in particular, how parallel compositions are derived from a
categorical product and a non-deterministic sum. These
categorical notions can provide a basis for reasoning about computations and will be illustrated for the
model of Petri nets.
Secondly, the use of category theory to relate different semantics will be examined; specifically, how the relations between various concrete models like Petri nets,
event structures, trees and state machines are expressed
as adjunctions. This will be illustrated by showing the
coreflection between safe Petri nets and trees.
UCAM-CL-TR-86
Stephen Christopher Crawley:
The Entity System: an object based
filing system
April 1986, 120 pages, paper copy
PhD thesis (St John’s College, December 1985)
UCAM-CL-TR-87
Kathleen Anne Carter:
Computer-aided type face design
May 1986, 160 pages, paper copy
PhD thesis (King’s College, November 1985)
UCAM-CL-TR-88
David Maclean Carter:
A shallow processing approach to
anaphor resolution
May 1986, 233 pages, paper copy
PhD thesis (King’s College, December 1985)
UCAM-CL-TR-89
Jon Fairbairn:
Making form follow function
An exercise in functional
programming style
June 1986, 9 pages, PDF
Abstract: The combined use of user-defined infix operators and higher order functions allows the programmer to invent new control structures tailored to a particular problem area.
This paper is to suggest that such a combination has beneficial effects on the ease of both writing and reading programmes, and hence can increase programmer productivity. As an example, a parser for a simple language is presented in this style.
It is hoped that the presentation will be palatable to people unfamiliar with the concepts of functional programming.
UCAM-CL-TR-90
Andy Hopper, Roger M. Needham:
The Cambridge Fast Ring networking
system (CFR)
June 1986, 25 pages, paper copy
UCAM-CL-TR-91
Albert Camilleri, Mike Gordon,
Tom Melham:
Hardware verification using
higher-order logic
September 1986, 25 pages, PDF
Abstract: The Hardware Verification Group at the University of Cambridge is investigating how various kinds of digital systems can be verified by mechanised formal proof. This paper explains our approach to representing behaviour and structure using higher order logic. Several examples are described including a ripple carry adder and a sequential device for computing the factorial function. The dangers of inaccurate models are illustrated with a CMOS exclusive-or gate.
UCAM-CL-TR-92
Stuart Charles Wray:
Implementation and programming
techniques for functional languages
June 1986, 117 pages, paper copy
PhD thesis (Christ’s College, January 1986)
UCAM-CL-TR-93
J.P. Bennett:
Automated design of an instruction
set for BCPL
June 1986, 56 pages, paper copy
UCAM-CL-TR-94
Avra Cohn, Mike Gordon:
A mechanized proof of correctness of
a simple counter
June 1986, 80 pages, paper copy
UCAM-CL-TR-95
Glynn Winskel:
Event structures
Lecture notes for the Advanced
Course on Petri Nets
July 1986, 69 pages, PDF
Abstract: Event structures are a model of computational processes. They represent a process as a set of event occurrences with relations to express how events causally depend on others. This paper introduces event structures, shows their relationship to Scott domains and Petri nets, and surveys their role in denotational semantics, both for modelling languages like CCS and CSP and languages with higher types.
UCAM-CL-TR-96
Glynn Winskel:
Models and logic of MOS circuits
Lectures for the Marktoberdorf
Summerschool, August 1986
October 1986, 47 pages, paper copy
UCAM-CL-TR-97
Alan Mycroft:
A study on abstract interpretation
and “validating microcode
algebraically”
October 1986, 22 pages, paper copy
UCAM-CL-TR-98
E. Robinson:
Power-domains, modalities and the
Vietoris monad
October 1986, 16 pages, PDF
Abstract: It is possible to divide the syntax-directed approaches to programming language semantics into two classes, “denotational”, and “proof-theoretic”. This paper argues for a different approach which also has the effect of linking the two methods. Drawing on recent work on locales as formal spaces we show that this provides a way in which we can hope to use a proof-theoretical semantics to give us a denotational one. This paper reviews aspects of the general theory, before developing a modal construction on locales and discussing the view of power-domains as free non-deterministic algebras. Finally, the relationship between the present work and that of Winskel is examined.
UCAM-CL-TR-99
David C.J. Matthews:
An overview of the Poly
programming language
August 1986, 11 pages, paper copy
UCAM-CL-TR-100
Jeff Joyce, Graham Birtwistle, Mike Gordon:
Proving a computer correct in
higher order logic
December 1986, 57 pages, paper copy
UCAM-CL-TR-101
David Russel Milway:
Binary routing networks
December 1986, 131 pages, paper copy
PhD thesis (Darwin College, December 1986)
UCAM-CL-TR-102
David C.J. Matthews:
A persistent storage system for Poly
and ML
January 1987, 16 pages, paper copy
UCAM-CL-TR-103
Mike Gordon:
HOL
A proof generating system for
higher-order logic
January 1987, 56 pages, paper copy
UCAM-CL-TR-104
Avra Cohn:
A proof of correctness of
the Viper microprocessor:
the first level
January 1987, 46 pages, PDF
Abstract: The Viper microprocessor designed at the Royal Signals and Radar Establishment (RSRE) is one of the first commercially produced computers to have been developed using modern formal methods. Viper is specified in a sequence of decreasingly abstract levels. In this paper a mechanical proof of the equivalence of the first two of these levels is described. The proof was generated using a version of Robin Milner’s LCF system.
UCAM-CL-TR-105
Glynn Winskel:
A compositional model of MOS
circuits
April 1987, 25 pages, paper copy
UCAM-CL-TR-106
Thomas F. Melham:
Abstraction mechanisms for
hardware verification
May 1987, 23 pages, paper copy
UCAM-CL-TR-107
Thierry Coquand, Carl Gunter,
Glynn Winskel:
DI-domains as a model of
polymorphism
May 1987, 19 pages, paper copy
UCAM-CL-TR-108
Andrew John Wilkes:
Workstation design for distributed
computing
June 1987, 179 pages, PDF
PhD thesis (Wolfson College, June 1984)
Abstract: This thesis discusses some aspects of the design of computer systems for local area networks (LANs), with particular emphasis on the way such systems present themselves to their users. Too little attention to this issue frequently results in computing environments that cannot be extended gracefully to accommodate new hardware or software and do not present consistent, uniform interfaces to either their human users or their programmatic clients. Before computer systems can become truly ubiquitous tools, these problems of extensibility and accessibility must be solved. This dissertation therefore seeks to examine one possible approach, emphasising support for program development on LAN based systems.
UCAM-CL-TR-109
Jeffrey Joyce:
Hardware verification of VLSI regular
structures
July 1987, 20 pages, paper copy
UCAM-CL-TR-110
Glynn Winskel:
Relating two models of hardware
July 1987, 16 pages, paper copy
UCAM-CL-TR-111
K. Spärck Jones:
Realism about user modelling
June 1987, 32 pages, PDF
Abstract: This paper reformulates the framework for user modelling presented in an earlier technical report, ‘User Models and Expert Systems’, and considers the implications of the real limitations on the knowledge likely to be available to a system for the value and application of user models.
UCAM-CL-TR-112
D.A. Wolfram:
Reducing thrashing by adaptive
backtracking
August 1987, 15 pages, paper copy
UCAM-CL-TR-113
Lawrence C. Paulson:
The representation of logics in
higher-order logic
August 1987, 29 pages, paper copy
UCAM-CL-TR-114
Stephen Ades:
An architecture for integrated
services on the local area network
September 1987, 166 pages, PDF
PhD thesis (Trinity College, January 1987)
Abstract: This dissertation concerns the provision of integrated services in a local area context, e.g. on business premises. The term integrated services can be understood at several levels. At the lowest, one network
may be used to carry traffic of several media—voice,
data, images etc. Above that, the telephone exchange
may be replaced by a more versatile switching system,
incorporating facilities such as stored voice messages.
Its facilities may be accessible to the user through the
interface of the workstation rather than a telephone.
At a higher level still, new services such as multi-media
document manipulation may be added to the capabilities of a workstation.
Most of the work to date has been at the lowest
of these levels, under the auspices of the Integrated
Services Digital Network (ISDN), which mainly concerns wide area communications systems. The thesis
presented here is that all of the above levels are important in a local area context. In an office environment,
sophisticated data processing facilities in a workstation
can usefully be combined with highly available telecommunications facilities such as the telephone, to offer the
user new services which make the working day more
pleasant and productive. That these facilities should be
provided across one integrated network, rather than by
several parallel single medium networks is an important
organisational convenience to the system builder.
The work described in this dissertation is relevant
principally in a local area context—in the wide area
economics and traffic balance dictate that the emphasis will be on only the network level of integration for
some time now. The work can be split into three parts:
i) the use of a packet network to carry mixed media. This has entailed design of packet voice protocols which produce delays low enough for the network
to interwork with national telephone networks. The
system has also been designed for minimal cost per
telephone—packet-switched telephone systems have
traditionally been more expensive than circuit-switched
types. The network used as a foundation for this work
has been the Cambridge Fast Ring.
ii) use of techniques well established in distributed
computing systems to build an ‘integrated services
PABX (Private Automatic Branch Exchange)’. Current
PABX designs have a very short life expectancy and an
alarmingly high proportion of their costs is due to software. The ideas presented here can help with both of
these problems, produce an extensible system and provide a basis for new multi-media services.
iii) development of new user level Integrated Services. Work has been done in three areas. The first is
multi-media documents. A voice editing interface is described along with the system structure required to support it. Secondly a workstation display has been built to
support a variety of services based upon image manipulation and transmission. Finally techniques have been
demonstrated by which a better interface to telephony
functions can be provided to the user, using methods of
control typical of workstation interfaces.
UCAM-CL-TR-115
I.S. Dhingra:
Formal validation of an integrated
circuit design style
August 1987, 29 pages, paper copy
UCAM-CL-TR-116
Thierry Coquand, Carl Gunter,
Glynn Winskel:
Domain theoretic models of
polymorphism
September 1987, 52 pages, paper copy
UCAM-CL-TR-117
J.M. Bacon, K.G. Hamilton:
Distributed computing with RPC:
the Cambridge approach
October 1987, 15 pages, PDF
Abstract: The Cambridge Distributed Computing System (CDCS) is described and its evolution outlined.
The Mayflower project allowed CDCS infrastructure,
services and applications to be programmed in a high
level, object oriented, language, Concurrent CLU. The
Concurrent CLU RPC facility is described in detail. It
is a non-transparent, type checked, type safe system
which employs dynamic binding and passes objects of
arbitrary graph structure. Recent extensions accommodate a number of languages and transport protocols. A
comparison with other RPC schemes is given.
UCAM-CL-TR-118
B.K. Boguraev, K. Spärck Jones:
Material concerning a study of
cases
May 1987, 31 pages, paper copy
UCAM-CL-TR-119
Robert Cooper:
Pilgrim: a debugger for distributed
systems
July 1987, 19 pages, paper copy
UCAM-CL-TR-120
D. Wheeler:
Block encryption
November 1987, 4 pages, PDF
Abstract: A fast and simple way of encrypting computer data is needed. The UNIX crypt is a good way of doing this although the method is not cryptographically sound for text. The method suggested here is applied to larger blocks than the DES method which uses 64 bit blocks, so that the speed of encyphering is reasonable. The algorithm is designed for software rather than hardware. This forgoes two advantages of the crypt algorithm, namely that each character can be encoded and decoded independently of other characters and that the identical process is used both for encryption and decryption. However this method is better for coding blocks directly.
UCAM-CL-TR-121
Jonathan Billington:
A high-level petri net specification of
the Cambridge Fast Ring M-access
service
December 1987, 31 pages, paper copy
UCAM-CL-TR-122
John Herbert:
Temporal abstraction of digital
designs
February 1988, 34 pages, paper copy
UCAM-CL-TR-123
John Herbert:
Case study of the Cambridge Fast
Ring ECL chip using HOL
February 1988, 38 pages, paper copy
UCAM-CL-TR-124
John Herbert:
Formal verification of basic memory
devices
February 1988, 46 pages, paper copy
UCAM-CL-TR-125
Juanito Camilleri:
An operational semantics for Occam
February 1988, 24 pages, paper copy
UCAM-CL-TR-126
M.E. Leeser:
Reasoning about the function and
timing of integrated circuits with
Prolog and temporal logic
February 1988, 50 pages, paper copy
UCAM-CL-TR-127
John Carroll, Bran Boguraev, Claire Grover,
Ted Briscoe:
A development environment for
large natural language grammars
February 1988, 44 pages, paper copy
UCAM-CL-TR-128
Robert Charles Beaumont Cooper:
Debugging concurrent and
distributed programs
February 1988, 110 pages, paper copy
PhD thesis (Churchill College, December 1987)
UCAM-CL-TR-129
Jeremy Peter Bennett:
A methodology for automated design
of computer instruction sets
March 1988, 147 pages, PDF
PhD thesis (Emmanuel College, January 1987)
Abstract: With semiconductor technology providing
scope for increasingly complex computer architectures,
there is a need more than ever to rationalise the
methodology behind computer design. In the 1970’s,
byte stream architectures offered a rationalisation of
computer design well suited to microcoded hardware.
In the 1980’s, RISC technology has emerged to simplify computer design and permit full advantage to be
taken of very large scale integration. However, such approaches achieve their aims by simplifying the problem
to a level where it is within the comprehension of a
simple human being. Such an effort is not sufficient. There is a need to provide a methodology that takes the burden of design detail away from the human designer, leaving him free to cope with the underlying principles involved.
In this dissertation I present a methodology for the design of computer instruction sets that is capable of automation in large part, removing the drudgery of individual instruction selection. The methodology does not remove the need for the designer’s skill, but rather allows precise refinement of his ideas to obtain an optimal instruction set.
In developing this methodology a number of pieces of software have been designed and implemented. Compilers have been written to generate trial instruction sets. An instruction set generator program has been written and the instruction set it proposes evaluated. Finally a prototype language for instruction set design has been devised and implemented.
UCAM-CL-TR-130
Lawrence C. Paulson:
The foundation of a generic theorem
prover
March 1988, 44 pages, PDF
This paper is a revised version of UCAM-CL-TR-113.
Abstract: Isabelle is an interactive theorem prover that supports a variety of logics. It represents rules as propositions (not as functions) and builds proofs by combining rules. These operations constitute a meta-logic (or ‘logical framework’) in which the object-logics are formalized. Isabelle is now based on higher-order logic – a precise and well-understood foundation.
Examples illustrate use of this meta-logic to formalize logics and proofs. Axioms for first-order logic are shown sound and complete. Backwards proof is formalized by meta-reasoning about object-level entailment.
Higher-order logic has several practical advantages over other meta-logics. Many proof techniques are known, such as Huet’s higher-order unification procedure.
UCAM-CL-TR-131
Karen Spärck Jones:
Architecture problems in the
construction of expert systems for
document retrieval
December 1986, 28 pages, paper copy
UCAM-CL-TR-132
Miriam Ellen Leeser:
Reasoning about the function and
timing of integrated circuits with
Prolog and temporal logic
April 1988, 151 pages, paper copy
PhD thesis (Queens’ College, December 1987)
UCAM-CL-TR-133
Lawrence C. Paulson:
A preliminary users manual for
Isabelle
May 1988, 81 pages, PDF
Abstract: This is an early report on the theorem prover Isabelle and several of its object-logics. It describes Isabelle’s operations, commands, data structures, and organization. This information is fairly low-level, but could benefit Isabelle users and implementors of other systems.
UCAM-CL-TR-134
Avra Cohn:
Correctness properties
of the Viper block model:
the second level
May 1988, 114 pages, paper copy
UCAM-CL-TR-135
Thomas F. Melham:
Using recursive types to reason about
hardware in higher order logic
May 1988, 30 pages, paper copy
UCAM-CL-TR-136
Jeffrey J. Joyce:
Formal specification and verification
of asynchronous processes in
higher-order logic
June 1988, 45 pages, PDF
Abstract: We model the interaction of a synchronous
process with an asynchronous memory process using
a four-phase “handshaking” protocol. This example
demonstrates the use of higher-order logic to reason
about the behaviour of synchronous systems such as
microprocessors which communicate requests to asynchronous devices and then wait for unpredictably long
periods until these requests are answered. We also describe how our model could be revised to include some
of the detailed timing requirements found in real systems such as the M68000 microprocessor. One enhancement uses non-determinism to model minimum
setup times for asynchronous inputs. Experience with
this example suggests that higher-order logic may also
be a suitable formalism for reasoning about more abstract forms of concurrency.
UCAM-CL-TR-137
F.V. Hasle:
Mass terms and plurals
From linguistic theory to natural
language processing
June 1988, 171 pages, paper copy
UCAM-CL-TR-138
Michael Burrows, Martín Abadi,
Roger Needham:
Authentication: a practical study in
belief and action
June 1988, 19 pages, paper copy
UCAM-CL-TR-139
Paul R. Manson:
Petri net theory: a survey
June 1988, 77 pages, PDF
Abstract: The intense interest in concurrent (or “parallel”) computation over the past decade has given rise to a large number of languages for concurrent programming, representing many conflicting views of concurrency.
The discovery that concurrent programming is significantly more difficult than sequential programming has prompted considerable research into determining a tractable and flexible theory of concurrency, with the aim of making concurrent processing more accessible, and indeed the wide variety of concurrent languages merely reflects the many different models of concurrency which have also been developed.
This report therefore introduces Petri nets, discussing their behaviour, interpretation and relationship to other models of concurrency. It defines and discusses several restrictions and extensions of the Petri net model, showing how they relate to basic Petri nets, while explaining why they have been of historical importance. Finally it presents a survey of the analysis methods applied to Petri nets in general and for some of the net models introduced here.
UCAM-CL-TR-140
Albert John Camilleri:
Executing behavioural definitions in
higher-order logic
July 1988, 183 pages, PDF
PhD thesis (Darwin College, February 1988)
Abstract: Over the past few years, computer scientists have been using formal verification techniques to
show the correctness of digital systems. The verification
process, however, is complicated and expensive. Even
proofs of simple circuits can involve thousands of logical steps. Often it can be extremely difficult to find correct device specifications and it is desirable that one sets
off to prove a correct specification from the start, rather
than repeatedly backtrack from the verification process
to modify the original definitions after discovering they
were incorrect or inadequate.
The main idea presented in the thesis is to amalgamate the techniques of simulation and verification,
rather than have the latter replace the former. The result is that behavioural definitions can be simulated until one is reasonably sure that the specification is correct. Furthermore, proving the correctness with respect
to these simulated specifications avoids the inadequacies of simulation where it may not be computationally feasible to demonstrate correctness by exhaustive
testing. Simulation here has a different purpose: to get
specifications correct as early as possible in the verification process. Its purpose is not to demonstrate the
correctness of the implementation – this is done in the
verification stage when the very same specifications that
were simulated are proved correct.
The thesis discusses the implementation of an executable subset of the HOL logic, the version of Higher
Order Logic embedded in the HOL theorem prover. It is
shown that hardware can be effectively described using
both relations and functions; relations being suitable
for abstract specification and functions being suitable
for execution. The differences between relational and
functional specifications are discussed and illustrated
by the verification of an n-bit adder. Techniques for executing functional specifications are presented and various optimisation strategies are shown which make the
execution of the logic efficient. It is further shown that
the process of generating optimised functional definitions from relational definitions can be automated. Example simulations of three hardware devices (a factorial machine, a small computer and a communications
chip) are presented.
UCAM-CL-TR-141
Roy Want:
Reliable management of voice in a
distributed system
July 1988, 127 pages, PDF
PhD thesis (Churchill College, December 1987)
Abstract: The ubiquitous personal computer has found
its way into most office environments. As a result,
widespread use of the Local Area Network (LAN)
for the purposes of sharing distributed computing resources has become common. Another technology, the
Private Automatic Branch Exchange (PABX), has benefited from substantial research and development by the telephone companies. As a consequence, it is cost effective
and has widely infiltrated the office world. Its primary
purpose is to switch digitised voice but, with the growing need for communication between computers, it is
also being adapted to switch data. However, PABXs
are generally designed around a centralised switch in
which bandwidth is permanently divided between its
subscribers. Computing requirements need much larger
bandwidths and the ability to connect to several services at once, thus making the conventional PABX unsuitable for this application.
Some LAN technologies are suitable for switching
voice and data. The additional requirement for voice
is that point to point delay for network packets should
have a low upper-bound. The 10 Mb/s Cambridge Ring
is an example of this type of network, but its relatively
low bandwidth gives it limited application in this area.
Networks with larger bandwidths (up to 100 Mb/s) are
now becoming available commercially and could support
a realistic population of clients requiring voice and data
communication.
Transporting voice and data in the same network
has two main advantages. Firstly, from a practical point
of view, wiring is minimised. Secondly, applications
which integrate both media are made possible, and
hence digitised voice may be controlled by client programs in new and interesting ways.
In addition to the new applications, the original telephony facilities must also be available. They should, at
least by default, appear to work in an identical way to
our tried and trusted impression of a telephone. However, the control and management of a network telephone is now in the domain of distributed computing.
The voice connections between telephones are virtual
circuits. Control and data information can be freely
mixed with voice at a network interface. The new problems that result are the management issues related to
the distributed control of real-time media.
This thesis describes the issues as a distributed computing problem and proposes solutions, many of which
have been demonstrated in a real implementation. Particular attention has been paid to the quality of service provided by the solutions. This amounts to the design of helpful operator interfaces, flexible schemes for
the control of voice from personal workstations and,
in particular, a high reliability factor for the backbone
telephony service. This work demonstrates the advantages and the practicality of integrating voice and data
services within the Local Area Network.
UCAM-CL-TR-142
Peter Newman:
A fast packet switch for the
integrated services backbone network
July 1988, 24 pages, paper copy
UCAM-CL-TR-143
Lawrence C. Paulson:
Experience with Isabelle
A generic theorem prover
August 1988, 20 pages, PDF
Abstract: The theorem prover Isabelle is described
briefly and informally. Its historical development is
traced from Edinburgh LCF to the present day. The
main issues are unification, quantifiers, and the representation of inference rules. The Edinburgh Logical
Framework is also described, for a comparison with Isabelle. An appendix presents several Isabelle logics, including set theory and Constructive Type Theory, with
examples of theorems.
UCAM-CL-TR-144
Juanito Camilleri:
An operational semantics for occam
August 1988, 27 pages, paper copy
This is an extended version of UCAM-CL-TR-125, in
which we include the operational semantics of priority
alternation.
UCAM-CL-TR-145
Michael J.C. Gordon:
Mechanizing programming logics in
higher order logic
September 1988, 55 pages, paper copy
UCAM-CL-TR-146
Thomas F. Melham:
Automating recursive type definitions
in higher order logic
September 1988, 64 pages, paper copy
UCAM-CL-TR-147
Jeffrey Joyce:
Formal specification and verification
of microprocessor systems
September 1988, 24 pages, paper copy
UCAM-CL-TR-148
Jonathan Billington:
Extending coloured petri nets
September 1988, 82 pages, PDF
Abstract: Jensen’s Coloured Petri Nets (CP-nets) are
taken as the starting point for the development of
a specification technique for complex concurrent systems. To increase their expressive power, CP-nets are extended by including capacity and inhibitor functions. A
class of extended CP-nets, known as P-nets, is defined
that includes the capacity function and the threshold inhibitor extension. The inhibitor extension is defined in
a totally symmetrical way to that of the usual pre place
map (or incidence function). Thus the inhibitor and pre
place maps may be equated by allowing a marking to be
purged by a single transition occurrence, useful when
specifying the abortion of various procedures. A chapter is devoted to developing the theory and notation for
the purging of a place’s marking or part of its marking.
Two transformations from P-nets to CP-nets are
presented and it is proved that they preserve interleaving behaviour. These are based on the notion of complementary places defined for PT-nets and involve the
definition and proof of a new extended complementary
place invariant for CP-nets.
The graphical form of P-nets, known as a P-Graph,
is presented formally and draws upon the theories developed for algebraic specification. Arc inscriptions are
multiples of tuples of terms generated by a many-sorted
signature. Transition conditions are Boolean expressions derived from the same signature. An interpretation of the P-Graph is given in terms of a corresponding
P-net. The work is similar to that of Vautherin but includes the inhibitor and capacity extension and a number of significant differences. In the P-Graph concrete
sets are associated with places, rather than sorts, and
likewise there are concrete initial marking and capacity
functions. Vautherin associates equations with transitions rather than the more general Boolean expressions.
P-Graphs are useful for specification at a concrete level.
Classes of the P-Graph, known as Many-sorted Algebraic Nets and Many-sorted Predicate/Transition nets,
are defined and illustrated by a number of examples. An
extended place capacity notation is developed to allow
for the convenient representation of resource bounds in
the graphical form.
Some communications-oriented examples are presented including queues and the Demon Game of international standards fame.
The report concludes with a discussion of future
work. In particular, an abstract P-Graph is defined that
is very similar to Vautherin’s Petri net-like schema, but
including the capacity and inhibitor extensions and
associating boolean expressions with transitions. This
will be useful for more abstract specifications (e.g. classes
of communications protocols) and for their analysis.
It is believed that this is the first coherent and formal
presentation of these extensions in the literature.
UCAM-CL-TR-149
Paul Ashley Karger:
Improving security and performance
of capability systems
October 1988, 273 pages, PostScript
PhD thesis (Wolfson College, March 1988)
Abstract: This dissertation examines two major limitations of capability systems: an inability to support security policies that enforce confinement and a reputation
for relatively poor performance when compared with
non-capability systems.
The dissertation examines why conventional capability systems cannot enforce confinement and proposes
a new secure capability architecture, called SCAP, in
which confinement can be enforced. SCAP is based on
the earlier Cambridge Capability System, CAP. The dissertation shows how a non-discretionary security policy
can be implemented on the new architecture, and how
the new architecture can also be used to improve traceability of access and revocation of access.
The dissertation also examines how capability systems are vulnerable to discretionary Trojan horse attacks and proposes a defence based on rules built
into the command-language interpreter. System-wide
garbage collection, commonly used in most capability systems, is examined in the light of the non-discretionary security policies and found to be fundamentally insecure. The dissertation proposes alternative
approaches to storage management to provide at least
some of the benefits of system-wide garbage collection,
but without the accompanying security problems.
Performance of capability systems is improved by
two major techniques. First, the doctrine of programming generality is addressed as one major cause of
poor performance. Protection domains should be allocated only for genuine security reasons, rather than
at every subroutine boundary. Compilers can better enforce modularity and good programming style without
adding the expense of security enforcement to every
subroutine call. Second, the ideas of reduced instruction set computers (RISC) can be applied to capability
systems to simplify the operations required. The dissertation identifies a minimum set of hardware functions
needed to obtain good performance for a capability system. This set is much smaller than previous research
had indicated necessary.
A prototype implementation of some of the capability features is described. The prototype was implemented on a re-microprogrammed VAX-11/730 computer. The dissertation examines the performance and
software compatibility implications of the new capability architecture, both in the context of conventional
computers, such as the VAX, and in the context of RISC
processors.
UCAM-CL-TR-150
Albert John Camilleri:
Simulation as an aid to verification
using the HOL theorem prover
October 1988, 23 pages, PDF
Abstract: The HOL theorem proving system, developed
by Mike Gordon at the University of Cambridge, is a
mechanisation of higher order logic, primarily intended for
conducting formal proofs of digital system designs. In
this paper we show that hardware specifications written in HOL logic can be executed to enable simulation as a means of supporting formal proof. Specifications of a small microprocessor are described, showing
how HOL logic sentences can be transformed into executable code with minimum risk of introducing inconsistencies. A clean and effective optimisation strategy
is recommended to make the executable specifications
practical.
UCAM-CL-TR-151
Inderpreet-Singh Dhingra:
Formalising an integrated circuit
design style in higher order logic
November 1988, 195 pages, PDF
PhD thesis (King’s College, March 1988)
Abstract: If the activities of an integrated circuit designer are examined, we find that rather than keeping
track of all the details, he uses simple rules of thumb
which have been refined from experience. These rules
of thumb are guidelines for deciding which blocks to
use and how they are to be connected. This thesis gives
a formal foundation, in higher order logic, to the design rules of a dynamic CMOS integrated circuit design
style.
Correctness statements for the library of basic elements are formulated. These statements are based on a
small number of definitions which define the behaviour
of transistors and capacitors and the necessary axiomisation of the four valued algebra for signals. The correctness statements of large and complex circuits are
then derived from the library of previously proved correctness statements, using logical inference rules instead
of rules of thumb. For example, one gate from the library can drive another only if its output constraints
are satisfied by the input constraints of the gate that it
drives. In formalising the design rules, these constraints
are captured as predicates and are part of the correctness statements of these gates. So when two gates are to
be connected, it is only necessary to check that the predicates match. These ideas are fairly general and widely
applicable for formalising the rules of many systems.
A number of worked examples are presented based
on these formal techniques. Proofs are presented at various stages of development to show how the correctness
statement for a device evolves and how the proof is constructed. In particular it is demonstrated how such formal techniques can help improve and sharpen the final
specifications.
As a major case study to test all these techniques, a
new design for a digital phase-locked loop is presented.
This has been designed down to the gate level using
the above dynamic design style, and has been described
and simulated using ELLA. Some of the subcomponents
have been formally verified down to the detailed circuit
level while others have merely been specified without
formal proofs of correctness. An informal proof of correctness of this device is also presented based on the
formal specifications of the various submodules.
UCAM-CL-TR-152
Andrew Mark Pullen:
Motion development for computer
animation
November 1988, 163 pages, paper copy
PhD thesis (Churchill College, August 1987)
UCAM-CL-TR-153
Michael Burrows:
Efficient data sharing
December 1988, 99 pages, PDF
PhD thesis (Churchill College, September 1988)
Abstract: As distributed computing systems become
widespread, the sharing of data between people using
a large number of computers becomes more important.
One of the most popular ways to facilitate this sharing is to provide a common file system, accessible by
all the machines on the network. This approach is simple and reasonably effective, but the performance of
the system can degrade significantly if the number of
machines is increased. By using a hierarchical network,
and arranging that machines typically access files stored
in the same section of the network, it is possible to build
very large systems. However, there is still a limit on the
number of machines that can share a single file server
and a single network effectively.
A good way to decrease network and server load is
to cache file data on client machines, so that data need
not be fetched from the centralized server each time it is
accessed. This technique can improve the performance
of a distributed file system and is used in a number of
working systems. However, caching brings with it the
overhead of maintaining consistency, or cache coherence. That is, each machine in the network must see the
same data in its cache, even though one machine may be
modifying the data as others are reading it. The problem is to maintain consistency without dramatically increasing the number of messages that must be passed
between machines on the network.
Some existing file systems take a probabilistic approach to consistency, some explicitly prevent the activities that can cause inconsistency, while others provide consistency only at some cost in functionality
or performance. In this dissertation, I examine how distributed file systems are typically used, and the degree
to which caching might be expected to improve performance. I then describe a new file system that attempts to
cache significantly more data than other systems, provides strong consistency guarantees, yet requires few
additional messages for cache management.
This new file system provides fine-grain sharing of
a file concurrently open on multiple machines on the
network, at the granularity of a single byte. It uses
a simple system of multiple-reader, single-writer locks
held in a centralized server to ensure cache consistency.
The problems of maintaining client state in a centralized
server are solved by using efficient data structures and
crash recovery techniques.
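The locking idea mentioned in this abstract can be sketched as follows. This is only an illustration of multiple-reader, single-writer locks on byte ranges kept in one central table; the class ByteRangeLockServer and its methods are hypothetical names, and the sketch says nothing about the data structures or crash recovery techniques actually used in the dissertation.

```python
class ByteRangeLockServer:
    """Toy centralised lock table: many readers or one writer per byte range."""

    def __init__(self):
        # Each entry is (client, start, end, mode), with end exclusive.
        self.locks = []

    @staticmethod
    def _overlaps(s1, e1, s2, e2):
        return s1 < e2 and s2 < e1

    def acquire(self, client, start, end, mode):
        """Grant the lock and return True, or return False on a conflict."""
        if mode not in ("read", "write") or start >= end:
            raise ValueError("bad lock request")
        for holder, s, e, held in self.locks:
            if holder == client or not self._overlaps(start, end, s, e):
                continue
            # Two overlapping locks conflict unless both are read locks.
            if mode == "write" or held == "write":
                return False
        self.locks.append((client, start, end, mode))
        return True

    def release(self, client, start, end):
        """Drop every lock held by client that lies inside [start, end)."""
        self.locks = [
            lock for lock in self.locks
            if not (lock[0] == client and start <= lock[1] and lock[2] <= end)
        ]
```

For example, two clients may hold read locks over overlapping ranges at once, while a write request for any byte in those ranges is refused until the readers release it.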
UCAM-CL-TR-155
S.G. Pulman, G.J. Russell, G.D. Ritchie,
A.W. Black:
Computational morphology of
English
January 1989, 15 pages, PDF
Abstract: This paper describes an implemented computer program which uses various kinds of linguistic
knowledge to analyse existing or novel word forms
in terms of their components. Three main types of
knowledge are required (for English): knowledge about
spelling or phonological changes consequent upon affixation (notice we are only dealing with isolated word
forms); knowledge about the syntactic or semantic
properties of affixation (i.e. inflexional and derivational
morphology), and knowledge about the properties of
the stored base forms of words (which in our case are
always themselves words, rather than more abstract entities). These three types of information are stored as
data files, represented in exactly the form a linguist
might employ. These data files are then compiled by
the system to produce a run-time program which will
analyse arbitrary word forms presented to it in a way
consistent with the original linguistic description.
UCAM-CL-TR-156
Steve Pulman:
Events and VP modifiers
January 1989, 10 pages, paper copy
UCAM-CL-TR-157
Juanito Camilleri:
Introducing a priority operator to
CCS
January 1989, 19 pages, paper copy
UCAM-CL-TR-154
I.B. Crabtree, R.S. Crouch, D.C. Moffat,
N.J. Pirie, S.G. Pulman, G.D. Ritchie,
B.A. Tate:
A natural language interface to an
intelligent planning system
January 1989, 14 pages, paper copy
UCAM-CL-TR-158
Karen Spärck Jones:
Tailoring output to the user:
What does user modelling in
generation mean?
August 1988, 21 pages, PDF
Abstract: This paper examines the implications for linguistic output generation tailored to the interactive system user, of earlier analyses of the components of user
modelling and of the constraints realism imposes on
modelling. Using a range of detailed examples it argues
that tailoring based only on the actual dialogue and on
the decision model required for the system task is quite
adequate, and that more ambitious modelling is both
dangerous and unnecessary.
UCAM-CL-TR-159
Andrew M. Pitts:
Non-trivial power types can’t be
subtypes of polymorphic types
January 1989, 12 pages, PostScript
UCAM-CL-TR-160
Andrew Gordon:
PFL+: A Kernal Scheme for Functions
I/O
February 1989, 26 pages, paper copy
UCAM-CL-TR-161
D.C.J. Matthews:
Papers on Poly/ML
February 1989, 150 pages, paper copy
UCAM-CL-TR-162
Claire Glover, Ted Briscoe, John Carroll,
Bran Boguraev:
The Alvey natural language tools
grammar (2nd Release)
April 1989, 90 pages, paper copy
UCAM-CL-TR-163
Ann Copestake, Karen Spärck Jones:
Inference in a natural language front
end for databases
February 1989, 87 pages, PDF
Abstract: This report describes the implementation and
initial testing of knowledge representation and inference capabilities within a modular database front end
designed for transportability.
UCAM-CL-TR-164
Li Gong, David J. Wheeler:
A matrix key distribution system
October 1988, 20 pages, PDF
Abstract: A new key distribution scheme is presented.
It is based on the distinctive idea that lets each node
have a set of keys of which it shares a distinct subset
with every other node. This has the advantage that the
numbers of keys that must be distributed and maintained are reduced by a square root factor; moreover,
two nodes can start conversation with virtually no delay. Two versions of the scheme are given. Their performance and security analysis shows it is a practical
solution to some key distribution problems.
UCAM-CL-TR-165
Peter Newman:
Fast packet switching for integrated
services
March 1989, 145 pages, paper copy
PhD thesis (Wolfson College, December 1988)
UCAM-CL-TR-166
Jean Bacon:
Evolution of operating system
structures
March 1989, 28 pages, paper copy
UCAM-CL-TR-167
Jeffrey J. Joyce:
A verified compiler for a verified
microprocessor
March 1989, 67 pages, paper copy
UCAM-CL-TR-168
J.M. Bacon, I.M. Leslie, R.M. Needham:
Distributed computing with a
processor bank
April 1989, 15 pages, paper copy
UCAM-CL-TR-169
Andrew Franklin Seaborne:
Filing in a heterogeneous network
April 1989, 131 pages, paper copy
PhD thesis (Churchill College, July 1987)
UCAM-CL-TR-170
Ursula Martin, Tobias Nipkow:
Ordered rewriting and confluence
May 1989, 18 pages, paper copy
UCAM-CL-TR-171
Jon Fairbairn:
Some types with inclusion properties
in ∀, →, µ
June 1989, 10 pages, PDF
Abstract: This paper concerns the ∀, →, µ type system
used in the non-strict functional programming language
Ponder. While the type system is akin to the types of
Second Order Lambda-calculus, the absence of type application makes it possible to construct types with useful inclusion relationships between them.
To illustrate this, the paper contains definitions of
a natural numbers type with many definable subtypes,
and of a record type with inheritance.
UCAM-CL-TR-172
Julia Rose Galliers:
A theoretical framework for
computer models of cooperative
dialogue, acknowledging multi-agent
conflict
July 1989, 226 pages, paper copy
UCAM-CL-TR-173
Roger William Stephen Hale:
Programming in temporal logic
July 1989, 182 pages, paper copy
PhD thesis (Trinity College, October 1988)
UCAM-CL-TR-174
James Thomas Woodchurch Clarke:
General theory relating to the
implementation of concurrent
symbolic computation
August 1989, 113 pages, paper copy
PhD thesis (Trinity College, January 1989)
UCAM-CL-TR-175
Lawrence C. Paulson:
A formulation of the simple theory of
types (for Isabelle)
August 1989, 32 pages, PDF
Abstract: Simple type theory is formulated for use with
the generic theorem prover Isabelle. This requires explicit type inference rules. There are function, product, and subset types, which may be empty. Descriptions (the eta-operator) introduce the Axiom of Choice.
Higher-order logic is obtained through reflection between formulae and terms of type bool. Recursive types
and functions can be formally constructed.
Isabelle proof procedures are described. The logic
appears suitable for general mathematics as well as
computational problems.
UCAM-CL-TR-176
T.J.W. Clarke:
Implementing aggregates in parallel
functional languages
August 1989, 13 pages, paper copy
UCAM-CL-TR-177
P.A.J. Noel:
Experimenting with Isabelle in
ZF Set Theory
September 1989, 40 pages, paper copy
UCAM-CL-TR-178
Jeffrey J. Joyce:
Totally verified systems:
linking verified software to verified
hardware
September 1989, 25 pages, PDF
Abstract: We describe exploratory efforts to design and
verify a compiler for a formally verified microprocessor as one aspect of the eventual goal of building totally verified systems. Together with a formal proof of
correctness for the microprocessor this yields a precise
and rigorously established link between the semantics
of the source language and the execution of compiled
code by the fabricated microchip. We describe in particular: (1) how the limitations of real hardware influenced this proof; and (2) how the general framework
provided by higher order logic was used to formalize
the compiler correctness problem for a hierarchically
structured language.
UCAM-CL-TR-179
Ursula Martin, Tobias Nipkow:
Automating Squiggol
September 1989, 16 pages, paper copy
UCAM-CL-TR-180
Tobias Nipkow:
Formal verification of data type
refinement
Theory and practice
September 1989, 31 pages, paper copy
UCAM-CL-TR-181
Tobias Nipkow:
Proof transformations for equational
theories
September 1989, 17 pages, paper copy
UCAM-CL-TR-182
John M. Levine, Lee Fedder:
The theory and implementation of
a bidirectional question answering
system
October 1989, 27 pages, paper copy
UCAM-CL-TR-183
Rachel Cardell-Oliver:
The specification and verification
of sliding window protocols in
higher order logic
October 1989, 25 pages, paper copy
UCAM-CL-TR-184
David Lawrence Tennenhouse:
Site interconnection and the
exchange architecture
October 1989, 225 pages, paper copy
PhD thesis (Darwin College, September 1988)
UCAM-CL-TR-185
Guo Qiang Zhang:
Logics of Domains
December 1989, 250 pages, paper copy
PhD thesis (Trinity College, May 1989)
UCAM-CL-TR-186
Derek Robert McAuley:
Protocol design for high speed
networks
January 1990, 100 pages, PostScript
PhD thesis (Fitzwilliam College, September 1989)
Abstract: Improvements in fibre optic communication
and in VLSI for network switching components have
led to the consideration of building digital switched
networks capable of providing point to point communication in the gigabit per second range. Provision of
bandwidths of this magnitude allows the consideration
of a whole new range of telecommunications services,
integrating video, voice, image and text. These multiservice networks have a range of requirements not met
by traditional network architectures designed for digital telephony or computer applications. This dissertation describes the design, and an implementation, of the
Multi-Service Network architecture and protocol family, which is aimed at supporting these services.
Asynchronous transfer mode networks provide the
basic support required for these integrated services, and
the Multi-Service Network architecture is designed primarily for these types of networks. The aim of the
Multi-Service protocol family is to provide a complete architecture which allows use of the full facilities of asynchronous transfer mode networks by multimedia applications. To maintain comparable performance with the underlying media, certain elements of
the MSN protocol stack are designed with implementation in hardware in mind. The interconnection of heterogeneous networks, and networks belonging to different security and administrative domains, is considered vital, so the MSN architecture takes an internetworking approach.
UCAM-CL-TR-187
Ann Copestake, Karen Spärck Jones:
Natural language interfaces to
databases
September 1989, 36 pages, PostScript
UCAM-CL-TR-188
Timothy E. Leonard:
Specification of computer
architectures:
a survey and annotated bibliography
January 1990, 42 pages, paper copy
UCAM-CL-TR-189
Lawrence C. Paulson, Tobias Nipkow:
Isabelle tutorial and user’s manual
January 1990, 142 pages, PDF
Abstract: This (obsolete!) manual describes how to use
the theorem prover Isabelle. For beginners, it explains
how to perform simple single-step proofs in the built-in
logics. These include first-order logic, a classical sequent calculus, ZF set theory, Constructive Type Theory,
and higher-order logic. Each of these logics is described.
The manual then explains how to develop advanced
tactics and tacticals and how to derive rules. Finally,
it describes how to define new logics within Isabelle.
UCAM-CL-TR-190
Ann Copestake:
Some notes on mass terms and
plurals
January 1990, 65 pages, PostScript
UCAM-CL-TR-191
Cosmos Nicolaou:
An architecture for real-time
multimedia communications systems
February 1990, 30 pages, PDF
Abstract: An architecture for real-time multimedia
communications systems is presented. A multimedia
communication system includes both the communication protocols used to transport the real-time data and
also the Distributed Computing system (DCS) within
which any applications using these protocols must execute. The architecture presented attempts to integrate
these protocols with the DCS in a smooth fashion in
order to ease the writing of multimedia applications.
Two issues are identified as being essential to the success of this integration: namely the synchronisation of
related real-time data streams, and the management of
heterogeneous multimedia hardware. The synchronisation problem is tackled by defining explicit synchronisation properties at the presentation level and by providing control and synchronisation operations within
the DCS which operate in terms of these properties. The
heterogeneity problems are addressed by separating the
data transport semantics (protocols themselves) from
the control semantics (protocol interfaces). The control
semantics are implemented using a distributed, typed
interface scheme within the DCS (i.e. above the presentation layer), whilst the protocols themselves are implemented within the communication subsystem. The
interface between the DCS and communications subsystem is referred to as the orchestration interface and
can be considered to lie in the presentation and session
layers.
A conforming prototype implementation is currently
under construction.
UCAM-CL-TR-192
Lawrence C. Paulson:
Designing a theorem prover
May 1990, 57 pages, PDF
Abstract: The methods and principles of theorem
prover design are presented through an extended example. Starting with a sequent calculus for first-order
logic, an automatic prover (called Folderol) is developed. Folderol can prove quite a few complicated theorems, although its search strategy is crude and limited.
Folderol is coded in Standard ML and consists largely
of pure functions. Its complete listing is included.
The report concludes with a survey of other research
in theorem proving: the Boyer/Moore theorem prover,
Automath, LCF, and Isabelle.
UCAM-CL-TR-193
Julia Rose Galliers:
Belief revision and a theory of
communication
May 1990, 30 pages, paper copy
UCAM-CL-TR-194
Julia Rose Galliers:
Proceedings of the First Belief
Representation and Agent
Architectures Workshop
March 1990, 199 pages, paper copy
UCAM-CL-TR-195
Jeffrey J. Joyce:
Multi-level verification of
microprocessor-based systems
May 1990, 163 pages, paper copy
PhD thesis (Pembroke College, December 1989)
UCAM-CL-TR-196
John Peter Van Tassell:
The semantics of VHDL with Val and
Hol:
towards practical verification tools
June 1990, 77 pages, paper copy
UCAM-CL-TR-197
Thomas Clarke:
The semantics and implementation of
aggregates
or
how to express concurrency without
destroying determinism
July 1990, 25 pages, paper copy
UCAM-CL-TR-198
Andrew M. Pitts:
Evaluation Logic
August 1990, 31 pages, PostScript
UCAM-CL-TR-199
Richard Boulton, Mike Gordon,
John Herbert, John Van Tassel:
The HOL verification of ELLA
designs
August 1990, 22 pages, PostScript
Abstract: HOL is a public domain system for generating proofs in higher order predicate calculus. It has
been in experimental and commercial use in several
countries for a number of years.
ELLA is a hardware design language developed at
the Royal Signals and Radar Establishment (RSRE) and
marketed by Computer General Electronic Design. It
supports simulation models at a variety of different abstraction levels.
A preliminary methodology for reasoning about
ELLA designs using HOL is described. Our approach is
to semantically embed a subset of the ELLA language
in higher order logic, and then to make this embedding convenient to use with parsers and pretty-printers.
There are a number of semantic issues that may affect
the ease of verification. We discuss some of these briefly.
We also give a simple example to illustrate the methodology.
UCAM-CL-TR-200
Tobias Nipkow, Gregor Snelting:
Type classes and overloading
resolution via order-sorted unification
August 1990, 16 pages, paper copy
UCAM-CL-TR-201
Thomas Frederick Melham:
Formalizing abstraction mechanisms
for hardware verification in higher
order logic
August 1990, 233 pages, PDF
PhD thesis (Gonville & Caius College, August 1989)
Abstract: Recent advances in microelectronics have
given designers of digital hardware the potential to
build devices of remarkable size and complexity. Along
with this however, it becomes increasingly difficult to
ensure that such systems are free from design errors,
where complete simulation of even moderately sized
circuits is impossible. One solution to these problems
is that of hardware verification, where the functional
behaviour of the hardware is described mathematically
and formal proof is used to show that the design meets
rigorous specifications of the intended operation.
This dissertation therefore seeks to develop this,
showing how reasoning about the correctness of hardware using formal proof can be achieved using fundamental abstraction mechanisms to relate specifications of hardware at different levels. Therefore a systematic method is described for defining any instance of
a wide class of concrete data types in higher order logic.
This process has been automated in the HOL theorem
prover, and provides a firm logical basis for representing data in formal specifications.
Further, these abstractions have been developed into
a new technique for modelling the behaviour of entire
classes of hardware designs. This is based on a formal representation in logic for the structure of circuit
designs using the recursive types defined by the above
method. Two detailed examples are presented showing
how this work can be applied in practice.
Finally, some techniques for temporal abstraction
are explained, and the means for asserting the correctness of a model containing time-dependent behaviour
is described. This work is then illustrated using a case
study: the formal verification in HOL of a simple ring
communication network.
[Abstract by Nicholas Cutler (librarian), as none
was submitted with the report.]
UCAM-CL-TR-202
Andrew Charles Harter:
Three-dimensional integrated circuit
layout
August 1990, 179 pages, paper copy
PhD thesis (Corpus Christi College, April 1990)
UCAM-CL-TR-203
Valeria C.V. de Paiva:
Subtyping in Ponder
(preliminary report)
August 1990, 35 pages, PDF
Abstract: This note starts the formal study of the type
system of the functional language Ponder. Some of the
problems of proving soundness and completeness are
discussed and some preliminary results, about fragments of the type system, shown.
It consists of 6 sections. In section 1 we review
briefly Ponder’s syntax and describe its typing system.
In section 2 we consider a very restricted fragment of
the language for which we can prove soundness of
the type inference mechanism, but not completeness.
Section 3 describes possible models of this fragment
and some related work. Section 4 describes the type-inference algorithm for a larger fragment of Ponder and
in section 5 we come up against some problematic examples. Section 6 is a summary of further work.
UCAM-CL-TR-204
Roy L. Crole, Andrew M. Pitts:
New foundations for fixpoint
computations: FIX-hyperdoctrines
and the FIX-logic
August 1990, 37 pages, PostScript
UCAM-CL-TR-205
Lawrence C. Paulson, Andrew W. Smith:
Logic programming, functional
programming and inductive
definitions
29 pages, PDF
Abstract: This paper reports an attempt to combine
logic and functional programming. It also questions the
traditional view that logic programming is a form of
first-order logic, arguing instead that the essential nature of a logic program is an inductive definition. This
revised view of logic programming suggests the design
of a combined logic/functional language. A slow but
working prototype is described.
UCAM-CL-TR-206
Rachel Cardell-Oliver:
Formal verification of real-time
protocols using higher order logic
August 1990, 36 pages, paper copy
UCAM-CL-TR-207
Stuart Philip Hawkins:
Video replay in computer animation
October 1990, 161 pages, paper copy
PhD thesis (Queens’ College, December 1989)
UCAM-CL-TR-208
Eike Ritter:
Categorical combinators for the
calculus of constructions
October 1990, 43 pages, paper copy
UCAM-CL-TR-212
K.L. Wrench:
A distributed and-or parallel Prolog
network
82 pages, paper copy
UCAM-CL-TR-209
Andrew William Moore:
Efficient memory-based learning for
robot control
November 1990, 248 pages, PDF
PhD thesis (Trinity Hall, October 1990)
Abstract: This dissertation is about the application of
machine learning to robot control. A system which has
no initial model of the robot/world dynamics should
be able to construct such a model using data received
through its sensors—an approach which is formalized
here as the SAB (State-Action-Behaviour) control cycle.
A method of learning is presented in which all the experiences in the lifetime of the robot are explicitly remembered. The experiences are stored in a manner which
permits fast recall of the closest previous experience to
any new situation, thus permitting very quick predictions of the effects of proposed actions and, given a goal
behaviour, permitting fast generation of a candidate action. The learning can take place in high-dimensional
non-linear control spaces with real-valued ranges of
variables. Furthermore, the method avoids a number
of shortcomings of earlier learning methods in which
the controller can become trapped in inadequate performance which does not improve. Also considered is how
the system is made resistant to noisy inputs and how
it adapts to environmental changes. A well founded
mechanism for choosing actions is introduced which
solves the experiment/perform dilemma for this domain
with adequate computational efficiency, and with fast
convergence to the goal behaviour. The dissertation explains in detail how the SAB control cycle can be integrated into both low and high complexity tasks. The
methods and algorithms are evaluated with numerous
experiments using both real and simulated robot domains. The final experiment also illustrates how a compound learning task can be structured into a hierarchy
of simple learning tasks.
UCAM-CL-TR-210
Tobias Nipkow:
Higher-order unification,
polymorphism, and subsorts
15 pages, paper copy
UCAM-CL-TR-211
Karen Spärck Jones:
The role of artificial intelligence in
information retrieval
November 1990, 13 pages, paper copy
UCAM-CL-TR-213
Valeria Correa Vaz de Paiva:
The Dialectica categories
January 1991, 82 pages, PDF
PhD thesis (Lucy Cavendish College, November 1988)
Abstract: This work consists of two main parts. The
first one, which gives it its name, presents an internal
categorical version of Gödel’s “Dialectica interpretation” of higher-order arithmetic. The idea is to analyse the Dialectica interpretation using a category DC
where objects are relations on objects of a basic category C and maps are pairs of maps of C satisfying a
pullback condition. If C is finitely complete, DC exists
and has a very natural symmetric monoidal structure.
If C is locally cartesian closed then DC is symmetric
monoidal closed. If we assume C with stable and disjoint coproducts, DC has cartesian products and weak coproducts and satisfies a weak form of distributivity.
Using the structure above, DC is a categorical model
for intuitionistic linear logic.
Moreover if C has free monoids then DC has cofree
comonoids and the corresponding comonad “!” on
DC, which has some special properties, can be used
to model the exponential “of course!” in Intuitionistic Linear Logic. The category of “!”-coalgebras is isomorphic to the category of comonoids in DC and, if
we assume commutative monoids in C, the “!”-Kleisli
category, which is cartesian closed, corresponds to the
Diller-Nahm variant of the Dialectica interpretation.
The second part introduces the categories GC. The
objects of GC are the same objects of DC, but morphisms are easier to handle, since they are maps in C
in opposite directions. If C is finitely complete, the category GC exists. If C is cartesian closed, we can define a symmetric monoidal structure and if C is locally
cartesian closed as well, we can define internal homs
in GC that make it a symmetric monoidal closed category. Supposing C with stable and disjoint coproducts,
we can define cartesian products and coproducts in GC
and, more interesting, we can define a dual operation
to the tensor product bifunctor, called “par”. The operation “par” is a bifunctor and has a unit “⊥”, which is
a dualising object. Using the internal hom and ⊥ we define a contravariant functor “(−)⊥” which behaves like
negation and thus it is used to model linear negation.
We show that the category GC, with all the structure
above, is a categorical model for Linear Logic, but not
exactly the classical one.
In the last chapter a comonad and a monad are defined to model the exponentials “!” and “?”. To define these endofunctors, we use Beck’s distributive laws
in an interesting way. Finally, we show that the Kleisli
category GC! is cartesian closed and that the categories
DC and GC are related by a Kleisli construction.
UCAM-CL-TR-214
J.A. Bradshaw, R.M. Young:
Integrating knowledge of purpose
and knowledge of structure for
design evaluation
February 1991, 20 pages, paper copy
UCAM-CL-TR-215
Paul Curzon:
A structured approach to the
verification of low level microcode
265 pages, PDF
PhD thesis (Christ’s College, May 1990)
Abstract: Errors in microprograms are especially serious since all higher level programs on the machine
depend on the microcode. Formal verification presents
one avenue which may be used to discover such errors.
Previous systems which have been used for formally
verifying microcode may be categorised by the form in
which the microcode is supplied. Some demand that it
be written in a high level microprogramming language.
Conventional software verification techniques are then
applied. Other methods allow the microcode to be supplied in the form of a memory image. It is treated as
data to an interpreter modelling the behaviour of the
microarchitecture. The proof is then performed by symbolic execution. A third solution is for the code to be
supplied in an assembly language and modelled at that
level. The assembler instructions are converted to commands in a modelling language. The resulting program
is verified using traditional software verification techniques.
In this dissertation I present a new universal microprogram verification system. It achieves many of the advantages of the other kinds of systems by adopting a hybrid approach. The microcode is supplied as a memory
image, but it is transformed by the system to a high level
program which may be verified using standard software verification techniques. The structure of the high
level program is obtained from user supplied documentation. I show that this allows microcode to be split into
small, independently validatable portions even when it
was not written in that way. I also demonstrate that
the techniques allow the complexity of detail due to
the underlying microarchitecture to be controlled at an
early stage in the validation process. I suggest that the
system described would combine well with other validation tools and provide help throughout the firmware
development cycle. Two case studies are given. The first
describes the verification of Gordon’s computer. This
example, being fairly simple, provides a good illustration of the techniques used by the system. The second
case study is concerned with the High Level Hardware
Orion computer which is a commercially produced machine with a fairly complex microarchitecture. This example shows that the techniques scale well to production microarchitectures.
UCAM-CL-TR-216
Carole Susan Klein:
Exploiting OR-parallelism in Prolog
using multiple sequential machines
250 pages, paper copy
PhD thesis (Wolfson College, October 1989)
UCAM-CL-TR-217
Bhaskar Ramanathan Harita:
Dynamic bandwidth management
160 pages, paper copy
PhD thesis (Wolfson College, October 1990)
UCAM-CL-TR-218
Tobias Nipkow:
Higher-order critical pairs
15 pages, paper copy
UCAM-CL-TR-219
Ian M. Leslie, Derek M. McAuley,
Mark Hayter, Richard Black, Reto Beller,
Peter Newman, Matthew Doar:
Fairisle project working documents
Snapshot 1
March 1991, 15 pages, paper copy
UCAM-CL-TR-220
Cosmos Andrea Nicolaou:
A distributed architecture for
multimedia communication systems
192 pages, paper copy
PhD thesis (Christ’s College, December 1990)
UCAM-CL-TR-221
Robert Milne:
Transforming axioms for data types
into sequential programs
44 pages, PDF
Abstract: A process is proposed for refining specifications of abstract data types into efficient sequential implementations. The process needs little manual intervention. It is split into three stages, not all of which
need always be carried out. The three stages entail interpreting equalities as behavioural equivalences, converting functions into procedures and replacing axioms
by programs. The stages can be performed as automatic
transformations which are certain to produce results
that meet the specifications, provided that simple conditions hold. These conditions describe the adequacy
of the specifications, the freedom from interference between the procedures, and the mode of construction of
the procedures. Sufficient versions of these conditions
can be checked automatically. Varying the conditions
could produce implementations for different classes of
specification. Though the transformations could be automated, the intermediate results, in styles of specification which cover both functions and procedures, have
interest in their own right and may be particularly appropriate to object-oriented design.
UCAM-CL-TR-225
Valeria de Paiva:
Categorical multirelations,
linear logic and petri nets
(draft)
May 1991, 29 pages, PDF
Abstract: This note presents a categorical treatment of
multirelations, which is, in a loose sense a generalisation of both our previous work on the categories GC,
and of Chu’s construction A NC [Barr’79]. The main
motivation for writing this note was the utilisation of
the category GC by Brown and Gurr [BG90] to model
Petri nets. We wanted to extend their work to deal
with multirelations, as Petri nets are usually modelled
using multirelations pre and post. That proved easy
enough and people interested mainly in concurrency
theory should refer to our joint work [BGdP’91], this
note deals with the mathematics underlying [BGdP’91].
The upshot of this work is that we build a model of Intuitionistic Linear Logic (without modalities) over any
symmetric monoidal category C with a distinguished
object (N, ≤, ◦, e −◦) – a closed poset. Moreover, if
the category C is cartesian closed with free monoids,
we build a model of Intuitionistic Linear Logic with a
non-trivial modality ‘!’ over it.
UCAM-CL-TR-222
Jonathan Billington:
Extensions to coloured petri nets and
their application to protocols
190 pages, paper copy
PhD thesis (Clare Hall, May 1990)
UCAM-CL-TR-226
Kwok-yan Lam:
A new approach for improving
system availability
June 1991, 108 pages, paper copy
PhD thesis (Churchill College, January 1991)
UCAM-CL-TR-223
Philip Gladwin, Stephen Pulman,
Karen Spärck Jones:
Shallow processing and
automatic summarising:
a first study
May 1991, 65 pages, paper copy
UCAM-CL-TR-224
Ted Briscoe, John Carroll:
Generalised probabilistic LR parsing
of natural language (corpora)
with unification-based grammars
45 pages, paper copy
UCAM-CL-TR-227
Juanito Albert Camilleri:
Priority in process calculi
June 1991, 203 pages, paper copy
PhD thesis (Trinity College, October 1990)
UCAM-CL-TR-228
Mark Hayter, Derek McAuley:
The desk area network
May 1991, 11 pages, PostScript
Abstract: A novel architecture for use within an end
computing system is described. This attempts to extend
the concepts used in modern high speed networks into
computer system design. A multimedia workstation is
being built based on this concept to evaluate the approach.
UCAM-CL-TR-229
David J. Brown:
Abstraction of image and pixel
The thistle display system
August 1991, 197 pages, paper copy
PhD thesis (St John’s College, February 1991)
UCAM-CL-TR-230
J. Galliers:
Proceedings of the second
belief representation and
agent architectures workshop
(BRAA ’91)
August 1991, 255 pages, paper copy
UCAM-CL-TR-231
Raphael Yahalom:
Managing the order of transactions in
widely-distributed data systems
August 1991, 133 pages, paper copy
PhD thesis (Jesus College, October 1990)
UCAM-CL-TR-232
Francisco Corella:
Mechanising set theory
July 1991, 217 pages, PDF
PhD thesis (Corpus Christi College, June 1989)
Abstract: Set theory is today the standard foundation
of mathematics, but most proof development systems
(PDS) are based on type theory rather than set theory.
This is due in part to the difficulty of reducing the rich
mathematical vocabulary to the economical vocabulary
of the set theory. It is known how to do this in principle,
but traditional explanations of mathematical notations
in set theoretic terms do not lend themselves easily to
mechanical treatment.
We advocate the representation of mathematical
notations in a formal system consisting of the axioms of any version of ordinary set theory, such as
ZF, but within the framework of higher-order logic
with λ-conversion (H.O.L.) rather than first-order logic
(F.O.L.). In this system each notation can be represented by a constant, which has a higher-order type
when the notation binds variables. The meaning of the
notation is given by an axiom which defines the representing constant, and the correspondence between the
ordinary syntax of the notation and its representation
in the formal language is specified by a rewrite rule. The
collection of rewrite rules comprises a rewriting system
of a kind which is computationally well behaved.
The formal system is justified by the fact that set
theory within H.O.L. is a conservative extension of set
theory within F.O.L. Besides facilitating the representation of notations, the formal system is of interest because it permits the use of mathematical methods which
do not seem to be available in set theory within F.O.L.
A PDS, called Watson, has been built to demonstrate this approach to the mechanization of mathematics. Watson embodies a methodology for interactive
proof which provides both flexibility of use and a relative guarantee of correctness. Results and proofs can be
saved, and can be perused and modified with an ordinary text editor. The user can specify his own notations
as rewrite rules and adapt the mix of notations to suit
the problem at hand; it is easy to switch from one set of
notations to another. As a case study, Watson has been
used to prove the correctness of a latch implemented as
two cross-coupled nor-gates, with an approximation of
time as a continuum.
UCAM-CL-TR-233
John Carroll, Ted Briscoe, Claire Grover:
A development environment for
large natural language grammars
July 1991, 65 pages, paper copy
UCAM-CL-TR-234
Karen Spärck Jones:
Two tutorial papers:
Information retrieval & Thesaurus
August 1991, 31 pages, PDF
Abstract: The first paper describes the characteristics
of information retrieval from documents or texts, the
development and status of automatic indexing and retrieval, and the actual and potential relations between
information retrieval and artificial intelligence. The second paper discusses the properties, construction and actual and potential uses of thesauri, as semantic classifications or terminological knowledge bases, in information retrieval and natural language processing.
UCAM-CL-TR-235
Heng Wang:
Modelling and image generation
145 pages, paper copy
PhD thesis (St John’s College, July 1991)
UCAM-CL-TR-236
John Anthony Bradshaw:
Using knowledge of purpose and
knowledge of structure as a basis
for evaluating the behaviour of
mechanical systems
153 pages, paper copy
PhD thesis (Gonville & Caius College, June 1991)
UCAM-CL-TR-237
Derek G. Bridge:
Computing presuppositions in an
incremental language processing
system
212 pages, paper copy
PhD thesis (Wolfson College, April 1991)
UCAM-CL-TR-238
Ted Briscoe, Ann Copestake,
Valeria de Paiva:
Proceedings of the ACQUILEX
workshop on default inheritance in
the lexicon
October 1991, 180 pages, paper copy
UCAM-CL-TR-239
Mark Thomas Maybury:
Planning multisentential English text
using communicative acts
December 1991, 329 pages, PDF
PhD thesis (Wolfson College, July 1991)
Abstract: The goal of this research is to develop explanation presentation mechanisms for knowledge based
systems which enable them to define domain terminology and concepts, narrate events, elucidate plans, processes, or propositions and argue to support a claim or
advocate action. This requires the development of devices which select, structure, order and then linguistically realize explanation content as coherent and cohesive English text.
With the goal of identifying generic explanation presentation strategies, a wide range of naturally occurring
texts were analyzed with respect to their communicative structure, function, content and intended effects on
the reader. This motivated an integrated theory of communicative acts which characterizes text at the level
of rhetorical acts (e.g. describe, define, narrate), illocutionary acts (e.g. inform, request), and locutionary
acts (ask, command). Taken as a whole, the identified
communicative acts characterize the structure, content
and intended effects of four types of text: description,
narration, exposition, argument. These text types have
distinct effects such as getting the reader to know about
entities, to know about events, to understand plans,
processes, or propositions, or to believe propositions or
want to perform actions. In addition to identifying the
communicative function and effect of text at multiple
levels of abstraction, this dissertation details a tripartite
theory of focus of attention (discourse focus, temporal
focus and spatial focus) which constrains the planning
and linguistic realization of text.
To test the integrated theory of communicative acts
and tripartite theory of focus of attention, a text generation system TEXPLAN (Textual EXplanation PLANner) was implemented that plans and linguistically realizes multisentential and multiparagraph explanations
from knowledge based systems. The communicative
acts identified during text analysis were formalized as over
sixty compositional and (in some cases) recursive plan
operators in the library of a hierarchical planner. Discourse, temporal and spatial models were implemented
to track and use attentional information to guide the
organization and realization of text. Because the plan
operators distinguish between the communicative function (e.g. argue for a proposition) and the expected effect (e.g. the reader believes the proposition) of communicative acts, the system is able to construct a discourse
model of the structure and function of its textual responses as well as a user model of the expected effects
of its responses on the reader’s knowledge, beliefs, and
desires. The system uses both the discourse model and
user model to guide subsequent utterances. To test its
generality, the system was interfaced to a variety of domain applications including a neuropsychological diagnosis system, a mission planning system, and a knowledge based mission simulator. The system produces descriptions, narratives, expositions and arguments from
these applications, thus exhibiting a broader range of
rhetorical coverage than previous text generation systems.
UCAM-CL-TR-240
Juanito Camilleri:
Symbolic compilation and
execution of programs by proof:
a case study in HOL
31 pages, paper copy
UCAM-CL-TR-241
Thomas Ulrich Vogel:
Learning in large state spaces with an
application to biped robot walking
December 1991, 204 pages, paper copy
PhD thesis (Wolfson College, November 1991)
UCAM-CL-TR-242
Glenford Ezra Mapp:
An object oriented approach to
virtual memory management
January 1992, 150 pages, PDF
PhD thesis (Clare Hall, September 1991)
Abstract: Advances in computer technology are being
pooled together to form a new computing environment
which is characterised by powerful workstations with
vast amounts of memory connected to high speed networks. This environment will provide a large number of diverse services such as multimedia communications, expert systems and object-oriented databases.
In order to develop these complex applications in an
efficient manner, new interfaces are required which are
simple, fast and flexible and allow the programmer to
use an object-oriented approach throughout the design
and implementation of an application. Virtual memory
techniques are increasingly being used to build these
new facilities.
In addition, since CPU speeds continue to increase
faster than disk speeds, an I/O bottleneck may develop
in which the CPU may be idle for long periods waiting for paging requests to be satisfied. To overcome
this problem it is necessary to develop new paging algorithms that better reflect how different objects are used.
Thus a facility to page objects on a per-object basis is
required and a testbed is also needed to obtain experimental data on the paging activity of different objects.
Virtual memory techniques, previously only used in
mainframe and minicomputer architectures, are being
employed in the memory management units of modern microprocessors. With very large address spaces becoming a standard feature of most systems, the use of
memory mapping is seen as an effective way of providing greater flexibility as well as improved system efficiency.
This thesis presents an object-oriented interface for
memory mapped objects. Each object has a designated
object type. Handles are associated with different object types and the interface allows users to define and
manage new object types. Moving data between the
object and its backing store is done by user-level processes called object managers. Object managers interact with the kernel via a specified interface thus allowing users to build their own object managers. A
framework to compare different algorithms was also
developed and an experimental testbed was designed to
gather and analyse data on the paging activity of various programs. Using the testbed, conventional paging
algorithms were applied to different types of objects
and the results were compared. New paging algorithms
were designed and implemented for objects that are accessed in a highly sequential manner.
UCAM-CL-TR-243
Alison Cawsey, Julia Galliers, Steven Reece,
Karen Spärck Jones:
Automating the librarian:
a fundamental approach using belief
revision
January 1992, 39 pages, paper copy
UCAM-CL-TR-244
T.F. Melham:
A mechanized theory of the
π-calculus in HOL
31 pages, paper copy
UCAM-CL-TR-245
Michael J. Dixon:
System support for multi-service
traffic
January 1992, 108 pages, PDF
PhD thesis (Fitzwilliam College, September 1991)
Abstract: Digital network technology is now capable of
supporting the bandwidth requirements of diverse applications such as voice, video and data (so called multiservice traffic). Some media, for example voice, have
specific transmission requirements regarding the maximum packet delay and loss which they can tolerate.
Problems arise when attempting to multiplex such traffic over a single channel. Traditional digital networks
based on the Packet- (PTM) and Synchronous- (STM)
Transfer Modes prove unsuitable due to their media access contention and inflexible bandwidth allocation properties respectively. The Asynchronous Transfer Mode (ATM) has been proposed as a compromise
between the PTM and STM techniques. The current
state of multimedia research suggests that a significant
amount of multi-service traffic will be handled by computer operating systems. Unfortunately conventional
operating systems are largely unsuited to such a task.
This dissertation is concerned with the system organisation necessary in order to extend the benefits of ATM
networking through the endpoint operating system and
up to the application level. A locally developed microkernel, with ATM network protocol support, has been
used as a testbed for the ideas presented. Practical results over prototype ATM networks, including the 512
MHz Cambridge Backbone Network, are presented.
UCAM-CL-TR-246
Victor Poznański:
A relevance-based utterance
processing system
February 1992, 295 pages, PDF
PhD thesis (Girton College, December 1990)
Abstract: This thesis presents a computational interpretation of Sperber and Wilson’s relevance theory, based
on the use of non-monotonic logic supported by a reason maintenance system, and shows how the theory,
when given a specific form in this way, can provide a
unique and interesting account of discourse processing.
Relevance theory is a radical theory of natural language pragmatics which attempts to explain the whole
of human cognition using a single maxim: the Principle
of Optimal Relevance. The theory is seen by its originators as a computationally more adequate alternative
to Gricean pragmatics. Much as it claims to offer the
advantage of a unified approach to utterance comprehension, Relevance Theory is hard to evaluate because
Sperber and Wilson only provide vague, high-level descriptions of vital aspects of their theory. For example, the fundamental idea behind the whole theory is
that, in trying to understand an utterance, we attempt
to maximise significant new information obtained from
the utterance whilst consuming as little cognitive effort
as possible. However, Sperber and Wilson do not make
the nature of information and effort sufficiently clear.
Relevance theory is attractive as a general theory
of human language communication and as a potential
framework for computational language processing systems. The thesis seeks to clarify and flesh out the problem areas in order to develop a computational implementation which is used to evaluate the theory.
The early chapters examine and criticise the important aspects of the theory, emerging with a schema
for an ideal relevance-based system. Crystal, a computational implementation of an utterance processing
system based on this schema is then described. Crystal performs certain types of utterance disambiguation
and reference resolution, and computes implicatures according to relevance theory.
An adequate reasoning apparatus is a key component of a relevance based discourse processor, so a suitable knowledge representation and inference engine are
required. Various candidate formalisms are considered,
and a knowledge representation and inference engine
based on autoepistemic logic is found to be the most
suitable. It is then shown how this representation can
be used to meet particular discourse processing requirements, and how it provides a convenient interface to
a separate abduction system that supplies non-demonstrative inferences according to relevance theory. Crystal’s powers are illustrated with examples, and the thesis shows how the design not only implements the less
precise areas of Sperber and Wilson’s theory, but overcomes problems with the theory itself.
Crystal uses rather crude heuristics to model notions
such as salience and degrees of belief. The thesis therefore
presents a proposal and outline for a new kind of reason maintenance system that supports non-monotonic
logic whose formulae are labelled with upper/lower
probability ranges intended to represent strength of
belief. This system should facilitate measurements of
change in semantic information and shed some light on
notions such as expected utility and salience.
The thesis concludes that the design and implementation of Crystal provide evidence that relevance theory,
as a generic theory of language processing, is a viable
alternative theory of pragmatics. It therefore merits a
greater level of investigation than has been applied to it
to date.
UCAM-CL-TR-247
Roy Luis Crole:
Programming metalogics with a
fixpoint type
February 1992, 164 pages, paper copy
PhD thesis (Churchill College, January 1992)
UCAM-CL-TR-248
Richard J. Boulton:
On efficiency in theorem provers
which fully expand proofs into
primitive inferences
February 1992, 23 pages, DVI
Abstract: Theorem provers which fully expand proofs
into applications of primitive inference rules can be
made highly secure, but have been criticized for being
orders of magnitude slower than many other theorem
provers. We argue that much of this relative inefficiency
is due to the way proof procedures are typically written
and not all is inherent in the way the systems work. We
support this claim by considering a proof procedure for
linear arithmetic. We show that straightforward techniques can be used to significantly cut down the computation required. An order of magnitude improvement
in the performance is shown by an implementation of
these techniques.
UCAM-CL-TR-249
John P. Van Tassel:
A formalisation of the VHDL
simulation cycle
March 1992, 24 pages, PDF
Abstract: The VHSIC Hardware Description Language
(VHDL) has been gaining wide acceptance as a unifying HDL. It is, however, still a language in which the
only way of validating a design is by careful simulation.
With the aim of better understanding VHDL’s particular simulation process and eventually reasoning about
it, we have developed a formalisation of VHDL’s simulation cycle for a subset of the language. It has also
been possible to embed our semantics in the Cambridge
Higher-Order Logic (HOL) system and derive interesting properties about specific VHDL programs.
UCAM-CL-TR-254
Richard J. Boulton:
A HOL semantics for a subset of
ELLA
April 1992, 104 pages, DVI
Abstract: Formal verification is an important tool in the
design of computer systems, especially when the systems are safety or security critical. However, the formal
techniques currently available are not well integrated
into the set of tools more traditionally used by designers. This work is aimed at improving the integration
by providing a formal semantics for a subset of the
hardware description language ELLA, and by supporting this semantics in the HOL theorem proving system,
which has been used extensively for hardware verification.
A semantics for a subset of ELLA is described, and
an outline of a proof of the equivalence of parallel and
recursive implementations of an n-bit adder is given as
an illustration of the semantics. The proof has been
performed in an extension of the HOL system. Some
proof tools written to support the verification are also
described.
UCAM-CL-TR-250
Innes A. Ferguson:
TouringMachines: autonomous
agents with attitudes
April 1992, 19 pages, PostScript
UCAM-CL-TR-251
Xiaofeng Jiang:
Multipoint digital video
communication
April 1992, 124 pages, paper copy
PhD thesis (Wolfson College, December 1991)
UCAM-CL-TR-252
Andrew M. Pitts:
A co-induction principle for
recursively defined domains
25 pages, PostScript
UCAM-CL-TR-253
Antonio Sanfilippo:
The (other) Cambridge ACQUILEX
papers
141 pages, paper copy
UCAM-CL-TR-255
Rachel Mary Cardell-Oliver:
The formal verification of hard
real-time systems
1992, 151 pages, paper copy
PhD thesis (Queens’ College, January 1992)
UCAM-CL-TR-256
Martin Richards:
MCPL programming manual
May 1992, 32 pages, paper copy
UCAM-CL-TR-257
Rajeev Prakhakar Goré:
Cut-free sequent and tableau systems
for propositional normal modal
logics
May 1992, 160 pages, PDF
Abstract: We present a unified treatment of tableau, sequent and axiomatic formulations for many propositional normal modal logics, thus unifying and extending the work of Hanson, Segerberg, Zeman, Mints, Fitting, Rautenberg and Shvarts. The primary emphasis
is on tableau systems as the completeness proofs are
easier in this setting. Each tableau system has a natural sequent analogue defining a finitary provability relation for each axiomatically formulated logic L. Consequently, any tableau proof can be converted into a
sequent proof which can be read downwards to obtain
an axiomatic proof. In particular, we present cut-free
sequent systems for the logics S4.3, S4.3.1 and S4.14.
These three logics have important temporal interpretations and the sequent systems appear to be new.
All systems are sound and (weakly) complete with
respect to their known finite frame Kripke semantics. By concentrating almost exclusively on finite tree
frames we obtain finer characterisation results, particularly for the logics with natural temporal interpretations. In particular, all proofs of tableau completeness
are constructive and yield the finite model property and
decidability for each logic.
Most of these systems are cut-free giving a Gentzen
cut-elimination theorem for the logic in question. But
even when the cut rule is required, all uses of it remain analytic. Some systems do not possess the subformula property. But in all such cases the class of “superformulae” remains bounded, giving an analytic superformula property. Thus all systems remain totally
amenable to computer implementation and immediately serve as nondeterministic decision procedures for
the logics they formulate. Furthermore, the constructive
completeness proofs yield deterministic decision procedures for all the logics concerned.
In obtaining these systems we demonstrate that the
subformula property can be broken in a systematic
and analytic way while still retaining decidability. This
should not be surprising since it is known that modal
logic is a form of second order logic and that the subformula property does not hold for higher order logics.
UCAM-CL-TR-258
David J. Greaves, Derek McAuley:
Private ATM networks
May 1992, 12 pages, paper copy
UCAM-CL-TR-259
Samson Abramsky, C.-H. Luke Ong:
Full abstraction in the Lazy Lambda
Calculus
104 pages, paper copy
UCAM-CL-TR-260
Henrik Reif Anderson:
Local computation of alternating
fixed-points
21 pages, paper copy
UCAM-CL-TR-261
Neil Anthony Dodgson:
Image resampling
August 1992, 264 pages, PDF
PhD thesis (Wolfson College)
Abstract: Image resampling is the process of geometrically transforming digital images. This report considers
several aspects of the process.
We begin by decomposing the resampling process
into three simpler sub-processes: reconstruction of a
continuous intensity surface from a discrete image,
transformation of that continuous surface, and sampling of the transformed surface to produce a new discrete image. We then consider the sampling process,
and the subsidiary problem of intensity quantisation.
Both these are well understood, and we present a summary of existing work, laying a foundation for the central body of the report where the sub-process of reconstruction is studied.
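The three-stage decomposition described above can be illustrated with a minimal one-dimensional sketch. It is not taken from the report: the function names are hypothetical, linear interpolation stands in for the reconstructors the report studies, and clamping at the ends is just one simple assumed edge policy.

```python
import math


def reconstruct_linear(samples):
    """Stage 1: build a continuous intensity function from discrete samples.

    Linear interpolation is used here purely as an illustrative choice.
    """
    n = len(samples)

    def f(x):
        if n == 1:
            return float(samples[0])
        x = min(max(x, 0.0), n - 1.0)        # clamp to the sampled range (assumed edge policy)
        i = min(int(math.floor(x)), n - 2)   # index of the left-hand neighbour
        t = x - i                            # fractional position between neighbours
        return (1.0 - t) * samples[i] + t * samples[i + 1]

    return f


def transform(f, inverse_map):
    """Stage 2: transform the continuous function.

    inverse_map sends an output coordinate back to a source coordinate.
    """
    return lambda x: f(inverse_map(x))


def sample(g, count):
    """Stage 3: sample the transformed function at integer positions."""
    return [g(float(k)) for k in range(count)]


def resample(samples, inverse_map, count):
    """Compose the three stages: reconstruct, transform, sample."""
    return sample(transform(reconstruct_linear(samples), inverse_map), count)


# Magnify a three-sample signal by a factor of two.
doubled = resample([0, 10, 20], lambda x: x / 2.0, 5)
```

Keeping the stages as separate functions means a different reconstructor can be substituted without touching the transformation or sampling code.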
The work on reconstruction divides into four parts,
two general and two specific:
1. Piecewise local polynomials: the most studied
group of reconstructors. We examine these, and the criteria used in their design. One new derivation is of two
piecewise local quadratic reconstructors.
2. Infinite extent reconstructors: we consider these
and their local approximations, the problem of finite
image size, the resulting edge effects, and the solutions to these problems. Amongst the reconstructors
discussed are the interpolating cubic B-spline and the
interpolating Bezier cubic. We derive the filter kernels
for both of these, and prove that they are the same.
Given this kernel we demonstrate how the interpolating
cubic B-spline can be extended from a one-dimensional
to a two-dimensional reconstructor, providing a considerable speed improvement over the existing method of
extension.
3. Fast Fourier transform reconstruction: it has long
been known that the fast Fourier transform (FFT) can
be used to generate an approximation to perfect scaling
of a sample set. Donald Fraser (in 1987) took this result
and generated a hybrid FFT reconstructor which can be
used for general transformations, not just scaling. We
modify Fraser’s method to tackle two major problems:
its large time and storage requirements, and the edge
effects it causes in the reconstructed intensity surface.
4. A priori knowledge reconstruction: first considering what can be done if we know how the original
image was sampled, and then considering what can be
done with one particular class of image coupled with
one particular type of sampling. In this latter case we
find that exact reconstruction of the image is possible.
This is a surprising result as this class of images cannot
be exactly reconstructed using classical sampling theory.
The final section of the report draws all of the
strands together to discuss transformations and the resampling process as a whole. Of particular note here
is work on how the quality of different reconstruction
and resampling methods can be assessed.
UCAM-CL-TR-262
Nick Benton, Gavin Bierman,
Valeria de Paiva:
Term assignment for
intuitionistic linear logic
(preliminary report)
August 1992, 57 pages, paper copy
UCAM-CL-TR-263
C.-H. Luke Ong:
The Lazy Lambda Calculus:
an investigation into the foundations
of functional programming
August 1992, 256 pages, paper copy
PhD thesis (Imperial College London, May 1988)
UCAM-CL-TR-264
Juanito Camilleri:
CCS with environmental guards
August 1992, 19 pages, paper copy
UCAM-CL-TR-265
Juanito Camilleri, Tom Melham:
Reasoning with inductively defined
relations in the HOL theorem prover
August 1992, 49 pages, paper copy
UCAM-CL-TR-266
Carole Klein:
Automatic exploitation of
OR-parallelism in Prolog
18 pages, paper copy
UCAM-CL-TR-267
Christine Ernoult, Alan Mycroft:
Untyped strictness analysis
13 pages, paper copy
UCAM-CL-TR-268
Paul W. Jardetzky:
Network file server design for
continuous media
October 1992, 101 pages, PostScript
PhD thesis (Darwin College, August 1992)
Abstract: This dissertation concentrates on issues related to the provision of a network based storage facility for digital audio and video data. The goal is to
demonstrate that a distributed file service in support of
these media may be built without special purpose hardware. The main objective is to identify those parameters that affect file system performance and provide the
criteria for making desirable design decisions.
UCAM-CL-TR-269
Alan Mycroft, Arthur Norman:
Optimising compilation
23 pages, paper copy
UCAM-CL-TR-270
Chaoying Ma:
Designing a universal name service
133 pages, PDF
PhD thesis (Newnham College, October 1992)
Abstract: Generally speaking, naming in computing
systems deals with the creation of object identifiers at
all levels of system architecture and the mapping among
them. Two of the main purposes of having names in
computer systems are (a) to identify objects; (b) to accomplish sharing. Without naming no computer system
design can be done.
The rapid development in the technology of personal workstations and computer communication networks has placed a great number of demands on designing large computer naming systems. In this dissertation, issues of naming in large distributed computing
systems are addressed. Technical aspects as well as system architecture are examined. A design of a Universal Name Service (UNS) is proposed and its prototype
implementation is described. Three major issues on designing a global naming system are studied. Firstly, it is
observed that none of the existing name services provides enough flexibility in restructuring name spaces;
more research has to be done. Secondly it is observed
that although using stale naming data (hints) at the application level is acceptable in most cases as long as it
is detectable and recoverable, stronger naming data integrity should be maintained to provide a better guarantee of finding objects, especially when a high degree
of availability is required. Finally, configuring the name
service is usually done in an ad hoc manner, leading
to unexpected interruptions or a great deal of human
intervention when the system is reconfigured. It is necessary to make a systematic study of automatic configuration and reconfiguration of name services.
This research is based on a distributed computing
model, in which a number of computers work cooperatively to provide the service. The contributions include: (a) the construction of a Globally Unique Directory Identifier (GUDI) name space. Flexible name space
restructuring is supported by allowing directories to be
added to or removed from the GUDI name space. (b)
The definition of a two class name service infrastructure which exploits the semantics of naming. It makes
the UNS replication control more robust and reliable, as
well as highly available. (c) The identification of two
aspects in the name service configuration: one is concerned with the replication configuration, and the other
is concerned with the server configuration. It is notable
that previous work only studied these two aspects individually but not in combination. A distinguishing feature of the UNS is that both issues are considered at the
design stage and novel methods are used to allow dynamic service configuration to be done automatically
and safely.
UCAM-CL-TR-271
Lawrence C. Paulson:
Set theory as a computational logic: I.
from foundations to functions
November 1992, 28 pages, PDF
Abstract: A logic for specification and verification is derived from the axioms of Zermelo-Fraenkel set theory.
The proofs are performed using the proof assistant Isabelle. Isabelle is generic, supporting several different
logics. Isabelle has the flexibility to adapt to variants of
set theory. Its higher-order syntax supports the definition of new binding operators. Unknowns in subgoals
can be instantiated incrementally. The paper describes
the derivation of rules for descriptions, relations and
functions, and discusses interactive proofs of Cantor’s
Theorem, the Composition of Homomorphisms challenge, and Ramsey’s Theorem. A generic proof assistant can stand up against provers dedicated to particular logics.
UCAM-CL-TR-272
Martin David Coen:
Interactive program derivation
November 1992, 100 pages, PDF
PhD thesis (St John’s College, March 1992)
Abstract: As computer programs are increasingly used
in safety critical applications, program correctness is
becoming more important; as the size and complexity of programs increases, the traditional approach of
testing is becoming inadequate. Proving the correctness
of programs written in imperative languages is awkward; functional programming languages, however, offer more hope. Their logical structure is cleaner, and it
is practical to reason about terminating functional programs in an internal logic.
This dissertation describes the development of a logical theory called TPT for reasoning about the correctness of terminating functional programs, its implementation using the theorem prover Isabelle, and its use
in proving formal correctness. The theory draws both
from Martin-Löf’s work in type theory and Manna and
Waldinger’s work in program synthesis. It is based on
classical first-order logic, and it contains terms that
represent classes of behaviourally equivalent programs,
types that denote sets of terminating programs and
well-founded orderings. Well-founded induction is used
to reason about general recursion in a natural way and
to separate conditions for termination from those for
correctness.
The theory is implemented using the generic theorem prover Isabelle, which allows correctness proofs
to be checked by machine and partially automated using tactics. In particular, tactics for type checking use
the structure of programs to direct proofs. Type checking allows both the verification and derivation of programs, reducing specifications of correctness to sets of
correctness conditions. These conditions can be proved
in typed first-order logic, using well-known techniques
of reasoning by induction and rewriting, and then lifted
up to TPT. Examples of program termination are asserted and proved, using simple types. Behavioural
specifications are expressed using dependent types, and
the correctness of programs asserted and then proved.
As a non-trivial example, a unification algorithm is
specified and proved correct by machine.
The work in this dissertation clearly shows how
a classical theory can be used to reason about program correctness, how general recursion can be reasoned about, and how programs can direct proofs of
correctness.
UCAM-CL-TR-273
Innes A. Ferguson:
TouringMachines:
an architecture for dynamic, rational,
mobile agents
November 1992, 206 pages, PDF
PhD thesis (Clare Hall, October 1992)
Abstract: It is becoming widely accepted that neither purely reactive nor purely deliberative control
techniques are capable of producing the range of
behaviours required of intelligent computational or
robotic agents in dynamic, unpredictable, multi-agent
worlds. We present a new architecture for controlling autonomous, mobile agents – building on previous
work addressing reactive and deliberative control methods. The proposed multi-layered control architecture
allows a resource-bounded, goal-directed agent to react
promptly to unexpected changes in its environment; at
the same time it enables the agent to reason predictively
about potential conflicts by constructing and projecting causal models or theories which hypothesise other
agents’ goals and intentions.
The line of research adopted is very much a pragmatic one. A single, common architecture has been
implemented which, being extensively parametrized,
allows an experimenter to study functionally- and
behaviourally-diverse agent configurations. A principal
aim of this research is to understand the role different
functional capabilities play in constraining an agent’s
behaviour under varying environmental conditions. To
this end, we have constructed an experimental testbed
comprising a simulated multi-agent world in which a
variety of agent configurations and behaviours have
been investigated. Experience with the new control architecture is described.
UCAM-CL-TR-274
Paul Curzon:
Of what use is a verified compiler
specification?
23 pages, paper copy
UCAM-CL-TR-275
Barney Pell:
Exploratory learning in the game of
GO
18 pages, PostScript
Abstract: This paper considers the importance of exploration to game-playing programs which learn by playing against opponents. The central question is whether
a learning program should play the move which offers the best chance of winning the present game, or
if it should play the move which has the best chance of
providing useful information for future games. An approach to addressing this question is developed using
probability theory, and then implemented in two different learning methods. Initial experiments in the game
of Go suggest that a program which takes exploration
into account can learn better against a knowledgeable
opponent than a program which does not.
UCAM-CL-TR-276
Barney Pell:
METAGAME: a new challenge for
games and learning
15 pages, PostScript
Abstract: In most current approaches to Computer
Game-Playing, including those employing some form
of machine learning, the game analysis mainly is performed by humans. Thus, we are sidestepping largely
the interesting (and difficult) questions. Human analysis also makes it difficult to evaluate the generality and
applicability of different approaches.
To address these problems, we introduce a new challenge: Metagame. The idea is to write programs which
take as input the rules of a set of new games within
a pre-specified class, generated by a program which is
publicly available. The programs compete against each
other in many matches on each new game, and they can
then be evaluated based on their overall performance
and improvement through experience.
This paper discusses the goals, research areas, and
general concerns for the idea of Metagame.
UCAM-CL-TR-277
Barney Pell:
METAGAME in symmetric chess-like
games
30 pages, PostScript
Abstract: I have implemented a game generator that
generates games from a wide but still restricted class.
This class is general enough to include most aspects
of many standard games, including Chess, Shogi, Chinese Chess, Checkers, Draughts, and many variants of
Fairy Chess. The generator, implemented in Prolog is
transparent and publicly available, and generates games
using probability distributions for parameters such as
piece complexity, types of movement, board size, and
locality.
The generator is illustrated by means of a new game
it produced, which is then subjected to a simple strategic analysis. This form of analysis suggests that programs to play Metagame well will either learn or apply very general game-playing principles. But because
the class is still restricted, it may be possible to develop a naive but fast program which can outplay more
sophisticated opponents. Performance in a tournament
between programs is the deciding criterion.
UCAM-CL-TR-278
Monica Nesi:
A formalization of the process
algebra CCS in high order logic
42 pages, PDF
Abstract: This paper describes a mechanization in
higher order logic of the theory for a subset of Milner’s CCS. The aim is to build a sound and effective
tool to support verification and reasoning about process algebra specifications. To achieve this goal, the formal theory for pure CCS (no value passing) is defined in
the interactive theorem prover HOL, and a set of proof
tools, based on the algebraic presentation of CCS, is
provided.
UCAM-CL-TR-279
Victor A. Carreño:
The transition assertions specification
method
18 pages, paper copy
UCAM-CL-TR-280
Lawrence C. Paulson:
Introduction to Isabelle
January 1993, 61 pages, DVI
Abstract: Isabelle is a generic theorem prover, supporting formal proof in a variety of logics. Through a variety of examples, this paper explains the basic theory
and demonstrates the most important commands. It serves
as the introduction to other Isabelle documentation.
UCAM-CL-TR-281
Sape J. Mullender, Ian M. Leslie,
Derek McAuley:
Pegasus project description
September 1992, 23 pages, paper copy
UCAM-CL-TR-282
Ian M. Leslie, Derek McAuley,
Sape J. Mullender:
Pegasus – Operating system support
for distributed multimedia systems
December 1992, 14 pages, paper copy
UCAM-CL-TR-283
Lawrence C. Paulson:
The Isabelle reference manual
February 1993, 78 pages, DVI
Abstract: This manual is a comprehensive description
of Isabelle, including all commands, functions and
packages. It is intended for reference rather than for
reading through, and is certainly not a tutorial. The
manual assumes familiarity with the basic concepts explained in Introduction to Isabelle. Functions are organized by their purpose, by their operands (subgoals,
tactics, theorems), and by their usefulness. In each section, basic functions appear first, then advanced functions, and finally esoteric functions.
UCAM-CL-TR-284
Claire Grover, John Carroll, Ted Briscoe:
The Alvey Natural Language Tools
grammar (4th Release)
January 1993, 260 pages, paper copy
UCAM-CL-TR-285
Andrew Donald Gordon:
Functional programming and
input/output
February 1993, 163 pages, paper copy
PhD thesis (King’s College, August 1992)
UCAM-CL-TR-286
Lawrence C. Paulson:
Isabelle’s object-logics
February 1993, 161 pages, DVI
Abstract: Several logics come with Isabelle. Many of
them are sufficiently developed to serve as comfortable
reasoning environments. They are also good starting
points for defining new logics. Each logic is distributed
with sample proofs, some of which are presented in
the paper. The logics described include first-order logic,
Zermelo-Fraenkel set theory, higher-order logic, constructive type theory, and the classical sequent calculus
LK. A final chapter explains the fine points of defining
logics in Isabelle.
UCAM-CL-TR-287
Andrew D. Gordon:
A mechanised definition of Silage in
HOL
February 1993, 28 pages, DVI
Abstract: If formal methods of hardware verification
are to have any impact on the practices of working engineers, connections must be made between the languages
used in practice to design circuits, and those used for
research into hardware verification. Silage is a simple
dataflow language marketed for specifying digital signal processing circuits. Higher Order Logic (HOL) is
extensively used for research into hardware verification. This paper presents a formal definition of a substantial subset of Silage, by mapping Silage declarations
into HOL predicates. The definition has been mechanised in the HOL theorem prover to support the transformational design of Silage circuits as theorem proving
in HOL.
UCAM-CL-TR-291
J.R. Galliers, K. Spärck Jones:
Evaluating natural language
processing systems
February 1993, 187 pages, PostScript
Abstract: This report presents a detailed analysis and
review of NLP evaluation, in principle and in practice.
Part 1 examines evaluation concepts and establishes
a framework for NLP system evaluation. This makes
use of experience in the related area of information
retrieval and the analysis also refers to evaluation in
speech processing. Part 2 surveys significant evaluation
work done so far, for instance in machine translation,
and discusses the particular problems of generic system
evaluation. The conclusion is that evaluation strategies
and techniques for NLP need much more development,
in particular to take proper account of the influence of
system tasks and settings. Part 3 develops a general approach to NLP evaluation, aimed at methodologically sound strategies for test and evaluation motivated by
comprehensive performance factor identification. The
analysis throughout the report is supported by extensive illustrative examples.
UCAM-CL-TR-288
Rajeev Gore:
Cut-free sequent and tableau systems
for propositional Diodorean modal
logics
February 1993, 19 pages, paper copy
UCAM-CL-TR-289
David Alan Howard Elworthy:
The semantics of noun phrase
anaphora
February 1993, 191 pages, paper copy
PhD thesis (Darwin College, February 1993)
UCAM-CL-TR-290
Karen Spärck Jones:
Discourse modelling for automatic
summarising
February 1993, 30 pages, paper copy
UCAM-CL-TR-292
Cormac John Sreenan:
Synchronisation services for
digital continuous media
March 1993, 123 pages, PostScript
PhD thesis (Christ’s College, October 1992)
Abstract: The development of broadband ATM networking makes it attractive to use computer communication networks for the transport of digital audio
and motion video. Coupled with advances in workstation technology, this creates the opportunity to integrate these continuous information media within a distributed computing system. Continuous media have an
inherent temporal dimension, resulting in a set of synchronisation requirements which have real-time constraints. This dissertation identifies the role and position of synchronisation, in terms of the support which
is necessary in an integrated distributed system. This
work is supported by a set of experiments which were
performed in an ATM inter-network using multi-media
workstations, each equipped with an Olivetti Pandora
Box.
UCAM-CL-TR-293
Jean Bacon, Ken Moody:
Objects and transactions for
modelling distributed applications:
concurrency control and commitment
April 1993, 39 pages, paper copy
UCAM-CL-TR-294
Ken Moody, Jean Bacon, Noha Adly,
Mohamad Afshar, John Bates,
Huang Feng, Richard Hayton, Sai Lai Lo,
Scarlet Schwiderski, Robert Sultana,
Zhixue Wu:
OPERA
Storage, programming and display of
multimedia objects
April 1993, 9 pages, paper copy
UCAM-CL-TR-295
Jean Bacon, John Bates, Sai Lai Lo,
Ken Moody:
OPERA
Storage and presentation support
for multimedia applications
in a distributed, ATM network
environment
April 1993, 12 pages, paper copy
UCAM-CL-TR-296
Z. Wu, K. Moody, J. Bacon:
A persistent programming language
for multimedia databases in the
OPERA project
April 1993, 9 pages, paper copy
UCAM-CL-TR-297
Eike Ritter:
Categorical abstract machines for
higher-order lambda calculi
April 1993, 149 pages, paper copy
PhD thesis (Trinity College)
UCAM-CL-TR-298
John Matthew Simon Doar:
Multicast in the asynchronous
transfer mode environment
April 1993, 168 pages, PostScript
PhD thesis (St John’s College, January 1993)
Abstract: In future multimedia communication networks, the ability to multicast information will be useful for many new and existing services. This dissertation considers the design of multicast switches for
Asynchronous Transfer Mode (ATM) networks and
proposes one design based upon a slotted ring. Analysis and simulation studies of this design are presented
and details of its implementation for an experimental
ATM network (Project Fairisle) are described, together
with the modifications to the existing multi-service protocol architecture necessary to provide multicast connections. Finally, a short study of the problem of multicast routing is presented, together with some simulations of the long-term effect upon the routing efficiency
of modifying the number of destinations within a multicast group.
UCAM-CL-TR-299
Bjorn Gamback, Manny Rayner, Barney Pell:
Pragmatic reasoning in bridge
April 1993, 23 pages, PostScript
Abstract: In this paper we argue that bidding in the
game of Contract Bridge can profitably be regarded as a
micro-world suitable for experimenting with pragmatics. We sketch an analysis in which a “bidding system”
is treated as the semantics of an artificial language, and
show how this “language”, despite its apparent simplicity, is capable of supporting a wide variety of common
speech acts parallel to those in natural languages; we
also argue that the reason for the relatively unsuccessful nature of previous attempts to write strong Bridge
playing programs has been their failure to address the
need to reason explicitly about knowledge, pragmatics, probabilities and plans. We give an overview of
Pragma, a system currently under development, which
embodies these ideas in concrete form, using a combination of rule-based inference, stochastic simulation,
and “neural-net” learning. Examples are given illustrating the functionality of the system in its current form.
UCAM-CL-TR-300
Wai Wong:
Formal verification of VIPER’s ALU
April 1993, 78 pages, paper copy
UCAM-CL-TR-301
Zhixue Wu, Ken Moody, Jean Bacon:
The dual-level validation concurrency
control method
June 1993, 24 pages, paper copy
UCAM-CL-TR-302
Barney Pell:
Logic programming for general
game-playing
June 1993, 15 pages, PostScript
Abstract: Meta-Game Playing is a new approach to
games in Artificial Intelligence, where we construct programs to play new games in a well-defined class, which
are output by an automatic game generator. As the specific games to be played are not known in advance, a
degree of human bias is eliminated, and playing programs are required to perform any game-specific optimisations without human assistance.
The attempt to construct a general game-playing
program is made difficult by the opposing goals of generality and efficiency. This paper shows how application of standard techniques in logic-programming (abstract interpretation and partial evaluation) makes it
possible to achieve both of these goals. Using these
techniques, we can represent the semantics of a large
class of games in a general and declarative way, but
then have the program transform this representation
into a more efficient version once it is presented with
the rules of a new game. This process can be viewed
as moving some of the responsibility for game analysis
(that concerned with efficiency) from the researcher to
the program itself.
UCAM-CL-TR-303
Andrew Kennedy:
Drawing trees —
a case study in functional
programming
June 1993, 9 pages, paper copy
UCAM-CL-TR-304
Lawrence C. Paulson:
Co-induction and co-recursion in
higher-order logic
July 1993, 35 pages, PDF
Abstract: A theory of recursive and corecursive definitions has been developed in higher-order logic (HOL)
and mechanised using Isabelle. Least fixedpoints express inductive data types such as strict lists; greatest
fixedpoints express co-inductive data types, such as lazy
lists. Well-founded recursion expresses recursive functions over inductive data types; co-recursion expresses
functions that yield elements of co-inductive data types.
The theory rests on a traditional formalization of infinite trees. The theory is intended for use in specification and verification. It supports reasoning about a
wide range of computable functions, but it does not formalize their operational semantics and can express noncomputable functions also. The theory is demonstrated
using lists and lazy lists as examples. The emphasis is
on using co-recursion to define lazy list functions, and
on using co-induction to reason about them.
UCAM-CL-TR-305
P.N. Benton:
Strong normalisation for the
linear term calculus
July 1993, 13 pages, paper copy
UCAM-CL-TR-306
Wai Wong:
Recording HOL proofs
July 1993, 57 pages, paper copy
UCAM-CL-TR-307
David D. Lewis, Karen Spärck Jones:
Natural language processing for
information retrieval
July 1993, 22 pages, PostScript
Abstract: The paper summarizes the essential properties
of document retrieval and reviews both conventional
practice and research findings, the latter suggesting that
simple statistical techniques can be effective. It then
considers the new opportunities and challenges presented by the ability to search full text directly (rather
than e.g. titles and abstracts), and suggests appropriate
approaches to doing this, with a focus on the role of
natural language processing. The paper also comments
on possible connections with data and knowledge retrieval, and concludes by emphasizing the importance
of rigorous performance testing.
UCAM-CL-TR-308
Jacob Frost:
A case study of co-induction in
Isabelle HOL
August 1993, 27 pages, PDF
Abstract: The consistency of the dynamic and static semantics for a small functional programming language
was informally proved by R. Milner and M. Tofte. The
notions of co-inductive definitions and the associated
principle of co-induction played a pivotal role in the
proof. With emphasis on co-induction, the work presented here deals with the formalisation of this result
in the higher-order logic of the generic theorem prover
Isabelle.
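As a rough illustration of the co-induction principle this abstract refers to (and not of the Isabelle HOL formalisation itself), the sketch below computes a co-inductively defined set on a finite universe as the greatest fixed point of a monotone operator. It then checks the principle in its simplest form: any set X with X ⊆ F(X) is contained in that greatest fixed point. The graph EDGES, the operator step and every other name here are hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, FrozenSet, Set

Node = str
NodeSet = FrozenSet[Node]


def greatest_fixed_point(f: Callable[[NodeSet], NodeSet], universe: NodeSet) -> NodeSet:
    """Iterate a monotone operator downwards from the whole (finite) universe."""
    current = universe
    while True:
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt


# Hypothetical successor relation: a and b form a loop, c enters the loop,
# d can only reach e, and e has no successors at all.
EDGES: Dict[Node, Set[Node]] = {
    "a": {"b"},
    "b": {"a"},
    "c": {"a"},
    "d": {"e"},
    "e": set(),
}
UNIVERSE: NodeSet = frozenset(EDGES)


def step(xs: NodeSet) -> NodeSet:
    """F(X): the nodes that have at least one successor inside X."""
    return frozenset(n for n in UNIVERSE if EDGES[n] & xs)


def is_post_fixed(xs: NodeSet) -> bool:
    """Co-induction hypothesis: X is contained in F(X)."""
    return xs <= step(xs)


# Nodes from which an infinite path exists, defined co-inductively.
CAN_RUN_FOREVER = greatest_fixed_point(step, UNIVERSE)
```

Here the set {a, b} satisfies the co-induction hypothesis, so it must lie inside CAN_RUN_FOREVER without any further iteration, whereas {d} does not satisfy it.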
UCAM-CL-TR-309
Peter Nicholas Benton:
Strictness analysis of lazy functional
programs
August 1993, 154 pages, paper copy
PhD thesis (Pembroke College, December 1992)
UCAM-CL-TR-310
Noha Adly:
HARP: a hierarchical asynchronous
replication protocol for massively
replicated systems
August 1993, 34 pages, PostScript
UCAM-CL-TR-311
Paul Curzon:
A verified Vista implementation
September 1993, 56 pages, paper copy
UCAM-CL-TR-312
Lawrence C. Paulson:
Set theory for verification:
II
Induction and recursion
September 1993, 46 pages, PDF
Abstract: A theory of recursive definitions has been
mechanized in Isabelle’s Zermelo-Fraenkel (ZF) set theory. The objective is to support the formalization of
particular recursive definitions for use in verification,
semantics proofs and other computational reasoning.
Inductively defined sets are expressed as least fixedpoints, applying the Knaster-Tarski Theorem over a
suitable set. Recursive functions are defined by well-founded recursion and its derivatives, such as transfinite recursion. Recursive data structures are expressed
by applying the Knaster-Tarski Theorem to a set that is
closed under Cartesian product and disjoint sum.
Worked examples include the transitive closure of
a relation, lists, variable-branching trees and mutually recursive trees and forests. The Schröder-Bernstein
Theorem and the soundness of propositional logic are
proved in Isabelle sessions.
UCAM-CL-TR-313
Yves Bertot, Gilles Kahn, Laurent Théry:
Proof by pointing
October 1993, 27 pages, paper copy
UCAM-CL-TR-314
John Andrew Carroll:
Practical unification-based parsing of
natural language
173 pages, PostScript
PhD thesis (September 1993)
Abstract: The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural
Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the
major problems in this area: firstly, the fact that parsing
realistic inputs with such grammars can be computationally very expensive, and secondly, the observation
that many analyses are often assigned to an input, only
one of which usually forms the basis of the correct interpretation.
The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL
parsing, and describes a bottom-up active chart parser
which employs this unification algorithm together with
several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical performance than a comparable, state-of-the-art
unification-based parser. Next, techniques for computing an LR table for a large unification grammar are described, a context free non-deterministic LR parsing algorithm is presented which has better time complexity
than any previously reported using the same approach,
and a unification-based version is derived. In experiments, the performance of an implementation of the
latter is shown to exceed both the chart parser and also
that of another efficient LR-like algorithm recently proposed.
Building on these methods, a system for parsing
text taken from a given corpus is described which uses
probabilistic techniques to identify the most plausible
syntactic analyses for an input from the often large
number licensed by the grammar. New techniques implemented include an incremental approach to semi-supervised training, a context-sensitive method of scoring sub-analyses, the accurate manipulation of probabilities during parsing, and the identification of the
highest ranked analyses without exhaustive search. The
system attains a similar success rate to approaches
based on context-free grammar, but produces analyses
which are more suitable for semantic processing.
The thesis includes detailed analyses of the worst-case space and time complexities of all the main algorithms described, and discusses the practical impact of
the theoretical complexity results.
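For readers unfamiliar with the operation at the heart of unification-based grammars, the following is a minimal sketch of textbook recursive unification over tree-shaped feature structures. It is not the new unification algorithm the thesis presents, and it omits shared substructure (re-entrancy), which practical grammars rely on. The representation (nested dicts with string atoms) and the feature names cat, agr, num and per are assumptions made for the example.

```python
from typing import Dict, Optional, Union

# A feature structure is either an atomic value or a mapping from
# feature names to feature structures (assumed representation).
FeatureStructure = Union[str, Dict[str, "FeatureStructure"]]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the most general structure compatible with both inputs,
    or None if the two structures clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so the inputs are never mutated
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # clash somewhere below this feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atoms unify only with identical atoms; an atom never unifies with a dict.
    return a if a == b else None


# Hypothetical lexical entry and the subject requirement of a verb.
noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_slot = {"cat": "NP", "agr": {"num": "sg", "per": "3"}}
plural_slot = {"agr": {"num": "pl"}}
```

Unifying noun_phrase with subject_slot succeeds and adds the per feature, while unifying it with plural_slot fails because the num values clash.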
UCAM-CL-TR-315
Barney Darryl Pell:
Strategy generation and evaluation
for meta-game playing
November 1993, 289 pages, PostScript
PhD thesis (Trinity College, August 1993)
Abstract: Meta-Game Playing (METAGAME) is a new
paradigm for research in game-playing in which we design programs to take in the rules of unknown games
and play those games without human assistance. Strong
performance in this new paradigm is evidence that the
program, instead of its human designer, has performed
the analysis of each specific game.
SCL-METAGAME is a concrete METAGAME research problem based around the class of symmetric chess-like games. The class includes the games of
chess, checkers, noughts and crosses, Chinese-chess,
and Shogi. An implemented game generator produces
new games in this class, some of which are objects of
interest in their own right.
METAGAMER is a program that plays SCL-METAGAME. The program takes as input the rules of
a specific game and analyses those rules to construct
for that game an efficient representation and an evaluation function, both for use with a generic search engine.
The strategic analysis performed by the program relates
a set of general knowledge sources to the details of the
particular game. Among other properties, this analysis
determines the relative value of the different pieces in
a given game. Although METAGAMER does not learn
from experience, the values resulting from its analysis
are qualitatively similar to values used by experts on
known games, and are sufficient to produce competitive
performance the first time the program actually plays
each game it is given. This appears to be the first program to have derived useful piece values directly from
analysis of the rules of different games.
Experiments show that the knowledge implemented
in METAGAMER is useful on games unknown to its
programmer in advance of the competition and make
it seem likely that future programs which incorporate
learning and more sophisticated active-analysis techniques will have a demonstrable competitive advantage
on this new problem. When playing the known games
of chess and checkers against humans and specialised
programs, METAGAMER has derived from more general principles some strategies which are familiar to
players of those games and which are hard-wired in
many game-specific programs.
UCAM-CL-TR-316
Ann Copestake:
The Compleat LKB
August 1993, 126 pages, PostScript
UCAM-CL-TR-317
John Peter Van Tassel:
Femto-VHDL:
the semantics of a subset of VHDL
and its embedding in the HOL proof
assistant
November 1993, 122 pages, paper copy
PhD thesis (Gonville & Caius College, July 1993)
UCAM-CL-TR-318
Jim Grundy:
A method of program refinement
November 1993, 207 pages, PostScript
PhD thesis (Fitzwilliam College, November 1993)
Abstract: A method of specifying the desired behaviour
of a computer program, and of refining such specifications into imperative programs is proposed. The refinement method has been designed with the intention of
being amenable to tool support, and of being applicable to real-world refinement problems.
Part of the refinement method proposed involves
the use of a style of transformational reasoning called
‘window inference’. Window inference is particularly
powerful because it allows the information inherent in
the context of a subexpression to be used in its transformation. If the notion of transformational reasoning
is generalised to include transformations that preserve
relationships weaker than equality, then program refinement can be regarded as a special case of transformational reasoning. A generalisation of window inference is described that allows non-equivalence preserving transformations. Window inference was originally proposed independently from, and as an alternative to, traditional styles of reasoning. A correspondence between the generalised version of window inference and natural deduction is described. This correspondence forms the basis of a window inference tool
that has been built on top of the HOL theorem proving
system.
This dissertation adopts a uniform treatment of
specifications and programs as predicates. A survey of
the existing approaches to the treatment of programs
as predicates is presented. A new approach is then developed based on using predicates of a three-valued
logic. This new approach can distinguish more easily
between specifications of terminating and nonterminating behaviour than can the existing approaches.
A method of program refinement is then described
by combining the unified treatment of specifications
and programs as three-valued predicates with the window inference style of transformational reasoning. The
result is a simple method of refinement that is well
suited to the provision of tool support.
The method of refinement includes a technique for
developing recursive programs. The proof of such developments is usually complicated because little can be
assumed about the form and termination properties of
a partially developed program. These difficulties are
side-stepped by using a simplified meaning for recursion that compels the development of terminating programs. Once the development of a program is complete,
the simplified meaning for recursion is refined into the
true meaning.
The dissertation concludes with a case study which
presents the specification and development of a simple
line-editor. The case study demonstrates the applicability of the refinement method to real-world problems.
The line editor is a nontrivial example that contains
features characteristic of large developments, including
complex data structures and the use of data abstraction. Examination of the case study shows that window
inference offers a convenient way of structuring large
developments.
UCAM-CL-TR-319
Mark David Hayter:
A workstation architecture to
support multimedia
November 1993, 99 pages, PostScript
PhD thesis (St John’s College, September 1993)
Abstract: The advent of high speed networks in the
wide and local area enables multimedia traffic to be
easily carried between workstation class machines. The
dissertation considers an architecture for a workstation
to support such traffic effectively. In addition to presenting the information to a human user the architecture allows processing to be done on continuous media
streams.
The proposed workstation architecture, known as
the Desk Area Network (DAN), extends ideas from
Asynchronous Transfer Mode (ATM) networks into the
end-system. All processors and devices are connected
to an ATM interconnect. The architecture is shown to
be capable of supporting both multimedia data streams
and more traditional CPU cache line traffic. The advocated extension of the CPU cache which allows caching
of multimedia data streams is shown to provide a natural programming abstraction and a mechanism for synchronising the processor with the stream.
A prototype DAN workstation has been built. Experiments have been done to demonstrate the features
of the architecture. In particular the use of the DAN as
a processor-to-memory interconnect is closely studied
to show the practicality of using ATM for cache line
traffic in a real machine. Simple demonstrations of the
stream cache ideas are used to show its utility in future
applications.
UCAM-CL-TR-320
Lawrence C. Paulson:
A fixedpoint approach to
implementing (co)inductive
definitions (updated version)
July 1995, 29 pages, PDF
Abstract: Several theorem provers provide commands
for formalizing recursive datatypes or inductively defined sets. This paper presents a new approach, based
on fixedpoint definitions. It is unusually general: it admits all monotone inductive definitions. It is conceptually simple, which has allowed the easy implementation of mutual recursion and other conveniences. It also
handles coinductive definitions: simply replace the least
fixedpoint by a greatest fixedpoint. This represents the
first automated support for coinductive definitions.
The method has been implemented in Isabelle’s formalization of ZF set theory. It should be applicable to
any logic in which the Knaster-Tarski Theorem can be
proved. The paper briefly describes a method of formalizing non-well-founded data structures in standard ZF
set theory.
Examples include lists of n elements, the accessible
part of a relation and the set of primitive recursive functions. One example of a coinductive definition is bisimulations for lazy lists. Recursive datatypes are examined
in detail, as well as one example of a “codatatype”: lazy
lists. The appendices are simple user’s manuals for this
Isabelle/ZF package.
UCAM-CL-TR-321
Andrew M. Pitts:
Relational properties of domains
December 1993, 38 pages, PostScript
Abstract: New tools are presented for reasoning about
properties of recursively defined domains. We work
within a general, category-theoretic framework for various notions of ‘relation’ on domains and for actions
of domain constructors on relations. Freyd’s analysis of
recursive types in terms of a property of mixed initiality/finality is transferred to a corresponding property
of invariant relations. The existence of invariant relations is proved under completeness assumptions about
the notion of relation. We show how this leads to simpler proofs of the computational adequacy of denotational semantics for functional programming languages
with user-declared datatypes. We show how the initiality/finality property of invariant relations can be specialized to yield an induction principle for admissible
subsets of recursively defined domains, generalizing the
principle of structural induction for inductively defined
sets. We also show how the initiality/finality property
gives rise to the co-induction principle studied by the
author (in UCAM-CL-TR-252), by which equalities between elements of recursively defined domains may be
proved via an appropriate notion of ‘bisimulation’.
UCAM-CL-TR-322
Guangxing Li:
Supporting distributed realtime
computing
December 1993, 113 pages, paper copy
PhD thesis (King’s College, August 1993)
UCAM-CL-TR-323
J. von Wright:
Representing higher-order logic
proofs in HOL
January 1994, 28 pages, paper copy
UCAM-CL-TR-324
J. von Wright:
Verifying modular programs in
HOL
January 1994, 25 pages, paper copy
UCAM-CL-TR-325
Richard Crouch:
The temporal properties of English
conditionals and modals
January 1994, 248 pages, PDF
PhD thesis (April 1993)
Abstract: This thesis deals with the patterns of temporal
reference exhibited by conditional and modal sentences
in English, and specifically with the way that past and
present tenses can undergo deictic shift in these contexts. This shifting behaviour has consequences both
for the semantics of tense and for the semantics of conditionals and modality.
Asymmetries in the behaviour of the past and
present tenses under deictic shift are explained by
positing a primary and secondary deictic centre for
tenses. The two deictic centres, the assertion time and
the verification time, are given independent motivation
through an information based view of tense. This holds
that the tense system not only serves to describe the
way that the world changes over time, but also the way
that information about the world changes. Information
change takes place in two stages. First, it is asserted
that some fact holds. And then, either at the same time
or later, it is verified that this assertion is correct.
Typically, assertion and verification occur simultaneously, and most sentences convey verified information. Modals and conditionals allow delayed assertion
and verification. “If A, then B” means roughly: suppose
you were now to assert A; if and when A is verified, you
will be in a position to assert B, and in due course this
assertion will also be verified. Since A and B will both
be tensed clauses, the shifting of the primary and secondary deictic centres leads to shifted interpretations of
the two clauses.
The thesis presents a range of temporal properties
of indicative and subjunctive conditionals that have not
previously been discussed, and shows how they can be
explained. A logic is presented for indicative conditionals, based around an extension of intuitionistic logic to
allow for both verified and unverified assertions. This
logic naturally gives rise to three forms of epistemic
modality, corresponding to “must”, “may” and “will”.
UCAM-CL-TR-326
Sai-Lai Lo:
A modular and extensible network
storage architecture
January 1994, 147 pages, PostScript
PhD thesis (Darwin College, November 1993)
Abstract: Most contemporary distributed file systems
are not designed to be extensible. This work asserts that
the lack of extensibility is a problem because:
– New data types, such as continuous-medium data
and structured data, are significantly different from
conventional unstructured data, such as text and binary, that contemporary distributed file systems are
built to support.
– Value-adding clients can provide functional enhancements, such as convenient and reliable persistent
programming and automatic and transparent file indexing, but cannot be integrated smoothly with contemporary distributed file systems.
– New media technologies, such as the optical jukebox and RAID disk, can extend the scale and performance of a storage service but contemporary distributed file systems do not have a clear framework to
incorporate these new technologies and to provide the
necessary user level transparency.
Motivated by these observations, the new network
storage architecture (MSSA) presented in this dissertation is designed to be extensible. Design modularity is
taken as the key to achieving service extensibility. This
dissertation examines a number of issues related to the
design of the architecture. New ideas, such as a flexible access control mechanism based on temporary capabilities, a low level storage substrate that uses nonvolatile memory to provide atomic update semantics at
high performance, a concept of sessions to differentiate performance requirements of different data types,
are introduced. Prototype implementations of the key
components are evaluated.
UCAM-CL-TR-327
Siani L. Baker:
A new application for
explanation-based generalisation
within automated deduction
February 1994, 18 pages, paper copy
UCAM-CL-TR-328
Paul Curzon:
The formal verification of the
Fairisle ATM switching element:
an overview
March 1994, 46 pages, paper copy
UCAM-CL-TR-329
Paul Curzon:
The formal verification of the
Fairisle ATM switching element
March 1994, 105 pages, paper copy
UCAM-CL-TR-330
Pierre David Wellner:
Interacting with paper on the
DigitalDesk
March 1994, 96 pages, PDF
PhD thesis (Clare Hall, October 1993)
Abstract: In the 1970’s Xerox PARC developed the
“desktop metaphor,” which made computers easy to
use by making them look and act like ordinary desks
and paper. This led visionaries to predict the “paperless office” would dominate within a few years, but the
trouble with this prediction is that people like paper
too much. It is portable, tactile, universally accepted,
and easier to read than a screen. Today, we continue to
use paper, and computers produce more of it than they
replace.
Instead of trying to use computers to replace paper,
the DigitalDesk takes the opposite approach. It keeps
the paper, but uses computers to make it more powerful. It provides a Computer Augmented Environment
for paper.
The DigitalDesk is built around an ordinary physical desk and can be used as such, but it has extra capabilities. A video camera is mounted above the desk,
pointing down at the work surface. This camera’s output is fed through a system that can detect where the
user is pointing, and it can read documents that are
placed on the desk. A computer-driven electronic projector is also mounted above the desk, allowing the system to project electronic objects onto the work surface
and onto real paper documents — something that can’t
be done with flat display panels or rear-projection. The
system is called DigitalDesk because it allows pointing
with the fingers.
Several applications have been prototyped on the
DigitalDesk. The first was a calculator where a sheet
of paper such as an annual report can be placed on the
desk allowing the user to point at numbers with a finger or pen. The camera reads the numbers off the paper, recognizes them, and enters them into the display
for further calculations. Another is a translation system which allows users to point at unfamiliar French
words to get their English definitions projected down
next to the paper. A third is a paper-based paint program (PaperPaint) that allows users to sketch on paper using traditional tools, but also be able to select
and paste these sketches with the camera and projector to create merged paper and electronic documents.
A fourth application is the DoubleDigitalDesk, which
allows remote colleagues to “share” their desks, look
at each other’s paper documents and sketch on them
remotely.
This dissertation introduces the concept of Computer Augmented Environments, describes the DigitalDesk and applications for it, and discusses some
of the key implementation issues that need to be addressed to make this system work. It describes a toolkit
for building DigitalDesk applications, and it concludes
with some more ideas for future work.
UCAM-CL-TR-331
Noha Adly, Akhil Kumar:
HPP: a hierarchical propagation
protocol for large scale replication in
wide area networks
March 1994, 24 pages, paper copy
UCAM-CL-TR-332
David Martin Evers:
Distributed computing with objects
March 1994, 154 pages, paper copy
PhD thesis (Queens’ College, September 1993)
UCAM-CL-TR-333
G.M. Bierman:
What is a categorical model of
intuitionistic linear logic?
April 1994, 15 pages, paper copy
UCAM-CL-TR-334
Lawrence C. Paulson:
A concrete final coalgebra theorem
for ZF set theory
May 1994, 21 pages, PDF
Abstract: A special final coalgebra theorem, in the style
of Aczel (1988), is proved within standard Zermelo-Fraenkel set theory. Aczel’s Anti-Foundation Axiom is
replaced by a variant definition of function that admits
non-well-founded constructions. Variant ordered pairs
and tuples, of possibly infinite length, are special cases
of variant functions. Analogues of Aczel’s Solution and
Substitution Lemmas are proved in the style of Rutten
and Turi (1993).
The approach is less general than Aczel’s; non-well-founded objects can be modelled only using the variant tuples and functions. But the treatment of non-well-founded objects is simple and concrete. The final coalgebra of a functor is its greatest fixedpoint. The theory is intended for machine implementation and a simple case of it is already implemented using the theorem
prover Isabelle.
UCAM-CL-TR-335
G.J.F. Jones, J.T. Foote, K. Spärck Jones,
S.J. Young:
Video mail retrieval using voice:
report on keyword definition and
data collection (deliverable report on
VMR task No. 1)
April 1994, 38 pages, PDF
Abstract: This report describes the rationale, design,
collection and basic statistics of the initial training and
test database for the Cambridge Video Mail Retrieval
(VMR) project. This database is intended to support
both training for the wordspotting processes and testing for the document searching methods using these
that are being developed for the project’s message retrieval task.
UCAM-CL-TR-336
Barnaby P. Hilken:
Towards a proof theory of rewriting:
the simply-typed 2-λ calculus
May 1994, 28 pages, paper copy
UCAM-CL-TR-337
Richard John Boulton:
Efficiency in a fully-expansive
theorem prover
May 1994, 126 pages, DVI
PhD thesis (Churchill College, December 1993)
Abstract: The HOL system is a fully-expansive theorem
prover: Proofs generated in the system are composed
of applications of the primitive inference rules of the
underlying logic. This has two main advantages. First,
the soundness of the system depends only on the implementations of the primitive rules. Second, users can be
given the freedom to write their own proof procedures
without the risk of making the system unsound. A full
functional programming language is provided for this
purpose. The disadvantage with the approach is that
performance is compromised. This is partly due to the
inherent cost of fully expanding a proof but, as demonstrated in this thesis, much of the observed inefficiency
is due to the way the derived proof procedures are written.
This thesis seeks to identify sources of non-inherent
inefficiency in the HOL system and proposes some
general-purpose and some specialised techniques for
eliminating it. One area that seems to be particularly
amenable to optimisation is equational reasoning. This
is significant because equational reasoning constitutes
large portions of many proofs. A number of techniques
are proposed that transparently optimise equational
reasoning. Existing programs in the HOL system require little or no modification to work faster.
The other major contribution of this thesis is a
framework in which part of the computation involved
in HOL proofs can be postponed. This enables users to
make better use of their time. The technique exploits a
form of lazy evaluation. The critical feature is the separation of the code that generates the structure of a theorem from the code that justifies it logically. Delaying the
justification allows some non-local optimisations to be
performed in equational reasoning. None of the techniques sacrifice the security of the fully-expansive approach.
A decision procedure for a subset of the theory of
linear arithmetic is used to illustrate many of the techniques. Decision procedures for this theory are commonplace in theorem provers due to the importance of
arithmetic reasoning. The techniques described in the
thesis have been implemented and execution times are
given. The implementation of the arithmetic procedure
is a major contribution in itself. For the first time, users
of the HOL system are able to prove many arithmetic
lemmas automatically in a practical amount of time
(typically a second or two).
The applicability of the techniques to other fully-expansive theorem provers and possible extensions of
the ideas are considered.
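The separation described in this abstract, between the code that generates the structure of a theorem and the code that justifies it, can be pictured with the small sketch below. It is only an analogy under stated assumptions: the class LazyTheorem, the helper lazy_sum and the step counter STEPS are hypothetical, plain Python arithmetic stands in for a fast derived procedure, and a successor-by-successor replay stands in for expansion into primitive inferences. It is not HOL code and does not reproduce the thesis's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# Counts the simulated primitive inference steps performed so far.
STEPS: Dict[str, int] = {"count": 0}


@dataclass
class LazyTheorem:
    """A statement whose justification is postponed until it is forced."""
    statement: str
    justify: Callable[[], bool]
    _checked: Optional[bool] = field(default=None, init=False)

    def force(self) -> bool:
        """Run the postponed justification once and cache its verdict."""
        if self._checked is None:
            self._checked = self.justify()
        return self._checked


def prove_sum_by_successor(m: int, n: int, claimed: int) -> bool:
    """Stand-in for full expansion: one primitive step per successor."""
    total = m
    for _ in range(n):
        total += 1
        STEPS["count"] += 1
    return total == claimed


def lazy_sum(m: int, n: int) -> LazyTheorem:
    """Produce the structure of the theorem cheaply; justify it later."""
    claimed = m + n  # fast path, not yet trusted
    return LazyTheorem(
        statement=f"{m} + {n} = {claimed}",
        justify=lambda: prove_sum_by_successor(m, n, claimed),
    )
```

In this picture the statement is available immediately, so later work can proceed, but the result only counts as proved once force() has returned True. Security is therefore not traded away; the cost is merely deferred.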
UCAM-CL-TR-338
Zhixue Wu:
A new approach to implementing
atomic data types
May 1994, 170 pages, paper copy
PhD thesis (Trinity College, October 1993)
UCAM-CL-TR-339
Brian Logan, Steven Reece, Alison Cawsey,
Julia Galliers, Karen Spärck Jones:
Belief revision and dialogue
management in information retrieval
May 1994, 227 pages, PDF
Abstract: This report describes research to evaluate a
theory of belief revision proposed by Galliers in the
context of information-seeking interaction as modelled
by Belkin, Brooks and Daniels and illustrated by user-librarian dialogues. The work covered the detailed assessment and development, and computational implementation and testing, of both the belief revision theory
and the information retrieval model. Some features of
the belief theory presented problems, and the original
‘multiple expert’ retrieval model had to be drastically
modified to support rational dialogue management. But
the experimental results showed that the characteristics
of literature seeking interaction could be successfully
captured by the belief theory, exploiting important elements of the retrieval model. Thus, though the system’s knowledge and dialogue performance were very
limited, it provides a useful base for further research.
The report presents all aspects of the research in detail, with particular emphasis on the implementation of
belief and intention revision, and the integration of revision with domain reasoning and dialogue interaction.
UCAM-CL-TR-340
Eoin Andrew Hyden:
Operating system support for quality
of service
June 1994, 102 pages, PDF
PhD thesis (Wolfson College, February 1994)
Abstract: The deployment of high speed, multiservice
networks within the local area has meant that it has
become possible to deliver continuous media data to a
general purpose workstation. This, in conjunction with
the increasing speed of modern microprocessors, means
that it is now possible to write application programs
which manipulate continuous media in real-time. Unfortunately, current operating systems do not provide
the resource management facilities which are required
to ensure the timely execution of such applications.
This dissertation presents a flexible resource management paradigm, based on the notion of Quality of
Service, with which it is possible to provide the scheduling support required by continuous media applications.
The mechanisms which are required within an operating system to support this paradigm are described, and
the design and implementation of a prototypical kernel
which implements them is presented.
It is shown that, by augmenting the interface between an application and the operating system, the application can be informed of varying resource availabilities, and can make use of this information to vary the
quality of its results. In particular an example decoder
application is presented, which makes use of such information and exploits some of the fundamental properties of continuous media data to trade video image
quality for the amount of processor time which it receives.
UCAM-CL-TR-341
John Bates:
Presentation support for distributed
multimedia applications
June 1994, 140 pages, PostScript
UCAM-CL-TR-342
Stephen Martin Guy Freeman:
An architecture for distributed user
interfaces
July 1994, 127 pages, paper copy
PhD thesis (Darwin College, 1994)
UCAM-CL-TR-344
Martin John Turner:
The contour tree image encoding
technique and file format
July 1994, 154 pages, paper copy
PhD thesis (St John’s College, April 1994)
UCAM-CL-TR-345
Siani L. Baker:
A proof environment for arithmetic
with the Omega rule
August 1994, 17 pages, paper copy
UCAM-CL-TR-346
G.M. Bierman:
On intuitionistic linear logic
August 1994, 191 pages, paper copy
PhD thesis (Wolfson College, December 1993)
Abstract: In this thesis we carry out a detailed study of
the (propositional) intuitionistic fragment of Girard’s
linear logic (ILL). Firstly we give sequent calculus, natural deduction and axiomatic formulations of ILL. In
particular our natural deduction is different from others and has important properties, such as closure under substitution, which others lack. We also study the
process of reduction in all three local formulations, including a detailed proof of cut elimination. Finally, we
consider translations between Intuitionistic Logic (IL)
and ILL.
We then consider the linear term calculus, which
arises from applying the Curry-Howard correspondence to the natural deduction formulation. We show
how the various proof theoretic formulations suggest
reductions at the level of terms. The properties of strong
normalization and confluence are proved for these reduction rules. We also consider mappings between the
extended λ-calculus and the linear term calculus.
Next we consider a categorical model for ILL. We
show how by considering the linear term calculus as
an equational logic, we can derive a model: a linear
category. We consider two alternative models: firstly,
one due to Seely and then one due to Lafont. Surprisingly, we find that Seely’s model is not sound, in that
equal terms are not modelled with equal morphisms.
We show how after adapting Seely’s model (by viewing it in a more abstract setting) it becomes a particular
instance of a linear category. We show how Lafont’s
model can also be seen as another particular instance
of a linear category. Finally we consider various categories of coalgebras, whose construction can be seen
as a categorical equivalent of the translation of IL into
ILL.
UCAM-CL-TR-347
Karen Spärck Jones:
Reflections on TREC
July 1994, 35 pages, PostScript
Abstract: This paper discusses the Text REtrieval Conferences (TREC) programme as a major enterprise in
information retrieval research. It reviews its structure
as an evaluation exercise; characterises the methods of
indexing and retrieval being tested within it in terms
of the approaches to system performance factors these
represent; analyses the test results for solid, overall conclusions that can be drawn from them; and, in the light
of the particular features of the test data, assesses TREC
both for generally-applicable findings that emerge from
it and for directions it offers for future research.
UCAM-CL-TR-348
Jane Louise Hunter:
Integrated sound synchronisation for
computer animation
August 1994, 248 pages, paper copy
PhD thesis (Newnham College, August 1994)
UCAM-CL-TR-349
Brian Graham:
A HOL interpretation of Noden
September 1994, 78 pages, paper copy
UCAM-CL-TR-350
Jonathan P. Bowen, Michael G. Hinchey:
Ten commandments of formal
methods
September 1994, 18 pages, paper copy
UCAM-CL-TR-351
Subir Kumar Biswas:
Handling realtime traffic in mobile
networks
September 1994, 198 pages, PostScript
PhD thesis (Darwin College, August 1994)
Abstract: The rapidly advancing technology of cellular communication and wireless LAN makes ubiquitous
computing feasible where the mobile users can have access to the location independent information and the
computing resources. Multimedia networking is another emerging technological trend of the 1990s and
there is an increasing demand for supporting continuous media traffic in wireless personal communication
environment. In order to guarantee the strict performance requirements of realtime traffic, the connection-oriented approaches are proving to be more efficient
compared to the conventional datagram based networking. This dissertation deals with a network architecture and its design issues for implementing the
connection-oriented services in a mobile radio environment.
The wired backbone of the proposed wireless LAN
comprises high speed ATM switching elements, connected in a modular fashion, where the new switches
and the user devices can be dynamically added and reconnected for maintaining a desired topology. A dynamic reconfiguration protocol, which can cope with
these changing network topologies, is proposed for the
present network architecture. The details about a prototype implementation of the protocol and a simulation
model for its performance evaluation are presented.
CSMA/AED, a single frequency and carrier sensing
based protocol, is proposed for the radio medium access
operations. A simulation model is developed in order to
investigate the feasibility of this statistical and reliable
access scheme for the proposed radio network architecture. The effectiveness of a per-connection window
based flow control mechanism, for the proposed radio
LAN, is also investigated. A hybrid technique is used,
where the medium access and the radio data-link layers
are modelled using the mentioned simulator; an upper
layer end-to-end queueing model, involving flow dependent servers, is solved using an approximate Mean
Value Analysis technique which is augmented for faster
iterative convergence.
A distributed location server, for managing mobile
users’ location information and for aiding the mobile
connection management tasks, is proposed. In order to
hide the effects of mobility from the non-mobile network entities, the concept of a per-mobile software entity, known as a “representative”, is introduced. A mobile connection management scheme is also proposed
for handling the end-to-end network layer connections
in the present mobile environment. The scheme uses the
representatives and a novel connection caching technique for providing the necessary realtime traffic support functionalities.
A prototype system, comprising the proposed location and connection managers, has been built for
demonstrating the feasibility of the presented architecture for transporting continuous media traffic. A set of
experiments has been carried out in order to investigate the impacts of various design decisions and to
identify the performance-critical parts of the design.
UCAM-CL-TR-352
P.N. Benton:
A mixed linear and non-linear logic:
proofs, terms and models
October 1994, 65 pages, paper copy
UCAM-CL-TR-353
Mike Gordon:
Merging HOL with set theory
November 1994, 40 pages, PDF
Abstract: Set theory is the standard foundation for
mathematics, but the majority of general purpose
mechanized proof assistants support versions of type
theory (higher order logic). Examples include Alf, Automath, Coq, Ehdm, HOL, IMPS, Lambda, LEGO,
Nuprl, PVS and Veritas. For many applications type
theory works well and provides for specification the
benefits of type-checking that are well known in programming. However, there are areas where types get in
the way or seem unmotivated. Furthermore, most people with a scientific or engineering background already
know set theory, whereas type theory may appear inaccessible and so be an obstacle to the uptake of proof
assistants based on it. This paper describes some experiments (using HOL) in combining set theory and
type theory; the aim is to get the best of both worlds
in a single system. Three approaches have been tried,
all based on an axiomatically specified type V of ZF-like sets: (i) HOL is used without any additions besides
V; (ii) an embedding of the HOL logic into V is provided; (iii) HOL axiomatic theories are automatically
translated into set-theoretic definitional theories. These
approaches are illustrated with two examples: the construction of lists and a simple lemma in group theory.
UCAM-CL-TR-354
Sten Agerholm:
Formalising a model of the λ-calculus
in HOL-ST
October 1994, 31 pages, paper copy
UCAM-CL-TR-358
Simon William Moore:
Multithreaded processor design
February 1995, 125 pages, paper copy
PhD thesis (Trinity Hall, October 1994)
This report was also published as a book of the
same title (Kluwer/Springer-Verlag, 1996, ISBN
0-7923-9718-5).
UCAM-CL-TR-355
David Wheeler, Roger Needham:
Two cryptographic notes
November 1994, 6 pages, PDF
Abstract: A large block DES-like algorithm
DES was designed to be slow in software. We give
here a DES type of code which applies directly to single blocks comprising two or more words of 32 bits.
It is thought to be at least as secure as performing
DES separately on two word blocks, and has the added
advantage of not requiring chaining etc. It is about
8m/(12+2m) times as fast as DES for an m word block
and has a greater gain for Feistel codes where the number of rounds is greater. We use the name GDES for
the codes we discuss. The principle can be used on any
Feistel code.
TEA, a Tiny Encryption Algorithm
We design a short program which will run on most
machines and encypher safely. It uses a large number
of iterations rather than a complicated program. It is
hoped that it can easily be translated into most languages in a compatible way. The first program is given
below. It uses little set up time and performs a weak
non-linear iteration for enough rounds to make it secure. There
are no preset tables or long set up times. It assumes 32
bit words.
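The TEA note in "Two cryptographic notes" (UCAM-CL-TR-355) says its first program "is given below", but this list does not reproduce it. As an illustration only, here is a minimal Python sketch of TEA in its widely circulated form: a 64-bit block held as two 32-bit words, a 128-bit key held as four words, and 32 cycles driven by the constant 0x9E3779B9. The function names and the `rounds` parameter are assumptions made for this sketch; the code is not copied from the report.

```python
MASK = 0xFFFFFFFF        # keep every intermediate value within 32 bits
DELTA = 0x9E3779B9       # round constant used by the commonly published TEA


def tea_encrypt(block, key, rounds=32):
    """Encrypt a (v0, v1) pair of 32-bit words with a 4-word key."""
    v0, v1 = block
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def tea_decrypt(block, key, rounds=32):
    """Invert tea_encrypt by running the same steps in reverse order."""
    v0, v1 = block
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1


if __name__ == "__main__":
    key = (1, 2, 3, 4)                      # arbitrary demo key
    plain = (0x01234567, 0x89ABCDEF)        # arbitrary demo block
    cipher = tea_encrypt(plain, key)
    print(tea_decrypt(cipher, key) == plain)
```

The sketch shows the property the abstract emphasises: there are no tables and no key set-up, only a short loop of shifts, additions and XORs repeated many times.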
UCAM-CL-TR-359
Jacob Frost:
A case study of co-induction in
Isabelle
February 1995, 48 pages, PDF
Abstract: The consistency of the dynamic and static semantics for a small functional programming language
was informally proved by R. Milner and M. Tofte. The
notions of co-inductive definitions and the associated
principle of co-induction played a pivotal role in the
proof. With emphasis on co-induction, the work presented here deals with the formalisation of this result in
the generic theorem prover Isabelle.
UCAM-CL-TR-356
S.E. Robertson, K. Spärck Jones:
Simple, proven approaches to
text retrieval
December 1994, 8 pages, PDF
Abstract: This technical note describes straightforward
techniques for document indexing and retrieval that
have been solidly established through extensive testing
and are easy to apply. They are useful for many different types of text material, are viable for very large files,
and have the advantage that they do not require special skills or training for searching, but are easy for end
users.
UCAM-CL-TR-357
Jonathan P. Bowen, Michael G. Hinchey:
Seven more myths of formal methods
December 1994, 12 pages, paper copy
UCAM-CL-TR-360
W.F. Clocksin:
On the calculation of explicit
polymetres
March 1995, 12 pages, PDF
Abstract: Computer scientists take an interest in objects
or events which can be counted, grouped, timed and
synchronised. The computational problems involved
with the interpretation and notation of musical rhythm
are therefore of particular interest, as the most complex time-stamped structures yet devised by humankind
are to be found in music notation. These problems are
brought into focus when considering explicit polymetric notation, which is the concurrent use of different
time signatures in music notation. While not in common use the notation can be used to specify complicated cross-rhythms, simple versus compound metres,
and unequal note values without the need for tuplet
notation. From a computational point of view, explicit
polymetric notation is a means of specifying synchronisation relationships amongst multiple time-stamped
streams. Human readers of explicit polymetric notation
use the time signatures together with the layout of barlines and musical events as clues to determine the performance. However, if the aim is to lay out the notation
(such as might be required by an automatic music notation processor), the location of barlines and musical
events will be unknown, and it is necessary to calculate them given only the information conveyed by the
time signatures. Similar problems arise when trying to
perform the notation (i.e. animate the specification) in
real-time. Some problems in the interpretation of explicit polymetric notation are identified and a solution
is proposed. Two different interpretations are distinguished, and methods for their automatic calculation
are given. The solution given may be applied to problems which involve the synchronisation or phase adjustment of multiple independent threads of time-stamped
objects.
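The abstract above does not spell out its two interpretations, so the following is only a sketch of the underlying calculation under one stated assumption: a whole note lasts equally long on every stave, so a bar in time signature n/d lasts n/d whole notes. Given only the time signatures, it computes where each stave's barlines fall and when the staves next share a barline. The function names are hypothetical and not taken from the report.

```python
from fractions import Fraction
from math import gcd


def bar_length(signature):
    """Length of one bar in whole notes, e.g. (3, 4) -> 3/4."""
    numerator, denominator = signature
    return Fraction(numerator, denominator)


def lcm_fraction(a, b):
    """Least common multiple of two positive fractions in lowest terms."""
    num = a.numerator * b.numerator // gcd(a.numerator, b.numerator)
    den = gcd(a.denominator, b.denominator)
    return Fraction(num, den)


def barlines(signature, until):
    """Barline positions (in whole notes) from 0 up to and including `until`."""
    length = bar_length(signature)
    positions = []
    t = Fraction(0)
    while t <= until:
        positions.append(t)
        t += length
    return positions


def realignment(signatures):
    """First time after 0 at which every stave has a barline."""
    period = bar_length(signatures[0])
    for signature in signatures[1:]:
        period = lcm_fraction(period, bar_length(signature))
    return period


if __name__ == "__main__":
    staves = [(3, 4), (4, 4)]
    period = realignment(staves)
    for stave in staves:
        print(stave, [str(t) for t in barlines(stave, period)])
```

Exact rational arithmetic is used so that barlines on different staves compare equal when they coincide, which floating-point durations would not guarantee.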
UCAM-CL-TR-361
Richard John Black:
Explicit network scheduling
April 1995, 121 pages, PostScript
PhD thesis (Churchill College, December 1994)
Abstract: This dissertation considers various problems
associated with the scheduling and network I/O organisation found in conventional operating systems for effective support for multimedia applications which require Quality of Service.
A solution for these problems is proposed in a
micro-kernel structure. The pivotal features of the proposed design are that the processing of device interrupts
is performed by user-space processes which are scheduled by the system like any other, that events are used
for both inter- and intra-process synchronisation, and
the use of a specially developed high performance I/O
buffer management system.
An evaluation of an experimental implementation is
included. In addition to solving the scheduling and networking problems addressed, the prototype is shown
to out-perform the Wanda system (a locally developed
micro-kernel) on the same platform.
This dissertation concludes that it is possible to construct an operating system where the kernel provides
only the fundamental job of fine grain sharing of the
CPU between processes, and hence synchronisation between those processes. This enables processes to perform task specific optimisations; as a result system performance is enhanced, both with respect to throughput
and the meeting of soft real-time guarantees.
UCAM-CL-TR-362
Mark Humphrys:
W-learning:
competition among selfish Q-learners
April 1995, 30 pages, PostScript
Abstract: W-learning is a self-organising action-selection scheme for systems with multiple parallel goals, such as autonomous mobile robots. It
uses ideas drawn from the subsumption architecture
for mobile robots (Brooks), implementing them with
the Q-learning algorithm from reinforcement learning
(Watkins). Brooks explores the idea of multiple sensing-and-acting agents within a single robot, more than one
of which is capable of controlling the robot on its own
if allowed. I introduce a model where the agents are
not only autonomous, but are in fact engaged in direct
competition with each other for control of the robot.
Interesting robots are ones where no agent achieves
total victory, but rather the state-space is fragmented
among different agents. Having the agents operate by
Q-learning proves to be a way to implement this, leading to a local, incremental algorithm (W-learning) to
resolve competition. I present a sketch proof that this
algorithm converges when the world is a discrete, finite
Markov decision process. For each state, competition is
resolved with the most likely winner of the state being
the agent that is most likely to suffer the most if it does
not win. In this way, W-learning can be viewed as ‘fair’
resolution of competition. In the empirical section, I
show how W-learning may be used to define spaces of
agent-collections whose action selection is learnt rather
than hand-designed. This is the kind of solution-space
that may be searched with a genetic algorithm.
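The competition described in the W-learning abstract can be sketched as follows. This is a hedged illustration, not the report's code: it assumes each agent keeps its own Q-table and a per-state weight W, that the agent with the highest W in the current state chooses the action, and that each losing agent moves its W towards the loss it suffered (what it expected from its own preferred action minus what it actually received). The class layout, parameter values and `env_step` callback are all hypothetical.

```python
from collections import defaultdict


class Agent:
    """One selfish Q-learner with its own reward stream (hypothetical layout)."""

    def __init__(self, actions, alpha=0.5, gamma=0.9):
        self.actions = list(actions)
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)   # Q_i(state, action)
        self.w = defaultdict(float)   # W_i(state)

    def preferred(self, state):
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def best_value(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def update_q(self, state, action, reward, next_state):
        # Ordinary Q-learning update on the action that was actually executed.
        target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

    def update_w(self, state, reward, next_state):
        # Loss = value expected from own choice minus value actually obtained.
        expected = self.q[(state, self.preferred(state))]
        received = reward + self.gamma * self.best_value(next_state)
        self.w[state] += self.alpha * ((expected - received) - self.w[state])


def select_leader(agents, state):
    """Index of the agent with the highest W in this state."""
    return max(range(len(agents)), key=lambda i: agents[i].w[state])


def step(agents, state, env_step):
    """One interaction; env_step(state, action) -> (next_state, rewards)."""
    leader = select_leader(agents, state)
    action = agents[leader].preferred(state)
    next_state, rewards = env_step(state, action)
    for i, agent in enumerate(agents):
        if i != leader:
            agent.update_w(state, rewards[i], next_state)
        agent.update_q(state, action, rewards[i], next_state)
    return next_state, leader
```

In this sketch an agent that keeps losing badly in a state accumulates a high W there and eventually takes control, which matches the abstract's description of the likely winner as the agent that would suffer most by not winning.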
UCAM-CL-TR-363
Ian Stark:
Names and higher-order functions
April 1995, 140 pages, PostScript
PhD thesis (Queens’ College, December 1994)
Abstract: Many functional programming languages rely
on the elimination of ‘impure’ features: assignment to
variables, exceptions and even input/output. But some
of these are genuinely useful, and it is of real interest to
establish how they can be reintroduced in a controlled
way. This dissertation looks in detail at one example of
this: the addition to a functional language of dynamically generated “names”. Names are created fresh, they
can be compared with each other and passed around,
but that is all. As a very basic example of “state”, they
capture the graduation between private and public, local and global, by their interaction with higher-order
functions.
The vehicle for this study is the “nu-calculus”, an
extension of the simply-typed lambda-calculus. The nu-calculus is equivalent to a certain fragment of Standard
ML, omitting side-effects, exceptions, datatypes and recursion. Even without all these features, the interaction
of name creation with higher-order functions can be
complex and subtle.
Various operational and denotational methods for
reasoning about the nu-calculus are developed. These
include a computational metalanguage in the style of
Moggi, which distinguishes in the type system between
values and computations. This leads to categorical
models that use a strong monad, and examples are devised based on functor categories.
The idea of “logical relations” is used to derive
powerful reasoning methods that capture some of the
distinction between private and public names. These
techniques are shown to be complete for establishing
contextual equivalence between first-order expressions;
they are also used to construct a correspondingly abstract categorical model.
All the work with the nu-calculus extends cleanly to
Reduced ML, a larger language that introduces integer
references: mutable storage cells that are dynamically
allocated. It turns out that the step up is quite simple,
and both the computational metalanguage and the sample categorical models can be reused.
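The private/public distinction in the abstract above can be illustrated with a small sketch. It is written in Python rather than the nu-calculus, and the helper names are hypothetical. A name supports creation and equality testing only; a function that creates a name and never releases it can be compared against that name alone, so no caller can ever make it return true.

```python
class Name:
    """A fresh name: it can be created, passed around and compared, nothing else."""
    __slots__ = ()


def new_name():
    return Name()


def private_tester():
    # Roughly "create a fresh n, then return the function x -> (x = n)".
    n = new_name()
    return lambda x: x is n


def always_false():
    return lambda x: False


if __name__ == "__main__":
    f, g = private_tester(), always_false()
    probes = [new_name() for _ in range(3)]
    # The private name never escapes, so f and g agree on every probe.
    print(all(f(p) == g(p) for p in probes))
```

Seeing that the two functions are interchangeable requires reasoning about which names a context can know, which is the kind of argument the logical-relations methods in the abstract are meant to capture.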
UCAM-CL-TR-364
Abstract: This paper describes a proof of the Church-Rosser theorem for the pure lambda-calculus formalised in the Isabelle theorem prover. The initial version of the proof is ported from a similar proof done
in the Coq proof assistant by Gérard Huet, but a number of optimisations have been performed. The development involves the introduction of several inductive
and recursive definitions and thus gives a good presentation of the inductive package of Isabelle.
May 1995, 19 pages, paper copy
UCAM-CL-TR-366
K. Spärck Jones, G.J.F. Jones, J.T. Foote,
S.J. Young:
Retrieving spoken documents:
VMR Project experiments
May 1995, 94 pages, PostScript
Abstract: This document provides an introduction to
the interaction between category theory and mathematical logic which is slanted towards computer scientists.
It will be a chapter in the forthcoming Volume VI of:
S. Abramsky, D. M. Gabbay, and T. S. E. Maibaum
(eds), “Handbook of Logic in Computer Science”, Oxford University Press.
UCAM-CL-TR-368
June 1995, 73 pages, PostScript
April 1995, 27 pages, PostScript
Computational types from a
logical perspective I
Categorical logic
CogPiT – configuration of protocols
in TIP
The Church-Rosser theorem in
Isabelle:
a proof porting experiment
P.N. Benton, G.M. Bierman, V.C.V. de Paiva:
Andrew M. Pitts:
Burkhard Stiller:
Ole Rasmussen:
UCAM-CL-TR-365
UCAM-CL-TR-367
Abstract: The variety of upcoming applications in terms
of their performance and Quality-of-Service (QoS) requirements is increasing. Besides almost well-known
applications, such as teleconferencing, audio- and
video-transmissions, even more contemporary ones,
such as medical imaging, Video-on-Demand, and interactive tutoring systems, are introduced and applied
to existing networks. In contrast, traditionally
data-oriented applications, such as file transfer and remote login, are considerably different in terms of their
QoS requirements. Therefore, the consequences of this
evolution affect the architectures of end-systems, e.g.,
workstations that have to be capable of maintaining all
different kinds of multi-media data, and intermediate-systems as well.
Therefore, a configuration approach of communication protocols has been developed to support the
variety of applications. This approach offers the possibility to configure communication protocols automatically depending on the application requirements
expressed in various QoS parameters. The result, an
application-tailored communication protocol, matches
the requested application requirements as far as possible. Additionally, network and system resources (NSR)
are taken into account for a well-suited configuration.
The Configuration of Protocols in TIP is called CogPiT and is part of the Transport and Internetworking
Package (TIP). As an example, in the TIP environment
the transport protocol TEMPO is used for configuration purposes.
May 1995, 28 pages, paper copy
UCAM-CL-TR-369
Sten Agerholm:
A comparison of HOL-ST and
Isabelle/ZF
July 1995, 23 pages, PDF
Abstract: The use of higher order logic (simple type
theory) is often limited by its restrictive type system.
Set theory allows many constructions on sets that are
not possible on types in higher order logic. This paper
presents a comparison of two theorem provers supporting set theory, namely HOL-ST and Isabelle/ZF, based
on a formalization of the inverse limit construction of
domain theory; this construction cannot be formalized
in higher order logic directly. We argue that whilst the
combination of higher order logic and set theory in
HOL-ST has advantages over the first order set theory
in Isabelle/ZF, the proof infrastructure of Isabelle/ZF
has better support for set theory proofs than HOL-ST.
Proofs in Isabelle/ZF are both considerably shorter and
easier to write.
UCAM-CL-TR-370
Sten Agerholm:
A package for non-primitive recursive
function definitions in HOL
July 1995, 36 pages, paper copy
UCAM-CL-TR-371
Kim Ritter Wagner:
LIMINF convergence in Ω-categories
June 1995, 28 pages, paper copy
UCAM-CL-TR-372
Stefan G. Hild:
A brief history of mobile telephony
January 1995, 19 pages, PDF
Abstract: Mobile telephony has gone through a decade
of tremendous change and progress. Today, mobile
phones are an indispensable tool to many professionals,
and have great potential to become vital components in
mobile data communication applications. In this survey we will attempt to present some of the milestones
from the route which mobile telephony has taken over
the past decades while developing from an experimental system with limited capabilities to a mature
technology (section 1), followed by a more detailed introduction into the modern pan-European GSM standard (section 2). Section 3 is devoted to the data communication services, covering two packet-oriented data-only
networks as well as data services planned for the
GSM system. Section 4 covers some security issues and
section 5 gives an insight into the realities today with
details of some networks available in the UK. Finally,
section 6 concludes this overview with a brief look into
the future.
UCAM-CL-TR-373
Benjamín Macías, Stephen G. Pulman:
Natural-language processing and
requirements specifications
July 1995, 73 pages, paper copy
UCAM-CL-TR-374
Burkhard Stiller:
A framework for QoS updates in a
networking environment
July 1995, PostScript
Abstract: The support of sufficient Quality-of-Service
(QoS) for applications residing in a distributed environment and running on top of high performance networks
is a demanding issue. Currently, the areas that must provide
this support include communication protocols, operating systems support, and offered network
services. A configurable approach to communication
protocols offers the protocol flexibility needed to react
appropriately to varying requirements.
Communication protocols and operating systems
have to be parametrized using internal configuration
parameters, such as window sizes, retry counters, or
scheduling mechanisms, that rely closely on requested
application-oriented or network-dependent QoS, such
as bandwidth or delay. Moreover, these internal parameters have to be recalculated from time to time due
to network changes (such as congestion or line breakdown) or due to application-specific alterations (such
as enhanced bandwidth requirements or increased reliability) to adjust a temporary or semi-permanent “out-of-tune” service behavior.
Therefore, a rule-based evaluation and QoS updating framework for configuration parameters in a networking environment has been developed. The resulting “rulework” can be used within highly dynamic environments in a communication subsystem that offers
the possibility to specify for every QoS parameter both
a bounding interval of values and an average value.
As an example, the framework has been integrated
in the Function-based Communication Subsystem (FCSS). In particular, an enhanced application service interface is offered, allowing for the specification of various
QoS-parameters that are used to configure a sufficient
application-tailored communication protocol.
UCAM-CL-TR-375
Feng Huang:
Restructuring virtual memory to
support distributed computing
environments
July 1995, 135 pages, paper copy
PhD thesis (Clare Hall, July 1995)
UCAM-CL-TR-376
Timothy Roscoe:
The structure of a multi-service
operating system
August 1995, 113 pages, PostScript
PhD thesis (Queens’ College, April 1995)
Abstract: Increases in processor speed and network
bandwidth have led to workstations being used to process multimedia data in real time. These applications
have requirements not met by existing operating systems, primarily in the area of resource control: there is
a need to reserve resources, in particular the processor,
at a fine granularity. Furthermore, guarantees need to
be dynamically renegotiated to allow users to reassign
resources when the machine is heavily loaded. There
have been few attempts to provide the necessary facilities in traditional operating systems, and the internal
structure of such systems makes the implementation of
useful resource control difficult.
This dissertation presents a way of structuring an
operating system to reduce crosstalk between applications sharing the machine, and enable useful resource
guarantees to be made: instead of system services being located in the kernel or server processes, they are
placed as much as possible in client protection domains
and scheduled as part of the client, with communication between domains only occurring when necessary
to enforce protection and concurrency control. This
amounts to multiplexing the service at as low a level
of abstraction as possible. A mechanism for sharing
processor time between resources is also described. The
prototype Nemesis operating system is used to demonstrate the ideas in use in a practical system, and to illustrate solutions to several implementation problems that
arise.
Firstly, structuring tools in the form of typed interfaces within a single address space are used to reduce
the complexity of the system from the programmer’s
viewpoint and enable rich sharing of text and data between applications.
Secondly, a scheduler is presented which delivers
useful Quality of Service guarantees to applications in a
highly efficient manner. Integrated with the scheduler is
an inter-domain communication system which has minimal impact on resource guarantees, and a method of
decoupling hardware interrupts from the execution of
device drivers.
Finally, a framework for high-level inter-domain
and inter-machine communication is described, which
goes beyond object-based RPC systems to permit both
Quality of Service negotiation when a communication
binding is established, and services to be implemented
straddling protection domain boundaries as well as locally and in remote processes.
UCAM-CL-TR-377
Larry Paulson, Krzysztof Grabczewski:
Mechanising set theory:
cardinal arithmetic and the axiom of
choice
July 1995, 33 pages, PDF
Abstract: Fairly deep results of Zermelo-Fraenkel (ZF)
set theory have been mechanised using the proof assistant Isabelle. The results concern cardinal arithmetic
and the Axiom of Choice (AC). A key result about cardinal multiplication is K*K=K, where K is any infinite
cardinal. Proving this result required developing theories of orders, order-isomorphisms, order types, ordinal
arithmetic, cardinals, etc.; this covers most of Kunen,
Set Theory, Chapter I. Furthermore, we have proved
the equivalence of 7 formulations of the Well-ordering
Theorem and 20 formulations of AC; this covers the
first two chapters of Rubin and Rubin, Equivalents of
the Axiom of Choice. The definitions used in the proofs
are largely faithful in style to the original mathematics.
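For readers unfamiliar with the statement, the key multiplication result mentioned in the abstract above can be written as below. The symbols ⊗ and ⊕ for cardinal multiplication and addition are a notational assumption (the abstract writes K*K=K), and the second line lists standard consequences as context, not as a summary of what the report itself proves. The `\text` command assumes the amsmath package.

```latex
\[
  \kappa \otimes \kappa = \kappa
  \qquad \text{for every infinite cardinal } \kappa .
\]
\[
  \kappa \oplus \kappa = \kappa ,
  \qquad
  \kappa \otimes \lambda = \max(\kappa, \lambda)
  \qquad \text{for infinite cardinals } \kappa, \lambda .
\]
```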
UCAM-CL-TR-378
Noha Adly:
Performance evaluation of HARP:
a hierarchical asynchronous
replication protocol for large scale
system
August 1995, 94 pages, PostScript
UCAM-CL-TR-379
Lawrence Paulson:
Proceedings of the First Isabelle Users
Workshop
September 1995, 265 pages, paper copy
UCAM-CL-TR-380
Burkhard Stiller:
Quality-of-Service issues in
networking environments
September 1995, 68 pages, PostScript
Abstract: Quality-of-Service (QoS) issues in networking environments cover various separate areas and topics. They include at least the specification of application requirements, the definition of network services,
QoS models, resource reservation methods, negotiation
and transformation methods for QoS, and operating
system support for guaranteed services. An embracing
approach for handling, dealing with, and supporting
QoS in different scenarios and technical set-ups is required to manage forthcoming communication and networking tasks adequately. Modern telecommunication
systems require an integrated architecture for applications, communication subsystems, and network perspectives to overcome drawbacks of traditional communication architectures, such as redundant protocol
functionality, weakly designed interfaces between the
end-system and a network adapter, or the impossibility of
specifying and guaranteeing QoS parameters.
This work contains the discussion of a number of
interconnected QoS issues, e.g., QoS mapping, QoS
negotiation, QoS-based configuration of communication protocols, or QoS aspects in Asynchronous Transfer Mode (ATM) signaling protocols, which have been
dealt with during a one-year research fellowship. This
report is not intended to be a complete description of
every technical detail, but tries to provide a brief overall picture of the emerging and explosively developing
QoS issues in telecommunication systems. Additionally,
investigations of some of these issues are undertaken in
closer detail. The report focuses mainly on QoS mapping, negotiation, and updating in the communication
protocol area.
UCAM-CL-TR-381
Uwe Michael Nimscheck:
Rendering for free form deformations
October 1995, 151 pages, paper copy
PhD thesis (Wolfson College)
UCAM-CL-TR-383
Noha Adly:
Management of replicated data in
large scale systems
November 1995, 182 pages, paper copy
PhD thesis (Corpus Christi College, August 1995)
UCAM-CL-TR-384
Shaw-Cheng Chuang:
Securing ATM networks
January 1995, 30 pages, PostScript
Abstract: This is an interim report on the investigations
into securing Asynchronous Transfer Mode (ATM) networks. We look at the challenge of providing a secure ATM network and identify the important issues
in achieving that goal. In this paper, we discuss the
issues and problems involved and outline some techniques for solving these problems. The network environment is first examined, and we also consider the correct
placement of security mechanisms in such an environment. Following the analysis of the security requirements, we introduce and describe a key-agile cryptographic device for ATM. The protection of the ATM
data plane is extremely important for providing data confidentiality and data integrity. Techniques for providing
synchronisation, dynamic key change, dynamic initialisation vector change and a Message Authentication Code
on ATM data are also considered. Next, we discuss the corresponding control functions. A few key exchange protocols are given as possible candidates for
the establishment of the session key. The impact of such
key exchange protocols on the design of an ATM signalling protocol is also examined, and a security
extension to an existing signalling protocol is discussed. We also talk about securing other control plane
functions such as NNI routing, Inter-Domain Policy
Routing, authorisation and auditing, firewall and intrusion detection, and Byzantine robustness. Management
plane functions are also looked at, with discussions of bootstrapping, authenticated neighbour discovery, ILMI security, PVC security, VPI security and
the ATM Forum management model.
UCAM-CL-TR-382
Oliver M. Castle:
Synthetic image generation for a
multiple-view autostereo display
October 1995, 184 pages, paper copy
PhD thesis (Wolfson College, April 1995)
UCAM-CL-TR-385
Sanjay Saraswat:
Performance evaluation of the
Delphi machine
December 1995, 187 pages, paper copy
PhD thesis (St Edmund’s College, October 1995)
UCAM-CL-TR-386
Andrew D. Gordon, Gareth D. Rees:
Bisimilarity for a first-order calculus
of objects with subtyping
January 1996, 78 pages, paper copy
UCAM-CL-TR-387
Scarlet Schwiderski, Andrew Herbert,
Ken Moody:
Monitoring composite events in
distributed systems
February 1996, 20 pages, paper copy
UCAM-CL-TR-388
P.N. Benton:
A unified approach to strictness
analysis and optimising
transformations
February 1996, 21 pages, paper copy
UCAM-CL-TR-389
Wai Wong:
A proof checker for HOL
March 1996, 165 pages, paper copy
UCAM-CL-TR-390
Richard J. Boulton:
Syn: a single language for specifying
abstract syntax trees, lexical analysis,
parsing and pretty-printing
March 1996, 25 pages, PostScript
Abstract: A language called Syn is described in which
all aspects of context-free syntax can be specified without redundancy. The language is essentially an extended
BNF grammar. Unusual features include high-level constructs for specifying lexical aspects of a language and
specification of precedence by textual order. A system
has been implemented for generating lexers, parsers,
pretty-printers and abstract syntax tree representations
from a Syn specification.
UCAM-CL-TR-391
Andrew John Kennedy:
Programming languages and
dimensions
April 1996, 149 pages, paper copy
PhD thesis (St Catherine’s College, November 1995)
UCAM-CL-TR-392
Uwe Nestmann, Benjamin C. Pierce:
Decoding choice encodings
April 1996, 54 pages, paper copy
UCAM-CL-TR-393
Simon Andrew Crosby:
Performance management in
ATM networks
April 1996, 215 pages, PostScript
PhD thesis (St John’s College, May 1995)
Abstract: The Asynchronous Transfer Mode (ATM) has
been identified as the technology of choice amongst
high speed communication networks for its potential
to integrate services with disparate resource needs and
timing constraints. Before it can successfully deliver integrated services, however, significant problems remain
to be solved. They centre around two major issues.
First, there is a need for a simple, powerful network service interface capable of meeting the communications
needs of new applications. Second, within the network
there is a need to dynamically control a mix of diverse
traffic types to ensure that they meet their performance
criteria.
Addressing the first concern, this dissertation argues
that a simple network control interface offers significant advantages over the traditional, heavyweight approach of the telecommunications industry. A network
control architecture based on a distributed systems approach is presented which locates both the network
control functions and its services outside the network.
The network service interface uses the Remote Procedure Call (RPC) paradigm and enables more complicated service offerings to be built from the basic primitives. A formal specification and verification of the user-network signalling protocol is presented. Implementations of the architecture, both on Unix and the Wanda
micro-kernel, used on the Fairisle ATM switch, are described. The implementations demonstrate the feasibility of the architecture, and feature a high degree of experimental flexibility. This is exploited in the balance
of the dissertation, which presents the results of a practical study of network performance under a range of
dynamic control mechanisms.
Addressing the second concern, results are presented
from a study of the cell delay variation suffered by
ATM connections when multiplexed with real ATM
traffic in an uncontrolled network, and from an investigation
of the expansion of bursts of ATM traffic as a
result of multiplexing. The results are compared with
those of analytical models. Finally, results from a study
of the performance delivered to delay sensitive traffic by
priority and rate based cell scheduling algorithms, and
the loss experienced by different types of traffic under
several buffer allocation strategies are presented.
UCAM-CL-TR-394
Lawrence C. Paulson:
A simple formalization and proof for
the mutilated chess board
April 1996, 11 pages, PDF
Abstract: The impossibility of tiling the mutilated chess
board has been formalized and verified using Isabelle.
The formalization is concise because it is expressed using
inductive definitions. The proofs are straightforward
except for some lemmas concerning finite cardinalities.
This exercise is an object lesson in choosing
a good formalization.
UCAM-CL-TR-395
Torben Bräuner, Valeria de Paiva:
Cut-elimination for full intuitionistic
linear logic
May 1996, 27 pages, paper copy
UCAM-CL-TR-396
Lawrence C. Paulson:
Generic automatic proof tools
May 1996, 28 pages, PDF
Abstract: This paper explores a synthesis between two
distinct traditions in automated reasoning: resolution
and interaction. In particular it discusses Isabelle, an
interactive theorem prover based upon a form of resolution.
It aims to demonstrate the value of proof
tools that, compared with traditional resolution systems,
seem absurdly limited. Isabelle’s classical reasoner
searches for proofs using a tableau approach. The reasoner
is generic: it accepts rules proved in applied theories,
involving defined connectives. New constants are
not reduced to first-order logic; the reasoner
is applicable in a variety of domains.
UCAM-CL-TR-397
Borut Robič:
Optimal routing in 2-jump circulant
networks
June 1996, 7 pages, PostScript
Abstract: An algorithm for routing a message along
the shortest path between a pair of processors in a 2-jump
circulant (undirected double fixed step) network
is given. The algorithm requires O(d) time for preprocessing,
and l = O(d) routing steps, where l is the distance
between the processors and d is the diameter of
the network.
UCAM-CL-TR-398
N.A. Dodgson, J.R. Moore:
Design and implementation of an
autostereoscopic camera system
June 1996, 20 pages, PDF
Abstract: An autostereoscopic display provides the
viewer with a three-dimensional image without the
need for special glasses, and allows the user to look
around objects in the image by moving the head left-right.
The time-multiplexed autostereo display developed
at the University of Cambridge has been in operation
since late 1991.
An autostereoscopic camera system has been designed
and implemented. It is capable of taking video
input from up to sixteen cameras, and multiplexing
these into a video output stream with a pixel rate an
order of magnitude faster than the individual input
streams. Testing of the system with eight cameras and a
Cambridge Autostereo Display has produced excellent
live autostereoscopic video.
This report describes the design of this camera
system which has been successfully implemented and
demonstrated. Problems which arose during this process
are discussed, and a comparison with similar systems made.
UCAM-CL-TR-399
Richard Hayton:
OASIS
An open architecture for secure
interworking services
June 1996, 102 pages, PDF
PhD thesis (Fitzwilliam College, March 1996)
Abstract: An emerging requirement is for applications
and distributed services to cooperate or inter-operate.
Mechanisms have been devised to hide the heterogeneity of the host operating systems and abstract the issues
of distribution and object location. However, in order
for systems to inter-operate securely there must also be
mechanisms to hide differences in security policy, or at
least negotiate between them.
This would suggest that a uniform model of access
control is required. Such a model must be extremely
flexible with respect to the specification of policy, as different applications have radically different needs. In a
widely distributed environment this situation is exacerbated by the differing requirements of different organisations, and in an open environment there is a need to
interwork with organisations using alternative security
mechanisms.
Other proposals for the interworking of security
mechanisms have concentrated on the enforcement of
access policy, and neglected the concerns of freedom
of expression of this policy. For example it is common
to associate each request with a user identity, and to
use this as the only parameter when performing access
control. This work describes an architectural approach
to security. By reconsidering the role of the client and
the server, we may reformulate access control issues in
terms of client naming.
We think of a client as obtaining a name issued by a
service; either based on credentials already held by the
client, or by delegation from another client. A grammar has been devised that allows the conditions under
which a client may assume a name to be specified, and
the conditions under which use of the name will be revoked. This allows complex security policies to be specified that define how clients of a service may interact
with each other (through election, delegation and revocation), how clients interact with a service (by invoking operations or receiving events) and how clients and
services may inter-operate. (For example, a client of a
Login service may become a client of a file service.)
This approach allows great flexibility when integrating a number of services, and reduces the mismatch
of policies common in heterogeneous systems. A flexible security definition is meaningless if not backed by
a robust and efficient implementation. In this thesis
we present a systems architecture that can be implemented efficiently, but that allows individual services
to ‘fine tune’ the trade-offs between security, efficiency
and freedom of policy expression. The architecture is
inherently distributed and scalable, and includes mechanisms for rapid and selective revocation of privileges
which may cascade between services and organisations.
UCAM-CL-TR-400
Scarlet Schwiderski:
Monitoring the behaviour of
distributed systems
July 1996, 161 pages, PDF
PhD thesis (Selwyn College, April 1996)
Abstract: Monitoring the behaviour of computing systems
is an important task. In active database systems,
a detected system behaviour leads to the triggering of
an ECA (event-condition-action) rule. ECA rules are
employed for supporting database management system
functions as well as external applications. Although
distributed database systems are becoming more commonplace,
active database research has to date focussed
on centralised systems. In distributed debugging systems,
a detected system behaviour is compared with
the expected system behaviour. Differences illustrate
erroneous behaviour. In both application areas, system
behaviours are specified in terms of events: primitive
events represent elementary occurrences and composite
events represent complex occurrence patterns.
At system runtime, specified primitive and composite
events are monitored and event occurrences are detected.
However, in active database systems events are
monitored in terms of physical time and in distributed
debugging systems events are monitored in terms of
logical time. The notion of physical time is difficult in
distributed systems because of their special characteristics:
no global time, network delays, etc.
This dissertation is concerned with monitoring the
behaviour of distributed systems in terms of physical
time, i.e. the syntax, the semantics, the detection, and
the implementation of events are considered.
The syntax of primitive and composite events is derived
from the work of both active database systems
and distributed debugging systems; differences and
necessities are highlighted.
The semantics of primitive and composite events
establishes when and where an event occurs; the semantics
depends largely on the notion of physical time in
distributed systems. Based on the model for an approximated
global time base, the ordering of events in distributed
systems is considered, and the structure and
handling of timestamps are illustrated. In specific applications,
a simplified version of the semantics can be
applied which is easier and therefore more efficient to
implement.
Algorithms for the detection of composite events
at system runtime are developed; event detectors are
distributed to arbitrary sites and composite events are
evaluated concurrently. Two different evaluation policies
are examined: asynchronous evaluation and synchronous
evaluation. Asynchronous evaluation is characterised
by the ad hoc consumption of signalled event
occurrences. However, since the signalling of events
involves variable delays, the events may not be evaluated
in the system-wide order of their occurrence. On the
other hand, synchronous evaluation forces events to
be evaluated in the system-wide order of their occurrence.
But, due to site failures and network congestion,
the evaluation may block on a fairly long-term basis.
The prototype implementation realises the algorithms
for the detection of composite events with both
asynchronous and synchronous evaluation. For the purpose
of testing, primitive event occurrences are simulated
by distributed event simulators. Several tests are
performed illustrating the differences between asynchronous
and synchronous evaluation: the former is ‘fast
and unreliable’ whereas the latter is ‘slow and reliable’.
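The contrast between the two evaluation policies can be illustrated with a small sketch. It is not taken from the dissertation: the `Occurrence` record, the sequence detector and the single global integer clock are all assumptions made for illustration, and the synchronous policy is modelled simply by sorting on timestamps, where a real detector would have to wait (and possibly block) until every site has reported.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    name: str       # primitive event type, e.g. "A" (hypothetical)
    timestamp: int  # time of occurrence at the originating site (global clock assumed)
    arrival: int    # time at which the detector receives the signal

def detect_sequence(stream, first, second):
    """Detect the composite event 'first, then second' over an ordered stream."""
    pending = []   # timestamps of `first` occurrences not yet consumed
    matches = []
    for occ in stream:
        if occ.name == first:
            pending.append(occ.timestamp)
        elif occ.name == second:
            earlier = [t for t in pending if t < occ.timestamp]
            if earlier:
                pending.remove(earlier[0])
                matches.append((earlier[0], occ.timestamp))
    return matches

def asynchronous(occurrences, first, second):
    # Consume signals ad hoc, in the order in which they arrive.
    return detect_sequence(sorted(occurrences, key=lambda o: o.arrival), first, second)

def synchronous(occurrences, first, second):
    # Consume signals in system-wide timestamp order (sorting stands in for waiting).
    return detect_sequence(sorted(occurrences, key=lambda o: o.timestamp), first, second)

occurrences = [
    Occurrence("A", timestamp=10, arrival=50),  # signal delayed by the network
    Occurrence("B", timestamp=20, arrival=25),
]
async_result = asynchronous(occurrences, "A", "B")  # B is consumed before A arrives
sync_result = synchronous(occurrences, "A", "B")    # A is evaluated before B
print(async_result, sync_result)
```

In this example the asynchronous detector misses the composite event because the signal for A is delayed, while the timestamp-ordered detector finds it.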
UCAM-CL-TR-401
Gavin Bierman:
A classical linear λ-calculus
July 1996, 41 pages, paper copy
UCAM-CL-TR-402
G.J.F. Jones, J.T. Foote, K. Spärck Jones,
S.J. Young:
Video mail retrieval using voice:
report on collection of naturalistic
requests and relevance assessments
September 1996, 21 pages, paper copy
UCAM-CL-TR-403
Paul Ronald Barham:
Devices in a multi-service operating
system
October 1996, 131 pages, PostScript
PhD thesis (Churchill College, June 1996)
Abstract: Increases in processor speed and network and
device bandwidth have led to general purpose workstations
being called upon to process continuous media
data in real time. Conventional operating systems
are unable to cope with the high loads and strict timing
constraints introduced when such applications form
part of a multi-tasking workload. There is a need for
the operating system to provide fine-grained reservation
of processor, memory and I/O resources and the ability
to redistribute these resources dynamically. A small
group of operating systems researchers have recently
proposed a “vertically-structured” architecture where
the operating system kernel provides minimal functionality
and the majority of operating system code executes
within the application itself. This structure greatly
simplifies the task of accounting for processor usage by
applications. The prototype Nemesis operating system
embodies these principles and is used as the platform
for this work.
This dissertation extends the provision of Quality of
Service guarantees to the I/O system by presenting an
architecture for device drivers which minimises crosstalk
between applications. This is achieved
by clearly separating the data-path operations, which
require careful accounting and scheduling, and the infrequent
control-path operations, which require protection
and concurrency control. The approach taken is to
abstract and multiplex the I/O data-path at the lowest
level possible so as to simplify accounting, policing
and scheduling of I/O resources and enable application-specific
use of I/O devices.
The architecture is applied to several representative
classes of device including network interfaces, network
connected peripherals, disk drives and framestores. Of
these, disks and framestores are of particular interest
since they must be shared at a very fine granularity but
have traditionally been presented to the application via
a window system or file-system with a high-level and
coarse-grained interface.
A device driver for the framestore is presented which
abstracts the device at a low level and is therefore able
to provide each client with guaranteed bandwidth to
the framebuffer. The design and implementation of a
novel client-rendering window system is then presented
which uses this driver to enable rendering code to be
safely migrated into a shared library within the client.
A low-level abstraction of a standard disk drive is
also described which efficiently supports a wide variety
of file systems and other applications requiring persistent
storage, whilst providing guaranteed rates of I/O
to individual clients. An extent-based file system is
presented which can provide guaranteed rate file access
and enables clients to optimise for application-specific
access patterns.
UCAM-CL-TR-404
Kam Hong Shum:
Adaptive parallelism for computing
on heterogeneous clusters
November 1996, 147 pages, paper copy
PhD thesis (Darwin College, August 1996)
UCAM-CL-TR-405
Richard J. Boulton:
A tool to support formal reasoning
about computer languages
November 1996, 21 pages, PostScript
Abstract: A tool to support formal reasoning about
computer languages and specific language texts is described. The intention is to provide a tool that can
build a formal reasoning system in a mechanical theorem prover from two specifications, one for the syntax
of the language and one for the semantics. A parser,
pretty-printer and internal representations are generated from the former. Logical representations of syntax
and semantics, and associated theorem proving tools,
are generated from the combination of the two specifications. The main aim is to eliminate tedious work from
the task of prototyping a reasoning tool for a computer
language, but the abstract specifications of the language
also assist the automation of proof.
UCAM-CL-TR-406
John Robert Harrison:
Theorem proving with the real
numbers
November 1996, 147 pages, PostScript
PhD thesis (Churchill College, June 1996)
Abstract: This thesis discusses the use of the real numbers
in theorem proving. Typically, theorem provers
only support a few ‘discrete’ datatypes such as the natural
numbers. However the availability of the real numbers
opens up many interesting and important application
areas, such as the verification of floating point
hardware and hybrid systems. It also allows the formalization
of many more branches of classical mathematics,
which is particularly relevant for attempts to inject
more rigour into computer algebra systems.
Our work is conducted in a version of the HOL theorem
prover. We describe the rigorous definitional construction
of the real numbers, using a new version of
Cantor’s method, and the formalization of a significant
portion of real analysis. We also describe an advanced
derived decision procedure for the ‘Tarski subset’ of
real algebra as well as some more modest but practically
useful tools for automating explicit calculations
and routine linear arithmetic reasoning.
Finally, we consider in more detail two interesting
application areas. We discuss the desirability of combining
the rigour of theorem provers with the power
and convenience of computer algebra systems, and explain
a method we have used in practice to achieve this.
We then move on to the verification of floating point
hardware. After a careful discussion of possible correctness
specifications, we report on two case studies, one
involving a transcendental function.
We aim to show that a theory of real numbers is
useful in practice and interesting in theory, and that
the ‘LCF style’ of theorem proving is well suited to the
kind of work we describe. We hope also to convince the
reader that the kind of mathematics needed for applications
is well within the abilities of current theorem
proving technology.
UCAM-CL-TR-408
Lawrence C. Paulson:
Tool support for logics of programs
November 1996, 31 pages, PDF
Abstract: Proof tools must be well designed if they are
to be more effective than pen and paper. Isabelle supports
a range of formalisms, two of which are described
(higher-order logic and set theory). Isabelle’s representation
of logic is influenced by logic programming: its
“logical variables” can be used to implement step-wise
refinement. Its automatic proof procedures are based
on search primitives that are directly available to users.
While emphasizing basic concepts, the article also discusses
applications such as an approach to the analysis
of security protocols.
UCAM-CL-TR-407
Sebastian Schoenberg:
The L4 microkernel on Alpha
Design and implementation
September 1996, 51 pages, PostScript
Abstract: The purpose of a microkernel is to cover the
lowest level of the hardware and to provide a more
general platform to operating systems and applications
than the hardware itself. This has made microkernel
development increasingly interesting. Different types of
microkernels have been developed, ranging from kernels which merely deal with the hardware interface
(Windows NT HAL), kernels especially for embedded
systems (RTEMS), to kernels for multimedia streams
and real time support (Nemesis) and general purpose
kernels (L4, Mach).
The common opinion that microkernels lead to deterioration in system performance has been disproved
by recent research. L4 is an example of a fast and small,
multi address space, message-based microkernel, developed originally for Intel systems only. Based on the L4
interface, which should be as similar as possible on different platforms, the L4 Alpha version has been developed.
This work describes design decisions, implementation and interfaces of the L4 version for 64-bit Alpha
processors.
UCAM-CL-TR-409
Lawrence C. Paulson:
Proving properties of security
protocols by induction
December 1996, 24 pages, PDF
Abstract: Security protocols are formally specified in
terms of traces, which may involve many interleaved
protocol runs. Traces are defined inductively. Protocol
descriptions model accidental key losses as well as attacks. The model spy can send spoof messages made up
of components decrypted from previous traffic.
Correctness properties are verified using the proof
tool Isabelle/HOL. Several symmetric-key protocols
have been studied, including Needham-Schroeder, Yahalom and Otway-Rees. A new attack has been discovered in a variant of Otway-Rees (already broken by
Mao and Boyd). Assertions concerning secrecy and authenticity have been proved.
The approach rests on a common theory of messages, with three operators. The operator “parts” denotes the components of a set of messages. The operator “analz” denotes those parts that can be decrypted
with known keys. The operator “synth” denotes those
messages that can be expressed in terms of given components. The three operators enjoy many algebraic laws
that are invaluable in proofs.
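The three operators can be approximated by a small executable sketch. This is not Paulson’s Isabelle/HOL theory: the `Atom`, `Pair` and `Crypt` classes are hypothetical, keys are assumed to be symmetric (a key is its own inverse), and because “synth” denotes an infinite set it is given here only as a membership test that ignores freely guessable items such as agent names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:    # an agent name, nonce or key (hypothetical representation)
    name: str

@dataclass(frozen=True)
class Pair:    # concatenation of two messages
    left: object
    right: object

@dataclass(frozen=True)
class Crypt:   # body encrypted under a symmetric key
    key: Atom
    body: object

def parts(msgs):
    """All components of the messages, looking inside every encryption."""
    seen, todo = set(), list(msgs)
    while todo:
        m = todo.pop()
        if m in seen:
            continue
        seen.add(m)
        if isinstance(m, Pair):
            todo += [m.left, m.right]
        elif isinstance(m, Crypt):
            todo.append(m.body)
    return seen

def analz(msgs):
    """Components obtainable by splitting pairs and decrypting with known keys."""
    known = set(msgs)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            if isinstance(m, Pair):
                new = {m.left, m.right}
            elif isinstance(m, Crypt) and m.key in known:
                new = {m.body}
            else:
                new = set()
            if not new <= known:
                known |= new
                changed = True
    return known

def in_synth(msg, msgs):
    """Membership test for synth(msgs): can msg be built from the components?"""
    if msg in msgs:
        return True
    if isinstance(msg, Pair):
        return in_synth(msg.left, msgs) and in_synth(msg.right, msgs)
    if isinstance(msg, Crypt):
        return msg.key in msgs and in_synth(msg.body, msgs)
    return False

Kab, Na, Nb = Atom("Kab"), Atom("Na"), Atom("Nb")
traffic = {Crypt(Kab, Pair(Na, Nb)), Na}
print(Nb in parts(traffic), Nb in analz(traffic), Nb in analz(traffic | {Kab}))
```

Here `Nb` is a part of the traffic but cannot be analysed out of it until the key `Kab` is also known, which mirrors the distinction the abstract draws between “parts” and “analz”.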
UCAM-CL-TR-410
John Harrison:
Proof style
January 1997, 22 pages, PostScript
Abstract: We are concerned with how to communicate
a mathematical proof to a computer theorem prover.
This can be done in many ways, while allowing the machine to generate a completely formal proof object. The
most obvious choice is the amount of guidance required
from the user, or from the machine perspective, the degree of automation provided. But another important
consideration, which we consider particularly significant, is the bias towards a ‘procedural’ or ‘declarative’
proof style. We will explore this choice in depth, and
discuss the strengths and weaknesses of declarative and
procedural styles for proofs in pure mathematics and
for verification applications. We conclude with a brief
summary of our own experiments in trying to combine
both approaches.
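As a rough illustration of the two styles, the same trivial fact is proved twice below. The example uses Lean 4, a later system than the provers discussed in the report, so it is an assumption-laden sketch and does not show the syntax of the author’s own tools. The first proof is procedural: it issues commands that transform the goal. The second is declarative: it states the intermediate facts and the conclusion explicitly.

```lean
-- Procedural style: tactics say how to transform the goal.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q := by
  constructor
  · exact hp
  · exact hq

-- Declarative style: intermediate facts are stated explicitly.
example (p q : Prop) (hp : p) (hq : q) : p ∧ q :=
  have left : p := hp
  have right : q := hq
  show p ∧ q from ⟨left, right⟩
```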
UCAM-CL-TR-411
Monica Nesi:
Formalising process calculi in
Higher Order Logic
January 1997, 182 pages, paper copy
PhD thesis (Girton College, April 1996)
UCAM-CL-TR-412
G.M. Bierman:
Observations on a linear PCF
(preliminary report)
January 1997, 30 pages, paper copy
UCAM-CL-TR-413
Lawrence C. Paulson:
Mechanized proofs of security
protocols:
Needham-Schroeder with public keys
January 1997, 20 pages, PDF
Abstract: The inductive approach to verifying security
protocols, previously applied to shared-key encryption,
is here applied to the public key version of the
Needham-Schroeder protocol. As before, mechanized
proofs are performed using Isabelle/HOL. Both the
original, flawed version and Lowe’s improved version
are studied; the properties proved highlight the distinctions
between the two versions. The results are compared
with previous analyses of the same protocol. The
analysis reported below required only 30 hours of the
author’s time. The proof scripts execute in under three
minutes.
UCAM-CL-TR-414
Martín Abadi, Andrew D. Gordon:
A calculus for cryptographic
protocols
The SPI calculus
January 1997, 105 pages, PostScript
Abstract: We introduce the spi calculus, an extension of
the pi calculus designed for the description and analysis
of cryptographic protocols. We show how to use the spi
calculus, particularly for studying authentication protocols.
The pi calculus (without extension) suffices for
some abstract protocols; the spi calculus enables us to
consider cryptographic issues in more detail. We represent
protocols as processes in the spi calculus and state
their security properties in terms of coarse-grained notions
of protocol equivalence.
UCAM-CL-TR-415
Steven Leslie Pope:
Application support for mobile
computing
February 1997, 145 pages, paper copy
PhD thesis (Jesus College, October 1996)
UCAM-CL-TR-416
Donald Syme:
DECLARE: a prototype declarative
proof system for higher order logic
February 1997, 25 pages, paper copy
UCAM-CL-TR-417
Peter J.C. Brown:
Selective mesh refinement for
interactive terrain rendering
February 1997, 18 pages, paper copy
Abstract: Terrain surfaces are often approximated by
geometric meshes to permit efficient rendering. This paper
describes how the complexity of an approximating
irregular mesh can be varied across its domain in order
to minimise the number of displayed facets while ensuring
that the rendered surface meets pre-determined
resolution requirements. We first present a generalised
scheme to represent a mesh over a continuous range of
resolutions using the output from conventional single-resolution
approximation methods. We then describe
an algorithm which extracts a surface from this representation
such that the resolution of the surface is enhanced
only in specific areas of interest. We prove that
the extracted surface is complete, minimal, satisfies the
given resolution constraints and meets the Delaunay
triangulation criterion if possible. In addition, we present
a method of performing smooth visual transitions between
selectively-refined meshes to permit efficient animation
of a terrain scene.
An HTML version of this report is at
http://www.cl.cam.ac.uk/research/rainbow/publications/pjcb/tr417/
UCAM-CL-TR-418
Lawrence C. Paulson:
Mechanized proofs for a recursive
authentication protocol
March 1997, 30 pages, PDF
Abstract: A novel protocol has been formally analyzed
using the prover Isabelle/HOL, following the inductive
approach described in earlier work. There is no limit
on the length of a run, the nesting of messages or the
number of agents involved. A single run of the protocol
delivers session keys for all the agents, allowing neighbours
to perform mutual authentication. The basic security
theorem states that session keys are correctly delivered
to adjacent pairs of honest agents, regardless of
whether other agents in the chain are compromised.
The protocol’s complexity caused some difficulties in
the specification and proofs, but its symmetry reduced
the number of theorems to prove.
UCAM-CL-TR-419
James Quentin Stafford-Fraser:
Video-augmented environments
April 1997, 91 pages, PDF
PhD thesis (Gonville & Caius College, February 1996)
Abstract: In the future, the computer will be thought of
more as an assistant than as a tool, and users will increasingly
expect machines to make decisions on their
behalf. As with a human assistant, a machine’s ability to
make informed choices will often depend on the extent
of its knowledge of activities in the world around it.
Equipping personal computers with a large number of
sensors for monitoring their environment is, however,
expensive and inconvenient, and a preferable solution
would involve a small number of input devices with a
broad scope of application. Video cameras are ideally
suited to many real-world monitoring applications for
this reason. In addition, recent reductions in the manufacturing
costs of simple cameras will soon make their
widespread deployment in the home and office economically
viable. The use of video as an input device also
allows the creation of new types of user-interface, more
suitable in some circumstances than those afforded by
the conventional keyboard and mouse.
This thesis examines some examples of these ‘Video-Augmented
Environments’ and related work, and then
describes two applications in detail. The first, a ‘software
cameraman’, uses the analysis of one video stream
to control the display of another. The second, ‘BrightBoard’,
allows a user to control a computer by making
marks on a conventional whiteboard, thus ‘augmenting’
the board with many of the facilities common to
electronic documents, including the ability to fax, save,
print and email the image of the board. The techniques
which were found to be useful in the construction of
these applications are common to many systems which
monitor real-world video, and so they were combined
in a toolkit called ‘Vicar’. This provides an architecture
for ‘video plumbing’, which allows standard video-processing
components to be connected together under the
control of a scripting language. It is a single application
which can be programmed to create a variety of simple
Video-Augmented Environments, such as those described
above, without the need for any recompilation,
and so should simplify the construction of such applications
in the future. Finally, opportunities for further
exploration on this theme are discussed.
UCAM-CL-TR-420
Jonathan Mark Sewell:
Managing complex models for
computer graphics
April 1997, 206 pages, PDF
PhD thesis (Queens’ College, March 1996)
Abstract: Three-dimensional computer graphics is becoming more common as increasing computational
power becomes more readily available. Although the
images that can be produced are becoming more complex, users’ expectations continue to grow. This dissertation examines the changes in computer graphics software that will be needed to support continuing growth
in complexity, and proposes techniques for tackling the
problems that emerge.
Increasingly complex models will involve longer
rendering times, higher memory requirements, longer
data transfer periods and larger storage capacities. Furthermore, even greater demands will be placed on the
constructors of such models. This dissertation aims to
describe how to construct scalable systems which can
be used to visualise models of any size without requiring dedicated hardware. This is achieved by controlling
the quality of the results, and hence the costs incurred.
In addition, the use of quality controls can become a
tool to help users handle the large volume of information arising from complex models.
The underlying approach is to separate the model
from the graphics application which uses it, so that the
model exists independently. By doing this, an application is free to access only the data which is required at
any given time. For the application to function in this
manner, the data must be in an appropriate form. To
achieve this, approximation hierarchies are defined as
a suitable new model structure. These utilise multiple
representations of both objects and groups of objects at
all levels in the model.
In order to support such a structure, a novel method
is proposed for rapidly constructing simplified representations of groups of complex objects. By calculating
a few geometrical attributes, it is possible to generate
replacement objects that preserve important aspects of
the originals. Such objects, once placed into an approximation hierarchy, allow rapid loading and rendering of
large portions of a model. Extensions to rendering algorithms are described that take advantage of this structure.
The use of multiple representations encompasses
not only different quality levels, but also different storage formats and types of objects. It provides a framework within which such aspects are hidden from the
user, facilitating the sharing and re-use of objects. A
model manager is proposed as a means of encapsulating
these mechanisms. This software gives, as far as possible, the illusion of direct access to the whole complex
model, while at the same time making the best use of
the limited resources available.
UCAM-CL-TR-421
Michael Norrish:
An abstract dynamic semantics for
C
May 1997, 31 pages, PDF
Abstract: This report is a presentation of a formal semantics for the C programming language. The semantics has been defined operationally in a structured semantics style and covers the bulk of the core of the language. The semantics has been developed in a theorem
prover (HOL), where some expected consequences of
the language definition
UCAM-CL-TR-422
Antony Rowstron:
Using the BONITA primitives:
a case study
May 1997, 19 pages, paper copy
UCAM-CL-TR-423
Karl F. MacDorman:
Symbol grounding
Learning categorical and
sensorimotor predictions for
coordination in autonomous robots
May 1997, 170 pages, paper copy
PhD thesis (Wolfson College, March 1997)
UCAM-CL-TR-424
Fabio Massacci:
Simplification with renaming:
a general proof technique for tableau
and sequent-based provers
May 1997, 26 pages, DVI
UCAM-CL-TR-425
Leslie Lamport, Lawrence C. Paulson:
Should your specification language be
typed?
May 1997, 30 pages, PDF
Abstract: Most specification languages have a type system. Type systems are hard to get right, and getting
them wrong can lead to inconsistencies. Set theory can
serve as the basis for a specification language without types. This possibility, which has been widely overlooked, offers many advantages. Untyped set theory
is simple and is more flexible than any simple typed
formalism. Polymorphism, overloading, and subtyping
can make a type system more powerful, but at the cost
of increased complexity, and such refinements can never
attain the flexibility of having no types at all. Typed
formalisms have advantages too, stemming from the
power of mechanical type checking. While types serve
little purpose in hand proofs, they do help with mechanized proofs. In the absence of verification, type checking can catch errors in specifications. It may be possible
to have the best of both worlds by adding typing annotations to an untyped specification language.
We consider only specification languages, not programming languages.
UCAM-CL-TR-426
Mark Humphrys:
Action selection methods using
reinforcement learning
June 1997, 195 pages, PostScript
PhD thesis (Trinity Hall)
Abstract: The Action Selection problem is the problem of run-time choice between conflicting and heterogeneous goals, a central problem in the simulation
of whole creatures (as opposed to the solution of isolated uninterrupted tasks). This thesis argues that Reinforcement Learning has been overlooked in the solution of the Action Selection problem. Considering a
decentralised model of mind, with internal tension and
competition between selfish behaviors, this thesis introduces an algorithm called “W-learning”, whereby different parts of the mind modify their behavior based on
whether or not they are succeeding in getting the body
to execute their actions. This thesis sets W-learning in
context among the different ways of exploiting Reinforcement Learning numbers for the purposes of Action Selection. It is a ‘Minimize the Worst Unhappiness’ strategy. The different methods are tested and
their strengths and weaknesses analysed in an artificial
world.
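As a rough illustration of the ‘Minimize the Worst Unhappiness’ criterion named above, the sketch below picks, in a single state, the action whose most disappointed agent loses the least. It is not the thesis's W-learning algorithm, which learns online; the function name, the dictionary layout and the Q-values are all assumptions made for this example.

```python
from typing import Dict, List


def minimize_worst_unhappiness(q_values: List[Dict[str, float]]) -> str:
    """Pick the action whose most-disappointed agent is least disappointed.

    Each dict maps action name -> that agent's (already learned) Q-value
    in the current state. All agents are assumed to share one action set.
    """
    actions = list(q_values[0].keys())

    def worst_unhappiness(action: str) -> float:
        # An agent's "unhappiness" with an action is the value it loses
        # relative to the action it would have chosen for itself.
        return max(max(q.values()) - q[action] for q in q_values)

    return min(actions, key=worst_unhappiness)


# Hypothetical Q-values for two competing agents in one state.
hungry = {"eat": 1.0, "flee": 0.0, "wait": 0.6}
scared = {"eat": 0.0, "flee": 1.0, "wait": 0.7}
choice = minimize_worst_unhappiness([hungry, scared])
```

With these numbers neither agent gets its favourite action: the compromise action is chosen because each agent's loss under it is smaller than the loss the other agent would suffer under either favourite.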
UCAM-CL-TR-427
Don Syme:
Proving Java type soundness
June 1997, 35 pages, paper copy
UCAM-CL-TR-428
John Harrison:
Floating point verification in
HOL Light: the exponential function
June 1997, 112 pages, PostScript
Abstract: In that they often embody compact but
mathematically sophisticated algorithms, operations
for computing the common transcendental functions in
floating point arithmetic seem good targets for formal
verification using a mechanical theorem prover. We discuss some of the general issues that arise in verifications
of this class, and then present a machine-checked verification of an algorithm for computing the exponential function in IEEE-754 standard binary floating point
arithmetic. We confirm (indeed strengthen) the main result of a previously published error analysis, though we
uncover a minor error in the hand proof and are forced
to confront several subtle issues that might easily be
overlooked informally.
Our main theorem connects the floating point exponential to its abstract mathematical counterpart. The
specification we prove is that the function has the correct overflow behaviour and, in the absence of overflow,
the error in the result is less than 0.54 units in the last
place (0.77 if the answer is denormalized) compared
against the exact mathematical exponential function.
The algorithm is expressed in a simple formalized programming language, intended to be a subset of real programming and hardware description languages. It uses
underlying floating point operations (addition, multiplication etc.) that are assumed to conform to the IEEE-754 standard for binary floating point arithmetic.
The development described here includes, apart
from the proof itself, a formalization of IEEE arithmetic, a mathematical semantics for the programming
language in which the algorithm is expressed, and the
body of pure mathematics needed. All this is developed logically from first principles using the HOL Light
prover, which guarantees strict adherence to simple
rules of inference while allowing the user to perform
proofs using higher-level derived rules. We first present
the main ideas and conclusions, and then collect some
technical details about the prover and the underlying
mathematical theories in appendices.
UCAM-CL-TR-429
Andrew D. Gordon, Paul D. Hankin,
Søren B. Lassen:
Compilation and equivalence of
imperative objects
June 1997, 64 pages, PostScript
Abstract: We adopt the untyped imperative object calculus of Abadi and Cardelli as a minimal setting in
which to study problems of compilation and program
equivalence that arise when compiling object-oriented
languages. We present both a big-step and a small-step
substitution-based operational semantics for the calculus. Our first two results are theorems asserting the
equivalence of our substitution-based semantics with a
closure-based semantics like that given by Abadi and
Cardelli. Our third result is a direct proof of the correctness of compilation to a stack-based abstract machine
via a small-step decompilation algorithm. Our fourth
result is that contextual equivalence of objects coincides
with a form of Mason and Talcott’s CIU equivalence;
the latter provides a tractable means of establishing operational equivalences. Finally, we prove correct an algorithm, used in our prototype compiler, for statically
resolving method offsets. This is the first study of correctness of an object-oriented abstract machine, and of
operational equivalence for the imperative object calculus.
UCAM-CL-TR-430
G.J.F. Jones, et al.:
Video mail retrieval using voice
Report on topic spotting
July 1997, 73 pages, paper copy
UCAM-CL-TR-431
Martin Richards:
The MCPL programming manual and
user guide
July 1997, 70 pages, paper copy
UCAM-CL-TR-432
Lawrence C. Paulson:
On two formal analyses of the
Yahalom protocol
July 1997, 16 pages, PDF
Abstract: The Yahalom protocol is one of those analyzed by Burrows et al. in the BAN paper. Based
upon their analysis, they have proposed modifications
to make the protocol easier to understand and analyze.
Both versions of Yahalom have now been proved, using Isabelle/HOL, to satisfy strong security goals. The
mathematical reasoning behind these machine proofs is
presented informally.
The new proofs do not rely on a belief logic; they
use an entirely different formal model, the inductive
method. They confirm the BAN analysis and the advantages of the proposed modifications. The new proof
methods detect more flaws than BAN and analyze protocols in finer detail, while remaining broadly consistent with the BAN principles. In particular, the proofs
confirm the explicitness principle of Abadi and Needham.
UCAM-CL-TR-433
Martin Richards:
Backtracking algorithms in MCPL
using bit patterns and recursion
July 1997, 80 pages, paper copy
UCAM-CL-TR-434
Martin Richards:
Demonstration programs for CTL
and µ-calculus symbolic model
checking
August 1997, 41 pages, paper copy
UCAM-CL-TR-435
Peter Sewell:
Global/local subtyping for a
distributed π-calculus
August 1997, 57 pages, PostScript
UCAM-CL-TR-436
W.F. Clocksin:
A new method for estimating optical
flow
November 1997, 20 pages, PDF
Abstract: Accurate and high density estimation of optical flow vectors in an image sequence is accomplished
by a method that estimates the velocity distribution
function for small overlapping regions of the image. Because the distribution is multimodal, the method can
accurately estimate the change in velocity near motion
contrast borders. Large spatiotemporal support without sacrificing spatial resolution is a feature of the
method, so it is not necessary to smooth the resulting
flow vectors in a subsequent operation, and there is a
certain degree of resistance to aperture and aliasing effects. Spatial support also provides for the accurate estimation of long-range displacements, and subpixel accuracy is achieved by a simple weighted mean near the
mode of the velocity distribution function.
The method is demonstrated using image sequences
obtained from the analysis of ceramic and metal materials under stress. The performance of the system under degenerate conditions is also analysed to provide
insight into the behaviour of optical flow methods in
general.
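The subpixel step described above, a weighted mean taken near the mode of the velocity distribution, might look like the following minimal sketch for a single velocity component. The function name, the `radius` parameter and the sample distribution are assumptions for illustration and are not taken from the report.

```python
from typing import Sequence


def subpixel_mode(velocities: Sequence[float],
                  weights: Sequence[float],
                  radius: int = 1) -> float:
    """Weighted mean of candidate velocities within `radius` bins of the mode.

    `velocities[i]` is the candidate velocity of bin i and `weights[i]` is the
    (non-negative) support the distribution gives it.
    """
    # Locate the mode: the bin with the largest support.
    peak = max(range(len(weights)), key=lambda i: weights[i])
    lo = max(peak - radius, 0)
    hi = min(peak + radius + 1, len(weights))
    total = sum(weights[lo:hi])
    # Averaging only around the mode keeps a second, distant peak from
    # dragging the estimate towards it.
    return sum(w * v for w, v in zip(weights[lo:hi], velocities[lo:hi])) / total


# Hypothetical bimodal distribution: main peak at 0, a second peak at 2.
velocities = [-2.0, -1.0, 0.0, 1.0, 2.0]
weights = [0.0, 1.0, 6.0, 3.0, 5.0]
estimate = subpixel_mode(velocities, weights)
```

Restricting the mean to a window around the mode is what lets a multimodal distribution yield a refined estimate for the dominant motion, rather than a blend of two motions at a contrast border.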
UCAM-CL-TR-437
William S. Harbison:
Trusting in computer systems
December 1997, 95 pages, PDF
PhD thesis (Wolfson College, May 1997)
Abstract: We need to be able to reason about large systems, and not just about their components. For this we
need new conceptual tools, and this dissertation therefore indicates the need for a new methodology which
will allow us to better identify areas of possible conflict
or lack of knowledge in a system.
In particular, it examines the concept of trust,
and how this can help us to understand the basic security aspects of a system. The main proposal of this
present work is that systems are viewed in a manner
which analyses the conditions under which they have
been designed to perform, and the circumstances under
which they have been implemented, and then compares
the two. This problem is then examined from the point
of what is being trusted in a system, or what it is being
trusted for.
Starting from an approach developed in a military
context, we demonstrate how this can lead to unanticipated risks when applied inappropriately. We further
suggest that ‘trust’ be considered a relative concept, in
contrast to the more usual usage, and that it is not the
result of knowledge but a substitute for it. The utility
of these concepts is in their ability to quantify the risks
associated with a specific participant, whether these are
explicitly accepted by them, or not.
We finally propose a distinction between ‘trust’ and
‘trustworthy’ and demonstrate that most current uses
of the term ‘trust’ are more appropriately viewed as
statements of ‘trustworthiness’. Ultimately, therefore,
we suggest that the traditional “Orange Book” concept
of trust resulting from knowledge can violate the security policy of a system.
UCAM-CL-TR-438
Feng Shi:
An architecture for scalable and
deterministic video servers
November 1997, 148 pages, PDF
PhD thesis (Wolfson College, June 1997)
Abstract: A video server is a storage system that can
provide a repository for continuous media (CM) data
and sustain CM stream delivery (playback or recording) through networks. The voluminous nature of CM
data demands a video server to be scalable in order to
serve a large number of concurrent client requests. In
addition, deterministic services can be provided by a
video server for playback because the characteristics of
variable bit rate (VBR) video can be analysed in advance and used in run-time admission control (AC) and
data retrieval.
Recent research has made gigabit switches a reality,
and the cost/performance ratio of microprocessors and
standard PCs is dropping steadily. It would be more
cost effective and flexible to use off-the-shelf components inside a video server with a scalable switched network as the primary interconnect than to make a special purpose or massively parallel multiprocessor based
video server. This work advocates and assumes such a
scalable video server structure in which data is striped
to multiple peripherals attached directly to a switched
network.
However, most contemporary distributed file systems do not support data distribution across multiple
networked nodes, let alone providing quality of service
(QoS) to CM applications at the same time. It is the
observation of this dissertation that the software system framework for network striped video servers is as
important as the scalable hardware architecture itself.
This leads to the development of a new system architecture, which is scalable, flexible and QoS aware, for
scalable and deterministic video servers. The resulting
architecture is called Cadmus from sCAlable and Deterministic MUltimedia Servers.
Cadmus also provides integrated solutions to AC
and actual QoS enforcement in storage nodes. This is
achieved by considering resources such as CPU, buffer,
disk, and network, simultaneously but not independently and by including both real-time (RT) and non-real-time (NRT) activities. In addition, the potential
to smooth the variability of VBR videos using read-ahead under client buffer constraints is identified. A
new smoothing algorithm is presented, analysed, and
incorporated into the Cadmus architecture.
A prototype implementation of Cadmus has been
constructed based on distributed object computing
and hardware modules directly connected to an Asynchronous Transfer Mode (ATM) network. Experiments
were performed to evaluate the implementation and
demonstrate the utility and feasibility of the architecture and its AC criteria.
UCAM-CL-TR-439
David A. Halls:
Applying mobile code to distributed
systems
December 1997, 158 pages, paper copy
PhD thesis (June 1997)
UCAM-CL-TR-440
Lawrence C. Paulson:
Inductive analysis of the internet
protocol TLS
December 1997, 19 pages, PDF
Abstract: Internet browsers use security protocols to
protect confidential messages. An inductive analysis of
TLS (a descendant of SSL 3.0) has been performed using the theorem prover Isabelle. Proofs are based on
higher-order logic and make no assumptions concerning beliefs or finiteness. All the obvious security goals
can be proved; session resumption appears to be secure
even if old session keys have been compromised. The
analysis suggests modest changes to simplify the protocol.
TLS, even at an abstract level, is much more complicated than most protocols that researchers have verified. Session keys are negotiated rather than distributed,
and the protocol has many optional parts. Nevertheless, the resources needed to verify TLS are modest. The
inductive approach scales up.
UCAM-CL-TR-441
Lawrence C. Paulson:
A generic tableau prover and
its integration with Isabelle
January 1998, 16 pages, PDF
Abstract: A generic tableau prover has been implemented and integrated with Isabelle. It is based on
leantap but is much more complicated, with numerous
modifications to allow it to reason with any supplied set
of tableau rules. It has a higher-order syntax in order to
support the binding operators of set theory; unification
is first-order (extended for bound variables in obvious
ways) instead of higher-order, for simplicity.
When a proof is found, it is returned to Isabelle
as a list of tactics. Because Isabelle verifies the proof,
the prover can cut corners for efficiency’s sake without compromising soundness. For example, it knows
almost nothing about types.
UCAM-CL-TR-442
Jacques Fleuriot, Lawrence C. Paulson:
A combination of nonstandard
analysis and geometry theorem
proving, with application to
Newton’s Principia
January 1998, 13 pages, PostScript
Abstract: The theorem prover Isabelle is used to formalise and reproduce some of the styles of reasoning
used by Newton in his Principia. The Principia’s reasoning is resolutely geometric in nature but contains “infinitesimal” elements and the presence of motion that
take it beyond the traditional boundaries of Euclidean
Geometry. These present difficulties that prevent Newton’s proofs from being mechanised using only the existing geometry theorem proving (GTP) techniques.
Using concepts from Robinson’s Nonstandard Analysis (NSA) and a powerful geometric theory, we introduce the concept of an infinitesimal geometry in which
quantities can be infinitely small or infinitesimal. We
reveal and prove new properties of this geometry that
only hold because infinitesimal elements are allowed
and use them to prove lemmas and theorems from the
Principia.
UCAM-CL-TR-443
Lawrence C. Paulson:
The inductive approach to verifying
cryptographic protocols
February 1998, 46 pages, PDF
Abstract: Informal arguments that cryptographic protocols are secure can be made rigorous using inductive definitions. The approach is based on ordinary
predicate calculus and copes with infinite-state systems.
Proofs are generated using Isabelle/HOL. The human
effort required to analyze a protocol can be as little as
a week or two, yielding a proof script that takes a few
minutes to run.
Protocols are inductively defined as sets of traces.
A trace is a list of communication events, perhaps
comprising many interleaved protocol runs. Protocol
descriptions incorporate attacks and accidental losses.
The model spy knows some private keys and can forge
messages using components decrypted from previous
traffic. Three protocols are analyzed below: Otway-Rees (which uses shared-key encryption), Needham-Schroeder (which uses public-key encryption), and a recursive protocol (which is of variable length).
One can prove that event ev always precedes event
ev′ or that property P holds provided X remains secret.
Properties can be proved from the viewpoint of the various principals: say, if A receives a final message from B
then the session key it conveys is good.
UCAM-CL-TR-444
Peter Sewell:
From rewrite rules to bisimulation
congruences
May 1998, 72 pages, PostScript
UCAM-CL-TR-445
Michael Roe, Bruce Christianson,
David Wheeler:
Secure sessions from weak secrets
July 1998, 12 pages, PDF
Abstract: Sometimes two parties who share a weak secret k (such as a password) wish to share a strong secret
s (such as a session key) without revealing information
about k to a (possibly active) attacker. We assume that
both parties can generate strong random numbers and
forget secrets, and present three protocols for secure
strong secret sharing, based on RSA, Diffie-Hellman
and El-Gamal. As well as being simpler and quicker
than their predecessors, our protocols also have slightly
stronger security properties: in particular, they make no
cryptographic use of s and so impose no subtle restrictions upon the use which is made of s by other protocols.
UCAM-CL-TR-446
K. Spärck Jones, S. Walker, S.E. Robertson:
A probabilistic model of information
and retrieval:
development and status
August 1998, 74 pages, PostScript
Abstract: The paper combines a comprehensive account
of the probabilistic model of retrieval with new systematic experiments on TREC Programme material. It
presents the model from its foundations through its logical development to cover more aspects of retrieval data
and a wider range of system functions. Each step in the
argument is matched by comparative retrieval tests, to
provide a single coherent account of a major line of research. The experiments demonstrate, for a large test
collection, that the probabilistic model is effective and
robust, and that it responds appropriately, with major
improvements in performance, to key features of retrieval situations.
UCAM-CL-TR-447
Giampaolo Bella, Lawrence C. Paulson:
Are timestamps worth the effort?
A formal treatment
September 1998, 12 pages, PDF
Abstract: Theorem proving provides formal and detailed support to the claim that timestamps can give
better freshness guarantees than nonces do, and can
simplify the design of crypto-protocols. However, since
they rely on synchronised clocks, their benefits are still
debatable. The debate should gain from our formal
analysis, which is achieved through the comparison
of a nonce-based crypto-protocol, Needham-Schroeder,
with its natural modification by timestamps, Kerberos.
UCAM-CL-TR-448
G.M. Bierman:
A computational interpretation
of the
λµ calculus
September 1998, 10 pages, paper copy
UCAM-CL-TR-449
Florian Kammüller, Markus Wenzel:
Locales
A sectioning concept for Isabelle
October 1998, 16 pages, paper copy
UCAM-CL-TR-450
Jacobus Erasmus van der Merwe:
Open service support for ATM
November 1998, 164 pages, paper copy
PhD thesis (St John’s College, September 1997)
UCAM-CL-TR-451
Sean Rooney:
The structure of open ATM control
architectures
November 1998, 183 pages, paper copy
PhD thesis (Wolfson College, February 1998)
UCAM-CL-TR-452
Florian Kammüller, Lawrence C. Paulson:
A formal proof of Sylow’s theorem
An experiment in abstract algebra
with Isabelle HOL
November 1998, 30 pages, PDF
Abstract: The theorem of Sylow is proved in Isabelle
HOL. We follow the proof by Wielandt that is more
general than the original and uses a non-trivial combinatorial identity. The mathematical proof is explained
in some detail leading on to the mechanization of group
theory and the necessary combinatorics in Isabelle. We
present the mechanization of the proof in detail giving
reference to theorems contained in an appendix. Some
weak points of the experiment with respect to a natural
treatment of abstract algebraic reasoning give rise to a
discussion of the use of module systems to represent abstract algebra in theorem provers. Drawing from that,
we present tentative ideas for further research into a
section concept for Isabelle.
UCAM-CL-TR-453
Michael Norrish:
C formalised in HOL
December 1998, 156 pages, PDF
PhD thesis (August 1998)
Abstract: We present a formal semantics of the C programming language, covering both the type system and
the dynamic behaviour of programs. The semantics is
wide-ranging, covering most of the language, with its
most significant omission being the C library. Using a
structural operational semantics we specify transition
relations for C’s expressions, statements and declarations in higher order logic.
The consistency of our definition is assured by its
specification in the HOL theorem prover. With the theorem prover, we have used the semantics as the basis for
a set of proofs of interesting theorems about C. We investigate properties of expressions and statements separately.
In our chapter of results about expressions, we begin with two results about the interaction between the
type system and the dynamic semantics. We have both
type preservation, that the values produced by expressions conform to the type predicted for them; and type
safety, that typed expressions will not block, but will either evaluate to a value, or cause undefined behaviour.
We then also show that two broad classes of expression are deterministic. This last result is of considerable
practical value as it makes later verification proofs significantly easier.
In our chapter of results about statements, we prove
a series of derived rules that provide C with Floyd-Hoare style “axiomatic” rules for verifying properties
of programs. These rules are consequences of the original semantics, not independently stated axioms, so we
can be sure of their soundness. This chapter also proves
the correctness of an automatic tool for constructing
post-conditions for loops with break and return statements.
Finally, we perform some simple verification case
studies, going some way towards demonstrating practical utility for the semantics and accompanying tools.
This technical report is substantially the same as the
PhD thesis I submitted in August 1998. The minor differences between that document and this are principally
improvements suggested by my examiners Andy Gordon and Tom Melham, whom I thank for their help
and careful reading.
UCAM-CL-TR-454
Andrew M. Pitts:
Parametric polymorphism and
operational equivalence
December 1998, 39 pages, paper copy
UCAM-CL-TR-455
G.M. Bierman:
Multiple modalities
December 1998, 26 pages, paper copy
UCAM-CL-TR-456
Joshua Robert Xavier Ross:
An evaluation based approach to
process calculi
January 1999, 206 pages, paper copy
PhD thesis (Clare College, July 1998)
UCAM-CL-TR-457
Andrew D. Gordon, Paul D. Hankin:
A concurrent object calculus:
reduction and typing
February 1999, 63 pages, paper copy
UCAM-CL-TR-458
Lawrence C. Paulson:
Final coalgebras as greatest fixed
points in ZF set theory
March 1999, 25 pages, PDF
Abstract: A special final coalgebra theorem, in the style
of Aczel (1988), is proved within standard Zermelo-Fraenkel set theory. Aczel’s Anti-Foundation Axiom is
replaced by a variant definition of function that admits
non-well-founded constructions. Variant ordered pairs
and tuples, of possibly infinite length, are special cases
of variant functions. Analogues of Aczel’s solution and
substitution lemmas are proved in the style of Rutten and Turi (1993). The approach is less general than
Aczel’s, but the treatment of non-well-founded objects
is simple and concrete. The final coalgebra of a functor
is its greatest fixedpoint. Compared with previous work
(Paulson, 1995a), iterated substitutions and solutions
are considered, as well as final coalgebras defined with
respect to parameters. The disjoint sum construction is
replaced by a smoother treatment of urelements that
simplifies many of the derivations. The theory facilitates machine implementation of recursive definitions
by letting both inductive and coinductive definitions be
represented as fixedpoints. It has already been applied
to the theorem prover Isabelle (Paulson, 1994).
UCAM-CL-TR-459
Mohamad Afshar:
An open parallel architecture for
data-intensive applications
July 1999, 225 pages, PostScript
PhD thesis (King’s College, December 1998)
Abstract: Data-intensive applications consist of both
declarative data-processing parts and imperative computational parts. For applications such as climate
modelling, scale hits both the computational aspects
which are typically handled in a procedural programming language, and the data-processing aspects
which are handled in a database query language. Although parallelism has been successfully exploited in
the data-processing parts by parallel evaluation of
database queries associated with the application, current database query languages are poor at expressing
the computational aspects, which are also subject to
scale.
This thesis proposes an open architecture that delivers parallelism shared between the database, system
and application, thus enabling the integration of the
conventionally separated query and non-query components of a data-intensive application. The architecture
is data-model independent and can be used in a variety of different application areas including decision-support applications, which are query based, and complex applications, which comprise procedural language
statements with embedded queries. The architecture encompasses a unified model of parallelism and the realisation of this model in the form of a language within
which it is possible to describe both the query and
non-query components of data-intensive applications.
The language enables the construction of parallel applications by the hierarchical composition of platform-independent parallel forms, each of which implements
a form of task or data parallelism. These forms may be
used to determine both query and non-query actions.
Queries are expressed in a declarative language
based on “monoid comprehensions”. The approach
of using monoids to model data types and monoid
homomorphisms to iterate over collection types enables mathematically provable compile-time optimisations whilst also facilitating multiple collection types
and data type extensibility. Monoid comprehension
programs are automatically transformed into parallel programs composed of applications of the parallel forms, one of which is the “monoid homomorphism”. This process involves identifying the parts of
a query where task and data parallelism are available
and mapping that parallelism onto the most suitable
form. Data parallelism in queries is mapped onto a
form that implements combining tree parallelism for
query evaluation and dividing tree parallelism to realise data partitioning. Task parallelism is mapped onto
two separate forms that implement pipeline and independent parallelism. This translation process is applied to all comprehension queries including those in
complex applications. The result is a skeleton program
in which both the query and non-query parts are expressed within a single language. Expressions in this
language are amenable to the application of optimising
skeleton rewrite rules.
A complete prototype of the decision-support architecture has been constructed on a 128-cell MIMD
parallel computer. A demonstration of the utility of
the query framework is performed by modelling some
of OQL and a substantial subset of SQL. The system is evaluated for query speedup with a number of
hardware configurations using a large music catalogue
database. The results obtained show that the implementation delivers the performance gains expected while
offering a convenient definition of the parallel environment.
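To make the idea of a monoid homomorphism evaluated by a combining tree more concrete, here is a minimal sequential sketch. It is an illustration only, not the language or the prototype described in the thesis; `Monoid`, `hom`, the two example monoids and the tiny catalogue are all hypothetical names and data.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Monoid:
    zero: Any                          # identity element
    merge: Callable[[Any, Any], Any]   # associative binary operation


def hom(monoid: Monoid, unit: Callable[[Any], Any], items: List[Any]) -> Any:
    """Map each item into the monoid with `unit`, then merge the results.

    The list is split in half recursively, so the merges form a balanced
    combining tree. Associativity of `merge` is what makes this grouping
    give the same answer as a left-to-right fold.
    """
    if not items:
        return monoid.zero
    if len(items) == 1:
        return unit(items[0])
    mid = len(items) // 2
    # The two halves are independent of each other, so a parallel
    # implementation could evaluate them on different processors.
    return monoid.merge(hom(monoid, unit, items[:mid]),
                        hom(monoid, unit, items[mid:]))


SUM = Monoid(0, lambda a, b: a + b)
SET = Monoid(frozenset(), lambda a, b: a | b)

# Hypothetical (title, year) records.
catalogue = [("track-a", 1957), ("track-b", 1959), ("track-c", 1960)]

# Roughly: sum{ 1 | (title, year) <- catalogue, year < 1960 }
count_pre_1960 = hom(SUM, lambda rec: 1 if rec[1] < 1960 else 0, catalogue)

# Roughly: set{ year | (title, year) <- catalogue }
years = hom(SET, lambda rec: frozenset([rec[1]]), catalogue)
```

The same `hom` serves both queries; only the target monoid and the per-item mapping change, which is the property that lets one parallel form cover several collection types.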
UCAM-CL-TR-460
Giampaolo Bella:
Message reception in the inductive
approach
March 1999, 16 pages, PDF
Abstract: Cryptographic protocols can be formally
analysed in great detail by means of Paulson’s Inductive Approach, which is mechanised by the theorem
prover Isabelle. The approach only relied on message
sending (and noting) in order to keep the models simple. We introduce a new event, message reception, and
show that the price paid in terms of runtime is negligible because old proofs can be reused. On the other
hand, the new event enhances the global expressiveness, and makes it possible to define an accurate notion of agents’ knowledge, which extends and replaces
Paulson’s notion of spy’s knowledge. We have designed
new guarantees to assure each agent that the peer does
not know the crucial message items of the session. This
work thus extends the scope of the Inductive approach.
Finally, we provide general guidance on updating the
protocols analysed so far, and give examples for some
cases.
UCAM-CL-TR-461
Joe Hurd:
Integrating Gandalf and HOL
March 1999, 11 pages, PDF
Abstract: Gandalf is a first-order resolution theorem-prover, optimized for speed and specializing in manipulations of large clauses. In this paper I describe GANDALF_TAC, a HOL tactic that proves goals by calling
Gandalf and mirroring the resulting proofs in HOL.
This call can occur over a network, and a Gandalf
server may be set up servicing multiple HOL clients.
In addition, the translation of the Gandalf proof into
HOL fits in with the LCF model and guarantees logical
consistency.
UCAM-CL-TR-462
Peter Sewell, Paweł T. Wojciechowski,
Benjamin C. Pierce:
Location-independent
communication for mobile agents:
a two-level architecture
April 1999, 31 pages, PostScript
UCAM-CL-TR-463
Peter Sewell, Jan Vitek:
Secure composition of insecure
components
April 1999, 44 pages, PostScript
UCAM-CL-TR-464
Boaz Lerner, William Clocksin,
Seema Dhanjal, Maj Hultén,
Christopher Bishop:
Feature representation for the
automatic analysis of fluorescence
in-situ hybridization images
May 1999, 36 pages, paper copy
UCAM-CL-TR-465
Boaz Lerner, Seema Dhanjal, Maj Hultén:
Gelfish – graphical environment for
labelling FISH images
May 1999, 20 pages, paper copy
UCAM-CL-TR-466
Boaz Lerner, William Clocksin,
Seema Dhanjal, Maj Hultén,
Christopher Bishop:
Automatic signal classification in
fluorescence in-situ hybridization
images
May 1999, 24 pages, paper copy
UCAM-CL-TR-467
Lawrence C. Paulson:
Mechanizing UNITY in Isabelle
June 1999, 22 pages, PDF
Abstract: UNITY is an abstract formalism for proving properties of concurrent systems, which typically
are expressed using guarded assignments [Chandy and
Misra 1988]. UNITY has been mechanized in higher-order logic using Isabelle, a proof assistant. Safety and
progress primitives, their weak forms (for the substitution axiom) and the program composition operator
(union) have been formalized. To give a feel for the
concrete syntax, the paper presents a few extracts from
the Isabelle definitions and proofs. It discusses a small
example, two-process mutual exclusion. A mechanical
theory of unions of programs supports a degree of compositional reasoning. Original work on extending program states is presented and then illustrated through a
simple example involving an array of processes.
UCAM-CL-TR-468
Stephen Paul Wilcox:
Synthesis of asynchronous circuits
July 1999, 250 pages, PDF
PhD thesis (Queens’ College, December 1998)
Abstract: The majority of integrated circuits today are
synchronous: every part of the chip times its operation
with reference to a single global clock. As circuits become larger and faster, it becomes progressively more
difficult to coordinate all actions of the chip to the
clock. Asynchronous circuits do not suffer from this
problem, because they do not require global synchronization; they also offer other benefits, such as modularity, lower power and automatic adaptation to physical conditions.
The main disadvantage of asynchronous circuits is
that there are few tools to help with design. This thesis
describes a new synthesis tool for asynchronous modules, which combines a number of novel ideas with existing methods for finite state machine synthesis. Connections between modules are assumed to have unbounded finite delays on all wires, but fundamental
mode is used inside modules, rather than the pessimistic
speed-independent or quasi-delay-insensitive models.
Accurate technology-specific verification is performed
to check that circuits work correctly.
Circuits are described using a language based upon
the Signal Transition Graph, which is a well-known
method for specifying asynchronous circuits. Concurrency reduction techniques are used to produce a large
number of circuits that conform to a given specification. Circuits are verified using a simulation algorithm
derived from the work of Brzozowski and Seger, and
then performance estimations are obtained by a gate-level simulator utilising a new estimation of waveform
slopes. Circuits can be ranked in terms of high speed,
low power dissipation or small size, and then the best
circuit for a particular task chosen.
Results are presented that show significant improvements over most circuits produced by other synthesis tools. Some circuits are twice as fast and dissipate
half the power of equivalent speed-independent circuits. Specification examples are provided which show
that the front-end specification is easier to use than current specification approaches. The price that must be
paid for the improved performance is decreased reliability and technology dependence of the circuits produced; the proposed tool can also take a very long time
to produce a result.
UCAM-CL-TR-469
Jacques Désiré Fleuriot:
A combination of geometry theorem
proving and nonstandard analysis,
with application to Newton’s
Principia
August 1999, 135 pages, paper copy
PhD thesis (Clare College, March 1999)
UCAM-CL-TR-470
Florian Kammüller:
Modular reasoning in Isabelle
August 1999, 128 pages, paper copy
PhD thesis (Clare College, April 1999)
UCAM-CL-TR-471
Robert M. Brady, Ross J. Anderson,
Robin C. Ball:
Murphy’s law, the fitness of
evolving species, and the limits of
software reliability
September 1999, 14 pages, PDF
Abstract: We tackle two problems of interest to the software assurance community. Firstly, existing models of
software development (such as the waterfall and spiral
models) are oriented towards one-off software development projects, while the growth of mass market computing has led to a world in which most software consists of packages which follow an evolutionary development model. This leads us to ask whether anything
interesting and useful may be said about evolutionary
development. We answer in the affirmative. Secondly,
existing reliability growth models emphasise the Poisson distribution of individual software bugs, while the
empirically observed reliability growth for large systems is asymptotically slower than this. We provide a
rigorous explanation of this phenomenon. Our reliability growth model is inspired by statistical thermodynamics, but also applies to biological evolution. It is
in close agreement with experimental measurements of
the fitness of an evolving species and the reliability of
commercial software products. However, it shows that
there are significant differences between the evolution
of software and the evolution of species. In particular,
we establish maximisation properties corresponding to
Murphy’s law which work to the advantage of a biological species, but to the detriment of software reliability.
UCAM-CL-TR-472
Ben Y. Reis:
Simulating music learning with
autonomous listening agents:
entropy, ambiguity and context
September 1999, 200 pages, paper copy
PhD thesis (Queens’ College, July 1999)
UCAM-CL-TR-473
Clemens Ballarin:
Computer algebra and theorem
proving
October 1999, 122 pages, paper copy
PhD thesis (Darwin College)
UCAM-CL-TR-474
Boaz Lerner:
A Bayesian methodology and
probability density estimation for
fluorescence in-situ hybridization
signal classification
October 1999, 31 pages, paper copy
UCAM-CL-TR-475
Boaz Lerner, Neil D. Lawrence:
A comparison of state-of-the-art
classification techniques with
application to cytogenetics
October 1999, 34 pages, paper copy
UCAM-CL-TR-476
Mark Staples:
Linking ACL2 and HOL
November 1999, 23 pages, paper copy
UCAM-CL-TR-477
Gian Luca Cattani, Glynn Winskel:
Presheaf models for CCS-like
languages
November 1999, 46 pages, paper copy
UCAM-CL-TR-478
Peter Sewell, Jan Vitek:
Secure composition of untrusted
code: wrappers and causality types
November 1999, 36 pages, PostScript
UCAM-CL-TR-479
Geraint Price:
The interaction between fault
tolerance and security
December 1999, 144 pages, PDF
PhD thesis (Wolfson College, June 1999)
Abstract: This dissertation studies the effects on system
design when including fault tolerance design principles
within security services.
We start by looking at the changes made to the trust
model within protocol design, and how moving away
from trusted server design principles affects the structure of the protocol. Taking the primary results from
this work, we move on to study how control in protocol execution can be used to increase assurances in the
actions of legitimate participants. We study some examples, defining two new classes of attack, and note that
by increasing client control in areas of protocol execution, it is possible to overcome certain vulnerabilities.
We then look at different models in fault tolerance,
and how their adoption into a secure environment can
change the design principles and assumptions made
when applying the models.
We next look at the application of timing checks
in protocols. There are some classes of timing attack
that are difficult to thwart using existing techniques,
because of the inherent unreliability of networked communication. We develop a method of converting the
Quality of Service mechanisms built into ATM networks in order to achieve another layer of protection
against timing attacks.
We then study the use of primary-backup mechanisms within server design, as previous work on server
replication in security centres on the use of the state machine approach for replication, which provides a higher
degree of assurance in system design, but adds complexity.
We then provide a design for a server to reliably
and securely store objects across a loosely coupled, distributed environment. The main goal behind this design
was to realise the ability for a client to exert control
over the fault tolerance inherent in the service.
The main conclusions we draw from our research
are that fault tolerance has a wider application within
security than current practices, which are primarily
based on replicating servers, and clients can exert control over the protocols and mechanisms to achieve resilience against differing classes of attack. We promote
some new ideas on how, by challenging the prevailing model for client-server architectures in a secure environment, legitimate clients can have greater control
over the services they use. We believe this to be a useful
goal, given that the client stands to lose if the security
of the server is undermined.
UCAM-CL-TR-480
Mike Gordon:
Programming combinations of
deduction and BDD-based symbolic
calculation
December 1999, 24 pages, paper copy
UCAM-CL-TR-481
Mike Gordon, Ken Friis Larsen:
Combining the Hol98 proof assistant
with the BuDDy BDD package
December 1999, 71 pages, paper copy
UCAM-CL-TR-482
John Daugman:
Biometric decision landscapes
January 2000, 15 pages, PDF
Abstract: This report investigates the “decision landscapes” that characterize several forms of biometric decision making. The issues discussed include: (i) Estimating the degrees-of-freedom associated with different
biometrics, as a way of measuring the randomness and
complexity (and therefore the uniqueness) of their templates. (ii) The consequences of combining more than
one biometric test to arrive at a decision. (iii) The requirements for performing identification by large-scale
exhaustive database search, as opposed to mere verification by comparison against a single template. (iv)
Scenarios for Biometric Key Cryptography (the use of
biometrics for encryption of messages). These issues are
considered here in abstract form, but where appropriate, the particular example of iris recognition is used
as an illustration. A unifying theme of all four sets of
issues is the role of combinatorial complexity, and its
measurement, in determining the potential decisiveness
of biometric decision making.
UCAM-CL-TR-483
Hendrik Jaap Bos:
Elastic network control
January 2000, 184 pages, paper copy
PhD thesis (Wolfson College, August 1999)
UCAM-CL-TR-484
Richard Tucker:
Automatic summarising and the
CLASP system
January 2000, 190 pages, PDF
PhD thesis (1999)
Abstract: This dissertation discusses summarisers and
summarising in general, and presents CLASP, a new
summarising system that uses a shallow semantic representation of the source text called a “predication cohesion graph”.
Nodes in the graph are “simple predications” corresponding to events, states and entities mentioned in the
text; edges indicate related or similar nodes. Summary
content is chosen by selecting some of these predications according to criteria of “importance”, “representativeness” and “cohesiveness”. These criteria are expressed as functions on the nodes of a weighted graph.
Summary text is produced either by extracting whole
sentences from the source text, or by generating short,
indicative “summary phrases” from the selected predications.
CLASP uses linguistic processing but no domain
knowledge, and therefore does not restrict the subject
matter of the source text. It is intended to deal robustly
with complex texts that it cannot analyse completely
accurately or in full. Experiments in summarising stories from the Wall Street Journal suggest there may be a
benefit in identifying important material in a semantic
representation rather than a surface one, but that, despite the robustness of the source representation, inaccuracies in CLASP’s linguistic analysis can dramatically
affect the readability of its summaries. I discuss ways in
which this and other problems might be overcome.
UCAM-CL-TR-485
Daryl Stewart, Myra VanInwegen:
Three notes on the interpretation of
Verilog
January 2000, 47 pages, paper copy
UCAM-CL-TR-486
James Richard Thomas:
Stretching a point: aspect and
temporal discourse
February 2000, 251 pages, paper copy
PhD thesis (Wolfson College, January 1999)
UCAM-CL-TR-487
Tanja Vos, Doaitse Swierstra:
Sequential program composition in
UNITY
March 2000, 20 pages, paper copy
UCAM-CL-TR-488
Giampaolo Bella, Fabio Massacci,
Lawrence Paulson, Piero Tramontano:
Formal verification of card-holder
registration in SET
March 2000, 15 pages, paper copy
UCAM-CL-TR-489
Jong-Hyeon Lee:
Designing a reliable publishing
framework
April 2000, 129 pages, PDF
PhD thesis (Wolfson College, January 2000)
Abstract: Due to the growth of the Internet and the
widespread adoption of easy-to-use web browsers, the
web provides a new environment for conventional as
well as new businesses. Publishing on the web is a fundamental and important means of supporting various
activities on the Internet such as commercial transactions, personal home page publishing, medical information distribution, public key certification and academic
scholarly publishing. Along with the dramatic growth
of the web, the number of reported frauds is increasing
sharply. Since the Internet was not originally designed
for web publishing, it has some weaknesses that undermine its reliability.
How can we rely on web publishing? In order to resolve this question, we need to examine what makes
people confident when reading conventional publications printed on paper, to investigate what attacks can
erode confidence in web publishing, and to understand
the nature of publishing in general.
In this dissertation, we examine security properties
and policy models, and their applicability to publishing. We then investigate the nature of publishing so that
we can extract its technical requirements. To help us
understand the practical mechanisms which might satisfy these requirements, some applications of electronic
publishing are discussed and some example mechanisms are presented.
We conclude that guaranteed integrity, verifiable authenticity and persistent availability of publications are
required to make web publishing more reliable. Hence
we design a framework that can support these properties. To analyse the framework, we define a security
policy for web publishing that focuses on the guaranteed integrity and authenticity of web publications, and
then describe some technical primitives that enable us
to achieve our requirements. Finally, the Jikzi publishing system – an implementation of our framework – is
presented with descriptions of its architecture and possible applications.
UCAM-CL-TR-490
Peter John Cameron Brown:
Selective mesh refinement for
rendering
April 2000, 179 pages, paper copy
PhD thesis (Emmanuel College, February 1998)
Abstract: A key task in computer graphics is the rendering of complex models. As a result, there exist a large
number of schemes for improving the speed of the rendering process, many of which involve displaying only
a simplified version of a model. When such a simplification is generated selectively, i.e. detail is only removed
in specific regions of a model, we term this selective
mesh refinement.
Selective mesh refinement can potentially produce a
model approximation which can be displayed at greatly
reduced cost while remaining perceptually equivalent to
a rendering of the original. For this reason, the field of
selective mesh refinement has been the subject of dramatically increased interest recently. The resulting selective refinement methods, though, are restricted in both
the types of model which they can handle and the form
of output meshes which they can generate.
Our primary thesis is that a selectively refined mesh
can be produced by combining fragments of approximations to a model without regard to the underlying approximation method. Thus we can utilise existing approximation techniques to produce selectively refined meshes in n-dimensions. This means that the capabilities and characteristics of standard approximation methods can be retained in our selectively refined
models.
We also show that a selectively refined approximation produced in this manner can be smoothly geometrically morphed into another selective refinement in order to satisfy modified refinement criteria. This geometric morphing is necessary to ensure that detail can be
added and removed from models which are selectively
refined with respect to their impact on the current view
frustum. For example, if a model is selectively refined
in this manner and the viewer approaches the model
then more detail may have to be introduced to the displayed mesh in order to ensure that it satisfies the new
refinement criteria. By geometrically morphing this introduction of detail we can ensure that the viewer is not
distracted by “popping” artifacts.
We have developed a novel framework within which
these proposals have been verified. This framework
consists of a generalised resolution-based model representation, a means of specifying refinement criteria and
algorithms which can perform the selective refinement
and geometric morphing tasks. The framework has allowed us to demonstrate that these twin tasks can be
performed both on the output of existing approximation techniques and with respect to a variety of refinement criteria.
An HTML version of this thesis is at
http://www.cl.cam.ac.uk/research/rainbow/publications/pjcb/thesis/
UCAM-CL-TR-491
Anna Korhonen, Genevieve Gorrell,
Diana McCarthy:
Is hypothesis testing useful for
subcategorization acquisition?
May 2000, 9 pages, paper copy
UCAM-CL-TR-492
Paweł Tomasz Wojciechowski:
Nomadic Pict: language and
infrastructure design for mobile
computation
June 2000, 184 pages, PDF
PhD thesis (Wolfson College, March 2000)
Abstract: Mobile agents – units of executing computation that can migrate between machines – are likely
to become an important enabling technology for future distributed systems. We study the distributed infrastructures required for location-independent communication between migrating agents. These infrastructures are problematic: the choice or design of an infrastructure must be somewhat application-specific –
any given algorithm will only have satisfactory performance for some range of migration and communication behaviour; the algorithms must be matched to
the expected properties (and robustness demands) of
applications and the failure characteristic of the communication medium. To study this problem we introduce an agent programming language – Nomadic
Pict. It is designed to allow infrastructure algorithms
to be expressed clearly, as translations from a high-level
language to a lower level. The levels are based on
rigorously-defined process calculi, which provide sharp
levels of abstraction. In this dissertation we describe the
language and use it to develop a distributed infrastructure for an example application. The language and examples have been implemented; we conclude with a description of the compiler and runtime system.
UCAM-CL-TR-493
Giampaolo Bella:
Inductive verification of
cryptographic protocols
July 2000, 189 pages, PDF
PhD thesis (Clare College, March 2000)
Abstract: The dissertation aims at tailoring Paulson’s
Inductive Approach for the analysis of classical cryptographic protocols towards real-world protocols. The
aim is pursued by extending the approach with new
elements (e.g. timestamps and smart cards), new network events (e.g. message reception) and more expressive functions (e.g. agents’ knowledge). Hence, the aim
is achieved by analysing large protocols (Kerberos IV
and Shoup-Rubin), and by studying how to specify and
verify their goals.
More precisely, the modelling of timestamps and of
a discrete time are first developed on BAN Kerberos,
while comparing the outcomes with those of the BAN
logic. The machinery is then applied to Kerberos IV,
whose complicated use of session keys requires a dedicated treatment. Three new guarantees limiting the
spy’s abilities in case of compromise of a specific session
key are established. Also, it is discovered that Kerberos
IV is subject to an attack due to the weak guarantees of
confidentiality for the protocol responder.
We develop general strategies to investigate the
goals of authenticity, key distribution and non-injective
agreement, which is a strong form of authentication.
These strategies require formalising the agents’ knowledge of messages. Two approaches are implemented. If
an agent creates a message, then he knows all components of the message, including the cryptographic
key that encrypts it. Alternatively, a broad definition of
agents’ knowledge can be developed if a new network
event, message reception, is formalised.
The concept of smart card as a secure device that
can store long-term secrets and perform easy computations is introduced. The model cards can be stolen
and/or cloned by the spy. The kernel of their built-in algorithm works correctly, so the spy cannot acquire unlimited knowledge from their use. However, their functional interface is unreliable, so they send correct outputs in an unspecified order. The provably secure protocol based on smart cards designed by Shoup & Rubin is mechanised. Some design weaknesses (unknown
to the authors’ treatment by Bellare & Rogaway’s approach) are unveiled, while feasible corrections are suggested and verified.
We realise that the evidence that a protocol achieves
its goals must be available to the peers. In consequence,
we develop a new principle of prudent protocol design, goal availability, which holds of a protocol when
suitable guarantees confirming its goals exist on assumptions that both peers can verify. Failure to observe
our principle raises the risk of attacks, as is the case, for
example, of the attack on Kerberos IV.
UCAM-CL-TR-494
Mark David Spiteri:
An architecture for the notification,
storage and retrieval of events
July 2000, 165 pages, paper copy
PhD thesis (Darwin College, January 2000)
UCAM-CL-TR-496
Gian Luca Cattani, James J. Leifer,
Robin Milner:
Contexts and embeddings for
closed shallow action graphs
July 2000, 56 pages, PostScript
UCAM-CL-TR-497
G.M. Bierman, A. Trigoni:
Towards a formal type system for
ODMG OQL
September 2000, 20 pages, paper copy
UCAM-CL-TR-495
Mohammad S.M. Khorsheed:
Automatic recognition of words in
Arabic manuscripts
July 2000, 242 pages, PDF
PhD thesis (Churchill College, June 2000)
Abstract: The need to transliterate large numbers of
historic Arabic documents into machine-readable form
has motivated new work on offline recognition of Arabic script. Arabic script presents two challenges: orthography is cursive and letter shape is context sensitive.
This dissertation presents two techniques to achieve
high word recognition rates: the segmentation-free
technique and the segmentation-based technique. The
segmentation-free technique treats the word as a whole.
The word image is first transformed into a normalised
polar image. The two-dimensional Fourier transform
is then applied to the polar image. This results in a
Fourier spectrum that is invariant to dilation, translation, and rotation. The Fourier spectrum is used to
form the word template, or train the word model in the
template-based and the multiple hidden Markov model
(HMM) recognition systems, respectively. The recognition of an input word image is based on the minimum distance measure from the word templates and
the maximum likelihood probability for the word models.
The segmentation-based technique uses a single hidden Markov model, which is composed of multiple
character-models. The technique implements the analytic approach in which words are segmented into
smaller units, not necessarily characters. The word
skeleton is decomposed into a number of links in orthographic order; it is then transferred into a sequence of
discrete symbols using vector quantisation. The training
of each character-model is performed using either: state
assignment in the lexicon-driven configuration or the
Baum-Welch method in the lexicon-free configuration.
The observation sequence of the input word is given
to the hidden Markov model and the Viterbi algorithm
is applied to provide an ordered list of the candidate
recognitions.
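The decoding step mentioned at the end of this abstract can be illustrated with a minimal sketch of the Viterbi algorithm for a discrete hidden Markov model. The state names, symbols and probabilities below are hypothetical toy values, not taken from the thesis, and the sketch returns only the single most likely state sequence rather than an ordered list of candidates.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Choose the predecessor r that maximises the path probability into s.
            column[s] = max(
                (prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return prob, path


# Hypothetical toy model: two states and three quantised symbols.
STATES = ("c1", "c2")
START = {"c1": 0.6, "c2": 0.4}
TRANS = {"c1": {"c1": 0.7, "c2": 0.3}, "c2": {"c1": 0.4, "c2": 0.6}}
EMIT = {
    "c1": {"q0": 0.5, "q1": 0.4, "q2": 0.1},
    "c2": {"q0": 0.1, "q1": 0.3, "q2": 0.6},
}
```

Calling `viterbi(("q0", "q1", "q2"), STATES, START, TRANS, EMIT)` decodes the three-symbol sequence against the toy model.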
UCAM-CL-TR-498
Peter Sewell:
Applied π – a brief tutorial
July 2000, 65 pages, PDF
Abstract: This note provides a brief introduction to
π-calculi and their application to concurrent and distributed programming. Chapter 1 introduces a simple
π-calculus and discusses the choice of primitives, operational semantics (in terms of reductions and of indexed early labelled transitions), operational equivalences, Pict-style programming and typing. Chapter 2
goes on to discuss the application of these ideas to
distributed systems, looking informally at the design
of distributed π-calculi with grouping and interaction
primitives. Chapter 3 returns to typing, giving precise definitions for a simple type system and soundness results for the labelled transition semantics. Finally, Chapters 4 and 5 provide a model development
of the metatheory, giving first an outline and then detailed proofs of the results stated earlier. The note can
be read in the partial order 1.(2+3+4.5).
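For orientation, the basic communication step of a π-calculus can be written as a single reduction rule. The notation below is the common textbook form and is an assumption here; the syntax used in the report itself may differ.

```latex
\[
  \overline{x}\langle v \rangle . P \;\mid\; x(y) . Q
  \;\longrightarrow\;
  P \;\mid\; Q\{v/y\}
\]
```

An output of the name v on channel x synchronises with an input on x, and the received name is substituted for y in the continuation Q.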
UCAM-CL-TR-499
James Edward Gain:
Enhancing spatial deformation for
virtual sculpting
August 2000, 161 pages, PDF
PhD thesis (St John’s College, June 2000)
Abstract: The task of computer-based free-form shape
design is fraught with practical and conceptual difficulties. Incorporating elements of traditional clay sculpting has long been recognised as a means of shielding
a user from the complexities inherent in this form of
modelling. The premise is to deform a mathematically-defined solid in a fashion that loosely simulates the
physical moulding of an inelastic substance, such as
modelling clay or silicone putty. Virtual sculpting combines this emulation of clay sculpting with interactive
feedback.
Spatial deformations are a class of powerful modelling techniques well suited to virtual sculpting. They
indirectly reshape an object by warping the surrounding space. This is analogous to embedding a flexible
shape within a lump of jelly and then causing distortions by flexing the jelly. The user controls spatial deformations by manipulating points, curves or a volumetric hyperpatch. Directly Manipulated Free-Form Deformation (DMFFD), in particular, merges the hyperpatch- and point-based approaches and allows the user to pick
and drag object points directly.
This thesis embodies four enhancements to the versatility and validity of spatial deformation:
1. We enable users to specify deformations by manipulating the normal vector and tangent plane at a
point. A first derivative frame can be tilted, twisted and
scaled to cause a corresponding distortion in both the
ambient space and inset object. This enhanced control
is accomplished by extending previous work on bivariate surfaces to trivariate hyperpatches.
2. We extend DMFFD to enable curve manipulation
by exploiting functional composition and degree reduction. Although the resulting curve-composed DMFFD
introduces some modest and bounded approximation,
it is superior to previous curve-based schemes in other
respects. Our technique combines all three forms of
spatial deformation (hyperpatch, point and curve), can
maintain any desired degree of derivative continuity, is
amenable to the automatic detection and prevention of
self-intersection, and achieves interactive update rates
over the entire deformation cycle.
3. The approximation quality of a polygon-mesh
object frequently degrades under spatial deformation
to become either oversaturated or undersaturated with
polygons. We have devised an efficient adaptive mesh
refinement and decimation scheme. Our novel contributions include: incorporating fully symmetrical decimation, reducing the computation cost of the refinement/decimation trigger, catering for boundary and
crease edges, and dealing with sampling problems.
4. The potential self-intersection of an object is a
serious weakness in spatial deformation. We have developed a variant of DMFFD which guards against self-intersection by subdividing manipulations into injective
(one-to-one) mappings. This depends on three novel
contributions: analytic conditions for identifying self-intersection, and two injectivity tests (one exact but
computationally costly and the other approximate but
efficient).
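The idea of reshaping an object by warping the surrounding space can be sketched with the classical lattice-based free-form deformation, in which a point is re-evaluated as a Bernstein-weighted sum of control points. This is a minimal sketch under stated assumptions: the Bernstein (Bézier) basis, the unit-cube lattice and all function names are illustrative choices, not the thesis's DMFFD method, which additionally solves for control-point movements from directly dragged object points.

```python
from math import comb


def bernstein(i, n, t):
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def make_lattice(l, m, n):
    """Undisplaced control lattice spanning the unit cube."""
    return {
        (i, j, k): (i / l, j / m, k / n)
        for i in range(l + 1)
        for j in range(m + 1)
        for k in range(n + 1)
    }


def deform(point, lattice, l, m, n):
    """Map a point with local coordinates (s, t, u) in [0, 1]^3 through the lattice."""
    s, t, u = point
    x = y = z = 0.0
    for (i, j, k), (px, py, pz) in lattice.items():
        w = bernstein(i, l, s) * bernstein(j, m, t) * bernstein(k, n, u)
        x += w * px
        y += w * py
        z += w * pz
    return (x, y, z)
```

With the undisplaced lattice the mapping is the identity; moving a control point, for example `lattice[(2, 2, 2)] = (1.0, 1.0, 2.0)` on a 2×2×2 lattice, pulls nearby embedded points towards it.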
UCAM-CL-TR-500
Jianxin Yan, Alan Blackwell, Ross Anderson,
Alasdair Grant:
The memorability and security of
passwords – some empirical results
September 2000, 13 pages, PDF
Abstract: There are many things that are ‘well known’
about passwords, such as that users can’t remember
strong passwords and that the passwords they can remember are easy to guess. However, there seems to be a
distinct lack of research on the subject that would pass
muster by the standards of applied psychology.
Here we report a controlled trial in which, of four
sample groups of about 100 first-year students, three
were recruited to a formal experiment and of these
two were given specific advice about password selection. The incidence of weak passwords was determined
by cracking the password file, and the number of password resets was measured from system logs. We observed a number of phenomena which run counter to
the established wisdom. For example, passwords based
on mnemonic phrases are just as hard to crack as random passwords yet just as easy to remember as naive
user selections.
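One simple way to derive a password from a mnemonic phrase is to take the first character of each word. The sketch below is an illustration only: the function name and the first-character rule are assumptions, and the abstract does not state the exact advice given to participants.

```python
def mnemonic_password(phrase):
    """Join the first character of each word, keeping case and digits."""
    return "".join(word[0] for word in phrase.split())
```

For example, `mnemonic_password("My sister Peg is 24 years old")` keeps the capital letters and the leading digit of the phrase.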
UCAM-CL-TR-501
David Ingram:
Integrated quality of service
management
September 2000, 90 pages, paper copy
PhD thesis (Jesus College, August 2000)
UCAM-CL-TR-502
Thomas Marthedal Rasmussen:
Formalizing basic number theory
September 2000, 20 pages, paper copy
UCAM-CL-TR-503
Alan Mycroft, Richard Sharp:
Hardware/software co-design using
functional languages
September 2000, 8 pages, PDF
Abstract: In previous work we have developed and
prototyped a silicon compiler which translates a functional language (SAFL) into hardware. Here we present
a SAFL-level program transformation which: (i) partitions a specification into hardware and software parts
and (ii) generates a specialised architecture to execute
the software part. The architecture consists of a number of interconnected heterogeneous processors. Our
method allows a large design space to be explored by
systematically transforming a single SAFL specification
to investigate different points on the area-time spectrum.
UCAM-CL-TR-504
Oi Yee Kwong:
Word sense selection in texts:
an integrated model
September 2000, 177 pages, PostScript
PhD thesis (Downing College, May 2000)
Abstract: Early systems for word sense disambiguation (WSD) often depended on individual tailor-made
lexical resources, hand-coded with as much lexical information as needed, but of severely limited vocabulary size. Recent studies tend to extract lexical information from a variety of existing resources (e.g.
machine-readable dictionaries, corpora) for broad coverage. However, this raises the issue of how to combine
the information from different resources.
Thus while different types of resource could make
different contributions to WSD, studies to date have not
shown what contribution they make, how they should
be combined, and whether they are equally relevant to
all words to be disambiguated. This thesis proposes an
Integrated Model as a framework to study the interrelatedness of three major parameters in WSD: Lexical
Resource, Contextual Information, and Nature of Target Words. We argue that it is their interaction which
shapes the effectiveness of any WSD system.
A generalised, structurally-based sense-mapping algorithm was designed to combine various types of
lexical resource. This enables information from these
resources to be used simultaneously and compatibly,
while respecting their distinctive structures. In studying the effect of context on WSD, different semantic
relations available from the combined resources were
used, and a recursive filtering algorithm was designed
to overcome combinatorial explosion. We then investigated, from two directions, how the target words themselves could affect the usefulness of different types of
knowledge. In particular, we modelled WSD with the
cloze test format, i.e. as texts with blanks and all senses
for one specific word as alternative choices for filling
the blank.
A full-scale combination of WordNet and Roget’s
Thesaurus was done, linking more than 30,000 senses.
Using these two resources in combination, a range of
disambiguation tests was done on more than 60,000
noun instances from corpus texts of different types, and
60 blanks from real cloze texts. Results show that combining resources is useful for enriching lexical information, and hence making WSD more effective though not
completely. Also, different target words make different
demands on contextual information, and this interaction
is closely related to text types. Future work is suggested
for expanding the analysis on target nature and making
the combination of disambiguation evidence sensitive
to the requirements of the word being disambiguated.
UCAM-CL-TR-505
Gian Luca Cattani, Peter Sewell:
Models for name-passing processes:
interleaving and causal
September 2000, 42 pages, PDF
Abstract: We study syntax-free models for name-passing processes. For interleaving semantics, we identify the indexing structure required of an early labelled
transition system to support the usual π-calculus operations, defining Indexed Labelled Transition Systems.
For noninterleaving causal semantics we define Indexed
Labelled Asynchronous Transition Systems, smoothly
generalizing both our interleaving model and the standard Asynchronous Transition Systems model for CCS-like calculi. In each case we relate a denotational semantics to an operational view, for bisimulation and
causal bisimulation respectively. We establish completeness properties of, and adjunctions between, categories
of the two models. Alternative indexing structures and
possible applications are also discussed. These are first
steps towards a uniform understanding of the semantics
and operations of name-passing calculi.
UCAM-CL-TR-506
Peter Sewell:
Modules, abstract types,
and distributed versioning
September 2000, 46 pages, PDF
Abstract: In a wide-area distributed system it is often
impractical to synchronise software updates, so one
must deal with many coexisting versions. We study
static typing support for modular wide-area programming, modelling separate compilation/linking and execution of programs that interact along typed channels.
Interaction may involve communication of values of
abstract types; we provide the developer with fine-grain
versioning control of these types to support interoperation of old and new code. The system makes use of
a second-class module system with singleton kinds; we
give a novel operational semantics for separate compilation/linking and execution and prove soundness.
UCAM-CL-TR-507
Lawrence Paulson:
Mechanizing a theory of program
composition for UNITY
November 2000, 28 pages, PDF
Abstract: Compositional reasoning must be better understood if non-trivial concurrent programs are to be
verified. Chandy and Sanders [2000] have proposed a
new approach to reasoning about composition, which
Charpentier and Chandy [1999] have illustrated by developing a large example in the UNITY formalism.
The present paper describes extensive experiments on
mechanizing the compositionality theory and the example, using the proof tool Isabelle. Broader issues are
discussed, in particular, the formalization of program
states. The usual representation based upon maps from
variables to values is contrasted with the alternatives,
such as a signature of typed variables. Properties need
to be transferred from one program component’s signature to the common signature of the system. Safety
properties can be so transferred, but progress properties cannot be. Using polymorphism, this problem can
be circumvented by making signatures sufficiently flexible. Finally the proof of the example itself is outlined.
UCAM-CL-TR-508
James Leifer, Robin Milner:
Shallow linear action graphs and
their embeddings
October 2000, 16 pages, PostScript
UCAM-CL-TR-509
Wojciech Basalaj:
Proximity visualisation of abstract
data
January 2001, 117 pages, PDF
PhD thesis (October 2000)
Abstract: Data visualisation is an established technique
for exploration, analysis and presentation of data. A
graphical presentation is generated from the data content, and viewed by an observer, engaging vision – the
human sense with the greatest bandwidth, and the ability to recognise patterns subconsciously. For instance, a
correlation present between two variables can be elucidated with a scatter plot. An effective visualisation
can be difficult to achieve for an abstract collection of
objects, e.g. a database table with many attributes, or
a set of multimedia documents, since there is no immediately obvious way of arranging the objects based
on their content. Thankfully, similarity between pairs
of elements of such a collection can be measured, and
a good overview picture should respect this proximity
information, by positioning similar elements close to
one another, and far from dissimilar objects. The resulting proximity visualisation is a topology preserving
map of the underlying data collection, and this work investigates various methods for generating such maps. A
number of algorithms are devised, evaluated quantitatively by means of statistical inference, and qualitatively
in a case study for each type of data collection. Other
graphical representations for abstract data are surveyed
and compared to proximity visualisation.
A standard method for modelling proximity relations
is multidimensional scaling (MDS) analysis. The result
is usually a two- or three-dimensional configuration of
points – each representing a single element from a collection, with inter-point distances approximating the
corresponding proximities. The quality of this approximation can be expressed as a loss function, and the
optimal arrangement can be found by minimising it numerically – a procedure known as least-squares metric
MDS. This work presents a number of algorithmic instances of this problem, using established function optimisation heuristics: Newton-Raphson, Tabu Search,
Genetic Algorithm, Iterative Majorization, and Simulated Annealing. Their effectiveness at minimising the
loss function is measured for a representative sample of
data collections, and the relative ranking established.
The popular classical scaling method serves as a benchmark for this study.
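As a rough illustration of the loss-function view described above, the sketch below computes a raw least-squares stress (the sum of squared differences between inter-point distances and target proximities) and applies one plain gradient-descent step to reduce it. This is only a minimal sketch under stated assumptions: the function names `raw_stress` and `gradient_step`, the unweighted form of the loss, and the use of plain gradient descent are illustrative choices, not the formulation or any of the heuristics evaluated in the thesis.

```python
import math


def raw_stress(points, proximities):
    """Sum over pairs i < j of (distance(i, j) - proximity(i, j)) ** 2.

    Assumed, unweighted form of a least-squares metric MDS loss.
    """
    total = 0.0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            total += (d - proximities[i][j]) ** 2
    return total


def gradient_step(points, proximities, rate=0.05):
    """Return a new configuration after one gradient-descent step on raw_stress.

    Plain gradient descent is used here purely for illustration.
    """
    n = len(points)
    dim = len(points[0])
    new_points = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            if d == 0.0:
                continue  # coincident points: direction undefined, skip the pair
            coeff = 2.0 * (d - proximities[i][j]) / d
            for k in range(dim):
                grad[k] += coeff * (points[i][k] - points[j][k])
        new_points.append(
            tuple(points[i][k] - rate * grad[k] for k in range(dim))
        )
    return new_points
```

Repeating `gradient_step` until `raw_stress` stops decreasing gives a simple numerical minimiser; it can stall in local minima, which is one reason more elaborate optimisation heuristics are of interest.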
The computational cost of conventional MDS
makes it unsuitable for visualising a large data collection. Incremental multidimensional scaling solves this
problem by considering only a carefully chosen subset
of all pairwise proximities. Elements that make up cluster diameters at a certain level of the single link cluster hierarchy are identified, and are subject to standard
MDS, in order to establish the overall shape of the configuration. The remaining elements are positioned independently of one another with respect to this skeleton configuration. For very large collections the skeleton configuration can itself be built up incrementally.
The incremental method is analysed for the compromise between solution quality and the proportion of
proximities used, and compared to Principal Components Analysis on a number of large database tables.
In some applications it is convenient to represent individual objects by compact icons of fixed size, for example the use of thumbnails when visualising a set of
images. Because the MDS analysis only takes the position of icons into account, and not their size, its direct
use for visualisation may lead to partial or complete
overlap of icons. Proximity grid – an analogue of MDS
in a discrete domain – is proposed to overcome this deficiency. Each element of an abstract data collection is
represented within a single cell of the grid, and thus
considerable detail can be shown without overlap. The
proximity relationships are preserved by clustering similar elements in the grid, and keeping dissimilar ones
apart. Algorithms for generating such an arrangement
are presented and compared in terms of output quality
to one another as well as standard MDS.
UCAM-CL-TR-510
Richard Mortier, Rebecca Isaacs, Keir Fraser:
Switchlets and resource-assured
MPLS networks
May 2000, 16 pages, PDF
Abstract: MPLS (Multi-Protocol Label Switching) is a
technology with the potential to support multiple control systems, each with guaranteed QoS (Quality of Service), on connectionless best-effort networks. However,
it does not provide all the capabilities required of a
multi-service network. In particular, although resource-assured VPNs (Virtual Private Networks) can be created, there is no provision for inter-VPN resource management. Control flexibility is limited because resources
must be pinned down to be guaranteed, and best-effort
flows in different VPNs compete for the same resources,
leading to QoS crosstalk.
The contribution of this paper is an implementation
on MPLS of a network control framework that supports inter-VPN resource management. Using resource
partitions known as switchlets, it allows the creation
of multiple VPNs with guaranteed resource allocations,
and maintains isolation between these VPNs. Devolved
control techniques permit each VPN a customised control system.
We motivate our work by discussing related efforts
and example scenarios of effective deployment of our
system. The implementation is described and evaluated,
and we address interoperability with external IP control
systems, in addition to interoperability of data across
different layer 2 technologies.
UCAM-CL-TR-511
Calum Grant:
Software visualization in Prolog
December 1999, 193 pages, PDF
PhD thesis (Queens’ College, 1999)
Abstract: Software visualization (SV) uses computer
graphics to communicate the structure and behaviour
of complex software and algorithms. One of the important issues in this field is how to specify SV, because
existing systems are very cumbersome to specify and
implement, which limits their effectiveness and hinders
SV from being integrated into professional software development tools.
In this dissertation the visualization process is decomposed into a series of formal mappings, which provides a formal foundation, and allows separate aspects
of visualization to be specified independently. The first
mapping specifies the information content of each view.
The second mapping specifies a graphical representation of the information, and a third mapping specifies
the graphical components that make up the graphical
representation. By combining different mappings, completely different views can be generated.
The approach has been implemented in Prolog to
provide a very high level specification language for information visualization, and a knowledge engineering
environment that allows data queries to tailor the information in a view. The output is generated by a graphical
constraint solver that assembles the graphical components into a scene.
This system provides a framework for SV called
Vmax. Source code and run-time data are analyzed by
Prolog to provide access to information about the program structure and run-time data for a wide range of
highly interconnected browsable views. Different views
and means of visualization can be selected from menus.
An automatic legend describes each view, and can be
interactively modified to customize how data is presented. A text window for editing source code is synchronized with the graphical view. Vmax is a complete
Java development environment and end user SV system.
Vmax compares favourably to existing SV systems
in many taxonometric criteria, including automation,
scope, information content, graphical output form,
specification, tailorability, navigation, granularity and
elision control. The performance and scalability of the
new approach is very reasonable.
We conclude that Prolog provides a formal and high
level specification language that is suitable for specifying all aspects of a SV system.
UCAM-CL-TR-512
Anthony Fox:
An algebraic framework
for modelling and verifying
microprocessors using HOL
March 2001, 24 pages, PDF
Abstract: This report describes an algebraic approach
to the specification and verification of microprocessor
designs. Key results are expressed and verified using
the HOL proof tool. Particular attention is paid to the
models of time and temporal abstraction, culminating
in a number of one-step theorems. This work is then
explained with a small but complete case study, which
verifies the correctness of a datapath with microprogram control.
UCAM-CL-TR-513
Tetsuya Sakai, Karen Spärck Jones:
Generic summaries for indexing
in information retrieval –
Detailed test results
May 2001, 29 pages, PostScript
Abstract: This paper examines the use of generic summaries for indexing in information retrieval. Our main
observations are that:
– With or without pseudo-relevance feedback, a
summary index may be as effective as the corresponding fulltext index for precision-oriented search of highly
relevant documents. But a reasonably sophisticated
summarizer, using a compression ratio of 10–30%, is
desirable for this purpose.
– In pseudo-relevance feedback, using a summary
index at initial search and a fulltext index at final search
is possibly effective for precision-oriented search, regardless of relevance levels. This strategy is significantly
more effective than the one using the summary index
only and probably more effective than using summaries
as mere term selection filters. For this strategy, the summary quality is probably not a critical factor, and a
compression ratio of 5–10% appears best.
UCAM-CL-TR-514
Asis Unyapoth:
Nomadic π-calculi: Expressing
and verifying communication
infrastructure for mobile computation
June 2001, 316 pages, PDF
PhD thesis (Pembroke College, March 2001)
Abstract: This thesis addresses the problem of verifying distributed infrastructure for mobile computation.
In particular, we study language primitives for communication between mobile agents. They can be classified into two groups. At a low level there are “location dependent” primitives that require a programmer to know the current site of a mobile agent in order to communicate with it. At a high level there are
“location independent” primitives that allow communication with a mobile agent irrespective of any migrations. Implementation of the high level requires delicate
distributed infrastructure algorithms. In earlier work of
Sewell, Wojciechowski and Pierce, the two levels were
made precise as process calculi, allowing such algorithms to be expressed as encodings of the high level
into the low level; a distributed programming language
“Nomadic Pict” has been built for experimenting with
such encodings.
This thesis turns to semantics, giving a definition of
the core language (with a type system) and proving correctness of an example infrastructure. This involves extending the standard semantics and proof techniques
of process calculi to deal with the new notions of sites
and agents. The techniques adopted include labelled
transition semantics, operational equivalences and preorders (e.g., expansion and coupled simulation), “up
to” equivalences, and uniform receptiveness. We also
develop two novel proof techniques for capturing the
design intuitions regarding mobile agents: we consider
“translocating” versions of operational equivalences
that take migration into account, allowing compositional reasoning; and “temporary immobility”, which
captures the intuition that while an agent is waiting for
a lock somewhere in the system, it will not migrate.
The correctness proof of an example infrastructure
is non-trivial. It involves analysing the possible reachable states of the encoding applied to an arbitrary high-level source program. We introduce an intermediate
language for factoring out as many ‘house-keeping’ reduction steps as possible, and focusing on the partially-committed steps.
UCAM-CL-TR-515
Andrei Serjantov, Peter Sewell,
Keith Wansbrough:
The UDP calculus:
rigorous semantics for real
networking
July 2001, 70 pages, PostScript
UCAM-CL-TR-516
Rebecca Isaacs:
Dynamic provisioning of
resource-assured and programmable
virtual private networks
September 2001, 145 pages, PostScript
PhD thesis (Darwin College, December 2000)
Abstract: Virtual Private Networks (VPNs) provide
dedicated connectivity to a closed group of users on
a shared network. VPNs have traditionally been deployed for reasons of economy of scale, but have either
been statically defined, requiring manual configuration,
or else unable to offer any quality of service (QoS) guarantees.
This dissertation describes VServ, a service offering dynamic and resource-assured VPNs that can be
acquired and modified on demand. In VServ, a VPN
is both a subset of physical resources, such as bandwidth and label space, together with the means to perform fine-grained management of those resources. This
network programmability, combined with QoS guarantees, enables the multiservice network – a single universal network that can support all types of service and
thus be efficient, cost-effective and flexible.
VServ is deployed over a network control framework known as Tempest. The Tempest explicitly distinguishes between inter- and intra-VPN resource management mechanisms. This makes the dynamic resource
reallocation capabilities of VServ viable, whilst handling highly dynamic VPNs or a large number of VPNs.
Extensions to the original implementation of the Tempest to support dynamically reconfigurable QoS are detailed.
A key part of a dynamic and responsive VPN service is fully automated VPN provisioning. A notation
for VPN specification is described, together with mechanisms for incorporating policies of the service provider
and the current resource availability in the network into
the design process. The search for a suitable VPN topology can be expressed as an optimisation problem that
is not computationally tractable except for very small
networks. This dissertation describes how the search is
made practical by tailoring it according to the characteristics of the desired VPN.
Availability of VServ is addressed with a proposal
for distributed VPN creation. A resource revocation
protocol exploits the dynamic resource management
capabilities of VServ to allow adaptation in the control plane on a per-VPN basis. Managed resource revocation supports highly flexible resource allocation and
reallocation policies, allowing VServ to efficiently provision for short-lived or highly dynamic VPNs.
UCAM-CL-TR-517
Karen Spärck Jones, P. Jourlin, S.E. Johnson,
P.C. Woodland:
The Cambridge Multimedia
Document Retrieval Project:
summary of experiments
July 2001, 30 pages, PostScript
Abstract: This report summarises the experimental
work done under the Multimedia Document Retrieval
(MDR) project at Cambridge from 1997-2000, with selected illustrations. The focus is primarily on retrieval
studies, and on speech tests directly related to retrieval,
not on speech recognition itself. The report draws on
the many and varied tests done during the project, but
also presents a new series of results designed to compare strategies across as many different data sets as possible by using consistent system parameter settings.
The project tests demonstrate that retrieval from
files of audio news material transcribed using a state of
the art speech recognition system can match the reference level defined by human transcriptions; and that expansion techniques, especially when applied to queries,
can be very effective means for improving basic search
performance.
UCAM-CL-TR-518
Jeff Jianxin Yan, Yongdong Wu:
An attack on a traitor tracing scheme
July 2001, 14 pages, PDF
Abstract: In Crypto’99, Boneh and Franklin proposed
a public key traitor tracing scheme, which was believed
to be able to catch all traitors while not accusing any
innocent users (i.e., full-tracing and error-free). Assuming that Decision Diffie-Hellman problem is unsolvable
in Gq, Boneh and Franklin proved that a decoder cannot distinguish valid ciphertexts from invalid ones that
are used for tracing. However, our novel pirate decoder
P3 manages to make some invalid ciphertexts distinguishable without violating their assumption, and it can
also frame innocent user coalitions to fool the tracer.
Neither the single-key nor arbitrary pirate tracing algorithm presented in [1] can identify all keys used by P3
as claimed. Instead, it is possible for both algorithms to
catch none of the traitors. We believe that the construction of our novel pirate also demonstrates a simple way
to defeat some other black-box traitor tracing schemes
in general.
UCAM-CL-TR-519
Martin Choquette:
Local evidence in document retrieval
August 2001, 177 pages, paper copy
PhD thesis (Fitzwilliam College, November 2002)
UCAM-CL-TR-520
Mohamed Hassan, Neil A. Dodgson:
Ternary and three-point univariate
subdivision schemes
September 2001, 8 pages, PDF
Abstract: The generating function formalism is used to
analyze the continuity properties of univariate ternary
subdivision schemes. These are compared with their binary counterparts.
UCAM-CL-TR-521
James Leifer:
Operational congruences for
reactive systems
September 2001, 144 pages, PostScript
PhD thesis (Trinity College, March 2001)
UCAM-CL-TR-522
Mark F.P. Gillies:
Practical behavioural animation
based on vision and attention
September 2001, 187 pages, PDF
Abstract: The animation of human-like characters is a
vital aspect of computer animation. Most animations
rely heavily on characters of some sort or other. This
means that one important aspect of computer animation research is to improve the animation of these characters both by making it easier to produce animations
and by improving the quality of animation produced.
One approach to animating characters is to produce a
simulation of the behaviour of the characters which will
automatically animate the character.
The dissertation investigates the simulation of behaviour in practical applications. In particular it focuses on models of visual perception for use in simulating human behaviour. A simulation of perception is
vital for any character that interacts with its surroundings. Two main aspects of the simulation of perception
are investigated:
– The use of psychology for designing visual algorithms.
– The simulation of attention in order to produce
both behaviour and gaze patterns.
Psychological theories are a useful starting point
for designing algorithms for simulating visual perception. The dissertation investigates their use and presents
some algorithms based on psychological theories.
Attention is the focusing of a person’s perception on
a particular object. The dissertation presents a simulation of what a character is attending to (looking at).
This is used to simulate behaviour and for animating
eye movements.
The algorithms for the simulation of vision and attention are applied to two tasks in the simulation of
behaviour. The first is a method for designing generic
behaviour patterns from simple pieces of motion. The
second is a behaviour pattern for navigating a cluttered
environment. The simulation of vision and attention
gives advantages over existing work on both problems.
The approaches to the simulation of perception will be
evaluated in the context of these examples.
UCAM-CL-TR-523
Robin Milner:
Bigraphical reactive systems:
basic theory
September 2001, 87 pages, PDF
Abstract: A notion of bigraph is proposed as the basis
for a model of mobile interaction. A bigraph consists
of two independent structures: a topograph representing locality and a monograph representing connectivity. Bigraphs are equipped with reaction rules to form
bigraphical reactive systems (BRSs), which include versions of the π-calculus and the ambient calculus. Bigraphs are shown to be a special case of a more abstract notion, wide reactive systems (WRSs), not assuming any particular graphical or other structure but
equipped with a notion of width, which expresses that
agents, contexts and reactions may all be widely distributed entities.
A behavioural theory is established for WRSs using the categorical notion of relative pushout; it allows
labelled transition systems to be derived uniformly, in
such a way that familiar behavioural preorders and
equivalences, in particular bisimilarity, are congruential
under certain conditions. Then the theory of bigraphs
is developed, and they are shown to meet these conditions. It is shown that, using certain functors, other
WRSs which meet the conditions may also be derived;
these may, for example, be forms of BRS with additional structure.
Simple examples of bigraphical systems are discussed; the theory is developed in a number of ways
in preparation for deeper application studies.
UCAM-CL-TR-524
Giampaolo Bella, Fabio Massacci,
Lawrence C. Paulson:
Verifying the SET purchase protocols
November 2001, 14 pages, PDF
Abstract: The Secure Electronic Transaction (SET) protocol has been proposed by a consortium of credit card
companies and software corporations to guarantee the
authenticity of e-commerce transactions and the confidentiality of data. When the customer makes a purchase, the SET dual signature keeps his account details
secret from the merchant and his choice of goods secret from the bank. This paper reports verification results for the purchase step of SET, using the inductive
method. The credit card details do remain confidential.
The customer, merchant and bank can confirm most details of a transaction even when some of those details
are kept from them. The usage of dual signatures requires repetition in protocol messages, making proofs
more difficult but still feasible. The formal analysis has
revealed a significant defect. The dual signature lacks
explicitness, giving rise to potential vulnerabilities.
UCAM-CL-TR-525
Timothy L. Harris:
Extensible virtual machines
December 2001, 209 pages, PDF
PhD thesis
Abstract: Virtual machines (VMs) have enjoyed a resurgence as a way of allowing the same application program to be used across a range of computer systems.
This flexibility comes from the abstraction that the VM provides over the native interface of a particular computer.
However, this also means that the application is prevented from taking the features of particular physical
machines into account in its implementation.
This dissertation addresses the question of why,
where and how it is useful, possible and practicable
to provide an application with access to lower-level interfaces. It argues that many aspects of implementation can be devolved safely to untrusted applications
and demonstrates this through a prototype which allows control over run-time compilation, object placement within the heap and thread scheduling. The proposed architecture separates these application-specific
policy implementations from the application itself. This
allows one application to be used with different policies
on different systems and also allows naïve or premature
optimizations to be removed.
UCAM-CL-TR-526
Andrew J. Penrose:
Extending lossless image compression
December 2001, 137 pages, PDF
PhD thesis (Gonville & Caius College, January 2001)
Abstract: “It is my thesis that worthwhile improvements can be made to lossless image compression
schemes, by considering the correlations between the
spectral, temporal and interview aspects of image data,
in extension to the spatial correlations that are traditionally exploited.”
Images are an important part of today’s digital
world. However, due to the large quantity of data
needed to represent modern imagery the storage of such
data can be expensive. Thus, work on efficient image
storage (image compression) has the potential to reduce
storage costs and enable new applications.
Many image compression schemes are lossy; that is
they sacrifice image information to achieve very compact storage. Although this is acceptable for many applications, some environments require that compression not alter the image data. This lossless image compression has uses in medical, scientific and professional
video processing applications.
Most of the work on lossless image compression
has focused on monochrome images and has made
use of the spatial smoothness of image data. Only recently have researchers begun to look specifically at
the lossless compression of colour images and video.
By extending compression schemes for colour images
and video, the storage requirements for these important
classes of image data can be further reduced.
Much of the previous research into lossless colour
image and video compression has been exploratory.
This dissertation studies the problem in a structured
way. Spatial, spectral and temporal correlations are
all considered to facilitate improved compression. This
has led to a greater data reduction than many existing schemes for lossless colour image and colour video
compression.
Furthermore, this work has considered the application of extended lossless image coding to more recent
image types, such as multiview imagery. Thus, systems
that use multiple views of the same scene to provide 3D
viewing, have been provided with a completely novel solution for the compression of multiview colour video.
UCAM-CL-TR-527
Umar Saif:
Architectures for ubiquitous systems
January 2002, 271 pages, PDF
PhD thesis
Abstract: Advances in digital electronics over the
last decade have made computers faster, cheaper and
smaller. This coupled with the revolution in communication technology has led to the development of sophisticated networked appliances and handheld devices.
“Computers” are no longer boxes sitting on a desk,
they are all around us, embedded in every nook and
corner of our environment. This increasing complexity in our environment leads to the desire to design a
system that could allow this pervasive functionality to
disappear in the infrastructure, automatically carrying
out everyday tasks of the users.
Such a system would enable devices embedded in the
environment to cooperate with one another to make a
wide range of new and useful applications possible, not
originally conceived by the manufacturer, to achieve
greater functionality, flexibility and utility.
The compelling question then becomes “what software needs to be embedded in these devices to enable
them to participate in such a ubiquitous system”? This
is the question addressed by the dissertation.
Based on the experience with home automation systems, as part of the AutoHAN project, the dissertation
presents two compatible but different architectures; one
to enable dumb devices to be controlled by the system
and the other to enable intelligent devices to control,
extend and program the system.
Control commands for dumb devices are managed
using an HTTP-based publish/subscribe/notify architecture; devices publish their control commands to the
system as XML-typed discrete messages, applications
discover and subscribe interest in these events to send
and receive control commands from these devices, as
typed messages, to control their behavior. The architecture handles mobility and failure of devices by using soft-state, redundant subscriptions and “care-of”
nodes. The system is programmed with event scripts
that encode automation rules as condition-action bindings. Finally, the use of XML and HTTP allows devices
to be controlled by a simple Internet browser.
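The publish/subscribe idea and the notion of event scripts as condition-action bindings can be sketched in a few lines. The example below is an in-memory illustration only: the class name `EventBus`, its methods, the topic names and the dictionary messages are all hypothetical, and it does not model HTTP transport, XML typing, soft-state or “care-of” nodes.

```python
from collections import defaultdict


class EventBus:
    """Toy publish/subscribe hub with condition-action rules (hypothetical API)."""

    def __init__(self):
        # topic -> list of (condition, action) pairs
        self._rules = defaultdict(list)

    def subscribe(self, topic, condition, action):
        """Register a rule: run action(message) when condition(message) is true."""
        self._rules[topic].append((condition, action))

    def publish(self, topic, message):
        """Deliver message to every matching rule; return how many rules fired."""
        fired = 0
        for condition, action in self._rules[topic]:
            if condition(message):
                action(message)
                fired += 1
        return fired
```

A rule such as "when the doorbell is pressed, pause the television" would then be registered as `bus.subscribe("doorbell", lambda m: m["pressed"], pause_tv)`, where `pause_tv` is whatever callable sends the control command.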
While the publish/subscribe/notify defines a simple
architecture to enable interoperability of limited capability devices, intelligent devices can afford more complexity that can be utilized to support user applications
and services to control, manage and program the system. However, the operating system embedded in these
devices needs to address the heterogeneity, longevity,
mobility and dynamism of the system.
The dissertation presents the architecture of an embedded distributed operating system that lends itself
to safe context-driven adaptation. The operating system is instrumented with four artifacts to address the
challenges posed by a ubiquitous system. 1) An XML-based directory service captures and notifies the applications and services about changes in the device context, as resources move, fail, leave or join the system, to
allow context-driven adaptation. 2) A Java-based mobile agent system allows new software to be injected in
the system and moved and replicated with the changing
characteristics of the system to define a self-organizing
system. 3) A subscribe/notify interface allows context-specific extensions to be dynamically added to the operating system to enable it to efficiently interoperate in its
current context according to application requirements.
4) Finally, a Dispatcher module serves as the context-aware system call interface for the operating system;
when requested to invoke a service, the Dispatcher invokes the resource that best satisfies the requirements
given the characteristics of the system.
Definition alone is not sufficient to prove the validity of an architecture. The dissertation therefore describes a prototype implementation of the operating
system and presents both a quantitative comparison of
its performance with related systems and its qualitative
merit by describing new applications made possible by
its novel architecture.
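The Dispatcher idea described in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Resource` and `Dispatcher` classes, the `join`/`leave`/`invoke` names, and the "largest total surplus" scoring rule are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A device or service instance known to the directory (hypothetical model)."""
    name: str
    service: str
    properties: dict = field(default_factory=dict)  # e.g. {"dpi": 600}


class Dispatcher:
    """Chooses, per request, the resource that best meets the stated requirements."""

    def __init__(self):
        self.directory = {}  # resource name -> Resource

    def join(self, resource):
        # A resource appears in the current context.
        self.directory[resource.name] = resource

    def leave(self, name):
        # A resource moves away or fails.
        self.directory.pop(name, None)

    def invoke(self, service, minimums):
        """Return the name of the best resource for `service`, or None.

        A resource qualifies if every required property meets its minimum.
        Among qualifying resources, the one with the largest total surplus wins
        (an assumed scoring rule, chosen only to keep the sketch small).
        """
        candidates = []
        for res in self.directory.values():
            if res.service != service:
                continue
            if all(res.properties.get(k, 0) >= v for k, v in minimums.items()):
                surplus = sum(res.properties.get(k, 0) - v for k, v in minimums.items())
                candidates.append((surplus, res.name))
        return max(candidates)[1] if candidates else None
```

Because the choice is recomputed on every call, the same request can resolve to a different resource once the directory changes, which is the context-driven behaviour the abstract describes.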
UCAM-CL-TR-528
Andrew William Moore:
Measurement-based management of
network resources
April 2002, 273 pages, PDF
PhD thesis
Abstract: Measurement-Based Estimators are able to characterise data flows, enabling improvements to existing management techniques and access to previously impossible management techniques. It is the thesis of this dissertation that in addition to making practical adaptive management schemes, measurement-based estimators can be practical within current limitations of resource.
Examples of network management include the characterisation of current utilisation for explicit admission control and the configuration of a scheduler to divide link-capacity among competing traffic classes. Without measurements, these management techniques have relied upon the accurate characterisation of traffic; without accurate traffic characterisation, network resources may be under- or over-utilised.
Embracing Measurement-Based Estimation in admission control, Measurement-Based Admission Control (MBAC) algorithms have allowed characterisation of new traffic flows while adapting to changing flow requirements. However, there have been many MBAC algorithms proposed, often with no clear differentiation between them. This has motivated the need for a realistic, implementation-based comparison in order to identify an ideal MBAC algorithm.
This dissertation reports on an implementation-based comparison of MBAC algorithms conducted using a purpose-built test environment. The use of an implementation-based comparison has allowed the MBAC algorithms to be tested under realistic conditions of traffic load and realistic limitations on memory, computational resources and measurements. Alongside this comparison is a decomposition of a group of MBAC algorithms, illustrating the relationship among MBAC algorithm components, as well as highlighting common elements among different MBAC algorithms.
The MBAC algorithm comparison reveals that, while no single algorithm is ideal, the specific resource demands, such as computation overheads, can dramatically impact on the MBAC algorithm’s performance. Further, due to the multiple timescales present in both traffic and management, the estimator of a robust MBAC algorithm must base its estimate on measurements made over a wide range of timescales. Finally, a reliable estimator must account for the error resulting from random properties of measurements.
Further identifying that the estimator components used in MBAC algorithms need not be tied to the admission control problem, one of the estimators (originally constructed as part of an MBAC algorithm) is used to continuously characterise resource requirements for a number of classes of traffic. Continuous characterisation of traffic, whether requiring similar or orthogonal resources, leads to the construction and demonstration of a network switch that is able to provide differentiated service while being adaptive to the demands of each traffic class. The dynamic allocation of resources is an approach unique to a measurement-based technique that would not be possible if resources were based upon static declarations of requirement.
UCAM-CL-TR-529
Neil Johnson:
The triVM intermediate language
reference manual
February 2002, 83 pages, PDF
This research was sponsored by a grant from ARM Limited.
Abstract: The triVM intermediate language has been
developed as part of a research programme concentrating on code space optimization. The primary aim
in developing triVM is to provide a language that removes the complexity of high-level languages, such as
C or ML, while maintaining sufficient detail, at as simple a level as possible, to support research and experimentation into code size optimization. The basic structure of triVM is a notional Static Single Assignment-based three-address machine. A secondary aim is to
develop an intermediate language that supports graph-based translation, using graph rewrite rules, in a textual, human-readable format. Experience has shown
that text-format intermediate files are much easier to
use for experimentation, while the penalty in translating this human-readable form to the internal data structures used by the software is negligible. Another aim is
to provide a flexible language in which features and innovations can be evaluated; for example, this is one of
the first intermediate languages directly based on the
Static Single Assignment technique, and which explicitly exposes the condition codes as a result of arithmetic
operations. While this paper is concerned solely with
the description of triVM, we present a brief summary
of other research-orientated intermediate languages.
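To make the phrase "Static Single Assignment-based three-address machine" concrete, here is a minimal sketch of SSA renaming for straight-line three-address code. The tuple format `(dest, op, arg1, arg2)` and the `name.N` versioning are assumptions for illustration and are not triVM syntax. Control flow (and hence phi functions) and condition codes are deliberately left out.

```python
def to_ssa(instructions):
    """Rename straight-line three-address code so each name is assigned once.

    `instructions` is a list of (dest, op, arg1, arg2) tuples whose arguments
    are variable names (str) or integer constants. Names that are read before
    any assignment are treated as inputs and left unchanged.
    """
    version = {}  # variable name -> latest version number
    result = []

    def current(arg):
        # Constants pass through; variables map to their latest version.
        if isinstance(arg, str) and arg in version:
            return f"{arg}.{version[arg]}"
        return arg

    for dest, op, arg1, arg2 in instructions:
        # Resolve the operands before bumping the destination's version,
        # so that "x = x * 2" reads the old x and writes a new one.
        a, b = current(arg1), current(arg2)
        version[dest] = version.get(dest, 0) + 1
        result.append((f"{dest}.{version[dest]}", op, a, b))
    return result
```

For example, two successive assignments to `x` become `x.1` and `x.2`, and a later read of `x` refers to `x.2`, so every use names exactly one definition.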
UCAM-CL-TR-530
Anna Korhonen:
Subcategorization acquisition
February 2002, 189 pages, PDF
PhD thesis (Trinity Hall, September 2001)
Abstract: Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (SCFs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired SCFs.
This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution Zipfian, but there is also very little correlation between conditional distribution of SCFs specific to a verb and unconditional distribution regardless of the verb. More accurate back-off estimates are needed for SCF acquisition than those provided by unconditional distribution.
We explore whether more accurate estimates could be obtained by basing them on linguistic verb classes. Experiments are reported which show that in terms of SCF distributions, individual verbs correlate more closely with syntactically similar verbs and even more closely with semantically similar verbs, than with all verbs in general. On the basis of this result, we suggest classifying verbs according to their semantic classes and obtaining back-off estimates specific to these classes.
We propose a method for obtaining such semantically based back-off estimates, and a novel approach to hypothesis selection which makes use of these estimates. This approach involves automatically identifying the semantic class of a predicate, using subcategorization acquisition machinery to hypothesise conditional SCF distribution for the predicate, smoothing the conditional distribution with the back-off estimates of the respective semantic verb class, and employing a simple method for filtering, which uses a threshold on the estimates from smoothing. Adopting Briscoe and Carroll’s (1997) system as a framework, we demonstrate that this semantically-driven approach to hypothesis selection can significantly improve the accuracy of large-scale subcategorization acquisition.
UCAM-CL-TR-531
Giampaolo Bella, Fabio Massacci,
Lawrence C. Paulson:
Verifying the SET registration
protocols
March 2002, 24 pages, PDF
Abstract: SET (Secure Electronic Transaction) is an immense e-commerce protocol designed to improve the security of credit card purchases. In this paper we focus
on the initial bootstrapping phases of SET, whose objective is the registration of customers and merchants
with a SET certification authority. The aim of registration is twofold: getting the approval of the cardholder’s
or merchant’s bank, and replacing traditional credit
card numbers with electronic credentials that customers
can present to the merchant, so that their privacy is protected. These registration sub-protocols present a number of challenges to current formal verification methods. First, they do not assume that each agent knows
the public keys of the other agents. Key distribution
is one of the protocols’ tasks. Second, SET uses complex encryption primitives (digital envelopes) which introduce dependency chains: the loss of one secret key
can lead to potentially unlimited losses. Building upon
our previous work, we have been able to model and
formally verify SET’s registration with the inductive
method in Isabelle/HOL, solving its challenges with very
general techniques.
UCAM-CL-TR-532
Richard Mortier:
Internet traffic engineering
April 2002, 129 pages, PDF
PhD thesis (Churchill College, October 2001)
Abstract: Due to the dramatically increasing popularity
of the services provided over the public Internet, problems with current mechanisms for control and management of the Internet are becoming apparent. In particular, it is increasingly clear that the Internet and other
networks built on the Internet protocol suite do not
provide sufficient support for the efficient control and
management of traffic, i.e. for Traffic Engineering.
This dissertation addresses the problem of traffic engineering in the Internet. It argues that traffic
management techniques should be applied at multiple
timescales, and not just at data timescales as is currently
the case. It presents and evaluates mechanisms for traffic engineering in the Internet at two further timescales:
flow admission control and control of per-flow packet
marking, enabling control timescale traffic engineering;
and support for load based inter-domain routeing in
the Internet, enabling management timescale traffic engineering.
This dissertation also discusses suitable policies for
the application of the proposed mechanisms. It argues
that the proposed mechanisms are able to support a
wide range of policies useful to both users and operators. Finally, in a network of the size of the Internet
consideration must also be given to the deployment of
proposed solutions. Consequently, arguments for and
against the deployment of these mechanisms are presented and the conclusion drawn that there are a number of feasible paths toward deployment.
The work presented argues the following: firstly, it is
possible to implement mechanisms within the Internet
framework that enable traffic engineering to be carried
out by operators; secondly, that applying these mechanisms with suitable policies can ease the management
problems faced by operators and at the same time improve the efficiency with which the network can be run;
thirdly, that these improvements can correspond to increased network performance as viewed by the user;
and finally, that not only the resulting deployment but
also the deployment process itself are feasible.
UCAM-CL-TR-533
Aline Villavicencio:
The acquisition of a unification-based
generalised categorial grammar
April 2002, 223 pages, PDF
PhD thesis (Hughes Hall, September 2001)
Abstract: The purpose of this work is to investigate
the process of grammatical acquisition from data. In
order to do that, a computational learning system is
used, composed of a Universal Grammar with associated parameters, and a learning algorithm, following
the Principles and Parameters Theory. The Universal
Grammar is implemented as a Unification-Based Generalised Categorial Grammar, embedded in a default
inheritance network of lexical types. The learning algorithm receives input from a corpus of spontaneous
child-directed transcribed speech annotated with logical forms and sets the parameters based on this input.
This framework is used as a basis to investigate several aspects of language acquisition. In this thesis I concentrate on the acquisition of subcategorisation frames
and word order information, from data. The data to
which the learner is exposed can be noisy and ambiguous, and I investigate how these factors affect the learning process. The results obtained show a robust learner
converging towards the target grammar given the input data available. They also show how the amount
of noise present in the input data affects the speed of
convergence of the learner towards the target grammar.
Future work is suggested for investigating the developmental stages of language acquisition as predicted by
the learning model, with a thorough comparison with
the developmental stages of a child. This is primarily
a cognitive computational model of language learning
that can be used to investigate and gain a better understanding of human language acquisition, and can potentially be relevant to the development of more adaptive NLP technology.
UCAM-CL-TR-534
Austin Donnelly:
Resource control in network elements
April 2002, 183 pages, PDF
PhD thesis (Pembroke College, January 2002)
Abstract: Increasingly, substantial data path processing
is happening on devices within the network. At or near
the edges of the network, data rates are low enough
that commodity workstations may be used to process
packet flows. However, the operating systems such machines use are not suited to the needs of data-driven
processing. This dissertation shows why this is a problem, how current work fails to address it, and proposes
a new approach.
The principal problem is that crosstalk occurs in the
processing of different data flows when they contend
for a shared resource and their accesses to this resource
are not scheduled appropriately; typically the shared resource is located in a server process. Previous work on
vertically structured operating systems reduces the need
for such shared servers by making applications responsible for performing as much of their own processing
as possible, protecting and multiplexing devices at the
lowest level consistent with allowing untrusted user access.
However, shared servers remain on the data path
in two circumstances: firstly, dumb network adaptors
need non-trivial processing to allow safe access by untrusted user applications. Secondly, shared servers are
needed wherever trusted code must be executed for security reasons.
This dissertation presents the design and implementation of Expert, an operating system which avoids
crosstalk by removing the need for such servers.
This dissertation describes how Expert handles
dumb network adaptors to enable applications to access them via a low-level interface which is cheap to implement in the kernel, and retains application responsibility for the work involved in running a network stack.
Expert further reduces the need for application-level shared servers by introducing paths which can
trap into protected modules of code to perform actions
which would otherwise have to be implemented within
a server.
Expert allows traditional compute-bound tasks to
be freely mixed with these I/O-driven paths in a single
system, and schedules them in a unified manner. This
allows the processing performed in a network element
to be resource controlled, both for background processing tasks such as statistics gathering, and for data path
processing such as encryption.
UCAM-CL-TR-535
Claudia Faggian, Martin Hyland:
Designs, disputes and strategies
May 2002, 21 pages, PDF
Abstract: Important advances in logic are leading to interactive and dynamical models. Geometry of Interaction and Games Semantics are two major examples. Ludics, initiated by Girard, is a further step in this direction.
The objects of Ludics which correspond to proofs are designs. A design can be described as the skeleton of a sequent calculus derivation, where we do not manipulate formulas, but their location (the address where the formula is stored). To study the traces of the interactions between designs as primitive leads to an alternative presentation, which is to describe a design as the set of its possible interactions, called disputes. This presentation has the advantage of making precise the correspondence between the basic notions of Ludics (designs, disputes and chronicles) and the basic notions of Games semantics (strategies, plays and views).
UCAM-CL-TR-536
Sergei Skorobogatov:
Low temperature data remanence in
static RAM
June 2002, 9 pages, PDF
Abstract: Security processors typically store secret key material in static RAM, from which power is removed if the device is tampered with. It is commonly believed that, at temperatures below −20 °C, the contents of SRAM can be ‘frozen’; therefore, many devices treat temperatures below this threshold as tampering events. We have done some experiments to establish the temperature dependency of data retention time in modern SRAM devices. Our experiments show that the conventional wisdom no longer holds.
UCAM-CL-TR-537
Mantsika Matooane:
Parallel systems in symbolic and
algebraic computation
June 2002, 139 pages, PDF
PhD thesis (Trinity College, August 2001)
Abstract: This report describes techniques to exploit distributed memory massively parallel supercomputers to satisfy the peak memory demands of some very large computer algebra problems (over 10 GB). The memory balancing is based on a randomized hashing algorithm for dynamic data distribution. Fine grained partitioning is used to provide flexibility in the memory allocation, at the expense of higher communication cost. The main problem areas are multivariate polynomial algebra, and linear algebra with polynomial matrices. The system was implemented and tested on a Hitachi SR2201 supercomputer.
UCAM-CL-TR-538
Mark Ashdown, Peter Robinson:
The Escritoire: A personal projected
display for interacting with
documents
June 2002, 12 pages, PDF
Abstract: The Escritoire is a horizontal desk interface that uses two projectors to create a foveal display. Items such as images, documents, and the interactive displays of other conventional computers, can be manipulated on the desk using pens in both hands. The periphery covers the desk, providing ample space for laying out the objects relevant to a task, allowing them to be identified at a glance and exploiting human spatial memory for rapid retrieval. The fovea is a high resolution focal area that can be used to view any item in detail. The projected images are continuously warped with commodity graphics hardware before display, to reverse the effects of misaligned projectors and ensure registration between fovea and periphery. The software is divided into a hardware-specific client driving the display, and a platform-independent server imposing control.
UCAM-CL-TR-539
N.A. Dodgson, M.A. Sabin, L. Barthe,
M.F. Hassan:
Towards a ternary interpolating
subdivision scheme for the triangular
mesh
July 2002, 12 pages, PDF
Abstract: We derive a ternary interpolating subdivision scheme which works on the regular triangular mesh. It has quadratic precision and fulfils the standard necessary conditions for C² continuity. Further analysis is required to determine its actual continuity class and to define its behaviour around extraordinary points.
UCAM-CL-TR-540
N.A. Dodgson, J.R. Moore:
The use of computer graphics
rendering software in the analysis of a
novel autostereoscopic display design
August 2002, 6 pages, PDF
Abstract: Computer graphics ‘ray tracing’ software has been used in the design and evaluation of a new autostereoscopic 3D display. This software complements the conventional optical design software and provides a cost-effective method of simulating what is actually seen by a viewer of the display. It may prove a useful tool in similar design problems.
UCAM-CL-TR-541
L. Barthe, N.A. Dodgson, M.A. Sabin,
B. Wyvill, V. Gaildrat:
Different applications of
two-dimensional potential fields for
volume modeling
August 2002, 26 pages, PDF
Abstract: Current methods for building models using implicit volume techniques present problems defining accurate and controllable blend shapes between implicit primitives. We present new methods to extend the freedom and controllability of implicit volume modeling. The main idea is to use a free-form curve to define the profile of the blend region between implicit primitives.
The use of a free-form implicit curve, controlled point-by-point in the Euclidean user space, allows us to group boolean composition operators with sharp transitions or smooth free-form transitions in a single modeling metaphor. This idea is generalized for the creation, sculpting and manipulation of volume objects, while providing the user with simplicity, controllability and freedom in volume modeling.
Bounded volume objects, known as “Soft objects” or “Metaballs”, have specific properties. We also present binary Boolean composition operators that give more control over the form of the transition when these objects are blended.
To finish, we show how our free-form implicit curves can be used to build implicit sweep objects.
UCAM-CL-TR-542
I.P. Ivrissimtzis, N.A. Dodgson, M.A. Sabin:
A generative classification of
mesh refinement rules with lattice
transformations
September 2002, 13 pages, PDF
An updated, improved version of this report
has been published in Computer Aided
Geometric Design 22(1):99–109, January 2004
[doi:10.1016/j.cagd.2003.08.001]. The classification
scheme is slightly different in the CAGD version.
Please refer to the CAGD version in any work which
you produce.
Abstract: We give a classification of the subdivision refinement rules using sequences of similar lattices. Our work expands and unifies recent results in the classification of primal triangular subdivision [Alexa, 2001], and results on the refinement of quadrilateral lattices [Sloan, 1994, 1989]. In the examples we concentrate on the cases with low ratio of similarity and find new univariate and bivariate refinement rules with the lowest possible such ratio, showing that this very low ratio usually comes at the expense of symmetry.
UCAM-CL-TR-543
Kerry Rodden:
Evaluating similarity-based
visualisations as interfaces for
image browsing
September 2002, 248 pages, PDF
PhD thesis (Newnham College, 11 October 2001)
Abstract: Large collections of digital images are becoming more and more common, and the users of these
collections need computer-based systems to help them
find the images they require. Digital images are easy to
shrink to thumbnail size, allowing a large number of
them to be presented to the user simultaneously. Generally, current image browsing interfaces display thumbnails in a two-dimensional grid, in some default order,
and there has been little exploration of possible alternatives to this model.
With textual document collections, information visualisation techniques have been used to produce representations where the documents appear to be clustered
according to their mutual similarity, which is based on
the words they have in common. The same techniques
can be applied to images, to arrange a set of thumbnails according to a defined measure of similarity. In
many collections, the images are manually annotated
with descriptive text, allowing their similarity to be
measured in an analogous way to textual documents.
Alternatively, research in content-based image retrieval
has made it possible to measure similarity based on
low-level visual features, such as colour.
The primary goal of this research was to investigate the usefulness of such similarity-based visualisations as interfaces for image browsing. We concentrated on visual similarity, because it is applicable to
any image collection, regardless of the availability of
annotations. Initially, we used conventional information retrieval evaluation methods to compare the relative performance of a number of different visual similarity measures, both for retrieval and for creating visualisations.
Thereafter, our approach to evaluation was influenced more by human-computer interaction: we carried out a series of user experiments where arrangements based on visual similarity were compared to random arrangements, for different image browsing tasks.
These included finding a given target image, finding a
group of images matching a generic requirement, and
choosing subjectively suitable images for a particular
purpose (from a shortlisted set). As expected, we found
that similarity-based arrangements are generally more
helpful than random arrangements, especially when the
user already has some idea of the type of image she is
looking for.
Images are used in many different application domains; the ones we chose to study were stock photography and personal photography. We investigated the
organisation and browsing of personal photographs in
some depth, because of the inevitable future growth
in usage of digital cameras, and a lack of previous research in this area.
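The abstract above mentions measuring image similarity from low-level visual features such as colour. As a rough illustration of what such a measure can look like, the sketch below uses coarse RGB histograms compared by histogram intersection. This particular measure, the bin count, and the function names are assumptions chosen for brevity; the abstract does not say which measures the thesis evaluates.

```python
def colour_histogram(pixels, bins=4):
    """Normalised coarse RGB histogram of an iterable of (r, g, b) values in 0..255."""
    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each channel to one of `bins` buckets, then to a single index.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for no overlap."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def rank_by_similarity(target_hist, named_hists):
    """Return image names ordered from most to least similar to the target."""
    return sorted(named_hists,
                  key=lambda name: similarity(target_hist, named_hists[name]),
                  reverse=True)
```

A similarity-based arrangement would place highly ranked thumbnails close together. The layout step itself is outside this sketch.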
UCAM-CL-TR-544
I.P. Ivrissimtzis, M.A. Sabin, N.A. Dodgson:
On the support of recursive
subdivision
September 2002, 20 pages, PDF
An updated, improved version of this report has been
published in ACM Trans. Graphics 23(4):1043–1060,
October 2004 [doi:10.1145/1027411.1027417]
Abstract: We study the support of subdivision schemes,
that is, the area of the subdivision surface that will be
affected by the displacement of a single control point.
Our main results cover the regular case, where the mesh
induces a regular Euclidean tessellation of the parameter space. If n is the ratio of similarity between the
tessellation at step k and step k−1 of the subdivision,
we show that this number determines if the support is
polygonal or fractal. In particular, if n=2, as it is in
most schemes, the support is a polygon whose vertices
can be easily determined. If n is not equal to two, as,
for example, in the square root of three scheme, the
support is usually fractal and on its boundary we can
identify sets like the classic ternary Cantor set.
UCAM-CL-TR-545
Anthony C.J. Fox:
A HOL specification of the
ARM instruction set architecture
June 2001, 45 pages, PDF
Abstract: This report gives details of a HOL specification of the ARM instruction set architecture. It is shown
that the HOL proof tool provides a suitable environment in which to model the architecture. The specification is used to execute fragments of ARM code generated by an assembler. The specification is based primarily around the third version of the ARM architecture,
and the intent is to provide a target semantics for future microprocessor verifications.
UCAM-CL-TR-546
Jonathan David Pfautz:
Depth perception in computer
graphics
September 2002, 182 pages, PDF
PhD thesis (Trinity College, May 2000)
Abstract: With advances in computing and visual display technology, the interface between man and machine has become increasingly complex. The usability of a modern interactive system depends on the design of the visual display. This dissertation aims to improve the design process by examining the relationship between human perception of depth and three-dimensional computer-generated imagery (3D CGI).
Depth is perceived when the human visual system combines various different sources of information about a scene. In Computer Graphics, linear perspective is a common depth cue, and systems utilising binocular disparity cues are of increasing interest.
When these cues are inaccurately and inconsistently
presented, the effectiveness of a display will be limited.
Images generated with computers are sampled, meaning they are discrete in both time and space. This thesis
describes the sampling artefacts that occur in 3D CGI
and their effects on the perception of depth. Traditionally, sampling artefacts are treated as a Signal Processing problem. The approach here is to evaluate artefacts
using Human Factors and Ergonomics methodology;
sampling artefacts are assessed via performance on relevant visual tasks.
A series of formal and informal experiments were
performed on human subjects to evaluate the effects of
spatial and temporal sampling on the presentation of
depth in CGI. In static images with perspective information, the relative size of an object can be inconsistently presented across depth. This inconsistency prevented subjects from making accurate relative depth
judgements. In moving images, these distortions were
most visible when the object was moving slowly, pixel
size was large, the object was located close to the line
of sight and/or the object was located a large virtual
distance from the viewer. When stereo images are presented with perspective cues, the sampling artefacts
found in each cue interact. Inconsistencies in both size
and disparity can occur as the result of spatial and temporal sampling. As a result, disparity can vary inconsistently across an object. Subjects judged relative depth
less accurately when these inconsistencies were present.
An experiment demonstrated that stereo cues dominated in conflict situations for static images. In moving
imagery, the number of samples in stereo cues is limited. Perspective information dominated the perception
of depth for unambiguous (i.e., constant in direction
and velocity) movement.
Based on the experimental results, a novel method
was developed that ensures the size, shape and disparity of an object are consistent as it moves in depth.
This algorithm manipulates the edges of an object (at
the expense of positional accuracy) to enforce consistent size, shape and disparity. In a time-to-contact task
using only stereo and perspective depth cues, velocity
was judged more accurately using this method. A second method manipulated the location and orientation
of the viewpoint to maximise the number of samples of
perspective and stereo depth in a scene. This algorithm
was tested in a simulated air traffic control task. The
experiment demonstrated that knowledge about where
the viewpoint is located dominates any benefit gained
in reducing sampling artefacts.
This dissertation provides valuable information for
the visual display designer in the form of task-specific
experimental results and computationally inexpensive
methods for reducing the effects of sampling.
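The spatial sampling artefact described in the abstract above, where the relative size of an object is presented inconsistently across depth, can be illustrated with a small calculation. The sketch assumes a simple pinhole projection and rounding to whole pixels. The function names and parameter values are hypothetical, and this is not the correction method developed in the dissertation.

```python
import math


def projected_size_pixels(object_size, depth, focal_length, pixel_size):
    """Perspective-project an object's extent, then sample it to whole pixels."""
    exact = focal_length * object_size / depth  # extent on the image plane
    return math.floor(exact / pixel_size + 0.5)  # round to the nearest pixel


def size_by_depth(object_size, depths, focal_length=1.0, pixel_size=0.01):
    """Displayed size in pixels of the same object at each virtual depth."""
    return [projected_size_pixels(object_size, d, focal_length, pixel_size)
            for d in depths]
```

With these assumed values, an object of unit size is drawn 2 pixels wide at depths 45, 55 and 60, then drops to 1 pixel at depth 70. The displayed size therefore stays constant over a range of depths and then changes abruptly, rather than shrinking smoothly as true perspective would.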
UCAM-CL-TR-547
Agathoniki Trigoni:
Semantic optimization of OQL
queries
October 2002, 171 pages, PDF
PhD thesis (Pembroke College, October 2001)
Abstract: This work explores all the phases of developing a query processor for OQL, the Object Query
Language proposed by the Object Data Management
Group (ODMG 3.0). There has been a lot of research
on the execution of relational queries and their optimization using syntactic or semantic transformations.
However, there is no context that has integrated and
tested all the phases of processing an object query
language, including the use of semantic optimization
heuristics. This research is motivated by the need for
query execution tools that combine two valuable properties: i) the expressive power to encompass all the features of the object-oriented paradigm and ii) the flexibility to benefit from the experience gained with relational systems, such as the use of semantic knowledge
to speed up query execution.
The contribution of this work is twofold. First, it
establishes a rigorous basis for OQL by defining a type
inference model for OQL queries and proposing a complete framework for their translation into calculus and
algebraic representations. Second, in order to enhance
query execution it provides algorithms for applying two
semantic optimization heuristics: constraint introduction and constraint elimination techniques. By taking
into consideration a set of association rules with exceptions, it is possible to add or remove predicates from
an OQL query, thus transforming it to a more efficient
form.
We have implemented this framework, which enables us to measure the benefits and the cost of exploiting semantic knowledge during query execution.
The experiments showed significant benefits, especially
in the application of the constraint introduction technique. In contexts where queries are optimized once
and are then executed repeatedly, we can ignore the
cost of optimization, and it is always worth carrying
out the proposed transformation. In the context of ad hoc queries the cost of the optimization becomes an important consideration. We have developed heuristics to
estimate the cost as well as the benefits of optimization.
The optimizer will carry out a semantic transformation
only when the overhead is less than the expected benefit. Thus transformations are performed safely even
with ad hoc queries. The framework can often speed up
the execution of an OQL query to a considerable extent.
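The decision rule described in the abstract above can be sketched in a few lines. This is only an illustration: the function name, the `repeated` flag and the numeric estimates are hypothetical, and the abstract does not say how the thesis computes its cost and benefit estimates.

```python
def should_transform(estimated_overhead, estimated_benefit, repeated=False):
    """Decide whether to apply a semantic transformation to a query.

    Assumption for illustration: both estimates are in the same unit
    (for example milliseconds) and are supplied by some external heuristic.
    """
    if repeated:
        # Queries optimized once and executed many times: the abstract says
        # the optimization cost can be ignored in this setting.
        return True
    # Ad hoc queries: transform only when the overhead is below the benefit.
    return estimated_overhead < estimated_benefit
```

For an ad hoc query with an estimated overhead of 5 and an estimated benefit of 2 the sketch declines to transform; with the same figures and `repeated=True` it transforms.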
UCAM-CL-TR-548
Anthony Fox:
Formal verification of the ARM6
micro-architecture
November 2002, 59 pages, PDF
Abstract: This report describes the formal verification
of the ARM6 micro-architecture using the HOL theorem prover. The correctness of the microprocessor
design compares the micro-architecture with an abstract, target instruction set semantics. Data and temporal abstraction maps are used to formally relate the
state spaces and to capture the timing behaviour of
the processor. The verification is carried out in HOL
and one-step theorems are used to provide the framework for the proof of correctness. This report also describes the formal specification of the ARM6’s three
stage pipelined micro-architecture.
UCAM-CL-TR-549
Ross Anderson:
Two remarks on public key
cryptology
December 2002, 7 pages, PDF
Abstract: In some talks I gave in 1997-98, I put forward
two observations on public-key cryptology, concerning
forward-secure signatures and compatible weak keys. I
did not publish a paper on either of them as they appeared
to be rather minor footnotes to public key cryptology.
But the work has occasionally been cited, and
I’ve been asked to write a permanent record.
UCAM-CL-TR-550
Karen Spärck Jones:
Computer security –
a layperson’s guide, from the
bottom up
June 2002, 23 pages, PDF
Abstract: Computer security as a technical matter is
complex, and opaque for those who are not themselves
computer professionals but who encounter, or
are ultimately responsible for, computer systems. This
paper presents the essentials of computer security in
non-technical terms, with the aim of helping people affected
by computer systems to understand what security
is about and to withstand the blinding with science
mantras that too often obscure the real issues.
UCAM-CL-TR-551
Lawrence C. Paulson:
The relative consistency of the axiom
of choice —
mechanized using Isabelle/ZF
December 2002, 63 pages, PDF
Abstract: The proof of the relative consistency of the
axiom of choice has been mechanized using Isabelle/ZF.
The proof builds upon a previous mechanization of the
reflection theorem. The heavy reliance on metatheory
in the original proof makes the formalization unusually long, and not entirely satisfactory: two parts of
the proof do not fit together. It seems impossible to
solve these problems without formalizing the metatheory. However, the present development follows a standard textbook, Kunen’s “Set Theory”, and could support the formalization of further material from that
book. It also serves as an example of what to expect
when deep mathematics is formalized.
UCAM-CL-TR-552
Keir A. Fraser, Steven M. Hand,
Timothy L. Harris, Ian M. Leslie,
Ian A. Pratt:
The Xenoserver computing
infrastructure
January 2003, 11 pages, PDF
Abstract: The XenoServer project will build a public
infrastructure for wide-area distributed computing. We
envisage a world in which XenoServer execution platforms will be scattered across the globe and available
for any member of the public to submit code for execution.
Crucially, the code’s sponsor will be billed for
all the resources used or reserved during its execution.
This will encourage load balancing, limit congestion,
and make the platform self-financing.
Such a global infrastructure is essential to address
the fundamental problem of communication latency. By
enabling principals to run programs at points throughout the network they can ensure that their code executes
close to the entities with which it interacts. As
well as reducing latency this can be used to avoid network
bottlenecks, to reduce long-haul network charges
and to provide a network presence for transiently-connected
mobile devices.
This project will build and deploy a global
XenoServer test-bed and make it available to authenticated
external users; initially members of the scientific
community and ultimately of the general public. In this
environment accurate resource accounting and pricing
is critical – whether in an actual currency or one that
is fictitious. As with our existing work on OS resource
management, pricing provides the feedback necessary
for applications that can adapt, and prevents over-use
by applications that cannot.
UCAM-CL-TR-553
Paul R. Barham, Boris Dragovic,
Keir A. Fraser, Steven M. Hand,
Timothy L. Harris, Alex C. Ho,
Evangelos Kotsovinos,
Anil V.S. Madhavapeddy, Rolf Neugebauer,
Ian A. Pratt, Andrew K. Warfield:
Xen 2002
January 2003, 15 pages, PDF
Abstract: This report describes the design of Xen, the
hypervisor developed as part of the XenoServer wide-area
computing project. Xen enables the hardware resources of a machine to be virtualized and dynamically
partitioned such as to allow multiple different ‘guest’
operating system images to be run simultaneously.
Virtualizing the machine in this manner provides
flexibility, allowing different users to choose their preferred operating system (Windows, Linux, NetBSD),
and also enables use of the platform as a testbed for
operating systems research. Furthermore, Xen provides
secure partitioning between these ‘domains’, and enables better resource accounting and QoS isolation than
can be achieved within a conventional operating system. We show these benefits can be achieved at negligible performance cost.
We outline the design of Xen’s main sub-systems,
and the interface exported to guest operating systems.
Initial performance results are presented for our most
mature guest operating system port, Linux 2.4. This report covers the initial design of Xen, leading up to our
first public release which we plan to make available for
download in April 2003. Further reports will update
the design as our work progresses and present the implementation in more detail.
UCAM-CL-TR-554
Jon Crowcroft:
Towards a field theory for networks
January 2003, 9 pages, PDF
Abstract: It is often claimed that Internet Traffic patterns are interesting because the Internet puts few constraints on sources. This leads to innovation. It also
makes the study of Internet traffic, what we might call
the search for the Internet Erlang, very difficult. At the
same time, traffic control (congestion control) and engineering are both hot topics.
What if “flash crowds” (a.k.a. slashdot), cascades,
epidemics and so on are the norm? What if the trend
continues for network link capacity to become flatter,
with more equal capacity in the access and core, or
even more capacity in the access than the core (as in
the early 1980s with 10Mbps LANs versus Kbps links
in the ARPANET)? How could we cope?
This is a paper about the use of field equations (e.g.
gravitational, electrical, magnetic, strong and weak
atomic and so forth) as a future model for managing
network traffic. We believe that in the future, one could
move from this model to a very general prescriptive
technique for designing network control on different
timescales, including traffic engineering and the set of
admission and congestion control laws. We also speculate about the use of the same idea in wireless networks.
UCAM-CL-TR-555
Jon Crowcroft, Richard Gibbens,
Stephen Hailes:
BOURSE – Broadband Organisation
of Unregulated Radio Systems
through Economics
January 2003, 10 pages, PDF
Abstract: This is a technical report about an idea for research in the intersection of active nets, cognitive radio
and power laws of network topologies.
UCAM-CL-TR-556
Jon Crowcroft:
Turing Switches – Turing machines
for all-optical Internet routing
January 2003, 7 pages, PDF
Abstract: This is a technical report outlining an idea for
basic long term research into the architectures for programmable all-optical Internet routers.
We are revisiting some of the fundamental tenets of
computer science to carry out this work, and so it is
necessarily highly speculative.
Currently, the processing elements in all-electronic
routers are typically fairly conventional von-Neumann
architecture computers with processors that have large,
complex instruction sets (even RISC is relatively complex compared with the actual requirements for packet
processing) and Random Access Memory.
As the need for speed increases, first this architecture, and then the classical computing hardware components, and finally, electronics cease to be able to keep
up.
At this time, optical device technology is making
great strides, and we see the availability of gates, as
well as a plethora of invention in providing buffering
mechanisms.
However, a critical problem we foresee is the ability to re-program devices for different packet processing functions such as classification and scheduling.
This proposal is aimed at researching one direction for
adding optical domain programmability.
UCAM-CL-TR-557
G.M. Bierman, P. Sewell:
Iota: A concurrent XML scripting
language with applications to
Home Area Networking
January 2003, 32 pages, PDF
Abstract: Iota is a small and simple concurrent language
that provides native support for functional XML
computation and for typed channel-based communication.
It has been designed as a domain-specific language
to express device behaviour within the context of Home
Area Networking.
In this paper we describe Iota, explaining its novel
treatment of XML and describing its type system and
operational semantics. We give a number of examples
including Iota code to program Universal Plug ’n’ Play
(UPnP) devices.
UCAM-CL-TR-558
Yolanta Beresnevichiene:
A role and context based security
model
January 2003, 89 pages, PDF
PhD thesis (Wolfson College, June 2000)
Abstract: Security requirements approached at the enterprise level initiate the need for models that capture
the organisational and distributed aspects of information usage. Such models have to express organisation-specific
security policies and internal controls aiming
to protect information against unauthorised access and
modification, and against usage of information for unintended purposes. This technical report describes a
systematic approach to modelling the security requirements from the perspective of job functions and tasks
performed in an organisation. It deals with the design,
analysis, and management of security abstractions and
mechanisms in a unified framework.
The basis of access control policy in this framework is formulated around a semantic construct of a
role. Roles are granted permissions according to the
job functions that exist in an organisation, and then
users are assigned to roles on the basis of their specific
job responsibilities. In order to ensure that permissions
included in the roles are used by users only for purposes corresponding to the organisation’s present business needs, a novel approach of “active” context-based
access control is proposed. The usage of role permissions in this approach is controlled according to the
emerging context associated with progress of various
tasks in the organisation.
The work explores formally the security properties
of the established model, in particular, support for separation of duty and least privilege principles that are important
requirements in many commercial systems. Results have implications for understanding different variations
of separation of duty policy that are currently
used in role-based access control.
Finally, a design architecture of the defined security
model is presented detailing the components and
processing phases required for successful application of
the model to distributed computer environments. The
model provides opportunities for the implementers,
based on application requirements, to choose between
several alternative design approaches.
UCAM-CL-TR-559
Eiko Yoneki, Jean Bacon:
Pronto: MobileGateway with
publish-subscribe paradigm over
wireless network
February 2003, 22 pages, PDF
Abstract: This paper presents the design, implementation, and evaluation of Pronto, a middleware system for
mobile applications with messaging as a basis. It provides a solution for mobile application specific problems such as resource constraints, network characteristics, and data optimization. Pronto consists of three
main functions: 1) MobileJMS Client, a lightweight
client of Message Oriented Middleware (MOM) based
on Java Message Service (JMS), 2) Gateway for reliable and efficient transmission between mobile devices
and a server with pluggable components, and 3) Serverless JMS based on IP multicast. The publish-subscribe
paradigm is ideal for mobile applications, as mobile devices are commonly used for data collection under conditions of frequent disconnection and changing numbers of recipients. This paradigm provides greater flexibility due to the decoupling of publisher and subscriber.
Adding a gateway as a message hub to transmit information in real-time or with store-and-forward messaging provides powerful optimization and data transformation. Caching is an essential function of the gateway, and SmartCaching is designed for generic caching
in an N-tier architecture. Serverless JMS aims at a decentralized messaging model, which supports an ad-hoc
network, as well as creating a high-speed messaging
BUS. Pronto is an intelligent MobileGateway, providing a useful MOM intermediary between a server and
mobile devices over a wireless network.
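The decoupling of publisher and subscriber, and the store-and-forward role of a gateway, can be illustrated with a toy message hub. This is a minimal sketch under stated assumptions, not Pronto's design or the JMS API: the `Gateway` class and all of its method names are hypothetical.

```python
from collections import defaultdict, deque


class Gateway:
    """Toy message hub: publishers and subscribers share only topic names."""

    def __init__(self):
        self.subscribers = defaultdict(dict)  # topic -> {subscriber name: callback}
        self.connected = {}                   # subscriber name -> currently reachable?
        self.pending = defaultdict(deque)     # subscriber name -> stored (topic, message)

    def subscribe(self, topic, name, callback):
        self.subscribers[topic][name] = callback
        self.connected.setdefault(name, True)

    def disconnect(self, name):
        self.connected[name] = False

    def reconnect(self, name):
        self.connected[name] = True
        # Forward everything that was stored while the subscriber was away.
        while self.pending[name]:
            topic, message = self.pending[name].popleft()
            self.subscribers[topic][name](message)

    def publish(self, topic, message):
        for name, callback in self.subscribers[topic].items():
            if self.connected[name]:
                callback(message)                            # immediate delivery
            else:
                self.pending[name].append((topic, message))  # store for later


received = []
hub = Gateway()
hub.subscribe("sensor", "phone", received.append)
hub.publish("sensor", "reading-1")
hub.disconnect("phone")
hub.publish("sensor", "reading-2")   # stored, not delivered yet
before_reconnect = list(received)
hub.reconnect("phone")               # stored message is forwarded now
```

The publisher never refers to the subscriber directly, so subscribers can come and go, or change in number, without any change on the publishing side.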
UCAM-CL-TR-560
Mike Bond, Piotr Zieliński:
Decimalisation table attacks for
PIN cracking
February 2003, 14 pages, PDF
Abstract: We present an attack on hardware security
modules used by retail banks for the secure storage and
verification of customer PINs in ATM (cash machine)
infrastructures. By using adaptive decimalisation tables
and guesses, the maximum amount of information is
learnt about the true PIN upon each guess. It takes an
average of 15 guesses to determine a four digit PIN using this technique, instead of the 5000 guesses intended.
In a single 30 minute lunch-break, an attacker can thus
discover approximately 7000 PINs rather than 24 with
the brute force method. With a £300 withdrawal limit
per card, the potential bounty is raised from £7200
to £2.1 million and a single motivated attacker could
withdraw £30–50 thousand of this each day. This attack thus presents a serious threat to bank security.
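The headline figures in this abstract follow from simple arithmetic, reproduced below as a check. All constants are taken from the abstract; the variable names are ours, and the implied guess counts are a derived consistency check rather than figures quoted by the authors.

```python
WITHDRAWAL_LIMIT = 300        # pounds per card
PINS_BRUTE_FORCE = 24         # PINs found in a 30 minute lunch-break by brute force
PINS_ADAPTIVE = 7000          # approximate PINs found with adaptive decimalisation tables
GUESSES_BRUTE_FORCE = 5000    # average guesses per PIN intended by the design
GUESSES_ADAPTIVE = 15         # average guesses per PIN with the attack

# Potential bounty in pounds for each method.
bounty_brute_force = PINS_BRUTE_FORCE * WITHDRAWAL_LIMIT
bounty_adaptive = PINS_ADAPTIVE * WITHDRAWAL_LIMIT

# Total guesses implied for the 30 minute window by each pair of figures.
total_guesses_brute_force = PINS_BRUTE_FORCE * GUESSES_BRUTE_FORCE
total_guesses_adaptive = PINS_ADAPTIVE * GUESSES_ADAPTIVE
```

The two implied totals (120000 and 105000 guesses) are of the same order, which is consistent with the abstract describing 7000 as approximate.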
UCAM-CL-TR-561
Paul B. Menage:
Resource control of untrusted code in
an open network environment
March 2003, 185 pages, PDF
PhD thesis (Magdalene College, June 2000)
Abstract: Current research into Active Networks, Open
Signalling and other forms of mobile code has made
use of the ability to execute user-supplied code at locations within the network infrastructure, in order to
avoid the inherent latency associated with wide area
networks or to avoid sending excessive amounts of data
across bottleneck links or nodes. Existing research has
addressed the design and evaluation of programming
environments, and testbeds have been implemented on
traditional operating systems. Such work has deferred
issues regarding resource control; this has been reasonable, since this research has been conducted in a closed
environment.
In an open environment, which is required for
widespread deployment of such technologies, the code
supplied to the network nodes may not be from a
trusted source. Thus, it cannot be assumed that such
code will behave non-maliciously, nor that it will avoid
consuming more than its fair share of the available system resources.
The computing resources consumed by end-users on
programmable nodes within a network are not free,
and must ultimately be paid for in some way. Programmable networks allow users substantially greater
complexity in the way that they may consume network resources. This dissertation argues that, due to
this complexity, it is essential to be able to control and account for the resources used by untrusted user-supplied
code if such technology is to be deployed effectively in
a wide-area open environment.
The Resource Controlled Active Node Environment
(RCANE) is presented to facilitate the control of untrusted code. RCANE supports the allocation, scheduling and accounting of the resources available on a node,
including CPU and network I/O scheduling, memory allocation, and garbage collection overhead.
UCAM-CL-TR-562
Carsten Moenning, Neil A. Dodgson:
Fast Marching farthest point
sampling
April 2003, 16 pages, PDF
Abstract: Using Fast Marching for the incremental
computation of distance maps across the sampling domain, we obtain an efficient farthest point sampling
technique (FastFPS). The method is based on that of
Eldar et al. (1992, 1997) but extends more naturally to
the case of non-uniform sampling and is more widely
applicable. Furthermore, it can be applied to both
planar domains and curved manifolds and allows for
weighted domains in which different cost is associated
with different points on the surface. We conclude by
considering the extension of FastFPS to the sampling of
point clouds without the need for prior surface reconstruction.
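For readers unfamiliar with farthest point sampling, the basic greedy idea can be sketched as follows. Note that this is the naive variant using straight-line Euclidean distances over a finite point set; it does not use Fast Marching and is not the FastFPS algorithm of the report. The function name and the example points are hypothetical, and `k` is assumed not to exceed the number of points.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling with Euclidean distances.

    Returns the indices of k samples; each new sample is the point that is
    farthest from all samples chosen so far.
    """
    chosen = [start]
    # dist[i] holds the distance from points[i] to its nearest chosen sample.
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        # Update the distance map incrementally with the new sample.
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    return chosen


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 1.0)]
samples = farthest_point_sampling(points, 3)
```

The incremental update of `dist` plays the role that the distance map plays in the report, where it is computed by Fast Marching so that curved and weighted domains can be handled.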
UCAM-CL-TR-563
G.M. Bierman, M.J. Parkinson, A.M. Pitts:
MJ: An imperative core calculus for
Java and Java with effects
April 2003, 53 pages, PDF
Abstract: In order to study rigorously object-oriented
languages such as Java or C#, a common practice is to
define lightweight fragments, or calculi, which are sufficiently small to facilitate formal proofs of key properties. However many of the current proposals for calculi lack important language features. In this paper we
propose Middleweight Java, MJ, as a contender for a
minimal imperative core calculus for Java. Whilst compact, MJ models features such as object identity, field
assignment, constructor methods and block structure.
We define the syntax, type system and operational semantics of MJ, and give a proof of type safety. In order
to demonstrate the usefulness of MJ to reason about
operational features, we consider a recent proposal of
Greenhouse and Boyland to extend Java with an effects
system. This effects system is intended to delimit the
scope of computational effects within a Java program.
We define an extension of MJ with a similar effects system and instrument the operational semantics. We then
prove the correctness of the effects system; a question
left open by Greenhouse and Boyland. We also consider
the question of effect inference for our extended calculus, detail an algorithm for inferring effects information
and give a proof of correctness.
UCAM-CL-TR-564
Ulrich Lang:
Access policies for middleware
May 2003, 138 pages, PDF
PhD thesis (Wolfson College, March 2003)
Abstract: This dissertation examines how the architectural layering of middleware constrains the design of
a middleware security architecture, and analyses the
complications that arise from that. First, we define a
precise notion of middleware that includes its architecture and features. Our definition is based on the Common Object Request Broker Architecture (CORBA),
which is used throughout this dissertation both as a
reference technology and as a basis for a proof of concept implementation. In several steps, we construct a
security model that fits to the described middleware architecture. The model facilitates conceptual reasoning
about security. The results of our analysis indicate that
the cryptographic identities available on the lower layers of the security model are only of limited use for expressing fine-grained security policies, because they are
separated from the application layer entities by the middleware layer. To express individual application layer
entities in access policies, additional more fine-grained
descriptors are required. To solve this problem for the
target side (i.e., the receiving side of an invocation), we
propose an improved middleware security model that
supports individual access policies on a per-target basis. The model is based on so-called “resource descriptors”, which are used in addition to cryptographic identities to describe application layer entities in access policies. To be useful, descriptors need to fulfil a number
of properties, such as local uniqueness and persistency.
Next, we examine the information available at the middleware layer for its usefulness as resource descriptors,
in particular the interface name and the instance information inside the object reference. Unfortunately neither fulfils all required properties. However, it is possible to obtain resource descriptors on the target side
through a mapping process that links target instance
information to an externally provided descriptor. We
describe both the mapping configuration when the target is instantiated and the mapping process at invocation time. A proof of concept implementation, which
contains a number of technical improvements over earlier attempts to solve this problem, shows that this approach is useable in practice, even for complex architectures, such as CORBA and CORBASec (the security
services specified for CORBA). Finally, we examine the
security approaches of several related middleware technologies that have emerged since the specification of
CORBA and CORBASec, and show the applicability
of the resource descriptor mapping.
UCAM-CL-TR-565
Carsten Moenning, Neil A. Dodgson:
Fast Marching farthest point
sampling for point clouds and
implicit surfaces
May 2003, 15 pages, PDF
Abstract: In a recent paper (Moenning and Dodgson,
2003), the Fast Marching farthest point sampling strategy (FastFPS) for planar domains and curved manifolds was introduced. The version of FastFPS for curved
manifolds discussed in the paper deals with surface domains in triangulated form only. Due to a restriction
of the underlying Fast Marching method, the algorithm
further requires the splitting of any obtuse into acute
triangles to ensure the consistency of the Fast Marching approximation. In this paper, we overcome these
restrictions by using Memoli and Sapiro’s (Memoli and
Sapiro, 2001 and 2002) extension of the Fast Marching
method to the handling of implicit surfaces and point
clouds. We find that the extended FastFPS algorithm
can be applied to surfaces in implicit or point cloud
form without the loss of the original algorithm’s computational optimality and without the need for any preprocessing.
UCAM-CL-TR-566
Joe Hurd:
Formal verification of probabilistic
algorithms
May 2003, 154 pages, PDF
PhD thesis (Trinity College, December 2001)
Abstract: This thesis shows how probabilistic algorithms can be formally verified using a mechanical theorem prover.
We begin with an extensive foundational development of probability, creating a higher-order logic formalization of mathematical measure theory. This allows the definition of the probability space we use to
model a random bit generator, which informally is a
stream of coin-flips, or technically an infinite sequence
of IID Bernoulli(1/2) random variables.
Probabilistic programs are modelled using the state-transformer monad familiar from functional programming, where the random bit generator is passed around
in the computation. Functions remove random bits
from the generator to perform their calculation, and
then pass back the changed random bit generator with
the result.
Our probability space modelling the random bit
generator allows us to give precise probabilistic specifications of such programs, and then verify them in the
theorem prover.
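The state-passing style described above can be illustrated outside the theorem prover. In the sketch below every function takes the remaining random bits and returns a pair of its result and the bits left over. The function names are hypothetical, a finite Python list stands in for the infinite stream of coin-flips, and the die-rolling procedure is plain rejection sampling chosen for brevity, not a procedure taken from the thesis.

```python
def coin_flip(bits):
    """Consume one bit; return (bit, remaining bits)."""
    return bits[0], bits[1:]


def three_bits(bits):
    """Read three bits as an integer in 0..7, threading the generator through."""
    value = 0
    for _ in range(3):
        b, bits = coin_flip(bits)
        value = 2 * value + b
    return value, bits


def die_roll(bits):
    """Roll a fair die by rejection: retry whenever the 3-bit value is 6 or 7.

    On a genuinely random infinite stream the loop terminates with
    probability 1; here the finite list must simply be long enough.
    """
    while True:
        value, bits = three_bits(bits)
        if value < 6:
            return value + 1, bits


stream = [1, 1, 1, 0, 1, 0, 1, 1]
roll, rest = die_roll(stream)   # first triple (7) is rejected, second (2) gives a 3
```

Because the generator is an explicit argument and result, the behaviour of `die_roll` is a pure function of the bit stream, which is what makes precise probabilistic specifications possible.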
We also develop technical support designed to expedite verification: probabilistic quantifiers; a compositional property subsuming measurability and independence; a probabilistic while loop together with a formal
concept of termination with probability 1. We also introduce a technique for reducing properties of a probabilistic while loop to properties of programs that are
guaranteed to terminate: these can then be established
using induction and standard methods of program correctness.
We demonstrate the formal framework with some
example probabilistic programs: sampling algorithms
for four probability distributions; some optimal procedures for generating dice rolls from coin flips; the
symmetric simple random walk. In addition, we verify the Miller-Rabin primality test, a well-known and
commercially used probabilistic algorithm. Our fundamental perspective allows us to define a version with
strong properties, which we can execute in the logic to
prove compositeness of numbers.
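For reference, the Miller-Rabin test mentioned above can be written in a few lines of ordinary code. This is the textbook formulation, not the HOL definition verified in the thesis; the function name, the default number of rounds and the small-prime pre-check are our choices.

```python
import random


def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin: False means n is certainly composite, True means probably prime."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False   # a is a witness, so n is definitely composite
    return True
```

A `False` answer is a proof of compositeness, which matches the abstract's remark about proving compositeness of numbers; a `True` answer for a composite input is possible but has probability at most 4 to the power of minus `rounds`.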
UCAM-CL-TR-567
Joe Hurd:
Using inequalities as term ordering
constraints
June 2003, 17 pages, PDF
Abstract: In this paper we show how linear inequalities
can be used to approximate Knuth-Bendix term ordering constraints, and how term operations such as substitution can be carried out on systems of inequalities.
Using this representation allows an off-the-shelf linear
arithmetic decision procedure to check the satisfiability of a set of ordering constraints. We present a formal
description of a resolution calculus where systems of inequalities are used to constrain clauses, and implement
this using the Omega test as a satisfiability checker. We
give the results of an experiment over problems in the
TPTP archive, comparing the practical performance of
the resolution calculus with and without inherited inequality constraints.
UCAM-CL-TR-568
Gavin Bierman, Michael Hicks, Peter Sewell,
Gareth Stoyle, Keith Wansbrough:
Dynamic rebinding for marshalling
and update, with destruct-time λ
February 2004, 85 pages, PDF
Abstract: Most programming languages adopt static
binding, but for distributed programming an exclusive
reliance on static binding is too restrictive: dynamic
binding is required in various guises, for example when
a marshalled value is received from the network, containing identifiers that must be rebound to local resources. Typically it is provided only by ad-hoc mechanisms that lack clean semantics.
In this paper we adopt a foundational approach, developing core dynamic rebinding mechanisms as extensions to the simply-typed call-by-value λ-calculus. To
do so we must first explore refinements of the call-by-value
reduction strategy that delay instantiation, to
ensure computations make use of the most recent versions of rebound definitions. We introduce redex-time
and destruct-time strategies. The latter forms the basis for a λ-marsh calculus that supports dynamic rebinding of marshalled values, while remaining as far
as possible statically-typed. We sketch an extension of
λ-marsh with concurrency and communication, giving
examples showing how wrappers for encapsulating untrusted code can be expressed. Finally, we show that a
high-level semantics for dynamic updating can also be
based on the destruct-time strategy, defining a λ-update
calculus with simple primitives to provide type-safe updating of running code. We thereby establish primitives
and a common semantic foundation for a variety of
real-world dynamic rebinding requirements.
UCAM-CL-TR-569
James J. Leifer, Gilles Peskine, Peter Sewell,
Keith Wansbrough:
Global abstraction-safe marshalling
with hash types
June 2003, 86 pages, PDF
Abstract: Type abstraction is a key feature of ML-like
languages for writing large programs. Marshalling is
necessary for writing distributed programs, exchanging values via network byte-streams or persistent stores.
In this paper we combine the two, developing compile-time and run-time semantics for marshalling that guarantee abstraction-safety between separately-built programs.
We obtain a namespace for abstract types that
is global, i.e. meaningful between programs, by hashing module declarations. We examine the scenarios in
which values of abstract types are communicated from
one program to another, and ensure, by constructing
hashes appropriately, that the dynamic and static notions of type equality mirror each other. We use singleton kinds to express abstraction in the static semantics; abstraction is tracked in the dynamic semantics
by coloured brackets. These allow us to prove preservation, erasure, and coincidence results. We argue that
our proposal is a good basis for extensions to existing
ML-like languages, pragmatically straightforward for
language users and for implementors.
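The idea of obtaining globally meaningful names for abstract types by hashing can be illustrated very roughly as follows. This is a simplified sketch under stated assumptions: it hashes the raw text of a module declaration with SHA-256 and compares names at unmarshal time, whereas the report constructs hashes within a typed semantics. The functions `type_name`, `marshal` and `unmarshal` and the packet layout are hypothetical.

```python
import hashlib


def type_name(module_source, type_ident):
    """Global name for an abstract type: hash of its declaring module plus its identifier."""
    digest = hashlib.sha256(module_source.encode("utf-8")).hexdigest()
    return f"{digest}.{type_ident}"


def marshal(value, module_source, type_ident):
    """Tag a value with the global name of its abstract type."""
    return {"type": type_name(module_source, type_ident), "value": value}


def unmarshal(packet, module_source, type_ident):
    """Accept the value only if the receiver expects the very same abstract type."""
    if packet["type"] != type_name(module_source, type_ident):
        raise TypeError("abstract type mismatch")
    return packet["value"]


counter_v1 = "module Counter : sig type t end = struct type t = int end"
counter_v2 = "module Counter : sig type t end = struct type t = string end"
packet = marshal(42, counter_v1, "t")
value = unmarshal(packet, counter_v1, "t")
```

Two separately built programs that contain the same module declaration compute the same name and can exchange values, while a program built against `counter_v2` rejects the packet even though the module and type identifiers are identical.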
UCAM-CL-TR-570
Ole Høgh Jensen, Robin Milner:
Bigraphs and mobile processes
July 2003, 121 pages, PDF
Abstract: A bigraphical reactive system (BRS) involves
bigraphs, in which the nesting of nodes represents locality, independently of the edges connecting them; it also
allows bigraphs to reconfigure themselves. BRSs aim to
provide a uniform way to model spatially distributed
systems that both compute and communicate. In this
memorandum we develop their static and dynamic theory.
In Part I we illustrate bigraphs in action, and show
how they correspond to process calculi. We then develop the abstract (non-graphical) notion of wide reactive system (WRS), of which BRSs are an instance.
Starting from reaction rules —often called rewriting
rules— we use the RPO theory of Leifer and Milner to
derive (labelled) transition systems for WRSs, in a way
that leads automatically to behavioural congruences.
In Part II we develop bigraphs and BRSs formally.
The theory is based directly on graphs, not on syntax.
Key results in the static theory are that sufficient RPOs
exist (enabling the results of Part I to be applied), that
parallel combinators familiar from process calculi may
be defined, and that a complete algebraic theory exists
at least for pure bigraphs (those without binding). Key
aspects in the dynamic theory —the BRSs— are the definition of parametric reaction rules that may replicate
or discard parameters, and the full application of the
behavioural theory of Part I.
In Part III we introduce a special class: the simple BRSs. These admit encodings of many process calculi, including the π-calculus and the ambient calculus. A still narrower class, the basic BRSs, admits an
easy characterisation of our derived transition systems.
We exploit this in a case study for an asynchronous π-calculus. We show that structural congruence of process terms corresponds to equality of the representing
bigraphs, and that classical strong bisimilarity corresponds to bisimilarity of bigraphs. At the end, we explore several directions for further work.
UCAM-CL-TR-571
James Hall:
Multi-layer network monitoring and
analysis
July 2003, 230 pages, PDF
PhD thesis (King’s College, April 2003)
Abstract: Passive network monitoring offers the possibility of gathering a wealth of data about the traffic traversing the network and the communicating processes generating that traffic. Significant advantages include the non-intrusive nature of data capture and the
range and diversity of the traffic and driving applications which may be observed. Conversely there are also
associated practical difficulties which have restricted
the usefulness of the technique: increasing network
bandwidths can challenge the capacity of monitors to
keep pace with passing traffic without data loss, and
the bulk of data recorded may become unmanageable.
Much research based upon passive monitoring has
in consequence been limited to that using a sub-set of
the data potentially available, typically TCP/IP packet
headers gathered using Tcpdump or similar monitoring
tools. The bulk of data collected is thereby minimised,
and with the possible exception of packet filtering, the
monitor’s processing power is available for
the task of collection and storage. As the data available
for analysis is drawn from only a small section of the
network protocol stack, detailed study is largely confined to the associated functionality and dynamics in
isolation from activity at other levels. Such lack of context severely restricts examination of the interaction between protocols which may in turn lead to inaccurate
or erroneous conclusions.
The work described in this report attempts to address some of these limitations. A new passive monitoring architecture — Nprobe — is presented, based
upon ‘off the shelf’ components and which, by using
clusters of probes, is scalable to keep pace with current high bandwidth networks without data loss. Monitored packets are fully captured, but are subject to the
minimum processing in real time needed to identify and
associate data of interest across the target set of protocols. Only this data is extracted and stored. The data
reduction ratio thus achieved allows examination of a
wider range of encapsulated protocols without straining the probe’s storage capacity.
Full analysis of the data harvested from the network
is performed off-line. The activity of interest within
each protocol is examined and is integrated across the
range of protocols, allowing their interaction to be
studied. The activity at higher levels informs study of
the lower levels, and that at lower levels infers detail
of the higher. A technique for dynamically modelling
TCP connections is presented, which, by using data
from both the transport and higher levels of the protocol stack, differentiates between the effects of network
and end-process activity.
The balance of the report presents a study of Web
traffic using Nprobe. Data collected from the IP, TCP,
HTTP and HTML levels of the stack is integrated to
identify the patterns of network activity involved in
downloading whole Web pages: by using the links contained in HTML documents observed by the monitor,
together with data extracted from the HTML headers of downloaded contained objects, the set of TCP
connections used, and the way in which browsers use
them, are studied as a whole. An analysis of the degree
and distribution of delay is presented and contributes to
the understanding of performance as perceived by the
user. The effects of packet loss on whole page download
times are examined, particularly those losses occurring
early in the lifetime of connections before reliable estimations of round trip times are established. The implications of such early packet losses for page downloads
using persistent connections are also examined by simulations using the detailed data available.
UCAM-CL-TR-572
Tim Harris:
Design choices for language-based
transactions
August 2003, 7 pages, PDF
Abstract: This report discusses two design choices
which arose in our recent work on introducing a new
‘atomic’ keyword as an extension to the Java programming language. We discuss the extent to which programs using atomic blocks should be provided with
an explicit ‘abort’ operation to roll-back the effects of
the current block. We also discuss mechanisms for supporting blocks that perform I/O operations or external
database transactions.
UCAM-CL-TR-573
Lawrence C. Paulson:
Mechanizing compositional
reasoning for concurrent systems:
some lessons
August 2003, 20 pages, PDF
Abstract: The paper reports on experiences of mechanizing various proposals for compositional reasoning in
concurrent systems. The work uses the UNITY formalism and the Isabelle proof tool. The proposals investigated include existential/universal properties, guarantees properties and progress sets. The paper mentions
some alternative proposals that are also worthy of investigation. The conclusions are that many of these methods work and are suitable candidates for further development.
UCAM-CL-TR-574
Ivan Edward Sutherland:
Sketchpad: A man-machine graphical
communication system
September 2003, 149 pages, PDF
PhD thesis (Massachusetts Institute of Technology,
January 1963)
New preface by Alan Blackwell and Kerry Rodden.
Abstract: The Sketchpad system uses drawing as a
novel communication medium for a computer. The system contains input, output, and computation programs
which enable it to interpret information drawn directly
on a computer display. It has been used to draw electrical, mechanical, scientific, mathematical, and animated
drawings; it is a general purpose system. Sketchpad has
shown the most usefulness as an aid to the understanding of processes, such as the notion of linkages, which
can be described with pictures. Sketchpad also makes it
easy to draw highly repetitive or highly accurate drawings and to change drawings previously drawn with it.
The many drawings in this thesis were all made with
Sketchpad.
A Sketchpad user sketches directly on a computer
display with a “light pen.” The light pen is used both
to position parts of the drawing on the display and
to point to them to change them. A set of push buttons controls the changes to be made such as “erase,”
or “move.” Except for legends, no written language is
used.
Information sketched can include straight line segments and circle arcs. Arbitrary symbols may be defined
from any collection of line segments, circle arcs, and
previously defined symbols. A user may define and use
as many symbols as he wishes. Any change in the definition of a symbol is at once seen wherever that symbol
appears.
Sketchpad stores explicit information about the
topology of a drawing. If the user moves one vertex of a
polygon, both adjacent sides will be moved. If the user
moves a symbol, all lines attached to that symbol will
automatically move to stay attached to it. The topological connections of the drawing are automatically indicated by the user as he sketches. Since Sketchpad is able
to accept topological information from a human being
in a picture language perfectly natural to the human,
it can be used as an input program for computation
programs which require topological data, e.g., circuit
simulators.
Sketchpad itself is able to move parts of the drawing
around to meet new conditions which the user may apply to them. The user indicates conditions with the light
pen and push buttons. For example, to make two lines
parallel, he successively points to the lines with the light
pen and presses a button. The conditions themselves are
displayed on the drawing so that they may be erased or
changed with the light pen language. Any combination
of conditions can be defined as a composite condition
and applied in one step.
It is easy to add entirely new types of conditions
to Sketchpad’s vocabulary. Since the conditions can involve anything computable, Sketchpad can be used for
a very wide range of problems. For example, Sketchpad has been used to find the distribution of forces in
the members of truss bridges drawn with it.
Sketchpad drawings are stored in the computer in
a specially designed “ring” structure. The ring structure features rapid processing of topological information with no searching at all. The basic operations used
in Sketchpad for manipulating the ring structure are described.
UCAM-CL-TR-575
Tim Granger:
Reconfigurable wavelength-switched
optical networks for the Internet core
November 2003, 184 pages, PDF
PhD thesis (King’s College, August 2003)
Abstract: With the quantity of data traffic carried on
the Internet doubling each year, there is no let up in the
demand for ever increasing network capacity. Optical
fibres have a theoretical capacity of many tens of terabits per second. Currently six terabits per second has
been achieved using Dense Wavelength Division Multiplexing: multiple signals at different wavelengths carried on the same fibre.
This large available bandwidth moves the performance bottlenecks to the processing required at each
network node to receive, buffer, route, and transmit
each individual packet. For the last 10 years the speed
of the electronic routers has been, in relative terms,
increasing slower than optical capacity. The space required and power consumed by these routers is also becoming a significant limitation.
One solution examined in this dissertation is to create a virtual topology in the optical layer by using all-optical switches to create lightpaths across the network.
In this way nodes that are not directly connected can
appear to be a single virtual hop away, and no per-packet processing is required at the intermediate nodes.
With advances in optical switches it is now possible for
the network to reconfigure lightpaths dynamically. This
allows the network to share the resources available between the different traffic streams flowing across the
network, and track changes in traffic volumes by allocating bandwidth on demand.
This solution is inherently a circuit-switched approach, but taken into account are characteristics of
optical switching, in particular waveband switching
(where we switch a contiguous range of wavelengths
as a single unit) and latency required to achieve
non-disruptive switching.
This dissertation quantifies the potential gain from
such a system and how that gain is related to the frequency of reconfiguration. It outlines possible network
architectures which allow reconfiguration and, through
simulation, measures the performance of these architectures. It then discusses the possible interactions between
a reconfiguring optical layer and higher-level network
layers.
This dissertation argues that the optical layer should
be distinct from higher network layers, maintaining stable full-mesh connectivity, and dynamically reconfiguring the sizes and physical routes of the virtual paths to
take advantage of changing traffic levels.
UCAM-CL-TR-576
David R. Spence:
An implementation of a coordinate
based location system
November 2003, 12 pages, PDF
Abstract: This paper explains the co-ordinate based location system built for XenoSearch, a resource discovery system in the XenoServer Open Platform. The system builds on the work of GNP, Lighthouse and
many more recent schemes. We also present results
from various combinations of algorithms to perform
the actual co-ordinate calculation based on GNP, Lighthouse and spring based systems and show our implementations of the various algorithms give similar prediction errors.
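The spring-based style of co-ordinate calculation mentioned in the abstract above can be sketched roughly as follows. This is a generic spring relaxation, not the report's implementation and not the GNP or Lighthouse algorithms: each pair of nodes is joined by an imaginary spring whose rest length is the measured latency, and every node is nudged along the net spring force. The node positions, the `relax` and `stress` names, the step size and the round count are all assumptions made for illustration.

```python
import math

def stress(coords, latency):
    """Sum of squared differences between predicted and measured latency."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - latency[i][j]) ** 2
    return total

def relax(coords, latency, step=0.1, rounds=500):
    """Move every node along its net spring force for a fixed number of rounds."""
    coords = [tuple(c) for c in coords]
    n = len(coords)
    for _ in range(rounds):
        updated = []
        for i in range(n):
            fx = fy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = coords[i][0] - coords[j][0]
                dy = coords[i][1] - coords[j][1]
                d = math.hypot(dx, dy) or 1e-9  # guard against coincident nodes
                err = d - latency[i][j]  # positive when the spring is stretched
                fx -= err * dx / d  # a stretched spring pulls node i towards node j
                fy -= err * dy / d
            updated.append((coords[i][0] + step * fx, coords[i][1] + step * fy))
        coords = updated  # all nodes move together, using the previous round's positions
    return coords

# Hypothetical ground truth, used only to generate consistent "measurements".
truth = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latency = [[math.dist(a, b) for b in truth] for a in truth]

# Hypothetical starting guesses for the four nodes.
start = [(0.2, -0.1), (0.8, 0.3), (1.3, 0.9), (-0.2, 0.7)]
final = relax(start, latency)
```

Comparing `stress(start, latency)` with `stress(final, latency)` gives a simple measure of how much the relaxation has reduced the prediction error; the recovered co-ordinates are only defined up to translation, rotation and reflection.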
UCAM-CL-TR-577
Markus G. Kuhn:
Compromising emanations:
eavesdropping risks of computer
displays
December 2003, 167 pages, PDF
PhD thesis (Wolfson College, June 2002)
Abstract: Electronic equipment can emit unintentional
signals from which eavesdroppers may reconstruct processed data at some distance. This has been a concern for military hardware for over half a century. The
civilian computer-security community became aware of
the risk through the work of van Eck in 1985. Military “Tempest” shielding test standards remain secret
and no civilian equivalents are available at present. The
topic is still largely neglected in security textbooks due
to a lack of published experimental data.
This report documents eavesdropping experiments
on contemporary computer displays. It discusses the
nature and properties of compromising emanations for
both cathode-ray tube and liquid-crystal monitors. The
detection equipment used matches the capabilities to
be expected from well-funded professional eavesdroppers. All experiments were carried out in a normal unshielded office environment. They therefore focus on
emanations from display refresh signals, where periodic
averaging can be used to obtain reproducible results in
spite of varying environmental noise.
Additional experiments described in this report
demonstrate how to make information emitted via
the video signal more easily receivable, how to recover plaintext from emanations via radio-character
recognition, how to estimate remotely precise video-timing parameters, and how to protect displayed text
from radio-frequency eavesdroppers by using specialized screen drivers with a carefully selected video card.
Furthermore, a proposal for a civilian radio-frequency
emission-security standard is outlined, based on path-loss estimates and published data about radio noise levels.
Finally, a new optical eavesdropping technique is
demonstrated that reads CRT displays at a distance. It
observes high-frequency variations of the light emitted,
even after diffuse reflection. Experiments with a typical
monitor show that enough video signal remains in the
light to permit the reconstruction of readable text from
signals detected with a fast photosensor. Shot-noise calculations provide an upper bound for this risk.
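As a rough illustration of the periodic averaging mentioned in the abstract above (not the report's measurement code): when a signal repeats with a known period, averaging many aligned repetitions leaves the periodic part unchanged while uncorrelated noise shrinks roughly with the square root of the number of repetitions. The pattern, noise level, seed and function names below are hypothetical.

```python
import random

def periodic_average(samples, period):
    """Average consecutive frames of `period` samples each."""
    frames = len(samples) // period
    if frames == 0:
        raise ValueError("need at least one full period")
    return [
        sum(samples[f * period + i] for f in range(frames)) / frames
        for i in range(period)
    ]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable
pattern = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # hypothetical repeating signal
repeats = 400

# Each repetition of the pattern is buried in noise much larger than the signal.
noisy = [p + rng.gauss(0.0, 2.0) for _ in range(repeats) for p in pattern]

single_frame = noisy[:len(pattern)]
averaged = periodic_average(noisy, len(pattern))
```

Here `rms_error(averaged, pattern)` should come out far smaller than `rms_error(single_frame, pattern)`, which is the effect that makes results reproducible despite varying environmental noise.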
UCAM-CL-TR-578
Robert Ennals, Richard Sharp, Alan Mycroft:
Linear types for packet processing
(extended version)
January 2004, 31 pages, PDF
Abstract: We present PacLang: an imperative, concurrent, linearly-typed language designed for expressing
packet processing applications. PacLang’s linear type
system ensures that no packet is referenced by more
than one thread, but allows multiple references to a
packet within a thread. We argue (i) that this property
greatly simplifies compilation of high-level programs to
the distributed memory architectures of modern Network Processors; and (ii) that PacLang’s type system
captures that style in which imperative packet processing programs are already written. Claim (ii) is justified
by means of a case-study: we describe a PacLang implementation of the IPv4 unicast packet forwarding algorithm.
PacLang is formalised by means of an operational
semantics and a Unique Ownership theorem formalises
its correctness with respect to the type system.
UCAM-CL-TR-579
Keir Fraser:
Practical lock-freedom
February 2004, 116 pages, PDF
PhD thesis (King’s College, September 2003)
Abstract: Mutual-exclusion locks are currently the
most popular mechanism for interprocess synchronisation, largely due to their apparent simplicity and
ease of implementation. In the parallel-computing environments that are increasingly commonplace in high-performance applications, this simplicity is deceptive:
mutual exclusion does not scale well with large numbers of locks and many concurrent threads of execution. Highly-concurrent access to shared data demands
a sophisticated ‘fine-grained’ locking strategy to avoid
serialising non-conflicting operations. Such strategies
are hard to design correctly and with good performance
because they can harbour problems such as deadlock,
priority inversion and convoying. Lock manipulations
may also degrade the performance of cache-coherent
multiprocessor systems by causing coherency conflicts
and increased interconnect traffic, even when the lock
protects read-only data.
In looking for solutions to these problems, interest
has developed in lock-free data structures. By eschewing mutual exclusion it is hoped that more efficient and
robust systems can be built. Unfortunately the current
reality is that most lock-free algorithms are complex,
slow and impractical. In this dissertation I address these
concerns by introducing and evaluating practical abstractions and data structures that facilitate the development of large-scale lock-free systems.
Firstly, I present an implementation of two useful
abstractions that make it easier to develop arbitrary
lock-free data structures. Although these abstractions
have been described in previous work, my designs are
the first that can be practically implemented on current
multiprocessor systems.
Secondly, I present a suite of novel lock-free search
structures. This is interesting not only because of the
fundamental importance of searching in computer science and its wide use in real systems, but also because it
demonstrates the implementation issues that arise when
using the practical abstractions I have developed.
Finally, I evaluate each of my designs and compare them with existing lock-based and lock-free alternatives. To ensure the strongest possible competition, several of the lock-based alternatives are significant improvements on the best-known solutions in the
literature. These results demonstrate that it is possible
to build useful data structures with all the perceived
benefits of lock-freedom and with performance better than sophisticated lock-based designs. Furthermore,
and contrary to popular belief, this work shows that existing hardware primitives are sufficient to build practical lock-free implementations of complex data structures.
UCAM-CL-TR-580
Ole Høgh Jensen, Robin Milner:
Bigraphs and mobile processes
(revised)
February 2004, 131 pages, PDF
Abstract: A bigraphical reactive system (BRS) involves
bigraphs, in which the nesting of nodes represents locality, independently of the edges connecting them; it also
allows bigraphs to reconfigure themselves. BRSs aim to
provide a uniform way to model spatially distributed
systems that both compute and communicate. In this
memorandum we develop their static and dynamic theory.
In Part I we illustrate bigraphs in action, and show
how they correspond to process calculi. We then develop the abstract (non-graphical) notion of wide reactive system (WRS), of which BRSs are an instance.
Starting from reaction rules —often called rewriting
rules— we use the RPO theory of Leifer and Milner to
derive (labelled) transition systems for WRSs, in a way
that leads automatically to behavioural congruences.
In Part II we develop bigraphs and BRSs formally.
The theory is based directly on graphs, not on syntax.
Key results in the static theory are that sufficient RPOs
exist (enabling the results of Part I to be applied), that
parallel combinators familiar from process calculi may
be defined, and that a complete algebraic theory exists
at least for pure bigraphs (those without binding). Key
aspects in the dynamic theory —the BRSs— are the definition of parametric reaction rules that may replicate
or discard parameters, and the full application of the
behavioural theory of Part I.
In Part III we introduce a special class: the simple BRSs. These admit encodings of many process calculi, including the π-calculus and the ambient calculus. A still narrower class, the basic BRSs, admits an
easy characterisation of our derived transition systems.
We exploit this in a case study for an asynchronous π-calculus. We show that structural congruence of process terms corresponds to equality of the representing
bigraphs, and that classical strong bisimilarity corresponds to bisimilarity of bigraphs. At the end, we explore several directions for further work.
UCAM-CL-TR-581
Robin Milner:
Axioms for bigraphical structure
February 2004, 26 pages, PDF
Abstract: This paper axiomatises the structure of bigraphs, and proves that the resulting theory is complete. Bigraphs are graphs with double structure, representing locality and connectivity. They have been
shown to represent dynamic theories for the π-calculus,
mobile ambients and Petri nets, in a way that is faithful to each of those models of discrete behaviour. While
the main purpose of bigraphs is to understand mobile
systems, a prerequisite for this understanding is a well-behaved theory of the structure of states in such systems. The algebra of bigraph structure is surprisingly
simple, as the paper demonstrates; this is because bigraphs treat locality and connectivity orthogonally.
UCAM-CL-TR-582
Piotr Zieliński:
Latency-optimal Uniform Atomic
Broadcast algorithm
February 2004, 28 pages, PDF
Abstract: We present a new asynchronous Uniform
Atomic Broadcast algorithm with a delivery latency of
two communication steps in optimistic settings, which
is faster than any other known algorithm and has been
shown to be the lower bound. It also has the weakest possible liveness requirements (the Ω failure detector
and a majority of correct processes) and achieves three
new lower bounds presented in this paper. Finally, we
introduce a new notation and several new abstractions,
which are used to construct and present the algorithm
in a clear and modular way.
UCAM-CL-TR-583
Cédric Gérot, Loïc Barthe, Neil A. Dodgson,
Malcolm A. Sabin:
Subdivision as a sequence of
sampled Cp surfaces and conditions
for tuning schemes
March 2004, 68 pages, PDF
Abstract: We deal with practical conditions for tuning
a subdivision scheme in order to control its artifacts
in the vicinity of a mark point. To do so, we look for
good behaviour of the limit vertices rather than good
mathematical properties of the limit surface. The good
behaviour of the limit vertices is characterised with the
definition of C2-convergence of a scheme. We propose
necessary explicit conditions for C2-convergence of a
scheme in the vicinity of any mark point being a vertex of valency n or the centre of an n-sided face with
n greater than or equal to three. These necessary conditions
concern the eigenvalues and eigenvectors of subdivision
matrices in the frequency domain. The components of
these matrices may be complex. If we could guarantee that they were real, this would simplify numerical
analysis of the eigenstructure of the matrices, especially
in the context of scheme tuning where we manipulate
symbolic terms. In this paper we show that an appropriate choice of the parameter space combined with a
substitution of vertices lets us transform these matrices into pure real ones. The substitution consists in replacing some vertices by linear combinations of themselves. Finally, we explain how to derive conditions on
the eigenelements of the real matrices which are necessary for the C2-convergence of the scheme.
UCAM-CL-TR-584
Stephen Brooks:
Concise texture editing
March 2004, 164 pages, PDF
PhD thesis (Jesus College, October 2003)
Abstract: Many computer graphics applications remain
in the domain of the specialist. They are typically characterized by complex user-directed tasks, often requiring proficiency in design, colour spaces, computer interaction and file management. Furthermore, the demands
of this skill set are often exacerbated by an equally complex collection of image or object manipulation commands embedded in a variety of interface components.
The complexity of these graphic editing tools often requires that the user possess a correspondingly high level
of expertise.
Concise Texture Editing is aimed at addressing the
over-complexity of modern graphics tools and is based
on the intuitive notion that the human user is skilled at
high level decision making while the computer is proficient at rapid computation. This thesis has focused on
the development of interactive editing tools for 2D texture images and has led to the development of a novel
texture manipulation system that allows:
– the concise painting of a texture;
– the concise cloning of textures;
– the concise alteration of texture element size.
The system allows complex operations to be performed on images with minimal user interaction. When
applied to the domain of image editing, this implies
that the user can instruct the system to perform complex changes to digital images without having to specify copious amounts of detail. In order to reduce the
user’s workload, the inherent self-similarity of textures
is exploited to interactively replicate editing operations globally over an image. This unique image system thereby reduces the user’s workload through semi-automation, resulting in an acutely concise user interface.
UCAM-CL-TR-585
Mark S. D. Ashdown:
Personal projected displays
March 2004, 150 pages, PDF
PhD thesis (Churchill College, September 2003)
Abstract: Since the inception of the personal computer, the interface presented to users has been defined by the monitor screen, keyboard, and mouse, and
by the framework of the desktop metaphor. It is very
different from a physical desktop which has a large
horizontal surface, allows paper documents to be arranged, browsed, and annotated, and is controlled via
continuous movements with both hands. The desktop
metaphor will not scale to such a large display; the continuing profusion of paper, which is used as much as
ever, attests to its unsurpassed affordances as a medium
for manipulating documents; and despite its proven
manual and cognitive benefits, two-handed input is still
not used in computer interfaces.
I present a system called the Escritoire that uses a
novel configuration of overlapping projectors to create a large desk display that fills the area of a conventional desk and also has a high resolution region
in front of the user for precise work. The projectors
need not be positioned exactly—the projected imagery
is warped using standard 3D video hardware to compensate for rough projector positioning and oblique
projection. Calibration involves computing planar homographies between the 2D co-ordinate spaces of the
warped textures, projector framebuffers, desk, and input devices.
The video hardware can easily perform the necessary warping and achieves 30 frames per second for the
dual-projector display. Oblique projection has proved
to be a solution to the problem of occlusion common to
front-projection systems. The combination of an electromagnetic digitizer and an ultrasonic pen allows simultaneous input with two hands. The pen for the non-dominant hand is simpler and coarser than that for
the dominant hand, reflecting the differing roles of the
hands in bimanual manipulation. I give a new algorithm for calibrating a pen, that uses piecewise linear
interpolation between control points. I also give an algorithm to calibrate a wall display at distance using a
device whose position and orientation are tracked in
three dimensions.
The Escritoire software is divided into a client that
exploits the video hardware and handles the input devices, and a server that processes events and stores all
of the system state. Multiple clients can connect to a
single server to support collaboration. Sheets of virtual paper on the Escritoire can be put in piles which
can be browsed and reordered. As with physical paper
this allows items to be arranged quickly and informally,
avoiding the premature work required to add an item
to a hierarchical file system. Another interface feature is
pen traces, which allow remote users to gesture to each
other. I report the results of tests with individuals and
with pairs collaborating remotely. Collaborating participants found an audio channel and the shared desk surface much more useful than a video channel showing
their faces.
The Escritoire is constructed from commodity components, and unlike multi-projector display walls its
cost is feasible for an individual user and it fits into
a normal office setting. It demonstrates a hardware
configuration, calibration algorithm, graphics warping
process, set of interface features, and distributed architecture that can make personal projected displays a reality.
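The planar homographies mentioned in the abstract above relate pairs of 2D co-ordinate spaces. The following is a minimal sketch of how such a mapping is applied to a point and how two mappings are chained, assuming each homography is already known as a 3x3 matrix. The matrices, the space names and the function names are hypothetical and are not taken from the Escritoire software.

```python
def apply_homography(h, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)  # divide out the homogeneous co-ordinate

def compose(a, b):
    """Return the homography equivalent to applying b first, then a."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]

# Hypothetical calibration results: desk -> framebuffer, framebuffer -> texture.
desk_to_fb = [[2.0, 0.0, 10.0], [0.0, 2.0, 20.0], [0.0, 0.0, 1.0]]
fb_to_tex = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]

# Chaining the two gives a direct desk -> texture mapping.
desk_to_tex = compose(fb_to_tex, desk_to_fb)
```

Because homographies compose by matrix multiplication, a chain of calibrated spaces can be collapsed into a single matrix per pair of spaces rather than mapping each point through every intermediate space.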
UCAM-CL-TR-586
András Belokosztolszki:
Role-based access control policy
administration
March 2004, 170 pages, PDF
PhD thesis (King’s College, November 2003)
Abstract: The wide proliferation of the Internet has set
new requirements for access control policy specification. Due to the demand for ad-hoc cooperation between organisations, applications are no longer isolated
from each other; consequently, access control policies
face a large, heterogeneous, and dynamic environment.
Policies, while maintaining their main functionality, go
through many minor adaptations, evolving as the environment changes.
In this thesis we investigate the long-term administration of role-based access control (RBAC) – in particular OASIS RBAC – policies.
With the aim of encapsulating persistent goals of
policies we introduce extensions in the form of meta-policies. These meta-policies, whose expected lifetime
is longer than the lifetime of individual policies, contain extra information and restrictions about policies. It
is expected that successive policy versions are checked
at policy specification time to ensure that they comply with the requirements and guidelines set by meta-policies.
In the first of the three classes of meta-policies we
group together policy components by annotating them
with context labels. Based on this grouping and an information flow relation on context labels, we limit the
way in which policy components may be connected to
other component groups. We use this to partition conceptually disparate portions of policies, and reference
these coherent portions to specify policy restrictions
and policy enforcement behaviour.
In our second class of meta-policies – compliance
policies – we specify requirements on an abstract policy
model. We then use this for static policy checking. As
compliance tests are performed at policy specification
time, compliance policies may include restrictions that
either cannot be included in policies, or whose inclusion would result in degraded policy enforcement performance. We also indicate how to use compliance policies to provide information about organisational policies without disclosing sensitive information.
The final class of our meta-policies, called interface
policies, is used to help set up and maintain cooperation
among organisations by enabling them to use components from each other’s policies. Being based on compliance policies, they use an abstract policy component
model, and can also specify requirements for both component exporters and importers. Using such interface
policies we can reconcile compatibility issues between
cooperating parties automatically.
Finally, building on our meta-policies, we consider
policy evolution and self-administration, according to
which we treat RBAC policies as distributed resources
to which access is specified with the help of RBAC itself.
This enables environments where policies are maintained by many administrators who have varying levels
of competence, trust, and jurisdiction.
We have tested all of these concepts in Desert, our
proof of concept implementation.
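The idea of checking successive policy versions against a longer-lived meta-policy at specification time, described in the abstract above, can be illustrated with a deliberately simplified sketch. It does not use OASIS RBAC syntax or the Desert implementation: the policy is assumed to be a plain mapping from role names to privilege sets, and the only kind of requirement modelled is a pair of privileges that no single role may hold together. All names are hypothetical.

```python
def check_compliance(policy, forbidden_pairs):
    """Return a sorted list of (role, privilege_a, privilege_b) violations."""
    violations = []
    for role, privileges in policy.items():
        for a, b in forbidden_pairs:
            if a in privileges and b in privileges:
                violations.append((role, a, b))
    return sorted(violations)

# Hypothetical meta-policy: creating and approving a payment must be separated.
forbidden = [("create_payment", "approve_payment")]

# Two successive versions of a hypothetical policy.
policy_v1 = {
    "clerk": {"create_payment"},
    "manager": {"approve_payment"},
}
policy_v2 = {
    "clerk": {"create_payment", "approve_payment"},  # a later, careless edit
    "manager": {"approve_payment"},
}

report_v1 = check_compliance(policy_v1, forbidden)
report_v2 = check_compliance(policy_v2, forbidden)
```

In this sketch `report_v1` is empty and `report_v2` names the `clerk` role, so the second version would be rejected before deployment and the check adds no cost at enforcement time.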
UCAM-CL-TR-587
Paul Alexander Cunningham:
Verification of asynchronous circuits
April 2004, 174 pages, PDF
PhD thesis (Gonville and Caius College, January
2002)
Abstract: The purpose of this thesis is to introduce
proposition-oriented behaviours and apply them to the
verification of asynchronous circuits. The major contribution of proposition-oriented behaviours is their ability to extend existing formal notations to permit the
explicit use of both signal levels and transitions.
This thesis begins with the formalisation of
proposition-oriented behaviours in the context of gate
networks, and with the set-theoretic extension of both
regular-expressions and trace-expressions to reason
over proposition-oriented behaviours. A new trace-expression construct, referred to as biased composition,
is also introduced. Algorithmic realisation of these set-theoretic extensions is documented using a special form
of finite automata called proposition automata. A verification procedure for conformance of gate networks
to a set of proposition automata is described in which
each proposition automaton may be viewed either as
a constraint or a specification. The implementation of
this procedure as an automated verification program
called Veraci is summarised, and a number of example
Veraci programs are used to demonstrate contributions
of proposition-oriented behaviour to asynchronous circuit design. These contributions include level-event unification, event abstraction, and relative timing assumptions using biased composition. The performance of Veraci is also compared to an existing event-oriented verification program called Versify, the result of this comparison being a consistent performance gain using Veraci over Versify.
This thesis concludes with the design and implementation of a 2048 bit dual-rail asynchronous
Montgomery exponentiator, MOD EXP, in a 0.18µm
standard-cell process. The application of Veraci to the
design of MOD EXP is summarised, and the practical benefits of proposition-oriented verification are discussed.
UCAM-CL-TR-588
Panit Watcharawitch:
MulTEP: A MultiThreaded
Embedded Processor
May 2004, 190 pages, PDF
PhD thesis (Newnham College, November 2003)
Abstract: Conventional embedded microprocessors
have traditionally followed the footsteps of high-end
processor design to achieve high performance. Their
underlying architectures prioritise tasks by time-critical
interrupts and rely on software to perform scheduling
tasks. Single threaded execution relies on instruction-based probabilistic techniques, such as speculative execution and branch prediction, which are unsuitable for
112
embedded systems when real-time performance guarantees need to be met. Multithreading appears to be
a feasible solution for embedded processors. Threadlevel parallelism has a potential to overcome the limitations of insufficient instruction-level parallelism to hide
the increasing memory latencies. MulTEP is designed
to provide high performance thread-level parallelism,
real-time characteristics, a flexible number of threads
and low incremental cost per thread for the embedded
system. In its architecture, a matching-store synchronisation mechanism allows a thread to wait for multiple
data items. A tagged up/down dynamic-priority hardware scheduler is provided for real-time scheduling.
Pre-loading, pre-fetching and colour-tagging techniques
are implemented to allow context switches without any
overhead. The architecture provides four additional
multithreading instructions for programmers and advanced compilers to create programs with low-overhead
multithreaded operations. Experimental results demonstrate that multithreading can be effectively used to improve performance and system utilisation. Latency operations that would otherwise stall the pipeline are hidden by the execution of the other threads. The hardware scheduler provides priority scheduling, which is
suitable for real-time embedded applications.
UCAM-CL-TR-589
Glynn Winskel, Francesco Zappa Nardelli:
new-HOPLA — a higher-order
process language with name
generation
May 2004, 16 pages, PDF
Abstract: This paper introduces new-HOPLA, a concise but powerful language for higher-order nondeterministic processes with name generation. Its origins as
a metalanguage for domain theory are sketched but for
the most part the paper concentrates on its operational
semantics. The language is typed, the type of a process
describing the shape of the computation paths it can
perform. Its transition semantics, bisimulation, congruence properties and expressive power are explored. Encodings of π-calculus and HOπ are presented.
UCAM-CL-TR-590
Peter R. Pietzuch:
Hermes: A scalable event-based
middleware
June 2004, 180 pages, PDF
PhD thesis (Queens’ College, February 2004)
Abstract: Large-scale distributed systems require new
middleware paradigms that do not suffer from the limitations of traditional request/reply middleware. These
limitations include tight coupling between components,
a lack of information filtering capabilities, and support
for one-to-one communication semantics only. We argue that event-based middleware is a scalable and powerful new type of middleware for building large-scale
distributed systems. However, it is important that an
event-based middleware platform includes all the standard functionality that an application programmer expects from middleware.
In this thesis we describe the design and implementation of Hermes, a distributed, event-based middleware platform. The power and flexibility of Hermes
is illustrated throughout for two application domains:
Internet-wide news distribution and a sensor-rich, active building. Hermes follows a type- and attribute-based publish/subscribe model that places particular
emphasis on programming language integration by supporting type-checking of event data and event type
inheritance. To handle dynamic, large-scale environments, Hermes uses peer-to-peer techniques for autonomic management of its overlay network of event brokers and for scalable event dissemination. Its routing
algorithms, implemented on top of a distributed hash
table, use rendezvous nodes to reduce routing state in
the system, and include fault-tolerance features for repairing event dissemination trees. All this is achieved
without compromising scalability and efficiency, as is
shown by a simulational evaluation of Hermes routing.
The core functionality of an event-based middleware is extended with three higher-level middleware
services that address different requirements in a distributed computing environment. We introduce a novel
congestion control service that avoids congestion in the
overlay broker network during normal operation and
recovery after failure, and therefore enables a resource-efficient deployment of the middleware. The expressiveness of subscriptions in the event-based middleware is
enhanced with a composite event service that performs
the distributed detection of complex event patterns,
thus taking the burden away from clients. Finally, a security service adds access control to Hermes according
to a secure publish/subscribe model. This model supports fine-grained access control decisions so that separate trust domains can share the same overlay broker
network.
UCAM-CL-TR-591
Silas S. Brown:
Conversion of notations
June 2004, 159 pages, PDF
PhD thesis (St John’s College, November 2003)
Abstract: Music, engineering, mathematics, and many
other disciplines have established notations for writing
their documents. The effectiveness of each of these notations can be hampered by the circumstances in which
it is being used, or by a user’s disability or cultural
background. Adjusting the notation can help, but the
requirements of different cases often conflict, meaning
that each new document will have to be transformed
between many versions. Tools that support the programming of such transformations can also assist by allowing the creation of new notations on demand, which
is an under-explored option in the relief of educational
difficulties.
This thesis reviews some programming tools that
can be used to manipulate the tree-like structure of a
notation in order to transform it into another. It then
describes a system “4DML” that allows the programmer to create a “model” of the desired result, from
which the transformation is derived. This is achieved
by representing the structure in a geometric space with
many dimensions, where the model acts as an alternative frame of reference.
Example applications of 4DML include the transcription of songs and musical scores into various notations, the production of specially-customised notations
to assist a sight-impaired person in learning Chinese, an
unusual way of re-organising personal notes, a “website scraping” system for extracting data from on-line
services that provide only one presentation, and an aid
to making mathematics and diagrams accessible to people with severe print disabilities. The benefits and drawbacks of the 4DML approach are evaluated, and possible directions for future work are explored.
UCAM-CL-TR-592
Mike Bond, Daniel Cvrček,
Steven J. Murdoch:
Unwrapping the Chrysalis
June 2004, 15 pages, PDF
Abstract: We describe our experiences reverse engineering the Chrysalis-ITS Luna CA3 – a PKCS#11 compliant cryptographic token. Emissions analysis and security API attacks are viewed by many to be simpler and
more efficient than a direct attack on an HSM. But how
difficult is it to actually “go in the front door”? We describe how we unpicked the CA3 internal architecture
and abused its low-level API to impersonate a CA3 token in its cloning protocol – and extract PKCS#11 private keys in the clear. We quantify the effort involved in
developing and applying the skills necessary for such a
reverse-engineering attack. In the process, we discover
that the Luna CA3 has far more undocumented code
and functionality than is revealed to the end-user.
UCAM-CL-TR-593
Piotr Zieliński:
Paxos at war
June 2004, 30 pages, PDF
Abstract: The optimistic latency of Byzantine Paxos can
be reduced from three communication steps to two,
without using public-key cryptography. This is done
by making a decision when more than (n+3f)/2 acceptors report to have received the same proposal from the
leader, with n being the total number of acceptors and
f the number of the faulty ones. No further improvement in latency is possible, because every Consensus
algorithm must take at least two steps even in benign
settings. Moreover, if the leader is correct, our protocol
achieves the latency of at most three steps, even if some
other processes fail. These two properties make this the
fastest Byzantine agreement protocol proposed so far.
By running many instances of this algorithm in parallel, we can implement Vector Consensus and Byzantine Atomic Broadcast in two and three steps, respectively, which is two steps faster than any other known
algorithm.
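The decision rule quoted in this abstract can be illustrated with a short sketch. It covers only the counting step: deciding once strictly more than (n+3f)/2 acceptors report the same proposal. The function names and the dictionary of reports are hypothetical and are not taken from the report; the sketch does not model the rest of the protocol.

```python
from collections import Counter
from typing import Optional


def fast_decision_threshold(n: int, f: int) -> int:
    """Smallest report count that is strictly greater than (n + 3f) / 2."""
    return (n + 3 * f) // 2 + 1


def can_decide(reports: dict, n: int, f: int) -> Optional[str]:
    """Return the proposal reported by enough acceptors, or None.

    `reports` maps an acceptor id to the proposal it says it received
    from the leader (a hypothetical representation for illustration).
    """
    if not reports:
        return None
    value, votes = Counter(reports.values()).most_common(1)[0]
    return value if votes >= fast_decision_threshold(n, f) else None


# With n = 4 acceptors and f = 1 fault, (n + 3f) / 2 = 3.5,
# so all four acceptors must report the same proposal.
unanimous = {"a1": "v", "a2": "v", "a3": "v", "a4": "v"}
split = {"a1": "v", "a2": "v", "a3": "v", "a4": "w"}
```

Under these assumptions `can_decide(unanimous, 4, 1)` yields a decision, whereas `can_decide(split, 4, 1)` does not, since three matching reports do not exceed 3.5.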
UCAM-CL-TR-594
George Danezis:
Designing and attacking anonymous
communication systems
July 2004, 150 pages, PDF
PhD thesis (Queens’ College, January 2004)
Abstract: This report contributes to the field of anonymous communications over widely deployed communication networks. It describes novel schemes to protect
anonymity; it also presents powerful new attacks and
new ways of analysing and understanding anonymity
properties.
We present Mixminion, a new generation anonymous remailer, and examine its security against all
known passive and active cryptographic attacks. We
use the secure anonymous replies it provides to describe a pseudonym server, as an example of the anonymous protocols that Mixminion can support. The security of mix systems is then assessed against a compulsion threat model, in which an adversary can request
the decryption of material from honest nodes. A new
construction, the fs-mix, is presented that makes tracing messages by such an adversary extremely expensive.
Moving beyond the static security of anonymous
communication protocols, we define a metric based
on information theory that can be used to measure
anonymity. The analysis of the pool mix serves as
an example of its use. We then create a framework
within which we compare the traffic analysis resistance
provided by different mix network topologies. A new
topology, based on expander graphs, proves to be efficient and secure. The rgb-mix is also presented; this
implements a strategy to detect flooding attacks against
honest mix nodes and neutralise them by the use of
cover traffic.
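As an illustration of what an information-theoretic anonymity metric can look like, the sketch below computes the Shannon entropy of an attacker's probability distribution over candidate senders. The abstract does not spell out the formula, so treating the metric as plain Shannon entropy is an assumption here, and the function name is hypothetical.

```python
import math


def anonymity_entropy(probabilities):
    """Shannon entropy, in bits, of a distribution over candidate senders.

    `probabilities` holds the attacker's estimate that each candidate
    sent the message; the values must sum to 1.
    """
    if not math.isclose(sum(probabilities), 1.0, rel_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability candidates contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


uniform = anonymity_entropy([1 / 8] * 8)           # eight equally likely senders
skewed = anonymity_entropy([0.9] + [0.1 / 7] * 7)  # one sender strongly suspected
```

A uniform distribution over eight senders gives three bits of anonymity; the skewed distribution gives less, reflecting what the attacker has learned.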
Finally a set of generic attacks are studied. Statistical disclosure attacks model the whole anonymous system as a black box, and are able to uncover
the relationships between long-term correspondents.
Stream attacks trace streams of data travelling through
anonymizing networks, and uncover the communicating parties very quickly. They both use statistical methods to drastically reduce the anonymity of users. Other
minor attacks are described against peer discovery and
route reconstruction in anonymous networks, as well
as the naïve use of anonymous replies.
UCAM-CL-TR-595
Pablo J. Arrighi:
Representations of quantum
operations, with applications to
quantum cryptography
July 2004, 157 pages, PDF
PhD thesis (Emmanuel College, 23 September 2003)
Abstract: Representations of quantum operations – We
start by introducing a geometrical representation (real
vector space) of quantum states and quantum operations. To do so we exploit an isomorphism from positive matrices to a subcone of the Minkowski future
light-cone. Pure states map onto certain light-like vectors, whilst the axis of revolution encodes the overall
probability of occurrence for the state. This extension
of the Generalized Bloch Sphere enables us to cater for
non-trace-preserving quantum operations, and in particular to view the per-outcome effects of generalized
measurements. We show that these consist of the product of an orthogonal transform about the axis of the
cone of revolution and a positive real symmetric linear transform. In the case of a qubit the representation
becomes all the more interesting since it elegantly associates, to each measurement element of a generalized
measurement, a Lorentz transformation in Minkowski
space. We formalize explicitly this correspondence between ‘observation of a quantum system’ and ‘special
relativistic change of inertial frame’. To end this part we
review the state-operator correspondence, which was
successfully exploited by Choi to derive the operator-sum representation of quantum operations. We go further and show that all of the important theorems concerning quantum operations can in fact be derived as
simple corollaries of those concerning quantum states.
Using this methodology we derive novel composition
laws upon quantum states and quantum operations,
Schmidt-type decompositions for bipartite pure states
and some powerful formulae relating to the correspondence.
Quantum cryptography – The key principle of
quantum cryptography could be summarized as follows. Honest parties communicate using quantum
states. To the eavesdropper these states are random
and non-orthogonal. In order to gather information
she must measure them, but this may cause irreversible
damage. Honest parties seek to detect her mischief by
checking whether certain quantum states are left intact.
Thus the tradeoff between the eavesdropper’s information
gain, and the disturbance she necessarily induces, can
be viewed as the power engine behind quantum cryptographic protocols. We begin by quantifying this tradeoff in the case of a measure distinguishing two non-orthogonal equiprobable pure states. A formula for this
tradeoff was first obtained by Fuchs and Peres, but
we provide a shorter, geometrical derivation (within
the framework of the above-mentioned conal representation). Next we proceed to analyze the Information
gain versus disturbance tradeoff in a scenario where
Alice and Bob interleave, at random, pairwise superpositions of two message words within their otherwise
classical communications. This work constitutes one
of the few results currently available regarding d-level
systems quantum cryptography, and seems to provide
a good general primitive for building such protocols.
The proof crucially relies on the state-operator correspondence formulae derived in the first part, together
with some methods by Banaszek. Finally we make use
of this analysis to prove the security of a ‘blind quantum computation’ protocol, whereby Alice gets Bob to
perform some quantum algorithm for her, but prevents
him from learning her input to this quantum algorithm.
UCAM-CL-TR-596
Keir Fraser, Steven Hand, Rolf Neugebauer,
Ian Pratt, Andrew Warfield,
Mark Williamson:
Reconstructing I/O
August 2004, 16 pages, PDF
Abstract: We present a next-generation architecture
that addresses problems of dependability, maintainability, and manageability of I/O devices and their software
drivers on the PC platform. Our architecture resolves
both hardware and software issues, exploiting emerging hardware features to improve device safety. Our
high-performance implementation, based on the Xen
virtual machine monitor, provides an immediate transition opportunity for today’s systems.
UCAM-CL-TR-597
Advaith Siddharthan:
Syntactic simplification and
text cohesion
August 2004, 195 pages, PDF
PhD thesis (Gonville and Caius College, November
2003)
Abstract: Syntactic simplification is the process of reducing the grammatical complexity of a text, while retaining its information content and meaning. The aim
of syntactic simplification is to make text easier to comprehend for human readers, or process by programs. In
this thesis, I describe how syntactic simplification can
be achieved using shallow robust analysis, a small set of
hand-crafted simplification rules and a detailed analysis
of the discourse-level aspects of syntactically rewriting
text. I offer a treatment of relative clauses, apposition,
coordination and subordination.
I present novel techniques for relative clause and appositive attachment. I argue that these attachment decisions are not purely syntactic. My approaches rely
on a shallow discourse model and on animacy information obtained from a lexical knowledge base. I also
show how clause and appositive boundaries can be determined reliably using a decision procedure based on
local context, represented by part-of-speech tags and
noun chunks.
I then formalise the interactions that take place between syntax and discourse during the simplification
process. This is important because the usefulness of
syntactic simplification in making a text accessible to a
wider audience can be undermined if the rewritten text
lacks cohesion. I describe how various generation issues
like sentence ordering, cue-word selection, referring-expression generation, determiner choice and pronominal use can be resolved so as to preserve conjunctive
and anaphoric cohesive-relations during syntactic simplification.
In order to perform syntactic simplification, I have
had to address various natural language processing
problems, including clause and appositive identification and attachment, pronoun resolution and referring-expression generation. I evaluate my approaches to
solving each problem individually, and also present a
holistic evaluation of my syntactic simplification system.
UCAM-CL-TR-598
James J. Leifer, Robin Milner:
Transition systems, link graphs and
Petri nets
August 2004, 64 pages, PDF
Abstract: A framework is defined within which reactive systems can be studied formally. The framework
is based upon s-categories, a new variety of categories,
within which reactive systems can be set up in such a
way that labelled transition systems can be uniformly
extracted. These lead in turn to behavioural preorders
and equivalences, such as the failures preorder (treated
elsewhere) and bisimilarity, which are guaranteed to be
congruential. The theory rests upon the notion of relative pushout previously introduced by the authors. The
framework is applied to a particular graphical model
known as link graphs, which encompasses a variety
of calculi for mobile distributed processes. The specific
theory of link graphs is developed. It is then applied
to an established calculus, namely condition-event Petri
nets. In particular, a labelled transition system is derived for condition-event nets, corresponding to a natural notion of observable actions in Petri net theory. The
transition system yields a congruential bisimilarity coinciding with one derived directly from the observable
actions. This yields a calibration of the general theory
of reactive systems and link graphs against known specific theories.
UCAM-CL-TR-599
Mohamed F. Hassan:
Further analysis of ternary and
3-point univariate subdivision
schemes
August 2004, 9 pages, PDF
Abstract: The precision set, approximation order and
Hölder exponent are derived for each of the univariate subdivision schemes described in Technical Report
UCAM-CL-TR-520.
UCAM-CL-TR-600
Brian Ninham Shand:
Trust for resource control:
Self-enforcing automatic rational
contracts between computers
August 2004, 154 pages, PDF
PhD thesis (Jesus College, February 2004)
Abstract: Computer systems need to control access to
their resources, in order to give precedence to urgent or
important tasks. This is increasingly important in networked applications, which need to interact with other
machines but may be subject to abuse unless protected
from attack. To do this effectively, they need an explicit
resource model, and a way to assess others’ actions in
terms of it. This dissertation shows how the actions
can be represented using resource-based computational
contracts, together with a rich trust model which monitors and enforces contract compliance.
Related research in the area has focused on individual aspects of this problem, such as resource pricing
and auctions, trust modelling and reputation systems,
or resource-constrained computing and resource-aware
middleware. These need to be integrated into a single
model, in order to provide a general framework for
computing by contract.
This work explores automatic computerized contracts for negotiating and controlling resource usage in
a distributed system. Contracts express the terms under
which client and server promise to exchange resources,
such as processor time in exchange for money, using
a constrained language which can be automatically interpreted. A novel, distributed trust model is used to
enforce these promises, and this also supports trust delegation through cryptographic certificates. The model
is formally proved to have appropriate properties of
safety and liveness, which ensure that cheats cannot
systematically gain resources by deceit, and that mutually profitable contracts continue to be supported.
The contract framework has many applications, in
automating distributed services and in limiting the disruptiveness of users’ programs. Applications such as
resource-constrained sandboxes, operating system multimedia support and automatic distribution of personal
address book entries can all treat the user’s time as
a scarce resource, to trade off computational costs
against user distraction. Similarly, commercial Grid services can prioritise computations with contracts, while
a cooperative service such as distributed composite
event detection can use contracts for detector placement and load balancing. Thus the contract framework provides a general purpose tool for managing distributed computation, allowing participants to take calculated risks and rationally choose which contracts to
perform.
UCAM-CL-TR-601
Hasan Amjad:
Combining model checking and
theorem proving
September 2004, 131 pages, PDF
PhD thesis (Trinity College, March 2004)
Abstract: We implement a model checker for the modal
mu-calculus as a derived rule in a fully expansive mechanical theorem prover, without causing an unacceptable performance penalty.
We use a restricted form of a higher order logic
representation calculus for binary decision diagrams
(BDDs) to interface the model checker to a high-performance BDD engine. This is used with a formalised theory of the modal mu-calculus (which we
also develop) for model checking in which all steps
of the algorithm are justified by fully expansive proof.
This provides a fine-grained integration of model
checking and theorem proving using a mathematically
rigorous interface. The generality of our theories allows us to perform much of the proof offline, in contrast with earlier work. This substantially reduces the
inevitable performance penalty of doing model checking by proof.
To demonstrate the feasibility of our approach, optimisations to the model checking algorithm are added.
We add naive caching and also perform advanced
caching for nested non-alternating fixed-point computations.
Finally, the usefulness of the work is demonstrated.
We leverage our theory by proving translations to simpler logics that are in more widespread use. We then
implement an executable theory for counterexample-guided abstraction refinement that also uses a SAT
solver. We verify properties of a bus architecture in use
in industry as well as a pedagogical arithmetic and logic
unit. The benchmarks show an acceptable performance
penalty, and the results are correct by construction.
UCAM-CL-TR-602
Hasan Amjad:
Model checking the AMBA protocol
in HOL
September 2004, 27 pages, PDF
Abstract: The Advanced Microcontroller Bus Architecture (AMBA) is an open System-on-Chip bus protocol
for high-performance buses on low-power devices. In
this report we implement a simple model of AMBA
and use model checking and theorem proving to verify
latency, arbitration, coherence and deadlock freedom
properties of the implementation.
UCAM-CL-TR-603
Robin Milner:
Bigraphs whose names have
multiple locality
September 2004, 15 pages, PDF
Abstract: The previous definition of binding bigraphs
is generalised so that local names may be located in
more than one region, allowing more succinct and flexible presentation of bigraphical reactive systems. This
report defines the generalisation, verifies that it retains
relative pushouts, and introduces a new notion of bigraph extension; this admits a wider class of parametric
reaction rules. Extension is shown to be well-behaved
algebraically; one consequence is that, as in the original
definition of bigraphs, discrete parameters are sufficient
to generate all reactions.
UCAM-CL-TR-604
Andrei Serjantov:
On the anonymity of anonymity
systems
October 2004, 162 pages, PDF
PhD thesis (Queens’ College, March 2003)
Abstract: Anonymity on the Internet is a property commonly identified with privacy of electronic communications. A number of different systems exist which claim
to provide anonymous email and web browsing, but
their effectiveness has hardly been evaluated in practice. In this thesis we focus on the anonymity properties of such systems. First, we show how the anonymity
of anonymity systems can be quantified, pointing out
flaws with existing metrics and proposing our own. In
the process we distinguish the anonymity of a message
and that of an anonymity system.
Secondly, we focus on the properties of building
blocks of mix-based (email) anonymity systems, evaluating their resistance to powerful blending attacks,
their delay, their anonymity under normal conditions
and other properties. This leads us to methods of computing anonymity for a particular class of mixes – timed
mixes – and a new binomial mix.
Next, we look at the anonymity of a message going
through an entire anonymity system based on a mix
network architecture. We construct a semantics of a
network with threshold mixes, define the information
observable by an attacker, and give a principled definition of the anonymity of a message going through such
a network.
We then consider low latency connection-based
anonymity systems, giving concrete attacks and describing methods of protection against them. In particular, we show that Peer-to-Peer anonymity systems
provide less anonymity against the global passive adversary than ones based on a “classic” architecture.
Finally, we give an account of how anonymity can
be used in censorship resistant systems. These are designed to provide availability of documents, while facing threats from a powerful adversary. We show how
anonymity can be used to hide the identity of the servers
where each of the documents is stored, thus making
them harder to remove from the system.
UCAM-CL-TR-605
Peter Sewell, James J. Leifer,
Keith Wansbrough, Mair Allen-Williams,
Francesco Zappa Nardelli, Pierre Habouzit,
Viktor Vafeiadis:
Acute: High-level programming
language design for distributed
computation
Design rationale and language
definition
October 2004, 193 pages, PDF
Abstract: This paper studies key issues for distributed
programming in high-level languages. We discuss the
design space and describe an experimental language,
Acute, which we have defined and implemented.
Acute extends an OCaml core to support distributed development, deployment, and execution, allowing type-safe interaction between separately-built
programs. It is expressive enough to enable a wide variety of distributed infrastructure layers to be written as
simple library code above the byte-string network and
persistent store APIs, disentangling the language runtime from communication.
This requires a synthesis of novel and existing features:
(1) type-safe marshalling of values between programs;
(2) dynamic loading and controlled rebinding to local resources;
(3) modules and abstract types with abstraction
boundaries that are respected by interaction;
(4) global names, generated either freshly or based
on module hashes: at the type level, as runtime names
for abstract types; and at the term level, as channel
names and other interaction handles;
(5) versions and version constraints, integrated with
type identity;
(6) local concurrency and thread thunkification; and
(7) second-order polymorphism with a namecase
construct.
We deal with the interplay among these features and
the core, and develop a semantic definition that tracks
abstraction boundaries, global names, and hashes
throughout compilation and execution, but which still
admits an efficient implementation strategy.
UCAM-CL-TR-606
Nicholas Nethercote:
Dynamic binary analysis and
instrumentation
November 2004, 177 pages, PDF
PhD thesis (Trinity College, November 2004)
Abstract: Dynamic binary analysis (DBA) tools such
as profilers and checkers help programmers create better software. Dynamic binary instrumentation (DBI)
frameworks make it easy to build new DBA tools. This
dissertation advances the theory and practice of dynamic binary analysis and instrumentation, with an emphasis on the importance of the use and support of
metadata.
The dissertation has three main parts.
The first part describes a DBI framework called Valgrind which provides novel features to support heavyweight DBA tools that maintain rich metadata, especially location metadata: the shadowing of every register and memory location with a metavalue. Location
metadata is used in shadow computation, a kind of
DBA where every normal operation is shadowed by an
abstract operation.
The second part describes three powerful DBA
tools. The first tool performs detailed cache profiling.
The second tool does an old kind of dynamic analysis—
bounds-checking—in a new way. The third tool produces dynamic data flow graphs, a novel visualisation
that cuts to the essence of a program’s execution. All
three tools were built with Valgrind, and rely on Valgrind’s support for heavyweight DBA and rich metadata, and the latter two perform shadow computation.
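The idea of shadow computation named in this abstract, where each ordinary operation is paired with an abstract operation on metadata, can be shown with a toy sketch. It is not Valgrind's implementation or API: the `Shadowed` type, the choice of a single "defined" bit as the metavalue, and `shadow_add` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shadowed:
    """A value paired with a metavalue; here, whether it was initialised."""
    value: int
    defined: bool


def shadow_add(a: Shadowed, b: Shadowed) -> Shadowed:
    """Perform the normal addition alongside its shadow operation.

    The shadow rule assumed here: the result is defined only if both
    operands are defined.
    """
    return Shadowed(a.value + b.value, a.defined and b.defined)


initialised = Shadowed(2, True)
uninitialised = Shadowed(3, False)   # stands in for an uninitialised location
total = shadow_add(initialised, uninitialised)
```

Here `total.value` is 5 while `total.defined` is false, so a checker built on this scheme could flag a later use of `total` as depending on uninitialised data.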
The third part describes a novel system of semiformal descriptions of DBA tools. It gives many example descriptions, and also considers in detail exactly
what dynamic analysis is.
The dissertation makes six main contributions.
First, the descriptions show that metadata is the key
component of dynamic analysis; in particular, whereas
static analysis predicts approximations of a program’s
future, dynamic analysis remembers approximations of
a program’s past, and these approximations are exactly
what metadata is.
Second, the example tools show that rich metadata
and shadow computation make for powerful and novel
DBA tools that do more than the traditional tracing and
profiling.
Third, Valgrind and the example tools show that a
DBI framework can make it easy to build heavyweight
DBA tools, by providing good support for rich metadata and shadow computation.
Fourth, the descriptions are a precise and concise
way of characterising tools, provide a directed way of
thinking about tools that can lead to better implementations, and indicate the theoretical upper limit of the
power of DBA tools in general.
Fifth, the three example tools are interesting in their
own right, and the latter two are novel.
Finally, the entire dissertation provides many details, and represents a great deal of condensed experience, about implementing DBI frameworks and DBA
tools.
UCAM-CL-TR-607
Neil E. Johnson:
Code size optimization for embedded
processors
November 2004, 159 pages, PDF
PhD thesis (Robinson College, May 2004)
Abstract: This thesis studies the problem of reducing
code size produced by an optimizing compiler. We develop the Value State Dependence Graph (VSDG) as
a powerful intermediate form. Nodes represent computation, and edges represent value (data) and state
(control) dependencies between nodes. The edges specify a partial ordering of the nodes, sufficient
to maintain the I/O semantics of the source program,
while allowing optimizers greater freedom to move
nodes within the program to achieve better (smaller)
code. Optimizations, both classical and new, transform
the graph through graph rewriting rules prior to code
generation. Additional (semantically inessential) state
edges are added to transform the VSDG into a Control
Flow Graph, from which target code is generated.
We show how procedural abstraction can be advantageously applied to the VSDG. Graph patterns are
extracted from a program’s VSDG. We then select repeated patterns giving the greatest size reduction, generate new functions from these patterns, and replace all
occurrences of the patterns in the original VSDG with
calls to these abstracted functions. Several embedded
processors have load- and store-multiple instructions,
representing several loads (or stores) as one instruction.
We present a method, benefiting from the VSDG form,
for using these instructions to reduce code size by provisionally combining loads and stores before code generation. The final contribution of this thesis is a combined register allocation and code motion (RACM) algorithm. We show that our RACM algorithm formulates these two previously antagonistic phases as one
combined pass over the VSDG, transforming the graph
(moving or cloning nodes, or spilling edges) to fit within
the physical resources of the target processor.
We have implemented our ideas within a prototype
C compiler and suite of VSDG optimizers, generating
code for the Thumb 32-bit processor. Our results show
improvements for each optimization and that we can
achieve code sizes comparable to, and in some cases
better than, that produced by commercial compilers
with significant investments in optimization technology.
UCAM-CL-TR-608
Walt Yao:
Trust management for widely
distributed systems
November 2004, 191 pages, PDF
PhD thesis (Jesus College, February 2003)
Abstract: In recent years, we have witnessed the evolutionary development of a new breed of distributed systems. Systems of this type share a number of characteristics – highly decentralized, of Internet-grade scalability, and autonomous within their administrative domains. Most importantly, they are expected to operate collaboratively across both known and unknown
domains. Prime examples include peer-to-peer applications and open web services. Typically, authorization in
distributed systems is identity-based, e.g. access control
lists. However, approaches based on predefined identities are unsuitable for the new breed of distributed
systems because of the need to deal with unknown
users, i.e. strangers, and the need to manage a potentially large number of users and/or resources. Furthermore, effective administration and management of authorization in such systems requires: (1) natural mapping of organizational policies into security policies; (2)
managing collaboration of independently administered
domains/organizations; (3) decentralization of security
policies and policy enforcement.
This thesis describes Fidelis, a trust management
framework designed to address the authorization needs
for the next-generation distributed systems. A trust
management system is a term coined to refer to a unified framework for the specification of security policies, the representation of credentials, and the evaluation and enforcement of policy compliances. Based on
the concept of trust conveyance and a generic abstraction for trusted information as trust statements, Fidelis
provides a generic platform for building secure, trust-aware distributed applications. At the heart of the Fidelis framework is a language for the specification of
security policies, the Fidelis Policy Language (FPL), and
the inference model for evaluating policies expressed in
FPL. With the policy language and its inference model,
Fidelis is able to model recommendation-style policies
and policies with arbitrarily complex chains of trust
propagation.
Web services have rapidly been gaining significance
both in industry and research as a ubiquitous, next-generation middleware platform. The second half of the
thesis describes the design and implementation of the
Fidelis framework for the standard web service platform. The goal of this work is twofold: first, to demonstrate the practical feasibility of Fidelis, and second, to
investigate the use of a policy-driven trust management
framework for Internet-scale open systems. An important requirement in such systems is trust negotiation
that allows unfamiliar principals to establish mutual
trust and interact with confidence. Addressing this requirement, a trust negotiation framework built on top
of Fidelis is developed.
This thesis examines the application of Fidelis in
three distinctive domains: implementing generic role-based access control, trust management in the World
Wide Web, and an electronic marketplace comprising
unfamiliar and untrusted but collaborative organizations.
UCAM-CL-TR-609
Eleanor Toye, Anil Madhavapeddy,
Richard Sharp, David Scott, Alan Blackwell,
Eben Upton:
Using camera-phones to interact with
context-aware mobile services
December 2004, 23 pages, PDF
Abstract: We describe an interaction technique for controlling site-specific mobile services using commercially
available camera-phones, public information displays
and visual tags. We report results from an experimental study validating this technique in terms of pointing
speed and accuracy. Our results show that even novices
can use camera-phones to “point-and-click” on visual
tags quickly and accurately. We have built a publicly
available client/server software framework for rapid development of applications that exploit our interaction
technique. We describe two prototype applications that
were implemented using this framework and present
findings from user-experience studies based on these applications.
UCAM-CL-TR-610
Tommy Ingulfsen:
Influence of syntax on prosodic
boundary prediction
December 2004, 49 pages, PDF
MPhil thesis (Churchill College, July 2004)
Abstract: In this thesis we compare the effectiveness
of different syntactic features and syntactic representations for prosodic boundary prediction, setting out to
clarify which representations are most suitable for this
task. The results of a series of experiments show that it
is not possible to conclude that a single representation
is superior to all others. Three representations give rise
to similar experimental results. One of these representations is composed only of shallow features, which were
originally thought to have less predictive power than
deep features. Conversely, one of the deep representations that seemed to be best suited for our purposes
(syntactic chunks) turns out not to be among the three
best.
UCAM-CL-TR-611
Neil A. Dodgson:
An heuristic analysis of the
classification of bivariate subdivision
schemes
December 2004, 18 pages, PDF
Abstract: Alexa [*] and Ivrissimtzis et al. [+] have
proposed a classification mechanism for bivariate subdivision schemes. Alexa considers triangular primal
schemes, Ivrissimtzis et al. generalise this both to
quadrilateral and hexagonal meshes and to dual and
mixed schemes. I summarise this classification and then
proceed to analyse it in order to determine which
classes of subdivision scheme are likely to contain useful members. My aim is to ascertain whether there are
any potentially useful classes which have not yet been
investigated or whether we can say, with reasonable
confidence, that all of the useful classes have already
been considered. I apply heuristics related to the mappings of element types (vertices, face centres, and mid-edges) to one another, to the preservation of symmetries, to the alignment of meshes at different subdivision
levels, and to the size of the overall subdivision mask.
My conclusion is that there are only a small number of
useful classes and that most of these have already been
investigated in terms of linear, stationary subdivision
schemes. There is some space for further work, particularly in the investigation of whether there are useful ternary linear, stationary subdivision schemes, but
it appears that future advances are more likely to lie
elsewhere.
[*] M. Alexa. Refinement operators for triangle
meshes. Computer Aided Geometric Design, 19(3):169-172, 2002.
[+] I. P. Ivrissimtzis, N. A. Dodgson, and M. A.
Sabin. A generative classification of mesh refinement
rules with lattice transformations. Computer Aided Geometric Design, 22(1):99-109, 2004.
UCAM-CL-TR-612
Alastair R. Beresford:
Location privacy in ubiquitous
computing
January 2005, 139 pages, PDF
PhD thesis (Robinson College, April 2004)
Abstract: The field of ubiquitous computing envisages
an era when the average consumer owns hundreds or
thousands of mobile and embedded computing devices.
These devices will perform actions based on the context
of their users, and therefore ubiquitous systems will
gather, collate and distribute much more personal information about individuals than computers do today.
Much of this personal information will be considered
private, and therefore mechanisms which allow users to
control the dissemination of these data are vital. Location information is a particularly useful form of context
in ubiquitous computing, yet its unconditional distribution can be very invasive.
This dissertation develops novel methods for providing location privacy in ubiquitous computing. Much
of the previous work in this area uses access control to
enable location privacy. This dissertation takes a different approach and argues that many location-aware applications can function with anonymised location data
and that, where this is possible, its use is preferable to
that of access control.
Suitable anonymisation of location data is not a trivial task: under a realistic threat model simply removing explicit identifiers does not anonymise location information. This dissertation describes why this is the
case and develops two quantitative security models for
anonymising location data: the mix zone model and the
variable quality model.
A trusted third-party can use one, or both, models to
ensure that all location events given to untrusted applications are suitably anonymised. The mix zone model
supports untrusted applications which require accurate location information about users in a set of disjoint physical locations. In contrast, the variable quality
model reduces the temporal or spatial accuracy of location information to maintain user anonymity at every
location.
Both models provide a quantitative measure of the
level of anonymity achieved; therefore any given situation can be analysed to determine the amount of information an attacker can gain through analysis of the
anonymised data. The suitability of both these models
is demonstrated and the level of location privacy available to users of real location-aware applications is measured.
UCAM-CL-TR-613
David J. Scott:
Abstracting application-level security
policy for ubiquitous computing
January 2005, 186 pages, PDF
PhD thesis (Robinson College, September 2004)
Abstract: In the future world of Ubiquitous Computing, tiny embedded networked computers will be found
in everything from mobile phones to microwave ovens.
Thanks to improvements in technology and software
engineering, these computers will be capable of running sophisticated new applications constructed from
mobile agents. Inevitably, many of these systems will
contain application-level vulnerabilities; errors caused
by either unanticipated mobility or interface behaviour.
Unfortunately existing methods for applying security
policy – network firewalls – are inadequate to control
and protect the hordes of vulnerable mobile devices. As
more and more critical functions are handled by these
systems, the potential for disaster is increasing rapidly.
To counter these new threats, this report champions the approach of using new application-level security policy languages in combination to protect vulnerable applications. Policies are abstracted from main
application code, facilitating both analysis and future
maintenance. As well as protecting existing applications, such policy systems can help as part of a security-aware design process when building new applications
from scratch.
Three new application-level policy languages are
contributed, each addressing a different kind of vulnerability. Firstly, the policy language MRPL allows the creation of Mobility Restriction Policies, based on a unified spatial model which represents both the physical location of objects and the virtual location of mobile
code. Secondly, the policy language SPDL-2 protects
applications against a large number of common errors
by allowing the specification of per-request/response
validation and transformation rules. Thirdly, the policy language SWIL allows interfaces to be described
as automata which may be analysed statically by a
model-checker before being checked dynamically in
an application-level firewall. When combined together,
these three languages provide an effective means for
preventing otherwise critical application-level vulnerabilities.
Systems implementing these policy languages have
been built; an implementation framework is described
and encouraging performance results and analysis are
presented.
UCAM-CL-TR-614
Robin Milner:
Pure bigraphs
January 2005, 66 pages, PDF
Abstract: Bigraphs are graphs whose nodes may be
nested, representing locality, independently of the edges
connecting them. They may be equipped with reaction
rules, forming a bigraphical reactive system (Brs) in
which bigraphs can reconfigure themselves. Brss aim
to unify process calculi, and to model applications,
such as pervasive computing, where locality and mobility are prominent. The paper is devoted to the theory
of pure bigraphs, which underlie various more refined
forms. It begins by developing a more abstract structure, a wide reactive system Wrs, of which a Brs is an
instance; in this context, labelled transitions are defined
in such a way that the induced bisimilarity is a congruence.
This work is then specialised to Brss, whose graphical structure allows many refinements of the dynamic
theory. Elsewhere it is shown that behavioural analysis for Petri nets, π-calculus and mobile ambients can
all be recovered in the uniform framework of bigraphs.
The latter part of the paper emphasizes the parts of bigraphical theory that are common to these applications,
especially the treatment of dynamics via labelled transitions. As a running example, the theory is applied to
finite pure CCS, whose resulting transition system and
bisimilarity are analysed in detail.
The paper also discusses briefly the use of bigraphs
to model both pervasive computing and biological systems.
UCAM-CL-TR-615
Evangelos Kotsovinos:
Global public computing
January 2005, 229 pages, PDF
PhD thesis (Trinity Hall, November 2004)
Abstract: High-bandwidth networking and cheap computing hardware are leading to a world in which the resources of one machine are available to groups of users
beyond their immediate owner. This trend is visible in
many different settings. Distributed computing, where
applications are divided into parts that run on different machines for load distribution, geographical dispersion, or robustness, has recently found new fertile
ground. Grid computing promises to provide a common framework for scheduling scientific computation
and managing the associated large data sets. Proposals
for utility computing envision a world in which businesses rent computing bandwidth in server farms on-demand instead of purchasing and maintaining servers
themselves.
All such architectures target particular user and application groups or deployment scenarios, where simplifying assumptions can be made. They expect centralised ownership of resources, cooperative users, and
applications that are well-behaved and compliant to a
specific API or middleware. Members of the public who
are not involved in Grid communities or wish to deploy out-of-the-box distributed services, such as game
servers, have no means to acquire resources on large
numbers of machines around the world to launch their
tasks.
This dissertation proposes a new distributed computing paradigm, termed global public computing,
which allows any user to run any code anywhere.
Such platforms price computing resources, and ultimately charge users for resources consumed. This dissertation presents the design and implementation of
the XenoServer Open Platform, putting this vision into
practice. The efficiency and scalability of the developed
mechanisms are demonstrated by experimental evaluation; the prototype platform allows the global-scale deployment of complex services in less than 45 seconds,
and could scale to millions of concurrent sessions without presenting performance bottlenecks.
To facilitate global public computing, this work
addresses several research challenges. It introduces
reusable mechanisms for representing, advertising, and
supporting the discovery of resources. To allow flexible and federated control of resource allocation by all
stakeholders involved, it proposes a novel role-based
resource management framework for expressing and
combining distributed management policies. Furthermore, it implements effective service deployment models for launching distributed services on large numbers
of machines around the world easily, quickly, and efficiently. To keep track of resource consumption and pass
charges on to consumers, it devises an accounting and
charging infrastructure.
UCAM-CL-TR-616
Donnla Nic Gearailt:
Dictionary characteristics in
cross-language information retrieval
February 2005, 158 pages, PDF
PhD thesis (Gonville and Caius College, February
2003)
Abstract: In the absence of resources such as a suitable
MT system, translation in Cross-Language Information
Retrieval (CLIR) consists primarily of mapping query
terms to a semantically equivalent representation in the
target language. This can be accomplished by looking
up each term in a simple bilingual dictionary. The main
problem here is deciding which of the translations provided by the dictionary for each query term should
be included in the query translation. We tackled this
problem by examining different characteristics of the
system dictionary. We found that dictionary properties
such as scale (the average number of translations per
term), translation repetition (providing the same translation for a term more than once in a dictionary entry,
for example, for different senses of a term), and dictionary coverage rate (the percentage of query terms for
which the dictionary provides a translation) can have
a profound effect on retrieval performance. Dictionary
properties were explored in a series of carefully controlled tests, designed to evaluate specific hypotheses.
These experiments showed that (a) contrary to expectation, smaller-scale dictionaries resulted in better performance than large-scale ones, and (b) when appropriately managed, e.g. through strategies to ensure adequate translational coverage, dictionary-based CLIR
could perform as well as other CLIR methods discussed
in the literature. Our experiments showed that it is possible to implement an effective CLIR system with no
resources other than the system dictionary itself, provided this dictionary is chosen with careful examination of its characteristics, removing any dependency on
outside resources.
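The three dictionary properties named in this abstract can be computed directly from a bilingual dictionary held as a mapping from terms to translation lists. The following is a minimal sketch of those definitions only, not the thesis's experimental code; the function names, the toy English-to-French entries and the sample query are all hypothetical.

```python
def dictionary_scale(dictionary):
    """Scale: average number of translations per dictionary term."""
    if not dictionary:
        return 0.0
    return sum(len(t) for t in dictionary.values()) / len(dictionary)


def repeated_translations(dictionary):
    """Translation repetition: entries listed more than once within a term."""
    return sum(len(t) - len(set(t)) for t in dictionary.values())


def coverage_rate(dictionary, query_terms):
    """Coverage rate: percentage of query terms with at least one translation."""
    if not query_terms:
        return 0.0
    covered = sum(1 for term in query_terms if dictionary.get(term))
    return 100.0 * covered / len(query_terms)


# Hypothetical toy dictionary; the entries are illustrative only.
toy_dictionary = {
    "bank": ["banque", "rive", "banque"],  # "banque" repeated for two senses
    "river": ["rivière", "fleuve"],
    "money": ["argent"],
}
query = ["bank", "money", "loan", "river"]  # "loan" is not in the dictionary

scale = dictionary_scale(toy_dictionary)
repeats = repeated_translations(toy_dictionary)
coverage = coverage_rate(toy_dictionary, query)
```

For this toy data the scale is 2.0 translations per term, one translation is repeated, and three of the four query terms (75%) are covered.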
UCAM-CL-TR-617
Augustin Chaintreau, Pan Hui,
Jon Crowcroft, Christophe Diot,
Richard Gass, James Scott:
Pocket Switched Networks:
Real-world mobility and its
consequences for opportunistic
forwarding
February 2005, 26 pages, PDF
Abstract: Opportunistic networks make use of human
mobility and local forwarding in order to distribute
data. Information can be stored and passed, taking advantage of the device mobility, or forwarded over a
wireless link when an appropriate contact is met. Such
networks fall into the fields of mobile ad-hoc networking and delay-tolerant networking. In order to evaluate forwarding algorithms for these networks, accurate
data is needed on the intermittency of connections.
In this paper, the inter-contact time between two
transmission opportunities is observed empirically using four distinct sets of data, two having been specifically collected for this work, and two provided by other
research groups.
We discover that the distribution of inter-contact
time follows an approximate power law over a large
time range in all data sets. This observation is at odds
with the exponential decay expected by many currently
used mobility models. We demonstrate that opportunistic transmission schemes designed around these current models have poor performance under approximate
power-law conditions, but could be significantly improved by using limited redundant transmissions.
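To see why a power-law inter-contact distribution matters, it helps to compare its tail with an exponential one. The sketch below is an illustration under stated assumptions, not the paper's analysis: it draws synthetic inter-contact times from a pure power law, recovers the exponent with the standard maximum-likelihood (Hill) estimator, and compares the probability of a very long gap with that of an exponential distribution sharing the same median. The exponent 0.6, the minimum time 1.0 and the sample size are arbitrary choices, not figures from the report.

```python
import math
import random


def sample_power_law(alpha, t_min, n, rng):
    """Draw n times with P(T > t) = (t / t_min) ** -alpha for t >= t_min."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1].
    return [t_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def estimate_exponent(samples, t_min):
    """Maximum-likelihood (Hill) estimate of the tail exponent alpha."""
    tail = [t for t in samples if t >= t_min]
    return len(tail) / sum(math.log(t / t_min) for t in tail)


def power_law_tail(t, alpha, t_min):
    """P(T > t) under the power law."""
    return (t / t_min) ** -alpha


def exponential_tail(t, median):
    """P(T > t) for an exponential distribution with the given median."""
    return math.exp(-math.log(2.0) * t / median)


ALPHA, T_MIN = 0.6, 1.0  # assumed values, for illustration only
rng = random.Random(42)
samples = sample_power_law(ALPHA, T_MIN, 20000, rng)
alpha_hat = estimate_exponent(samples, T_MIN)

median = T_MIN * 2.0 ** (1.0 / ALPHA)  # median of the power law
t = 100.0 * T_MIN                      # a gap 100 times the minimum
heavy = power_law_tail(t, alpha_hat, T_MIN)
light = exponential_tail(t, median)
```

Under these assumptions `heavy` exceeds `light` by many orders of magnitude: long gaps between contacts that an exponential model treats as practically impossible remain fairly likely under a power law.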
UCAM-CL-TR-618
Mark R. Shinwell:
The Fresh Approach:
functional programming with names
and binders
February 2005, 111 pages, PDF
PhD thesis (Queens’ College, December 2004)
Abstract: This report concerns the development of a
language called Fresh Objective Caml, which is an extension of the Objective Caml language providing facilities for the manipulation of data structures representing syntax involving α-convertible names and binding
operations.
After an introductory chapter which includes a survey of related work, we describe the Fresh Objective
Caml language in detail. Next, we proceed to formalise a small core language which captures the essence
of Fresh Objective Caml; we call this Mini-FreshML.
We provide two varieties of operational semantics for
this language and prove them equivalent. Then in order to prove correctness properties of representations
of syntax in the language we introduce a new variety of domain theory called FM-domain theory, based
on the permutation model of name binding from Pitts
and Gabbay. We show how classical domain-theoretic
constructions—including those for the solution of recursive domain equations—fall naturally into this setting, where they are augmented by new constructions
to handle name-binding.
After developing the necessary domain theory, we
demonstrate how it may be exploited to give a monadic
denotational semantics to Mini-FreshML. This semantics in itself is quite novel and demonstrates how a
simple monad of continuations is sufficient to model
dynamic allocation of names. We prove that our denotational semantics is computationally adequate with
respect to the operational semantics—in other words,
equality of denotation implies observational equivalence. After this, we show how the denotational semantics may be used to prove our desired correctness properties.
In the penultimate chapter, we examine the implementation of Fresh Objective Caml, describing detailed
issues in the compiler and runtime systems. Then in the
final chapter we close the report with a discussion of future avenues of research and an assessment of the work
completed so far.
UCAM-CL-TR-619
James R. Bulpin:
Operating system support for
simultaneous multithreaded
processors
February 2005, 130 pages, PDF
PhD thesis (King’s College, September 2004)
Abstract: Simultaneous multithreaded (SMT) processors are able to execute multiple application threads in
parallel in order to improve the utilisation of the processor’s execution resources. The improved utilisation
provides a higher processor-wide throughput at the expense of the performance of each individual thread.
Simultaneous multithreading has recently been incorporated into the Intel Pentium 4 processor family as
“Hyper-Threading”. While there is already basic support for it in popular operating systems, that support
does not take advantage of any knowledge about the
characteristics of SMT, and therefore does not fully exploit the processor.
SMT presents a number of challenges to operating system designers. The threads’ dynamic sharing of
processor resources means that there are complex performance interactions between threads. These interactions are often unknown, poorly understood, or hard to
avoid. As a result such interactions tend to be ignored
leading to a lower processor throughput.
In this dissertation I start by describing simultaneous multithreading and the hardware implementations
of it. I discuss areas of operating system support that
are either necessary or desirable.
I present a detailed study of a real SMT processor,
the Intel Hyper-Threaded Pentium 4, and describe the
performance interactions between threads. I analyse the
results using information from the processor’s performance monitoring hardware.
Building on the understanding of the processor’s operation gained from the analysis, I present a design
for an operating system process scheduler that takes
into account the characteristics of the processor and
the workloads in order to improve the system-wide
throughput. I evaluate designs exploiting various levels
of processor-specific knowledge.
I finish by discussing alternative ways to exploit
SMT processors. These include the partitioning onto
separate simultaneous threads of applications and
hardware interrupt handling. I present preliminary experiments to evaluate the effectiveness of this technique.
UCAM-CL-TR-620
Eleftheria Katsiri:
Middleware support for
context-awareness in distributed
sensor-driven systems
February 2005, 176 pages, PDF
PhD thesis (Clare College, January 2005)
Abstract: Context-awareness concerns the ability of
computing devices to detect, interpret and respond to
aspects of the user’s local environment. Sentient Computing is a sensor-driven programming paradigm which
maintains an event-based, dynamic model of the environment which can be used by applications in order to drive changes in their behaviour, thus achieving context-awareness. However, primitive events, especially those arising from sensors, e.g., that a user is at
position {x,y,z}, are too low-level to be meaningful to
applications. Existing models for creating higher-level,
more meaningful events, from low-level events, are insufficient to capture the user’s intuition about abstract
system state. Furthermore, there is a strong need for
user-centred application development, without undue
programming overhead. Applications need to be created dynamically and remain functional independently
of the distributed nature and heterogeneity of sensor-driven systems, even while the user is mobile. Both issues combined necessitate an alternative model for developing applications in a real-time, distributed sensor-driven environment such as Sentient Computing.
This dissertation describes the design and implementation of the SCAFOS framework. SCAFOS has
two novel aspects. Firstly, it provides powerful tools for
inferring abstract knowledge from low-level, concrete
knowledge, verifying its correctness and estimating its
likelihood. Such tools include Hidden Markov Models, a Bayesian Classifier, Temporal First-Order Logic,
the theorem prover SPASS and the production system
CLIPS. Secondly, SCAFOS provides support for simple application development through the XML-based
SCALA language. By introducing the new concept of
a generalised event, an abstract event, defined as a notification of changes in abstract system state, expressiveness compatible with human intuition is achieved
when using SCALA. The applications that are created
through SCALA are automatically integrated and operate seamlessly in the various heterogeneous components of the context-aware environment even while the
user is mobile or when new entities or other applications are added or removed in SCAFOS.
UCAM-CL-TR-621
Mark R. Shinwell, Andrew M. Pitts:
Fresh Objective Caml user manual
February 2005, 21 pages, PDF
Abstract: This technical report is the user manual for
the Fresh Objective Caml system, which implements a
functional programming language incorporating facilities for manipulating syntax involving names and binding operations.
UCAM-CL-TR-622
Jörg H. Lepler:
Cooperation and deviation in
market-based resource allocation
March 2005, 173 pages, PDF
PhD thesis (St John’s College, November 2004)
Abstract: This thesis investigates how business transactions are enhanced through competing strategies for
economically motivated cooperation. To this end, it focuses on the setting of a distributed, bilateral allocation
protocol for electronic services and resources. Cooperative efforts like these are often threatened by transaction parties who aim to exploit their competitors by
deviating from so-called cooperative goals. We analyse this conflict between cooperation and deviation by
presenting the case of two novel market systems which
use economic incentives to solve the complications that
arise from cooperation.
The first of the two systems is a pricing model which
is designed to address the problematic resource market situation, where supply exceeds demand and perfect competition can make prices collapse to level zero.
This pricing model uses supply functions to determine
the optimal Nash-Equilibrium price. Moreover, in this
model the providers’ market estimations are updated
with information about each of their own transactions.
Here, we implement the protocol in a discrete event
simulation, to show that the equilibrium prices are
above competitive levels, and to demonstrate that deviations from the pricing model are not profitable.
The second of the two systems is a reputation aggregation model, which seeks the subgroup of raters that
(1) contains the largest degree of overall agreement and
(2) derives the resulting reputation scores from their
comments. In order to seek agreement, this model assumes that not all raters in the system are equally able
to foster an agreement. Based on the variances of the
raters’ comments, the system derives a notion of the
reputation for each rater, which is in turn fed back into
the model’s recursive scoring algorithm. We demonstrate the convergence of this algorithm, and show the
effectiveness of the model’s ability to discriminate between poor and strong raters. Then with a series of
threat models, we show how resilient this model is in
terms of finding agreement, despite large collectives of
malicious raters. Finally, in a practical example, we apply the model to the academic peer review process in
order to show its versatility at establishing a ranking of
rated objects.
UCAM-CL-TR-623
Keith Wansbrough:
Simple polymorphic usage analysis
March 2005, 364 pages, PDF
PhD thesis (Clare Hall, March 2002)
Abstract: Implementations of lazy functional languages
ensure that computations are performed only when
they are needed, and save the results so that they are
not repeated. This frees the programmer to describe solutions at a high level, leaving details of control flow to
the compiler.
This freedom however places a heavy burden on the
compiler; measurements show that over 70% of these
saved results are never used again. A usage analysis that
could statically detect values used at most once would
enable these wasted updates to be avoided, and would
be of great benefit. However, existing usage analyses
either give poor results or have been applied only to
prototype compilers or toy languages.
This thesis presents a sound, practical, type-based
usage analysis that copes with all the language features
of a modern functional language, including type polymorphism and user-defined algebraic data types, and
addresses a range of problems that have caused difficulty for previous analyses, including poisoning, mutual recursion, separate compilation, and partial application and usage dependencies. In addition to well-typing rules, an inference algorithm is developed, with
proofs of soundness and a complexity analysis.
In the process, the thesis develops simple polymorphism, a novel approach to polymorphism in the presence of subtyping that attempts to strike a balance between pragmatic concerns and expressive power. This
thesis may be considered an extended experiment into
this approach, worked out in some detail but not yet
conclusive.
The analysis described was designed in parallel with
a full implementation in the Glasgow Haskell Compiler,
leading to informed design choices, thorough coverage of language features, and accurate measurements of
its potential and effectiveness when used on real code.
The latter demonstrate that the analysis yields moderate benefit in practice.
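The "saved results" discussed in this abstract are the updates a lazy implementation performs when it first evaluates a suspended computation. The sketch below illustrates that mechanism and why an update is wasted when the value is used at most once. It is a dynamic illustration written in Python, not the static, type-based analysis the thesis develops; the `Thunk` class and the `wasted_update_fraction` helper are hypothetical names.

```python
class Thunk:
    """A suspended computation whose result is saved after the first force."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._forced = False
        self.uses = 0

    def force(self):
        self.uses += 1
        if not self._forced:
            self._value = self._compute()
            self._compute = None  # drop the closure once evaluated
            self._forced = True   # this is the "update" that saves the result
        return self._value


def wasted_update_fraction(thunks):
    """Fraction of forced thunks whose saved result was never read again."""
    forced = [t for t in thunks if t.uses > 0]
    if not forced:
        return 0.0
    return sum(1 for t in forced if t.uses == 1) / len(forced)


shared = Thunk(lambda: sum(range(1000)))  # used twice: the update pays off
single = Thunk(lambda: 2 + 3)             # used once: the update is wasted
unused = Thunk(lambda: 1 / 0)             # never forced, so no error occurs

total = shared.force() + shared.force() + single.force()
fraction = wasted_update_fraction([shared, single, unused])
```

Here one of the two forced thunks is used exactly once, so half of the updates are wasted. A usage analysis aims to identify such used-at-most-once values at compile time, so that the update can be omitted safely.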
UCAM-CL-TR-624
Steve Bishop, Matthew Fairbairn,
Michael Norrish, Peter Sewell,
Michael Smith, Keith Wansbrough:
TCP, UDP, and Sockets:
rigorous and experimentally-validated
behavioural specification
Volume 1: Overview
March 2005, 88 pages, PDF
Abstract: We have developed a mathematically rigorous and experimentally-validated post-hoc specification
of the behaviour of TCP, UDP, and the Sockets API.
It characterises the API and network-interface interactions of a host, using operational semantics in the
higher-order logic of the HOL automated proof assistant. The specification is detailed, covering almost all
the information of the real-world communications: it
is in terms of individual TCP segments and UDP datagrams, though it abstracts from the internals of IP. It
has broad coverage, dealing with arbitrary API call sequences and incoming messages, not just some well-behaved usage. It is also accurate, closely based on
the de facto standard of (three of) the widely-deployed
implementations. To ensure this we have adopted a
novel experimental semantics approach, developing test
generation tools and symbolic higher-order-logic model
checking techniques that let us validate the specification
directly against several thousand traces captured from
the implementations.
The resulting specification, which is annotated for
the non-HOL-specialist reader, may be useful as an informal reference for TCP/IP stack implementors and
Sockets API users, supplementing the existing informal standards and texts. It can also provide a basis
for high-fidelity automated testing of future implementations, and a basis for design and formal proof of
higher-level communication layers. More generally, the
work demonstrates that it is feasible to carry out similar rigorous specification work at design-time for new
protocols. We discuss how such a design-for-test approach should influence protocol development, leading
to protocol specifications that are both unambiguous
and clear, and to high-quality implementations that can
be tested directly against those specifications.
This document (Volume 1) gives an overview of the
project, discussing the goals and techniques and giving
an introduction to the specification. The specification
itself is given in the companion Volume 2 (UCAM-CL-TR-625), which is automatically typeset from the (extensively annotated) HOL source. As far as possible we
have tried to make the work accessible to four groups
of intended readers: workers in networking (implementors of TCP/IP stacks, and designers of new protocols);
in distributed systems (implementors of software above
the Sockets API); in distributed algorithms (for whom
this may make it possible to prove properties about executable implementations of those algorithms); and in
semantics and automated reasoning.
UCAM-CL-TR-625
Steve Bishop, Matthew Fairbairn,
Michael Norrish, Peter Sewell,
Michael Smith, Keith Wansbrough:
TCP, UDP, and Sockets:
rigorous and experimentally-validated
behavioural specification
Volume 2: The Specification
March 2005, 386 pages, PDF
Abstract: See Volume 1 (UCAM-CL-TR-624).
UCAM-CL-TR-626
Meng How Lim, Adam Greenhalgh,
Julian Chesterfield, Jon Crowcroft:
Landmark Guided Forwarding:
A hybrid approach for Ad Hoc
routing
March 2005, 28 pages, PDF
Abstract: Wireless Ad Hoc network routing presents
some extremely challenging research problems, trying
to optimize parameters such as energy conservation vs
connectivity and global optimization vs routing overhead scalability. In this paper we focus on the problems
of maintaining network connectivity in the presence of
node mobility whilst providing globally efficient and
robust routing. The common approach among existing wireless Ad Hoc routing solutions is to establish a
global optimal path between a source and a destination.
We argue that establishing a globally optimal path is
both unreliable and unsustainable as the network diameter, traffic volume, number of nodes all increase in the
presence of moderate node mobility. To address this we
propose Landmark Guided Forwarding (LGF), a protocol that provides a hybrid solution of topological and
geographical routing algorithms. We demonstrate that
LGF is adaptive to unstable connectivity and scalable
to large networks. Our results indicate therefore that
Landmark Guided Forwarding converges much faster,
scales better and adapts well within a dynamic wireless
Ad Hoc environment in comparison to existing solutions.
UCAM-CL-TR-627
Keith Vertanen:
Efficient computer interfaces
using continuous gestures,
language models, and speech
March 2005, 46 pages, PDF
MPhil thesis (Darwin College, July 2004)
Abstract: Despite advances in speech recognition technology, users of dictation systems still face a significant
amount of work to correct errors made by the recognizer. The goal of this work is to investigate the use of
a continuous gesture-based data entry interface to provide an efficient and fun way for users to correct recognition errors. Towards this goal, techniques are investigated which expand a recognizer’s results to help cover
recognition errors. Additionally, models are developed
which utilize a speech recognizer’s n-best list to build
letter-based language models.
UCAM-CL-TR-628
Moritz Y. Becker:
A formal security policy for an
NHS electronic health record service
March 2005, 81 pages, PDF
Abstract: The ongoing NHS project for the development of a UK-wide electronic health records service,
also known as the ‘Spine’, raises many controversial
issues and technical challenges concerning the security
and confidentiality of patient-identifiable clinical data.
As the system will need to be constantly adapted to
comply with evolving legal requirements and guidelines, the Spine’s authorisation policy should not be
hard-coded into the system but rather be specified in
a high-level, general-purpose, machine-enforceable policy language.
We describe a complete authorisation policy for the
Spine and related services, written for the trust management system Cassandra, and comprising 375 formal rules. The policy is based on the NHS’s Output-based Specification (OBS) document and deals with
all requirements concerning access control of patient-identifiable data, including legitimate relationships, patients restricting access, authenticated express consent,
third-party consent, and workgroup management.
UCAM-CL-TR-629
Meng How Lim, Adam Greenhalgh,
Julian Chesterfield, Jon Crowcroft:
Hybrid routing: A pragmatic
approach to mitigating position
uncertainty in geo-routing
April 2005, 26 pages, PDF
Abstract: In recent years, research in wireless Ad Hoc
routing seems to be moving towards the approach of
position based forwarding. Amongst proposed algorithms, Greedy Perimeter Stateless Routing has gained
recognition for guaranteed delivery with modest network overheads. Although this addresses the scaling
limitations with topological routing, it has limited tolerance for position inaccuracy or stale state reported by a
location service. Several researchers have demonstrated
that the inaccuracy of the positional system could have
a catastrophic effect on position based routing protocols. In this paper, we evaluate how the negative effects
of position inaccuracy can be countered by extending
position based forwarding with a combination of restrictive topological state, adaptive route advertisement
and hybrid forwarding. Our results show that a hybrid
of the position and topology approaches used in Landmark Guided Forwarding yields a high goodput and
timely packet delivery, even with 200 meters of position error.
UCAM-CL-TR-630
Sergei P. Skorobogatov:
Semi-invasive attacks –
A new approach to hardware security
analysis
April 2005, 144 pages, PDF
PhD thesis (Darwin College, September 2004)
Abstract: Semiconductor chips are used today not only
to control systems, but also to protect them against
security threats. A continuous battle is waged between manufacturers who invent new security solutions, learning their lessons from previous mistakes,
and the hacker community, constantly trying to break
implemented protections. Some chip manufacturers do
not pay enough attention to the proper design and testing of protection mechanisms. Even where they claim
their products are highly secure, they do not guarantee this and do not take any responsibility if a device is
compromised. In this situation, it is crucial for the design engineer to have a convenient and reliable method
of testing secure chips.
This thesis presents a wide range of attacks on hardware security in microcontrollers and smartcards. This
includes already known non-invasive attacks, such as
power analysis and glitching, and invasive attacks, such
as reverse engineering and microprobing. A new class
of attacks – semi-invasive attacks – is introduced. Like
invasive attacks, they require depackaging the chip to
get access to its surface. But the passivation layer remains intact, as these methods do not require electrical contact to internal lines. Semi-invasive attacks stand
between non-invasive and invasive attacks. They represent a greater threat to hardware security, as they are
almost as effective as invasive attacks but can be low-cost like non-invasive attacks.
This thesis’ contribution includes practical fault-injection attacks to modify SRAM and EEPROM content, or change the state of any individual CMOS transistor on a chip. This leads to almost unlimited capabilities to control chip operation and circumvent protection mechanisms. A second contribution consists of experiments on data remanence, which show that it is feasible to extract information from powered-off SRAM
and erased EPROM, EEPROM and Flash memory devices.
A brief introduction to copy protection in microcontrollers is given. Hardware security evaluation techniques using semi-invasive methods are introduced.
They should help developers to make a proper selection
of components according to the required level of security. Various defence technologies are discussed, from
low-cost obscurity methods to new approaches in silicon design.
UCAM-CL-TR-631
Wenjun Hu, Jon Crowcroft:
MIRRORS: An integrated framework
for capturing real world behaviour
for models of ad hoc networks
April 2005, 16 pages, PDF
Abstract: The simulation models used in mobile ad hoc
network research have been criticised for lack of realism. While credited with ease of understanding and implementation, they are often based on theoretical models, rather than real world observations. Criticisms have
centred on radio propagation or mobility models.
In this work, we take an integrated approach to
modelling the real world that underlies a mobile ad
hoc network. While pointing out the correlations between the space, radio propagation and mobility models, we use mobility as a focal point to propose a new
framework, MIRRORS, that captures real world behaviour. We give the formulation of a specific model
within the framework and present simulation results
that reflect topology properties of the networks synthesised. Compared with the existing models studied,
our model better represents real world topology properties and presents a wider spectrum of variation in the
metrics examined, due to the model encapsulating more
detailed dynamics. While the common approach is to
focus on performance evaluation of existing protocols
using these models, we discuss protocol design opportunities across layers in view of the simulation results.
UCAM-CL-TR-632
R.I. Tucker and K. Spärck Jones:
Between shallow and deep:
an experiment in automatic
summarising
April 2005, 34 pages, PDF
Abstract: This paper describes an experiment in automatic summarising using a general-purpose strategy
based on a compromise between shallow and deep processing. The method combines source text analysis into
simple logical forms with the use of a semantic graph
for representation and operations on the graph to identify summary content.
The graph is based on predications extracted from
the logical forms, and the summary operations apply
three criteria, namely importance, representativeness,
and cohesiveness, in choosing node sets to form the
content representation for the summary. This is used
in different ways for output summaries. The paper
presents the motivation for the strategy, details of the
CLASP system, and the results of initial testing and
evaluation on news material.
UCAM-CL-TR-633
Alex Ho, Steven Smith, Steven Hand:
On deadlock, livelock, and forward
progress
May 2005, 8 pages, PDF
Abstract: Deadlock and livelock can happen at many
different levels in a distributed system. We unify both
around the concept of forward progress and standstill.
We describe a framework capable of detecting the lack
of forward progress in distributed systems. Our prototype can easily solve traditional deadlock problems
where synchronization is via a customer network protocol; however, many interesting research challenges remain.
UCAM-CL-TR-634
Kasim Rehman:
Visualisation, interpretation and use
of location-aware interfaces
May 2005, 159 pages, PDF
PhD thesis (St Catharine’s College, November 2004)
Abstract: Ubiquitous Computing (Ubicomp), a term
coined by Mark Weiser in the early 1990’s, is about
transparently equipping the physical environment and
everyday objects in it with computational, sensing and
networking abilities. In contrast with traditional desktop computing the “computer” moves into the background, unobtrusively supporting users in their everyday life.
One of the instantiations of Ubicomp is location-aware computing. Using location sensors, the “computer” reacts to changes in location of users and everyday objects. Location changes are used to infer user
intent in order to give the user the most appropriate
support for the task she is performing. Such support
can consist of automatically providing information or
configuring devices and applications deemed adequate
for the inferred user task.
Experience with these applications has uncovered
a number of usability problems that stem from the
fact that the “computer” in this paradigm has become
unidentifiable for the user. More specifically, these arise
from lack of feedback from, loss of user control over,
and the inability to provide a conceptual model of the
“computer”.
Starting from the proven premise that feedback is
indispensable for smooth human-machine interaction,
a system that uses Augmented Reality in order to visually provide information about the state of a location-aware environment and devices in it, is designed and
implemented.
Augmented Reality (AR) as it is understood for
the purpose of this research uses a see-through head-mounted display, trackers and 3-dimensional (3D)
graphics in order to give users the illusion that 3-dimensional graphical objects specified and generated
on a computer are actually located in the real world.
The system described in this thesis can be called
a Graphical User Interface (GUI) for a physical environment. Properties of GUIs for desktop environments
are used as a valuable resource in designing a software
architecture that supports interactivity in a location-aware environment, understanding how users might
conceptualise the “computer” and extracting design
principles for visualisation in a Ubicomp environment.
Most importantly this research offers a solution to
fundamental interaction problems in Ubicomp environments. In doing so this research presents the next
step from reactive environments to interactive environments.
UCAM-CL-TR-635
John Daugman:
Results from 200 billion iris
cross-comparisons
June 2005, 8 pages, PDF
Abstract: Statistical results are presented for biometric
recognition of persons by their iris patterns, based on
200 billion cross-comparisons between different eyes.
The database consisted of 632,500 iris images acquired
in the Middle East, in a national border-crossing protection programme that uses the Daugman algorithms
for iris recognition. A total of 152 different nationalities were represented in this database. The set of exhaustive cross-comparisons between all possible pairings of irises in the database shows that with reasonable
acceptance thresholds, the False Match rate is less than
1 in 200 billion. Recommendations are given for the
numerical decision threshold policy that would enable
reliable identification performance on a national scale
in the UK.
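As a quick sanity check on the figures quoted in this abstract, the number of distinct pairings among N images is N(N-1)/2. The sketch below is an illustration only and is not taken from the report; it assumes every image is compared exactly once with every other image, and the function name `pair_count` is hypothetical.

```python
# Illustrative arithmetic only. Assumption: each of the 632,500 images
# is compared once against every other image (unordered pairs).
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


images = 632_500
pairs = pair_count(images)
# The result is on the order of 2 x 10^11, consistent with the
# "200 billion cross-comparisons" quoted in the abstract.
print(f"{pairs:,} cross-comparisons")
```

Under that assumption the count of pairings comes out at roughly 200 billion, matching the figure in the title.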
UCAM-CL-TR-636
Rana Ayman el Kaliouby:
Mind-reading machines:
automated inference of complex
mental states
July 2005, 185 pages, PDF
PhD thesis (Newnham College, March 2005)
Abstract: People express their mental states all the time,
even when interacting with machines. These mental
states shape the decisions that we make, govern how we
communicate with others, and affect our performance.
The ability to attribute mental states to others from
their behaviour, and to use that knowledge to guide
one’s own actions and predict those of others is known
as theory of mind or mind-reading.
The principal contribution of this dissertation is the
real time inference of a wide range of mental states from
head and facial displays in a video stream. In particular,
the focus is on the inference of complex mental states:
the affective and cognitive states of mind that are not
part of the set of basic emotions. The automated mental state inference system is inspired by and draws on
the fundamental role of mind-reading in communication and decision-making.
The dissertation describes the design, implementation and validation of a computational model of mind-reading. The design is based on the results of a number
of experiments that I have undertaken to analyse the facial signals and dynamics of complex mental states. The
resulting model is a multi-level probabilistic graphical
model that represents the facial events in a raw video
stream at different levels of spatial and temporal abstraction. Dynamic Bayesian Networks model observable head and facial displays, and corresponding hidden mental states over time.
The automated mind-reading system implements
the model by combining top-down predictions of mental state models with bottom-up vision-based processing of the face. To support intelligent human-computer
interaction, the system meets three important criteria.
These are: full automation so that no manual preprocessing or segmentation is required, real time execution,
and the categorization of mental states early enough after their onset to ensure that the resulting knowledge is
current and useful.
The system is evaluated in terms of recognition accuracy, generalization and real time performance for six
broad classes of complex mental states—agreeing, concentrating, disagreeing, interested, thinking and unsure,
on two different corpora. The system successfully classifies and generalizes to new examples of these classes
with an accuracy and speed that are comparable to that
of human recognition.
The research I present here significantly advances
the nascent ability of machines to infer cognitive-affective mental states in real time from nonverbal expressions of people. By developing a real time system
for the inference of a wide range of mental states beyond the basic emotions, I have widened the scope
of human-computer interaction scenarios in which this
technology can be integrated. This is an important step
towards building socially and emotionally intelligent
machines.
UCAM-CL-TR-637
Shishir Nagaraja, Ross Anderson:
The topology of covert conflict
July 2005, 15 pages, PDF
Abstract: Often an attacker tries to disconnect a network by destroying nodes or edges, while the defender
counters using various resilience mechanisms. Examples include a music industry body attempting to close
down a peer-to-peer file-sharing network; medics attempting to halt the spread of an infectious disease
by selective vaccination; and a police agency trying to
decapitate a terrorist organisation. Albert, Jeong and
Barabási famously analysed the static case, and showed
that vertex-order attacks are effective against scale-free
networks. We extend this work to the dynamic case by
developing a framework based on evolutionary game
theory to explore the interaction of attack and defence
strategies. We show, first, that naive defences don’t
work against vertex-order attack; second, that defences
based on simple redundancy don’t work much better,
but that defences based on cliques work well; third, that
attacks based on centrality work better against clique
defences than vertex-order attacks do; and fourth, that
defences based on complex strategies such as delegation plus clique resist centrality attacks better than simple clique defences. Our models thus build a bridge between network analysis and evolutionary game theory,
and provide a framework for analysing defence and attack in networks where topology matters. They suggest
definitions of efficiency of attack and defence, and may
even explain the evolution of insurgent organisations
from networks of cells to a more virtual leadership
that facilitates operations rather than directing them.
Finally, we draw some conclusions and present possible
directions for future research.
UCAM-CL-TR-638
Piotr Zieliński:
Optimistic Generic Broadcast
July 2005, 22 pages, PDF
Abstract: We consider an asynchronous system with the
Ω failure detector, and investigate the number of communication steps required by various broadcast protocols in runs in which the leader does not change.
Atomic Broadcast, used for example in state machine
replication, requires three communication steps. Optimistic Atomic Broadcast requires only two steps if all
correct processes receive messages in the same order.
Generic Broadcast requires two steps if no messages
conflict. We present an algorithm that subsumes both
of these approaches and guarantees two-step delivery if
all conflicting messages are received in the same order,
and three-step delivery otherwise. Internally, our protocol uses two new algorithms. First, a Consensus algorithm which decides in one communication step if all
proposals are the same, and needs two steps otherwise.
Second, a method that allows us to run infinitely many
instances of a distributed algorithm, provided that only
finitely many of them are different. We assume that
fewer than a third of all processes are faulty (n > 3f).
UCAM-CL-TR-639
Chris Purcell, Tim Harris:
Non-blocking hashtables with
open addressing
September 2005, 23 pages, PDF
Abstract: We present the first non-blocking hashtable
based on open addressing that provides the following
benefits: it combines good cache locality, accessing a
single cacheline if there are no collisions, with short
straight-line code; it needs no storage overhead for
pointers and memory allocator schemes, having instead
an overhead of two words per bucket; it does not need
to periodically reorganise or replicate the table; and it
does not need garbage collection, even with arbitrary-sized keys. Open problems include resizing the table
and replacing, rather than erasing, entries. The result
is a highly-concurrent set algorithm that approaches or
outperforms the best externally-chained implementations we tested, with fixed memory costs and no need to
select or fine-tune a garbage collector or locking strategy.
UCAM-CL-TR-640
Feng Hao, Ross Anderson, John Daugman:
Combining cryptography with
biometrics effectively
July 2005, 17 pages, PDF
Abstract: We propose the first practical and secure way
to integrate the iris biometric into cryptographic applications. A repeatable binary string, which we call a biometric key, is generated reliably from genuine iris codes.
A well-known difficulty has been how to cope with the
10 to 20% of error bits within an iris code and derive
an error-free key. To solve this problem, we carefully
studied the error patterns within iris codes, and devised
a two-layer error correction technique that combines
Hadamard and Reed-Solomon codes. The key is generated from a subject’s iris image with the aid of auxiliary
error-correction data, which do not reveal the key, and
can be saved in a tamper-resistant token such as a smart
card. The reproduction of the key depends on two factors: the iris biometric and the token. The attacker has
to procure both of them to compromise the key. We
evaluated our technique using iris samples from 70 different eyes, with 10 samples from each eye. We found
that an error-free key can be reproduced reliably from
genuine iris codes with a 99.5% success rate. We can
generate up to 140 bits of biometric key, more than
enough for 128-bit AES. The extraction of a repeatable
binary string from biometrics opens new possible applications, where a strong binding is required between
a person and cryptographic operations. For example,
it is possible to identify individuals without maintaining a central database of biometric templates, to which
privacy objections might be raised.
UCAM-CL-TR-641
Ross Anderson, Mike Bond, Jolyon Clulow,
Sergei Skorobogatov:
Cryptographic processors –
a survey
August 2005, 19 pages, PDF
Abstract: Tamper-resistant cryptographic processors
are becoming the standard way to enforce data-usage
policies. Their history began with military cipher machines, and hardware security modules used to encrypt
the PINs that bank customers use to authenticate themselves to ATMs. In both cases, the designers wanted to
prevent abuse of data and key material should a device fall into the wrong hands. From these specialist
beginnings, cryptoprocessors spread into devices such
as prepayment electricity meters, and the vending machines that sell credit for them. In the 90s, tamper-resistant smartcards became integral to GSM mobile
phone identification and to key management in pay-TV set-top boxes, while secure microcontrollers were
used in remote key entry devices for cars. In the last
five years, dedicated crypto chips have been embedded
in devices from games console accessories to printer ink
cartridges, to control product and accessory aftermarkets. The ‘Trusted Computing’ initiative will soon embed cryptoprocessors in PCs so that they can identify
each other remotely.
This paper surveys the range of applications of
tamper-resistant hardware, and the array of attack and
defence mechanisms which have evolved in the tamper-resistance arms race.
UCAM-CL-TR-642
Gavin Bierman, Alisdair Wren:
First-class relationships in an
object-oriented language
August 2005, 53 pages, PDF
Abstract: In this paper we investigate the addition
of first-class relationships to a prototypical object-oriented programming language (a “middleweight”
fragment of Java). We provide language-level constructs to declare relationships between classes and to
manipulate relationship instances. We allow relationships to have attributes and provide a novel notion
of relationship inheritance. We formalize our language
giving both the type system and operational semantics
and prove certain key safety properties.
UCAM-CL-TR-643
Nathan E. Dimmock:
Using trust and risk for access control
in Global Computing
August 2005, 145 pages, PDF
PhD thesis (Jesus College, April 2005)
Abstract: Global Computing is a vision of a massively
networked infrastructure supporting a large population
of diverse but cooperating entities. Similar to ubiquitous computing, entities of global computing will operate in environments that are dynamic and unpredictable, requiring them to be capable of dealing with
unexpected interactions and previously unknown principals using an unreliable infrastructure.
These properties will pose new security challenges
that are not adequately addressed by existing security
models and mechanisms. Traditionally privileges are
statically encoded as security policy, and while rôle-based access control introduces a layer of abstraction
between privilege and identity, rôles, privileges and context must still be known in advance of any interaction
taking place.
Human society has developed the mechanism of
trust to overcome initial suspicion and gradually
evolve privileges. Trust successfully enables collaboration amongst human agents — a computational model
of trust ought to be able to enable the same in computational agents. Existing research in this area has concentrated on developing trust management systems that
permit the encoding of, and reasoning about, trust beliefs, but the relationship between these and privilege is
still hard coded. These systems also omit any explicit
reasoning about risk, and its relationship to privilege,
nor do they permit the automated evolution of trust
over time.
This thesis examines the relationship between trust,
risk and privilege in an access control system. An
outcome-based approach is taken to risk modelling, using explicit costs and benefits to model the relationship
between risk and privilege. This is used to develop a
novel model of access control — trust-based access control (TBAC) — firstly for the limited domain of collaboration between Personal Digital Assistants (PDAs), and
later for more general global computing applications
using the SECURE computational trust framework.
This general access control model is also used to extend an existing rôle-based access control system to explicitly reason about trust and risk. A further refinement is the incorporation of the economic theory of
decision-making under uncertainty by expressing costs
and benefits as utility, or preference-scaling, functions.
It is then shown how Bayesian trust models can be used
in the SECURE framework, and how these models enable a better abstraction to be obtained in the access
control policy. It is also shown how the access control
model can be used to take such decisions as whether
the cost of seeking more information about a principal is justified by the risk associated with granting the
privilege, and to determine whether a principal should
respond to such requests upon receipt. The use of game
theory to help in the construction of policies is also
briefly considered.
Global computing has many applications, all of
which require access control to prevent abuse by malicious principals. This thesis develops three in detail:
an information sharing service for PDAs, an identity-based spam detector and a peer-to-peer collaborative
spam detection network. Given the emerging nature of
computational trust systems, in order to evaluate the effectiveness of the TBAC model, it was first necessary to
develop an evaluation methodology. This takes the approach of a threat-based analysis, considering possible
attacks at the component and system level, to ensure
that components are correctly integrated, and system-level assumptions made by individual components are
valid. Applying the methodology to the implementation of the TBAC model demonstrates its effectiveness
in the scenarios chosen, with good promise for further,
untested, scenarios.
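The cost-benefit reasoning described in this abstract can be pictured with a small sketch. It is not the TBAC model or the API of the SECURE framework; the names `BetaTrust`, `expected_utility` and `grant`, and the simple Beta-distribution trust estimate, are assumptions chosen for illustration. Trust is built from counts of good and bad past outcomes, and a privilege is granted only when the expected benefit outweighs the expected cost.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """Hypothetical Bayesian trust estimate built from past outcomes."""
    good: int = 0  # interactions that ended well
    bad: int = 0   # interactions that ended badly

    def observe(self, outcome_good: bool) -> None:
        """Record the outcome of one interaction with the principal."""
        if outcome_good:
            self.good += 1
        else:
            self.bad += 1

    def p_good(self) -> float:
        # Mean of a Beta(good + 1, bad + 1) posterior (uniform prior).
        return (self.good + 1) / (self.good + self.bad + 2)


def expected_utility(trust: BetaTrust, benefit: float, cost: float) -> float:
    """Expected gain of granting: benefit if it goes well, cost if not."""
    p = trust.p_good()
    return p * benefit - (1.0 - p) * cost


def grant(trust: BetaTrust, benefit: float, cost: float) -> bool:
    """Grant the privilege only when the expected utility is positive."""
    return expected_utility(trust, benefit, cost) > 0.0


# An unknown principal asking for a privilege whose abuse is costly.
stranger = BetaTrust()
first_decision = grant(stranger, benefit=1.0, cost=5.0)

# The same request after twenty good interactions have been observed.
for _ in range(20):
    stranger.observe(True)
later_decision = grant(stranger, benefit=1.0, cost=5.0)

print(first_decision, later_decision)
```

In this sketch trust evolves as outcomes are observed, so the same request can move from refused to granted without any change to the policy itself.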
UCAM-CL-TR-644
Paul Youn, Ben Adida, Mike Bond,
Jolyon Clulow, Jonathan Herzog,
Amerson Lin, Ronald L. Rivest,
Ross Anderson:
Robbing the bank with a theorem
prover
August 2005, 26 pages, PDF
Abstract: We present the first methodology for analysis and automated detection of attacks on security application programming interfaces (security APIs) – the
interfaces to hardware cryptographic services used by
developers of critical security systems, such as banking
applications. Taking a cue from previous work on the
formal analysis of security protocols, we model APIs
purely according to specifications, under the assumption of ideal encryption primitives. We use a theorem
prover tool and adapt it to the security API context.
We develop specific formalization and automation techniques that allow us to fully harness the power of a theorem prover. We show how, using these techniques, we
were able to automatically re-discover all of the pure
API attacks originally documented by Bond and Anderson against banking payment networks, since their
discovery of this type of attack in 2000. We conclude
with a note of encouragement: the complexity and unintuitiveness of the modelled attacks make a very strong
case for continued focus on automated formal analysis
of cryptographic APIs.
UCAM-CL-TR-645
Frank Stajano:
RFID is X-ray vision
August 2005, 10 pages, PDF
Abstract: Making RFID tags as ubiquitous as barcodes
will enable machines to see and recognize any tagged
object in their vicinity, better than they ever could with
the smartest image processing algorithms. This opens
many opportunities for “sentient computing” applications.
However, in so far as this new capability has some
of the properties of X-ray vision, it opens the door to
abuses. To promote discussion, I won’t elaborate on
low level technological solutions; I shall instead discuss
a simple security policy model that addresses most of
the privacy issues. Playing devil’s advocate, I shall also
indicate why it is currently unlikely that consumers will
enjoy the RFID privacy that some of them vociferously
demand.
UCAM-CL-TR-647
Sam Staton:
An agent architecture for simulation
of end-users in programming-like
tasks
October 2005, 12 pages, PDF
Abstract: We present some motivation and technical
details for a software simulation of an end-user performing programming-like tasks. The simulation uses
an agent/agenda model by breaking tasks down into
subgoals, based on work of A. Blackwell. This document was distributed at the CHI 2002 workshop on
Cognitive Models of Programming-Like Processes.
UCAM-CL-TR-648
Moritz Y. Becker:
Cassandra: flexible trust management
and its application to electronic
health records
October 2005, 214 pages, PDF
PhD thesis (Trinity College, September 2005)
Abstract: The emergence of distributed applications operating on large-scale, heterogeneous and decentralised
networks poses new and challenging problems of concern to society as a whole, in particular for data security, privacy and confidentiality. Trust management and
authorisation policy languages have been proposed to
address access control and authorisation in this context. Still, many key problems have remained unsolved.
Existing systems are often not expressive enough, or are
so expressive that access control becomes undecidable;
their semantics is not formally specified; and they have
not been shown to meet the requirements set by actual
real-world applications.
This dissertation addresses these problems. We
present Cassandra, a role-based language and system
for expressing authorisation policy, and the results of a
substantial case study, a policy for a national electronic
health record (EHR) system, based on the requirements
of the UK National Health Service’s National Programme for Information Technology (NPfIT).
Cassandra policies are expressed in a language derived from Datalog with constraints. Cassandra supports credential-based authorisation (eg between administrative domains), and rules can refer to remote
policies (for credential retrieval and trust negotiation).
The expressiveness of the language (and its computational complexity) can be tuned by choosing an appropriate constraint domain. The language is small and has
a formal semantics for both query evaluation and the
access control engine.
There has been a lack of real-world examples of
complex security policies: our NPfIT case study fills this
gap. The resulting Cassandra policy (with 375 rules)
demonstrates that the policy language is expressive
enough for a real-world application. We thus demonstrate that a general-purpose trust management system
can be designed to be highly flexible, expressive, formally founded and meet the complex requirements of
real-world applications.
UCAM-CL-TR-649
Mark Grundland, Neil A. Dodgson:
The decolorize algorithm for
contrast enhancing, color to
grayscale conversion
October 2005, 15 pages, PDF
Abstract: We present a new contrast enhancing color
to grayscale conversion algorithm which works in real-time. It incorporates novel techniques for image sampling and dimensionality reduction, sampling color differences by Gaussian pairing and analyzing color differences by predominant component analysis. In addition to its speed and simplicity, the algorithm has the
advantages of continuous mapping, global consistency,
and grayscale preservation, as well as predictable luminance, saturation, and hue ordering properties. We
give an extensive range of examples and compare our
method with other recently published algorithms.
UCAM-CL-TR-650
Rashid Mehmood, Jon Crowcroft:
Parallel iterative solution method for
large sparse linear equation systems
October 2005, 22 pages, PDF
Abstract: Solving sparse systems of linear equations
is at the heart of scientific computing. Large sparse
systems often arise in science and engineering problems. One such problem we consider in this paper is
the steady-state analysis of Continuous Time Markov
Chains (CTMCs). CTMCs are a widely used formalism
for the performance analysis of computer and communication systems. A large variety of useful performance
measures can be derived from a CTMC via the computation of its steady-state probabilities. A CTMC may
be represented by a set of states and a transition rate
matrix containing state transition rates as coefficients,
and can be analysed using probabilistic model checking. However, CTMC models for realistic systems are
very large. We address this largeness problem in this
paper, by considering parallelisation of symbolic methods. In particular, we consider Multi-Terminal Binary
Decision Diagrams (MTBDDs) to store CTMCs, and,
using the Jacobi iterative method, present a parallel method
for the CTMC steady-state solution. Employing a 24-node processor bank, we report results of the sparse
systems with over a billion equations and eighteen billion nonzeros.
UCAM-CL-TR-651
Rob Hague:
End-user programming in multiple
languages
October 2005, 122 pages, PDF
PhD thesis (Fitzwilliam College, July 2004)
Abstract: Advances in user interface technology have
removed the need for the majority of users to program,
but they do not allow the automation of repetitive or indirect tasks. End-user programming facilities solve this
problem without requiring users to learn and use a conventional programming language, but must be tailored
to specific types of end user. In situations where the user
population is particularly diverse, this presents a problem.
In addition, studies have shown that the performance of tasks based on the manipulation and interpretation of data depends on the way in which the
data is represented. Different representations may facilitate different tasks, and there is not necessarily a single, optimal representation that is best for all tasks. In
many cases, the choice of representation is also constrained by other factors, such as display size. It would
be advantageous for an end-user programming system
to provide multiple, interchangeable representations of
programs.
This dissertation describes an architecture for providing end-user programming facilities in the networked home, a context with a diverse user population, and a wide variety of input and output devices.
The Media Cubes language, a novel end-user programming language, is introduced as the context that led
to the development of the architecture. A framework
for translation between languages via a common intermediate form is then described, with particular attention paid to the requirements of mappings between languages and the intermediate form. The implementation
of Lingua Franca, a system realizing this framework in
the given context, is described.
Finally, the system is evaluated by considering several end-user programming languages implemented
within this system. It is concluded that translation between programming languages, via a common intermediate form, is viable for systems within a limited domain, and the wider applicability of the technique is
discussed.
UCAM-CL-TR-652
Roongroj Nopsuwanchai:
Discriminative training methods and
their applications to handwriting
recognition
November 2005, 186 pages, PDF
PhD thesis (Downing College, August 2004)
Abstract: This thesis aims to improve the performance
of handwriting recognition systems by introducing the
use of discriminative training methods. Discriminative
training methods use data from all competing classes
when training the recogniser for each class. We develop discriminative training methods for two popular classifiers: Hidden Markov Models (HMMs) and
a prototype-based classifier. At the expense of additional computations in the training process, discriminative training has demonstrated significant improvements in recognition accuracies from the classifiers that
are not discriminatively optimised. Our studies focus
on isolated character recognition problems with an emphasis on, but not limited to, off-line handwritten Thai
characters.
The thesis is organised as follows. First, we develop an HMM-based classifier that employs a Maximum Mutual Information (MMI) discriminative training criterion. HMMs have an increasing number of applications to character recognition in which they are
usually trained by Maximum Likelihood (ML) using
the Baum-Welch algorithm. However, ML training does
not take into account the data of other competing categories, and thus is considered non-discriminative. By
contrast, MMI provides an alternative training method
with the aim of maximising the mutual information
between the data and their correct categories. One of
our studies highlights the efficiency of MMI training
that improves the recognition results from ML training, despite being applied to a highly constrained system (tied-mixture density HMMs). Various aspects of
MMI training are investigated, including its optimisation algorithms and a set of optimised parameters that
yields maximum discriminabilities.
Second, a system for Thai handwriting recognition
based on HMMs and MMI training is introduced. In
addition, novel feature extraction methods using block-based PCA and composite images are proposed and
evaluated. A technique to improve generalisation of
the MMI-trained systems and the use of N-best lists
to efficiently compute the probabilities are described.
By applying these techniques, the results from extensive experiments are compelling, showing up to 65%
relative error reduction, compared to conventional ML
training without the proposed features. The best results
are comparable to those achieved by other high performance systems.
Finally, we focus on the Prototype-Based Minimum Error Classifier (PBMEC), which uses a discriminative Minimum Classification Error (MCE) training
method to generate the prototypes. MCE tries to minimise recognition errors during the training process using data from all classes. Several key findings are revealed, including the setting of smoothing parameters
and a proposed clustering method that are more suitable for PBMEC than the conventional methods.
These studies reinforce the effectiveness of discriminative training and are essential as a foundation for its application to the more difficult problem of cursive handwriting recognition.
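As a reading aid only, the contrast between ML and MMI training described in this abstract can be written in a commonly used textbook form. The notation below is an assumption introduced here and is not taken from the thesis: $O_r$ is the $r$-th training sample, $c_r$ its correct class, $\lambda$ the model parameters and $P(c)$ a class prior.

```latex
% ML: each class model is fitted to its own data only.
F_{\mathrm{ML}}(\lambda) = \sum_{r=1}^{R} \log p_{\lambda}(O_r \mid c_r)

% MMI: the correct class is scored against all competing classes.
F_{\mathrm{MMI}}(\lambda) = \sum_{r=1}^{R}
  \log \frac{p_{\lambda}(O_r \mid c_r)\, P(c_r)}
            {\sum_{c} p_{\lambda}(O_r \mid c)\, P(c)}
```

The denominator is where data from the competing classes enters, which is the sense in which the abstract calls ML training non-discriminative.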
UCAM-CL-TR-653
Richard Clayton:
Anonymity and traceability in
cyberspace
November 2005, 189 pages, PDF
PhD thesis (Darwin College, August 2005)
Abstract: Traceability is the ability to map events in
cyberspace, particularly on the Internet, back to real-world instigators, often with a view to holding them accountable for their actions. Anonymity is present when
traceability fails.
I examine how traceability on the Internet actually
works, looking first at a classical approach from the
late 1990s that emphasises the rôle of activity logging
and reporting on the failures that are known to occur. Failures of traceability, with consequent unintentional anonymity, have continued as the technology has
changed. I present an analysis that ascribes these failures to the mechanisms at the edge of the network being inherently inadequate for the burden that traceability places upon them. The underlying reason for this
continuing failure is a lack of economic incentives for
improvement. The lack of traceability at the edges is
further illustrated by a new method of stealing another
person’s identity on an Ethernet Local Area Network
that existing tools and procedures would entirely fail
to detect.
Preserving activity logs is seen, especially by Governments, as essential for the traceability of illegal cyberspace activity. I present a new and efficient method
of processing email server logs to detect machines sending bulk unsolicited email “spam” or email infected
with “viruses”. This creates a clear business purpose
for creating logs, but the new detector is so effective
that the logs can be discarded within days, which may
hamper general traceability.
Preventing spam would be far better than tracing its
origin or detecting its transmission. Many analyse spam
in economic terms, and wish to levy a small charge
for sending each email. I consider an oft-proposed approach using computational “proof-of-work” that is
elegant and anonymity preserving. I show that, in a
world of high profit margins and insecure end-user machines, it is impossible to find a payment level that stops
the spam without affecting legitimate usage of email.
Finally, I consider a content-blocking system with a
hybrid design that has been deployed by a UK Internet
Service Provider to inhibit access to child pornography.
I demonstrate that the two-level design can be circumvented at either level, that content providers can use the
first level to attack the second, and that the selectivity
of the first level can be used as an “oracle” to extract a
list of the sites being blocked. Although many of these
attacks can be countered, there is an underlying failure that cannot be fixed. The system’s database holds
details of the traceability of content, as viewed from a
single location at a single time. However, a blocking
system may be deployed at many sites and must track
content as it moves in space and time; functions which
traceability, as currently realized, cannot deliver.
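For readers unfamiliar with the computational “proof-of-work” idea mentioned in the abstract above, the following is a minimal hashcash-style sketch. It is an illustration under stated assumptions, not the scheme analysed in the thesis: the use of SHA-256, the `message:nonce` stamp format and the names `mint` and `verify` are all hypothetical choices made here.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_digest(message: str, nonce: int) -> bytes:
    # Hypothetical stamp format: the message and nonce joined by a colon.
    return hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()


def mint(message: str, difficulty_bits: int) -> int:
    """Sender side: brute-force search for the smallest valid nonce.

    The expected cost is about 2**difficulty_bits hash evaluations.
    """
    for nonce in count():
        if leading_zero_bits(stamp_digest(message, nonce)) >= difficulty_bits:
            return nonce


def verify(message: str, nonce: int, difficulty_bits: int) -> bool:
    """Receiver side: one hash evaluation checks the stamp."""
    return leading_zero_bits(stamp_digest(message, nonce)) >= difficulty_bits
```

The asymmetry between `mint` and `verify` is the "payment": the difficulty parameter is the payment level whose choice the abstract argues cannot both stop spam and leave legitimate email unaffected.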
UCAM-CL-TR-654
Matthew J. Parkinson:
Local reasoning for Java
November 2005, 120 pages, PDF
PhD thesis (Churchill College, August 2005)
Abstract: This thesis develops the local reasoning approach of separation logic for common forms of modularity such as abstract datatypes and objects. In particular, this thesis focuses on the modularity found in the
Java programming language.
We begin by developing a formal semantics for
a core imperative subset of Java, Middleweight Java
(MJ), and then adapt separation logic to reason about
this subset. However, a naive adaption of separation
logic is unable to reason about encapsulation or inheritance: it provides no support for modularity.
First, we address the issue of encapsulation with the
novel concept of an abstract predicate, which is the logical analogue of an abstract datatype. We demonstrate
how this method can encapsulate state, and provide a
mechanism for ownership transfer: the ability to transfer state safely between a module and its client. We also
show how abstract predicates can be used to express
the calling protocol of a class.
However, the encapsulation provided by abstract
predicates is too restrictive for some applications. In
particular, it cannot reason about multiple datatypes
that have shared read-access to state, for example list
iterators. To compensate, we alter the underlying model
to allow the logic to express properties about read-only
references to state. Additionally, we provide a model
that allows both sharing and disjointness to be expressed directly in the logic.
Finally, we address the second modularity issue: inheritance. We do this by extending the concept of abstract predicates to abstract predicate families. This extension allows a predicate to have multiple definitions
that are indexed by class, which allows subclasses to
have a different internal representation while remaining
behavioural subtypes. We demonstrate the usefulness of
this concept by verifying a use of the visitor design pattern.
UCAM-CL-TR-655
Karen Spärck Jones:
Wearing proper combinations
November 2005, 27 pages, PDF
Abstract: This paper discusses the proper treatment of
multiple indexing fields, representations, or streams, in
document retrieval. Previous experiments by Robertson
and his colleagues have shown that, with a widely used
type of term weighting and fields that share keys, document scores should be computed using term frequencies
over fields rather than by combining field scores. Here
I examine a wide range of document and query indexing situations, and consider their implications for this
approach to document scoring.
UCAM-CL-TR-656
Pablo Vidales:
Seamless mobility in 4G systems
November 2005, 141 pages, PDF
PhD thesis (Girton College, May 2005)
Abstract: The proliferation of radio access technologies, wireless networking devices, and mobile services
has encouraged intensive nomadic computing activity.
When travelling, mobile users experience connectivity
disturbances, particularly when they handoff between
two access points that belong to the same wireless network and when they change from one access technology to another. Nowadays, an average mobile user
might connect to many different wireless networks in
the course of a day to obtain diverse services, whilst demanding transparent operation. Current protocols offer portability and transparent mobility. However, they
fail to cope with huge delays caused by different link-layer characteristics when roaming between independent disparate networks. In this dissertation, I address
this deficiency by introducing and evaluating practical
methods and solutions that minimise connection disruptions and support transparent mobility in future
communication systems.
UCAM-CL-TR-657
Hyun-Jin Choi:
Security protocol design by
composition
January 2006, 155 pages, PDF
PhD thesis (Churchill College, December 2004)
Abstract: The aim of this research is to present a new
methodology for the systematic design of compound
protocols from their parts. Some security properties can
be made accumulative, i.e. can be put together without
interfering with one another, by carefully selecting the
mechanisms which implement them. Among them are
authentication, secrecy and non-repudiation. Based on
this observation, a set of accumulative protocol mechanisms called protocol primitives are proposed and their
correctness is verified. These protocol primitives are obtained from common mechanisms found in many security protocols such as challenge and response. They
have been carefully designed not to interfere with each
other. This feature makes them flexible building blocks
in the proposed methodology. Equipped with these protocol primitives, a scheme for the systematic construction of a complicated protocol from simple protocol
primitives is presented, namely, design by composition.
This design scheme allows the combination of several
simple protocol parts into a complicated protocol without destroying the security properties established by
each independent part. In other words, the composition
framework permits the specification of a complex protocol to be decomposed into the specifications of simpler components, and thus makes the design and verification of the protocol easier to handle. Benefits of this
approach are similar to those gained when using a modular approach to software development.
The applicability and practicality of the proposed
methodology are validated through many design examples of protocols found in many different environments
and with various initial assumptions. The method is not
aimed to cover all existent design issues, but a reasonable range of protocols is addressed.
UCAM-CL-TR-658
Carsten Moenning:
Intrinsic point-based surface
processing
January 2006, 166 pages, PDF
PhD thesis (Queens’ College, January 2005)
Abstract: The need for the processing of surface geometry represents a ubiquitous problem in computer
graphics and related disciplines. It arises in numerous important applications such as computer-aided design, reverse engineering, rapid prototyping, medical
imaging, cultural heritage acquisition and preservation,
video gaming and the movie industry. Existing surface
processing techniques predominantly follow an extrinsic approach using combinatorial mesh data structures
in the embedding Euclidean space to represent, manipulate and visualise the surfaces. This thesis advocates,
firstly, the intrinsic processing of surfaces, i.e. processing directly across the surface rather than in its embedding space. Secondly, it continues the trend towards the
use of point primitives for the processing and representation of surfaces.
The discussion starts with the design of an intrinsic point sampling algorithm template for surfaces. This
is followed by the presentation of a module library of
template instantiations for surfaces in triangular mesh
or point cloud form. The latter is at the heart of the
intrinsic meshless surface simplification algorithm also
put forward. This is followed by the introduction of
intrinsic meshless surface subdivision, the first intrinsic
meshless surface subdivision scheme and a new method
for the computation of geodesic centroids on manifolds. The meshless subdivision scheme uses an intrinsic neighbourhood concept for point-sampled geometry
also presented in this thesis. Its main contributions can
therefore be summarised as follows:
– An intrinsic neighbourhood concept for point-sampled geometry.
– An intrinsic surface sampling algorithm template
with sampling density guarantee.
– A modular library of template instantiations for
the sampling of planar domains and surfaces in triangular mesh or point cloud form.
– A new method for the computation of geodesic
centroids on manifolds.
– An intrinsic meshless surface simplification algorithm.
– The introduction of the notion of intrinsic meshless surface subdivision.
– The first intrinsic meshless surface subdivision
scheme.
The overall result is a set of algorithms for the processing of point-sampled geometry centering around
a generic sampling template for surfaces in the most
widely-used forms of representation. The intrinsic nature of these point-based algorithms helps to overcome
limitations associated with the more traditional extrinsic, mesh-based processing of surfaces when dealing
with highly complex point-sampled geometry as is typically encountered today.
UCAM-CL-TR-659
Viktor Vafeiadis, Maurice Herlihy,
Tony Hoare, Marc Shapiro:
A safety proof of a lazy concurrent
list-based set implementation
January 2006, 19 pages, PDF
Abstract: We prove the safety of a practical concurrent
list-based implementation due to Heller et al. It exposes
an interface of an integer set with methods contains,
add, and remove. The implementation uses a combination of fine-grain locking, optimistic and lazy synchronisation. Our proofs are hand-crafted. They use rely-guarantee reasoning and thereby illustrate its power
and applicability, as well as some of its limitations. For
each method, we identify the linearisation point, and
establish its validity. Hence we show that the methods are safe, linearisable and implement a high-level
specification. This report is a companion document to
our PPoPP 2006 paper entitled “Proving correctness of
highly-concurrent linearisable objects”.
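To make the combination of fine-grain locking, optimistic traversal and lazy (mark, then unlink) removal described in the abstract above more concrete, here is a small sketch of a lazy list-based integer set. It is an illustration written for this listing, not the verified code from the report: the class and method names other than contains, add and remove are assumptions, and the lock-free reads rely on CPython's atomic attribute access rather than on an explicit memory model.

```python
import threading


class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node
        self.marked = False            # set when the node is logically removed
        self.lock = threading.Lock()


class LazyListSet:
    """Sorted singly linked list of integers with head and tail sentinels."""

    def __init__(self):
        self.tail = Node(float("inf"))
        self.head = Node(float("-inf"), self.tail)

    def _locate(self, key):
        # Optimistic traversal: no locks are taken while searching.
        pred, curr = self.head, self.head.next
        while curr.key < key:
            pred, curr = curr, curr.next
        return pred, curr

    @staticmethod
    def _validate(pred, curr):
        # Both nodes must still be in the list and still be adjacent.
        return not pred.marked and not curr.marked and pred.next is curr

    def contains(self, key):
        # Lock-free membership test: a marked node counts as absent.
        curr = self.head
        while curr.key < key:
            curr = curr.next
        return curr.key == key and not curr.marked

    def add(self, key):
        while True:
            pred, curr = self._locate(key)
            with pred.lock, curr.lock:          # locks taken in list order
                if self._validate(pred, curr):
                    if curr.key == key:
                        return False
                    pred.next = Node(key, curr)
                    return True
            # Validation failed: another thread interfered, so retry.

    def remove(self, key):
        while True:
            pred, curr = self._locate(key)
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key != key:
                        return False
                    curr.marked = True          # logical removal
                    pred.next = curr.next       # physical unlinking
                    return True
```

In this sketch a successful remove takes effect at the write to `marked`, which is why `contains` can safely ignore any node it finds marked.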
UCAM-CL-TR-660
Jeremy Singer:
Static program analysis based on
virtual register renaming
February 2006, 183 pages, PDF
PhD thesis (Christ’s College, March 2005)
Abstract: Static single assignment form (SSA) is a popular program intermediate representation (IR) for static
analysis. SSA programs differ from equivalent control
flow graph (CFG) programs only in the names of virtual registers, which are systematically transformed to
comply with the naming convention of SSA. Static single information form (SSI) is a recently proposed extension of SSA that enforces a greater degree of systematic virtual register renaming than SSA. This dissertation develops the principles, properties, and practice
of SSI construction and data flow analysis. Further, it
shows that SSA and SSI are two members of a larger
family of related IRs, which are termed virtual register
renaming schemes (VRRSs). SSA and SSI analyses can
be generalized to operate on any VRRS family member. Analysis properties such as accuracy and efficiency
depend on the underlying VRRS.
This dissertation makes four significant contributions to the field of static analysis research.
First, it develops the SSI representation. Although
SSI was introduced five years ago, it has not yet received widespread recognition as an interesting IR in
its own right. This dissertation presents a new SSI definition and an optimistic construction algorithm. It also
sets SSI in context among the broad range of IRs for
static analysis.
Second, it demonstrates how to reformulate existing data flow analyses using new sparse SSI-based
techniques. Examples include liveness analysis, sparse
type inference and program slicing. It presents algorithms, together with empirical results of these algorithms when implemented within a research compiler
framework.
Third, it provides the only major comparative evaluation of the merits of SSI for data flow analysis. Several
qualitative and quantitative studies in this dissertation
compare SSI with other similar IRs.
Last, it identifies the family of VRRSs, which are
all CFGs with different virtual register naming conventions. Many extant IRs are classified as VRRSs. Several
new IRs are presented, based on a consideration of previously unspecified members of the VRRS family. General analyses can operate on any family member. The
required level of accuracy or efficiency can be selected
by working in terms of the appropriate family member.
UCAM-CL-TR-661
Anna Ritchie:
Compatible RMRS representations
from RASP and the ERG
March 2006, 41 pages, PDF
Abstract: Various applications could potentially benefit from the integration of deep and shallow processing techniques. A universal representation, compatible between deep and shallow parsers, would enable
such integration, allowing the advantages of both to be
combined. This paper describes efforts to make RMRS
such a representation. This work was done as part of
DeepThought, funded under the 5th Framework Program of the European Commission (contract reference
IST-2001-37836).
UCAM-CL-TR-662
Ted Briscoe:
An introduction to tag sequence
grammars and the RASP system
parser
March 2006, 30 pages, PDF
Abstract: This report describes the tag sequence grammars released as part of the Robust Accurate Statistical Parsing (RASP) system. It is intended to help users
of RASP understand the linguistic and engineering rationale behind the grammars and prepare them to customise the system for their application. It also contains
a fairly exhaustive list of references to extant work utilising the RASP parser.
UCAM-CL-TR-663
Richard Bergmair:
Syntax-driven analysis of
context-free languages with respect to
fuzzy relational semantics
March 2006, 49 pages, PDF
Abstract: A grammatical framework is presented that
augments context-free production rules with semantic
production rules that rely on fuzzy relations as representations of fuzzy natural language concepts. It is
shown how the well-known technique of syntax-driven
semantic analysis can be used to infer from an expression in a language defined in such a semantically
augmented grammar a weak ordering on the possible
worlds it describes. Considering the application of natural language query processing, we show how to order
elements in the domain of a relational database scheme
according to the degree to which they fulfill the intuition behind a given natural language statement like
“Carol lives in a small city near San Francisco”.
UCAM-CL-TR-664
Alan F. Blackwell:
Designing knowledge:
An interdisciplinary experiment
in research infrastructure for
shared description
April 2006, 18 pages, PDF
Abstract: The report presents the experimental development, evaluation and refinement of a method for doing
adventurous design work, in contexts where academics
must work in collaboration with corporate and public policy strategists and researchers. The intention has
been to do applied social science, in which a reflective
research process has resulted in a “new social form”, as
expressed in the title of the research grant that funded
the project. The objective in doing so is not simply to
produce new theories, or to enjoy interdisciplinary encounters (although both of those have been side effects
of this work). My purpose in doing the work and writing this report is purely instrumental – working as a
technologist among social scientists, the outcome described in this report is intended for adoption as a kind
of social technology. I have given this product a name:
the “Blackwell-Leach Process” for interdisciplinary design. The Blackwell-Leach process has since been applied and proven useful in several novel situations, and
I believe is now sufficiently mature to justify publication of the reports that describe both the process and
its development.
UCAM-CL-TR-665
Huiyun Li:
Security evaluation at design time for
cryptographic hardware
April 2006, 81 pages, PDF
PhD thesis (Trinity Hall, December 2005)
Abstract: Consumer security devices are becoming
ubiquitous, from pay-TV through mobile phones, PDA,
prepayment gas meters to smart cards. There are many
ongoing research efforts to keep these devices secure
from opponents who try to retrieve key information
by observation or manipulation of the chip’s components. In common industrial practice, it is after the chip
has been manufactured that security evaluation is performed. Due to design time oversights, however, weaknesses are often revealed in fabricated chips. Furthermore, post manufacture security evaluation is time consuming, error prone and very expensive. This evokes
the need for “design time security evaluation” techniques in order to identify avoidable mistakes in design.
This thesis proposes a set of “design time security evaluation” methodologies covering the well-known non-invasive side-channel analysis attacks, such
as power analysis and electromagnetic analysis attacks.
The thesis also covers the recently published semi-invasive optical fault injection attacks. These security
evaluation technologies examine the system under test
by reproducing attacks through simulation and observing its subsequent response.
The proposed “design time security evaluation”
methodologies can be easily implemented into the standard integrated circuit design flow, requiring only commonly used EDA tools. So it adds little non-recurrent
engineering (NRE) cost to the chip design but helps
identify the security weaknesses at an early stage,
avoids costly silicon re-spins, and helps succeed in industrial evaluation for faster time-to-market.
UCAM-CL-TR-666
Mike Bond, George Danezis:
A pact with the Devil
June 2006, 14 pages, PDF
Abstract: We study malware propagation strategies
which exploit not the incompetence or naivety of
users, but instead their own greed, malice and shortsightedness. We demonstrate that interactive propagation strategies, for example bribery and blackmail of
computer users, are effective mechanisms for malware
to survive and entrench, and present an example employing these techniques. We argue that in terms of
propagation, there exists a continuum between legitimate applications and pure malware, rather than a
quantised scale.
UCAM-CL-TR-667
Piotr Zieliński:
Minimizing latency of agreement
protocols
June 2006, 239 pages, PDF
PhD thesis (Trinity Hall, September 2005)
Abstract: Maintaining consistency of fault-tolerant distributed systems is notoriously difficult to achieve. It
often requires non-trivial agreement abstractions, such
as Consensus, Atomic Broadcast, or Atomic Commitment. This thesis investigates implementations of such
abstractions in the asynchronous model, extended with
unreliable failure detectors or eventual synchrony. The
main objective is to develop protocols that minimize
the number of communication steps required in failure-free scenarios but remain correct if failures occur. For
several agreement problems and their numerous variants, this thesis presents such low-latency algorithms
and lower-bound theorems proving their optimality.
The observation that many agreement protocols
share the same round-based structure helps to cope
with a large number of agreement problems in a uniform way. One of the main contributions of this thesis
is “Optimistically Terminating Consensus” (OTC) – a
new lightweight agreement abstraction that formalizes
the notion of a round. It is used to provide simple modular solutions to a large variety of agreement problems,
including Consensus, Atomic Commitment, and Interactive Consistency. The OTC abstraction tolerates malicious participants and has no latency overhead; agreement protocols constructed in the OTC framework require no more communication steps than their ad-hoc
counterparts.
The attractiveness of this approach lies in the fact
that the correctness of OTC algorithms can be tested
automatically. A theory developed in this thesis allows us to quickly evaluate OTC algorithm candidates
without the time-consuming examination of their entire state space. This technique is then used to scan
the space of possible solutions in order to automatically discover new low-latency OTC algorithms. From
these, one can now easily obtain new implementations
of Consensus and similar agreement problems such as
Atomic Commitment or Interactive Consistency.
Because of its continuous nature, Atomic Broadcast is considered separately from other agreement abstractions. I first show that no algorithm can guarantee a latency of less than three communication steps in
all failure-free scenarios. Then, I present new Atomic
Broadcast algorithms that achieve the two-step latency
in some special cases, while still guaranteeing three
steps for other failure-free scenarios. The special cases
considered here are: Optimistic Atomic Broadcast, (Optimistic) Generic Broadcast, and closed-group Atomic
Broadcast. For each of these, I present an appropriate
algorithm and prove its latency to be optimal.
UCAM-CL-TR-668
Piotr Zieliński:
Optimistically Terminating
Consensus
June 2006, 35 pages, PDF
Abstract: Optimistically Terminating Consensus (OTC)
is a variant of Consensus that decides if all correct processes propose the same value. It is surprisingly easy
to implement: processes broadcast their proposals and
decide if sufficiently many processes report the same
proposal. This paper shows an OTC-based framework
which can reconstruct all major asynchronous Consensus algorithms, even in Byzantine settings, with no overhead in latency or the required number of processes.
This result does not only deepen our understanding
of Consensus, but also reduces the problem of designing new, modular distributed agreement protocols to
choosing the parameters of OTC.
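The decision rule described in the abstract above (broadcast proposals, then decide once sufficiently many reports agree) can be illustrated with a minimal Python sketch. The function name `otc_decide` and the `threshold` parameter are hypothetical and not taken from the report, which derives the actual thresholds from the number of processes and the failure model.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def otc_decide(reports: Iterable[Hashable], threshold: int) -> Optional[Hashable]:
    """Return the value reported at least `threshold` times, or None.

    `threshold` is an assumed parameter for illustration only; the
    report defines the real values for each failure model.
    """
    counts = Counter(reports)
    if not counts:
        return None  # nothing received yet, so no decision
    value, count = counts.most_common(1)[0]
    return value if count >= threshold else None
```

If `threshold` exceeds half the number of reports received, at most one value can qualify, so the sketch never has to choose between two candidates.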
UCAM-CL-TR-669
David M. Eyers:
Active privilege management for
distributed access control systems
June 2006, 222 pages, PDF
PhD thesis (King’s College, June 2005)
Abstract: The last decade has seen the explosive uptake of technologies to support true Internet-scale distributed systems, many of which will require security.
The policy dictating authorisation and privilege restriction should be decoupled from the services being
protected: (1) policy can be given its own independent
language syntax and semantics, hopefully in an application independent way; (2) policy becomes portable –
it can be stored away from the services it protects; and
(3) the evolution of policy can be effected dynamically.
Management of dynamic privileges in wide-area distributed systems is a challenging problem. Supporting
fast credential revocation is a simple example of dynamic privilege management. More complex examples
include policies that are sensitive to the current state of
a principal, such as dynamic separation of duties.
The Open Architecture for Secure Interworking Services (OASIS), an expressive distributed role-based access control system, is traced to the development of the
Clinical and Biomedical Computing Limited (CBCL)
OASIS implementation. Two OASIS deployments are
discussed – an Electronic Health Record framework,
and an inter-organisational distributed courseware system.
The Event-based Distributed Scalable Authorisation
Control architecture for the 21st century (EDSAC21,
or just EDSAC) is then presented along with its four design layers. It builds on OASIS, adding support for the
collaborative enforcement of distributed dynamic constraints, and incorporating publish/subscribe messaging
to allow scalable and flexible deployment. The OASIS
policy language is extended to support delegation, dynamic separation of duties, and obligation policies.
An EDSAC prototype is examined. We show that
our architecture is ideal for experiments performed into
location-aware access control. We then demonstrate
how event-based features specific to EDSAC facilitate
integration of an ad hoc workflow monitor into an access control system.
The EDSAC architecture is powerful, flexible and
extensible. It is intended to have widespread applicability as the basis for designing next-generation security middleware and implementing distributed, dynamic privilege management.
UCAM-CL-TR-670
Sarah Thompson:
On the application of program
analysis and transformation to
high reliability hardware
July 2006, 215 pages, PDF
PhD thesis (St Edmund’s College, April 2006)
Abstract: Safety- and mission-critical systems must be
both correct and reliable. Electronic systems must behave as intended and, where possible, do so at the first
attempt – the fabrication costs of modern VLSI devices
are such that the iterative design/code/test methodology endemic to the software world is not financially
feasible. In aerospace applications it is also essential to
establish that systems will, with known probability, remain operational for extended periods, despite being
exposed to very low or very high temperatures, high
radiation, large G-forces, hard vacuum and severe vibration.
Hardware designers have long understood the advantages of formal mathematical techniques. Notably,
model checking and automated theorem proving both
gained acceptance within the electronic design community at an early stage, though more recently the research
focus in validation and verification has drifted toward
software. As a consequence, the newest and most powerful techniques have not been significantly applied to
hardware; this work seeks to make a modest contribution toward redressing the imbalance.
An abstract interpretation-based formalism is introduced, transitional logic, that supports formal reasoning about dynamic behaviour of combinational asynchronous circuits. The behaviour of majority voting circuits with respect to single-event transients is analysed,
demonstrating that such circuits are not SET-immune.
This result is generalised to show that SET immunity is
impossible for all delay-insensitive circuits.
An experimental hardware partial evaluator,
HarPE, is used to demonstrate the 1st Futamura
projection in hardware – a small CPU is specialised
with respect to a ROM image, yielding results that
are equivalent to compiling the program into hardware. HarPE is then used alongside an experimental
non-clausal SAT solver to implement an automated
transformation system that is capable of repairing
FPGAs that have suffered cosmic ray damage. This
approach is extended to support automated configuration, dynamic testing and dynamic error recovery of
reconfigurable spacecraft wiring harnesses.
UCAM-CL-TR-671
Piotr Zieliński:
Low-latency Atomic Broadcast in the
presence of contention
July 2006, 23 pages, PDF
Abstract: The Atomic Broadcast algorithm described in
this paper can deliver messages in two communication
steps, even if multiple processes broadcast at the same
time. It tags all broadcast messages with the local real
time, and delivers all messages in order of these timestamps. The Ω-elected leader simulates processes it suspects to have crashed (♦S). For fault-tolerance, it uses
a new cheap Generic Broadcast algorithm that requires
only a majority of correct processes (n > 2f) and, in
failure-free runs, delivers all non-conflicting messages
in two steps. The main algorithm satisfies several new
lower bounds, which are proved in this paper.
UCAM-CL-TR-672
Calicrates Policroniades-Borraz:
Decomposing file data into
discernible items
August 2006, 230 pages, PDF
PhD thesis (Hughes Hall, December 2005)
Abstract: The development of the different persistent
data models shows a constant pattern: the higher the
level of abstraction a storage system exposes the greater
the payoff for programmers. The file API offers a simple
storage model that is agnostic of any structure or data
types in file contents. As a result, developers employ
substantial programming effort in writing persistent
code. At the other extreme, orthogonally persistent programming languages reduce the impedance mismatch
between the volatile and the persistent data spaces by
exposing persistent data as conventional programming
objects. Consequently, developers spend considerably
less effort in developing persistent code.
This dissertation addresses the lack of ability in the
file API to exploit the advantages of gaining access to
the logical composition of file content. It argues that the
trade-off between efficiency and ease of programmability of persistent code in the context of the file API is
unbalanced. Accordingly, in this dissertation I present
and evaluate two practical strategies to disclose structure and type in file data.
First, I investigate to what extent it is possible
to identify specific portions of file content in diverse
data sets through the implementation and evaluation of
techniques for data redundancy detection. This study is
interesting not only because it characterises redundancy
levels in storage systems content, but also because redundant portions of data at a sub-file level can be an
indication of internal file data structure. Although these
techniques have been used by previous work, my analysis of data redundancy is the first that makes an in-depth comparison of them and highlights the trade-offs
in their employment.
Second, I introduce a novel storage system API,
called Datom, that departs from the view of file content as a monolithic object. Through a minimal set of
commonly-used abstract data types, it discloses a judicious degree of structure and type in the logical composition of files and makes the data access semantics
of applications explicit. The design of the Datom API
weighs the addition of advanced functionality and the
overheads introduced by their employment, taking into
account the requirements of the target application domain. The implementation of the Datom API is evaluated according to different criteria such as usability,
impact at the source-code level, and performance. The
experimental results demonstrate that the Datom API
reduces work-effort and improves software quality by
providing a storage interface based on high-level abstractions.
UCAM-CL-TR-673
Judita Preiss:
Probabilistic word sense
disambiguation
Analysis and techniques for
combining knowledge sources
August 2006, 108 pages, PDF
PhD thesis (Trinity College, July 2005)
Abstract: This thesis shows that probabilistic word
sense disambiguation systems based on established statistical methods are strong competitors to current state-of-the-art word sense disambiguation (WSD) systems.
We begin with a survey of approaches to WSD, and
examine their performance in the systems submitted to
the SENSEVAL-2 WSD evaluation exercise. We discuss
existing resources for WSD, and investigate the amount
of training data needed for effective supervised WSD.
We then present the design of a new probabilistic
WSD system. The main feature of the design is that
it combines multiple probabilistic modules using both
Dempster-Shafer theory and Bayes Rule. Additionally,
the use of Lidstone’s smoothing provides a uniform
mechanism for weighting modules based on their accuracy, removing the need for an additional weighting
scheme.
Lastly, we evaluate our probabilistic WSD system
using traditional evaluation methods, and introduce
a novel task-based approach. When evaluated on the
gold standard used in the SENSEVAL-2 competition,
the performance of our system lies between the first and
second ranked WSD system submitted to the English all
words task.
Task-based evaluations are becoming more popular in natural language processing, being an absolute
measure of a system’s performance on a given task. We
present a new evaluation method based on subcategorization frame acquisition. Experiments with our probabilistic WSD system give an extremely high correlation between subcategorization frame acquisition performance and WSD performance, thus demonstrating
the suitability of SCF acquisition as a WSD evaluation
task.
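For readers unfamiliar with the smoothing named in the abstract above, the standard Lidstone estimate adds a constant λ to every count before normalising: P(s) = (c(s) + λ) / (N + λK), where N is the total count and K the number of outcomes. The Python sketch below shows only this textbook formula; the function name `lidstone` and the example sense labels are hypothetical, and the way the thesis uses the estimate to weight modules is not reproduced here.

```python
from typing import Dict


def lidstone(counts: Dict[str, int], lam: float) -> Dict[str, float]:
    """Lidstone-smoothed probabilities over the keys of `counts`.

    Each count c becomes (c + lam) / (N + lam * K), where N is the
    total count and K the number of keys.
    """
    total = sum(counts.values()) + lam * len(counts)
    if total == 0:
        raise ValueError("need at least one observation or lam > 0")
    return {key: (c + lam) / total for key, c in counts.items()}
```

With λ = 1 this is Laplace (add-one) smoothing; smaller values of λ keep the estimate closer to the raw relative frequencies.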
UCAM-CL-TR-674
Meng How Lim:
Landmark Guided Forwarding
October 2006, 109 pages, PDF
PhD thesis (St Catharine’s College, September 2006)
Abstract: Wireless mobile ad hoc network routing
presents some extremely challenging research problems. While primarily trying to provide connectivity,
algorithms may also be designed to minimise resource
consumption such as power, or to trade off global optimisation against the routing protocol overheads. In
this thesis, we focus on the problems of maintaining
network connectivity in the presence of node mobility whilst providing a balance between global efficiency
and robustness. The common design goal among existing wireless ad hoc routing solutions is to search for an
optimal topological path between a source and a destination for some shortest path metric. We argue that
the goal of establishing an end to end globally optimal
path is unsustainable as the network diameter, traffic
volume and number of nodes all increase in the presence of moderate node mobility.
Some researchers have proposed using geographic
position-based forwarding, rather than a topological-based approach. In position-based forwarding, besides
knowing about its own geographic location, every node
also acquires the geographic position of its surrounding neighbours. Packet delivery in general is achieved
by first learning the destination position from a location service. This is followed by addressing the packet
with the destination position before forwarding the
packet on to a neighbour that, amongst all other neighbours, is geographically nearest to the destination. It
is clear that in the ad hoc scenario, forwarding only
by geodesic position could result in situations that prevent the packet from advancing further. To resolve this,
some researchers propose improving delivery guarantees by routing the packet along a planar graph constructed from a Gabriel Graph (GG) or a Relative Neighbour
Graph (RNG). This approach however has been shown
to fail frequently when position information is inherently inaccurate, or neighbourhood state is stale, such
as is the case in many plausible deployment scenarios,
e.g. due to relative mobility rates being higher than location service update frequency.
We propose Landmark Guided Forwarding (LGF),
an algorithm that harnesses the strengths of both topological and geographical routing algorithms. LGF is a
hybrid scheme that leverages the scaling property of the
geographic approach while using local topology knowledge to mitigate location uncertainty. We demonstrate
through extensive simulations that LGF is suited both
to situations where there are high mobility rates, and
deployment when there is inherently less accurate position data. Our results show that Landmark Guided
Forwarding converges faster, scales better and is more
flexible in a range of plausible mobility scenarios than
representative protocols from the leading classes of existing solutions, namely GPSR, AODV and DSDV.
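The position-based forwarding step that the abstract above contrasts with LGF can be sketched in a few lines of Python. This is the generic greedy rule only, not LGF itself; the function name `greedy_next_hop`, the 2D coordinates and the node identifiers are assumptions made for illustration.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]


def greedy_next_hop(current: Point, neighbours: Dict[str, Point],
                    dest: Point) -> Optional[str]:
    """Pick the neighbour geographically nearest to `dest`.

    Returns None when no neighbour is closer to the destination than
    the current node, i.e. the packet cannot advance any further by
    position alone.
    """
    best_id, best_dist = None, math.dist(current, dest)
    for node_id, pos in neighbours.items():
        d = math.dist(pos, dest)
        if d < best_dist:
            best_id, best_dist = node_id, d
    return best_id
```

The `None` case corresponds to the situation the abstract describes, where forwarding by position alone prevents the packet from advancing; stale or inaccurate neighbour positions would also make this choice unreliable.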
UCAM-CL-TR-675
Paula J. Buttery:
Computational models for first
language acquisition
November 2006, 176 pages, PDF
PhD thesis (Churchill College, March 2006)
Abstract: This work investigates a computational
model of first language acquisition; the Categorial
Grammar Learner or CGL. The model builds on the
work of Villavicencio, who created a parametric Categorial Grammar learner that organises its parameters
into an inheritance hierarchy, and also on the work
of Buszkowski and Kanazawa, who demonstrated the
learnability of a k-valued Classic Categorial Grammar (which uses only the rules of function application) from strings. The CGL is able to learn a k-valued
General Categorial Grammar (which uses the rules of
function application, function composition and Generalised Weak Permutation). The novel concept of Sentence Objects (simple strings, augmented strings, unlabelled structures and functor-argument structures) are
presented as potential points from which learning may
commence. Augmented strings (which are strings augmented with some basic syntactic information) are suggested as a sensible input to the CGL as they are cognitively plausible objects and have greater information
content than strings alone. Building on the work of
Siskind, a method for constructing augmented strings
from unordered logic forms is detailed and it is suggested that augmented strings are simply a representation of the constraints placed on the space of possible
parses due to a string’s associated semantic content. The
CGL makes crucial use of a statistical Memory Module (constructed from a Type Memory and Word Order Memory) that is used to both constrain hypotheses
and handle data which is noisy or parametrically ambiguous. A consequence of the Memory Module is that
the CGL learns in an incremental fashion. This echoes
real child learning as documented in Brown’s Stages of
Language Development and also as alluded to by an included corpus study of child speech. Furthermore, the
CGL learns faster when initially presented with simpler
linguistic data; a further corpus study of child-directed
speech suggests that this echoes the input provided to
children. The CGL is demonstrated to learn from real
data. It is evaluated against previous parametric learners (the Triggering Learning Algorithm of Gibson and
Wexler and the Structural Triggers Learner of Fodor
and Sakas) and is found to be more efficient.
UCAM-CL-TR-676
R.J. Gibbens, Y. Saacti:
Road traffic analysis using MIDAS
data: journey time prediction
December 2006, 35 pages, PDF
Department for Transport Horizons Research
Programme “Investigating the handling of large
transport related datasets” (project number H05-217)
Abstract: The project described in this report was undertaken within the Department for Transport’s second call for proposals in the Horizons research programme under the theme of “Investigating the handling
of large transport related datasets”. The project looked
at the variability of journey times across days in three
day categories: Mondays, midweek days and Fridays.
Two estimators using real-time data were considered:
a simple-to-implement regression-based method and a
more computationally demanding k-nearest neighbour
method. Our example scenario of UK data was taken
from the M25 London orbital motorway during 2003
and the results compared in terms of the root-mean-square prediction error. It was found that where the
variability was greatest (typically during rush hour
periods or periods of flow breakdown) the regression
and nearest neighbour estimators reduced the prediction error substantially compared with a naive estimator constructed from the historical mean journey time.
Only as the lag between the decision time and the journey start time increased to beyond around 2 hours did
the potential to improve upon the historical mean estimator diminish. Thus, there is considerable scope for
prediction methods combined with access to real-time
data to improve the accuracy in journey time estimates.
In so doing, they reduce the uncertainty in estimating
the generalized cost of travel. The regression-based prediction estimator has a particularly low computational
overhead, in contrast to the nearest neighbour estimator, which makes it entirely suitable for an online implementation. Finally, the project demonstrates the
value both of preserving historical archives of transport related datasets and of providing access to real-time
measurements.
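The comparison described in the abstract above (a naive historical-mean estimator against a k-nearest neighbour estimator, scored by root-mean-square error) can be sketched as follows. This is a toy illustration under stated assumptions: each day is reduced to one scalar real-time measurement, and the function names `rmse`, `historical_mean` and `knn_predict` are hypothetical. The report's actual features and its regression estimator are not given in the abstract and are not reproduced.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square prediction error."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def historical_mean(history: Sequence[float]) -> float:
    """Naive estimator: the mean of past journey times."""
    return sum(history) / len(history)


def knn_predict(current: float, past_current: Sequence[float],
                past_future: Sequence[float], k: int) -> float:
    """Average the later journey times of the k past days whose
    real-time measurement was closest to today's `current` value."""
    ranked = sorted(range(len(past_current)),
                    key=lambda i: abs(past_current[i] - current))
    nearest = ranked[:k]
    return sum(past_future[i] for i in nearest) / len(nearest)
```

The k-nearest neighbour estimator has to search the historical archive for every prediction, which is consistent with the abstract's remark that it is the more computationally demanding of the two real-time methods.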
UCAM-CL-TR-677
Eiko Yoneki:
ECCO: Data centric asynchronous
communication
December 2006, 210 pages, PDF
PhD thesis (Lucy Cavendish College, September 2006)
Abstract: This dissertation deals with data centric networking in distributed systems, which relies on content addressing instead of host addressing for participating nodes, thus providing network independence
for applications. Publish/subscribe asynchronous group
communication realises the vision of data centric networking that is particularly important for networks
supporting mobile clients over heterogeneous wireless
networks. In such networks, client applications prefer to receive specific data and require selective data
dissemination. Underlying mechanisms such as asynchronous message passing, distributed message filtering
and query/subscription management are essential. Furthermore, recent progress in wireless sensor networks
brought a new dimension of data processing in ubiquitous computing, where the sensors are used to gather
high volumes of different data types and to feed them
as contexts to a wide range of applications.
Particular emphasis has been placed on fundamental design of event representation. Besides existing event
attributes, event order, and continuous context information such as time or geographic location can be
incorporated within an event description. Data representation of event and query will be even more important in future ubiquitous computing, where events
flow over heterogeneous networks. This dissertation
presents a multidimensional event representation (i.e.,
Hypercube structure in RTree) for efficient indexing,
filtering, matching, and scalability in publish/subscribe
systems. The hypercube event with a typed content-based publish/subscribe system for wide-area networks
is demonstrated for improving the event filtering process.
As a primary focus, this dissertation investigates a
structureless, asynchronous group communication over
wireless ad hoc networks named ‘ECCO Pervasive Publish/Subscribe’ (ECCO-PPS). ECCO-PPS uses context-adaptive controlled flooding, which takes a cross-layer
approach between middleware and network layers and
provides a content-based publish/subscribe paradigm.
Traditionally events have been payload data within
network layer components; the network layer never
touches the data contents. However, application data
have more influence on data dissemination in ubiquitous computing scenarios.
The state information of the local node may be
the event forwarding trigger. Thus, the model of
publish/subscribe must become more symmetric, with
events being disseminated based on rules and conditions defined by the events themselves. The event can
thus choose the destinations instead of relying on the
potential receivers’ decision. The publish/subscribe system offers a data centric approach, where the destination address is not described with any explicit network
address. The symmetric publish/subscribe paradigm
brings another level to the data-centric paradigm, leading to a fundamental change in functionality at the network level of asynchronous group communication and
membership maintenance.
To add an additional dimension of event processing in global computing, it is important to understand
event aggregation, filtering and correlation. Temporal
ordering of events is essential for event correlation
over distributed systems. This dissertation introduces
generic composite event semantics with interval-based
semantics for event detection. This precisely defines
complex timing constraints among correlated event instances.
In conclusion, this dissertation provides advanced
data-centric asynchronous communication, which provides efficiency, reliability, and robustness, while adapting to the underlying network environments.
UCAM-CL-TR-678
Andrew D. Twigg:
Compact forbidden-set routing
December 2006, 115 pages, PDF
PhD thesis (King’s College, June 2006)
Abstract: We study the compact forbidden-set routing problem. We describe the first compact forbidden-set routing schemes that do not suffer from non-convergence problems often associated with Bellman-Ford iterative schemes such as the interdomain routing
protocol, BGP. For degree-d n-node graphs of treewidth
t, our schemes use space O(t² d polylog(n)) bits per
node; a trivial scheme uses O(n²) and routing trees
use Ω(n) per node (these results have since been improved and extended – see [Courcelle, Twigg, Compact
forbidden-set routing, 24th Symposium on Theoretical
Aspects of Computer Science, Aachen 2007]). We also
show how to do forbidden-set routing on planar graphs
between nodes whose distance is less than a parameter
l. We prove a lower bound on the space requirements of
forbidden-set routing for general graphs, and show that
the problem is related to constructing an efficient distributed representation of all the separators of an undirected graph. Finally, we consider routing while taking
into account path costs of intermediate nodes and show
that this requires large routing labels. We also study a
novel way of approximating forbidden-set routing using quotient graphs of low treewidth.
UCAM-CL-TR-679
Karen Spärck Jones:
Automatic summarising:
a review and discussion of the state of
the art
January 2007, 67 pages, PDF
Abstract: This paper reviews research on automatic
summarising over the last decade. This period has seen
a rapid growth of work in the area stimulated by technology and by several system evaluation programmes.
The review makes use of several frameworks to organise the review, for summarising, for systems, for the task
factors affecting summarising, and for evaluation design and practice.
The review considers the evaluation strategies that
have been applied to summarising and the issues they
raise, and the major summary evaluation programmes.
It examines the input, purpose and output factors that
have been investigated in summarising research in the
last decade, and discusses the classes of strategy, both
extractive and non-extractive, that have been explored,
illustrating the range of systems that have been built.
This analysis of strategies is amplified by accounts of
specific exemplar systems.
The conclusions drawn from the review are that
automatic summarisation research has made valuable
progress in the last decade, with some practically useful
approaches, better evaluation, and more understanding
of the task. However as the review also makes clear,
summarising systems are often poorly motivated in relation to the factors affecting summaries, and evaluation needs to be taken significantly further so as to
engage with the purposes for which summaries are intended and the contexts in which they are used.
A reduced version of this report, entitled ‘Automatic
summarising: the state of the art’ will appear in Information Processing and Management, 2007.
UCAM-CL-TR-680
Jing Su, James Scott, Pan Hui, Eben Upton,
Meng How Lim, Christophe Diot,
Jon Crowcroft, Ashvin Goel, Eyal de Lara:
Haggle: Clean-slate networking for
mobile devices
January 2007, 30 pages, PDF
Abstract: Haggle is a layerless networking architecture
for mobile devices. It is motivated by the infrastructure dependence of applications such as email and web
browsing, even in situations where infrastructure is not
necessary to accomplish the end user goal, e.g. when the
destination is reachable by ad hoc neighbourhood communication. In this paper we present details of Haggle’s architecture, and of the prototype implementation
which allows existing email and web applications to become infrastructure-independent, as we show with an
experimental evaluation.
UCAM-CL-TR-681
Piotr Zieliński:
Indirect channels:
a bandwidth-saving technique for
fault-tolerant protocols
April 2007, 24 pages, PDF
Abstract: Sending large messages known to the recipient is a waste of bandwidth. Nevertheless, many fault-tolerant agreement protocols send the same large message between each pair of participating processes. This
practical problem has recently been addressed in the
context of Atomic Broadcast by presenting a specialized algorithm.
This paper proposes a more general solution by providing virtual indirect channels that physically transmit
message ids instead of full messages if possible. Indirect
channels are transparent to the application; they can be
used with any distributed algorithm, even with unreliable channels or malicious participants. At the same
time, they provide rigorous theoretical properties.
Indirect channels are conservative: they do not allow manipulating message ids if full messages are not
known. This paper also investigates the consequences
of relaxing this assumption on the latency and correctness of Consensus and Atomic Broadcast implementations: new algorithms and lower bounds are shown.
UCAM-CL-TR-682
Juliano Iyoda:
Translating HOL functions to
hardware
April 2007, 89 pages, PDF
PhD thesis (Hughes Hall, October 2006)
Abstract: Delivering error-free products is still a major
challenge for hardware and software engineers. Due to
the increasingly growing complexity of computing systems, there is a demand for higher levels of automation
in formal verification.
This dissertation proposes an approach to generate
formally verified circuits automatically. The main outcome of our project is a compiler implemented on top
of the theorem prover HOL4 which translates a subset
of higher-order logic to circuits. The subset of the logic
is a first-order tail-recursive functional language. The
compiler takes a function f as argument and automatically produces the theorem “⊢ C implements f” where
C is a circuit and “implements” is a correctness relation between a circuit and a function. We achieve full
mechanisation of proofs by defining theorems which
are composable. The correctness of a circuit can be
mechanically determined by the correctness of its subcircuits. This technology allows the designer to focus
on higher levels of abstraction instead of reasoning and
verifying systems at the gate level.
A pretty-printer translates netlists described in
higher-order logic to structural Verilog. Our compiler
is integrated with Altera tools to run our circuits in
FPGAs. Thus the theorem prover is used as an environment for supporting the development process from
formal specification to implementation.
Our approach has been tested with fairly substantial case studies. We describe the design and the
verification of a multiplier and a simple microcomputer which has shown us that the compiler supports
small and medium-sized applications. Although this approach does not scale to industrial-sized applications
yet, it is a first step towards the implementation of a
new technology that can raise the level of mechanisation in formal verification.
UCAM-CL-TR-683
Martin Kleppmann:
Simulation of colliding constrained
rigid bodies
April 2007, 65 pages, PDF
Abstract: I describe the development of a program to
simulate the dynamic behaviour of interacting rigid
bodies. Such a simulation may be used to generate animations of articulated characters in 3D graphics applications. Bodies may have an arbitrary shape, defined by
a triangle mesh, and may be connected with a variety
of different joints. Joints are represented by constraint
functions which are solved at run-time using Lagrange
multipliers. The simulation performs collision detection
and prevents penetration of rigid bodies by applying
impulses to colliding bodies and reaction forces to bodies in resting contact.
The simulation is shown to be physically accurate
and is tested on several different scenes, including one
of an articulated human character falling down a flight
of stairs.
An appendix describes how to derive arbitrary constraint functions for the Lagrange multiplier method.
Collisions and joints are both represented as constraints, which allows them to be handled with a unified
algorithm. The report also includes some results relating to the use of quaternions in dynamic simulations.
UCAM-CL-TR-684
Pan Hui, Jon Crowcroft:
Bubble Rap: Forwarding in small
world DTNs in ever decreasing circles
May 2007, 44 pages, PDF
Abstract: In this paper we seek to improve understanding of the structure of human mobility, and to use this
in the design of forwarding algorithms for Delay Tolerant Networks for the dissemination of data amongst
mobile users.
Cooperation binds but also divides human society
into communities. Members of the same community
interact with each other preferentially. There is structure in human society. Within society and its communities, individuals have varying popularity. Some people are more popular and interact with more people
than others; we may call them hubs. Popularity ranking is one facet of the population. In many physical
networks, some nodes are more highly connected to
each other than to the rest of the network. The set
of such nodes are usually called clusters, communities,
cohesive groups or modules. There is structure to social networking. Different metrics can be used such as
information flow, Freeman betweenness, closeness and
inference power, but for all of them, each node in the
network can be assigned a global centrality value.
What can be inferred about individual popularity,
and the structure of human society from measurements
within a network? How can the local and global characteristics of the network be used practically for information dissemination? We present and evaluate a sequence of designs for forwarding algorithms for Pocket
Switched Networks, culminating in Bubble, which exploit increasing levels of information about mobility
and interaction.
UCAM-CL-TR-685
John Daugman, Cathryn Downing:
Effect of severe image compression on
iris recognition performance
May 2007, 20 pages, PDF
Abstract: We investigate three schemes for severe compression of iris images, in order to assess what their
impact would be on recognition performance of the algorithms deployed today for identifying persons by this
biometric feature. Currently, standard iris images are
600 times larger than the IrisCode templates computed
from them for database storage and search; but it is administratively desired that iris data should be stored,
transmitted, and embedded in media in the form of images rather than as templates computed with proprietary algorithms. To reconcile that goal with its implications for bandwidth and storage, we present schemes
that combine region-of-interest isolation with JPEG
and JPEG2000 compression at severe levels, and we test
them using a publicly available government database
of iris images. We show that it is possible to compress
iris images to as little as 2 KB with minimal impact
on recognition performance. Only some 2% to 3% of
the bits in the IrisCode templates are changed by such
severe image compression. Standard performance metrics such as error trade-off curves document very good
recognition performance despite this reduction in data
size by a net factor of 150, approaching a convergence
of image data size and template size.
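The size ratios quoted in this abstract can be checked with a little arithmetic. The sketch below assumes a 640x480 8-bit greyscale source image and a 512-byte IrisCode template; neither figure is stated in the abstract, so both are assumptions chosen to be consistent with the quoted factor of 600.

```python
# Assumed sizes (not stated in the abstract): 640x480 pixels at 1 byte each,
# and a 512-byte IrisCode template.
IMAGE_BYTES = 640 * 480
TEMPLATE_BYTES = 512
COMPRESSED_BYTES = 2 * 1024  # "as little as 2 KB", taken from the abstract


def ratio(numerator_bytes: int, denominator_bytes: int) -> float:
    """Return how many times larger the first size is than the second."""
    return numerator_bytes / denominator_bytes


image_to_template = ratio(IMAGE_BYTES, TEMPLATE_BYTES)            # 600.0
compression_factor = ratio(IMAGE_BYTES, COMPRESSED_BYTES)         # 150.0
compressed_to_template = ratio(COMPRESSED_BYTES, TEMPLATE_BYTES)  # 4.0
```

Under these assumptions a 2 KB compressed image is only about four times the size of the template, which is one way to read the "convergence of image data size and template size" mentioned above.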
UCAM-CL-TR-686
Andrew C. Rice:
Dependable systems for Sentient
Computing
May 2007, 150 pages, PDF
Abstract: Computers and electronic devices are continuing to proliferate throughout our lives. Sentient
Computing systems aim to reduce the time and effort
required to interact with these devices by composing
them into systems which fade into the background of
the user’s perception. Failures are a significant problem
in this scenario because their occurrence will pull the
system into the foreground as the user attempts to discover and understand the fault. However, attempting
to exist and interact with users in a real, unpredictable,
physical environment rather than a well-constrained
virtual environment makes failures inevitable.
This dissertation describes a study of dependability.
A dependable system permits applications to discover
the extent of failures and to adapt accordingly such
that their continued behaviour is intuitive to users of
the system.
Cantag, a reliable marker-based machine-vision system, has been developed to aid the investigation of dependability. The description of Cantag includes specific
contributions for marker tracking such as rotationally
invariant coding schemes and reliable back-projection
for circular tags. An analysis of Cantag’s theoretical
performance is presented and compared to its real-world behaviour. This analysis is used to develop optimised tag designs and performance metrics. The use
of validation is proposed to permit runtime calculation
of observable metrics and verification of system components. Formal proof methods are combined with a
logical validation framework to show the validity of
performance optimisations.
UCAM-CL-TR-687
Viktor Vafeiadis, Matthew Parkinson:
A marriage of rely/guarantee and
separation logic
June 2007, 31 pages, PDF
Abstract: In the quest for tractable methods for reasoning about concurrent algorithms both rely/guarantee
logic and separation logic have made great advances.
They both seek to tame, or control, the complexity of
concurrent interactions, but neither is the ultimate approach. Rely-guarantee copes naturally with interference, but its specifications are complex because they describe the entire state. Conversely separation logic has
difficulty dealing with interference, but its specifications
are simpler because they describe only the relevant state
that the program accesses.
We propose a combined system which marries the
two approaches. We can describe interference naturally
(using a relation as in rely/guarantee), and where there
is no interference, we can reason locally (as in separation logic). We demonstrate the advantages of the combined approach by verifying a lock-coupling list algorithm, which actually disposes/frees removed nodes.
UCAM-CL-TR-688
Sam Staton:
Name-passing process calculi:
operational models and structural
operational semantics
June 2007, 245 pages, PDF
PhD thesis (Girton College, December 2006)
Abstract: This thesis is about the formal semantics
of name-passing process calculi. We study operational
models by relating various different notions of model,
and we analyse structural operational semantics by extracting a congruence rule format from a model theory. All aspects of structural operational semantics are
addressed: behaviour, syntax, and rule-based inductive
definitions.
A variety of models for name-passing behaviour are
considered and developed. We relate classes of indexed
labelled transition systems, proposed by Cattani and
Sewell, with coalgebraic models proposed by Fiore and
Turi. A general notion of structured coalgebra is introduced and developed, and a natural notion of structured bisimulation is related to Sangiorgi’s open bisimulation for the π-calculus. At first the state spaces are
organised as presheaves, but it is reasonable to constrain the models to sheaves in a category known as the
Schanuel topos. This sheaf topos is exhibited as equivalent to a category of named-sets proposed by Montanari and Pistore for efficient verification of name-passing systems.
Syntax for name-passing calculi involves variable
binding and substitution. Gabbay and Pitts proposed
nominal sets as an elegant model for syntax with binding, and we develop a framework for substitution in
this context. The category of nominal sets is equivalent
to the Schanuel topos, and so syntax and behaviour can
be studied within one universe.
An abstract account of structural operational semantics was developed by Turi and Plotkin. They explained the inductive specification of a system by rules
in the GSOS format of Bloom et al., in terms of initial algebra recursion for lifting a monad of syntax to
a category of behaviour. The congruence properties of
bisimilarity can be observed at this level of generality.
We study this theory in the general setting of structured coalgebras, and then for the specific case of name-passing systems, based on categories of nominal sets.
At the abstract level of category theory, classes of
rules are understood as natural transformations. In the
concrete domain, though, rules for name-passing systems are formulae in a suitable logical framework. By
imposing a format on rules in Pitts’s nominal logic, we
characterise a subclass of rules in the abstract domain.
Translating the abstract results, we conclude that, for a
name-passing process calculus defined by rules in this
format, a variant of open bisimilarity is a congruence.
UCAM-CL-TR-689
Ursula H. Augsdörfer, Neil A. Dodgson,
Malcolm A. Sabin:
Removing polar rendering artifacts in
subdivision surfaces
June 2007, 7 pages, PDF
Abstract: There is a belief that subdivision schemes require the subdominant eigenvalue, λ, to be the same
around extraordinary vertices as in the regular regions of the mesh. This belief is owing to the polar
rendering artifacts which occur around extraordinary
points when λ is significantly larger than in the regular regions. By constraining the tuning of subdivision
schemes to solutions which fulfill this condition we may
prevent ourselves from finding the optimal limit surface. We show that the perceived problem is purely a
rendering artifact and that it does not reflect the quality of the underlying limit surface. Using the bounded
curvature Catmull-Clark scheme as an example, we describe three practical methods by which this rendering
artifact can be removed, thereby allowing us to tune
subdivision schemes using any appropriate values of λ.
UCAM-CL-TR-690
Russell Glen Ross:
Cluster storage for commodity
computation
June 2007, 178 pages, PDF
PhD thesis (Wolfson College, December 2006)
Abstract: Standards in the computer industry have
made basic components and entire architectures into
commodities, and commodity hardware is increasingly
being used for the heavy lifting formerly reserved for
specialised platforms. Now software and services are
following. Modern updates to virtualization technology
make it practical to subdivide commodity servers and
manage groups of heterogeneous services using commodity operating systems and tools, so services can be
packaged and managed independent of the hardware
on which they run. Computation as a commodity is
soon to follow, moving beyond the specialised applications typical of today’s utility computing.
In this dissertation, I argue for the adoption of service clusters—clusters of commodity machines under
central control, but running services in virtual machines
for arbitrary, untrusted clients—as the basic building
block for an economy of flexible commodity computation. I outline the requirements this platform imposes
on its storage system and argue that they are necessary
for service clusters to be practical, but are not found in
existing systems.
Next I introduce Envoy, a distributed file system for
service clusters. In addition to meeting the needs of a
new environment, Envoy introduces a novel file distribution scheme that organises metadata and cache management according to runtime demand. In effect, the
file system is partitioned and control of each part given
to the client that uses it the most; that client in turn
acts as a server with caching for other clients that require concurrent access. Scalability is limited only by
runtime contention, and clients share a perfectly consistent cache distributed across the cluster. As usage patterns change, the partition boundaries are updated dynamically, with urgent changes made quickly and more
minor optimisations made over a longer period of time.
Experiments with the Envoy prototype demonstrate
that service clusters can support cheap and rapid deployment of services, from isolated instances to groups
of cooperating components with shared storage demands.
UCAM-CL-TR-691
Neil A. Dodgson, Malcolm A. Sabin,
Richard Southern:
Preconditions on geometrically
sensitive subdivision schemes
August 2007, 13 pages, PDF
Abstract: Our objective is to create subdivision schemes
with limit surfaces which are surfaces useful in engineering (spheres, cylinders, cones etc.) without resorting to special cases. The basic idea explored by us previously in the curve case is that if the property that all
vertices lie on an object of the required class can be
preserved through the subdivision refinement, it will be
preserved into the limit surface also. The next obvious
step was to try a bivariate example. We therefore identified the simplest possible scheme and implemented it.
However, this misbehaved quite dramatically. This report, by doing the limit analysis, identifies why the misbehaviour occurred, and draws conclusions about how
the problems should be avoided.
UCAM-CL-TR-692
Alan F. Blackwell:
Toward an undergraduate
programme in Interdisciplinary
Design
July 2007, 13 pages, PDF
Abstract: This technical report describes an experimental syllabus proposal that was developed for the Cambridge Computer Science Tripos (the standard undergraduate degree programme in Computer Science at
Cambridge). The motivation for the proposal was to
create an innovative research-oriented taught course
that would be compatible with the broader policy goals
of the Crucible network for research in interdisciplinary
design. As the course is not proceeding, the syllabus is
published here for use by educators and educational researchers with interests in design teaching.
UCAM-CL-TR-693
Piotr Zieliński:
Automatic classification of eventual
failure detectors
July 2007, 21 pages, PDF
Abstract: Eventual failure detectors, such as Ω or ♦P,
can make arbitrarily many mistakes before they start
providing correct information. This paper shows that
any detector implementable in a purely asynchronous
system can be implemented as a function of only the
order of most-recently heard-from processes. The finiteness of this representation means that eventual failure
detectors can be enumerated and their relative strengths
tested automatically. The results for systems with two
and three processes are presented.
Implementability can also be modelled as a game
between Prover and Disprover. This approach not only
speeds up automatic implementability testing, but also
results in shorter and more intuitive proofs. I use this
technique to identify the new weakest failure detector
anti-Ω and prove its properties. Anti-Ω outputs process
ids and, while not necessarily stabilizing, it ensures that
some correct process is eventually never output.
UCAM-CL-TR-694
Piotr Zieliński:
Anti-Ω: the weakest failure detector
for set agreement
July 2007, 24 pages, PDF
Abstract: In the set agreement problem, n processes
have to decide on at most n−1 of the proposed values.
This paper shows that the anti-Ω failure detector is both
sufficient and necessary to implement set agreement in
an asynchronous shared-memory system equipped with
registers. Each query to anti-Ω returns a single process
id; the specification ensures that there is a correct process whose id is returned only finitely many times.
UCAM-CL-TR-695
Karen Su, Inaki Berenguer, Ian J. Wassell,
Xiaodong Wang:
Efficient maximum-likelihood
decoding of spherical lattice codes
July 2007, 29 pages, PDF
Abstract: A new framework for efficient and exact
Maximum-Likelihood (ML) decoding of spherical lattice codes is developed. It employs a double-tree structure: The first is that which underlies established tree-search decoders; the second plays the crucial role of
guiding the primary search by specifying admissible
candidates and is our focus in this report. Lattice
codes have long been of interest due to their rich
structure, leading to numerous decoding algorithms
for unbounded lattices, as well as those with axis-aligned rectangular shaping regions. Recently, spherical Lattice Space-Time (LAST) codes were proposed
to realize the optimal diversity-multiplexing tradeoff of
MIMO channels. We address the so-called boundary
control problem arising from the spherical shaping region defining these codes. This problem is complicated
because of the varying number of candidates potentially
under consideration at each search stage; it is not obvious how to address it effectively within the frameworks
of existing schemes. Our proposed strategy is compatible with all sequential tree-search detectors, as well
as auxiliary processing such as the MMSE-GDFE and
lattice reduction. We demonstrate the superior performance and complexity profiles achieved when applying
the proposed boundary control in conjunction with two
current efficient ML detectors and show an improvement of 1dB over the state-of-the-art at a comparable
complexity.
UCAM-CL-TR-696
Oliver J. Woodman:
An introduction to inertial navigation
August 2007, 37 pages, PDF
Abstract: Until recently the weight and size of inertial sensors have prohibited their use in domains
such as human motion capture. Recent improvements
in the performance of small and lightweight micro-machined electromechanical systems (MEMS) inertial
sensors have made the application of inertial techniques
to such problems possible. This has resulted in an increased interest in the topic of inertial navigation; however, current introductions to the subject fail to sufficiently describe the error characteristics of inertial systems.
We introduce inertial navigation, focusing on strapdown systems based on MEMS devices. A combination
of measurement and simulation is used to explore the
error characteristics of such systems. For a simple inertial navigation system (INS) based on the Xsens Mtx
inertial measurement unit (IMU), we show that the average error in position grows to over 150 m after 60
seconds of operation. The propagation of orientation
errors caused by noise perturbing gyroscope signals is
identified as the critical cause of such drift. By simulation we examine the significance of individual noise
processes perturbing the gyroscope signals, identifying
white noise as the process which contributes most to
the overall drift of the system.
Sensor fusion and domain specific constraints can
be used to reduce drift in INSs. For an example INS
we show that sensor fusion using magnetometers can
reduce the average error in position obtained by the
system after 60 seconds from over 150 m to around
5 m. We conclude that whilst MEMS IMU technology
is rapidly improving, it is not yet possible to build a
MEMS based INS which gives sub-meter position accuracy for more than one minute of operation.
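To illustrate why orientation errors dominate drift in a strapdown system, the following sketch shows how a small tilt error projects gravity onto a horizontal axis, and how double integration turns that spurious acceleration into a position error that grows quadratically with time. It is a deliberate simplification: it assumes a constant tilt error, whereas the report studies noise-driven orientation errors, and the function names and the 0.5 degree example value are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def position_error_from_tilt(tilt_deg: float, seconds: float) -> float:
    """Closed-form position error (m) caused by a constant tilt error.

    The tilt makes the system misread g*sin(tilt) as horizontal acceleration;
    integrating that twice gives 0.5 * a * t^2.
    """
    accel_error = G * math.sin(math.radians(tilt_deg))
    return 0.5 * accel_error * seconds ** 2


def simulate_position_error(tilt_deg: float, seconds: float, dt: float = 0.01) -> float:
    """The same quantity obtained by explicit step-by-step double integration."""
    accel_error = G * math.sin(math.radians(tilt_deg))
    velocity = 0.0
    position = 0.0
    for _ in range(round(seconds / dt)):
        velocity += accel_error * dt  # first integration: acceleration -> velocity
        position += velocity * dt     # second integration: velocity -> position
    return position


if __name__ == "__main__":
    for t in (10, 30, 60):
        print(t, round(position_error_from_tilt(0.5, t), 1))
```

Doubling the elapsed time quadruples the error in this model, which is why even small orientation errors become significant within a minute of unaided operation.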
UCAM-CL-TR-697
Chris J. Purcell:
Scaling Mount Concurrency:
scalability and progress in concurrent
algorithms
August 2007, 155 pages, PDF
PhD thesis (Trinity College, July 2007)
Abstract: As processor speeds plateau, chip manufacturers are turning to multi-processor and multi-core designs to increase performance. As the number of simultaneous threads grows, Amdahl’s Law means the performance of programs becomes limited by the cost that
does not scale: communication, via the memory subsystem. Algorithm design is critical in minimizing these
costs.
In this dissertation, I first show that existing instruction set architectures must be extended to allow general
scalable algorithms to be built. Since it is impractical
to entirely abandon existing hardware, I then present
a reasonably scalable implementation of a map built
on the widely-available compare-and-swap primitive,
which outperforms existing algorithms for a range of
usages.
Thirdly, I introduce a new primitive operation, and
show that it provides efficient and scalable solutions to
several problems before proving that it satisfies strong
theoretical properties. Finally, I outline possible hardware implementations of the primitive with different
properties and costs, and present results from a hardware evaluation, demonstrating that the new primitive
can provide good practical performance.
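The role Amdahl's Law plays in this argument can be made concrete with a short sketch. The function name and the example fractions below are illustrative assumptions, not figures from the dissertation; the formula itself is the standard statement of the law.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales.

    The remaining (1 - parallel_fraction) stands for the cost that does not
    scale, such as communication through the memory subsystem.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # With 10% non-scaling cost, the speedup can never exceed 10x.
    for n in (2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

As the thread count grows, the non-scaling fraction sets the ceiling, which is why reducing communication cost matters more than adding cores.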
UCAM-CL-TR-698
Simon J. Hollis:
Pulse-based, on-chip interconnect
September 2007, 186 pages, PDF
PhD thesis (Queens’ College, June 2007)
Abstract: This thesis describes the development of an
on-chip point-to-point link, with particular emphasis
on the reduction of its global metal area footprint.
To reduce its metal footprint, the interconnect uses
a serial transmission approach. 8-bit data is sent using just two wires, through a pulse-based technique,
inspired by the GasP interconnect from Sun Microsystems. Data and control signals are transmitted bidirectionally on a wire using this double-edged, pulsebased signalling protocol, and formatted using a variant of dual-rail encoding. These choices enable a reduction in the number of wires needed, an improvement
in the acknowledgement overhead of the asynchronous
protocol, and the ability to cross clock domains without synchronisation hazards.
New, stateful, repeaters are demonstrated, and results from SPICE simulations of the system show that
data can be transferred at over 1Gbit/s, over 1mm of
minimum-sized, minimally-spaced metal 5 wiring, on
a 180nm (0.18µm) technology. This reduces to only
926Mbit/s, when 10mm of wiring is considered, and
represents a channel utilisation of a very attractive 45%
of theoretical capacity at this length. Analysis of latencies, energy consumption, and area use are also provided.
The point-to-point link is then expanded with the
invention and demonstration of a router and an arbitrated merge element, to produce a Network-on-Chip
(NoC) design, called RasP. The full system is then evaluated, and peak throughput is shown to be 763Mbit/s
for 1mm of wiring, reducing to 599Mbit/s for 10mm of
the narrow metal 5 interconnect.
Finally, RasP is compared in performance with the
Chain interconnect from the University of Manchester.
Results for the metrics of throughput, latency, energy
consumption and area footprint show that the two systems perform very similarly — the maximum absolute deviation is under 25% for throughput, latency
and area; and the energy-efficiency of RasP is approximately twice that of Chain. Between the two systems,
RasP has the smaller latency, energy and area requirements and is shown to be a viable alternative NoC design.
UCAM-CL-TR-699
Richard Southern, Neil A. Dodgson:
A smooth manifold based
construction of approximating lofted
surfaces
October 2007, 17 pages, PDF
Abstract: We present a new method for constructing
a smooth manifold approximating a curve network or
control mesh. In our two-step method, smooth vertex patches are initially defined by extrapolating and
then blending a univariate or bivariate surface representation. Each face is then constructed by blending together the segments of each vertex patch corresponding
to the face corners. By approximating the input curve
network, rather than strictly interpolating it, we have
greater flexibility in controlling surface behaviour and
have local control. Additionally no initial control mesh
fitting or fairing needs to be performed, and no derivative information is needed to ensure continuity at patch
boundaries.
UCAM-CL-TR-700
Maja Vuković:
Context aware service composition
October 2007, 225 pages, PDF
PhD thesis (Newnham College, April 2006)
Abstract: Context aware applications respond and
adapt to changes in the computing environment. For
example, they may react when the location of the user
or the capabilities of the device used change. Despite
the increasing importance and popularity of such applications, advances in application models to support
their development have not kept up. Legacy application
design models, which embed contextual dependencies
in the form of if-then rules specifying how applications
should react to context changes, are still widely used.
Such models are impractical to accommodate the large
variety of possibly even unanticipated context types
and their values.
This dissertation proposes a new application model
for building context aware applications, considering
them as dynamically composed sequences of calls
to services, software components that perform well-defined computational operations and export open interfaces through which they can be invoked. This work
employs goal-oriented inferencing from planning technologies for selecting the services and assembling the
sequence of their execution, allowing different compositions to result from different context parameters such
as resources available, time constraints, and user location. Contextual changes during the execution of the
services may trigger further re-composition causing the
application to evolve dynamically.
An important challenge in providing a context
aware service composition facility is dealing with failures that may occur, for instance as a result of context changes or missing service descriptions. To handle
composition failures, this dissertation introduces GoalMorph, a system which transforms failed composition
requests into alternative ones that can be solved.
This dissertation describes the design and implementation of the proposed framework for context
aware service composition. Experimental evaluation
of a realistic infotainment application demonstrates
that the framework provides an efficient and scalable solution. Furthermore, it shows that GoalMorph
transforms goals successfully, increasing the utility of
achieved goals without imposing a prohibitive composition time overhead.
By developing the proposed framework for fault-tolerant, context aware service composition this work
ultimately lowers the barrier for building extensible applications that automatically adapt to the user’s context. This represents a step towards a new paradigm
for developing adaptive software to accommodate the
increasing dynamicity of computing environments.
UCAM-CL-TR-701
Jacques Jean-Alain Fournier:
Vector microprocessors for
cryptography
October 2007, 174 pages, PDF
PhD thesis (Trinity Hall, April 2007)
Abstract: Embedded security devices like ‘Trusted Platforms’ require both scalability (of power, performance
and area) and flexibility (of software and countermeasures). This thesis illustrates how data parallel techniques can be used to implement scalable architectures
for cryptography. Vector processing is used to provide high performance, power efficient and scalable
processors. A programmable vector 4-stage pipelined
co-processor, controlled by a scalar MIPS compatible
processor, is described. The instruction set of the coprocessor is defined for cryptographic algorithms like
AES and Montgomery modular multiplication for RSA
and ECC. The instructions are assessed using an instruction set simulator based on the ArchC tool. This
instruction set simulator is used to see the impact of
varying the vector register depth (p) and the number
of vector processing units (r). Simulations indicate that
for vector versions of AES, RSA and ECC the performance improves in O(log(r)). A cycle-accurate synthesisable Verilog model of the system (VeMICry) is implemented in TSMC’s 90nm technology and used to
show that the best area/power/performance tradeoff is
reached for r = (p/4). Also, this highly scalable design
allows area/power/performance trade-offs to be made
for a panorama of applications ranging from smartcards to servers. This thesis is, to the best of my knowledge,
the first attempt to implement embedded cryptography
using vector processing techniques.
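For readers unfamiliar with Montgomery modular multiplication, named above as one of the target algorithms, here is a minimal scalar sketch in its textbook form. It is not the vectorised instruction sequence developed in the thesis; the function names are hypothetical, and the code assumes Python 3.8 or later for the modular inverse via `pow(n, -1, r)`.

```python
def montgomery_setup(n: int):
    """Return (r_bits, n_prime) for an odd modulus n, where r = 2**r_bits > n."""
    if n < 3 or n % 2 == 0:
        raise ValueError("modulus must be odd and at least 3")
    r_bits = n.bit_length()
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r  # satisfies n * n_prime = -1 (mod r)
    return r_bits, n_prime


def redc(t: int, n: int, r_bits: int, n_prime: int) -> int:
    """Montgomery reduction: return t * r^-1 mod n, for 0 <= t < r * n."""
    mask = (1 << r_bits) - 1
    m = ((t & mask) * n_prime) & mask  # chosen so that t + m*n is divisible by r
    u = (t + m * n) >> r_bits          # exact division by r is just a shift
    return u - n if u >= n else u


def montgomery_mul(a: int, b: int, n: int) -> int:
    """Compute (a * b) mod n by way of the Montgomery domain."""
    r_bits, n_prime = montgomery_setup(n)
    a_bar = (a << r_bits) % n  # a * r mod n
    b_bar = (b << r_bits) % n  # b * r mod n
    product_bar = redc(a_bar * b_bar, n, r_bits, n_prime)  # a * b * r mod n
    return redc(product_bar, n, r_bits, n_prime)           # back to a * b mod n
```

The appeal for hardware is that reduction uses only multiplications, masks and shifts, with no division by the modulus. In a real exponentiation the conversions into and out of the Montgomery domain would be done once rather than per multiplication.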
UCAM-CL-TR-702
Alisdair Wren:
Relationships for object-oriented
programming languages
November 2007, 153 pages, PDF
PhD thesis (Sidney Sussex College, March 2007)
Abstract: Object-oriented approaches to software design and implementation have gained enormous popularity over the past two decades. However, whilst models of software systems routinely allow software engineers to express relationships between objects, object-oriented programming languages lack this ability. Instead, relationships must be encoded using complex reference structures. When the model cannot be expressed
directly in code, it becomes more difficult for programmers to see a correspondence between design and implementation – the model no longer faithfully documents the code. As a result, programmer intuition is
lost, and error becomes more likely, particularly during
maintenance of an unfamiliar software system.
This thesis explores extensions to object-oriented
languages so that relationships may be expressed with
the same ease as objects. Two languages with relationships are specified: RelJ, which offers relationships in
a class-based language based on Java, and QSigma,
which is an object calculus with heap query.
In RelJ, relationship declarations exist at the same
level as class declarations: relationships are named, they
may have fields and methods, they may inherit from one
another and their instances may be referenced just like
objects. Moving into the object-based world, QSigma
is based on the sigma-calculi of Abadi and Cardelli, extended with the ability to query the heap. Heap query
allows objects to determine how they are referenced by
other objects, such that single references are sufficient
for establishing an inter-object relationship observable
by all participants. Both RelJ and QSigma are equipped
with a formal type system and semantics to ensure type
safety in the presence of these extensions.
By giving formal models of relationships in both
class- and object-based settings, we can obtain general
principles for relationships in programming languages
and, therefore, establish a correspondence between implementation and design.
UCAM-CL-TR-703
Jon Crowcroft, Tim Deegan,
Christian Kreibich, Richard Mortier,
Nicholas Weaver:
Lazy Susan: dumb waiting as proof
of work
November 2007, 23 pages, PDF
Abstract: The open nature of Internet services has been
of great value to users, enabling dramatic innovation
and evolution of services. However, this openness permits many abuses of open-access Internet services such
as web, email, and DNS. To counteract such abuses, a
number of so called proof-of-work schemes have been
proposed. They aim to prevent or limit such abuses
by demanding potential clients of the service to prove
that they have carried out some amount of work before
they will be served. In this paper we show that existing resource-based schemes have several problems, and
instead propose latency-based proof-of-work as a solution. We describe centralised and distributed variants,
introducing the problem class of non-parallelisable
shared secrets in the process. We also discuss application of this technique at the network layer as a way to
prevent Internet distributed denial-of-service attacks.
UCAM-CL-TR-704
Paul William Hunter:
Complexity and infinite games on
finite graphs
November 2007, 170 pages, PDF
PhD thesis (Hughes Hall, July 2007)
Abstract: This dissertation investigates the interplay between complexity, infinite games, and finite graphs. We
present a general framework for considering two-player
games on finite graphs which may have an infinite number of moves and we consider the computational complexity of important related problems. Such games are
becoming increasingly important in the field of theoretical computer science, particularly as a tool for formal verification of non-terminating systems. The framework introduced enables us to simultaneously consider problems on many types of games easily, and this
is demonstrated by establishing previously unknown
complexity bounds on several types of games.
We also present a general framework which uses infinite games to define notions of structural complexity
for directed graphs. Many important graph parameters,
from both a graph theoretic and algorithmic perspective, can be defined in this system. By considering natural generalizations of these games to directed graphs,
we obtain a novel feature of digraph complexity: directed connectivity. We show that directed connectivity
is an algorithmically important measure of complexity
by showing that when it is limited, many intractable
problems can be efficiently solved. Whether it is structurally an important measure is yet to be seen; however,
this dissertation makes a preliminary investigation in
this direction.
We conclude that infinite games on finite graphs
play an important role in the area of complexity in theoretical computer science.
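As a concrete instance of a two-player game on a finite graph, the sketch below computes the attractor for a reachability game: the set of nodes from which player 0 can force the play into a target set. This is a standard textbook construction offered only as an illustration of the kind of game involved, not as the framework of the dissertation. The data layout (`edges` and `owner` dictionaries) is an assumption, and every node is assumed to have at least one successor.

```python
def attractor(nodes, edges, owner, target):
    """Return the nodes from which player 0 can force a visit to `target`.

    nodes:  iterable of hashable node identifiers
    edges:  dict mapping each node to a list of its successors
    owner:  dict mapping each node to 0 or 1 (the player who moves there)
    target: iterable of nodes player 0 wants to reach
    """
    nodes = list(nodes)
    predecessors = {v: [] for v in nodes}
    escapes_left = {}  # for player-1 nodes: successors not yet known to be losing
    for v in nodes:
        successors = edges.get(v, [])
        escapes_left[v] = len(successors)
        for w in successors:
            predecessors[w].append(v)

    winning = set(target)
    worklist = list(winning)
    while worklist:
        w = worklist.pop()
        for v in predecessors[w]:
            if v in winning:
                continue
            if owner[v] == 0:
                # Player 0 needs only one move into the winning region.
                winning.add(v)
                worklist.append(v)
            else:
                # Player 1 is caught only when every move leads into the region.
                escapes_left[v] -= 1
                if escapes_left[v] == 0:
                    winning.add(v)
                    worklist.append(v)
    return winning
```

Each edge is examined at most once, so the computation runs in time linear in the size of the graph.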
UCAM-CL-TR-705
Alan C. Lawrence:
Optimizing compilation with the
Value State Dependence Graph
December 2007, 183 pages, PDF
PhD thesis (Churchill College, May 2007)
Abstract: Most modern compilers are based on variants of the Control Flow Graph. Developments on
this representation—specifically, SSA form and the Program Dependence Graph (PDG)—have focused on
adding and refining data dependence information, and
these suggest the next step is to use a purely data-dependence-based representation such as the VDG
(Ernst et al.) or VSDG (Johnson et al.).
This thesis studies such representations, identifying
key differences in the information carried by the VSDG
and several restricted forms of PDG, which relate to
functional programming and continuations. We unify
these representations in a new framework for specifying
the sharing of resources across a computation.
We study the problems posed by using the VSDG,
and argue that existing techniques have not solved the
sequentialization problem of mapping VSDGs back to
CFGs. We propose a new compiler architecture breaking sequentialization into several stages which focus on
different characteristics of the input VSDG, and tend
to be concerned with different properties of the output and target machine. The stages integrate a wide variety of important optimizations, exploit opportunities
offered by the VSDG to address many common phase-order problems, and unify many operations previously
considered distinct.
Focusing on branch-intensive code, we demonstrate
how effective control flow—sometimes superior to that
of the original source code, and comparable to the best
CFG optimization techniques—can be reconstructed
from just the dataflow information comprising the
VSDG. Further, a wide variety of more invasive optimizations involving the duplication and specialization
of program elements are eased because the VSDG relaxes the CFG’s overspecification of instruction and
branch ordering. Specifically we identify the optimization of nested branches as generalizing the problem of
minimizing boolean expressions.
We conclude that it is now practical to discard the
control flow information rather than maintain it in parallel as is done in many previous approaches (e.g. the
PDG).
UCAM-CL-TR-706
Steven J. Murdoch:
Covert channel vulnerabilities in
anonymity systems
December 2007, 140 pages, PDF
PhD thesis (Girton College, August 2007)
Abstract: The spread of wide-scale Internet surveillance
has spurred interest in anonymity systems that protect
users’ privacy by restricting unauthorised access to their
identity. This requirement can be considered as a flow
control policy in the well established field of multilevel secure systems. I apply previous research on covert
channels (unintended means to communicate in violation of a security policy) to analyse several anonymity
systems in an innovative way.
One application for anonymity systems is to prevent
collusion in competitions. I show how covert channels
may be exploited to violate these protections and construct defences against such attacks, drawing from previous covert channel research and collusion-resistant
voting systems.
In the military context, for which multilevel secure
systems were designed, covert channels are increasingly
eliminated by physical separation of interconnected
single-role computers. Prior work on the remaining network covert channels has been solely based on protocol specifications. I examine some protocol implementations and show how the use of several covert channels
can be detected and how channels can be modified to
resist detection.
I show how side channels (unintended information
leakage) in anonymity networks may reveal the behaviour of users. While drawing on previous research
on traffic analysis and covert channels, I avoid the
traditional assumption of an omnipotent adversary.
Rather, these attacks are feasible for an attacker with
limited access to the network. The effectiveness of these
techniques is demonstrated by experiments on a deployed anonymity network, Tor.
Finally, I introduce novel covert and side channels
which exploit thermal effects. Changes in temperature
can be remotely induced through CPU load and measured by their effects on crystal clock skew. Experiments show this to be an effective attack against Tor.
This side channel may also be usable for geolocation
and, as a covert channel, can cross supposedly infallible air-gap security boundaries.
This thesis demonstrates how theoretical models
and generic methodologies relating to covert channels
may be applied to find practical solutions to problems
in real-world anonymity systems. These findings confirm the existing hypothesis that covert channel analysis, vulnerabilities and defences developed for multilevel secure systems apply equally well to anonymity
systems.
UCAM-CL-TR-707
Ian Caulfield:
Complexity-effective superscalar
embedded processors using
instruction-level distributed
processing
December 2007, 130 pages, PDF
PhD thesis (Queens’ College, May 2007)
Abstract: Modern trends in mobile and embedded devices require ever increasing levels of performance,
while maintaining low power consumption and silicon area usage. This thesis presents a new architecture for a high-performance embedded processor, based
upon the instruction-level distributed processing (ILDP)
methodology. A qualitative analysis of the complexity of an ILDP implementation as compared to both
a typical scalar RISC CPU and a superscalar design
is provided, which shows that the ILDP architecture
eliminates or greatly reduces the size of a number of
structures present in a superscalar architecture, allowing its complexity and power consumption to compare
favourably with a simple scalar design.
The performance of an implementation of the ILDP
architecture is compared to some typical processors
used in high-performance embedded systems. The effect on performance of a number of the architectural
parameters is analysed, showing that many of the parallel structures used within the processor can be scaled
to provide less parallelism with little cost to the overall performance. In particular, the size of the register
file can be greatly reduced with little average effect on
performance – a size of 32 registers, with 16 visible in
the instruction set, is shown to provide a good trade-off
between area/power and performance.
Several novel developments to the ILDP architecture are then described and analysed. Firstly, a scheme
to halve the number of processing elements and thus
greatly reduce silicon area and power consumption is
outlined but proves to result in a 12–14% drop in performance. Secondly, a method to reduce the area and
power requirements of the memory logic in the architecture is presented which can achieve similar performance to the original architecture with a large reduction in area and power requirements or, at an increased
area/power cost, can improve performance by approximately 24%. Finally, a new organisation for the register
file is proposed, which reduces the silicon area used by
the register file by approximately three-quarters and allows even greater power savings, especially in the case
where processing elements are power gated.
Overall, it is shown that the ILDP methodology
is a viable approach for future embedded system design, and several new variants on the architecture are
contributed. Several areas of useful future research are
highlighted, especially with respect to compiler design
for the ILDP paradigm.
UCAM-CL-TR-708
Chi-Kin Chau, Jon Crowcroft,
Kang-Won Lee, Starsky H.Y. Wong:
IDRM: Inter-Domain Routing
Protocol for Mobile Ad Hoc
Networks
January 2008, 24 pages, PDF
Abstract: Inter-domain routing is an important component to allow interoperation among heterogeneous network domains operated by different organizations. Although inter-domain routing has been extensively studied in the Internet, it remains relatively unexplored in
the Mobile Ad Hoc Networks (MANETs) space. In
MANETs, the inter-domain routing problem is challenged by: (1) dynamic network topology, and (2) diverse intra-domain ad hoc routing protocols. In this
paper, we propose a networking protocol called IDRM
(Inter-Domain Routing Protocol for MANETs) to enable interoperation among MANETs. IDRM can handle the dynamic nature of MANETs and support policy-based routing similarly to BGP. We first discuss the design challenges for inter-domain routing in MANETs,
and then present the design of IDRM with illustrative
examples. Finally, we present a simulation-based study
to understand the operational effectiveness of inter-domain routing and show that the overhead of IDRM
is moderate.
UCAM-CL-TR-709
Ford Long Wong:
Protocols and technologies for
security in pervasive computing and
communications
January 2008, 167 pages, PDF
PhD thesis (Girton College, August 2007)
Abstract: As the state-of-the-art edges towards Mark
Weiser’s vision of ubiquitous computing (ubicomp), we
found that we have to revise some previous assumptions about security engineering for this domain. Ubicomp devices have to be networked together to be
able to realize their promise. To communicate securely
amongst themselves, they have to establish secret session keys, but this is a difficult problem when this is
done primarily over radio in an ad-hoc scenario, i.e.
without the aid of an infrastructure (such as a PKI),
and when it is assumed that the devices are resource-constrained and cannot perform complex calculations.
Secondly, when ubicomp devices are carried by users
as personal items, their permanent identifiers inadvertently allow the users to be tracked, to the detriment
of user privacy. Unless there are deliberate improvements in designing for location privacy, ubicomp devices can be trivially detected, and linked to individual
users, with discomfiting echoes of a surveillance society. Our findings and contributions are thus as follows.
In considering session key establishment, we learnt that
asymmetric cryptography is not axiomatically infeasible, and may in fact be essential, to counter possible
attackers, for some of the more computationally capable (and important) devices. We next found existing
attacker models to be inadequate, along with existing
models of bootstrapping security associations, in ubicomp. We address the inadequacies with a contribution
which we call ‘multi-channel security protocols’,
leveraging multiple channels with different properties that exist in this environment. We gained an
appreciation of the fact that location privacy is really a
multi-layer problem, particularly so in ubicomp, where
an attacker often may have access to different layers.
Our contributions in this area are to advance the design
for location privacy by introducing a MAC-layer proposal with stronger unlinkability, and a physical-layer
proposal with stronger unobservability.
UCAM-CL-TR-710
Lucy G. Brace-Evans:
Event structures with persistence
February 2008, 113 pages, PDF
PhD thesis (St. John’s College, October 2007)
Abstract: Increasingly, the style of computation is
changing. Instead of one machine running a program
sequentially, we have systems with many individual
agents running in parallel. The need for mathematical
models of such computations is therefore ever greater.
There are many models of concurrent computations. Such models can, for example, provide a semantics to process calculi and thereby suggest behavioural
equivalences between processes. They are also key
to the development of automated tools for reasoning
about concurrent systems. In this thesis we explore
some applications and generalisations of one particular model – event structures. We describe a variety
of kinds of morphism between event structures. Each
kind expresses a different sort of behavioural relationship. We demonstrate the way in which event structures
can model both processes and types of processes by recalling a semantics for Affine HOPLA, a higher order
process language. This is given in terms of asymmetric spans of event structures. We show that such spans
support a trace construction. This allows the modelling of feedback and suggests a semantics for nondeterministic dataflow processes in terms of spans. The
semantics given is shown to be consistent with Kahn’s
fixed point construction when we consider spans modelling deterministic processes.
A generalisation of event structures to include persistent events is proposed. Based on previously described morphisms between classical event structures,
we define several categories of event structures with
persistence. We show that, unlike for the corresponding categories of classical event structures, all are isomorphic to Kleisli categories of monads on the most
restricted category. Amongst other things, this provides
us with a way of understanding the asymmetric spans
mentioned previously as symmetric spans where one
morphism is modified by a monad. Thus we provide a
general setting for future investigations involving event
structures.
UCAM-CL-TR-711
Saar Drimer, Steven J. Murdoch,
Ross Anderson:
Thinking inside the box:
system-level failures of tamper
proofing
February 2008, 37 pages, PDF
Abstract: PIN entry devices (PEDs) are critical security components in EMV smartcard payment systems
as they receive a customer’s card and PIN. Their approval is subject to an extensive suite of evaluation and
certification procedures. In this paper, we demonstrate
that the tamper proofing of PEDs is unsatisfactory, as
is the certification process. We have implemented practical low-cost attacks on two certified, widely-deployed
PEDs – the Ingenico i3300 and the Dione Xtreme. By
tapping inadequately protected smartcard communications, an attacker with basic technical skills can expose card details and PINs, leaving cardholders open
to fraud. We analyze the anti-tampering mechanisms of
the two PEDs and show that, while the specific protection measures mostly work as intended, critical vulnerabilities arise because of the poor integration of cryptographic, physical and procedural protection. As these
vulnerabilities illustrate a systematic failure in the design process, we propose a methodology for doing it
better in the future. They also demonstrate a serious
problem with the Common Criteria. We discuss the incentive structures of the certification process, and show
how they can lead to problems of the kind we identified. Finally, we recommend changes to the Common
Criteria framework in light of the lessons learned.
An abridged version of this paper is to appear at the
IEEE Symposium on Security and Privacy, May 2008,
Oakland, CA, US.
UCAM-CL-TR-712
Christian Richardt:
Flash-exposure high dynamic range
imaging: virtual photography and
depth-compensating flash
March 2008, 9 pages, PDF
Abstract: I present a revised approach to flash-exposure
high dynamic range (HDR) imaging and demonstrate
two applications of this image representation. The first
application enables the creation of realistic ‘virtual photographs’ for arbitrary flash-exposure settings, based
on a single flash-exposure HDR image. The second application is a novel tone mapping operator for flash-exposure HDR images based on the idea of an ‘intelligent flash’. It compensates for the depth-related brightness fall-off occurring in flash photographs by taking
the ambient illumination into account.
UCAM-CL-TR-713
Pan Hui:
People are the network:
experimental design and evaluation of
social-based forwarding algorithms
March 2008, 160 pages, PDF
PhD thesis (Churchill College, July 2007)
Abstract: Cooperation binds but also divides human
society into communities. Members of the same community interact with each other preferentially. There is
structure in human society. Within society and its communities, individuals have varying popularity. Some
people are more popular and interact with more people
than others; we may call them hubs. I develop methods
to extract this kind of social information from experimental traces and use it to choose the next hop forwarders in Pocket Switched Networks (PSNs). I find
that by incorporating social information, forwarding
efficiency can be significantly improved. For practical
reasons, I also develop distributed algorithms for inferring communities.
Forwarding in Delay Tolerant Networks (DTNs), or
more particularly PSNs, is a challenging problem since
human mobility is usually difficult to predict. In this
thesis, I aim to tackle this problem using an experimental approach by studying real human mobility. I
perform six mobility experiments in different environments. The resultant experimental datasets are valuable
for the research community. By analysing the experimental data, I find out that the inter-contact time of
humans follows a power-law distribution with coefficient smaller than 1 (over the range of 10 minutes to
1 day). I study the limits of “oblivious” forwarding in
the experimental environment and also the impact of
the power-law coefficient on message delivery.
In order to study social-based forwarding, I develop
methods to infer human communities from the data
and use these in the study of social-aware forwarding.
I propose several social-aware forwarding schemes and
evaluate them on different datasets. I find out that by
combining community and centrality information, forwarding efficiency can be significantly improved, and
I call this scheme BUBBLE forwarding with the analogy that each community is a BUBBLE with big bubbles containing smaller bubbles. For practical deployment of these algorithms, I propose distributed community detection schemes, and also propose methods
to approximate node centrality in the system.
Besides the forwarding study, I also propose a layerless data-centric architecture for the PSN scenario to
address the problem with the status quo in communication (e.g. an infrastructure-dependent and synchronous
API), which brings PSN one step closer to real-world
deployment.
UCAM-CL-TR-714
Tim Moreton:
A wide-area file system for migrating
virtual machines
March 2008, 163 pages, PDF
PhD thesis (King’s College, February 2007)
Abstract: Improvements in processing power and core
bandwidth set against fundamental constraints on
wide-area latency increasingly emphasise the position
in the network at which services are deployed. The
XenoServer project is building a platform for distributed computing that facilitates the migration of services between hosts to minimise client latency and balance load in response to changing patterns of demand.
Applications run inside whole-system virtual machines,
allowing the secure multiplexing of host resources.
Since services are specified in terms of a complete
root file system and kernel image, a key component
of this architecture is a substrate that provides an abstraction akin to local disks for these virtual machines,
whether they are running, migrating or suspended.
However, the same combination of wide-area latency,
constrained bandwidth and global scale that motivates
the XenoServer platform itself impedes the location,
management and rapid transfer of storage between deployment sites. This dissertation describes Xest, a novel
wide-area file system that aims to address these challenges.
I examine Xest’s design, centred on the abstraction
of virtual disks, volumes that allow only a single writer
yet are transparently available despite migration. Virtual disks support the creation of snapshots and may
be rapidly forked into copies that can be modified independently. This encourages an architectural separation
into node-local file system and global content distribution framework and reduces the dependence of local
operations on wide-area interactions.
I then describe how Xest addresses the dual problem
of latency and scale by managing, caching, advertising
and retrieving storage on the basis of groups, sets of
files that correspond to portions of inferred working
sets of client applications. Coarsening the granularity
of these interfaces further decouples local and global
activity: fewer units can lead to fewer interactions and
the maintenance of less addressing state. The precision
of these interfaces is retained by clustering according to
observed access patterns and, in response to evidence of
poor clusterings, selectively degrading groups into their
constituent elements.
I evaluate a real deployment of Xest over a wide-area testbed. Doing so entails developing new tools for
capturing and replaying traces to simulate virtual machine workloads. My results demonstrate the practicality and high performance of my design and illustrate
the trade-offs involved in modifying the granularity of
established storage interfaces.
UCAM-CL-TR-715
Feng Hao:
On using fuzzy data in security
mechanisms
April 2008, 69 pages, PDF
PhD thesis (Queens’ College, April 2007)
Abstract: Under the microscope, every physical object
has unique features. It is impossible to clone an object,
reproducing exactly the same physical traits. This unclonability principle has been applied in many security
applications. For example, the science of biometrics is
about measuring unique personal features. It can authenticate individuals with a high level of assurance.
Similarly, a paper document can be identified by measuring its unique physical properties, such as randomly-interleaving fiber structure.
Unfortunately, when physical measurements are involved, errors arise inevitably and the obtained data
are fuzzy by nature. This causes two main problems:
1) fuzzy data cannot be used as a cryptographic key,
as cryptography demands the key be precise; 2) fuzzy
data cannot be sorted easily, which prevents efficient
information retrieval. In addition, biometric measurements create a strong binding between a person and his
unique features, which may conflict with personal privacy. In this dissertation, we study these problems in
detail and propose solutions.
First, we propose a scheme to derive error-free keys
from fuzzy data, such as iris codes. There are two
types of errors within iris codes: background-noise errors and burst errors. Accordingly, we devise a two-layer error correction technique, which first corrects the
background-noise errors using a Hadamard code, then
the burst errors using a Reed-Solomon code. Based on
a database of 700 iris images, we demonstrate that an
error-free key of 140 bits can be reliably reproduced
from genuine iris codes with a 99.5% success rate. In
addition, despite the irrevocability of the underlying
biometric data, the keys produced using our technique
can be easily revoked or updated.
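As a rough illustration of the first correction layer, the following is a minimal sketch of a Hadamard code that tolerates scattered bit flips. The block size K = 4, the function names, and the brute-force nearest-codeword decoder are assumptions chosen for readability; they are not the parameters or decoder used in the thesis, and the Reed-Solomon layer for burst errors is omitted.

```python
def hadamard_encode(msg, k):
    """Encode a k-bit integer as a list of 2**k codeword bits."""
    # Bit j of the codeword is the parity of the bitwise AND of msg and j.
    return [bin(msg & j).count("1") % 2 for j in range(2 ** k)]


def hadamard_decode(word, k):
    """Return the message whose codeword is nearest in Hamming distance."""
    def distance(msg):
        return sum(a != b for a, b in zip(hadamard_encode(msg, k), word))
    return min(range(2 ** k), key=distance)


K = 4                                  # illustrative block size (assumption)
codeword = hadamard_encode(0b1011, K)  # 16-bit codeword, minimum distance 8
noisy = list(codeword)
for pos in (1, 6, 12):                 # flip three scattered bits
    noisy[pos] ^= 1
recovered = hadamard_decode(noisy, K)
```

With a minimum distance of 8, this toy code corrects up to three flipped bits per 16-bit block, which is why it suits scattered noise rather than long bursts.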
Second, we address the search problem for a large
fuzzy database that stores iris codes or data with a similar structure. Currently, the algorithm used in all public
deployments of iris recognition is to search exhaustively
through a database of iris codes, looking for a match
that is close enough. We propose a much more efficient
search algorithm: Beacon Guided Search (BGS). BGS
works by indexing iris codes, adopting a “multiple colliding segments principle” along with an early termination strategy to reduce the search range dramatically.
We evaluate this algorithm using 632,500 real-world
iris codes, showing a substantial speed-up over exhaustive search with a negligible loss of precision. In addition, we demonstrate that our empirical findings match
theoretical analysis.
Finally, we study the anonymous-veto problem,
which is more commonly known as the Dining Cryptographers problem: how to perform a secure multiparty computation of the boolean-OR function, while
preserving the privacy of each input bit. The solution to
this problem has general applications in security going
way beyond biometrics. Even though there have been
several solutions presented over the past 20 years, we
propose a new solution called: Anonymous Veto Network (AV-net). Compared with past work, the AV-net
protocol provides the strongest protection of each delegate’s privacy against collusion; it requires only two
rounds of broadcast, fewer than any other solution;
the computational load and bandwidth usage are the
lowest among the available techniques; and our protocol does not require any private channels or third parties. Overall, it seems unlikely that, with the same underlying technology, there can be any other solutions
significantly more efficient than ours.
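The two-round structure can be sketched as follows, assuming the commonly described AV-net construction over a prime-order group. The toy group parameters, function names, and fixed secrets are illustrative assumptions, and the zero-knowledge proofs that a real deployment needs in both rounds are omitted.

```python
P = 2039   # small safe prime, P = 2*Q + 1 (toy size, assumption)
Q = 1019   # prime order of the subgroup
G = 4      # generator of the order-Q subgroup


def round_one(secrets):
    """Round 1: participant i broadcasts g^x_i."""
    return [pow(G, x, P) for x in secrets]


def round_two(secrets, published, veto_values):
    """Round 2: participant i broadcasts (g^y_i)^c_i.

    g^y_i is the product of the round-one values of earlier participants
    divided by the product of those of later participants. c_i is x_i for
    "no veto", or the value in veto_values[i] for "veto".
    """
    outputs = []
    for i, x in enumerate(secrets):
        gy = 1
        for j, gx in enumerate(published):
            if j < i:
                gy = gy * gx % P
            elif j > i:
                gy = gy * pow(gx, P - 2, P) % P   # modular inverse (Fermat)
        c = veto_values.get(i, x)
        outputs.append(pow(gy, c, P))
    return outputs


def vetoed(outputs):
    """The product of all round-two values is 1 exactly when nobody vetoed."""
    product = 1
    for value in outputs:
        product = product * value % P
    return product != 1


secrets = [3, 7, 11]                   # fixed here; random in practice
published = round_one(secrets)
no_veto = vetoed(round_two(secrets, published, {}))
one_veto = vetoed(round_two(secrets, published, {1: 500}))
```

The exponents x_i * y_i sum to zero when every participant uses its own secret, so the product reveals only whether at least one veto occurred, not who cast it.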
UCAM-CL-TR-716
Gavin M. Bierman, Matthew J. Parkinson,
James Noble:
UpgradeJ: Incremental typechecking
for class upgrades
April 2008, 35 pages, PDF
Abstract: One of the problems facing developers is the
constant evolution of components that are used to build
applications. This evolution is typical of any multi-person or multi-site software project. How can we program in this environment? More precisely, how can language design address such evolution? In this paper we
attack two significant issues that arise from constant
component evolution: we propose language-level extensions that permit multiple, co-existing versions of
classes and the ability to dynamically upgrade from one
version of a class to another, whilst still maintaining
type safety guarantees and requiring only lightweight
extensions to the runtime infrastructure. We show how
our extensions, whilst intuitive, provide a great deal of
power by giving a number of examples. Given the subtlety of the problem, we formalize a core fragment of
our language and prove a number of important safety
properties.
UCAM-CL-TR-717
Stephen Julian Rymill:
Psychologically-based simulation of
human behaviour
June 2008, 250 pages, PDF
PhD thesis (Jesus College, 2006)
Abstract: The simulation of human behaviour is a key
area of computer graphics as there is currently a great
demand for animations consisting of virtual human
characters, ranging from film special effects to building
design. Currently, animated characters can either be laboriously created by hand, or by using an automated
system: however, results from the latter may still look
artificial and require much further manual work.
The aim of this work is to improve the automated
simulation of human behaviour by making use of ideas
from psychology research; the ways in which this research has been used are made clear throughout this
thesis. It has influenced all aspects of the design:
• Collision avoidance techniques are based on observed practices.
• Actors have simulated vision and attention.
• Actors can be given a variety of moods and emotions to affect their behaviour.
This thesis discusses the benefits of the simulation of
attention; this technique recreates the eye movements
of each actor, and allows each actor to build up its own
mental model of its surroundings. It is this model that
the actor then uses in its decisions on how to behave:
techniques for collision prediction and collision avoidance are discussed. On top of this basic behaviour, variability is introduced by allowing all actors to have different sets of moods and emotions, which influence all
aspects of their behaviour. The real-time 3D simulation
created to demonstrate the actors’ behaviour is also described.
This thesis demonstrates that the use of techniques
based on psychology research leads to a qualitative and
quantitative improvement in the simulation of human
behaviour; this is shown through a variety of pictures
and videos, and by results of numerical experiments
and user testing. Results are compared with previous
work in the field, and with real human behaviour.
UCAM-CL-TR-718
Tyler Moore:
Cooperative attack and defense in
distributed networks
June 2008, 172 pages, PDF
PhD thesis (St. John’s College, March 2008)
Abstract: The advance of computer networking has
made cooperation essential to both attackers and defenders. Increased decentralization of network ownership requires devices to interact with entities beyond
their own realm of control. The distribution of intelligence forces decisions to be taken at the edge.
The exposure of devices makes multiple, simultaneous
attacker-chosen compromise a credible threat. Motivation for this thesis derives from the observation that it
is often easier for attackers to cooperate than for defenders to do so. I describe a number of attacks which
exploit cooperation to devastating effect. I also propose
and evaluate defensive strategies which require cooperation.
I first investigate the security of decentralized, or
‘ad-hoc’, wireless networks. Many have proposed pre-loading symmetric keys onto devices. I describe two
practical attacks on these schemes. First, attackers may
compromise several devices and share the pre-loaded
secrets to impersonate legitimate users. Second, whenever some keys are not pre-assigned but exchanged
upon deployment, a revoked attacker can rejoin the
network.
I next consider defensive strategies where devices
collectively decide to remove a malicious device from
the network. Existing voting-based protocols are made
resilient to the attacks I have developed, and I propose
alternative strategies that can be more efficient and secure. First, I describe a reelection protocol which relies
on positive affirmation from peers to continue participation. Then I describe a more radical alternative called
suicide: a good device removes a bad one unilaterally
by declaring both devices dead. Suicide offers significant improvements in speed and efficiency compared to
voting-based decision mechanisms. I then apply suicide
and voting to revocation in vehicular networks.
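The contrast between suicide and voting can be sketched with a toy model. The function names, the vote threshold, and the cost measure (one broadcast per vote or per suicide note) are illustrative assumptions rather than the protocols as specified in the thesis.

```python
def suicide_revoke(active, accuser, target):
    """The accuser removes the target unilaterally; both leave the network.

    Returns the number of broadcasts used (one suicide note, or zero if the
    accusation is invalid).
    """
    if accuser not in active or target not in active:
        return 0                       # dead or unknown devices are ignored
    active.discard(accuser)
    active.discard(target)
    return 1


def vote_revoke(active, voters, target, threshold):
    """The target is removed only once `threshold` active devices have voted.

    Returns the number of vote messages cast.
    """
    valid = [v for v in voters if v in active and v != target]
    if target in active and len(valid) >= threshold:
        active.discard(target)
    return len(valid)


devices = {"a", "b", "c", "d", "mallory"}

net_suicide = set(devices)
cost_suicide = suicide_revoke(net_suicide, "a", "mallory")

net_vote = set(devices)
cost_vote = vote_revoke(net_vote, ["a", "b", "c"], "mallory", threshold=3)
```

In this model suicide needs a single message and no quorum, at the price of losing one honest device; that sacrifice is what makes a false accusation costly for the accuser.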
Next, I empirically investigate attack and defense
in another context: phishing attacks on the Internet.
I have found evidence that one group responsible for
half of all phishing, the rock-phish gang, cooperates by
pooling hosting resources and by targeting many banks
simultaneously. These cooperative attacks are shown to
be far more effective.
I also study the behavior of defenders – banks and
Internet service providers – who must cooperate to
remove malicious sites. I find that phishing-website
lifetimes follow a long-tailed lognormal distribution.
While many sites are removed quickly, others remain
much longer. I examine several feeds from professional
‘take-down’ companies and find that a lack of data
sharing helps many phishing sites evade removal for
long time periods.
One anti-phishing organization has relied on volunteers to submit and verify suspected phishing sites.
I find its voting-based decision mechanism to be slower
and less comprehensive than unilateral verification performed by companies. I also note that the distribution of user participation is highly skewed, leaving the
scheme vulnerable to manipulation.
UCAM-CL-TR-719
William H. Billingsley:
The Intelligent Book: technologies for
intelligent and adaptive textbooks,
focussing on Discrete Mathematics
June 2008, 156 pages, PDF
PhD thesis (Wolfson College, April 2007)
Abstract: An “Intelligent Book” is a Web-based textbook that contains exercises that are backed by computer models or reasoning systems. Within the exercises, students work using appropriate graphical notations and diagrams for the subject matter, and comments and feedback from the book are related into the
content model of the book. The content model can be
extended by its readers. This dissertation examines the
question of how to provide an Intelligent Book that
can support undergraduate questions in Number Theory, and particularly questions that allow the student to
write a proof as the answer. Number Theory questions
pose a challenge not only because the student is working on an unfamiliar topic in an unfamiliar syntax, but
also because there is no straightforward procedure for
how to prove an arbitrary Number Theory problem.
The main contribution is a system for supporting student-written proof exercises, backed by the Isabelle/HOL automated proof assistant and a set of
teaching scripts. Students write proofs using MathsTiles: a graphical notation consisting of composable
tiles, each of which can contain an arbitrary piece of
mathematics or logic written by the teacher. These tiles
resemble parts of the proof as it might be written on paper, and are translated into Isabelle/HOL’s Isar syntax
on the server. Unlike traditional syntax-directed editors,
MathsTiles allow students to freely sketch out parts of
an answer and do not constrain the order in which an
answer is written. They also allow details of the language to change between or even during questions.
A number of smaller contributions are also presented. By using the dynamic nature of MathsTiles,
a type of proof exercise is developed where the student must search for the statements he or she wishes
to use. This allows questions to be supported by informal modelling, making them much easier to write, but
still ensures that the interface does not act as a prop for
the answer. The concept of searching for statements is
extended to develop “massively multiple choice” questions: a mid-point between the multiple choice and
short answer formats. The question architecture that
is presented is applicable across different notational
forms and different answer analysis techniques. The
content architecture uses an informal ontology that enables students and untrained users to add and adapt
content within the book, including adding their own
chapters, while ensuring the content can also be referred to by the models and systems that advise students
during exercises.
UCAM-CL-TR-720
Lauri I.W. Pesonen:
A capability-based access control
architecture for multi-domain
publish/subscribe systems
June 2008, 175 pages, PDF
PhD thesis (Wolfson College, December 2007)
Abstract: Publish/subscribe is emerging as the favoured
communication paradigm for large-scale, wide-area
distributed systems. The publish/subscribe many-to-many interaction model together with asynchronous
messaging provides an efficient transport for highly distributed systems in high latency environments with direct peer-to-peer interactions amongst the participants.
Decentralised publish/subscribe systems implement
the event service as a network of event brokers. The
broker network makes the system more resilient to
failures and allows it to scale up efficiently as the
number of event clients increases. In many cases such
distributed systems will only be feasible when implemented over the Internet as a joint effort spanning multiple administrative domains. The participating members will benefit from the federated event broker networks both with respect to the size of the system as
well as its fault-tolerance.
Large-scale, multi-domain environments require access control; users will have different privileges for
sending and receiving instances of different event types.
Therefore, we argue that access control is vital for
decentralised publish/subscribe systems, consisting of
multiple independent administrative domains, to ever
be deployable in large scale.
This dissertation presents MAIA, an access control mechanism for decentralised, type-based publish/subscribe systems. While the work concentrates
on type-based publish/subscribe the contributions are
equally applicable to both topic and content-based publish/subscribe systems.
Access control in distributed publish/subscribe requires secure, distributed naming, and mechanisms for
enforcing access control policies. The first contribution
of this thesis is a mechanism for names to be referenced
unambiguously from policy without risk of forgeries.
The second contribution is a model describing how
signed capabilities can be used to grant domains and
their members access rights to event types in a scalable and expressive manner. The third contribution is a
model for enforcing access control in the decentralised
event service by encrypting event content.
We illustrate the design and implementation of
MAIA with a running example of the UK Police Information Technology Organisation and the UK police
forces.
UCAM-CL-TR-721
Ben W. Medlock:
Investigating classification for
natural language processing tasks
June 2008, 138 pages, PDF
PhD thesis (Fitzwilliam College, September 2007)
Abstract: This report investigates the application of
classification techniques to four natural language processing (NLP) tasks. The classification paradigm falls
within the family of statistical and machine learning
(ML) methods and consists of a framework within
which a mechanical ‘learner’ induces a functional mapping between elements drawn from a particular sample
space and a set of designated target classes. It is applicable to a wide range of NLP problems and has met
with a great deal of success due to its flexibility and
firm theoretical foundations.
The first task we investigate, topic classification, is
firmly established within the NLP/ML communities as a
benchmark application for classification research. Our
aim is to arrive at a deeper understanding of how class
granularity affects classification accuracy and to assess
the impact of representational issues on different classification models. Our second task, content-based spam
filtering, is a highly topical application for classification
techniques due to the ever-worsening problem of unsolicited email. We assemble a new corpus and formulate
a state-of-the-art classifier based on structured language
model components. Thirdly, we introduce the problem
of anonymisation, which has received little attention to
date within the NLP community. We define the task in
terms of obfuscating potentially sensitive references to
real world entities and present a new publicly-available
benchmark corpus. We explore the implications of the
subjective nature of the problem and present an interactive model for anonymising large quantities of data
based on syntactic analysis and active learning. Finally,
we investigate the task of hedge classification, a relatively new application which is currently of growing interest due to the expansion of research into the
application of NLP techniques to scientific literature
for information extraction. A high level of annotation
agreement is obtained using new guidelines and a new
benchmark corpus is made publicly available. As part
of our investigation, we develop a probabilistic model
for training data acquisition within a semi-supervised
learning framework which is explored both theoretically and experimentally.
Throughout the report, many common themes
of fundamental importance to classification for
NLP are addressed, including sample representation, performance evaluation, learning model selection, linguistically-motivated feature engineering, corpus construction and real-world application.
UCAM-CL-TR-722
Mbou Eyole-Monono:
Energy-efficient sentient computing
July 2008, 138 pages, PDF
PhD thesis (Trinity College, January 2008)
Abstract: In a bid to improve the interaction between
computers and humans, it is becoming necessary to
make increasingly larger deployments of sensor networks. These clusters of small electronic devices can
be embedded in our surroundings and can detect and
react to physical changes. They will make computers more proactive in general by gathering and interpreting useful information about the physical environment through a combination of measurements. Increasing the performance of these devices will mean more
intelligence can be embedded within the sensor network. However, most conventional ways of increasing
performance often come with the burden of increased
power dissipation, which is not an option for energy-constrained sensor networks. This thesis proposes, develops and tests a design methodology for performing greater amounts of processing within a sensor network while satisfying the requirement for low energy
consumption. The crux of the thesis is that there is a
great deal of concurrency present in sensor networks
which when combined with a tightly-coupled group of
small, fast, energy-conscious processors can result in a
significantly more efficient network. The construction
of a multiprocessor system aimed at sensor networks
is described in detail. It is shown that a routine critical to sensor networks can be sped up with the addition of a small set of primitives. The need for a very
fast inter-processor communication mechanism is highlighted, and the hardware scheduler developed as part
of this effort forms the cornerstone of the new sentient computing framework by facilitating thread operations and minimising the time required for context-switching. The experimental results also show that end-to-end latency can be reduced in a flexible way through
multiprocessing.
UCAM-CL-TR-723
Richard Southern:
Animation manifolds for representing
topological alteration
July 2008, 131 pages, PDF
PhD thesis (Clare Hall, February 2008)
Abstract: An animation manifold encapsulates an animation sequence of surfaces contained within a higher
dimensional manifold with one dimension being time.
An iso-surface extracted from this structure is a frame
of the animation sequence.
In this dissertation I make an argument for the use
of animation manifolds as a representation of complex
animation sequences. In particular animation manifolds can represent transitions between shapes with differing topological structure and polygonal density.
I introduce the animation manifold, and show how
it can be constructed from a keyframe animation sequence and rendered using raytracing or graphics hardware. I then adapt three Laplacian editing frameworks
to the higher dimensional context. I derive new boundary conditions for both primal and dual Laplacian
methods, and present a technique to adaptively regularise the sampling of a deformed manifold after editing.
The animation manifold can be used to represent a
morph sequence between surfaces of arbitrary topology. I present a novel framework for achieving this
by connecting planar cross sections in a higher dimension with a new constrained Delaunay triangulation.
Topological alteration is achieved by using the Voronoi
skeleton, a novel structure which provides a fast medial
axis approximation.
UCAM-CL-TR-724
Ulrich Paquet:
Bayesian inference for latent variable
models
July 2008, 137 pages, PDF
PhD thesis (Wolfson College, March 2007)
Abstract: Bayes’ theorem is the cornerstone of statistical inference. It provides the tools for dealing with
knowledge in an uncertain world, allowing us to explain observed phenomena through the refinement of
belief in model parameters. At the heart of this elegant
framework lie intractable integrals, whether in computing an average over some posterior distribution, or in
determining the normalizing constant of a distribution.
This thesis examines both deterministic and stochastic
methods in which these integrals can be treated. Of particular interest shall be parametric models where the
parameter space can be extended with additional latent
variables to get distributions that are easier to handle
algorithmically.
Deterministic methods approximate the posterior
distribution with a simpler distribution over which
the required integrals become tractable. We derive and
examine a new generic α-divergence message passing
scheme for a multivariate mixture of Gaussians, a particular modeling problem requiring latent variables.
This algorithm minimizes local α-divergences over a
chosen posterior factorization, and includes variational
Bayes and expectation propagation as special cases.
Stochastic (or Monte Carlo) methods rely on a sample from the posterior to simplify the integration tasks,
giving exact estimates in the limit of an infinite sample. Parallel tempering and thermodynamic integration
are introduced as ‘gold standard’ methods to sample
from multimodal posterior distributions and determine
normalizing constants. A parallel tempered approach
to sampling from a mixture of Gaussians posterior
through Gibbs sampling is derived, and novel methods are introduced to improve the numerical stability
of thermodynamic integration.
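As a rough illustration of the two sampling tools named above, the following Python sketch shows the usual replica-exchange acceptance rule and a trapezoidal thermodynamic-integration estimate. It assumes tempered posteriors of the form prior × likelihood^β, a common choice that the abstract does not spell out. The function names and the numbers in the usage lines are hypothetical and are not taken from the thesis.

```python
import math


def swap_log_accept(beta_i, beta_j, loglik_i, loglik_j):
    """Log acceptance probability for exchanging the states of two chains.

    Assumes chain k targets prior(theta) * likelihood(theta) ** beta_k, so the
    prior terms cancel in the Metropolis ratio for a state swap.
    """
    return min(0.0, (beta_i - beta_j) * (loglik_j - loglik_i))


def thermodynamic_log_evidence(betas, mean_logliks):
    """Trapezoidal estimate of log Z as the integral of E_beta[log L] over beta.

    betas must increase from 0.0 to 1.0; mean_logliks[k] is the average
    log-likelihood of samples drawn at inverse temperature betas[k].
    """
    if len(betas) != len(mean_logliks) or len(betas) < 2:
        raise ValueError("need matching sequences with at least two temperatures")
    total = 0.0
    for k in range(len(betas) - 1):
        width = betas[k + 1] - betas[k]
        total += 0.5 * width * (mean_logliks[k] + mean_logliks[k + 1])
    return total


# Hypothetical numbers, for illustration only.
accept_prob = math.exp(swap_log_accept(1.0, 0.5, -8.0, -10.0))
log_z = thermodynamic_log_evidence([0.0, 0.5, 1.0], [-4.0, -2.0, -1.0])
```

Working in log space keeps the acceptance test stable when likelihoods are very small. The trapezoidal rule is only one possible quadrature and is used here for brevity.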
A full comparison with parallel tempering and thermodynamic integration shows variational Bayes, expectation propagation, and message passing with the
Hellinger distance α = 1/2 to be perfectly suitable for
model selection, and for approximating the predictive
distribution with high accuracy.
Variational and stochastic methods are combined
in a novel way to design Markov chain Monte Carlo
(MCMC) transition densities, giving a variational transition kernel, which lower bounds an exact transition
kernel. We highlight the general need to mix variational
methods with other MCMC moves, by proving that the
variational kernel does not necessarily give a geometrically ergodic chain.
UCAM-CL-TR-725
Hamed Haddadi, Damien Fay,
Almerima Jamakovic, Olaf Maennel,
Andrew W. Moore, Richard Mortier,
Miguel Rio, Steve Uhlig:
Beyond node degree:
evaluating AS topology models
July 2008, 16 pages, PDF
Abstract: Many models have been proposed to generate Internet Autonomous System (AS) topologies, most
of which make structural assumptions about the AS
graph. In this paper we compare AS topology generation models with several observed AS topologies. In
contrast to most previous works, we avoid making assumptions about which topological properties are important to characterize the AS topology. Our analysis shows that, although matching degree-based properties, the existing AS topology generation models fail
to capture the complexity of the local interconnection
structure between ASs. Furthermore, we use BGP data
from multiple vantage points to show that additional
measurement locations significantly affect local structure properties, such as clustering and node centrality.
Degree-based properties, however, are not notably affected by additional measurement locations. These observations are particularly valid in the core. The shortcomings of AS topology generation models stem from
an underestimation of the complexity of the connectivity in the core caused by inappropriate use of BGP data.
UCAM-CL-TR-726
Viktor Vafeiadis:
Modular fine-grained concurrency
verification
July 2008, 148 pages, PDF
PhD thesis (Selwyn College, July 2007)
Abstract: Traditionally, concurrent data structures are
protected by a single mutual exclusion lock so that
only one thread may access the data structure at any
time. This coarse-grained approach makes it relatively
easy to reason about correctness, but it severely limits
parallelism. More advanced algorithms instead perform
synchronisation at a finer grain. They employ sophisticated synchronisation schemes (both blocking and non-blocking) and are usually written in low-level languages
such as C.
This dissertation addresses the formal verification of
such algorithms. It proposes techniques that are modular (and hence scalable), easy for programmers to use,
and yet powerful enough to verify complex algorithms.
In doing so, it makes two theoretical and two practical
contributions to reasoning about fine-grained concurrency.
First, building on rely/guarantee reasoning and separation logic, it develops a new logic, RGSep, that
subsumes these two logics and enables simple, modular proofs of fine-grained concurrent algorithms that
use complex dynamically allocated data structures and
may explicitly deallocate memory. RGSep allows for
ownership-based reasoning and ownership transfer between threads, while maintaining the expressiveness of
binary relations to describe inter-thread interference.
Second, it describes techniques for proving linearisability, the standard correctness condition for fine-grained concurrent algorithms. The main proof technique is to introduce auxiliary single-assignment variables to capture the linearisation point and to inline the
abstract effect of the program at that point as auxiliary
code.
Third, it demonstrates this approach by proving linearisability of a collection of concurrent list and stack
algorithms, as well as providing the first correctness
proofs of the RDCSS and MCAS implementations of
Harris et al.
Finally, it describes a prototype safety checker,
SmallfootRG, for fine-grained concurrent programs
that is based on RGSep. SmallfootRG proves simple
safety properties for a number of list and stack algorithms and verifies the absence of memory leaks.
UCAM-CL-TR-727
Ruoshui Liu, Ian J. Wassell:
A novel auto-calibration system for
wireless sensor motes
September 2008, 65 pages, PDF
Abstract: In recent years, Wireless Sensor Networks
(WSNs) research has undergone a quiet revolution, providing a new paradigm for sensing and disseminating information from various environments. In reality,
the wireless propagation channel in many harsh environments has a significant impact on the coverage
range and quality of the radio links between the wireless nodes (motes). Therefore, the use of diversity techniques (e.g., frequency diversity and spatial diversity)
must be considered to ameliorate the notoriously variable and unpredictable point-to-point radio communication links. However, in order to determine the space
and frequency diversity characteristics of the channel,
accurate measurements need to be made. The most representative and inexpensive solution is to use motes;
however, they suffer poor accuracy owing to their low cost and compromised radio frequency (RF) performance.
In this report we present a novel automated calibration system for characterising mote RF performance.
The proposed strategy provides us with good knowledge of the actual mote transmit power, RSSI characteristics and receive sensitivity by establishing calibration tables for transmitting and receiving mote pairs
over their operating frequency range. The validated results show that our automated calibration system can
achieve an increase of ±1.5 dB in the RSSI accuracy.
In addition, measurements of the mote transmit power
show a significant difference from that claimed in the
manufacturer’s data sheet. The proposed calibration
method can also be easily applied to wireless sensor
motes from virtually any vendor, provided they have
an RF connector.
UCAM-CL-TR-728
Peter J.C. Brown, Christopher T. Faigle:
A robust efficient algorithm for
point location in triangulations
February 1997, 16 pages, PDF
Abstract: This paper presents a robust alternative to
previous approaches to the problem of point location
in triangulations represented using the quadedge data
structure. We generalise the reasons for the failure of
an earlier routine to terminate when applied to certain
non-Delaunay triangulations. This leads to our new deterministic algorithm which we prove is guaranteed to
terminate. We also present a novel heuristic for choosing a starting edge for point location queries and show
that this greatly enhances the efficiency of point location for the general case.
UCAM-CL-TR-729
Damien Fay, Hamed Haddadi, Steve Uhlig,
Andrew W. Moore, Richard Mortier,
Almerima Jamakovic:
Weighted spectral distribution
September 2008, 13 pages, PDF
Abstract: Comparison of graph structures is a frequently encountered problem across a number of problem domains. Comparing graphs requires a metric to
discriminate which features of the graphs are considered important. The spectrum of a graph is often
claimed to contain all the information within a graph,
but the raw spectrum contains too much information
to be directly used as a useful metric. In this paper
we introduce a metric, the weighted spectral distribution, that improves on the raw spectrum by discounting those eigenvalues believed to be unimportant and
emphasizing the contribution of those believed to be
important.
We use this metric to optimize the selection of parameter values for generating Internet topologies. Our
metric leads to parameter choices that appear sensible
given prior knowledge of the problem domain: the resulting choices are close to the default values of the
topology generators and, in the case of the AB generator, fall within the expected region. This metric provides a means for meaningfully optimizing parameter
selection when generating topologies intended to share
structure with, but not match exactly, measured graphs.
UCAM-CL-TR-730
Robert J. Ennals:
Adaptive evaluation of non-strict
programs
August 2008, 243 pages, PDF
PhD thesis (King’s College, June 2004)
Abstract: Most popular programming languages are
strict. In a strict language, the binding of a variable to
an expression coincides with the evaluation of the expression.
Non-strict languages attempt to make life easier for
programmers by decoupling expression binding and expression evaluation. In a non-strict language, a variable
can be bound to an unevaluated expression, and such
expressions can be passed around just like values in a
strict language. This separation allows the programmer
to declare a variable at the point that makes most logical sense, rather than at the point at which its value is
known to be needed.
Non-strict languages are usually evaluated using a
technique called Lazy Evaluation. Lazy Evaluation will
only evaluate an expression when its value is known
to be needed. While Lazy Evaluation minimises the total number of expressions evaluated, it imposes a considerable bookkeeping overhead, and has unpredictable
space behaviour.
In this thesis, we present a new evaluation strategy which we call Optimistic Evaluation. Optimistic
Evaluation blends lazy and eager evaluation under the
guidance of an online profiler. The online profiler observes the running program and decides which expressions should be evaluated lazily, and which should be
evaluated eagerly. We show that the worst case performance of Optimistic Evaluation relative to Lazy Evaluation can be bounded with an upper bound chosen by
the user. Increasing this upper bound allows the profiler to take greater risks and potentially achieve better
average performance.
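The blend of eager and lazy evaluation described above can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the thesis's implementation: computations are written as generators that yield once per step, speculation is cut off after a fixed step budget, and a very simple profiler stops speculating at a binding site after repeated aborts. The names `Thunk`, `Profiler`, `budget` and `max_aborts` are all hypothetical.

```python
class Thunk:
    """A suspended computation: a generator function that yields once per step."""

    def __init__(self, step_fn):
        self._steps = step_fn()
        self.done = False
        self.value = None

    def run(self, budget=None):
        """Advance up to `budget` steps (no limit if None); return True if finished."""
        taken = 0
        while not self.done and (budget is None or taken < budget):
            try:
                next(self._steps)
                taken += 1
            except StopIteration as stop:
                self.value = stop.value
                self.done = True
        return self.done

    def force(self):
        """Demand the value, finishing any remaining work first."""
        self.run()
        return self.value


class Profiler:
    """Toy online profiler: stop speculating at a site after too many aborts."""

    def __init__(self, budget=100, max_aborts=2):
        self.budget = budget
        self.max_aborts = max_aborts
        self.aborts = {}

    def bind(self, site, step_fn):
        thunk = Thunk(step_fn)
        if self.aborts.get(site, 0) < self.max_aborts:
            # Speculate eagerly; suspend the work if the step budget runs out.
            if not thunk.run(self.budget):
                self.aborts[site] = self.aborts.get(site, 0) + 1
        return thunk


def cheap():
    yield
    return 42


def expensive():
    for _ in range(1000):
        yield
    return "slow"


profiler = Profiler(budget=10, max_aborts=2)
x = profiler.bind("site_a", cheap)  # finishes during speculation
```

In this sketch the step budget loosely plays the role of the user-chosen bound on wasted work: a larger budget lets more expressions finish eagerly, at the cost of more effort thrown away on values that are never demanded.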
This thesis describes both the theory and practice of
Optimistic Evaluation. We start by giving an overview
of Optimistic Evaluation. We go on to present a formal
model, which we use to justify our design. We then detail how we have implemented Optimistic Evaluation
as part of an industrial-strength compiler. Finally, we
provide experimental results to back up our claims.
UCAM-CL-TR-731
Matthew Johnson:
A new approach to Internet banking
September 2008, 113 pages, PDF
PhD thesis (Trinity Hall, July 2008)
Abstract: This thesis investigates the protection landscape surrounding online banking. First, electronic
banking is analysed for vulnerabilities and a survey of
current attacks is carried out. This is represented graphically as an attack tree describing the different ways in
which online transactions can be attacked.
The discussion then moves on to various defences
which have been developed, categorizing them and analyzing how successful they are at protecting against the
attacks given in the first chapter. This covers everything
from TLS encryption through phishing site detection to
two-factor authentication.
Having declared all current schemes for protecting
online banking lacking in some way, the key aspects of
the problem are identified. This is followed by a proposal for a more robust defence system which uses a
small security device to create a trusted path to the
customer, rather than depend upon trusting the customer’s computer. The protocol for this system is described along with all the other restrictions required
for actual use. This is followed by a description of a
demonstration implementation of the system.
Extensions to the system are then proposed, designed to afford extra protection for the consumer and
also to support other types of device. There is then a
discussion of ways of managing keys in a heterogeneous
system, rather than one managed by a single entity.
The conclusion discusses the weaknesses of the proposed scheme and evaluates how successful it is likely
to be in practice and what barriers there may be to
adoption in the banking system.
UCAM-CL-TR-732
Alban Rrustemi:
Computing surfaces – a platform for
scalable interactive displays
November 2008, 156 pages, PDF
PhD thesis (St Edmund’s College, November 2008)
Abstract: Recent progress in electronic, display and
sensing technologies makes possible a future with omnipresent, arbitrarily large interactive display surfaces.
Nonetheless, current methods of designing display systems with multi-touch sensitivity do not scale. This thesis presents computing surfaces as a viable platform for
resolving forthcoming scalability limitations.
Computing surfaces are composed of a homogeneous network of physically adjoined, small sensitive
displays with local computation and communication
capabilities. In this platform, inherent scalability is provided by a distributed architecture. The regular spatial
distribution of resources presents new demands on the
way surface input and output information is managed
and processed.
Direct user input with touch-based gestures needs to
account for the distributed architecture of computing
surfaces. A scalable middleware solution that conceals
the tiled architecture is proposed for reasoning with
touch-based gestures. The validity of this middleware
is proven in a case study, where a fully distributed algorithm for online recognition of unistrokes – a particular
class of touch-based gestures – is presented and evaluated.
Novel interaction techniques based around interactive display surfaces involve direct manipulation with
displayed digital objects. In order to facilitate such
interactions in computing surfaces, an efficient distributed algorithm to perform 2D image transformations is introduced and evaluated. The performance of
these transformations is heavily influenced by the arbitration policies of the interconnection network. One
approach for improving the performance of these transformations in conventional network architectures is
proposed and evaluated.
More advanced applications in computing surfaces
require the presence of some notion of time. An efficient
algorithm for internal time synchronisation is presented
and evaluated. A hardware solution is adopted to minimise the delay uncertainty of special timestamp messages. The proposed algorithm allows efficient, scalable
time synchronisation among clusters of tiles. A hardware reference platform is constructed to demonstrate
the basic principles and features of computing surfaces.
This platform and a complementary simulation environment are used for extensive evaluation and analysis.
UCAM-CL-TR-733
Darren Edge:
Tangible user interfaces for peripheral
interaction
December 2008, 237 pages, PDF
PhD thesis (Jesus College, January 2008)
Abstract: Since Mark Weiser’s vision of ubiquitous
computing in 1988, many research efforts have been
made to move computation away from the workstation
and into the world. One such research area focuses on
“Tangible” User Interfaces or TUIs – those that provide
both physical representation and control of underlying
digital information.
This dissertation describes how TUIs can support a
“peripheral” style of interaction, in which users engage
in short, dispersed episodes of low-attention interaction
with digitally-augmented physical tokens. The application domain in which I develop this concept is the office context, where physical tokens can represent items
of common interest to members of a team whose work
is mutually interrelated, but predominantly performed
independently by individuals at their desks.
An “analytic design process” is introduced as a way
of developing TUI designs appropriate for their intended contexts of use. This process is then used to
present the design of a bimanual desktop TUI that
complements the existing workstation, and encourages
peripheral interaction in parallel with workstation-intensive tasks. Implementation of a prototype TUI is
then described, comprising “task” tokens for work-time management, “document” tokens for face-to-face
sharing of collaborative documents, and “contact” tokens for awareness of other team members’ status and
workload. Finally, evaluation of this TUI is presented
via description of its extended deployment in a real office context.
The main empirically-grounded results of this work
are a categorisation of the different ways in which users
can interact with physical tokens, and an identification
of the qualities of peripheral interaction that differentiate it from other interaction styles. The foremost benefits of peripheral interaction were found to arise from
the freedom with which tokens can be appropriated to
create meaningful information structures of both cognitive and social significance, in the physical desktop
environment and beyond.
UCAM-CL-TR-734
Philip Tuddenham:
Tabletop interfaces for remote
collaboration
December 2008, 243 pages, PDF
PhD thesis (Gonville and Caius College, June 2008)
Abstract: Effective support for synchronous remote collaboration has long proved a desirable yet elusive goal
for computer technology. Although video views showing the remote participants have recently improved,
technologies providing a shared visual workspace of the
task still lack support for the visual cues and work practices of co-located collaboration.
Researchers have recently demonstrated shared
workspaces for remote collaboration using large horizontal interactive surfaces. These remote tabletop interfaces may afford the beneficial work practices associated with co-located collaboration around tables. However, there has been little investigation of remote tabletop interfaces beyond limited demonstrations. There is
currently little theoretical basis for their design, and little empirical characterisation of their support for collaboration. The construction of remote tabletop applications also presents considerable technical challenges.
This dissertation addresses each of these areas.
Firstly, a theory of workspace awareness is applied to
consider the design of remote tabletop interfaces and
the work practices that they may afford.
Secondly, two technical barriers to the rapid exploration of useful remote tabletop applications are identified: the low resolution of conventional tabletop displays; and the lack of support for existing user interface
components. Techniques from multi-projector display
walls are applied to address these problems. The resulting method is evaluated empirically and used to create
a number of novel tabletop interfaces.
Thirdly, an empirical investigation compares remote
and co-located tabletop interfaces. The findings show
how the design of remote tabletop interfaces leads to
collaborators having a high level of awareness of each
other’s actions in the workspace. This enables smooth
transitions between individual and group work, together with anticipation and assistance, similar to co-located tabletop collaboration. However, remote tabletop collaborators use different coordination mechanisms from co-located collaborators. The results have
implications for the design and future study of these interfaces.
UCAM-CL-TR-735
Diarmuid Ó Séaghdha:
Learning compound noun semantics
December 2008, 167 pages, PDF
PhD thesis (Corpus Christi College, July 2008)
Abstract: This thesis investigates computational approaches for analysing the semantic relations in compound nouns and other noun-noun constructions.
Compound nouns in particular have received a great
deal of attention in recent years due to the challenges
they pose for natural language processing systems. One
reason for this is that the semantic relation between the
constituents of a compound is not explicitly expressed
and must be retrieved from other sources of linguistic
and world knowledge.
I present a new scheme for the semantic annotation
of compounds, describing in detail the motivation for
the scheme and the development process. This scheme
is applied to create an annotated dataset for use in
compound interpretation experiments. The results of
a dual-annotator experiment indicate that good agreement can be obtained with this scheme relative to previously reported results and also provide insights into
the challenging nature of the annotation task.
I describe two corpus-driven paradigms for comparing pairs of nouns: lexical similarity and relational similarity. Lexical similarity is based on comparing each
constituent of a noun pair to the corresponding constituent of another pair. Relational similarity is based
on comparing the contexts in which both constituents
of a noun pair occur together with the corresponding
contexts of another pair. Using the flexible framework
of kernel methods, I develop techniques for implementing both similarity paradigms.
A standard approach to lexical similarity represents
words by their co-occurrence distributions. I describe a
family of kernel functions that are designed for the classification of probability distributions. The appropriateness of these distributional kernels for semantic tasks is
suggested by their close connection to proven measures
of distributional lexical similarity. I demonstrate the effectiveness of the lexical similarity model by applying it
to two classification tasks: compound noun interpretation and the 2007 SemEval task on classifying semantic
relations between nominals.
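As a rough illustration of a kernel defined on probability distributions, the sketch below computes a Jensen-Shannon-style kernel between two co-occurrence distributions. The abstract does not name the kernels used in the thesis, so the choice of this kernel, the function name `jsd_kernel` and the toy distributions are assumptions made purely for illustration.

```python
import math


def jsd_kernel(p, q):
    """Jensen-Shannon-style kernel between two discrete distributions.

    p and q are dicts mapping context words to probabilities (each summing
    to 1). The result equals 2 - 2 * JSD(p, q) with JSD measured in bits,
    so it is 2 for identical distributions and 0 for disjoint supports.
    """
    total = 0.0
    for context in set(p) | set(q):
        pc = p.get(context, 0.0)
        qc = q.get(context, 0.0)
        if pc > 0.0:
            total -= pc * math.log2(pc / (pc + qc))
        if qc > 0.0:
            total -= qc * math.log2(qc / (pc + qc))
    return total


# Hypothetical co-occurrence distributions for three nouns.
cat = {"purr": 0.5, "pet": 0.5}
dog = {"bark": 0.5, "pet": 0.5}
car = {"drive": 1.0}

print(jsd_kernel(cat, dog), jsd_kernel(cat, car))
```

Words that share contexts ("cat" and "dog" both occur with "pet") receive a higher kernel value than words with no contexts in common, which is the property a classifier over distributional representations would exploit.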
To implement relational similarity I use kernels on
strings and sets of strings. I show that distributional set
kernels based on a multinomial probability model can
be computed many times more efficiently than previously proposed kernels, while still achieving equal or
better performance. Relational similarity does not perform as well as lexical similarity in my experiments.
However, combining the two models brings an improvement over either model alone and achieves state-of-the-art results on both the compound noun and SemEval Task 4 datasets.
UCAM-CL-TR-736
Mike Dodds, Xinyu Feng,
Matthew Parkinson, Viktor Vafeiadis:
Deny-guarantee reasoning
January 2009, 82 pages, PDF
Abstract: Rely-guarantee is a well-established approach
to reasoning about concurrent programs that use parallel composition. However, parallel composition is not
how concurrency is structured in real systems. Instead,
threads are started by ‘fork’ and collected with ‘join’
commands. This style of concurrency cannot be reasoned about using rely-guarantee, as the life-time of a
thread can be scoped dynamically. With parallel composition the scope is static.
In this paper, we introduce deny-guarantee reasoning, a reformulation of rely-guarantee that enables
reasoning about dynamically scoped concurrency. We
build on ideas from separation logic to allow interference to be dynamically split and recombined, in a similar way that separation logic splits and joins heaps. To
allow this splitting, we use deny and guarantee permissions: a deny permission specifies that the environment
cannot do an action, and a guarantee permission allows us
to do an action. We illustrate the use of our proof system with examples, and show that it can encode all the
original rely-guarantee proofs. We also present the semantics and soundness of the deny-guarantee method.
UCAM-CL-TR-737
Na Xu:
Static contract checking for Haskell
December 2008, 175 pages, PDF
PhD thesis (Churchill College, August 2008)
Abstract: Program errors are hard to detect and are costly, both to programmers who spend significant effort in debugging, and to systems that are guarded by runtime checks. Static verification techniques have been applied to imperative and object-oriented languages, like Java and C#, for checking basic safety properties such as memory leaks. In a pure functional language, many of these basic properties are guaranteed by design, which suggests the opportunity for verifying more sophisticated program properties. Nevertheless, few automatic systems for doing so exist. In this thesis, we show the challenges and solutions to verifying advanced properties of a pure functional language, Haskell. We describe a sound and automatic static verification framework for Haskell, that is based on contracts and symbolic execution. Our approach gives precise blame assignments at compile-time in the presence of higher-order functions and laziness.
First, we give a formal definition of contract satisfaction which can be viewed as a denotational semantics for contracts. We then construct two contract checking wrappers, which are dual to each other, for checking the contract satisfaction. We prove the soundness and completeness of the construction of the contract checking wrappers with respect to the definition of the contract satisfaction. This part of my research shows that the two wrappers are projections with respect to a partial ordering crashes-more-often and furthermore, they form a projection pair and a closure pair. These properties give contract checking a strong theoretical foundation.
As the goal is to detect bugs during compile time, we symbolically execute the code constructed by the contract checking wrappers and prove the soundness of this approach. We also develop a technique named counter-example-guided (CEG) unrolling which only unrolls function calls on demand. This technique speeds up the checking process.
Finally, our verification approach makes error tracing much easier compared with the existing set-based analysis. Thus equipped, we are able to tell programmers during compile-time which function to blame and why if there is a bug in their program. This is a breakthrough for lazy languages because it is known to be difficult to report such informative messages either at compile-time or run-time.
UCAM-CL-TR-738
Scott Fairbanks:
High precision timing using self-timed circuits
January 2009, 99 pages, PDF
PhD thesis (Gonville and Caius College, September 2004)
Abstract: Constraining the events that demarcate periods on a VLSI chip to precise instances of time is the task undertaken in this thesis. High speed sampling and clock distribution are two example applications. Foundational to my approach is the use of self-timed data control circuits.
Specially designed self-timed control circuits deliver
high frequency timing signals with precise phase relationships. The frequency and the phase relationships
are controlled by varying the number of self-timed control stages and the number of tokens they control.
The self-timed control circuits are constructed with
simple digital logic gates. The digital logic gates respond to a range of analog values with a continuum of
precise and controlled delays. The control circuits implement their functionality efficiently. This allows the
gates to drive long wires and distribute the timing signals over a large area. Also gate delays are short and
few, allowing for high frequencies.
The self-timed control circuits implement the functionality of a FIFO that is then closed into a ring. Timing tokens ripple through the rings. The FIFO stages
use digital handshaking protocols to pass the timing tokens between the stages. The FIFO control stage detects
the phase between the handshake signals on its inputs
and produces a signal that is sent back to the producers
with a delay that is a function of the phase relationship
of the input signals.
The methods described are not bound to the same
process and systematic skew limitations of existing
methods. For a certain power budget, timing signals are
generated and distributed with significantly less power
with the approaches to be presented than with conventional methods.
UCAM-CL-TR-739
Salman Taherian:
State-based Publish/Subscribe for sensor systems
January 2009, 240 pages, PDF
PhD thesis (St John’s College, June 2008)
Abstract: Recent technological advances have enabled the creation of networks of sensor devices. These devices are typically equipped with basic computational and communication capabilities. Systems based on these devices can deduce high-level, meaningful information about the environment that may be useful to applications. Due to their scale, distributed nature, and the limited resources available to sensor devices, these systems are inherently complex. Shielding applications from this complexity is a challenging problem.
To address this challenge, I present a middleware called SPS (State-based Publish/Subscribe). It is based on a combination of a State-Centric data model and a Publish/Subscribe (Pub/Sub) communication paradigm. I argue that a state-centric data model allows applications to specify environmental situations of interest in a more natural way than existing solutions. In addition, Pub/Sub enables scalable many-to-many communication between sensors, actuators, and applications.
This dissertation initially focuses on Resource-constrained Sensor Networks (RSNs) and proposes State Filters (SFs), which are lightweight, stateful, event filtering components. Their design is motivated by the redundancy and correlation observed in sensor readings produced close together in space and time. By performing context-based data processing, SFs increase Pub/Sub expressiveness and improve communication efficiency.
Secondly, I propose State Maintenance Components (SMCs) for capturing more expressive conditions in heterogeneous sensor networks containing more resourceful devices. SMCs extend SFs with data fusion and temporal and spatial data manipulation capabilities. They can also be composed together (in a DAG) to deduce higher level information. SMCs operate independently from each other and can therefore be decomposed for distributed processing within the network.
Finally, I present a Pub/Sub protocol called QPS (Quad-PubSub) for location-aware Wireless Sensor Networks (WSNs). QPS is central to the design of my framework as it facilitates messaging between state-based components, applications, sensors, and actuators. In contrast to existing data dissemination protocols, QPS has a layered architecture. This allows for the transparent operation of routing protocols that meet different Quality of Service (QoS) requirements.
UCAM-CL-TR-740
Tal Sobol-Shikler:
Analysis of affective expression in speech
January 2009, 163 pages, PDF
PhD thesis (Girton College, March 2007)
Abstract: This dissertation presents analysis of expressions in speech. It describes a novel framework for dynamic recognition of acted and naturally evoked expressions and its application to expression mapping
and to multi-modal analysis of human-computer interactions.
The focus of this research is on analysis of a wide
range of emotions and mental states from non-verbal
expressions in speech. In particular, on inference of
complex mental states, beyond the set of basic emotions, including naturally evoked subtle expressions
and mixtures of expressions.
This dissertation describes a bottom-up computational model for processing of speech signals. It combines the application of signal processing, machine
learning and voting methods with novel approaches to
the design, implementation and validation. It is based
on a comprehensive framework that includes all the
development stages of a system. The model represents
paralinguistic speech events using temporal abstractions borrowed from various disciplines such as musicology, engineering and linguistics. The model consists
of a flexible and expandable architecture. The validation of the model extends its scope to different expressions, languages, backgrounds, contexts and applications.
The work adopts the approach that an utterance is
not an isolated entity but rather a part of an interaction and should be analysed in this context. The analysis in context includes relations to events and other behavioural cues. Expressions of mental states are related
not only in time but also by their meaning and content.
This work demonstrates the relations between the lexical definitions of mental states, taxonomies and theoretical conceptualization of mental states and their vocal correlates. It examines taxonomies and theoretical
conceptualisation of mental states in relation to their
vocal characteristics. The results show that a very wide
range of mental state concepts can be mapped, or described, using a high-level abstraction in the form of
a small sub-set of concepts which are characterised by
their vocal correlates.
This research is an important step towards comprehensive solutions that incorporate social intelligence
cues for a wide variety of applications and for multidisciplinary research.
UCAM-CL-TR-741
David N. Cottingham:
Vehicular wireless communication
January 2009, 264 pages, PDF
PhD thesis (Churchill College, September 2008)
Abstract: Transportation is vital in everyday life. As a consequence, vehicles are increasingly equipped with onboard computing devices. Moreover, the demand for connectivity to vehicles is growing rapidly, both from business and consumers. Meanwhile, the number of wireless networks available in an average city in the developed world is several thousand. Whilst this theoretically provides near-ubiquitous coverage, the technology type is not homogeneous.
This dissertation discusses how the diversity in communication systems can be best used by vehicles. Focussing on road vehicles, it first details the technologies available, the difficulties inherent in the vehicular environment, and how intelligent handover algorithms could enable seamless connectivity. In particular, it identifies the need for a model of the coverage of wireless networks.
In order to construct such a model, the use of vehicular sensor networks is proposed. The Sentient Van, a platform for vehicular sensing, is introduced, and details are given of experiments carried out concerning the performance of IEEE 802.11x, specifically for vehicles. Using the Sentient Van, a corpus of 10 million signal strength readings was collected over three years. This data, and further traces, are used in the remainder of the work described, thus distinguishing it in using entirely real world data.
Algorithms are adapted from the field of 2-D shape simplification to the problem of processing thousands of signal strength readings. By applying these to the data collected, coverage maps are generated that contain extents. These represent how coverage varies between two locations on a given road. The algorithms are first proven fit for purpose using synthetic data, before being evaluated for accuracy of representation and compactness of output using real data.
The problem of how to select the optimal network to connect to is then addressed. The coverage map representation is converted into a multi-planar graph, where the coverage of all available wireless networks is included. This novel representation also includes the ability to hand over between networks, and the penalties so incurred. This allows the benefits of connecting to a given network to be traded off with the cost of handing over to it.
In order to use the multi-planar graph, shortest path routing is used. The theory underpinning multi-criteria routing is overviewed, and a family of routing metrics developed. These generate efficient solutions to the problem of calculating the sequence of networks that should be connected to over a given geographical route. The system is evaluated using real traces, finding that in 75% of the test cases proactive routing algorithms provide better QoS than a reactive algorithm. Moreover, the system can also be run to generate geographical routes that are QoS-aware.
This dissertation concludes by examining how coverage mapping can be applied to other types of data, and avenues for future research are proposed.
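The idea of shortest path routing over a multi-planar graph with handover penalties, as described in the abstract above, can be sketched as follows. This is a minimal illustration and not the dissertation's method: the per-segment cost table, the single fixed `handover_penalty`, the network names and the function name `best_network_sequence` are all assumptions. Each network forms one plane, nodes are (route segment, network) pairs, and moving between planes costs extra.

```python
import heapq


def best_network_sequence(cost, handover_penalty):
    """Pick the cheapest sequence of networks along a route.

    cost[net][i] is the (non-negative) cost of using network `net` on
    route segment i, or None where that network has no coverage.
    Returns (total_cost, [network per segment]) or None if no sequence
    covers the whole route.
    """
    nets = list(cost)
    n_segments = len(next(iter(cost.values())))
    # Start on any network that covers the first segment.
    heap = [(cost[net][0], 0, net, (net,))
            for net in nets if cost[net][0] is not None]
    heapq.heapify(heap)
    settled = set()
    while heap:
        total, seg, net, path = heapq.heappop(heap)
        if (seg, net) in settled:
            continue
        settled.add((seg, net))
        if seg == n_segments - 1:
            return total, list(path)
        for nxt in nets:
            step = cost[nxt][seg + 1]
            if step is None:
                continue  # no coverage on the next segment
            # Changing plane (network) incurs the handover penalty.
            penalty = 0 if nxt == net else handover_penalty
            heapq.heappush(
                heap, (total + step + penalty, seg + 1, nxt, path + (nxt,)))
    return None


# Hypothetical coverage: wifi is cheap but only covers the first half.
coverage_cost = {
    "wifi": [1, 1, None, None],
    "cellular": [3, 3, 3, 3],
}
print(best_network_sequence(coverage_cost, handover_penalty=2))
print(best_network_sequence(coverage_cost, handover_penalty=5))
```

With a small penalty the route starts on the cheaper network and hands over once; with a large penalty it is cheaper to stay on the network with full coverage throughout, which is the trade-off the abstract describes.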
UCAM-CL-TR-742
Thomas Ridge, Michael Norrish,
Peter Sewell:
TCP, UDP, and Sockets:
Volume 3: The Service-level
Specification
February 2009, 305 pages, PDF
Abstract: Despite more than 30 years of research on
protocol specification, the major protocols deployed in
the Internet, such as TCP, are described only in informal
prose RFCs and executable code. In part this is because
the scale and complexity of these protocols makes them
challenging targets for formal descriptions, and because
techniques for mathematically rigorous (but appropriately loose) specification are not in common use.
In this work we show how these difficulties can
be addressed. We develop a high-level specification for
TCP and the Sockets API, describing the byte-stream
service that TCP provides to users, expressed in the formalised mathematics of the HOL proof assistant. This
complements our previous low-level specification of the
protocol internals, and makes it possible for the first
time to state what it means for TCP to be correct: that
the protocol implements the service. We define a precise
abstraction function between the models and validate
it by testing, using verified testing infrastructure within
HOL. Some errors may remain, of course, especially
as our resources for testing were limited, but it would
be straightforward to use the method on a larger scale.
This is a pragmatic alternative to full proof, providing
reasonable confidence at a relatively low entry cost.
Together with our previous validation of the low-level model, this shows how one can rigorously tie
together concrete implementations, low-level protocol
models, and specifications of the services they claim to
provide, dealing with the complexity of real-world protocols throughout.
Similar techniques should be applicable, and even
more valuable, in the design of new protocols (as we illustrated elsewhere, for a MAC protocol for the SWIFT
optically switched network). For TCP and Sockets, our
specifications had to capture the historical complexities, whereas for a new protocol design, such specification and testing can identify unintended complexities at
an early point in the design.
UCAM-CL-TR-743
Rebecca F. Watson:
Optimising the speed and accuracy of
a Statistical GLR Parser
March 2009, 145 pages, PDF
PhD thesis (Darwin College, September 2007)
Abstract: The focus of this thesis is to develop techniques that optimise both the speed and accuracy of a
unification-based statistical GLR parser. However, we
can apply these methods within a broad range of parsing frameworks. We first aim to optimise the level of tag
ambiguity resolved during parsing, given that we employ a front-end PoS tagger. This work provides the first
broad comparison of tag models as we consider both
tagging and parsing performance. A dynamic model
achieves the best accuracy and provides a means to
overcome the trade-off between tag error rates in single tag per word input and the increase in parse ambiguity over multiple-tag per word input. The second
line of research describes a novel modification to the
inside-outside algorithm, whereby multiple inside and
outside probabilities are assigned for elements within
the packed parse forest data structure. This algorithm
enables us to compute a set of ‘weighted GRs’ directly
from this structure. Our experiments demonstrate substantial increases in parser accuracy and throughput for
weighted GR output.
Finally, we describe a novel confidence-based training framework, that can, in principle, be applied to
any statistical parser whose output is defined in terms
of its consistency with a given level and type of annotation. We demonstrate that a semi-supervised variant of this framework outperforms both Expectation-Maximisation (when both are constrained by unlabelled partial-bracketing) and the extant (fully supervised) method. These novel training methods utilise
data automatically extracted from existing corpora.
Consequently, they require no manual effort on behalf
of the grammar writer, facilitating grammar development.
UCAM-CL-TR-744
Anna Ritchie:
Citation context analysis for
information retrieval
March 2009, 119 pages, PDF
PhD thesis (New Hall, June 2008)
Abstract: This thesis investigates taking words from
around citations to scientific papers in order to create
an enhanced document representation for improved information retrieval. This method parallels how anchor
text is commonly used in Web retrieval. In previous
work, words from citing documents have been used as
an alternative representation of the cited document but
no previous experiment has combined them with a full-text document representation and measured effectiveness in a large scale evaluation.
The contributions of this thesis are twofold: firstly,
we present a novel document representation, along
with experiments to measure its effect on retrieval effectiveness, and, secondly, we document the construction of a new, realistic test collection of scientific research papers, with references (in the bibliography) and
their associated citations (in the running text of the paper) automatically annotated. Our experiments show
that the citation-enhanced document representation increases retrieval effectiveness across a range of standard
retrieval models and evaluation measures.
In Chapter 2, we give the background to our work,
discussing the various areas from which we draw together ideas: information retrieval, particularly link
structure analysis and anchor text indexing, and bibliometrics, in particular citation analysis. We show that
there is a close relatedness of ideas between these areas but that these ideas have not been fully explored
experimentally. Chapter 3 discusses the test collection
paradigm for evaluation of information retrieval systems and describes how and why we built our test collection. In Chapter 4 we introduce the ACL Anthology,
the archive of computational linguistics papers that our
test collection is centred around. The archive contains
the most prominent publications since the beginning
of the field in the early 1960s, consisting of one journal plus conferences and workshops, resulting in over
10,000 papers. Chapter 5 describes how the PDF papers are prepared for our experiments, including identification of references and citations in the papers, once
converted to plain text, and extraction of citation information to an XML database. Chapter 6 presents our
experiments: we show that adding citation terms to the
full-text of the papers improves retrieval effectiveness
by up to 7.4%, that weighting citation terms higher
relative to paper terms increases the improvement and
that varying the context from which citation terms are
taken has a significant effect on retrieval effectiveness.
Our main hypothesis that citation terms enhance a full-text representation of scientific papers is thus proven.
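The idea of adding weighted citation terms to a paper's own terms can be sketched as below. This is an illustrative simplification, not the thesis's implementation (which used the Lemur toolkit): the whitespace tokenisation, the integer `citation_weight` and the function name `combined_term_counts` are assumptions.

```python
from collections import Counter


def combined_term_counts(full_text, citation_contexts, citation_weight=2):
    """Build a term-frequency representation of a cited paper.

    Terms from the paper's own text count once per occurrence; terms taken
    from around citations to the paper count `citation_weight` times each.
    """
    counts = Counter(full_text.lower().split())
    for context in citation_contexts:
        for term in context.lower().split():
            counts[term] += citation_weight
    return counts


# Hypothetical paper text and two citation contexts from citing papers.
paper = "we parse sentences"
contexts = ["a fast parser", "statistical parser"]
print(combined_term_counts(paper, contexts))
```

Here the term "parser" never appears in the paper itself but enters its representation through the citing text, so a query for "parser" could now match the paper.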
There are some limitations to these experiments.
The relevance judgements in our test collection are incomplete but we have experimentally verified that the
test collection is, nevertheless, a useful evaluation tool.
Using the Lemur toolkit constrained the method that
we used to weight citation terms; we would like to experiment with a more realistic implementation of term
weighting. Our experiments with different citation contexts did not identify an optimal citation context; we
would like to extend the scope of our investigation.
Now that our test collection exists, we can address
these issues in our experiments and leave the door open
for more extensive experimentation.
UCAM-CL-TR-745
Scott Owens, Susmit Sarkar, Peter Sewell:
A better x86 memory model:
x86-TSO
(extended version)
March 2009, 52 pages, PDF
Abstract: Real multiprocessors do not provide the sequentially consistent memory that is assumed by most
work on semantics and verification. Instead, they have
relaxed memory models, typically described in ambiguous prose, which lead to widespread confusion. These
are prime targets for mechanized formalization. In previous work we produced a rigorous x86-CC model, formalizing the Intel and AMD architecture specifications
of the time, but those turned out to be unsound with respect to actual hardware, as well as arguably too weak
to program above. We discuss these issues and present
a new x86-TSO model that suffers from neither problem, formalized in HOL4. We believe it is sound with
respect to real processors, reflects better the vendor’s
intentions, and is also better suited for programming.
We give two equivalent definitions of x86-TSO: an intuitive operational model based on local write buffers,
and an axiomatic total store ordering model, similar to
that of the SPARCv8. Both are adapted to handle x86-specific features. We have implemented the axiomatic
model in our memevents tool, which calculates the set
of all valid executions of test programs, and, for greater
confidence, verify the witnesses of such executions directly, with code extracted from a third, more algorithmic, equivalent version of the definition.
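The intuition behind an operational model based on local write buffers can be sketched with a toy machine. This is a simplification for illustration only, not the HOL4 definition: it omits locked instructions and other x86-specific features, and the class `TSOMachine`, its methods and the buffer-draining `fence` are assumptions. The example runs the classic store-buffering test, in which each processor writes one location and then reads the other.

```python
from collections import deque


class TSOMachine:
    """Toy machine: shared memory plus one FIFO write buffer per processor."""

    def __init__(self, n_procs):
        self.memory = {}
        self.buffers = [deque() for _ in range(n_procs)]

    def write(self, proc, addr, value):
        # A write is queued in the writer's own buffer, not sent to memory.
        self.buffers[proc].append((addr, value))

    def read(self, proc, addr):
        # A read sees the newest buffered write to addr by the same
        # processor, otherwise shared memory (0 if never written).
        for buffered_addr, value in reversed(self.buffers[proc]):
            if buffered_addr == addr:
                return value
        return self.memory.get(addr, 0)

    def flush_one(self, proc):
        # The oldest buffered write becomes visible to all processors.
        addr, value = self.buffers[proc].popleft()
        self.memory[addr] = value

    def fence(self, proc):
        # Simplified fence (an assumption): drain the whole buffer.
        while self.buffers[proc]:
            self.flush_one(proc)


def store_buffering(use_fence):
    """Processor 0: x = 1; read y.  Processor 1: y = 1; read x."""
    machine = TSOMachine(2)
    machine.write(0, "x", 1)
    machine.write(1, "y", 1)
    if use_fence:
        machine.fence(0)
        machine.fence(1)
    return machine.read(0, "y"), machine.read(1, "x")


print(store_buffering(use_fence=False))
print(store_buffering(use_fence=True))
```

Without fences both reads can return 0, an outcome that sequentially consistent memory forbids; draining the buffers before reading rules it out in this toy model.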
UCAM-CL-TR-746
Shishir Nagaraja, Ross Anderson:
The snooping dragon:
social-malware surveillance of the
Tibetan movement
March 2009, 12 pages, PDF
Abstract: In this note we document a case of malware-based electronic surveillance of a political organisation
by the agents of a nation state. While malware attacks
are not new, two aspects of this case make it worth
serious study. First, it was a targeted surveillance attack designed to collect actionable intelligence for use
by the police and security services of a repressive state,
with potentially fatal consequences for those exposed.
Second, the modus operandi combined social phishing
with high-grade malware. This combination of well-written malware with well-designed email lures, which
we call social malware, is devastatingly effective. Few
organisations outside the defence and intelligence sector could withstand such an attack, and although this
particular case involved the agents of a major power,
the attack could in fact have been mounted by a capable motivated individual. This report is therefore of
importance not just to companies who may attract the
attention of government agencies, but to all organisations. As social-malware attacks spread, they are bound
to target people such as accounts-payable and payroll
staff who use computers to make payments. Prevention will be hard. The traditional defence against social malware in government agencies involves expensive and intrusive measures that range from mandatory
access controls to tiresome operational security procedures. These will not be sustainable in the economy as
a whole. Evolving practical low-cost defences against
social-malware attacks will be a real challenge.
UCAM-CL-TR-747
Fei Song, Hongke Zhang, Sidong Zhang,
Fernando Ramos, Jon Crowcroft:
An estimator of forward and
backward delay for multipath
transport
March 2009, 16 pages, PDF
Abstract: Multipath transport protocols require awareness of the capability of different paths being used for
transmission. It is well known that round trip time
(RTT) can be used to estimate retransmission timeout
with reasonable accuracy. However, using RTT to evaluate the delay of forward or backward paths is not always suitable. In fact, these paths are usually dissimilar,
and therefore the packet delay can be significantly different in each direction.
We propose a forward and backward delay estimator that aims to solve this problem. Based on the results of the estimator, a new retransmission heuristic
mechanism for multipath transport is proposed. With
this same technique we also build two other heuristics: A bottleneck bandwidth estimator and a shared
congestion detector. These help the sender to choose
the high bandwidth path in retransmission and ensure
TCP-friendliness in multipath transport, respectively.
UCAM-CL-TR-748
Marco Canini, Wei Li, Andrew W. Moore:
GTVS: boosting the collection of application traffic ground truth
April 2009, 20 pages, PDF
Abstract: Interesting research in the areas of traffic classification, network monitoring, and application-oriented analysis cannot proceed without real trace data labeled with actual application information. However, hand-labeled traces are an extremely valuable but scarce resource in the traffic monitoring and analysis community, as a result of both privacy concerns and technical difficulties: hardly any possibility exists for payloaded data to be released to the public, while the intensive labor required for getting the ground-truth application information from the data severely constrains the feasibility of releasing anonymized versions of hand-labeled payloaded data.
The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one’s work to another. This chapter proposes and details a methodology that significantly boosts the efficiency in compiling the application traffic ground truth. In contrast with other existing work, our approach maintains the high certainty as in hand-verification, while striving to save time and labor required for that. Further, it is implemented as an easy hands-on tool suite which is now freely available to the public.
In this paper we present a case study using a 30 minute real data trace to guide the readers through our ground-truth classification process. We also present a method, which is an extension of GTVS that efficiently classifies HTTP traffic by its purpose.
UCAM-CL-TR-749
Pan Hui, Eiko Yoneki, Jon Crowcroft, Shu-Yan Chan:
Identifying social communities in complex communications for network efficiency
May 2009, 14 pages, PDF
Abstract: Complex communication networks, more particularly Mobile Ad Hoc Networks (MANET) and Pocket Switched Networks (PSN), rely on short range radio and device mobility to transfer data across the network. These kinds of mobile networks are dual in nature: they are radio networks and at the same time also human networks, and hence knowledge from social networks can also be applied here. In this paper, we demonstrate how identifying social communities can significantly improve the forwarding efficiencies in terms of delivery ratio and delivery cost. We verify our hypothesis using data from five human mobility experiments and test on two application scenarios, asynchronous messaging and publish/subscribe service.
UCAM-CL-TR-750
Marco Canini, Wei Li, Martin Zadnik, Andrew W. Moore:
AtoZ: an automatic traffic organizer using NetFPGA
May 2009, 27 pages, PDF
Abstract: This paper introduces AtoZ, an automatic
traffic organizer that provides end-users with control of how their applications use network resources.
Such an approach contrasts with the moves of many
ISPs towards network-wide application throttling and
provider-centric control of an application’s network-usage. AtoZ provides seamless per-application traffic-organizing on gigabit links, with minimal packet-delays
and no unintended packet drops.
AtoZ combines the high-speed packet processing of the NetFPGA with an efficient flow-behavior
identification method. Currently users can enable AtoZ
control over network resources by prohibiting certain
applications and controlling the priority of others. We
discuss deployment experience and use real traffic to
illustrate how such an architecture enables several distinct features: high accuracy, high throughput, minimal
delay, and efficient packet labeling – all in a low-cost,
robust configuration that works alongside the home or
enterprise access-router.
UCAM-CL-TR-751
David C. Turner:
Nominal domain theory for
concurrency
July 2009, 185 pages, PDF
PhD thesis (Clare College, December 2008)
Abstract: Domain theory provides a powerful mathematical framework for describing sequential computation, but the traditional tools of domain theory are inapplicable to concurrent computation. Without a general mathematical framework it is hard to compare
developments and approaches from different areas of
study, leading to time and effort wasted in rediscovering old ideas in new situations.
A possible remedy to this situation is to build a
denotational semantics based directly on computation
paths, where a process denotes the set of paths that it
may follow. This has been shown to be a remarkably
powerful idea, but it lacks certain computational features. Notably, it is not possible to express the idea of
names and name-generation within this simple path semantics.
Nominal set theory is a non-standard mathematical foundation that captures the notion of names in a
general way. Building a mathematical development on
top of nominal set theory has the effect of incorporating names into its fabric at a low level. Importantly,
nominal set theory is sufficiently close to conventional
foundations that it is often straightforward to transfer
intuitions into the nominal setting.
Here the original path-based domain theory for
concurrency is developed within nominal set theory,
which has the effect of systematically adjoining name-generation to the model. This gives rise to an expressive
metalanguage, Nominal HOPLA, which supports a notion of name-generation. Its denotational semantics is
given entirely in terms of universal constructions on domains. An operational semantics is also presented, and
relationships between the denotational and operational
descriptions are explored.
The generality of this approach to including name
generation into a simple semantic model indicates that
it will be possible to apply the same techniques to
more powerful domain theories for concurrency, such
as those based on presheaves.
UCAM-CL-TR-752
Gerhard P. Hancke:
Security of proximity identification
systems
July 2009, 161 pages, PDF
PhD thesis (Wolfson College, February 2008)
Abstract: RFID technology is the prevalent method for
implementing proximity identification in a number of
security sensitive applications. The perceived proximity
of a token serves as a measure of trust and is often used
as a basis for granting certain privileges or services. Ensuring that a token is located within a specified distance
of the reader is therefore an important security requirement. In the case of high-frequency RFID systems the
limited operational range of the near-field communication channel is accepted as implicit proof that a token
is in close proximity to a reader. In some instances, it is
also presumed that this limitation can provide further
security services.
The first part of this dissertation presents attacks
against current proximity identification systems. It documents how eavesdropping, skimming and relay attacks can be implemented against HF RFID systems.
Experimental setups and practical results are provided
for eavesdropping and skimming attacks performed
against RFID systems adhering to the ISO 14443 and
ISO 15693 standards. These attacks illustrate that the
limited operational range cannot prevent unauthorised
access to stored information on the token, or ensure
that transmitted data remains confidential. The practical implementation of passive and active relay attacks
against an ISO 14443 RFID system is also described.
The relay attack illustrates that proximity identification should not rely solely on the physical characteristics of the communication channel, even if it could be
shown to be location-limited. As a result, it is proposed
that additional security measures, such as distance-bounding protocols, should be incorporated to verify
proximity claims. A new method, using cover noise, is
also proposed to make the backward communication
channel more resistant to eavesdropping attacks.
The second part of this dissertation discusses
distance-bounding protocols. These protocols determine an upper bound for the physical distance between
two parties. A detailed survey of current proposals,
investigating their respective merits and weaknesses,
identifies general principles governing secure distance-bounding implementations. It is practically shown that
an attacker can circumvent the distance bound by implementing attacks at the packet and physical layer of
conventional communication channels. For this reason
the security of a distance bound depends not only on
the cryptographic protocol, but also on the time measurement provided by the underlying communication.
Distance-bounding protocols therefore require special
channels. Finally, a new distance-bounding protocol
and a practical implementation of a suitable distance-bounding channel for HF RFID systems are proposed.
UCAM-CL-TR-753
John L. Miller, Jon Crowcroft:
Carbon: trusted auditing for
P2P distributed virtual environments
August 2009, 20 pages, PDF
Abstract: Many Peer-to-Peer Distributed Virtual Environments (P2P DVE’s) have been proposed, but none
are widely deployed. One significant barrier to deployment is lack of security. This paper presents Carbon,
a trusted auditing system for P2P DVE’s which provides reasonable security with low per-client overhead.
DVE’s using Carbon perform offline auditing to evaluate DVE client correctness. Carbon audits can be used
to catch DVE clients which break DVE rules – cheaters
– so the DVE can punish them. We analyze the impact of applying Carbon to a peer-to-peer game with
attributes similar to World of Warcraft. We show that
99.9% of cheaters – of a certain profile – can be caught
with guided auditing and 2.3% bandwidth overhead,
or 100% of cheaters can be caught with exhaustive auditing and 27% bandwidth overhead. The surprisingly
low overhead for exhaustive auditing is the result of
the small payload in most DVE packet updates, compared to the larger aggregate payloads in audit messages. Finally, we compare Carbon to PeerReview, and
show that for DVE scenarios Carbon consumes significantly fewer resources – in typical cases by an order of
magnitude – while sacrificing little protection.
UCAM-CL-TR-754
Frank Stajano, Paul Wilson:
Understanding scam victims:
seven principles for systems security
August 2009, 22 pages, PDF
An updated, abridged and peer-reviewed version
of this report appeared in Communications
of the ACM 54(3):70-75, March 2011
[doi:10.1145/1897852.1897872]. Please cite the
refereed CACM version in any related work.
Abstract: The success of many attacks on computer systems can be traced back to the security engineers not
understanding the psychology of the system users they
meant to protect. We examine a variety of scams and
“short cons” that were investigated, documented and
recreated for the BBC TV programme The Real Hustle and we extract from them some general principles
about the recurring behavioural patterns of victims that
hustlers have learnt to exploit.
We argue that an understanding of these inherent “human factors” vulnerabilities, and the necessity
to take them into account during design rather than
naïvely shifting the blame onto the “gullible users”, is
a fundamental paradigm shift for the security engineer
which, if adopted, will lead to stronger and more resilient systems security.
UCAM-CL-TR-755
Yujian Gao, Aimin Hao, Qinping Zhao,
Neil A. Dodgson:
Skin-detached surface for interactive
large mesh editing
September 2009, 18 pages, PDF
Abstract: We propose a method for interactive deformation of large detailed meshes. Our method allows
the users to manipulate the mesh directly using freely-selected handles on the mesh. To best preserve surface details, we introduce a new surface representation,
the skin-detached surface. It represents a detailed surface model as a peeled “skin” added over a simplified
surface model. The “skin” contains the details of the
surface while the simplified mesh maintains the basic
shape. The deformation process consists of three steps:
At the mesh loading stage, the “skin” is precomputed
according to the detailed mesh and detached from the
simplified mesh. Then we deform the simplified mesh
following the nonlinear gradient domain mesh editing
approach to satisfy the handle position constraints. Finally the detailed “skin” is remapped onto the simplified mesh, resulting in a deformed detailed mesh. We
investigate the advantages as well as the limitations of
our method by implementing a prototype system and
applying it to several examples.
UCAM-CL-TR-756
Hamed Haddadi, Damien Fay, Steve Uhlig,
Andrew W. Moore, Richard Mortier,
Almerima Jamakovic:
Analysis of the Internet’s structural
evolution
September 2009, 13 pages, PDF
Abstract: In this paper we study the structural evolution of the AS topology as inferred from two different
datasets over a period of seven years. We use a variety
of topological metrics to analyze the structural differences revealed in the AS topologies inferred from the
two different datasets. In particular, to focus on the
evolution of the relationship between the core and the
periphery, we make use of the weighted spectral distribution.
We find that the traceroute dataset has increasing
difficulty in sampling the periphery of the AS topology,
largely due to limitations inherent to active probing.
Such a dataset has too limited a view to properly observe topological changes at the AS-level compared to
a dataset largely based on BGP data. We also highlight
limitations in current measurements that require a better sampling of particular topological properties of the
Internet. Our results indicate that the Internet is changing from a core-centered, strongly customer-provider
oriented, disassortative network, to a soft-hierarchical,
peering-oriented, assortative network.
UCAM-CL-TR-757
Mark Adcock:
Improving cache performance by
runtime data movement
July 2009, 174 pages, PDF
PhD thesis (Christ’s College, June 2009)
Abstract: The performance of a recursive data structure (RDS) increasingly depends on good data cache behaviour, which may be improved by software/hardware
prefetching or by ensuring that the RDS has a good data
layout. The latter is harder but more effective, and requires solving two separate problems: firstly ensuring
that new RDS nodes are allocated in a good location
in memory, and secondly preventing a degradation in
layout when the RDS changes shape due to pointer updates.
The first problem has been studied in detail, but
only two major classes of solutions to the second exist. Layout degradation may be side-stepped by using
a ‘cache-aware’ RDS, one designed to have inherently
good cache behaviour (e.g. using a B-Tree in place of
a binary search tree), but such structures are difficult
to devise and implement. A more automatic solution in
some languages is to use a ‘layout-improving’ garbage
collector, which attempts to improve heap data layout
during collection using online profiling of data access
patterns. This may carry large performance, memory
and latency overheads.
In this thesis we investigate the insertion of code
into a program which attempts to move RDS nodes
at runtime to prevent or reduce layout degradation.
Such code affects only the performance of a program,
not its semantics. The body of this thesis is a thorough
and systematic evaluation of three different forms of
data movement. The first method adapts existing work
on static RDS data layout, performing ad-hoc single
node movements at a program’s pointer-update sites,
which is simple to apply and effective in practice, but
the performance gain may be hard to predict. The second method performs infrequent movement of larger
groups of nodes, borrowing techniques from garbage
collection but also embedding data movement in existing traversals of the RDS; the benefit of performing
additional data movement to compact the heap is also
demonstrated. The third method restores a pre-chosen
layout after each RDS pointer update, which is a complex but effective technique, and may be viewed both
as an optimisation and as a way of synthesising new
cache-aware RDSs.
Concentrating on maximising performance
while minimising latency and extra memory usage, two
fundamental RDSs are used for the investigation, representative of two common data access patterns (linear and branching). The methods of this thesis compare
favourably to upper bounds on performance and to the
canonical cache-aware solutions. This thesis shows the
value of runtime data movement and, as well as producing optimisations useful in their own right, may be
used to guide the design of future cache-aware RDSs
and layout-improving garbage collectors.
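The idea of moving nodes to restore a good layout can be illustrated with a toy model. The sketch below is not the thesis's implementation: it assumes a "heap" represented as a Python list of slots, a singly linked list stored in it, and hypothetical helpers `traverse`, `relayout` and `stride_cost`. It copies the live nodes into traversal order so that logically adjacent nodes end up in adjacent slots, and uses the summed slot distance between successive nodes as a crude stand-in for cache locality.

```python
from typing import List, Optional, Tuple

# Each slot holds (value, index of the next node or None). Hypothetical model.
Slot = Tuple[int, Optional[int]]


def traverse(heap: List[Slot], head: Optional[int]) -> List[int]:
    """Return the values of the linked list that starts at slot `head`."""
    values = []
    while head is not None:
        value, head = heap[head]
        values.append(value)
    return values


def relayout(heap: List[Slot], head: Optional[int]) -> Tuple[List[Slot], Optional[int]]:
    """Copy live nodes into a fresh heap in traversal order."""
    new_heap: List[Slot] = []
    current = head
    while current is not None:
        value, nxt = heap[current]
        new_index = len(new_heap)
        # The successor, if any, will occupy the very next slot.
        new_heap.append((value, new_index + 1 if nxt is not None else None))
        current = nxt
    return new_heap, (0 if new_heap else None)


def stride_cost(heap: List[Slot], head: Optional[int]) -> int:
    """Sum of slot distances between successive nodes (lower is more local)."""
    cost = 0
    while head is not None:
        nxt = heap[head][1]
        if nxt is not None:
            cost += abs(nxt - head)
        head = nxt
    return cost


# The list 1 -> 2 -> 3 -> 4, scattered across the heap; slots 1 and 3 are dead.
scattered = [(3, 5), (0, None), (1, 4), (0, None), (2, 0), (4, None)]
head = 2
compact, new_head = relayout(scattered, head)
```

In this toy example the list contents are unchanged by the move, while the stride cost drops from 11 to 3, mirroring the abstract's point that such code affects performance but not semantics.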
UCAM-CL-TR-758
Alexey Gotsman:
Logics and analyses for concurrent
heap-manipulating programs
October 2009, 160 pages, PDF
PhD thesis (Churchill College, September 2009)
Abstract: Reasoning about concurrent programs is difficult because of the need to consider all possible interactions between concurrently executing threads. The
problem is especially acute for programs that manipulate shared heap-allocated data structures, since heap-manipulation provides more ways for threads to interact. Modular reasoning techniques sidestep this difficulty by considering every thread in isolation under
some assumptions on its environment.
In this dissertation we develop modular program
logics and program analyses for the verification of concurrent heap-manipulating programs. Our approach is
to exploit reasoning principles provided by program
logics to construct modular program analyses and to
use this process to obtain further insights into the logics. In particular, we build on concurrent separation
logic – a Hoare-style logic that allows modular manual
reasoning about concurrent programs written in a simple heap-manipulating programming language.
Our first contribution is to show the soundness
of concurrent separation logic without the conjunction rule and the restriction that resource invariants
be precise, and to construct an analysis for concurrent
heap-manipulating programs that exploits this modified reasoning principle to achieve modularity. The
analysis can be used to automatically verify a number of safety properties, including memory safety, data-structure integrity, data-race freedom, the absence of
memory leaks, and the absence of assertion violations.
We show that we can view the analysis as generating
proofs in our variant of the logic, which enables the use
of its results in proof-carrying code or theorem proving
systems.
Reasoning principles expressed by program logics
are most often formulated for only idealised programming constructs. Our second contribution is to develop
logics and analyses for modular reasoning about features present in modern languages and libraries for concurrent programming: storable locks (i.e., locks dynamically created and destroyed in the heap), first-order
procedures, and dynamically-created threads.
UCAM-CL-TR-759
Eric Koskinen, Matthew Parkinson,
Maurice Herlihy:
Coarse-grained transactions
(extended version)
August 2011, 34 pages, PDF
Abstract: Traditional transactional memory systems
suffer from overly conservative conflict detection, yielding so-called false conflicts, because they are based on
fine-grained, low-level read/write conflicts. In response,
the recent trend has been toward integrating various
abstract data-type libraries using ad-hoc methods of
high-level conflict detection. These proposals have led
to improved performance but a lack of a unified theory
has led to confusion in the literature.
We clarify these recent proposals by defining a generalization of transactional memory in which a transaction consists of coarse-grained (abstract data-type)
operations rather than simply memory read/write operations. We provide semantics for both pessimistic
(e.g. transactional boosting) and optimistic (e.g. traditional TMs and recent alternatives) execution. We show
that both are included in the standard atomic semantics, yet find that the choice imposes different requirements on the coarse-grained operations: pessimistic requires operations be left-movers, optimistic requires
right-movers. Finally, we discuss how the semantics applies to numerous TM implementation details discussed
widely in the literature.
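To see why conflict detection at the level of abstract data-type operations avoids false conflicts, it can help to check when two operations on a set can be reordered. The sketch below is only an illustration under simplifying assumptions, not the report's semantics: it tests the symmetric case, where two operations commute in both directions, by brute force over a few small states, whereas left-movers and right-movers are directional notions. The names `add`, `remove`, `contains` and `commute` are hypothetical.

```python
from typing import Any, Callable, FrozenSet, Iterable, Tuple

State = FrozenSet[int]
# An operation maps a state to (new state, return value).
Op = Callable[[State], Tuple[State, Any]]


def add(x: int) -> Op:
    def op(s: State) -> Tuple[State, Any]:
        return s | {x}, x not in s  # True if x was newly inserted
    return op


def remove(x: int) -> Op:
    def op(s: State) -> Tuple[State, Any]:
        return s - {x}, x in s  # True if x was present
    return op


def contains(x: int) -> Op:
    def op(s: State) -> Tuple[State, Any]:
        return s, x in s
    return op


def commute(p: Op, q: Op, states: Iterable[State]) -> bool:
    """True if p;q and q;p agree on final state and on both return values."""
    for s in states:
        after_p, p_first = p(s)
        end_pq, q_second = q(after_p)
        after_q, q_first = q(s)
        end_qp, p_second = p(after_q)
        if (end_pq, p_first, q_second) != (end_qp, p_second, q_first):
            return False
    return True


states = [frozenset(c) for c in ([], [1], [2], [1, 2])]
different_keys = commute(add(1), add(2), states)   # operations on distinct keys
same_key = commute(add(1), remove(1), states)      # operations on the same key
```

Here `different_keys` is `True` and `same_key` is `False`: operations on distinct keys can be reordered freely even if an implementation happens to touch the same memory words, which is exactly the situation a read/write-level detector would flag as a conflict.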
UCAM-CL-TR-760
Alan F. Blackwell, Lee Wilson, Alice Street,
Charles Boulton, John Knell:
Radical innovation:
crossing knowledge boundaries with
interdisciplinary teams
November 2009, 124 pages, PDF
Abstract: Interdisciplinary innovation arises from the
positive effects that result when stepping across the social boundaries that we structure knowledge by. Those
boundaries include academic disciplines, government
departments, companies’ internal functions, companies
and sectors, and the boundaries between these domains. In the knowledge economy, it is often the case
that the right knowledge to solve a problem is in a different place to the problem itself, so interdisciplinary
innovation is an essential tool for the future. There are
also many problems today that need more than one
kind of knowledge to solve them, so interdisciplinary
innovation is also an essential tool for the challenging
problems of today.
This report presents the results of an in-depth study
into successful interdisciplinary innovation, focusing on
the personal experiences of the people who achieve it.
It is complementary to organisational research, and to
research on the economic impact of innovation, but has
primarily adopted perspectives and methods from other
disciplines. Instead, this report has been developed by
a team that is itself interdisciplinary, with a particular
focus on anthropology, design research, and strategic
policy. It also draws on reports from expert witnesses
and invited commentators in many other fields.
UCAM-CL-TR-761
Jonathan J. Davies:
Programming networks of vehicles
November 2009, 292 pages, PDF
PhD thesis (Churchill College, September 2008)
Abstract: As computers become smaller in size and advances in communications technology are made, we hypothesise that a new range of applications involving
computing in road vehicles will emerge. These applications may be enabled by the future arrival of general-purpose computing platforms in vehicles. Many of
these applications will involve the collection, processing and distribution of data sampled by sensors on large
numbers of vehicles. This dissertation is primarily concerned with addressing how these applications can be
designed and implemented by programmers.
We explore how a vehicular sensor platform may be
built and how data from a variety of sensors can be
sampled and stored. Applications exploiting such platforms will infer higher-level information from the raw
sensor data collected. We present the design and implementation of one such application which involves processing vehicles’ location histories into an up-to-date
road map.
Our experience shows that there is a problem with
programming this kind of application: the number of
vehicles and the nature of computational infrastructure
available are not known until the application is executed. By comparison, existing approaches to programming applications in wireless sensor networks tend to
assume that the nature of the network architecture is
known at design-time. This is not an appropriate assumption to make in vehicular sensor networks. Instead, this dissertation proposes that the functionality
of applications is designed and implemented at a higher
level and the problem of deciding how and where its
components are to be executed is left to a compiler. We
call this ‘late physical binding’.
This approach brings the benefit that applications
can be automatically adapted and optimised for execution in a wide range of environments. We describe a
suite of transformations which can change the order in
which components of the program are executed whilst
preserving its semantic integrity. These transformations
may affect several of the application’s characteristics
such as its execution time or energy consumption.
The practical utility of this approach is demonstrated through a novel programming language based
on Java. Two examples of diverse applications are presented which demonstrate that the language and compiler can be used to create non-trivial applications. Performance measurements show that the compiler can
introduce parallelism to make more efficient use of
resources and reduce an application’s execution time.
One of the applications belongs to a class of distributed systems beyond merely processing vehicular
sensor data, suggesting that the late physical binding
paradigm has broader application to other areas of distributed computing.
UCAM-CL-TR-762
Evangelia Kalyvianaki:
Resource provisioning for virtualized
server applications
November 2009, 161 pages, PDF
PhD thesis (Lucy Cavendish College, August 2008)
Abstract: Data centre virtualization creates an agile
environment for application deployment. Applications
run within one or more virtual machines and are hosted
on various servers throughout the data centre. One key
mechanism provided by modern virtualization technologies is dynamic resource allocation. Using this technique virtual machines can be allocated resources as
required and therefore occupy only the necessary resources for their hosted application. In fact, two of the
main challenges faced by contemporary data centres,
server consolidation and power saving, can be tackled
efficiently by capitalising on this mechanism.
This dissertation shows how to dynamically adjust
the CPU resources allocated to virtualized server applications in the presence of workload fluctuations. In
particular it employs a reactive approach to resource
provisioning based on feedback control and introduces
five novel controllers. All five controllers adjust the application allocations based on past utilisation observations.
A subset of the controllers integrate the Kalman filtering technique to track the utilisations, based on
which they predict the allocations for the next interval.
This approach is particularly attractive for the resource
management problem since the Kalman filter uses the
evolution of past utilisations to adjust the allocations.
In addition, the adaptive Kalman controller, which adjusts its parameters online and dynamically estimates
the utilisation dynamics, is able to differentiate substantial workload changes from small fluctuations for
unknown workloads.
In addition, this dissertation captures, models, and
builds controllers based on the CPU resource coupling of application components. In the case of multi-tier applications, these controllers collectively allocate
resources to all application tiers detecting saturation
points across components. This results in them reacting faster to workload variations than their single-tier
counterparts.
All controllers are evaluated against the Rubis
benchmark application deployed on a prototype virtualized cluster built for this purpose.
UCAM-CL-TR-763
Saar Drimer:
Security for volatile FPGAs
November 2009, 169 pages, PDF
PhD thesis (Darwin College, August 2009)
Abstract: With reconfigurable devices fast becoming
complete systems in their own right, interest in their
security properties has increased. While research on
“FPGA security” has been active since the early 2000s,
few have treated the field as a whole, or framed its challenges in the context of the unique FPGA usage model
and application space. This dissertation sets out to examine the role of FPGAs within a security system and
how solutions to security challenges can be provided. I
offer the following contributions:
I motivate authenticating configurations as an additional capability to FPGA configuration logic, and then
describe a flexible security protocol for remote reconfiguration of FPGA-based systems over insecure networks. Non-volatile memory devices are used for persistent storage when required, and complement the lack
of features in some FPGAs with tamper proofing in order to maintain specified security properties. A unique
advantage of the protocol is that it can be implemented
on some existing FPGAs (i.e., it does not require FPGA
vendors to add functionality to their devices). Also proposed is a solution to the “IP distribution problem”
where designs from multiple sources are integrated into
a single bitstream, yet must maintain their confidentiality.
I discuss the difficulty of reproducing and comparing FPGA implementation results reported in the
academic literature. Concentrating on cryptographic
implementations, problems are demonstrated through
designing three architecture-optimized variants of the
AES block cipher and analyzing the results to show
that single figures of merit, namely “throughput” or
“throughput per slice”, are often meaningless without
the context of an application. To set a precedent for reproducibility in our field, the HDL source code, simulation testbenches and compilation instructions are made
publicly available for scrutiny and reuse.
Finally, I examine payment systems as ubiquitous
embedded devices, and evaluate their security vulnerabilities as they interact in a multi-chip environment. Using FPGAs as an adversarial tool, a man-in-the-middle
attack against these devices is demonstrated. An FPGA-based defense is also demonstrated: the first secure
wired “distance bounding” protocol implementation.
This is then put in the context of securing reconfigurable systems.
UCAM-CL-TR-764
Caroline V. Gasperin:
Statistical anaphora resolution in
biomedical texts
December 2009, 124 pages, PDF
PhD thesis (Clare Hall, August 2008)
Abstract: This thesis presents a study of anaphora in
biomedical scientific literature and focuses on tackling
the problem of anaphora resolution in this domain.
Biomedical literature has been the focus of many information extraction projects; there are, however, very
few works on anaphora resolution in biomedical scientific full-text articles. Resolving anaphora is an important step in the identification of mentions of biomedical
entities about which information could be extracted.
We have identified coreferent and associative
anaphoric relations in biomedical texts. Among associative relations we were able to distinguish 3 main
types: biotype, homolog and set-member relations. We
have created a corpus of biomedical articles that are
annotated with anaphoric links between noun phrases
referring to biomedical entities of interest. Such noun
phrases are typed according to a scheme that we have
developed based on the Sequence Ontology; it distinguishes 7 types of entities: gene, part of gene, product
of gene, part of product, subtype of gene, supertype of
gene and gene variant.
We propose a probabilistic model for the resolution of anaphora in biomedical texts. The model
seeks to find the antecedents of anaphoric expressions,
both coreferent and associative, and also to identify
discourse-new expressions. The model secures good
performance despite being trained on a small corpus:
it achieves 55-73% precision and 57-63% recall on
coreferent cases, and reasonable performance on different classes of associative cases. We compare the performance of the model with a rule-based baseline system
that we have also developed, a naive Bayes system and
a decision trees system, showing that ours outperforms the others.
We have experimented with active learning in order
to select training samples to improve the performance
of our probabilistic model. It was not, however, more
successful than random sampling.
UCAM-CL-TR-765
Magnus O. Myreen:
Formal verification of machine-code
programs
December 2009, 109 pages, PDF
PhD thesis (Trinity College, December 2008)
Abstract: Formal program verification provides mathematical means of increasing assurance for the correctness of software. Most approaches to program verification are either fully automatic and prove only weak
properties, or alternatively are manual and labour intensive to apply; few target realistically modelled machine code. The work presented in this dissertation
aims to ease the effort required in proving properties of
programs on top of detailed models of machine code.
The contributions are novel approaches for both verification of existing programs and methods for automatically constructing correct code.
For program verification, this thesis presents a new
approach based on translation: the problem of proving
properties of programs is reduced, via fully-automatic
deduction, to a problem of proving properties of recursive functions. The translation from programs to recursive functions is shown to be implementable in a theorem prover both for simple while-programs as well
as real machine code. This verification-after-translation
approach has several advantages over established approaches of verification condition generation. In particular, the new approach does not require annotating the
program with assertions. More importantly, the proposed approach separates the verification proof from
the underlying model so that specific resource names,
some instruction orderings and certain control-flow
structures become irrelevant. As a result proof reuse is
enabled to a greater extent than in currently used methods. The scalability of this new approach is illustrated
through the verification of ARM, x86 and PowerPC implementations of a copying garbage collector.
For construction of correct code this thesis presents
a new compiler which maps functions from logic, via
proof, down to multiple carefully modelled commercial
machine languages. Unlike previously published work
on compilation from higher-order logic, this compiler
allows input functions to be partially specified and supports a broad range of user-defined extensions. These
features enabled the production of formally verified
machine-code implementations of a LISP interpreter, as
a case study.
The automation and proofs have been implemented
in the HOL4 theorem prover, using a new machine-code Hoare triple instantiated to detailed specifications
of ARM, x86 and PowerPC instruction set architectures.
UCAM-CL-TR-766
Julian M. Smith:
Towards robust inexact geometric
computation
December 2009, 186 pages, PDF
PhD thesis (St. Edmund’s College, July 2009)
Abstract: Geometric algorithms implemented using
rounded arithmetic are prone to robustness problems.
Geometric algorithms are often a mix of arithmetic and
combinatorial computations, arising from the need to
create geometric data structures that are themselves a
complex mix of numerical and combinatorial data. Decisions that influence the topology of a geometric structure are made on the basis of certain arithmetic calculations, but the inexactness of these calculations may lead
to inconsistent decisions, causing the algorithm to produce a topologically invalid result or to fail catastrophically. The research reported here investigates ways to
produce robust algorithms with inexact computation.
I present two algorithms for operations on piecewise
linear (polygonal/polyhedral) shapes. Both algorithms
are topologically robust, meaning that they are guaranteed to generate a topologically valid result from a
topologically valid input, irrespective of numerical errors in the computations. The first algorithm performs
the Boolean operation in 3D, and also in 2D. The
main part of this algorithm is a series of interdependent operations. The relationship between these operations ensures a consistency in these operations, which, I
prove, guarantees the generation of a shape representation with valid topology. The basic algorithm may
generate geometric artifacts such as gaps and slivers,
which generally can be removed by a data-smoothing
post-process. The second algorithm presented performs
simplification in 2D, converting a geometrically invalid
(but topologically valid) shape representation into one
that is fully valid. This algorithm is based on a variant
of the Bentley-Ottmann sweep line algorithm, but with
additional rules to handle situations not possible under
an exact implementation.
Both algorithms are presented in the context of what
is required of an algorithm in order for it to be classed
as robust in some sense. I explain why the formulaic approach used for the Boolean algorithm cannot readily
be used for the simplification process. I also give essential code details for a C++ implementation of the 2D
simplification algorithm, and discuss the results of extreme tests designed to show up any problems. Finally, I
discuss floating-point arithmetic, present error analysis
for the floating-point computation of the intersection
point between two segments in 2D, and discuss how
such errors affect both the simplification algorithm and
the basic Boolean algorithm in 2D.
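The inexactness that this abstract refers to can be observed on a small scale by computing the same intersection point twice, once in floating point and once in exact rational arithmetic. The sketch below is an illustration only and does not reproduce the thesis's error analysis: it uses the standard determinant formula for the intersection of the two supporting lines (it does not check that the point lies within both segments), and the helper names `intersect`, `orient` and `to_exact` are assumptions introduced here.

```python
from fractions import Fraction


def intersect(p1, p2, p3, p4):
    """Intersection of the lines through p1-p2 and p3-p4.

    Works in whatever arithmetic the coordinates use (float or Fraction).
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    x = (a * (x3 - x4) - (x1 - x2) * b) / d
    y = (a * (y3 - y4) - (y1 - y2) * b) / d
    return x, y


def orient(a, b, c):
    """Twice the signed area of triangle abc; zero means collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def to_exact(p):
    """Convert a float point to the exact rational value it represents."""
    return Fraction(p[0]), Fraction(p[1])


segments = [(0.1, 0.2), (0.7, 0.9), (0.3, 0.8), (0.6, 0.1)]
exact_segments = [to_exact(p) for p in segments]

float_point = intersect(*segments)
exact_point = intersect(*exact_segments)

# Distance between the rounded and the exact answer, per coordinate.
error = (abs(Fraction(float_point[0]) - exact_point[0]),
         abs(Fraction(float_point[1]) - exact_point[1]))

# Collinearity test applied to the rounded point, evaluated exactly.
residual = orient(exact_segments[0], exact_segments[1], to_exact(float_point))
```

The exact point is collinear with both input segments by construction. For the rounded point, `residual` measures how far a collinearity predicate would be from zero, which is the kind of discrepancy that can lead to the inconsistent topological decisions described above.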
UCAM-CL-TR-767
Massimo Ostilli, Eiko Yoneki,
Ian X. Y. Leung, Jose F. F. Mendes,
Pietro Lió, Jon Crowcroft:
Ising model of rumour spreading in
interacting communities
January 2010, 24 pages, PDF
Abstract: We report a preliminary investigation on interactions between communities in a complex network
using the Ising model to analyse the spread of information among real communities. The inner opinion of a
given community is forced to change through the introduction of a unique external source and we analyse how the other communities react to this change.
We model two conceptual external sources: namely,
“Strong-belief”, and “propaganda”, by an infinitely
strong inhomogeneous external field and a finite uniform external field, respectively. In the former case, the
community changes independently of the other communities, while in the latter case it also changes according to its interactions with them. We apply our model
to synthetic networks as well as various real-world
data ranging from human physical contact networks to
online social networks. The experimental results using
real-world data clearly demonstrate two distinct scenarios of phase transitions characterised by the presence of
strong memory effects when the graph and coupling parameters are above a critical threshold.
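As a rough illustration of an Ising model with an external field on a network, the sketch below assumes the standard Hamiltonian H = -J * sum over edges of s_i*s_j - sum over nodes of h_i*s_i and a single-spin-flip Metropolis update. The report's actual model and parameters are not given in the abstract, so the names `energy` and `metropolis_sweep` and the choice of update rule are assumptions made for illustration only.

```python
import math
import random


def energy(spins, edges, J, h):
    """Ising energy: -J * sum_{(i,j)} s_i*s_j - sum_i h[i]*s_i."""
    coupling = sum(spins[i] * spins[j] for i, j in edges)
    field = sum(h[i] * s for i, s in enumerate(spins))
    return -J * coupling - field


def metropolis_sweep(spins, neighbours, J, h, beta, rng):
    """One sweep of single-spin-flip Metropolis updates, modifying spins in place.

    neighbours[i] lists the nodes adjacent to node i; beta is inverse temperature.
    """
    n = len(spins)
    for _ in range(n):
        i = rng.randrange(n)
        local = J * sum(spins[j] for j in neighbours[i]) + h[i]
        delta = 2 * spins[i] * local  # energy change if spin i were flipped
        if delta <= 0 or rng.random() < math.exp(-beta * delta):
            spins[i] = -spins[i]
```

A finite uniform field (the "propaganda" case) would correspond to the same value of `h[i]` at every node. An infinitely strong field on one community could be approximated by holding that community's spins fixed.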
UCAM-CL-TR-768
Cecily Morrison, Adona Iosif, Miklos Danka:
Report on existing open-source
electronic medical records
February 2010, 12 pages, PDF
Abstract: In this report we provide an overview of existing open-source electronic medical records and assess
them against the criteria established by the EViDence
group.
UCAM-CL-TR-769
Sriram Srinivasan:
Kilim: A server framework with
lightweight actors, isolation types and
zero-copy messaging
February 2010, 127 pages, PDF
PhD thesis (King’s College, February 2010)
Abstract: Internet services are implemented as hierarchical aggregates of communicating components: networks of data centers, networks of clusters in a data
center, connected servers in a cluster, and multiple virtual machines on a server machine, each containing several operating system processes. This dissertation argues for extending this structure to the intra-process
level, with networks of communicating actors. An actor is a single-threaded state machine with a private
heap and a thread of its own. It communicates with
other actors using well-defined and explicit messaging
protocols. Actors must be light enough to comfortably
match the inherent concurrency in the problem space,
and to exploit all available parallelism. Our aims are
two-fold: (a) to treat SMP systems as they really are:
distributed systems with eventual consistency, and (b)
recognize from the outset that a server is always part of
a larger collection of communicating components, thus
eliminating the mindset mismatch between concurrent
programming and distributed programming.
Although the actor paradigm is by no means new,
our design points are informed by drawing parallels
between the macro and micro levels. As with components in a distributed system, we expect that actors
must be isolatable in a number of ways: memory isolation, fault isolation, upgrade isolation, and execution
isolation. The application should be able to have a say
in actor placement and scheduling, and actors must be
easily monitorable.
Our primary contribution is in showing that these
requirements can be satisfied in a language and environment such as Java, without changes to the source
language or to the virtual machine, and without leaving
much of the idiomatic ambit of Java, with its mindset
of pointers and mutable state. In other words, one does
not have to move to a concurrency-oriented language
or to an entirely immutable object paradigm.
We demonstrate an open-source toolkit called Kilim
that provides (a) ultra-lightweight actors (faster and
lighter than extant environments such as Erlang), (b) a
type system that guarantees memory isolation between
threads by separating internal objects from exportable
messages and by enforcing ownership and structural
constraints on the latter (linearity and tree-structure, respectively) and, (c) a library with I/O support and customizable synchronization constructs and schedulers.
We show that this solution is simpler to program
than extant solutions, yet statically guaranteed to be
free of low-level data races. It is also faster, more scalable and more stable (in increasing scale) in two industrial-strength evaluations: interactive web services
(comparing Kilim Web Server to Jetty) and databases
(comparing Berkeley DB to a Kilim variant of it).
UCAM-CL-TR-770
Jatinder Singh:
Controlling the dissemination and
disclosure of healthcare events
February 2010, 193 pages, PDF
PhD thesis (St. John’s College, September 2009)
Abstract: Information is central to healthcare: for
proper care, information must be shared. Modern
healthcare is highly collaborative, involving interactions between users from a range of institutions, including primary and secondary care providers, researchers,
government and private organisations. Each has specific data requirements relating to the service they provide, and must be informed of relevant information as
it occurs.
Personal health information is highly sensitive.
Those who collect/hold data as part of the care process
are responsible for protecting its confidentiality, in line
with patient consent, codes of practice and legislation.
Ideally, one should receive only that information necessary for the tasks they perform, on a need-to-know
basis.
Healthcare requires mechanisms to strictly control information dissemination. Many solutions fail
to account for the scale and heterogeneity of the environment. Centrally managed data services impede
the local autonomy of health institutions, impacting security by diminishing accountability and increasing the risks/impacts of incorrect disclosures. Direct, synchronous (request-response) communication
requires an enumeration of every potential information source/sink. This is impractical when considering
health services at a national level. Healthcare presents a
data-driven environment highly amenable to an event-based infrastructure, which can inform, update and
alert relevant parties of incidents as they occur. Event-based data dissemination paradigms, while efficient
and scalable, generally lack the rigorous access control
mechanisms required for health infrastructure.
This dissertation describes how publish/subscribe,
an asynchronous, push-based, many-to-many middleware communication paradigm, is extended to include
mechanisms for actively controlling information disclosure. We present Interaction Control: a data-control
layer above a publish/subscribe service allowing the
definition of context-aware policy rules to authorise information channels, transform information and restrict
data propagation according to the circumstances. As
dissemination policy is defined at the broker-level and
enforced by the middleware, client compliance is ensured. Although policy enforcement involves extra processing, we show that in some cases the control mechanisms can actually improve performance over a general
publish/subscribe implementation. We build Interaction
Control mechanisms into integrated database-brokers
to provide a rich representation of state; while facilitating audit, which is essential for accountability.
Healthcare requires the sharing of sensitive information across federated domains of administrative control. Interaction Control provides the means for balancing the competing concerns of information sharing and
protection. It enables those responsible for information
to meet their data management obligations, through
specification of fine-grained disclosure policy.
UCAM-CL-TR-771
Cecily Morrison:
Bodies-in-Space: investigating
technology usage in co-present group
interaction
March 2010, 147 pages, PDF
PhD thesis (Darwin College, August 2009)
Abstract: With mobile phones in people’s pockets, digital devices in people’s homes, and information systems
in group meetings at work, technology is frequently
present when people interact with each other. Unlike
devices used by a single person at a desk, people, rather
than machines, are the main focus in social settings. An
important difference then between these two scenarios,
individual and group, is the role of the body. Although
non-verbal behaviour is not part of human-computer
interaction, it is very much part of human-human interaction. This dissertation explores bodies-in-space —
people’s use of spatial and postural positioning of their
bodies to maintain a social interaction when technology is supporting the social interaction of a co-present
group.
I begin this dissertation with a review of literature,
looking at how and when bodies-in-space have been accounted for in research and design processes of technology for co-present groups. I include examples from
both human-computer interaction and the social
sciences more generally. Building on this base, the following four chapters provide examples and discussion
of methods to: (1) see (analytically), (2) notate, (3) adjust (choreograph), and (4) research in the laboratory,
bodies-in-space. I conclude with reflections on the value
of capturing bodies-in-space in the process of designing
technology for co-present groups and emphasise a trend
towards end-user involvement and its consequences for
the scope of human-computer interaction research.
All of the research in this dissertation derives from,
and relates to, the real-world context of an intensive
care unit of a hospital and was part of assessing the
deployment of an electronic patient record.
UCAM-CL-TR-772
Matthew R. Lakin:
An executable meta-language for
inductive definitions with binders
March 2010, 171 pages, PDF
PhD thesis (Queens’ College, March 2010)
Abstract: A testable prototype can be invaluable for
identifying bugs during the early stages of language development. For such a system to be useful in practice it
should be quick and simple to generate prototypes from
the language specification.
This dissertation describes the design and development of a new programming language called alphaML,
which extends traditional functional programming languages with specific features for producing correct, executable prototypes. The most important new features
of alphaML are for the handling of names and binding structures in user-defined languages. To this end, alphaML uses the techniques of nominal sets (due to Pitts
and Gabbay) to represent names explicitly and handle
binding correctly up to alpha-renaming. The language
also provides built-in support for constraint solving and
non-deterministic search.
We begin by presenting a generalised notion of systems defined by a set of schematic inference rules. This
is our model for the kind of languages that might be
implemented using alphaML. We then present the syntax, type system and operational semantics of the alphaML language and proceed to define a sound and
complete embedding of schematic inference rules. We
turn to program equivalence and define a standard notion of operational equivalence between alphaML expressions and use this to prove correctness results about
the representation of data terms involving binding and
about schematic formulae and inductive definitions.
The fact that binding can be represented correctly
in alphaML is interesting for technical reasons, because
the language dispenses with the notion of globally distinct names present in most systems based on nominal methods. These results, along with the encoding of
inference rules, constitute the main technical payload
of the dissertation. However, our approach complicates the solving of constraints between terms. Therefore, we develop a novel algorithm for solving equality
and freshness constraints between nominal terms which
does not rely on standard devices such as swappings
and suspended permutations. Finally, we discuss an implementation of alphaML, and conclude with a summary of the work and a discussion of possible future
extensions.
UCAM-CL-TR-773
Thomas J. Cashman:
NURBS-compatible subdivision
surfaces
March 2010, 99 pages, PDF
PhD thesis (Queens’ College, January 2010)
Abstract: Two main technologies are available to design
and represent freeform surfaces: Non-Uniform Rational B-Splines (NURBS) and subdivision surfaces. Both
representations are built on uniform B-splines, but they
extend this foundation in incompatible ways, and different industries have therefore established a preference
for one representation over the other. NURBS are the
dominant standard for Computer-Aided Design, while
subdivision surfaces are popular for applications in animation and entertainment. However there are benefits of subdivision surfaces (arbitrary topology) which
would be useful within Computer-Aided Design, and
features of NURBS (arbitrary degree and non-uniform
parametrisations) which would make good additions to
current subdivision surfaces.
I present NURBS-compatible subdivision surfaces,
which combine topological freedom with the ability
to represent any existing NURBS surface exactly. Subdivision schemes that extend either non-uniform or
general-degree B-spline surfaces have appeared before,
but this dissertation presents the first surfaces able to
handle both challenges simultaneously. To achieve this
I develop a novel factorisation of knot insertion rules
for non-uniform, general-degree B-splines.
Many subdivision surfaces have poor second-order
behaviour near singularities. I show that it is possible
to bound the curvatures of the general-degree subdivision surfaces created using my factorisation. Bounded-curvature surfaces have previously been created by
‘tuning’ uniform low-degree subdivision schemes; this
dissertation shows that general-degree schemes can be
tuned in a similar way. As a result, I present the first
general-degree subdivision schemes with bounded curvature at singularities.
Previous subdivision schemes, both uniform and
non-uniform, have inserted knots indiscriminately, but
the factorised knot insertion algorithm I describe in this
dissertation grants the flexibility to insert knots selectively. I exploit this flexibility to preserve convexity in
highly non-uniform configurations, and to create locally uniform regions in place of non-uniform knot intervals. When coupled with bounded-curvature modifications, these techniques give the first non-uniform
subdivision schemes with bounded curvature.
I conclude by combining these results to present
NURBS-compatible subdivision surfaces: arbitrary-topology, non-uniform and general-degree surfaces
which guarantee high-quality second-order surface
properties.
UCAM-CL-TR-774
John Wickerson, Mike Dodds,
Matthew Parkinson:
Explicit stabilisation for modular
rely-guarantee reasoning
March 2010, 29 pages, PDF
Abstract: We propose a new formalisation of stability
for Rely-Guarantee, in which an assertion’s stability is
encoded into its syntactic form. This allows two advances in modular reasoning. Firstly, it enables Rely-Guarantee, for the first time, to verify concurrent libraries independently of their clients’ environments.
Secondly, in a sequential setting, it allows a module’s
internal interference to be hidden while verifying its
clients. We demonstrate our approach by verifying, using RGSep, the Version 7 Unix memory manager, uncovering a twenty-year-old bug in the process.
UCAM-CL-TR-775
Anil Madhavapeddy:
Creating high-performance,
statically type-safe network
applications
March 2010, 169 pages, PDF
PhD thesis (Robinson College, April 2006)
Abstract: A typical Internet server finds itself in the
middle of a virtual battleground, under constant threat
from worms, viruses and other malware seeking to subvert the original intentions of the programmer. In particular, critical Internet servers such as OpenSSH, BIND
and Sendmail have had numerous security issues ranging from low-level buffer overflows to subtle protocol
logic errors. These problems have cost billions of dollars as the growth of the Internet exposes increasing
numbers of computers to electronic malware. Despite
the decades of research on techniques such as model-checking, type-safety and other forms of formal analysis, the vast majority of server implementations continue to be written unsafely and informally in C/C++.
In this dissertation we propose an architecture for
constructing new implementations of standard Internet protocols which integrates mature formal methods not currently used in deployed servers: (i) static
type systems from the ML family of functional languages; (ii) model checking to verify safety properties
exhaustively about aspects of the servers; and (iii) generative meta-programming to express high-level constraints for the domain-specific tasks of packet parsing
and constructing non-deterministic state machines. Our
architecture, dubbed MELANGE, is based on Objective Caml and contributes two domain-specific languages: (i) the Meta Packet Language (MPL), a data
description language used to describe the wire format
of a protocol and output statically type-safe code to
handle network traffic using high-level functional data
structures; and (ii) the Statecall Policy Language (SPL)
for constructing non-deterministic finite state automata
which are embedded into applications and dynamically
enforced, or translated into PROMELA and statically
model-checked.
Our research emphasises the importance of delivering efficient, portable code which is feasible to deploy across the Internet. We implemented two complex protocols, SSH and DNS, to verify our claims,
and our evaluation shows that they perform faster than
their standard counterparts OpenSSH and BIND, in
addition to providing static guarantees against some
classes of errors that are currently a major source of
security problems.
UCAM-CL-TR-776
Kathryn E. Gray, Alan Mycroft:
System tests from unit tests
March 2010, 27 pages, PDF
Abstract: Large programs have bugs; software engineering practices reduce the number of bugs in deployed
systems by relying on a combination of unit tests, to filter out bugs in individual procedures, and system tests,
to identify bugs in an integrated system.
Our previous work showed how Floyd-Hoare
triples, {P}C{Q}, could also be seen as unit tests, i.e.
formed a link between verification and test-based validation. A transactional-style implementation allows
test post-conditions to refer to values of data structures
both before and after test execution. Here we argue that
this style of specification, with a transactional implementation, provides a novel source of system tests.
Given a set of unit tests for a system, we can run
programs in test mode on real data. Based on an analysis of the unit tests, we intersperse the program’s execution with the pre- and post-conditions from the test
suite to expose bugs or incompletenesses in either the
program or the test suite itself. We use the results of
these tests, as well as branch-trace coverage information, to identify and report anomalies in the running
program.
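The idea of reading a Floyd-Hoare triple {P}C{Q} as a test, with a post-condition that may refer to the state before execution, can be sketched as follows. This is an illustrative reconstruction, not the authors' tool: the helper `check_triple` is hypothetical, and it takes a deep copy of the state as a snapshot where a real transactional implementation would presumably do something more economical.

```python
import copy


def check_triple(pre, command, post, state):
    """Treat {pre} command {post} as a test on a concrete state.

    pre(state) -> bool decides whether the triple applies at all.
    command(state) mutates the state in place.
    post(old, new) -> bool may compare the snapshot taken before execution
    with the state after it.
    Returns "skipped", "pass" or "fail".
    """
    if not pre(state):
        return "skipped"
    old = copy.deepcopy(state)  # snapshot so the post-condition can see old values
    command(state)
    return "pass" if post(old, state) else "fail"


# Hypothetical example: pushing onto a bounded stack.
def push_seven(state):
    state["stack"].append(7)


def has_room(state):
    return len(state["stack"]) < state["limit"]


def grew_by_seven(old, new):
    return new["stack"] == old["stack"] + [7]
```

Running such checks at the points where a program calls `push_seven` on live data, rather than only on hand-written fixtures, is one way to picture how unit-test specifications might be reused as system tests.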
UCAM-CL-TR-777
Thomas Dinsdale-Young, Mike Dodds,
Philippa Gardner, Matthew Parkinson,
Viktor Vafeiadis:
Concurrent Abstract Predicates
April 2010, 43 pages, PDF
Abstract: Abstraction is key to understanding and reasoning about large computer systems. Abstraction is
simple to achieve if the relevant data structures are disjoint, but rather difficult when they are partially shared,
as is the case for concurrent modules. We present a program logic for reasoning abstractly about data structures, that provides a fiction of disjointness and permits compositional reasoning. The internal details of a
module are completely hidden from the client by concurrent abstract predicates. We reason about a module’s implementation using separation logic with permissions, and provide abstract specifications for use by
client programs using concurrent abstract predicates.
We illustrate our abstract reasoning by building two
implementations of a lock module on top of hardware
instructions, and two implementations of a concurrent
set module on top of the lock module.
UCAM-CL-TR-780
Michael Roe:
Cryptography and evidence
May 2010, 75 pages, PDF
PhD thesis (Clare College, April 1997)
Abstract: The invention of public-key cryptography led
to the notion that cryptographically protected messages
could be used as evidence to convince an impartial adjudicator that a disputed event had in fact occurred.
Information stored in a computer is easily modified,
and so records can be falsified or retrospectively modified. Cryptographic protection prevents modification,
and it is hoped that this will make cryptographically
protected data acceptable as evidence. This usage of
cryptography to render an event undeniable has become known as non-repudiation. This dissertation is an
enquiry into the fundamental limitations of this application of cryptography, and the disadvantages of the
techniques which are currently in use. In the course of
this investigation I consider the converse problem, of
ensuring that an instance of communication between
computer systems leaves behind no unequivocal evidence of its having taken place. Features of communications protocols that were seen as defects from the
standpoint of non-repudiation can be seen as benefits
from the standpoint of this converse problem, which I
call “plausible deniability”.
UCAM-CL-TR-781
Minor E. Gordon:
Stage scheduling for CPU-intensive
servers
June 2010, 119 pages, PDF
PhD thesis (Jesus College, December 2009)
Abstract: The increasing prevalence of multicore, multiprocessor commodity hardware calls for server software architectures that are cycle-efficient on individual cores and can maximize concurrency across an entire machine. In order to achieve both ends this dissertation advocates stage architectures that put software
concurrency foremost and aggressive CPU scheduling
that exploits the common structure and runtime behavior of CPU-intensive servers. For these servers user-level
scheduling policies that multiplex one kernel thread per
physical core can outperform those that utilize pools of
worker threads per stage on CPU-intensive workloads.
Boosting the hardware efficiency of servers in userspace
means a single machine can handle more users without
tuning, operating system modifications, or better hardware.
UCAM-CL-TR-782
Jonathan M. Hayman:
Petri net semantics
June 2010, 252 pages, PDF
PhD thesis (Darwin College, January 2009)
Abstract: Petri nets are a widely-used model for concurrency. By modelling the effect of events on local components of state, they reveal how the events of a process
interact with each other, and whether they can occur
independently of each other by operating on disjoint
regions of state.
Despite their popularity, we are lacking systematic
syntax-driven techniques for defining the semantics of
programming languages inside Petri nets in an analogous way that Plotkin’s Structural Operational Semantics defines a transition system semantics. The first part
of this thesis studies a generally-applicable framework
for the definition of the net semantics of a programming
language.
The net semantics is used to study concurrent separation logic, a Hoare-style logic used to prove partial
correctness of pointer-manipulating concurrent programs. At the core of the logic is the notion of separation of ownership of state, allowing us to infer that
proven parallel processes operate on the disjoint regions of the state that they are seen to own. In this
thesis, a notion of validity of the judgements capturing
the subtle notion of ownership is given and soundness
of the logic with respect to this model is shown. The
model is then used to study the independence of processes arising from the separation of ownership. Following from this, a form of refinement is given which
is capable of changing the granularity assumed of the
program’s atomic actions.
Amongst the many different models for concurrency, there are several forms of Petri net. Category
theory has been used in the past to establish connections between them via adjunctions, often coreflections,
yielding common constructions across the models and
relating concepts such as bisimulation. The most general forms of Petri net have, however, fallen outside this
framework. Essentially, this is due to the most general
forms of net having an implicit symmetry in their behaviour that other forms of net cannot directly represent.
The final part of this thesis shows how an abstract
framework for defining symmetry in models can be applied to obtain categories of Petri net with symmetry.
This is shown to recover, up to symmetry, the universal characterization of unfolding operations on general
Petri nets, allowing coreflections up to symmetry between the category of general Petri nets and other categories of net.
UCAM-CL-TR-783
Dan O’Keeffe:
Distributed complex event detection
for pervasive computing
July 2010, 170 pages, PDF
PhD thesis (St. John’s College, December 2009)
Abstract: Pervasive computing is a model of information processing that augments computers with sensing
capabilities and distributes them into the environment.
Many pervasive computing applications are reactive
in nature, in that they perform actions in response to
events (i.e. changes in state of the environment). However, these applications are typically interested in high-level complex events, in contrast to the low-level primitive events produced by sensors. The goal of this thesis
is to support the detection of complex events by filtering, aggregating, and combining primitive events.
Supporting complex event detection in pervasive
computing environments is a challenging problem. Sensors may have limited processing, storage, and communication capabilities. In addition, battery powered sensing devices have limited energy resources. Since they
are embedded in the environment, recharging may be
difficult or impossible. To prolong the lifetime of the
system, it is vital that these energy resources are used
efficiently. Further complications arise due to the distributed nature of pervasive computing systems. The
lack of a global clock can make it impossible to order events from different sources. Events may be delayed or lost en route to their destination, making it
difficult to perform timely and accurate complex event
detection. Finally, pervasive computing systems may be
large, both geographically and in terms of the number
of sensors. Architectures to support pervasive computing applications should therefore be highly scalable.
We make several contributions in this dissertation.
Firstly, we present a flexible language for specifying
complex event patterns. The language provides developers with a variety of parameters to control the detection process, and is designed for use in an open distributed environment. Secondly, we provide the ability for applications to specify a variety of detection
policies. These policies allow the system to determine
the best way of handling lost and delayed events. Of
particular interest is our ‘no false-positive’ detection
policy. This allows a reduction in detection latency
while ensuring that only correct events are generated
for applications sensitive to false positives. Finally, we
show how complex event detector placement can be
optimized over a federated event-based middleware. In
many cases, detector distribution can reduce unnecessary communication with resource-constrained sensors.
UCAM-CL-TR-784
Ripduman Sohan, Andrew Rice,
Andrew W. Moore, Kieran Mansley:
Characterizing 10 Gbps network
interface energy consumption
July 2010, 10 pages, PDF
Abstract: Understanding server energy consumption is
fast becoming an area of interest given the increase in
the per-machine energy footprint of modern servers and
the increasing number of servers required to satisfy demand. In this paper we (i) quantify the energy overhead
of the network subsystem in modern servers by measuring, reporting and analyzing power consumption in
six 10 Gbps and four 1 Gbps interconnects at a fine-grained level; (ii) introduce two metrics for calculating
the energy efficiency of a network interface from the
perspective of network throughput and host CPU usage; (iii) compare the efficiency of multiport 1 Gbps interconnects as an alternative to 10 Gbps interconnects;
and (iv) conclude by offering recommendations for improving network energy efficiency for system deployment and network interface designers.
UCAM-CL-TR-785
Aaron R. Coble:
Anonymity, information,
and machine-assisted proof
July 2010, 171 pages, PDF
PhD thesis (King’s College, January 2010)
Abstract: This report demonstrates a technique for
proving the anonymity guarantees of communication
systems, using a mechanised theorem-prover. The approach is based on Shannon’s theory of information
and can be used to analyse probabilistic programs.
The information-theoretic metrics that are used for
anonymity provide quantitative results, even in the case
of partial anonymity. Many of the developments in this
text are applicable to information leakage in general,
rather than solely to privacy properties. By developing
the framework within a mechanised theorem-prover, all
proofs are guaranteed to be logically and mathematically consistent with respect to a given model. Moreover, the specification of a system can be parameterised
and desirable properties of the system can quantify over
those parameters; as a result, properties can be proved
about the system in general, rather than specific instances.
In order to develop the analysis framework described in this text, the underlying theories of information, probability, and measure had to be formalised in
the theorem-prover; those formalisations are explained
in detail. That foundational work is of general interest
and not limited to the applications illustrated here. The
meticulous, extensional approach that has been taken
ensures that mathematical consistency is maintained.
A series of examples illustrate how formalised information theory can be used to analyse and prove
the information leakage of programs modelled in the
theorem-prover. Those examples consider a number of
different threat models and show how they can be characterised in the framework proposed.
Finally, the tools developed are used to prove the
anonymity of the dining cryptographers (DC) protocol,
thereby demonstrating the use of the framework and
its applicability to proving privacy properties; the DC
protocol is a standard benchmark for new methods of
analysing anonymity systems. This work includes the
first machine-assisted proof of anonymity of the DC
protocol for an unbounded number of cryptographers.
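For readers unfamiliar with the dining cryptographers protocol, one round can be simulated in a few lines. The sketch below follows the standard description of the protocol (Chaum's ring of shared coin flips) and is unrelated to the theorem-prover formalisation in the report. The function name `dc_round` and its parameters are hypothetical.

```python
import secrets


def dc_round(n, payer=None):
    """Simulate one round of the dining cryptographers protocol for n >= 3.

    coins[i] is the secret coin shared between participant i and participant
    (i + 1) % n. Each participant announces the XOR of the two coins it can
    see, and the payer (if any) inverts its announcement.
    Returns (announcements, result), where result is the XOR of all
    announcements: 1 if some participant paid, 0 otherwise.
    """
    coins = [secrets.randbits(1) for _ in range(n)]
    announcements = []
    for i in range(n):
        bit = coins[i] ^ coins[(i - 1) % n]
        if i == payer:
            bit ^= 1
        announcements.append(bit)
    result = 0
    for bit in announcements:
        result ^= bit
    return announcements, result
```

Every coin enters exactly two announcements, so the coins cancel in the overall XOR and only the payer's inversion survives. The anonymity property concerns what the individual announcements reveal about who the payer is.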
UCAM-CL-TR-786
Arnab Banerjee:
Communication flows in
power-efficient Networks-on-Chips
August 2010, 107 pages, PDF
PhD thesis (Girton College, March 2009)
Abstract: Networks-on-Chips (NoCs) represent a scalable wiring solution for future chips, with dynamic
allocation-based networks able to provide good utilisation of the scarce available resources. This thesis develops power-efficient, dynamic, packet-switched NoCs
which can support on-chip communication flows.
Given the severe power constraint already present
in VLSI, a power efficient NoC design direction is first
developed. To accurately explore the impact of various
design parameters on NoC power dissipation, 4 different router designs are synthesised, placed and routed
in a 90nm process. This demonstrates that the power
demands are dominated by the data-path and not the
control-path, leading to the key finding that, from the
energy perspective, it is justifiable to use more computation to optimise communication.
A review of existing research shows the near-ubiquitous nature of stream-like communication flows
in future computing systems, making support for flows
within NoCs critically important. It is shown that in
several situations, current NoCs make highly inefficient
use of network resources in the presence of communication flows. To resolve this problem, a scalable mechanism is developed to enable the identification of flows,
with a flow defined as all packets going to the same
destination. The number of virtual-channels that can
be used by a single flow is then limited to the minimum
required, ensuring efficient resource utilisation.
The issue of fair resource allocation between flows
is next investigated. The locally fair, packet-based allocation strategies of current NoCs are shown not to
provide fairness between flows. The mechanism already
developed to identify flows by their destination nodes
is extended to enable flows to be identified by source-destination address pairs. Finally, a modification to the
link scheduling mechanism is proposed to achieve max-min fairness between flows.
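As a rough illustration of the max-min fairness objective named in this abstract, the sketch below computes a max-min fair split of one link's capacity among competing flows using the textbook water-filling procedure. It is not the link-scheduling mechanism proposed in the thesis; the function name `max_min_fair` and the notion of a per-flow demand are assumptions made for the example.

```python
def max_min_fair(capacity, demands):
    """Return a max-min fair split of `capacity` among flows with `demands`.

    Hypothetical helper for illustration only: flows are served from the
    smallest demand upwards, and no flow receives more than it asked for.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, flow in enumerate(order):
        flows_left = len(order) - position
        fair_share = remaining / flows_left
        # Any capacity a small flow does not need is left in `remaining`
        # and is shared among the flows that are still waiting.
        allocation[flow] = min(demands[flow], fair_share)
        remaining -= allocation[flow]
    return allocation


if __name__ == "__main__":
    # One small flow and two large flows competing for 12 units of capacity.
    print(max_min_fair(12, [2, 8, 8]))
```

In this example the small flow is fully satisfied and the two large flows split the rest equally, which is the defining property of a max-min fair allocation: no flow can be given more without taking from a flow that already has less or the same.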
UCAM-CL-TR-787
Daniel Bernhardt:
Emotion inference from human body
motion
October 2010, 227 pages, PDF
PhD thesis (Selwyn College, January 2010)
Abstract: The human body has evolved to perform sophisticated tasks from locomotion to the use of tools.
At the same time our body movements can carry information indicative of our intentions, inter-personal attitudes and emotional states. Because our body is specialised to perform a variety of everyday tasks, in most
situations emotional effects are only visible through
subtle changes in the qualities of movements and actions. This dissertation focuses on the automatic analysis of emotional effects in everyday actions.
In the past most efforts to recognise emotions from
the human body have focused on expressive gestures
which are archetypal and exaggerated expressions of
emotions. While these are easier for humans and computational pattern recognisers to recognise, they very
rarely occur in natural scenarios. The principal contribution of this dissertation is hence the inference of
emotional states from everyday actions such as walking, knocking and throwing. The implementation of the
system draws inspiration from a variety of disciplines
including psychology, character animation and speech
recognition. Complex actions are modelled using Hidden Markov Models and motion primitives. The manifestation of emotions in everyday actions is very subtle
and even humans are far from perfect at picking up and
interpreting the relevant cues because emotional influences are usually minor compared to constraints arising
from the action context or differences between individuals.
This dissertation describes a holistic approach
which models emotional, action and personal influences in order to maximise the discriminability of different emotion classes. A pipeline is developed which
incrementally removes the biases introduced by different action contexts and individual differences. The resulting signal is described in terms of posture and dynamic features and classified into one of several emotion classes using statistically trained Support Vector
Machines. The system also goes beyond isolated expressions and is able to classify natural action sequences. I use Level Building to segment action sequences and combine component classifications using
an incremental voting scheme which is suitable for online applications. The system is comprehensively evaluated along a number of dimensions using a corpus of
motion-captured actions. For isolated actions I evaluate the generalisation performance to new subjects. For
action sequences I study the effects of reusing models trained on the isolated cases vs adapting models to
connected samples. The dissertation also evaluates the
role of modelling the influence of individual user differences. I develop and evaluate a regression-based adaptation scheme. The results bring us an important step
closer to recognising emotions from body movements,
embracing the complexity of body movements in natural scenarios.
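The abstract does not specify how the incremental voting scheme works. Purely as an assumed illustration of combining per-segment classifications online, the sketch below keeps a running (optionally weighted) vote and reports the leading emotion label after each segment. The class name `IncrementalVoter`, the labels and the weights are hypothetical and are not taken from the thesis.

```python
from collections import Counter


class IncrementalVoter:
    """Running vote over per-segment class labels (illustrative only)."""

    def __init__(self):
        self.votes = Counter()

    def update(self, label, weight=1.0):
        """Add one segment's classification and return the current leader."""
        self.votes[label] += weight
        return self.current()

    def current(self):
        """Return the label with the highest accumulated weight, if any."""
        if not self.votes:
            return None
        return max(self.votes.items(), key=lambda item: item[1])[0]


if __name__ == "__main__":
    voter = IncrementalVoter()
    # Classifications arrive one action segment at a time.
    for segment_label in ["sad", "angry", "angry"]:
        print(voter.update(segment_label))
```

Because the estimate is updated after every segment, a decision is available at any point in the sequence, which is what makes a scheme of this general kind usable online.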
UCAM-CL-TR-788
Byron Cook, Eric Koskinen, Moshe Vardi:
Branching-time reasoning for
programs
(extended version)
July 2011, 38 pages, PDF
Abstract: We describe a reduction from temporal property verification to a program analysis problem. Our
reduction is an encoding which, with the use of procedures and nondeterminism, enables existing interprocedural program analysis tools to naturally perform
the reasoning necessary for proving temporal properties (e.g. backtracking, eventuality checking, tree counterexamples for branching-time properties, abstraction
refinement, etc.). Our reduction is state-based in nature
but also forms the basis of an efficient algorithm for
verifying trace-based properties, when combined with
an iterative symbolic determinization technique, due to
Cook and Koskinen.
In this extended version, we formalize our encoding as a guarded transition system G, parameterized
by a finite set of ranking functions and the temporal
logic property. We establish soundness between a safety
property of G and the validity of a branching time temporal logic property ∀CTL. ∀CTL is a sufficient logic
for proving properties written in the trace-based Linear
Temporal Logic via the iterative algorithm.
Finally, using examples drawn from the PostgreSQL
database server, Apache web server, and Windows OS
kernel, we demonstrate the practical viability of our
work.
UCAM-CL-TR-789
Byron Cook, Eric Koskinen:
Making prophecies with decision
predicates
November 2010, 29 pages, PDF
Abstract: We describe a new algorithm for proving temporal properties expressed in LTL of infinite-state programs. Our approach takes advantage of the fact that
LTL properties can often be proved more efficiently using techniques usually associated with the branching-time logic CTL than they can with native LTL algorithms. The caveat is that, in certain instances, nondeterminism in the system’s transition relation can cause
CTL methods to report counterexamples that are spurious with respect to the original LTL formula. To address this problem we describe an algorithm that, as
it attempts to apply CTL proof methods, finds and
then removes problematic nondeterminism via an analysis on the potentially spurious counterexamples. Problematic nondeterminism is characterized using decision
predicates, and removed using a partial, symbolic determinization procedure which introduces new prophecy
variables to predict the future outcome of these choices.
We demonstrate—using examples taken from the PostgreSQL database server, Apache web server, and Windows OS kernel—that our method can yield enormous
performance improvements in comparison to known
tools, allowing us to automatically prove properties of
programs where we could not prove them before.
UCAM-CL-TR-790
Ted Briscoe, Ben Medlock, Øistein Andersen:
Automated assessment of ESOL free
text examinations
November 2010, 31 pages, PDF
Abstract: In this report, we consider the task of automated assessment of English as a Second Language
(ESOL) examination scripts written in response to prompts eliciting free text answers. We review and
critically evaluate previous work on automated assessment for essays, especially when applied to ESOL
text. We formally define the task as discriminative preference ranking and develop a new system trained
and tested on a corpus of manually-graded scripts. We show experimentally that our best performing
system is very close to the upper bound for the task, as defined by the agreement between human
examiners on the same corpus. Finally we argue that our approach, unlike extant solutions, is relatively
prompt-insensitive and resistant to subversion, even when its operating principles are in the public
domain. These properties make our approach significantly more viable for high-stakes assessment.
UCAM-CL-TR-791
Andreas Vlachos:
Semi-supervised learning for
biomedical information extraction
November 2010, 113 pages, PDF
PhD thesis (Peterhouse College, December 2009)
Abstract: This thesis explores the application of semi-supervised learning to biomedical information
extraction. The latter has emerged in recent years as a challenging application domain for natural
language processing techniques. The challenge stems partly from the lack of appropriate resources that
can be used as labeled training data. Therefore, we choose to focus on semi-supervised learning
techniques which enable us to take advantage of human supervision combined with unlabeled data.
We begin with a short introduction to biomedical information extraction and semi-supervised learning in
Chapter 1. Chapter 2 focuses on the task of biomedical named entity recognition. Using raw abstracts and
a dictionary of gene names we develop two systems for this task. Furthermore, we discuss annotation
issues and demonstrate how the performance can be improved using user feedback in realistic conditions.
In Chapter 3 we develop two biomedical event extraction systems: a rule-based one and a machine learning
based one. The former needs only an annotated dictionary and syntactic parsing as input, while the latter
requires partial event annotation additionally. Both systems achieve performances comparable to systems
utilizing fully annotated training data. Chapter 4 discusses the task of lexical-semantic clustering using
Dirichlet process mixture models. We review the unsupervised learning method used, which allows the
number of clusters discovered to be determined by the data. Furthermore, we introduce a new clustering
evaluation measure that addresses some shortcomings of the existing measures. Chapter 5 introduces a
method of guiding the clustering solution using pairwise links between instances. Furthermore, we present
a method of selecting these pairwise links actively in order to decrease the amount of supervision
required. Finally, Chapter 6 assesses the contributions of this thesis and highlights directions for
future work.
UCAM-CL-TR-792
James P. Bridge:
Machine learning and automated
theorem proving
November 2010, 180 pages, PDF
PhD thesis (Corpus Christi College, October 2010)
Abstract: Computer programs to find formal proofs of theorems have a history going back nearly half a
century. Originally designed as tools for mathematicians, modern applications of automated theorem
provers and proof assistants are much more diverse. In particular they are used in formal methods to
verify software and hardware designs to prevent costly, or life threatening, errors being introduced into
systems from microchips to controllers for medical equipment or space rockets.
Despite this, the high level of human expertise required in their use means that theorem proving tools
are not widely used by non specialists, in contrast to computer algebra packages which also deal with the
manipulation of symbolic mathematics. The work described in this dissertation addresses one aspect of
this problem, that of heuristic selection in automated theorem provers. In theory such theorem provers
should be automatic and therefore easy to use; in practice the heuristics used in the proof search are
not universally optimal for all problems so human expertise is required to determine heuristic choice and
to set parameter values.
Modern machine learning has been applied to the automation of heuristic selection in a first order logic
theorem prover. One objective was to find if there are any features of a proof problem that are both easy
to measure and provide useful information for determining heuristic choice. Another was to determine and
demonstrate a practical approach to making theorem provers truly automatic.
In the experimental work, heuristic selection based on features of the conjecture to be proved and the
associated axioms is shown to do better than any single heuristic. Additionally a comparison has been
made between static features, measured prior to the proof search process, and dynamic features that
measure changes arising in the early stages of proof search. Further work was done on determining which
features are important, demonstrating that good results are obtained with only a few features required.
UCAM-CL-TR-793
Shazia Afzal:
Affect inference in learning
environments: a functional view of
facial affect analysis using naturalistic
data
December 2010, 146 pages, PDF
PhD thesis (Murray Edwards College, May 2010)
Abstract: This research takes an application-oriented
stance on affective computing and addresses the problem of automatic affect inference within learning technologies. It draws from the growing understanding of
the centrality of emotion in the learning process and
the fact that, as yet, this crucial link is not addressed
in the design of learning technologies. This dissertation
specifically focuses on examining the utility of facial affect analysis to model the affective state of a learner in
a one-on-one learning setting.
Although facial affect analysis using posed or acted
data has been studied in great detail for a couple of
decades now, research using naturalistic data is still a
challenging problem. The challenges are derived from
the complexity in conceptualising affect, the methodological and technical difficulties in measuring it, and
the emergent ethical concerns in realising automatic
affect inference by computers. However, as the context of this research is derived from, and relates to, a
real-world application environment, it is based entirely
on naturalistic data. The whole pipeline – of identifying the requirements, to collection of data, to the development of an annotation protocol, to labelling of
data, and the eventual analyses, both quantitative and
qualitative – is described in this dissertation. In effect, a
framework for conducting research using natural data
is set out and the challenges encountered at each stage
identified.
Apart from the challenges associated with the perception and measurement of affect, this research emphasises that there are additional issues that require
due consideration by virtue of the application context.
As such, in light of the discussed observations and results, this research concludes that we need to understand the nature and expression of emotion in the context of technology use, and pursue creative exploration
of what is perhaps a qualitatively different form of emotion expression and communication.
UCAM-CL-TR-794
Øistein E. Andersen:
Grammatical error prediction
January 2011, 163 pages, PDF
PhD thesis (Girton College, 2010)
Abstract: In this thesis, we investigate methods for automatic detection, and to some extent correction, of
grammatical errors. The evaluation is based on manual error annotation in the Cambridge Learner Corpus
(CLC), and automatic or semi-automatic annotation of
error corpora is one possible application, but the methods are also applicable in other settings, for instance to
give learners feedback on their writing or in a proofreading tool used to prepare texts for publication.
Apart from the CLC, we use the British National
Corpus (BNC) to get a better model of correct usage, WordNet for semantic relations, other machine-readable dictionaries for orthography/morphology, and
the Robust Accurate Statistical Parsing (RASP) system
to parse both the CLC and the BNC and thereby identify syntactic relations within the sentence. An ancillary
outcome of this is a syntactically annotated version of
the BNC, which we have made publicly available.
We present a tool called GenERRate, which can be
used to introduce errors into a corpus of correct text,
and evaluate to what extent the resulting synthetic error
corpus can complement or replace a real error corpus.
Different methods for detection and correction are
investigated, including: sentence-level binary classification based on machine learning over n-grams of words,
n-grams of part-of-speech tags and grammatical relations; automatic identification of features which are
highly indicative of individual errors; and development
of classifiers aimed more specifically at given error
types, for instance concord errors based on syntactic
structure and collocation errors based on co-occurrence
statistics from the BNC, using clustering to deal with
data sparseness. We show that such techniques can detect, and sometimes even correct, at least certain error
types as well as or better than human annotators.
We finally present an annotation experiment in
which a human annotator corrects and supplements the
automatic annotation, which confirms the high detection/correction accuracy of our system and furthermore
shows that such a hybrid set-up gives higher-quality
annotation with considerably less time and effort expended compared to fully manual annotation.
UCAM-CL-TR-795
Aurelie Herbelot:
Underspecified quantification
February 2011, 163 pages, PDF
PhD thesis (Trinity Hall, 2010)
Abstract: Many noun phrases in text are ambiguously
quantified: syntax doesn’t explicitly tell us whether they
refer to a single entity or to several and, in main clauses,
what portion of the set denoted by the subject Nbar actually takes part in the event expressed by the verb. For
instance, when we utter the sentence ‘Cats are mammals’, it is only world knowledge that allows our hearer
to infer that we mean ‘All cats are mammals’, and not
‘Some cats are mammals’. This ambiguity effect is interesting at several levels. Theoretically, it raises cognitive and linguistic questions. To what extent does syntax help humans resolve the ambiguity? What problem-solving skills come into play when syntax is insufficient
for full resolution? How does ambiguous quantification
relate to the phenomenon of genericity, as described by
the linguistic literature? From an engineering point of
view, the resolution of quantificational ambiguity is essential to the accuracy of some Natural Language Processing tasks.
We argue that the quantification ambiguity phenomenon can be described in terms of underspecification and propose a formalisation for what we call ‘underquantified’ subject noun phrases. Our formalisation
is motivated by inference requirements and covers all
cases of genericity.
Our approach is then empirically validated by human annotation experiments. We propose an annotation scheme that follows our theoretical claims with
regard to underquantification. Our annotation results
strengthen our claim that all noun phrases can be analysed in terms of quantification. The produced corpus
allows us to derive a gold standard for quantification
resolution experiments and is, as far as we are aware,
the first attempt to analyse the distribution of null
quantifiers in English.
We then create a baseline system for automatic
quantification resolution, using syntax to provide discriminating features for our classification. We show
that results are rather poor for certain classes and argue that some level of pragmatics is needed, in combination with syntax, to perform accurate resolution.
We explore the use of memory-based learning as a way
to approximate the problem-solving skills available to
humans at the level of pragmatic understanding.
UCAM-CL-TR-796
Jonathan Mak:
Facilitating program parallelisation:
a profiling-based approach
March 2011, 120 pages, PDF
PhD thesis (St. John’s College, November 2010)
Abstract: The advance of multi-core architectures signals the end of universal speed-up of software over
time. To continue exploiting hardware developments, effort must be invested in producing software that
can be split up to run on multiple cores or processors. Many solutions have been proposed to address this
issue, ranging from explicit to implicit parallelism, but consensus has yet to be reached on the best way
to tackle such a problem.
In this thesis we propose a profiling-based interactive approach to program parallelisation. Profilers
gather dependence information on a program, which is then used to automatically parallelise the program
at source-level. The programmer can then examine the resulting parallel program, and using critical path
information from the profiler, identify and refactor parallelism bottlenecks to enable further
parallelism. We argue that this is an efficient and effective method of parallelising general sequential
programs.
Our first contribution is a comprehensive analysis of limits of parallelism in several benchmark
programs, performed by constructing Dynamic Dependence Graphs (DDGs) from execution traces. We show that
average available parallelism is often high, but realising it would require various changes in
compilation, language or computation models. As an example, we show how using a spaghetti stack structure
can lead to a doubling of potential parallelism.
The rest of our thesis demonstrates how some of this potential parallelism can be realised under the
popular fork-join parallelism model used by Cilk, TBB, OpenMP and others. We present a tool-chain with two
main components: Embla 2, which uses DDGs from profiled dependences to estimate the amount of task-level
parallelism in programs; and Woolifier, a source-to-source transformer that uses Embla 2’s output to
parallelise the programs. Using several case studies, we demonstrate how this tool-chain greatly
facilitates program parallelisation by performing an automatic best-effort parallelisation and presenting
critical paths in a concise graphical form so that the programmer can quickly locate parallelism
bottlenecks, which when refactored can lead to even greater potential parallelism and significant actual
speed-ups (up to around 25 on a 32-effective-core machine).
UCAM-CL-TR-797
Boris Feigin:
Interpretational overhead in
system software
April 2011, 116 pages, PDF
PhD thesis (Homerton College, September 2010)
Abstract: Interpreting a program carries a runtime
penalty: the interpretational overhead. Traditionally, a
compiler removes interpretational overhead by sacrificing inessential details of program execution. However, a broad class of system software is based on non-standard interpretation of machine code or a higher-level language. For example, virtual machine monitors
emulate privileged instructions; program instrumentation is used to build dynamic call graphs by intercepting
function calls and returns; and dynamic software updating technology allows program code to be altered at
runtime. Many of these frameworks are performance-sensitive and several efficiency requirements, both formal and informal, have been put forward over the
last four decades. Largely independently, the concept
of interpretational overhead received much attention in
the partial evaluation (“program specialization”) literature. This dissertation contributes a unifying understanding of efficiency and interpretational overhead in
system software.
Starting from the observation that a virtual machine
monitor is a self-interpreter for machine code, our first
contribution is to reconcile the definition of efficient
virtualization due to Popek and Goldberg with Jones
optimality, a measure of the strength of program specializers. We also present a rational reconstruction of
hardware virtualization support (“trap-and-emulate”)
from context-threaded interpretation, a technique for
implementing fast interpreters due to Berndl et al.
As a form of augmented execution, virtualization
shares many similarities with program instrumentation.
Although several low-overhead instrumentation frameworks are available on today’s hardware, there has been
no formal understanding of what it means for instrumentation to be efficient. Our second contribution is
a definition of efficiency for program instrumentation
in the spirit of Popek and Goldberg’s work. Instrumentation also incurs an implicit overhead because instrumentation code needs access to intermediate execution states and this is antagonistic to optimization.
The third contribution is to use partial equivalence
relations (PERs) to express the dependence of instrumentation on execution state, enabling an instrumentation/optimization trade-off. Since program instrumentation, applied at runtime, constitutes a kind of dynamic software update, we can similarly restrict allowable future updates to be consistent with existing optimizations. Finally, treating “old” and “new” code in
a dynamically-updatable program as being written in
different languages permits a semantic explanation of a
safety rule that was originally introduced as a syntactic
check.
UCAM-CL-TR-798
Periklis Akritidis:
Practical memory safety for C
June 2011, 136 pages, PDF
PhD thesis (Wolfson College, May 2010)
Abstract: Copious amounts of high-performance and low-level systems code are written in memory-unsafe
languages such as C and C++. Unfortunately, the lack of memory safety undermines security and
reliability; for example, memory-corruption bugs in programs can breach security, and faults in kernel
extensions can bring down the entire operating system. Memory-safe languages, however, are unlikely to
displace C and C++ in the near future; thus, solutions for future and existing C and C++ code are needed.
Despite considerable prior research, memory-safety problems in C and C++ programs persist because the
existing proposals that are practical enough for production use cannot offer adequate protection, while
comprehensive proposals are either too slow for practical use, or break backwards compatibility by
requiring significant porting or generating binary-incompatible code.
To enable practical protection against memory-corruption attacks and operating system crashes, I designed
new integrity properties preventing dangerous memory corruption at low cost instead of enforcing strict
memory safety to catch every memory error at high cost. Then, at the implementation level, I aggressively
optimised for the common case, and streamlined execution by modifying memory layouts as far as allowed
without breaking binary compatibility.
I developed three compiler-based tools for analysing and instrumenting unmodified source code to
automatically generate binaries hardened against memory errors: BBC and WIT to harden user-space C
programs, and BGI to harden and to isolate Microsoft Windows kernel extensions. The generated code incurs
low performance overhead and is binary-compatible with uninstrumented code. BBC offers strong protection
with lower overhead than previously possible for its level of protection; WIT further lowers overhead
while offering stronger protection than previous solutions of similar performance; and BGI improves
backwards compatibility and performance over previous proposals, making kernel extension isolation
practical for commodity systems.
UCAM-CL-TR-799
Thomas Tuerk:
A separation logic framework for
HOL
June 2011, 271 pages, PDF
PhD thesis (Downing College, December 2010)
Abstract: Separation logic is an extension of Hoare
logic due to O’Hearn and Reynolds. It was designed
for reasoning about mutable data structures. Because
separation logic supports local reasoning, it scales better than classical Hoare logic and can easily be used to
reason about concurrency. There are automated separation logic tools as well as several formalisations in
interactive theorem provers. Typically, the automated
separation logic tools are able to reason about shallow properties of large programs. They usually consider just the shape of data structures, not their data-content. The formalisations inside theorem provers can
be used to prove interesting, deep properties. However,
they typically lack automation. Another shortcoming is
that there are a lot of slightly different separation logics. For each programming language and each interesting property a new kind of separation logic seems to be
invented.
In this thesis, a general framework for separation
logic is developed inside the HOL4 theorem prover.
This framework is based on Abstract Separation Logic,
an abstract, high level variant of separation logic. Abstract Separation Logic is a general separation logic
such that many other separation logics can be based
on it. This framework is instantiated in a first step to
support a stack with read and write permissions following ideas of Parkinson, Bornat and Calcagno. Finally,
the framework is further instantiated to build a separation logic tool called Holfoot. It is similar to the tool
Smallfoot, but extends it from reasoning about shape
properties to fully functional specifications.
To my knowledge this work presents the first formalisation of Abstract Separation Logic inside a theorem prover. By building Holfoot on top of this formalisation, I could demonstrate that Abstract Separation Logic can be used as a basis for realistic separation
logic tools. Moreover, this work demonstrates that it is
feasible to implement such separation logic tools inside
a theorem prover. Holfoot is highly automated. It can
verify Smallfoot examples automatically inside HOL4.
Moreover, Holfoot can use the full power of HOL4.
This allows Holfoot to verify fully functional specifications. Simple fully functional specifications can be handled automatically using HOL4’s tools and libraries or
external SMT solvers. More complicated ones can be
handled using interactive proofs inside HOL4. In contrast, most other separation logic tools can reason just
about the shape of data structures. Others reason only
about data properties that can be solved using SMT
solvers.
UCAM-CL-TR-800
James R. Srinivasan:
Improving cache utilisation
June 2011, 184 pages, PDF
PhD thesis (Jesus College, April 2011)
Abstract: Microprocessors have long employed caches
to help hide the increasing latency of accessing main
memory. The vast majority of previous research has focussed on increasing cache hit rates to improve cache
performance, while lately decreasing power consumption has become an equally important issue. This thesis
examines the lifetime of cache lines in the memory hierarchy, considering whether they are live (will be referenced again before eviction) or dead (will not be referenced again before eviction). Using these two states, the
cache utilisation (proportion of the cache which will be
referenced again) can be calculated.
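The live/dead definition in this paragraph can be made concrete with a small calculation. The sketch below is only an illustration under stated assumptions: it supposes that, for each resident cache line, the time of its next reference (if any) and the time of its eviction are already known, for instance from a recorded trace. The names `CacheLine`, `is_live` and `cache_utilisation` are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CacheLine:
    """One resident cache line at the instant utilisation is sampled."""
    eviction_time: int
    next_reference: Optional[int]  # None if the line is never referenced again


def is_live(line: CacheLine) -> bool:
    """A line is live if it will be referenced again before it is evicted."""
    return line.next_reference is not None and line.next_reference < line.eviction_time


def cache_utilisation(lines: Sequence[CacheLine]) -> float:
    """Proportion of resident lines that are live (0.0 for an empty cache)."""
    if not lines:
        return 0.0
    return sum(is_live(line) for line in lines) / len(lines)


if __name__ == "__main__":
    snapshot = [
        CacheLine(eviction_time=100, next_reference=40),    # live
        CacheLine(eviction_time=100, next_reference=None),  # dead: never reused
        CacheLine(eviction_time=50, next_reference=70),     # dead: reused after eviction
        CacheLine(eviction_time=80, next_reference=10),     # live
    ]
    print(cache_utilisation(snapshot))
```

In a real cache the future reference and eviction times are not known in advance, so the state of each line has to be predicted rather than looked up.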
This thesis demonstrates that cache utilisation is
relatively poor over a wide range of benchmarks and
cache configurations. By focussing on techniques to improve cache utilisation, cache hit rates are increased
while overall power consumption may also be decreased.
Key to improving cache utilisation is an accurate
predictor of the state of a cache line. This thesis
presents a variety of such predictors, mostly based
upon the mature field of branch prediction, and compares them against previously proposed predictors. The
most appropriate predictors are then demonstrated in
two applications: improving victim cache performance
through filtering, and reducing cache pollution during
aggressive prefetching.
These applications are primarily concerned with improving cache performance and are analysed using a detailed microprocessor simulator. Related applications,
including decreasing power consumption, are also discussed, as is the applicability of these techniques to
multiprogrammed and multiprocessor systems.
UCAM-CL-TR-801
Amitabha Roy:
Software lock elision for x86 machine
code
July 2011, 154 pages, PDF
PhD thesis (Emmanuel College, April 2011)
Abstract: More than a decade after becoming a topic of
intense research there is no transactional memory hardware nor any examples of software transactional memory use outside the research community. Using software
transactional memory in large pieces of software needs
copious source code annotations and often means that
standard compilers and debuggers can no longer be
used. At the same time, overheads associated with software transactional memory fail to motivate programmers to expend the needed effort to use software transactional memory. The only way around the overheads
in the case of general unmanaged code is the anticipated
availability of hardware support. On the other hand,
architects are unwilling to devote power and area budgets in mainstream microprocessors to hardware transactional memory, pointing to transactional memory being a “niche” programming construct. A deadlock has
thus ensued that is blocking transactional memory use
and experimentation in the mainstream.
This dissertation covers the design and construction
of a software transactional memory runtime system
called SLE x86 that can potentially break this deadlock by decoupling transactional memory from programs using it. Unlike most other STM designs, the
core design principle is transparency rather than performance. SLE x86 operates at the level of x86 machine code, thereby becoming immediately applicable
to binaries for the popular x86 architecture. The only
requirement is that the binary synchronise using known
locking constructs or calls such as those in Pthreads or
OpenMP libraries. SLE x86 provides speculative lock
elision (SLE) entirely in software, executing critical sections in the binary using transactional memory. Optionally, the critical sections can also be executed without
using transactions by acquiring the protecting lock.
The dissertation makes a careful analysis of the impact on performance due to the demands of the x86
memory consistency model and the need to transparently instrument x86 machine code. It shows that both
of these problems can be overcome to reach a reasonable level of performance, where transparent software
transactional memory can perform better than a lock.
SLE x86 can ensure that programs are ready for transactional memory in any form, without being explicitly
written for it.
UCAM-CL-TR-802
Johanna Geiß:
Latent semantic sentence clustering
for multi-document summarization
July 2011, 156 pages, PDF
PhD thesis (St. Edmund’s College, April 2011)
Abstract: This thesis investigates the applicability of Latent Semantic Analysis (LSA) to sentence clustering for
Multi-Document Summarization (MDS). In contrast to
more shallow approaches like measuring similarity of
sentences by word overlap in a traditional vector space
model, LSA takes word usage patterns into account. So
far LSA has been successfully applied to different Information Retrieval (IR) tasks like information filtering and document classification (Dumais, 2004). In the
course of this research, different parameters essential to
sentence clustering using a hierarchical agglomerative
clustering algorithm (HAC) in general and in combination with LSA in particular are investigated. These parameters include, inter alia, information about the type
of vocabulary, the size of the semantic space and the optimal numbers of dimensions to be used in LSA. These
parameters have not previously been studied and evaluated in combination with sentence clustering (chapter
4).
This thesis also presents the first gold standard for
sentence clustering in MDS. To be able to evaluate
sentence clusterings directly and classify the influence
of the different parameters on the quality of sentence
clustering, an evaluation strategy is developed that includes gold standard comparison using different evaluation measures (chapter 5). Therefore the first compound gold standard for sentence clustering was created. Several human annotators were asked to group
similar sentences into clusters following guidelines created for this purpose (section 5.4). The evaluation of
the human generated clusterings revealed that the human annotators agreed on clustering sentences above
chance. Analysis of the strategies adopted by the human
annotators revealed two groups – hunters and gatherers – who differ clearly in the structure and size of the
clusters they created (chapter 6).
On the basis of the evaluation strategy the parameters for sentence clustering and LSA are optimized
(chapter 7). A final experiment in which the performance of LSA in sentence clustering for MDS is compared to the simple word matching approach of the traditional Vector Space Model (VSM) revealed that LSA
produces better quality sentence clusters for MDS than
VSM.
UCAM-CL-TR-803
Ekaterina V. Shutova:
Computational approaches to
figurative language
August 2011, 219 pages, PDF
PhD thesis (Pembroke College, March 2011)
Abstract: The use of figurative language is ubiquitous
in natural language text and it is a serious bottleneck in
automatic text understanding. A system capable of interpreting figurative language would be extremely beneficial to a wide range of practical NLP applications.
The main focus of this thesis is on the phenomenon of
metaphor. I adopt a statistical data-driven approach to
its modelling, and create the first open-domain system
for metaphor identification and interpretation in unrestricted text. In order to verify that similar methods can
be applied to modelling other types of figurative language, I then extend this work to the task of interpretation of logical metonymy.
The metaphor interpretation system is capable of
discovering literal meanings of metaphorical expressions in text. For the metaphors in the examples “All
of this stirred an unfathomable excitement in her” or
“a carelessly leaked report” the system produces interpretations “All of this provoked an unfathomable excitement in her” and “a carelessly disclosed report” respectively. It runs on unrestricted text and to my knowledge is the only existing robust metaphor paraphrasing
system. It does not employ any hand-coded knowledge,
but instead derives metaphorical interpretations from
a large text corpus using statistical pattern-processing.
The system was evaluated with the aid of human judges
and it operates with the accuracy of 81%.
The metaphor identification system automatically
traces the analogies involved in the production of a
particular metaphorical expression in a minimally supervised way. The system generalises over the analogies by means of verb and noun clustering, i.e. identification of groups of similar concepts. This generalisation makes it capable of recognising previously unseen
metaphorical expressions in text, e.g. having once seen
a metaphor ‘stir excitement’ the system concludes that
‘swallow anger’ is also used metaphorically. The system
identifies metaphorical expressions with a high precision of 79%.
The logical metonymy processing system produces
a list of metonymic interpretations disambiguated with
respect to their word sense. It then automatically organises them into a novel class-based model of logical
metonymy inspired by both empirical evidence and linguistic theory. This model provides more accurate and
generalised information about possible interpretations
of metonymic phrases than previous approaches.
UCAM-CL-TR-804
Sean B. Holden:
The HasGP user manual
September 2011, 18 pages, PDF
Abstract: HasGP is an experimental library implementing methods for supervised learning using Gaussian
process (GP) inference, in both the regression and
classification settings. It has been developed in the
functional language Haskell as an investigation into
whether the well-known advantages of the functional
paradigm can be exploited in the field of machine
learning, which traditionally has been dominated by
the procedural/object-oriented approach, particularly
involving C/C++ and Matlab. HasGP is open-source
software released under the GPL3 license. This manual
provides a short introduction on how to install the library,
and how to apply it to supervised learning problems. It
also provides some more in-depth information on the
implementation of the library, which is aimed at developers. In the latter, we also show how some of the
specific functional features of Haskell, in particular the
ability to treat functions as first-class objects, and the
use of typeclasses and monads, have informed the design of the library. This manual applies to HasGP version 0.1, which is the initial release of the library.
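As an illustration of the kind of computation the abstract refers to, the following is a minimal sketch of Gaussian process regression with a squared-exponential covariance. It is written in Python using only the standard library and is not HasGP code or its API; the function names (rbf, cholesky, solve_cholesky, gp_predict), the scalar inputs and the fixed hyperparameters are all assumptions made for the example.

```python
import math

def rbf(x1, x2, length=1.0, variance=1.0):
    """Squared-exponential covariance between two scalar inputs (assumed kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(xs, ys, x_star, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP at the test input x_star."""
    n = len(xs)
    # Training covariance with observation noise added on the diagonal.
    k = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(k)
    alpha = solve_cholesky(low, ys)           # (K + noise I)^{-1} y
    k_star = [rbf(x, x_star) for x in xs]     # covariances to the test input
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    v = solve_cholesky(low, k_star)           # (K + noise I)^{-1} k_*
    var = rbf(x_star, x_star) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var
```

With a very small noise term the posterior mean passes close to the training targets, and far from the data the prediction reverts to the prior (mean 0, variance equal to the kernel variance).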
UCAM-CL-TR-805
Simon Hay:
A model personal energy meter
September 2011, 207 pages, PDF
PhD thesis (Girton College, August 2011)
Abstract: Every day each of us consumes a significant amount of energy, both directly through transport,
heating and use of appliances, and indirectly from our
needs for the production of food, manufacture of goods
and provision of services.
This dissertation investigates a personal energy meter which can record and apportion an individual’s energy usage in order to supply baseline information and
incentives for reducing our environmental impact.
If the energy costs of large shared resources are split
evenly without regard for individual consumption each
person minimises his own losses by taking advantage
of others. Context awareness offers the potential to
change this balance and apportion energy costs to those
who cause them to be incurred. This dissertation explores how sensor systems installed in many buildings
today can be used to apportion energy consumption
between users, including an evaluation of a range of
strategies in a case study and elaboration of the overriding principles that are generally applicable. It also
shows how second-order estimators combined with location data can provide a proxy for fine-grained sensing.
A key ingredient for apportionment mechanisms is
data on energy usage. This may come from metering devices or buildings directly, or from profiling devices and
using secondary indicators to infer their power state. A
mechanism for profiling devices to determine the energy costs of specific activities, particularly applicable
to shared programmable devices is presented which can
make this process simpler and more accurate. By combining crowd-sourced building-inventory information
and a simple building energy model it is possible to estimate an individual’s energy use disaggregated by device
class with very little direct sensing.
Contextual information provides crucial cues for
apportioning the use and energy costs of resources, and
one of the most valuable sources from which to infer
context is location. A key ingredient for a personal energy meter is a low cost, low infrastructure location
system that can be deployed on a truly global scale.
This dissertation presents a description and evaluation
of the new concept of inquiry-free Bluetooth tracking
that has the potential to offer indoor location information with significantly less infrastructure and calibration than other systems.
Finally, a suitable architecture for a personal energy
meter on a global scale is demonstrated using a mobile
phone application to aggregate energy feeds based on
the case studies and technologies developed.
UCAM-CL-TR-806
Damien Fay, Jérôme Kunegis, Eiko Yoneki:
On joint diagonalisation for
dynamic network analysis
October 2011, 12 pages, PDF
Abstract: Joint diagonalisation (JD) is a technique used
to estimate an average eigenspace of a set of matrices.
Whilst it has been used successfully in many areas to
track the evolution of systems via their eigenvectors; its
application in network analysis is novel. The key focus
in this paper is the use of JD on matrices of spanning
trees of a network. This is especially useful in the case of
real-world contact networks in which a single underlying static graph does not exist. The average eigenspace
may be used to construct a graph which represents the
‘average spanning tree’ of the network or a representation of the most common propagation paths. We then
examine the distribution of deviations from the average and find that this distribution in real-world contact
networks is multi-modal; thus indicating several modes
in the underlying network. These modes are identified
and are found to correspond to particular times. Thus
JD may be used to decompose the behaviour, in time, of
contact networks and produce average static graphs for
each time. This may be viewed as a mixture between a
dynamic and static graph approach to contact network
analysis.
UCAM-CL-TR-807
Ola Mahmoud:
Second-order algebraic theories
October 2011, 133 pages, PDF
PhD thesis (Clare Hall, March 2011)
Abstract: Second-order universal algebra and second-order equational logic respectively provide a model
theory and a formal deductive system for languages
with variable binding and parameterised metavariables.
This dissertation completes the algebraic foundations
of second-order languages from the viewpoint of categorical algebra.
In particular, the dissertation introduces the notion
of second-order algebraic theory. A main role in the definition is played by the second-order theory of equality M, representing the most elementary operators and
equations present in every second-order language. We
show that M can be described abstractly via the universal property of being the free cartesian category on an
exponentiable object. Thereby, in the tradition of categorical algebra, a second-order algebraic theory consists of a cartesian category TH and a strict cartesian
identity-on-objects functor from M to TH that preserves the universal exponentiable object of M.
At the syntactic level, we establish the correctness
of our definition by showing a categorical equivalence between second-order equational presentations
and second-order algebraic theories. This equivalence,
referred to as the Second-Order Syntactic Categorical Type Theory Correspondence, involves distilling a
notion of syntactic translation between second-order
equational presentations that corresponds to the canonical notion of morphism between second-order algebraic theories. Syntactic translations provide a mathematical formalisation of notions such as encodings and
transforms for second-order languages.
On top of the aforementioned syntactic correspondence, we furthermore establish the Second-Order Semantic Categorical Type Theory Correspondence. This
involves generalising Lawvere’s notion of functorial
model of algebraic theories to the second-order setting.
By this semantic correspondence, second-order functorial semantics is shown to correspond to the model theory of second-order universal algebra.
We finally show that the core of the theory surrounding Lawvere theories generalises to the second order as well. Instances of this development are the existence of algebraic functors and monad morphisms in
the second-order universe. Moreover, we define a notion of translation homomorphism that allows us to establish a 2-categorical type theory correspondence.
UCAM-CL-TR-808
Matko Botinčan, Mike Dodds,
Suresh Jagannathan:
Resource-sensitive synchronisation
inference by abduction
January 2012, 57 pages, PDF
Abstract: We present an analysis which takes as its
input a sequential program, augmented with annotations indicating potential parallelization opportunities,
and a sequential proof, written in separation logic, and
produces a correctly-synchronized parallelized program
and proof of that program. Unlike previous work, ours
is not an independence analysis; we insert synchronization constructs to preserve relevant dependencies found
in the sequential program that may otherwise be violated by a naïve translation. Separation logic allows
us to parallelize fine-grained patterns of resource-usage,
moving beyond straightforward points-to analysis.
Our analysis works by using the sequential proof
to discover dependencies between different parts of the
program. It leverages these discovered dependencies to
guide the insertion of synchronization primitives into
the parallelized program, and ensure that the resulting
parallelized program satisfies the same specification as
the original sequential program. Our analysis is built
using frame inference and abduction, two techniques
supported by an increasing number of separation logic
tools.
UCAM-CL-TR-809
John L. Miller:
Distributed virtual environment
scalability and security
October 2011, 98 pages, PDF
PhD thesis (Hughes Hall, October 2011)
Abstract: Distributed virtual environments (DVEs)
have been an active area of research and engineering for
more than 20 years. The most widely deployed DVEs
are network games such as Quake, Halo, and World of
Warcraft (WoW), with millions of users and billions of
dollars in annual revenue. Deployed DVEs remain expensive centralized implementations despite significant
research outlining ways to distribute DVE workloads.
This dissertation shows previous DVE research evaluations are inconsistent with deployed DVE needs. Assumptions about avatar movement and proximity –
fundamental scale factors – do not match WoW’s workload, and likely the workload of other deployed DVEs.
Alternate workload models are explored and preliminary conclusions presented. Using realistic workloads
it is shown that a fully decentralized DVE cannot be
deployed to today’s consumers, regardless of its overhead.
Residential broadband speeds are improving, and
this limitation will eventually disappear. When it does,
appropriate security mechanisms will be a fundamental
requirement for technology adoption.
A trusted auditing system (“Carbon”) is presented
which has good security, scalability, and resource characteristics for decentralized DVEs. When performing
exhaustive auditing, Carbon adds 27% network overhead to a decentralized DVE with a WoW-like workload. This resource consumption can be reduced significantly, depending upon the DVE’s risk tolerance. Finally, the Pairwise Random Protocol (PRP) is described.
PRP enables adversaries to fairly resolve probabilistic
activities, an ability missing from most decentralized
DVE security proposals.
Thus, this dissertation’s contribution is to address
two of the obstacles for deploying research on decentralized DVE architectures. First, lack of evidence that
research results apply to existing DVEs. Second, the
lack of security systems combining appropriate security
guarantees with acceptable overhead.
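The abstract does not describe how PRP works internally. As a point of reference only, the sketch below shows a generic commit-then-reveal exchange, a standard way for two mutually distrusting parties to agree on a random outcome that neither can bias alone. It should not be read as the PRP design; the function names and the use of SHA-256 are assumptions made for the example.

```python
import hashlib
import secrets

def commit(value: bytes, nonce: bytes) -> str:
    """Commitment to value: SHA-256 over nonce || value."""
    return hashlib.sha256(nonce + value).hexdigest()

def verify(commitment: str, value: bytes, nonce: bytes) -> bool:
    """Check that a revealed (value, nonce) pair matches an earlier commitment."""
    return secrets.compare_digest(commitment, commit(value, nonce))

def joint_roll(value_a: bytes, value_b: bytes, sides: int) -> int:
    """Derive a roll in [0, sides) from both parties' revealed values."""
    digest = hashlib.sha256(value_a + value_b).digest()
    return int.from_bytes(digest, "big") % sides

def play_round(sides: int = 100) -> int:
    """Simulate both parties locally; in practice each step is a network message."""
    a_val, a_nonce = secrets.token_bytes(32), secrets.token_bytes(16)
    b_val, b_nonce = secrets.token_bytes(32), secrets.token_bytes(16)
    # Step 1: both sides exchange commitments before seeing the other's value.
    a_com, b_com = commit(a_val, a_nonce), commit(b_val, b_nonce)
    # Step 2: both sides reveal, and each checks the other's reveal.
    if not (verify(a_com, a_val, a_nonce) and verify(b_com, b_val, b_nonce)):
        raise ValueError("reveal does not match commitment")
    # Step 3: both sides compute the same outcome from the two revealed values.
    return joint_roll(a_val, b_val, sides)
```

Because each party is bound to its value before learning the other's, neither can steer the result, although a party that refuses to reveal can still abort the round. A real protocol needs a policy for that case.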
UCAM-CL-TR-810
Nick Barrow-Williams:
Proximity Coherence for
chip-multiprocessors
November 2011, 164 pages, PDF
PhD thesis (Trinity Hall, January 2011)
Abstract: Many-core architectures provide an efficient
way of harnessing the growing numbers of transistors
available in modern fabrication processes; however, the
parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the
most efficient solution for chip-multiprocessors, placing
limits on the performance of these complex systems. In
an era of increasingly power limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence.
The first step in the design process is to analyse the
communication behaviour of parallel benchmark suites
such as Parsec and SPLASH-2. This thesis presents
work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86
machine. The results reveal considerable locality of
shared data accesses between threads with consecutive
operating system assigned thread IDs. This pattern, although of little consequence in a multi-node system,
corresponds to strong physical locality of shared data
between adjacent cores on a chip-multiprocessor platform.
Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By
redesigning coherence protocols to exploit new patterns
such as the physical locality of shared data, improving
the efficiency of communication, specifically in chip-multiprocessors, is possible. This thesis explores such
a design – Proximity Coherence – a novel scheme in
which L1 load misses are optimistically forwarded to
nearby caches via new dedicated links rather than always being indirected via a directory structure.
UCAM-CL-TR-811
A. Theodore Markettos:
Active electromagnetic attacks on
secure hardware
December 2011, 217 pages, PDF
PhD thesis (Clare Hall, March 2010)
Abstract: The field of side-channel attacks on cryptographic hardware has been extensively studied. In many
cases it is easier to derive the secret key from these attacks than to break the cryptography itself. One such
side-channel attack is the electromagnetic side-channel
attack, giving rise to electromagnetic analysis (EMA).
EMA, when otherwise known as ‘TEMPEST’ or
‘compromising emanations’, has a long history in the
military context over almost the whole of the twentieth century. The US military also mention three related
attacks, believed to be: HIJACK (modulation of secret
data onto conducted signals), NONSTOP (modulation
of secret data onto radiated signals) and TEAPOT (intentional malicious emissions).
In this thesis I perform a fusion of TEAPOT and HIJACK/NONSTOP techniques on secure integrated circuits. An attacker is able to introduce one or more frequencies into a cryptographic system with the intention
of forcing it to misbehave or to radiate secrets.
I demonstrate two approaches to this attack:
To perform the reception, I assess a variety of electromagnetic sensors to perform EMA. I choose an inductive hard drive head and a metal foil electric field
sensor to measure near-field EM emissions.
The first approach, named the re-emission attack,
injects frequencies into the power supply of a device to
cause it to modulate up baseband signals. In this way
I detect data-dependent timing from a ‘secure’ microcontroller. Such up-conversion enables a more compact
and more distant receiving antenna.
The second approach involves injecting one or more
frequencies into the power supply of a random number
generator that uses jitter of ring oscillators as its random number source. I am able to force injection locking of the oscillators, greatly diminishing the entropy
available.
I demonstrate this with the random number generators on two commercial devices. I cause a 2004 EMV
banking smartcard to fail statistical test suites by generating a periodicity. For a secure 8-bit microcontroller
that has been used in banking ATMs, I am able to reduce the random number entropy from 2^32 to 2^25. This
enables a 50% probability of a successful attack on
cash withdrawal in 15 attempts.
UCAM-CL-TR-812
Pedro Brandão:
Abstracting information on body area
networks
January 2012, 144 pages, PDF
PhD thesis (Magdalene College, July 2011)
Abstract: Healthcare is changing, correction, healthcare
is in need of change. The population ageing, the increase in chronic and heart diseases and just the increase in population size will overwhelm the current
hospital-centric healthcare.
There is a growing interest by individuals to monitor their own physiology. Not only for sport activities,
but also to control their own diseases. They are changing from the passive healthcare receiver to a proactive
self-healthcare taker. The focus is shifting from hospital
centred treatment to a patient-centric healthcare monitoring.
Continuous, everyday, wearable monitoring and actuating is part of this change. In this setting, sensors that monitor the heart, blood pressure, movement, brain activity, dopamine levels, and actuators
that pump insulin, ‘pump’ the heart, deliver drugs to
specific organs, stimulate the brain are needed as pervasive components in and on the body. They will tend to
people’s need for self-monitoring and facilitate healthcare delivery.
These components around a human body that communicate to sense and act in a coordinated fashion
make a Body Area Network (BAN). In most cases, and
in our view, a central, more powerful component will
act as the coordinator of this network. These networks
aim to augment the power to monitor the human body
and react to problems discovered with this observation.
One key advantage of these systems is their overarching
view of the whole network. That is, the central component can have an understanding of all the monitored
signals and correlate them to better evaluate and react
to problems. This is the focus of our thesis.
In this document we argue that this multi-parameter
correlation of the heterogeneous sensed information is
not being handled in BANs. The current view depends
exclusively on the application that is using the network
and its understanding of the parameters. This means
that every application will oversee the BAN’s heterogeneous resources managing them directly without taking
into consideration other applications, their needs and
knowledge.
There are several physiological correlations already
known by the medical field. Correlating blood pressure
and cross sectional area of blood vessels to calculate
blood velocity, estimating oxygen delivery from cardiac
output and oxygen saturation, are such examples. This
knowledge should be available in a BAN and shared
by the several applications that make use of the network. This architecture implies a central component
that manages the knowledge and the resources. And
this is, in our view, missing in BANs.
Our proposal is a middleware layer that abstracts
the underlying BAN’s resources to the application, providing instead an information model to be queried. The
model describes the correlations for producing new information that the middleware knows about. Naturally,
the raw sensed data is also part of the model. The middleware hides the specificities of the nodes that constitute the BAN, by making available their sensed production. Applications are able to query for information attaching requirements to these requests. The middleware
is then responsible for satisfying the requests while optimising the resource usage of the BAN.
Our architecture proposal is divided into two corresponding layers, one that abstracts the nodes’ hardware
(hiding node’s particularities) and the information layer
that describes information available and how it is correlated. A prototype implementation of the architecture
was done to illustrate the concept.
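To make the idea of a queryable information model concrete, here is a small sketch in which raw sensed values and derived (correlated) quantities are looked up by name. It is not the thesis prototype: the class and method names are hypothetical, the haemoglobin input is an added assumption (the abstract names only cardiac output and oxygen saturation), and the oxygen-delivery calculation uses the common textbook approximation of about 1.34 mL of O2 per gram of haemoglobin, ignoring dissolved oxygen.

```python
from typing import Callable, Dict, Sequence, Tuple

class InformationModel:
    """Hypothetical model: applications query by name, not by sensor node."""

    def __init__(self) -> None:
        self._raw: Dict[str, float] = {}
        self._derived: Dict[str, Tuple[Tuple[str, ...], Callable[..., float]]] = {}

    def publish(self, name: str, value: float) -> None:
        """Record the latest raw value sensed by some node."""
        self._raw[name] = value

    def define(self, name: str, inputs: Sequence[str], fn: Callable[..., float]) -> None:
        """Declare how a derived quantity is computed from other quantities."""
        self._derived[name] = (tuple(inputs), fn)

    def query(self, name: str) -> float:
        """Return a raw value, or compute a derived one from its inputs."""
        if name in self._raw:
            return self._raw[name]
        if name not in self._derived:
            raise KeyError(f"no information available for {name!r}")
        inputs, fn = self._derived[name]
        # Inputs may themselves be derived; cyclic definitions are not detected.
        return fn(*(self.query(i) for i in inputs))

def build_example_model() -> InformationModel:
    model = InformationModel()
    # Arterial O2 content in mL O2 per dL blood (dissolved O2 ignored).
    model.define("arterial_o2_content",
                 ["haemoglobin_g_dl", "oxygen_saturation"],
                 lambda hb, sao2: 1.34 * hb * sao2)
    # O2 delivery in mL/min: cardiac output (L/min) x content (mL/dL) x 10 dL/L.
    model.define("oxygen_delivery",
                 ["cardiac_output_l_min", "arterial_o2_content"],
                 lambda co, cao2: co * cao2 * 10.0)
    return model
```

An application would publish or receive the raw readings (for example a cardiac output of 5 L/min, a saturation of 0.98 and a haemoglobin level of 15 g/dL) and then call query("oxygen_delivery") without knowing which nodes produced them.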
UCAM-CL-TR-813
Andrew B. Lewis:
Reconstructing compressed photo
and video data
February 2012, 148 pages, PDF
PhD thesis (Trinity College, June 2011)
Abstract: Forensic investigators sometimes need to verify the integrity and processing history of digital photos and videos. The multitude of storage formats and
devices they need to access also presents a challenge for
evidence recovery. This thesis explores how visual data
files can be recovered and analysed in scenarios where
they have been stored in the JPEG or H.264 (MPEG-4
AVC) compression formats.
My techniques make use of low-level details of lossy
compression algorithms in order to tell whether a file
under consideration might have been tampered with.
I also show that limitations of entropy coding sometimes allow us to recover intact files from storage devices, even in the absence of filesystem and container
metadata.
I first show that it is possible to embed an imperceptible message within a uniform region of a JPEG image such that the message becomes clearly visible when
the image is recompressed at a particular quality factor, providing a visual warning that recompression has
taken place.
I then use a precise model of the computations involved in JPEG decompression to build a specialised
compressor, designed to invert the computations of
the decompressor. This recompressor recovers the compressed bitstreams that produce a given decompression
result, and, as a side-effect, indicates any regions of the
input which are inconsistent with JPEG decompression.
I demonstrate the algorithm on a large database of images, and show that it can detect modifications to decompressed image regions.
Finally, I show how to rebuild fragmented compressed bitstreams, given a syntax description that includes information about syntax errors, and demonstrate its applicability to H.264/AVC Baseline profile
video data in memory dumps with randomly shuffled
blocks.
UCAM-CL-TR-814
Arjuna Sathiaseelan, Jon Crowcroft:
The free Internet: a distant mirage or
near reality?
February 2012, 10 pages, PDF
Abstract: Through this short position paper, we hope
to convey our thoughts on the need for free Internet
access and describe possible ways of achieving this –
hoping this stimulates a useful discussion.
UCAM-CL-TR-815
Christian Richardt:
Colour videos with depth:
acquisition, processing and
evaluation
March 2012, 132 pages, PDF
PhD thesis (Gonville & Caius College, November
2011)
Abstract: The human visual system lets us perceive the
world around us in three dimensions by integrating evidence from depth cues into a coherent visual model
of the world. The equivalent in computer vision and
computer graphics are geometric models, which provide a wealth of information about represented objects,
such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour
information. In this dissertation, I hence investigate a
combination of videos and geometric models: videos
with per-pixel depth (also known as RGBZ videos). I
consider the full life cycle of these videos: from their
acquisition, via filtering and processing, to stereoscopic
display.
UCAM-CL-TR-816
Jean E. Martina:
Verification of security protocols
based on multicast communication
March 2012, 150 pages, PDF
PhD thesis (Clare College, February 2011)
Abstract: Over an insecure network, agents need means
to communicate securely. These means are often called
security protocols. Security protocols, although constructed through the arrangement of simple security
blocks, normally yield complex goals. They seem simple at a first glance, but hide subtleties that allow them
to be exploited.
One way of trying to systematically capture such
subtleties is through the use of formal methods. The
maturity of some methods for protocol verification is a
fact today. But these methods are still not able to capture the whole set of security protocols being designed.
With the convergence to an online world, new security goals are proposed and new protocols need to be
designed. The evolution of formal verification methods
becomes a necessity to keep pace with this ongoing development.
This thesis covers the Inductive Method and its extensions. The Inductive Method is a formalism to specify and verify security protocols based on structural induction and higher-order logic proofs. This account of
our extensions enables the Inductive Method to reason
about non-Unicast communication and threshold cryptography.
We developed a new set of theories capable of representing the entire set of known message casting frameworks. Our theories enable the Inductive Method to
reason about a whole new set of protocols. We also
specified a basic abstraction of threshold cryptography
as a way of proving the extensibility of the method to
new cryptographic primitives. We showed the feasibility of our specifications by revisiting a classic protocol,
now verified under our framework. Secrecy verification
under a mixed environment of Multicast and Unicast
was also done for a Byzantine security protocol.
UCAM-CL-TR-817
Joseph Bonneau, Cormac Herley,
Paul C. van Oorschot, Frank Stajano:
The quest to replace passwords:
a framework for comparative
evaluation of Web authentication
schemes
March 2012, 32 pages, PDF
Abstract: We evaluate two decades of proposals to replace text passwords for general-purpose user authentication on the web using a broad set of twenty-five usability, deployability and security benefits that an ideal
scheme might provide. The scope of proposals we survey is also extensive, including password management
software, federated login protocols, graphical password
schemes, cognitive authentication schemes, one-time
passwords, hardware tokens, phone-aided schemes and
biometrics. Our comprehensive approach leads to key
insights about the difficulty of replacing passwords.
Not only does no known scheme come close to providing all desired benefits: none even retains the full set
of benefits which legacy passwords already provide. In
particular, there is a wide range between schemes offering minor security benefits beyond legacy passwords,
to those offering significant security benefits in return
for being more costly to deploy or difficult to use. We
conclude that many academic proposals have failed to
gain traction because researchers rarely consider a sufficiently wide range of real-world constraints. Beyond
our analysis of current schemes, our framework provides an evaluation methodology and benchmark for
future web authentication proposals.
This report is an extended version of the peer-reviewed paper by the same name. In about twice as
many pages it gives full ratings for 35 authentication
schemes rather than just 9.
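The comparison described in the abstract, checking whether a scheme retains every benefit that legacy passwords already provide, can be sketched as a set comparison. The benefit labels and scheme ratings below are placeholders invented for the example; they are not the report's ratings, and the helper names are hypothetical.

```python
from typing import Dict, List, Set

def missing_benefits(baseline: Set[str], candidate: Set[str]) -> Set[str]:
    """Benefits the baseline offers that the candidate scheme gives up."""
    return baseline - candidate

def retains_all(baseline: Set[str], candidate: Set[str]) -> bool:
    """True if the candidate offers every benefit of the baseline."""
    return baseline <= candidate

def schemes_retaining_baseline(baseline: Set[str],
                               schemes: Dict[str, Set[str]]) -> List[str]:
    """Names of schemes that lose none of the baseline's benefits."""
    return sorted(name for name, benefits in schemes.items()
                  if retains_all(baseline, benefits))

# Placeholder data only: labels and ratings are made up for illustration.
PASSWORDS = {"usability-1", "deployability-1", "deployability-2", "security-1"}
SCHEMES = {
    "scheme-A": {"usability-1", "deployability-1", "security-1", "security-2"},
    "scheme-B": {"usability-1", "security-1", "security-2", "security-3"},
}
```

With this placeholder data, schemes_retaining_baseline(PASSWORDS, SCHEMES) returns an empty list, and missing_benefits shows what each scheme gives up relative to the baseline.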
UCAM-CL-TR-818
Robert N. M. Watson:
New approaches to operating system
security extensibility
April 2012, 184 pages, PDF
PhD thesis (Wolfson College, October 2010)
Abstract: This dissertation proposes new approaches
to commodity computer operating system (OS) access control extensibility that address historic problems
with concurrency and technology transfer. Access control extensibility addresses a lack of consensus on operating system policy model at a time when security requirements are in flux: OS vendors, anti-virus companies, firewall manufacturers, smart phone developers,
and application writers require new tools to express
policies tailored to their needs. By proposing principled
approaches to access control extensibility, this work allows OS security to be “designed in” yet remain flexible
in the face of diverse and changing requirements.
I begin by analysing system call interposition, a
popular extension technology used in security research
and products, and reveal fundamental and readily exploited concurrency vulnerabilities. Motivated by these
failures, I propose two security extension models: the
TrustedBSD Mandatory Access Control (MAC) Framework, a flexible kernel access control extension framework for the FreeBSD kernel, and Capsicum, practical
capabilities for UNIX.
The MAC Framework, a research project I began
before starting my PhD, allows policy modules to dynamically extend the kernel access control policy. The
framework allows policies to integrate tightly with kernel synchronisation, avoiding race conditions inherent
to system call interposition, as well as offering reduced
development and technology transfer costs for new security policies. Over two chapters, I explore the framework itself, and its transfer to and use in several products: the open source FreeBSD operating system, nCircle’s enforcement appliances, and Apple’s Mac OS X
and iOS operating systems.
Capsicum is a new application-centric capability security model extending POSIX. Capsicum targets application writers rather than system designers, reflecting a trend towards security-aware applications such as
Google’s Chromium web browser, that map distributed
security policies into often inadequate local primitives. I
compare Capsicum with other sandboxing techniques,
demonstrating improved performance, programmability, and security.
This dissertation makes original contributions to
challenging research problems in security and operating
system design. Portions of this research have already
had a significant impact on industry practice.
UCAM-CL-TR-819
Joseph Bonneau:
Guessing human-chosen secrets
May 2012, 161 pages, PDF
PhD thesis (Churchill College, May 2012)
Abstract: Authenticating humans to computers remains
a notable weak point in computer security despite
decades of effort. Although the security research community has explored dozens of proposals for replacing
or strengthening passwords, they appear likely to remain entrenched as the standard mechanism of human-computer authentication on the Internet for years to
come. Even in the optimistic scenario of eliminating
passwords from most of today’s authentication protocols using trusted hardware devices or trusted servers to
perform federated authentication, passwords will persist as a means of “last-mile” authentication between
humans and these trusted single sign-on deputies.
This dissertation studies the difficulty of guessing
human-chosen secrets, introducing a sound mathematical framework modeling human choice as a skewed
probability distribution. We introduce a new metric,
alpha-guesswork, which can accurately model the resistance of a distribution against all possible guessing
attacks. We also study the statistical challenges of estimating this metric using empirical data sets which can
be modeled as a large random sample from the underlying probability distribution.
This framework is then used to evaluate several
representative data sets from the most important categories of human-chosen secrets to provide reliable estimates of security against guessing attacks. This includes
collecting the largest-ever corpus of user-chosen passwords, with nearly 70 million, the largest list of human
names ever assembled for research, the largest data sets
of real answers to personal knowledge questions and
the first data published about human choice of banking
PINs. This data provides reliable numbers for designing
security systems and highlights universal limitations of
human-chosen secrets.
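As an illustration of the kind of metric described above, the following is a minimal Python sketch of a partial guessing measure. It assumes that alpha-guesswork is the expected number of guesses per account for an attacker who guesses values in decreasing order of probability and stops once a fraction alpha of accounts has been broken; this formulation and the function name alpha_guesswork are assumptions made for illustration and should be checked against the dissertation itself.

```python
def alpha_guesswork(probs, alpha):
    """Expected guesses per account for an attacker who stops after
    breaking a fraction `alpha` of accounts (assumed definition)."""
    # An optimal attacker tries the most probable values first.
    ordered = sorted(probs, reverse=True)
    cumulative = 0.0   # fraction of accounts broken so far
    expected = 0.0     # guesses spent on accounts that were broken
    for i, p_i in enumerate(ordered, start=1):
        cumulative += p_i
        expected += p_i * i
        if cumulative >= alpha - 1e-12:   # small tolerance for float rounding
            # Accounts that were not broken each cost the full i guesses.
            return (1.0 - cumulative) * i + expected
    raise ValueError("alpha exceeds the total probability mass")


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]
    skewed = [0.5, 0.25, 0.25]
    print(alpha_guesswork(uniform, 0.5))
    print(alpha_guesswork(skewed, 0.5))
```

Under this assumed definition, a skewed distribution costs the attacker fewer guesses than a uniform one for the same success rate, which is the effect the abstract attributes to human choice.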
UCAM-CL-TR-820
Eiko Yoneki, Amitabha Roy:
A unified graph query layer for
multiple databases
August 2012, 22 pages, PDF
Abstract: There is increasing demand to store and query
data with an inherent graph structure. Examples of
such data include those from online social networks,
the semantic web and from navigational queries on spatial data such as maps. Unfortunately, traditional relational databases have fallen short where such graph
structured data is concerned. This has led to the development of specialised graph databases such as Neo4j.
However, traditional databases continue to have a wide
usage base and have desirable properties such as the
capacity to support a high volume of transactions
while offering ACID semantics. In this paper we argue that it is in fact possible to unify different database
paradigms together in the case of graph structured data
through the use of a common query language and data
loader that we have named Crackle (a wordplay on
Gra[ph]QL). Crackle provides an expressive and powerful query library in Clojure (a functional LISP dialect for JVMs). It also provides a data loader that is
capable of interfacing transparently with various data
sources such as PostgreSQL databases and the Redis
key-value store. Crackle shields programmers from the
backend database by allowing them to write queries
in Clojure. Additionally, its graph-focused prefetchers
are capable of closing the hitherto large gap between a
PostgreSQL database and a specialised graph database
such as Neo4j from as much as 326x (with a SQL query)
to as low as 6x (when using Crackle). We also include
a detailed performance analysis that identifies ways
to further reduce this gap with Crackle. This brings
into question the performance argument for specialised
graph databases such as Neo4j by providing comparable performance on supposedly legacy data sources.
UCAM-CL-TR-821
Charles Reams:
Modelling energy efficiency for
computation
October 2012, 135 pages, PDF
PhD thesis (Clare College, October 2012)
Abstract: In the last decade, efficient use of energy has
become a topic of global significance, touching almost
every area of modern life, including computing. From
mobile to desktop to server, energy efficiency concerns
are now ubiquitous. However, approaches to the energy
problem are often piecemeal and focus on only one area
for improvement.
I argue that the strands of the energy problem are inextricably entangled and cannot be solved in isolation.
I offer a high-level view of the problem and, building
from it, explore a selection of subproblems within the
field. I approach these with various levels of formality,
and demonstrate techniques to make improvements on
all levels. The original contributions are as follows:
Chapter 3 frames the energy problem as one of optimisation with constraints, and explores the impact of
this perspective for current commodity products. This
includes considerations of the hardware, software and
operating system. I summarise the current situation in
these respects and propose directions in which they
could be improved to better support energy management.
Chapter 4 presents mathematical techniques to
compute energy-optimal schedules for long-running
computations. This work reflects the server-domain
concern with energy cost, producing schedules that exploit fluctuations in power cost over time to minimise
expenditure rather than raw energy. This assumes certain idealised models of power, performance, cost, and
workload, and draws precise formal conclusions from
them.
Chapter 5 considers techniques to implement
energy-efficient real-time streaming. Two classes of
problem are considered: first, hard real-time streaming
with fixed, predictable frame characteristics; second,
soft real-time streaming with a quality-of-service guarantee and probabilistic descriptions of per-frame workload. Efficient algorithms are developed for scheduling
frame execution in an energy-efficient way while still
guaranteeing hard real-time deadlines. These schedules
determine appropriate values for power-relevant parameters, such as dynamic voltage–frequency scaling.
A key challenge for future work will be unifying these diverse approaches into one “Theory of Energy” for computing. The progress towards this is summarised in Chapter 6. The thesis concludes by sketching
future work towards this Theory of Energy.
UCAM-CL-TR-822
Richard A. Russell:
Planning with preferences using
maximum satisfiability
October 2012, 160 pages, PDF
PhD thesis (Gonville and Caius College, September
2011)
Abstract: The objective of automated planning is to
synthesise a plan that achieves a set of goals specified by
the user. When achieving every goal is not feasible, the
planning system must decide which ones to plan for and
find the lowest cost plan. The system should take as input a description of the user’s preferences and the costs
incurred through executing actions. Goal utility dependencies arise when the utility of achieving a goal depends on the other goals that are achieved with it. This
complicates the planning procedure because achieving
a new goal can alter the utilities of all the other goals
currently achieved.
In this dissertation we present methods for solving planning problems with goal utility dependencies
by compiling them to a variant of satisfiability known
as weighted partial maximum satisfiability (WPMaxSAT). An optimal solution to the encoding is found
using a general-purpose solver. The encoding is constructed such that its optimal solution can be used to
construct a plan that is most preferred amongst other
plans of length that fit within a prespecified horizon.
We evaluate this approach against an integer programming based system using benchmark problems taken
from past international planning competitions.
We study how a WPMax-SAT solver might benefit
from incorporating a procedure known as survey propagation. This is a message passing algorithm that estimates the probability that a variable is constrained to
be a particular value in a randomly selected satisfying
assignment. These estimates are used to influence variable/value decisions during search for a solution. Survey propagation is usually presented with respect to the
satisfiability problem, and its generalisation, SP(y), with
respect to the maximum satisfiability problem. We extend the argument that underpins these two algorithms
to derive a new set of message passing equations for
application to WPMax-SAT problems. We evaluate the
success of this method by applying it to our encodings
of planning problems with goal utility dependencies.
Our results indicate that planning with preferences
using WPMax-SAT is competitive and sometimes more
successful than an integer programming approach –
solving two to three times more subproblems in some
domains, while being outperformed by a smaller margin in others. In some domains, we also find that using information provided by survey propagation in a
WPMax-SAT solver to select variable/value pairs for
the earliest decisions can, on average, direct search to
lower cost solutions than a uniform sampling strategy
combined with a popular heuristic.
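To make the target formalism concrete, the following is a minimal Python sketch of weighted partial maximum satisfiability on a toy instance. It is an exhaustive search written for illustration only, not the general-purpose solver or the encoding used in the dissertation; the function names and the three-variable example (two goals and one action) are hypothetical.

```python
from itertools import product


def satisfied(clause, assignment):
    # A clause is a list of non-zero ints: k means variable k is true,
    # -k means variable k is false.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def solve_wpmaxsat(num_vars, hard, soft):
    """Exhaustively find an assignment that satisfies every hard clause and
    minimises the total weight of violated soft clauses."""
    best_cost, best = None, None
    for bits in product([False, True], repeat=num_vars):
        assignment = dict(enumerate(bits, start=1))
        if not all(satisfied(c, assignment) for c in hard):
            continue  # hard clauses must always hold
        cost = sum(w for w, c in soft if not satisfied(c, assignment))
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, assignment
    return best_cost, best


if __name__ == "__main__":
    # Hypothetical toy problem: variable 1 = goal g1, 2 = goal g2, 3 = action a.
    hard = [[-1, 3],    # achieving g1 requires executing a
            [-1, -2]]   # g1 and g2 cannot both be achieved
    soft = [(5, [1]),   # utility 5 lost if g1 is not achieved
            (3, [2]),   # utility 3 lost if g2 is not achieved
            (1, [-3])]  # cost 1 paid if a is executed
    print(solve_wpmaxsat(3, hard, soft))
```

In this toy instance the cheapest assignment achieves g1 via action a and gives up g2, showing how hard clauses carry the planning constraints while soft clause weights carry utilities and action costs.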
UCAM-CL-TR-823
Amitabha Roy, Karthik Nilakant,
Valentin Dalibard, Eiko Yoneki:
Mitigating I/O latency in SSD-based
graph traversal
November 2012, 27 pages, PDF
Abstract: Mining large graphs has now become an important aspect of many applications. Recent interest in
low cost graph traversal on single machines has led to
the construction of systems that use solid state drives
(SSDs) to store the graph. An SSD can be accessed with
far lower latency than magnetic media, while remaining cheaper than main memory. Unfortunately SSDs are
slower than main memory and algorithms running on
such systems are hampered by large IO latencies when
accessing the SSD. In this paper we present two novel
techniques to reduce the impact of SSD IO latency on
semi-external memory graph traversal. We introduce a
variant of the Compressed Sparse Row (CSR) format
that we call Compressed Enumerated Encoded Sparse
Offset Row (CEESOR). CEESOR is particularly efficient for graphs with hierarchical structure and can reduce the space required to represent connectivity information by amounts varying from 5% to as much
as 76%. CEESOR allows a larger number of edges to
be moved for each unit of IO transfer from the SSD
to main memory and more effective use of operating system caches. Our second contribution is a runtime prefetching technique that exploits the ability of
solid state drives to service multiple random access requests in parallel. We present a novel Run Along SSD
Prefetcher (RASP). RASP is capable of hiding the effect of IO latency in single threaded graph traversal in
breadth-first and shortest-path order to the extent that
it improves iteration time for large graphs by amounts
varying from 2.6X-6X.
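For readers unfamiliar with the baseline format, the following is a minimal Python sketch of the standard Compressed Sparse Row (CSR) layout that CEESOR is described as a variant of. It does not implement CEESOR itself, whose encoding is not given in the abstract; the function names are illustrative.

```python
def build_csr(num_vertices, edges):
    """Return (offsets, targets) for a directed graph given as (src, dst) pairs."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # offsets[v] is the index in `targets` where vertex v's neighbours start.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets, targets, v):
    # All out-neighbours of v sit in one contiguous slice, so a traversal
    # reads them with a single sequential access.
    return targets[offsets[v]:offsets[v + 1]]


if __name__ == "__main__":
    offsets, targets = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
    print(offsets, targets, neighbours(offsets, targets, 0))
```

Because each adjacency list is contiguous, shrinking the encoding of `targets` directly increases the number of edges delivered per unit of IO, which is the property the abstract says CEESOR exploits.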
UCAM-CL-TR-824
Simon Frankau:
Hardware synthesis from a
stream-processing functional
language
November 2012, 202 pages, PDF
PhD thesis (St. John’s College, July 2004)
Abstract: As hardware designs grow exponentially
larger, there is an increasing challenge to use transistor budgets effectively. Without higher-level synthesis
tools, so much effort may be spent on low-level details
that it becomes impractical to effectively design circuits
of the size that can be fabricated. This possibility of a
design gap has been documented for some time now.
One solution is the use of domain-specific languages. This thesis covers the use of software-like languages to describe algorithms that are to be implemented in hardware. Hardware engineers can use the
tools to improve their productivity and effectiveness in
this particular domain. Software engineers can also use
this approach to benefit from the parallelism available
in modern hardware (such as reconfigurable systems
and FPGAs), while retaining the convenience of a software description.
In this thesis a statically-allocated pure functional
language, SASL, is introduced. Static allocation makes
the language suited to implementation in fixed hardware resources. The I/O model is based on streams (linear lazy lists), and implicit parallelism is used in order to maintain a software-like approach. The thesis
contributes constraints which allow the language to be
statically-allocated, and synthesis techniques for SASL
targeting both basic CSP and a graph-based target that
may be compiled to a register-transfer level (RTL) description.
Further chapters examine the optimisation of the
language, including the use of lenient evaluation to increase parallelism, the introduction of closures and general lazy evaluation, and the use of non-determinism in
the language. The extensions are examined in terms of
the restrictions required to ensure static allocation, and
the techniques required to synthesise them.
UCAM-CL-TR-826
Fernando M. V. Ramos:
GREEN IPTV: a resource and
energy efficient network for IPTV
December 2012, 152 pages, PDF
PhD thesis (Clare Hall, November 2012)
Abstract: The distribution of television is currently
dominated by three technologies: over-the-air broadcast, cable, and satellite. The advent of IP networks and
the increased availability of broadband access created
a new vehicle for the distribution of TV services. The
distribution of digital TV services over IP networks, or
IPTV, offers carriers flexibility and added value in the
form of additional services. The rapid roll-out of IPTV services by operators
worldwide in the past few years is therefore no surprise.
IPTV distribution imposes stringent requirements
on both performance and reliability. It is therefore challenging for an IPTV operator to guarantee the quality
of experience expected by its users, and to do so
efficiently. In this dissertation I investigate some
of the challenges faced by IPTV distribution network
operators, and I propose novel techniques to address
these challenges.
First, I address one of the major concerns of IPTV
network deployment: channel change delay. This is the
latency experienced by users when switching between
TV channels. Synchronisation and buffering of video
streams can cause channel change delays of several seconds. I perform an empirical analysis of a particular
solution to the channel change delay problem, namely,
predictive pre-joining of TV channels. In this scheme
each Set Top Box simultaneously joins additional multicast groups (TV channels) along with the one requested
by the user. If the user switches to any of these channels next, switching latency is virtually eliminated, and
user experience is improved. The results show that it is
possible to eliminate zapping delay for a significant percentage of channel switching requests with little impact
on access network bandwidth cost.
Second, I propose a technique to increase the resource and energy efficiency of IPTV networks. This
technique is based on a simple paradigm: avoiding
waste. To reduce the inefficiencies of current static multicast distribution schemes, I propose a semi-dynamic
scheme where only a selection of TV multicast groups
is distributed in the network, instead of all. I perform an
empirical evaluation of this method and conclude that
its use results in significant bandwidth reductions without compromising service performance. I also demonstrate that these reductions may translate into significant energy savings in the future.
Third, to increase energy efficiency further I propose
a novel energy and resource friendly protocol for core
optical IPTV networks. The idea is for popular IPTV
traffic to optically bypass the network nodes, avoiding electronic processing. I evaluate this proposal empirically and conclude that the introduction of optical
switching techniques results in a significant increase in
the energy efficiency of IPTV networks.
All the schemes I present in this dissertation are evaluated by means of trace-driven analyses using a dataset
from an operational IPTV service provider. Such thorough and realistic evaluation enables the assessment of
the proposed techniques with an increased level of confidence, and is therefore a strength of this dissertation.
UCAM-CL-TR-827
Omar S. Choudary:
The smart card detective:
a hand-held EMV interceptor
December 2012, 55 pages, PDF
Abstract: Several vulnerabilities have been found in
the EMV system (also known as Chip and PIN). Saar
Drimer and Steven Murdoch have successfully implemented a relay attack against EMV using a fake terminal. Recently the same authors have found a method to
successfully complete PIN transactions without actually
entering the correct PIN. The press has published this
vulnerability but they reported such a scenario as being
hard to execute in practice because it requires specialized and complex hardware.
As proposed by Ross Anderson and Mike Bond
in 2006, I decided to create a miniature man-in-the-middle device to defend smartcard users against relay
attacks.
As a result of my MPhil project work I created a
hand-held device, called Smart Card Defender (SCD),
which intercepts the communication between smartcard and terminal. The device has been built using a low
cost ATMEL AT90USB1287 microcontroller and other
readily available electronic components. The total cost
of the SCD has been around £100, but an industrial
version could be produced for less than £20.
I implemented several applications using the SCD,
including the defense against the relay attack as well
as the recently discovered vulnerability to complete a
transaction without using the correct PIN.
All the applications have been successfully tested on
CAP readers and live terminals. Furthermore, I have
performed real tests using the SCD at several shops in
town.
From the experiments using the SCD, I have noticed some particularities of the CAP protocol compared to the EMV standard. I have also discovered that
the smartcard does not follow the physical transport
protocol exactly. Such findings are presented in detail,
along with a discussion of the results.
UCAM-CL-TR-828
Rosemary M. Francis:
Exploring networks-on-chip for
FPGAs
January 2013, 121 pages, PDF
PhD thesis (Darwin College, July 2009)
Abstract: Developments in fabrication processes have
shifted the cost ratio between wires and transistors to
allow new trade-offs between computation and communication. Rising clock speeds have led to multi-cycle cross-chip communication and pipelined buses. It
is then a small step from pipelining to switching and
the development of multi-core networked systems-on-chip. Modern FPGAs are also now home to complex
systems-on-chip. A change in the way we structure the
computation demands a change in the way we structure
the communication on-chip.
This thesis looks at Network-on-Chip design for FPGAs beyond the trade-offs between hard (silicon) and
soft (configurable) designs. FPGAs are capable of extremely flexible, statically routed bit-based wiring, but
this flexibility comes at a high area, latency and power
cost. Soft NoCs are able to maintain this flexibility, but
do not necessarily make good use of the computation-communication trade-off. Hard NoCs are more efficient when used, but are forced to operate below capacity by the soft IP cores. It is also difficult to design
hard NoCs with the flexibility needed without wasting
silicon when the network is not used.
In the first part of this thesis I explore the capability
of Time-Division Multiplexed (TDM) wiring to bridge
the gap between the fine-grain static FPGA wiring and
the bus-based dynamic routing of a NoC. By replacing
some of the static FPGA wiring with TDM wiring I am
able to time-division multiplex hard routers and make
better use of the non-configurable area. The cost of a
hard network is reduced by moving some of the area
cost from the routers into reusable TDM wiring components. The TDM wiring improves the interface between the hard routers and soft IP blocks which leads
to higher logic density overall. I show that TDM wiring
makes hard routers a flexible and efficient alternative to
soft interconnect.
The second part of this thesis looks at the feasibility
of replacing all static wiring on the FPGA with TDM
wiring. The aim was to increase the routing capacity
of the FPGA whilst decreasing the area used to implement it. An ECAD flow was developed to explore the
extent to which the amount of wiring can be reduced.
The results were then used to design the TDM circuitry.
My results show that an 80% reduction in the
amount of wiring is possible through time-division multiplexing. This reduction is sufficient to increase the
routing capacity of the FPGA whilst maintaining similar or better logic density. This TDM wiring can be used
to implement area and power-efficient hard networks-on-chip with good flexibility, as well as improving the
performance of other hard IP blocks.
UCAM-CL-TR-829
Philip Christopher Paul:
Microelectronic security measures
February 2013, 177 pages, PDF
PhD thesis (Pembroke College, January 2009)
Abstract: In this dissertation I propose the concept of
tamper protection grids for microelectronic security devices made from organic electronic materials. As security devices have become ubiquitous in recent years,
they are becoming targets for criminal activity. One
general attack route to breach the security is to carry
out physical attack after depackaging a device. Commercial security devices use a metal wire mesh within
the chip to protect against these attacks. However, as a
microchip is physically robust, the mesh is not affected
by depackaging.
As a better way of protecting security devices
against attacks requiring the chip package to be removed, I investigate a protection grid that is vulnerable
to damage if the packaging is tampered with. The protection grid is connected directly to standard bond pads
on the microchip, to allow direct electronic measurements, saving the need for complex sensor structures.
That way, a security device can monitor the package
for integrity, and initiate countermeasures if required.
The feasibility of organic tamper protection grids
was evaluated. To establish the viability of the concept,
a fabrication method for these devices was developed,
the sensitivity to depackaging was assessed, and practical implementation issues were evaluated. Inkjet printing was chosen as the fabrication route, as devices can be
produced at low cost while preserving flexibility of layout. A solution to the problem of adverse surface interaction was found to ensure good print quality on the
hydrophobic chip surface. Standard contacts between
chip and grid are non-linear and degrade between measurements, however it was shown that stable ohmic
contacts are possible using a silver buffer layer. The
sensitivity of the grid to reported depackaging methods was tested, and improvements to the structure were
found to maximise damage to the grid upon tampering
with the package. Practical issues such as measurement
stability with temperature and age were evaluated, as
well as a first prototype to assess the achievable measurement accuracy. The evaluation of these practical issues shows directions for future work that can develop
organic protection grids beyond the proof of concept.
Apart from the previously mentioned invasive attacks, there is a second category of attacks, non-invasive attacks, that do not require the removal of the
chip packaging. The most prominent non-invasive attack is power analysis in which the power consumption of a device is used as an oracle to reveal the secret
key of a security device. Logic gates were designed and
fabricated with data-independent power consumption
in each clock cycle. However, it is shown that this is
not sufficient to protect the secret key. Despite balancing the discharged capacitances in each clock cycle,
the power consumed still depends on the data input.
While the overall charge consumed in each clock cycle matches to a few percent, differences within a clock
cycle can easily be measured. It was shown that the
dominant cause for this imbalance is early propagation,
which can be mitigated by ensuring that evaluation
in a gate only takes place after all inputs are present.
The second major source of imbalance is mismatched
discharge paths in logic gates, which result in data-dependent evaluation times of a gate. This source of
imbalance is not as trivial to remove, as it conflicts with
balancing the discharged capacitances in each clock cycle.
UCAM-CL-TR-830
Paul J. Fox:
Massively parallel neural
computation
March 2013, 105 pages, PDF
PhD thesis (Jesus College, October 2012)
Abstract: Reverse-engineering the brain is one of the
US National Academy of Engineering’s ‘Grand Challenges’. The structure of the brain can be examined at
many different levels, spanning many disciplines from
low-level biology through psychology and computer
science. This thesis focusses on real-time computation
of large neural networks using the Izhikevich spiking
neuron model.
Neural computation has been described as ‘embarrassingly parallel’ as each neuron can be thought of as
an independent system, with behaviour described by a
mathematical model. However, the real challenge lies in
modelling neural communication. While the connectivity of neurons has some parallels with that of electrical
systems, its high fan-out results in massive data processing and communication requirements when modelling
neural communication, particularly for real-time computations.
It is shown that memory bandwidth is the most
significant constraint to the scale of real-time neural
computation, followed by communication bandwidth,
which leads to a decision to implement a neural computation system on a platform based on a network of Field
Programmable Gate Arrays (FPGAs), using commercial
off-the-shelf components with some custom supporting
infrastructure. This brings implementation challenges,
particularly lack of on-chip memory, but also many advantages, particularly high-speed transceivers. An algorithm to model neural communication that makes efficient use of memory and communication resources is
developed and then used to implement a neural computation system on the multi-FPGA platform.
Finding suitable benchmark neural networks for a
massively parallel neural computation system proves
to be a challenge. A synthetic benchmark that has
biologically-plausible fan-out, spike frequency and
spike volume is proposed and used to evaluate the system. It is shown to be capable of computing the activity
of a network of 256k Izhikevich spiking neurons with
a fan-out of 1k in real-time using a network of 4 FPGA
boards. This compares favourably with previous work,
with the added advantage of scalability to larger neural
networks using more FPGAs.
It is concluded that communication must be considered as a first-class design constraint when implementing massively parallel neural computation systems.
UCAM-CL-TR-831
Meredydd Luff:
Communication for programmability
and performance on multi-core
processors
April 2013, 89 pages, PDF
PhD thesis (Gonville & Caius College, November
2012)
Abstract: The transition to multi-core processors has
yielded a fundamentally new sort of computer. Software can no longer benefit passively from improvements in processor technology, but must perform its
computations in parallel if it is to take advantage of
the continued increase in processing power. Software
development has yet to catch up, and with good reason: parallel programming is hard, error-prone and often unrewarding.
In this dissertation, I consider the programmability
challenges of the multi-core era, and examine three angles of attack.
I begin by reviewing alternative programming
paradigms which aim to address these changes, and investigate two popular alternatives with a controlled pilot experiment. The results are inconclusive, and subsequent studies in that field have suffered from similar
weakness. This leads me to conclude that empirical user
studies are poor tools for designing parallel programming systems.
I then consider one such alternative paradigm, transactional memory, which has promising usability characteristics but suffers performance overheads so severe
that they mask its benefits. By modelling an ideal inter-core communication mechanism, I propose using our
embarrassment of parallel riches to mitigate these overheads. By pairing “helper” processors with application
threads, I offload the overheads of software transactional memory, thereby greatly mitigating the problem
of serial overhead.
Finally, I address the mechanics of inter-core communication. Due to the use of cache coherence to
preserve the programming model of previous processors, explicitly communicating between the cores of
any modern multi-core processor is painfully slow. The
schemes proposed so far to alleviate this problem are
complex, insufficiently general, and often introduce
new resources which cannot be virtualised transparently by a time-sharing operating system. I propose
and describe an asynchronous remote store instruction, which is issued by one core and completed asynchronously by another into its own local cache. I evaluate several patterns of parallel communication, and determine that the use of remote stores greatly increases
the performance of common synchronisation kernels.
I quantify the benefit to the feasibility of fine-grained
parallelism. To finish, I use this mechanism to implement my parallel STM scheme, and demonstrate that it
performs well, reducing overheads significantly.
UCAM-CL-TR-832
Gregory A. Chadwick:
Communication centric, multi-core,
fine-grained processor architecture
April 2013, 165 pages, PDF
PhD thesis (Fitzwilliam College, September 2012)
Abstract: With multi-core architectures now firmly entrenched in many application areas, both computer architects and programmers face new challenges.
Computer architects must increase core count to increase the explicit parallelism available to the programmer
in order to provide better performance whilst leaving
the programming model presented tractable. The programmer must find ways to exploit the explicit parallelism provided that scale well with increasing core
and thread availability.
A fine-grained computation model allows the programmer to expose a large amount of explicit parallelism and the greater the level of parallelism exposed
the better increasing core counts can be utilised. However a fine-grained approach implies many interworking threads and the overhead of synchronising and
scheduling these threads can eradicate any scalability
advantages a fine-grained program may have.
Communication is also a key issue in multi-core architecture. Wires do not scale as well as gates, making
communication relatively more expensive compared
to computation so optimising communication between
cores on chip becomes important.
This dissertation presents an architecture designed
to enable scalable fine-grained computation that is
communication aware (allowing a programmer to optimise for communication). By combining a tagged memory, where each word is augmented with a presence bit
signifying whether or not data is present in that word,
with a hardware based scheduler, which allows a thread
to wait upon a word becoming present with low overhead, a flexible and scalable architecture well suited
to fine-grained computation can be created, one which
enables this without needing the introduction of many
new architectural features or instructions. Communication is made explicit by enforcing that accesses to a
given area of memory will always go to the same cache,
removing the need for a cache coherency protocol.
The dissertation begins by reviewing the need for
multi-core architectures and discusses the major issues
faced in their construction. It moves on to look at fine-grained computation in particular. The proposed architecture, known as Mamba, is then presented in detail
with several software techniques suitable for use with
it introduced. An FPGA implementation of Mamba is
then evaluated against a similar architecture that lacks
the extensions Mamba has for assisting in fine-grained
computation (namely a memory tagged with presence
bits and a hardware scheduler). Microbenchmarks examining the performance of FIFO based communication, MCS locks (an efficient spin-lock implementation based around queues) and barriers demonstrate
Mamba’s scalability and insensitivity to thread count.
A SAT solver implementation demonstrates that these
benefits have a real impact on an actual application.
UCAM-CL-TR-833
Alan F. Blackwell, Ignatios Charalampidis:
Practice-led design and evaluation of
a live visual constraint language
May 2013, 16 pages, PDF
Abstract: We report an experimental evaluation of
Palimpsest, a novel purely-visual programming language. A working prototype of Palimpsest had been developed following a practice-led process, in order to assess whether tools for use in the visual arts can usefully
be created by adopting development processes that emulate arts practice. This initial prototype was received
more positively by users who have high self-efficacy in
both visual arts and computer use. A number of potential usability improvements are identified, structured
according to the Cognitive Dimensions of Notations
framework.
UCAM-CL-TR-834
John Wickerson:
Concurrent verification for sequential
programs
May 2013, 149 pages, PDF
PhD thesis (Churchill College, December 2012)
Abstract: This dissertation makes two contributions to
the field of software verification. The first explains how
verification techniques originally developed for concurrency can be usefully applied to sequential programs.
The second describes how sequential programs can be
verified using diagrams that have a parallel nature.
The first contribution involves a new treatment
of stability in verification methods based on rely-guarantee. When an assertion made in one thread of a
concurrent system cannot be invalidated by the actions
of other threads, that assertion is said to be ‘stable’.
Stability is normally enforced through side-conditions
on rely-guarantee proof rules. This dissertation instead
proposes to encode stability information into the syntactic form of the assertion. This approach, which we
call explicit stabilisation, brings several benefits. First,
we empower rely-guarantee with the ability to reason
about library code for the first time. Second, when
the rely-guarantee method is redeployed in a sequential
setting, explicit stabilisation allows more details of a
module’s implementation to be hidden when verifying
clients. Third, explicit stabilisation brings a more nuanced understanding of the important issue of stability
in concurrent and sequential verification; such an understanding grows ever more important as verification
techniques grow ever more complex.
The second contribution is a new method of presenting program proofs conducted in separation logic.
Building on work by Jules Bean, the ribbon proof is
a diagrammatic alternative to the standard ‘proof outline’. By emphasising the structure of a proof, ribbon
proofs are intelligible and hence pedagogically useful.
Because they contain less redundancy than proof outlines, and allow each proof step to be checked locally,
they are highly scalable; this we illustrate with a ribbon proof of the Version 7 Unix memory manager.
Where proof outlines are cumbersome to modify, ribbon proofs can be visually manoeuvred to yield proofs
of variant programs. We describe the ribbon proof system, prove its soundness and completeness, and outline a prototype tool for mechanically checking the diagrams it produces.
UCAM-CL-TR-835
Maximilian C. Bolingbroke:
Call-by-need supercompilation
May 2013, 230 pages, PDF
PhD thesis (Robinson College, April 2013)
Abstract: This thesis shows how supercompilation, a
powerful technique for transformation and analysis of
functional programs, can be effectively applied to a
call-by-need language. Our setting will be core calculi
suitable for use as intermediate languages when compiling higher-order, lazy functional programming languages such as Haskell.
We describe a new formulation of supercompilation
which is more closely connected to operational semantics than the standard presentation. As a result of this
connection, we are able to exploit a standard Sestoft-style operational semantics to build a supercompiler
which, for the first time, is able to supercompile a call-by-need language with unrestricted recursive let bindings.
We give complete descriptions of all of the (surprisingly tricky) components of the resulting supercompiler,
showing in detail how standard formulations of supercompilation have to be adapted for the call-by-need setting.
We show how the standard technique of generalisation can be extended to the call-by-need setting. We also
describe a novel generalisation scheme which is simpler
to implement than standard generalisation techniques,
and describe a completely new form of generalisation
which can be used when supercompiling a typed language to ameliorate the phenomenon of supercompilers
overspecialising functions on their type arguments.
We also demonstrate a number of non-generalisation-based techniques that can be used
to improve the quality of the code generated by the
supercompiler. Firstly, we show how let-speculation
can be used to ameliorate the effects of the work-duplication checks that are inherent to call-by-need
supercompilation. Secondly, we demonstrate how the
standard idea of ‘rollback’ in supercompilation can be
adapted to our presentation of the supercompilation
algorithm.
We have implemented our supercompiler as an optimisation pass in the Glasgow Haskell Compiler. We
perform a comprehensive evaluation of our implementation on a suite of standard call-by-need benchmarks.
We improve the runtime of the benchmarks in our suite
by a geometric mean of 42%, and reduce the amount of
memory which the benchmarks allocate by a geometric
mean of 34%.
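As a rough illustration of how a summary figure such as "a geometric mean of 42%" can be computed, the sketch below takes per-benchmark ratios of optimised to baseline runtime and reports their geometric mean. The ratios are invented for the example and are not the thesis's data, and reading the figure as a reduction in runtime is an assumption.

```python
import math

def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical optimised/baseline runtime ratios for three benchmarks
# (illustrative values only, not measurements from the report).
runtime_ratios = [0.5, 0.8, 0.4]

mean_ratio = geometric_mean(runtime_ratios)
improvement = 1.0 - mean_ratio  # fraction by which runtime is reduced

print(f"geometric-mean runtime ratio: {mean_ratio:.3f}")
print(f"geometric-mean improvement:   {improvement:.1%}")
```

The geometric mean is the usual choice for averaging ratios because it treats a halving and a doubling symmetrically, which the arithmetic mean does not.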
UCAM-CL-TR-836
Janina Voigt, Alan Mycroft:
Aliasing contracts: a dynamic
approach to alias protection
June 2013, 27 pages, PDF
Abstract: Object-oriented programming languages allow multiple variables to refer to the same object, a situation known as aliasing. Aliasing is a powerful tool
which enables sharing of objects across a system. However, it can cause serious encapsulation breaches if not
controlled properly; through aliasing, internal parts of
aggregate objects can be exposed and potentially modified by any part of the system.
A number of schemes for controlling aliasing have
been proposed, including Clarke et al.’s ownership
types and Boyland et al.’s capabilities. However, many
existing systems lack flexibility and expressiveness,
making it difficult in practice to program common idioms or patterns which rely on sharing, such as iterators.
We introduce aliasing contracts, a dynamic alias
protection scheme which is highly flexible and expressive. Aliasing contracts allow developers to express assumptions about which parts of a system can access
particular objects. Aliasing contracts attempt to be a
universal approach to alias protection; they can be used
to encode various existing schemes.
UCAM-CL-TR-837
Hamed Haddadi, Richard Mortier,
Derek McAuley, Jon Crowcroft:
Human-data interaction
June 2013, 9 pages, PDF
Abstract: The time has come to recognise the emerging
topic of Human-Data Interaction (HDI). It arises from
the need, both ethical and practical, to engage users
to a much greater degree with the collection, analysis,
and trade of their personal data, in addition to providing them with an intuitive feedback mechanism. HDI
is inherently inter-disciplinary, encapsulating elements
not only of traditional computer science ranging across
data processing, systems design, visualisation and interaction design, but also of law, psychology, behavioural
economics, and sociology. In this short paper we elaborate the motivation for studying the nature and dynamics of HDI, and we give some thought to challenges and
opportunities in developing approaches to this novel
discipline.
UCAM-CL-TR-838
Silvia Breu:
Mining and tracking in evolving
software
June 2013, 104 pages, PDF
PhD thesis (Newnham College, April 2011)
Abstract: Every large program contains a small fraction of functionality that resists clean encapsulation.
For example, code for debugging or locking is hard to
keep hidden using object-oriented mechanisms alone.
This problem gave rise to aspect-oriented programming: such cross-cutting functionality is factored out
into so-called aspects and these are woven back into
mainline code during compilation. However, for existing software systems to benefit from AOP, the cross-cutting concerns must be identified first (aspect mining)
before the system can be re-factored into an aspect-oriented design.
This thesis on mining and tracking cross-cutting
concerns makes three contributions: firstly, it presents
aspect mining as both a theoretical idea and a practical and scalable application. By analysing where developers add code to a program, our history-based aspect mining (HAM) identifies and ranks cross-cutting
concerns. Its effectiveness and high precision were evaluated using industrial-sized open-source projects such
as ECLIPSE.
Secondly, the thesis takes the work on software
evolution one step further. Knowledge about a concern’s implementation can become invalid as the system
evolves. We address this problem by defining structural
and textual patterns among the elements identified as
relevant to a concern’s implementation. The inferred
patterns are documented as rules that describe a concern in a formal (intensional) rather than a merely textual (extensional) manner. These rules can then be used
to track an evolving concern’s implementation in conjunction with the development history.
Finally, we implemented this technique for Java in
an Eclipse plug-in called ISIS4J and evaluated it using a
number of concerns. For that we again used the development history of an open-source project. The evaluation shows not only the effectiveness of our approach,
but also to what extent our approach supports the
tracking of a concern’s implementation despite, for example, program code extensions or refactorings.
UCAM-CL-TR-839
Colin Kelly:
Automatic extraction of property
norm-like data from large text
corpora
September 2013, 154 pages, PDF
PhD thesis (Trinity Hall, September 2012)
Abstract: Traditional methods for deriving property-based representations of concepts from text have focused on extracting unspecified relationships (e.g., “car
— petrol”) or only a sub-set of possible relation types,
such as hyponymy/hypernymy (e.g., “car is-a vehicle”)
or meronymy/metonymy (e.g., “car has wheels”).
We propose a number of varied approaches towards
the extremely challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms (in the form “concept relation feature”, e.g.,
“elephant has trunk”, “scissors used for cutting”, “banana is yellow”) from large text corpora. We present
four distinct extraction systems for our task. In our
first two experiments we manually develop syntactic
and lexical rules designed to extract property norm-like
information from corpus text. We explore the impact
of corpus choice, investigate the efficacy of reweighting
our output through WordNet-derived semantic clusters, introduce a novel entropy calculation specific to
our task, and test the usefulness of other classical word-association metrics.
In our third experiment we employ semi-supervised
learning to generalise from our findings thus far, viewing our task as one of relation classification in which
we train a support vector machine on a known set of
property norms. Our feature extraction performance
is encouraging; however the generated relations are
restricted to those found in our training set. Therefore in our fourth and final experiment we use an improved version of our semi-supervised system to initially extract only features for concepts. We then use the
concepts and extracted features to anchor an unconstrained relation extraction stage, introducing a novel
backing-off technique which assigns relations to concept/feature pairs using probabilistic information.
We also develop and implement an array of evaluations for our task. In addition to the previously employed ESSLLI gold standard, we offer five new evaluation techniques: fMRI activation prediction, EEG activation prediction, a conceptual structure statistics evaluation, a human-generated semantic similarity evaluation and a WordNet semantic similarity comparison.
We also comprehensively evaluate our three best systems using human annotators.
Throughout our experiments, our various systems’
output is promising but our final system is by far the
best-performing. When evaluated against the ESSLLI
gold standard it achieves a precision of 44.1%, compared to the 23.9% precision of the current state of
the art. Furthermore, our final system’s Pearson correlation with human-generated semantic similarity measurements is strong at 0.742, and human judges marked
71.4% of its output as correct/plausible.
UCAM-CL-TR-840
Marek Rei:
Minimally supervised
dependency-based methods for
natural language processing
September 2013, 169 pages, PDF
PhD thesis (Churchill College, December 2012)
Abstract: This work investigates minimally-supervised
methods for solving NLP tasks, without requiring explicit annotation or training data. Our motivation is to
create systems that require substantially reduced effort
from domain and/or NLP experts, compared to annotating a corresponding dataset, and also offer easier domain adaptation and better generalisation properties.
We apply these principles to four separate language
processing tasks and analyse their performance compared to supervised alternatives. First, we investigate
the task of detecting the scope of speculative language,
and develop a system that applies manually-defined
rules over dependency graphs. Next, we experiment
with distributional similarity measures for detecting
and generating hyponyms, and describe a new measure
that achieves the highest performance on hyponym generation. We also extend the distributional hypothesis
to larger structures and propose the task of detecting
entailment relations between dependency graph fragments of various types and sizes. Our system achieves
relatively high accuracy by combining distributional
and lexical similarity scores. Finally, we describe a self-learning framework for improving the accuracy of an
unlexicalised parser, by calculating relation probabilities using its own dependency output. The method requires only a large in-domain text corpus and can therefore be easily applied to different domains and genres.
While fully supervised approaches generally achieve
the highest results, our experiments found minimally
supervised methods to be remarkably competitive. By
moving away from explicit supervision, we aim to better understand the underlying patterns in the data, and
to create systems that are not tied to any specific domains, tasks or resources.
UCAM-CL-TR-841
Arjuna Sathiaseelan, Dirk Trossen,
Ioannis Komnios, Joerg Ott, Jon Crowcroft:
Information centric delay tolerant
networking: an internet architecture
for the challenged
September 2013, 11 pages, PDF
Abstract: Enabling universal Internet access is one of
the key issues that is currently being addressed globally. However the existing Internet architecture is seriously challenged to ensure universal service provisioning. This technical report puts forth our vision to
make the Internet more accessible by architecting a universal communication architectural framework combining two emerging architecture and connectivity approaches: Information Centric Networking (ICN) and
Delay/Disruption Tolerant Networking (DTN). Such
a unified architecture will aggressively seek to widen
the connectivity options and provide flexible service
models beyond what is currently pursued in the field
of universal service provisioning.
UCAM-CL-TR-842
Helen Yannakoudakis:
Automated assessment of
English-learner writing
October 2013, 151 pages, PDF
PhD thesis (Wolfson College, December 2012)
Abstract: In this thesis, we investigate automated assessment (AA) systems of free text that automatically
analyse and score the quality of writing of learners
of English as a second (or other) language. Previous research has employed techniques that measure,
in addition to writing competence, the semantic relevance of a text written in response to a given prompt.
We argue that an approach which does not rely on
task-dependent components or data, and directly assesses learner English, can produce results as good as
prompt-specific models. Furthermore, it has the advantage that it may not require re-training or tuning for
new prompts or assessment tasks. We evaluate the performance of our models against human scores, manually annotated in the Cambridge Learner Corpus, a
subset of which we have released in the public domain
to facilitate further research on the task.
We address AA as a supervised discriminative machine learning problem, investigate methods for assessing different aspects of writing prose, examine their
generalisation to different corpora, and present state-of-the-art models. We focus on scoring general linguistic competence and discourse coherence and cohesion, and report experiments on detailed analysis of
appropriate techniques and feature types derived automatically from generic text processing tools, on their
relative importance and contribution to performance,
and on comparison with different discriminative models, whilst also experimentally motivating novel feature types for the task. Using outlier texts, we examine
and address validity issues of AA systems and, more
specifically, their robustness to subversion by writers
who understand something of their workings. Finally,
we present a user interface that visualises and uncovers
the ‘marking criteria’ represented in AA models, that
is, textual features identified as highly predictive of a
learner’s level of attainment. We demonstrate how the
tool can support their linguistic interpretation and enhance hypothesis formation about learner grammars, in
addition to informing the development of AA systems
and further improving their performance.
UCAM-CL-TR-843
Robin Message:
Programming for humans:
a new paradigm for domain-specific
languages
November 2013, 140 pages, PDF
PhD thesis (Robinson College, March 2013)
Abstract: Programming is a difficult, specialist skill. Despite much research in software engineering, programmers still work like craftsmen or artists, not engineers.
As a result, programs cannot easily be modified, joined
together or customised. However, unlike a craft product, once a programmer has created their program, it
can be replicated infinitely and perfectly. This means that
programs are often not a good fit for their end-users
because this infinite duplication gives their creators an
incentive to create very general programs.
My thesis is that we can create better paradigms,
languages and data structuring techniques to enable
end-users to create their own programs.
The first contribution is a new paradigm for programming languages which explicitly separates control
and data flow. For example, in a web application, the
control level would handle user clicks and database
writes, while the data level would handle form inputs
and database reads. The language is strongly typed,
with type reconstruction. We believe this paradigm is
particularly suited to end-user programming of interactive applications.
The second contribution is an implementation of
this paradigm in a specialised visual programming language for novice programmers to develop web applications. We describe our programming environment,
which has a novel layout algorithm that maps control
and data flow onto separate dimensions. We show that
experienced programmers are more productive in this
system than the alternatives.
The third contribution is a novel data structuring
technique which infers fuzzy types from example data.
This inference is theoretically founded on Bayesian
statistics. Our inference aids programmers in moving
from semi-structured data to typed programs. We discuss how this data structuring technique could be visualised and integrated with our visual programming
environment.
UCAM-CL-TR-844
Wei Ming Khoo:
Decompilation as search
November 2013, 119 pages, PDF
PhD thesis (Hughes Hall, August 2013)
Abstract: Decompilation is the process of converting
programs in a low-level representation, such as machine code, into high-level programs that are human
readable, compilable and semantically equivalent. The
current de facto approach to decompilation is largely
modelled on compiler theory and only focusses on one
or two of these desirable goals at a time.
This thesis makes the case that decompilation is
more effectively accomplished through search. It is observed that software development is seldom a clean
slate process and much software is available in public
repositories. To back this claim, evidence is presented
from three categories of software development: corporate software development, open source projects and
malware creation. Evidence strongly suggests that code
reuse is prevalent in all categories.
Two approaches to search-based decompilation are
proposed. The first approach borrows inspiration from
information retrieval, and constitutes the first contribution of this thesis. It uses instruction mnemonics,
control-flow sub-graphs and data constants, which can
be quickly extracted from a disassembly, and relies on
the popular text search engine CLucene. The time taken
to analyse a function is small enough to be practical and
the technique achieves an F2 measure of above 83.0%
for two benchmarks.
The second approach and contribution of this thesis
is perturbation analysis, which is able to differentiate
between algorithms implementing the same functionality, e.g. bubblesort versus quicksort, and between different implementations of the same algorithm, e.g. quicksort from Wikipedia versus quicksort from Rosetta
code. Test-based indexing (TBI) uses random testing
to characterise the input-output behaviour of a function; perturbation-based indexing (PBI) is TBI with additional input-output behaviour obtained through perturbation analysis. TBI/PBI achieves an F2 measure of
88.4% on five benchmarks involving different compilers and compiler options.
To perform perturbation analysis, function prototyping is needed, the standard way comprising liveness and reaching-definitions analysis. However, it is
observed that in practice actual prototypes fall into one
of a few possible categories, enabling the type system to
be simplified considerably. The third and final contribution is an approach to prototype recovery that follows
the principle of conformant execution, in the form of
inlined data source tracking, to infer arrays, pointer-to-pointers and recursive data structures.
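For readers unfamiliar with the F2 measure quoted in this abstract, the following sketch computes it, assuming the thesis uses the standard F-beta definition with beta = 2, which weights recall more heavily than precision. The precision and recall values are hypothetical and are not results from the thesis.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 2 gives the F2 measure.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical retrieval result (illustrative values only).
p, r = 0.75, 0.90
print(f"F1 = {f_beta(p, r, beta=1.0):.3f}")
print(f"F2 = {f_beta(p, r, beta=2.0):.3f}")
```

A recall-weighted score suits a search setting of this kind, where missing a matching function is plausibly more costly than returning an extra candidate for inspection.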
UCAM-CL-TR-845
Stephen Kell:
Black-box composition of
mismatched software components
December 2013, 251 pages, PDF
PhD thesis (Christ’s College, December 2010)
Abstract: Software is expensive to develop. Much of
that expense can be blamed on difficulties in combining, integrating or re-using separate pieces of software,
and in maintaining such compositions. Conventional
development tools approach composition in an inherently narrow way. Specifically, they insist on modules
that are plug-compatible, meaning that they must fit
together down to a very fine level of detail, and that
are homogeneous, meaning that they must be written according to the same conventions and (usually)
in the same programming language. In summary, modules must have matched interfaces to compose. These
inflexibilities, in turn, motivate more software creation
and concomitant expense: they make programming approaches based on integration and re-use unduly expensive. This means that reimplementation from scratch is
often chosen in preference to adaptation of existing implementations.
This dissertation presents several contributions towards lessening this problem. It centres on the design of a new special-purpose programming language,
called Cake. This language is specialised to the task of
describing how components having mismatched interfaces (i.e., not plug-compatible, and perhaps not homogeneous) may be adapted so that they compose as
required. It is a language significantly more effective at
capturing relationships between mismatched interfaces
than general-purpose programming languages. Firstly,
we outline the language’s design, which centres on reconciling interface differences in the form of high-level correspondence rules which relate different interfaces. Secondly, since Cake is designed to be a practical tool
which is convenient and easily integrated
under existing development practices, we describe an
implementation of Cake in detail and explain how it
achieves this integration. Thirdly, we evaluate Cake on
real tasks: by applying it to integration tasks which
have already been performed under conventional approaches, we draw meaningful comparisons demonstrating a smaller (quantitative) size of required code
and lesser (qualitative) complexity of the code that is
required. Finally, Cake applies to a wide range of input
components; we sketch extensions to Cake which render it capable of composing components that are heterogeneous with respect to a carefully identified set of
stylistic concerns which we describe in detail.
UCAM-CL-TR-846
Daniel Bates:
Exploiting tightly-coupled cores
January 2014, 162 pages, PDF
PhD thesis (Robinson College, July 2013)
Abstract: As we move steadily through the multicore
era, and the number of processing cores on each chip
continues to rise, parallel computation becomes increasingly important. However, parallelising an application is often difficult because of dependencies between different regions of code which require cores
to communicate. Communication is usually slow compared to computation, and so restricts the opportunities
for profitable parallelisation. In this work, I explore the
opportunities provided when communication between
cores has a very low latency and low energy cost. I
observe that there are many different ways in which
multiple cores can be used to execute a program, allowing more parallelism to be exploited in more situations, and also providing energy savings in some cases.
Individual cores can be made very simple and efficient
because they do not need to exploit parallelism internally. The communication patterns between cores can
be updated frequently to reflect the parallelism available at the time, allowing better utilisation than specialised hardware which is used infrequently.
In this dissertation I introduce Loki: a homogeneous, tiled architecture made up of many simple,
tightly-coupled cores. I demonstrate the benefits in both
performance and energy consumption which can be
achieved with this arrangement and observe that it is
also likely to have lower design and validation costs
and be easier to optimise. I then determine exactly
where the performance bottlenecks of the design are,
and where the energy is consumed, and look into some
more-advanced optimisations which can make parallelism even more profitable.
UCAM-CL-TR-847
Jatinder Singh, Jean Bacon:
SBUS: a generic policy-enforcing
middleware for open pervasive
systems
February 2014, 20 pages, PDF
Abstract: Currently, application components tend to be
bespoke and closed, running in vertical silos (single applications/systems). To realise the potential of pervasive
systems, and emerging distributed systems more generally, it must be possible to use components system-wide, perhaps in ways and for purposes not envisaged
by their designers. It follows that while the infrastructure and resources underlying applications still require
management, so too do the applications themselves, in
terms of how and when they (inter)operate. To achieve
such context-dependent, personalised operation we believe that the application logic embodied in components
should be separated from the policy that coordinates
them, specifying where and how they should be used.
SBUS is an open, decentralised, application-independent policy-enforcing middleware, developed
towards this aim. To enable the flexible and complex
interactions required by pervasive systems, it supports
a wide range of interaction patterns, including event
driven operation, request-response, and data (message)
streaming, and features a flexible security model. Crucially, SBUS is dynamically reconfigurable, allowing
components to be managed from outside application
logic, by authorised third-parties. This paves the way
for policy-driven systems, where policy can operate
across infrastructure and applications to realise both
traditional and new functionality.
This report details the SBUS middleware and the
role of policy enforcement in enabling pervasive, distributed systems.
UCAM-CL-TR-848
James G. Jardine:
Automatically generating reading lists
February 2014, 164 pages, PDF
PhD thesis (Robinson College, August 2013)
Abstract: This thesis addresses the task of automatically
generating reading lists for novices in a scientific field.
Reading lists help novices to get up to speed in a new
field by providing an expert-directed list of papers to
read. Without reading lists, novices must resort to ad-hoc exploratory scientific search, which is an inefficient
use of time and poses a danger that they might use biased or incorrect material as the foundation for their
early learning.
The contributions of this thesis are fourfold. The
first contribution is the ThemedPageRank (TPR) algorithm for automatically generating reading lists. It combines Latent Topic Models with Personalised PageRank
and Age Adjustment in a novel way to generate reading lists that are of better quality than those generated
by state-of-the-art search engines. TPR is also used in
this thesis to reconstruct the bibliography for scientific
papers. Although not designed specifically for this task,
TPR significantly outperforms a state-of-the-art system
purpose-built for the task. The second contribution is a
gold-standard collection of reading lists against which
TPR is evaluated, and against which future algorithms
can be evaluated. The eight reading lists in the gold standard were produced by experts recruited from two
universities in the United Kingdom. The third contribution is the Citation Substitution Coefficient (CSC), an
evaluation metric for evaluating the quality of reading
lists. CSC is better suited to this task than standard IR
metrics such as precision, recall, F-score and mean average precision because it gives partial credit to recommended papers that are close to gold-standard papers
in the citation graph. This partial credit results in scores
that have more granularity than those of the standard
IR metrics, allowing the subtle differences in the performance of recommendation algorithms to be detected.
The final contribution is a light-weight algorithm for
Automatic Term Recognition (ATR). As will be seen,
technical terms play an important role in the TPR algorithm. This light-weight algorithm extracts technical
terms from the titles of documents without the need
for the complex apparatus required by most state-of-the-art ATR algorithms. It is also capable of extracting
very long technical terms, unlike many other ATR algorithms.
Four experiments are presented in this thesis. The
first experiment evaluates TPR against state-of-the-art
search engines in the task of automatically generating
reading lists that are comparable to expert-generated
gold-standards. The second experiment compares the
performance of TPR against a purpose-built state-of-the-art system in the task of automatically reconstructing the reference lists of scientific papers. The third experiment involves a user study to explore the ability
of novices to build their own reading lists using two
fundamental components of TPR: automatic technical
term recognition and topic modelling. A system exposing only these components is compared against a state-of-the-art scientific search engine. The final experiment
is a user study that evaluates the technical terms discovered by the ATR algorithm and the latent topics generated
by TPR. The study enlists thousands of users of
Qiqqa, research management software independently
written by the author of this thesis.
UCAM-CL-TR-849
Marcelo Bagnulo Braun, Jon Crowcroft:
SNA: Sourceless Network
Architecture
March 2014, 12 pages, PDF
Abstract: Why are there source addresses in datagrams?
What alternative architecture can one conceive to provide
all of the current, and some new, functionality currently
dependent on a conflicting set of uses for this
field? We illustrate how this can be achieved by
re-interpreting the 32-bit field in IPv4 headers to help the
Internet solve a range of current and future problems.
UCAM-CL-TR-850
Robert N.M. Watson, Peter G. Neumann,
Jonathan Woodruff, Jonathan Anderson,
David Chisnall, Brooks Davis, Ben Laurie,
Simon W. Moore, Steven J. Murdoch,
Michael Roe:
Capability Hardware
Enhanced RISC Instructions:
CHERI Instruction-set architecture
April 2014, 131 pages, PDF
Abstract: This document describes the rapidly maturing
design for the Capability Hardware Enhanced RISC Instructions (CHERI) Instruction-Set Architecture (ISA),
which is being developed by SRI International and the
University of Cambridge. The document is intended to
capture our evolving architecture, as it is being refined,
tested, and formally analyzed. We have now reached
70% of the time for our research and development cycle.
CHERI is a hybrid capability-system architecture
that combines new processor primitives with the commodity 64-bit RISC ISA enabling software to efficiently implement fine-grained memory protection
and a hardware-software object-capability security
model. These extensions support incrementally adoptable, high-performance, formally based, programmer-friendly
underpinnings for fine-grained software decomposition and compartmentalization, motivated by
and capable of enforcing the principle of least privilege.
The CHERI system architecture purposefully addresses
known performance and robustness gaps in commodity
ISAs that hinder the adoption of more secure programming models centered around the principle of least privilege. To this end, CHERI blends traditional paged virtual memory with a per-address-space capability model
that includes capability registers, capability instructions,
and tagged memory that have been added to the
64-bit MIPS ISA via a new capability coprocessor.
CHERI’s hybrid approach, inspired by the Capsicum security model, allows incremental adoption of
capability-oriented software design: software implementations that are more robust and resilient can be
deployed where they are most needed, while leaving
less critical software largely unmodified, but nevertheless suitably constrained to be incapable of having
adverse effects. For example, we are focusing conversion
efforts on low-level TCB components of the system: separation kernels, hypervisors, operating system
kernels, language runtimes, and userspace TCBs such
as web browsers. Likewise, we see early-use scenarios
(such as data compression, image processing, and
video processing) that relate to particularly high-risk
software libraries, which are concentrations of both
complex and historically vulnerability-prone code
combined with untrustworthy data sources, while leaving
containing applications unchanged.
This report describes the CHERI architecture and
design, and provides reference documentation for the
CHERI instruction-set architecture (ISA) and potential
memory models, along with their requirements. It also
documents our current thinking on integration of
programming languages and operating systems. Our
ongoing research includes two prototype processors
employing the CHERI ISA, each implemented as an FPGA soft
core specified in the Bluespec hardware description
language (HDL), for which we have integrated the
application of formal methods to the Bluespec specifications
and the hardware-software implementation.
UCAM-CL-TR-851
Robert N.M. Watson, David Chisnall,
Brooks Davis, Wojciech Koszek,
Simon W. Moore, Steven J. Murdoch,
Peter G. Neumann, Jonathan Woodruff:
Capability Hardware
Enhanced RISC Instructions:
CHERI User’s guide
April 2014, 26 pages, PDF
Abstract: The CHERI User’s Guide documents the software environment for the Capability Hardware Enhanced RISC Instructions (CHERI) prototype developed by SRI International and the University of Cambridge. The User’s Guide is targeted at hardware and
software developers working with capability-enhanced
software. It describes the CheriBSD operating system,
a version of the FreeBSD operating system that has
been adapted to support userspace capability systems
via the CHERI ISA, and the CHERI Clang/LLVM compiler suite. It also describes the earlier Deimos demonstration microkernel.
UCAM-CL-TR-852
Robert N.M. Watson, Jonathan Woodruff,
David Chisnall, Brooks Davis,
Wojciech Koszek, A. Theodore Markettos,
Simon W. Moore, Steven J. Murdoch,
Peter G. Neumann, Robert Norton,
Michael Roe:
Bluespec Extensible RISC
Implementation: BERI Hardware
reference
April 2014, 76 pages, PDF
Abstract: The BERI Hardware Reference documents
the Bluespec Extensible RISC Implementation (BERI)
developed by SRI International and the University
of Cambridge. The reference is targeted at hardware
and software developers working with the BERI1 and
BERI2 processor prototypes in simulation and synthesized to FPGA targets. We describe how to use the
BERI1 and BERI2 processors in simulation, the BERI1
debug unit, the BERI unit-test suite, how to use BERI
with Altera FPGAs and Terasic DE4 boards, the 64-bit MIPS and CHERI ISAs implemented by the prototypes, the BERI1 and BERI2 processor implementations themselves, and the BERI Programmable Interrupt Controller (PIC).
UCAM-CL-TR-853
Robert N.M. Watson, David Chisnall,
Brooks Davis, Wojciech Koszek,
Simon W. Moore, Steven J. Murdoch,
Peter G. Neumann, Jonathan Woodruff:
Bluespec Extensible RISC
Implementation: BERI Software
reference
April 2014, 34 pages, PDF
Abstract: The BERI Software Reference documents
how to build and use FreeBSD on the Bluespec Extensible RISC Implementation (BERI) developed by SRI International and the University of Cambridge. The reference is targeted at hardware and software programmers
who will work with BERI or BERI-derived systems.
UCAM-CL-TR-854
Dominic Orchard:
Programming contextual
computations
May 2014, 223 pages, PDF
PhD thesis (Jesus College, January 2013)
Abstract: Modern computer programs are executed in
a variety of different contexts: on servers, handheld devices, graphics cards, and across distributed environments, to name a few. Understanding a program’s contextual requirements is therefore vital for its correct execution. This dissertation studies contextual computations, ranging from application-level notions of context
to lower-level notions of context prevalent in common
programming tasks. It makes contributions in three areas: mathematically structuring contextual computations, analysing contextual program properties, and designing languages to facilitate contextual programming.
Firstly, existing work which mathematically structures contextual computations using comonads (in programming and semantics) is analysed and extended.
Comonads are shown to exhibit a shape preservation
property which restricts their applicability to a subset of contextual computations. Subsequently, novel
generalisations of comonads are developed, including
the notion of an indexed comonad, relaxing shape-preservation restrictions.
Secondly, a general class of static analyses called coeffect systems is introduced to describe the propagation
of contextual requirements throughout a program. Indexed comonads, with some additional structure, are
shown to provide a semantics for languages whose contextual properties are captured by a coeffect analysis.
Finally, language constructs are presented to ease
the programming of contextual computations. The benefits of these language features, the mathematical structuring, and coeffect systems are demonstrated by a language for container programming which guarantees optimisations and safety invariants.
UCAM-CL-TR-855
Patrick K.A. Wollner, Isak Herman,
Haikal Pribadi, Leonardo Impett,
Alan F. Blackwell:
Mephistophone
June 2014, 8 pages, PDF
Abstract: The scope of this project is the creation of
a controller for composition, performance and interaction with sound. Interactions can be classified into one
of three types: (i) end-user triggering, controlling, editing, and manipulation of sounds with varying temporal dimensions; (ii) inclusion of multi-sensor feedback
mechanisms including end-user biological monitoring;
and (iii) integration of sensed, semi-random, environmental factors as control parameters to the output of
the system.
The development of the device has been completed
in two stages: (i) conceptual scoping has defined the interaction space for the development of this machine;
(ii) prototype development has resulted in the creation
of a functioning prototype and culminated in a series of
live performances. The final stage presupposes a custom
interaction design for each artistic partner, reinforcing
the conceptual role of the device as a novel mechanism
for personalized, visualizable, tangible interaction with
sound.
UCAM-CL-TR-856
Awais Athar:
Sentiment analysis of scientific
citations
June 2014, 114 pages, PDF
PhD thesis (Girton College, April 2014)
Abstract: While there has been growing interest in the
field of sentiment analysis for different text genres in
the past few years, relatively less emphasis has been
placed on extraction of opinions from scientific literature, more specifically, citations. Citation sentiment detection is an attractive task as it can help researchers in
identifying shortcomings and detecting problems in a
particular approach, determining the quality of a paper
for ranking in citation indexes by including negative citations in the weighting scheme, and recognising issues
that have not been addressed as well as possible gaps in
current research approaches.
Current approaches assume that the sentiment
present in the citation sentence represents the true sentiment of the author towards the cited paper and do
not take further informal mentions of the citations elsewhere in the article into account. There have also been
no attempts to evaluate citation sentiment on a large
corpus.
This dissertation focuses on the detection of sentiment towards the citations in a scientific article. The
detection is performed using the textual information
from the article. I address three sub-tasks and present
new large corpora for each of the tasks.
Firstly, I explore different feature sets for detection
of sentiment in explicit citations. For this task, I present
a new annotated corpus of more than 8,700 citation
sentences which have been labelled as positive, negative or objective towards the cited paper. Experimenting with different feature sets, I show the best result of
micro-F score 0.760 is obtained using n-grams of length
and dependency relations.
Secondly, I show that the assumption that sentiment is limited only to the explicit citation is incorrect. I present a citation context corpus where more
than 200,000 sentences from 1,034 paper-reference
pairs have been annotated for sentiment. These sentences contain 1,741 citations towards 20 cited papers.
I show that including the citation context in the analysis increases the subjective sentiment by almost 185%.
I propose new features which help in extracting the citation context and examine their effect on sentiment
analysis.
Thirdly, I tackle the task of identifying significant
citations. I propose features which help discriminate
these from citations in passing, and show that they
provide statistically significant improvements over a
rule-based baseline.
UCAM-CL-TR-857
Heidi Howard:
ARC: Analysis of Raft Consensus
July 2014, 69 pages, PDF
BA dissertation (Pembroke College, May 2014)
Abstract: The Paxos algorithm, despite being synonymous with distributed consensus for a decade, is famously difficult to reason about and implement due
to its non-intuitive approach and underspecification.
In response, this project implemented and evaluated a
framework for constructing fault-tolerant applications,
utilising the recently proposed Raft algorithm for distributed consensus. Constructing a simulation framework for our implementation enabled us to evaluate the
protocol on everything from understandability and efficiency to correctness and performance in diverse network environments. We propose a range of optimisations to the protocol and have released to the community
a testbed for developing further optimisations and investigating optimal protocol parameters for real-world
deployments.
UCAM-CL-TR-858
Jonathan D. Woodruff:
CHERI: A RISC capability machine
for practical memory safety
July 2014, 112 pages, PDF
PhD thesis (Clare Hall, March 2014)
Abstract: This work presents CHERI, a practical extension of the 64-bit MIPS instruction set to support
capabilities for fine-grained memory protection.
Traditional paged memory protection has proved inadequate in the face of escalating security threats and
proposed solutions include fine-grained protection tables (Mondrian Memory Protection) and hardware fat-pointer protection (Hardbound). These have emphasised transparent protection for C executables but have
lacked flexibility and practicality. Intel’s recent memory
protection extensions (iMPX) attempt to adopt some
of these ideas and are flexible and optional but lack the
strict correctness of these proposals.
Capability addressing has been the classical solution to efficient and strong memory protection but it
has been thought to be incompatible with common instruction sets and also with modern program structure
which uses a flat memory space with global pointers.
CHERI is a fusion of capabilities with a paged flat
memory producing a program-managed fat pointer capability model. This protection mechanism scales from
application sandboxing to efficient byte-level memory
safety with per-pointer permissions. I present an extension to the 64-bit MIPS architecture on FPGA that runs
standard FreeBSD and supports self-segmenting applications in user space.
Unlike other recent proposals, the CHERI implementation is open-source and of sufficient quality to
support software development as well as community
extension of this work. I compare with published memory safety mechanisms and demonstrate competitive
performance while providing assurance and greater
flexibility with simpler hardware requirements.
UCAM-CL-TR-859
Lucian Carata, Oliver Chick, James Snee,
Ripduman Sohan, Andrew Rice,
Andy Hopper:
Resourceful: fine-grained resource
accounting for explaining service
variability
September 2014, 12 pages, PDF
Abstract: Increasing server utilization in modern datacenters also increases the likelihood of contention on
physical resources and unexpected behavior due to
side-effects from interfering applications. Existing resource accounting mechanisms are too coarse-grained
to allow services to track the causes of such variations in their execution. We make the case for measuring resource consumption at system-call level and outline the design of Resourceful, a system that offers applications the ability to query this data at runtime
with low overhead, accounting for costs incurred both
synchronously and asynchronously after a given call.