Database System Concepts, 5th Ed

Document technical information

Format ppt
Size 832.0 kB
First found May 22, 2018

Document content analysis

Category Also themed
Language
English
Type
not defined
Concepts
no text concepts found

Persons

Organizations

Places

Transcript

Chapter 11: Storage and File Structure
Database System Concepts, 5th Ed.
Chapter 11: Storage and File Structure
 Overview of Physical Storage Media
 Storage Access
 File Organization
 Organization of Records in Files
 Data-Dictionary Storage
Database System Concepts - 5th Edition, Oct 23, 2005.
11.2
©Silberschatz, Korth and Sudarshan
Overview of Physical Storage Media 物理存储
介质
 Magnetic Disks
 RAID
磁盘
磁盘阵列
 Tertiary Storage
第三存储
 Storage Access
存取 **
 File Organization 文件结构 **
 Organization of Records in Files 记录结构 **
 Data-Dictionary Storage
数据字典 **
 Storage Structures for Object-Oriented Databases
Database System Concepts - 5th Edition, Oct 23, 2005.
11.3
©Silberschatz, Korth and Sudarshan
Classification of Physical Storage
Media
 Speed with which data can be accessed
 Cost per unit of data
 Reliability

data loss on power failure or system crash

physical failure of the storage device
 Can differentiate storage into:

volatile storage: loses contents when power is
switched off 易逝,挥发 RAM

non-volatile storage:

Contents persist even when power is
switched off.

Includes secondary and tertiary storage, as
well as batter-backed up main-memory. 磁盘,
光盘,磁带,闪存
Database System Concepts - 5th Edition, Oct 23, 2005.
11.4
©Silberschatz, Korth and Sudarshan
Physical Storage Media
 Cache – fastest and most costly form of storage; volatile; managed
by the computer system hardware. 快, 易逝, 硬件
•高速高价
•低速低价
•设备
•设备
•多种 Cache:
•CPU-与内存
•内存与磁盘
•OS与DBMS
Database System Concepts - 5th Edition, Oct 23, 2005.
11.5
©Silberschatz, Korth and Sudarshan
Physical Storage Media
 Main memory: 主存储

fast access (10s to 100s of nanoseconds; 1 nanosecond = 10–9
seconds)
10纳秒

generally too small (or too expensive) to store the entire
database 不能存整个DB
 capacities
of up to a few Gigabytes widely used currently
 Capacities
have gone up and per-byte costs have decreased
steadily and rapidly (roughly factor of 2 every 2 to 3 years)

Volatile — contents of main memory are usually lost if a power
failure or system crash occurs.易逝,
Database System Concepts - 5th Edition, Oct 23, 2005.
11.6
©Silberschatz, Korth and Sudarshan
Physical Storage Media (Cont.)
 Flash memory 闪存

Data survives power failure 非易逝

Data can be written at a location only once, but location can be
erased and written to again 可重写
 有限次读写
目前质量: 1-10万次,够用了

Reads are roughly as fast as main memory 读快

写稍微慢

贵,

用于嵌入式设备,数字相机 MP3

also known as EEPROM (Electrically Erasable Programmable
Read-Only Memory)
Database System Concepts - 5th Edition, Oct 23, 2005.
11.7
©Silberschatz, Korth and Sudarshan
Physical Storage Media (Cont.)
磁盘
Primary, long-term , entire database.
 slower than memory
direct-access –
Hard disks vs floppy disks
100 GB currently
 Magnetic-disk




 Survives
power failures and system crashes
 disk failure can destroy data, but is very rare
Database System Concepts - 5th Edition, Oct 23, 2005.
11.8
©Silberschatz, Korth and Sudarshan
Physical Storage Media (Cont.)
 Optical storage 光存储

non-volatile, data is read optically from a spinning
disk using a laser 非易逝

CD-ROM (640 MB) and DVD (4.7 to 17 GB) most
popular forms

Write-one, read-many (WORM) optical disks used
for archival storage (CD-R and DVD-R)

Multiple write versions also available (CD-RW,
DVD-RW, and DVD-RAM)

slower than with magnetic disk

Juke-box systems, 多个可移动盘
Database System Concepts - 5th Edition, Oct 23, 2005.
11.9
©Silberschatz, Korth and Sudarshan
Physical Storage Media (Cont.)
 Tape storage
磁带

non-volatile, used primarily for backup (to recover from disk
failure), and for archival data 非易逝

sequential-access – much slower than disk

40 to 300 GB tapes available 大容量

cheaper than disk, but drives are expensive

Tape jukeboxes available for storing massive amounts of data
海量磁带库
of terabytes (1 terabyte = 109 bytes) to even a
petabyte (1 petabyte = 1012 bytes)
 hundreds
Database System Concepts - 5th Edition, Oct 23, 2005.
11.10
©Silberschatz, Korth and Sudarshan
Storage Hierarchy
•各层
•存储速度
有数量级
的差别
•至少10
倍
慢介质以
廉价和容
量而顽固
地不肯退
出历史舞
台
Database System Concepts - 5th Edition, Oct 23, 2005.
11.11
©Silberschatz, Korth and Sudarshan
Storage Hierarchy (Cont.) 存储分级
 primary storage: 主存 ,快 ,易逝
 secondary storage: next level in hierarchy, non-volatile, moderately
fast access time

also called on-line storage

E.g. flash memory, magnetic disks
闪存,磁盘
 tertiary storage: lowest level in hierarchy, non-volatile, slow access
time 第3存储

also called off-line storage

E.g. magnetic tape, optical storage 磁带光盘
Database System Concepts - 5th Edition, Oct 23, 2005.
11.12
©Silberschatz, Korth and Sudarshan
Magnetic Hard Disk Mechanism
NOTE: Diagram is schematic, and simplifies the structure of actual disk drives
Database System Concepts - 5th Edition, Oct 23, 2005.
11.13
©Silberschatz, Korth and Sudarshan
11.5 Storage Access(存储访问)
 A database file is partitioned into fixed-length storage
units called blocks. Blocks are units of both storage
allocation and data transfer.
 Database system seeks to minimize the number of
block transfers between the disk and memory. We can
reduce the number of disk accesses by keeping as
many blocks as possible in main memory.
 Buffer – portion of main memory available to store
copies of disk blocks.
 Buffer manager – subsystem responsible for allocating
buffer space in main memory.
Database System Concepts - 5th Edition, Oct 23, 2005.
11.14
©Silberschatz, Korth and Sudarshan
11.5.1 Buffer Manager
 Programs call on the buffer manager when they need a
block from disk.
1.
If the block is already in the buffer, buffer manager
returns the address of the block in main memory
2.
If the block is not in the buffer, the buffer manager
1.
Allocates space in the buffer for the block
1. Replacing (throwing out) some other block, if
required, to make space for the new block.
2. Replaced block written back to disk only if it
was modified since the most recent time that
it was written to/fetched from the disk.
2.
Reads the block from the disk to the buffer, and
returns the address of the block in main
memory to requester.
Database System Concepts - 5th Edition, Oct 23, 2005.
11.15
©Silberschatz, Korth and Sudarshan
11.5.2 Buffer-Replacement Policies
 Most operating systems replace the block least recently used (LRU
strategy)
 Idea behind LRU – use past pattern of block references as a
predictor of future references
 Queries have well-defined access patterns (such as sequential
scans), and a database system can use the information in a user’s
query to predict future references

LRU can be a bad strategy for certain access patterns involving
repeated scans of data
 For
example: when computing the join of 2 relations r and s
by a nested loops
for each tuple tr of r do
for each tuple ts of s do
if the tuples tr and ts match …

Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
Database System Concepts - 5th Edition, Oct 23, 2005.
11.16
©Silberschatz, Korth and Sudarshan
Buffer-Replacement Policies (Cont.)
 Pinned block(被钉住的块) – memory block that is not
allowed to be written back to disk.
 Most recently used (MRU) strategy – system must pin the
block currently being processed. After the final tuple of that
block has been processed, the block is unpinned, and it
becomes the most recently used block.
 Buffer manager can use statistical information regarding the
probability that a request will reference a particular relation
 Buffer managers also support forced output (强制写出)of
blocks for the purpose of recovery
Database System Concepts - 5th Edition, Oct 23, 2005.
11.17
©Silberschatz, Korth and Sudarshan
11.6 File Organization
 The database is stored as a collection of files. Each
file is a sequence of records. A record is a sequence
of fields.
 One approach:
assume
each
record size is fixed
file has records of one particular type only
different
files are used for different relations
This case is easiest to implement; will consider
variable length records later.
Database System Concepts - 5th Edition, Oct 23, 2005.
11.18
©Silberschatz, Korth and Sudarshan
11.6.1 Fixed-Length Records
 Simple approach:

Store record i starting from byte n  (i – 1), where n is the
size of each record.

Record access is simple but records may cross blocks
 Modification:
do not allow records to cross block
boundaries
 Deletion of record i:
alternatives:

move records i + 1, . . ., n
to i, . . . , n – 1

move record n to i

do not move records, but
link all free records on a
free list
Database System Concepts - 5th Edition, Oct 23, 2005.
11.19
©Silberschatz, Korth and Sudarshan
Free Lists(空闲列表)
 Store the address of the first deleted record in the file
header.
 Use this first record to store the address of the second
deleted record, and so on
 Can think of these stored addresses as pointers since they
“point” to the location of a record.
 More space efficient representation: reuse space for
normal attributes of free records to store pointers. (No
pointers stored in in-use records.)
Database System Concepts - 5th Edition, Oct 23, 2005.
11.20
©Silberschatz, Korth and Sudarshan
11.6.2 Variable-Length Records
 Variable-length records arise in database systems
in several ways:

Storage of multiple record types in a file.

Record types that allow variable lengths for one
or more fields.

Record types that allow repeating fields (used in
some older data models).
Database System Concepts - 5th Edition, Oct 23, 2005.
11.21
©Silberschatz, Korth and Sudarshan
Variable-Length Records: Slotted Page
Structure
变长记录 槽口 存页面结构
 Slotted page header contains: 文件头存:

number of record entries 记录个数

end of free space in the block 回收列表结束

location and size of each record每个记录起止

与 光盘目录的组织 有点相似
Database System Concepts - 5th Edition, Oct 23, 2005.
11.22
©Silberschatz, Korth and Sudarshan
 记录在页面内移动,头文件中信息相应修改(可多次写
的光盘是采用 另写目录)
 外界的指针应该指向 文件头中的固定位置
 而不应该指向具体的(可能会变的)记录位置
Database System Concepts - 5th Edition, Oct 23, 2005.
11.23
©Silberschatz, Korth and Sudarshan
Variable-Length Records (Cont.)
 定长 的实现
Fixed-length representation:

reserved space 保留空间

Pointers 指针
 保留空间 Reserved space – 暂时不用的空着
Database System Concepts - 5th Edition, Oct 23, 2005.
11.24
©Silberschatz, Korth and Sudarshan
Organization of Records in Files
记录组织 :堆,顺序,散列 ,聚簇
 Heap – a record can be placed anywhere in the file where there is space
无序,堆,插入策略:无缝堆后,有缝插针,
 Sequential – store records in sequential order, based on the value of the
search key of each record,有序
 Hashing – 影院分单双号Hash(X)= X mod 2,入座时搜索空间减半


宝光寺:数罗汉 .
Hash(Person,sex)=2*(RedomSeed+Age+1)+Sex
 clustering file organization 不同型记录可放一文件内,把相关记录(父子兄弟等
,经常同时检索的)放在同一个磁盘块, 减少磁盘IO,快
 注意 内存比磁盘快 10—100倍,减少读块 次数 是速度关键
Database System Concepts - 5th Edition, Oct 23, 2005.
11.25
©Silberschatz, Korth and Sudarshan
Sequential File Organization
 适用于 处理 序列性,历史性 数据的场合
 按某个或几个属性 排序 search-key
Database System Concepts - 5th Edition, Oct 23, 2005.
11.26
©Silberschatz, Korth and Sudarshan
Sequential File Organization (Cont.)
 删除 – use pointer chains
 插入 – 有缝插针 ,无缝 链溢 overflow block,更新指针
 与时俱进 重排序 reorganize time to time to restore
sequential order
无缝链溢
(溢出块)
Database System Concepts - 5th Edition, Oct 23, 2005.
11.27
©Silberschatz, Korth and Sudarshan
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering
file organization
Database System Concepts - 5th Edition, Oct 23, 2005.
11.28
©Silberschatz, Korth and Sudarshan
Clustering File Organization 聚簇文件
 不同型但相关的数据放在一起 (旧式大家庭: 几代同堂 )
 E.g., clustering organization of customer and
depositor:
 作这两各关系连接时,快
 results in variable size records
Database System Concepts - 5th Edition, Oct 23, 2005.
11.29
©Silberschatz, Korth and Sudarshan
Data Dictionary Storage
数据字典存储
Data dictionary 或 system catalog
stores metadata: data about data, such as :
 Information about relations


names , 属性名,类型,长度 , views,
integrity constraints
 用户账号,口令
 关于DB 本身的统计数据
 物理组织 Physical file organization information
物理存储 (sequential/hash/…)
 Physical location of relation ,DISK ,Block
 Information about indices 索引 (Chapter 12)

Database System Concepts - 5th Edition, Oct 23, 2005.
11.30
©Silberschatz, Korth and Sudarshan
Chapter 11: Storage and File Structure 小结
 Overview of Physical Storage Media 物理存储介质
 Magnetic Disks
 RAID
磁盘
磁盘阵列
 Tertiary Storage
第三存储
 Storage Access
存取
 File Organization 文件结构
 Organization of Records in Files 记录结构
 Data-Dictionary Storage
Database System Concepts - 5th Edition, Oct 23, 2005.
数据字典
11.31
©Silberschatz, Korth and Sudarshan
×

Report this document