Craig S. Mullins |
|
April 2001 |
|
|
DB2 Table Space Options By Craig S. Mullins Although DB2 data is accessed at the table level, those skilled in DB2 database design and administration know that the actual data is actually stored in a structure known as a table space. Each table space correlates to one or more individual physical VSAM data sets that are used to house the actual DB2 data. When designing DB2 databases, DBAs can choose from three types of table spaces, each one useful in different circumstances. The three types of table spaces are:
In general, the predominant table space type to
use for most applications is the segmented table space. Segmented table spaces
provide a good combination of features that mix ease of use and setup with
performance and functionality. Many organizations adhere to standards stating
that new DB2 table spaces should be segmented table spaces unless a compelling
reason exits to choose one of the other table space types. You should consider
using the other types of DB2 table spaces in the following cases:
Partitioning Considerations DB2 can handle up to 254
partitions per table space. The actual limit on number of partitions depends on
the DSSIZE of the table space. Large table spaces are those which specify the
LARGE parameter or have a DSSIZE greater than 4GB. The LARGE parameter was
introduced with V5; DSSIZE with V6. A large table space can have from 1 to 254
partitions. Non-large table spaces are limited to no more than 64 partitions, as
are any table spaces created in a version prior to DB2 V5. For non-LARGE partitioned table
spaces, the number of partitions impacts the maximum size of the data set
partition as follows: Number of Partitions
Maximum Data Set Size 1 to 16
4 GB 17 to 32
2 GB 33 to 64
1 GB Keep these limitations in mind
as you design your partitioned table spaces. As a general rule of thumb try
to define table space partitions such that no one partition is more than 20
percent larger than the next largest partition. This provides even growth, which
eases DASD monitoring and provides approximately even data access requirements
and utility processing times across partitions. This is not a hard-and-fast rule
though, especially when dealing with “hot spots.” The “hot spot”
partition may be much smaller than the other partitions going against the idea
of maintaining evenly distributed partitions. This is okay. Deciding
to use a partitioned table space is not as simple as merely determining the size
of the table. In the early days of DB2, size was the primary consideration for
choosing a partitioned table space. However, as DB2 has matured and the
applications written using DB2 have become modernized, additional considerations
will impact your partitioning decisions. Application-level details, such as data
contention, performance requirements, degree of parallelism, and the volume of
updates to columns in the partitioning index must factor into the decision to
use partitioned table spaces. Sometimes designers try to
avoid partitioned table spaces by dividing a table into multiple tables, each
with its own table space. This is not wise. Never attempt to avoid a partitioned
table space by implementing several smaller table spaces, each containing a
subset of the total amount of data. When proceeding in this manner, the designer
usually places separate tables into each of the smaller table spaces. This
almost always is a bad design decision because it introduces an uncontrolled and
unneeded denormalization. Furthermore, when data that logically belongs in one
table is separated into multiple tables, SQL operations to access the data as a
logical whole are made needlessly complex. One example of this complexity is the
difficulty in enforcing unique keys across multiple tables. Although partitioned
table spaces can introduce additional complexities into your environment, these
complexities never outweigh those introduced by mimicking partitioning with
several smaller, identical table spaces. To clarify why this idea is bad,
consider these two different ways of implementing a three “partition”
solution:
CREATE DB DB_SAMP; CREATE
TABLESPACE TS_SAMP IN DB_SAMP
ERASE NO NUMPARTS 3
(PART 1
USING STOGROUP SG_SAMP1
PRIQTY 2000 SECQTY 50
COMPRESS NO,
PART 2
USING STOGROUP SG_SAMP2
PRIQTY 4000 SECQTY 150
COMPRESS YES,
PART 3
USING STOGROUP SG_SAMP3
PRIQTY 1000
SECQTY 50
COMPRESS YES)
LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO; CREATE TABLE TB_SAMP . . . IN
DB_SAMP.TS_SAMP;
CREATE DB DB_SAMP; CREATE
TABLESPACE TS_SAMP1 IN DB_SAMP
USING STOGROUP SG_SAMP1
PRIQTY 2000 SECQTY 50
ERASE NO COMPRESS NO
LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO; CREATE
TABLESPACE TS_SAMP2 IN DB_SAMP
USING STOGROUP SG_SAMP2
PRIQTY 4000 SECQTY 150
ERASE NO COMPRESS YES
LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO; CREATE
TABLESPACE TS_SAMP3 IN DB_SAMP
USING STOGROUP SG_SAMP3
PRIQTY 1000
SECQTY 50
ERASE NO COMPRESS YES
LOCKSIZE PAGE BUFFERPOOL BP1 CLOSE NO; CREATE TABLE TB_SAMP1 . . . IN
DB_SAMP.TS_SAMP1; CREATE TABLE TB_SAMP2 . . . IN
DB_SAMP.TS_SAMP2; CREATE TABLE TB_SAMP3 . . . IN
DB_SAMP.TS_SAMP3; Now
consider how difficult it would be to retrieve data in the second implementation
if you did not know which “partition” the data resides in, or if the data
could reside in multiple partitions. Using
the first example a simple SELECT will work. SELECT
* FROM
TB_SAMP WHERE
COL1 = :HOST-VARIABLE; In
the second example, a UNION is required. SELECT
* FROM
TB_SAMP1 WHERE
COL1 = :HOST-VARIABLE UNION
ALL SELECT
* FROM
TB_SAMP2 WHERE
COL1 = :HOST-VARIABLE UNION
ALL SELECT
* FROM
TB_SAMP3 WHERE
COL1 = :HOST-VARIABLE; If
other tables need to be joined the “solution” becomes even more complex.
Likewise if data must be updated, inserted, or deleted and you do not know which
“partition” contains the impacted data. The bottom line: avoid bypassing DB2
partitioning using your own pseudo-partitions. Partitioning Pros and Cons Before
deciding to partition a table space, weigh the pros and cons. Consult the
following list of advantages and disadvantages before implementation: Advantages
of a partitioned table space:
Disadvantages
of a partitioned table space:
In
general, partitioned table spaces are becoming more useful. You might even want
to consider using partitioning for most table spaces (instead of segmented),
especially if parallelism is an issue. At least, consider partitioning table
spaces that are accessed in a read only manner by long-running batch programs.
Of course, very small table spaces are rarely viable candidates for
partitioning, even with DB2’s advanced I/O, CPU, and Sysplex parallelism
features. This is true because the smaller the amount of data to access, the
more difficult it is to break it into pieces large enough such that concurrent,
parallel processing will be helpful. When
using partitioned table spaces, try to place each partition of the same
partitioned table space on separate DASD volumes. Failure to do so can
negatively affect the performance of query parallelism performed against those
partitions. Disk drive head contention will occur because concurrent access is
being performed on separate partitions that co-exist on the same device. Of
course, with some of the newer storage devices, such as the ESS Shark hardware
from IBM, data set placement is a non-issue because of the way in which data is
physically stored on the device. Summary DB2
provides three different types of table spaces, each of which has its own
distinct set of advantages and disadvantages for use depending upon the
situation. As a DBA you should understand the implementation details of each
type of table space and be prepared to choose the right type of table space for
each situation.
From DB2 Update (Xephon) April 2001. © 2001 Craig S. Mullins, All rights reserved. |