Partition in Postgresql


Partitioning
PostgreSQL supports basic table partitioning. This section describes why and how to implement partitioning as part of your database design.
Overview
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.
When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.
Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
Seldom-used data can be migrated to cheaper and slower storage media.
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see Section 5.8) before attempting to set up partitioning.

The following forms of partitioning can be implemented in PostgreSQL:
Range Partitioning
The table is partitioned into "ranges" defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example one might partition by date ranges, or by ranges of identifiers for particular business objects.
List Partitioning
The table is partitioned by explicitly listing which key values appear in each partition.


dozen partition tables
billion rows
200gb
1600 partitions
what table partitioning is and how it is implemented.
rules, constraints, inheritance and how it all comes together.
when and why to use triggers for partitioning
automated partition maintenance.
 Partition use for performance, scalability, manageability:-
performance
as table size grows, queries eventually slow down, even with indexing
data access patterns ca dictate only small portions of a table need to be queried against.
Manageability
allow data to be added and removed much easier.
maintenance command (vacuum, re-index, cluster) can focus on active data.
scalability
remove many hardware constraints (disk size, speed).
maintain performance, manageability over magnitudes more data.
History
before 8.1 manual partitioning through inheritance and or views
8.1 constraint exclusion
after 8.1 no inherits, more(multi-table indexes)

Types
Range partition
Defined by specific range of data, typically a timestamp
List partition
specific list of values, state codes or maybe userid%3


Difference between list vs range:

LIST
RANGE
Think hash, states
Think  dates, serial
Defined set of partitions
Ever increasing set of partition
Data per child grows
Data per child static
Static query patterns
Volatile query patterns
Low maintenance
Higher maintenance

Partition table
adds complexity, administration costs
number of rows, types of queries, data growth


Partition table in PostgreSQL is very easy to do, It involve inheritance concept and trigger of PostgreSQL. Here i provide a sample to demonstrate how to partition table in PostgreSQL.
Before proceed, please understand some basic concept like,er… better i provide a concept of partition “time” in a table.
“Mother” — (Child1, Child2, Child3,Child4, Child5)
MasterTable — (Child_01_2008, Child_02_2008, Child_03_2008, Child_04_2008,Child_05_2008)
We only insert, select, update and delete on MasterTable, all child tables are transparent to user.
For example when i insert a record into MasterTable which is at January 2008. Record will auto redirect (trigger) to child table (Child_01_2008). When user select a record from Master table, postgreSQL will automatically retrieve data from all child tables which inherited from MasterTable.

STEP 1)Create a simple table call “dinesh” , it only include 2 columns “hash” and “hashtime”
www.sakthidbtech.com=# CREATE TABLE dinesh
(
  hash bytea NOT NULL,
  hashtime TIMESTAMP WITHOUT TIME zone NOT NULL
);



STEP 2) Create 10 tables in different months and inherantence from main dinesh table.
www.sakthidbtech.com=# CREATE TABLE dinesh_y2017m01 (
CHECK ( hashtime >= DATE '2008-01-01' AND hashtime < DATE '2008-01-31' )
 ) INHERITS (dinesh);
 CREATE TABLE dinesh_y2017m02 (
CHECK ( hashtime >= DATE '2008-02-01' AND hashtime < DATE '2008-02-29' )
 ) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m03 (
CHECK ( hashtime >= DATE '2008-03-01' AND hashtime < DATE '2008-03-31' )
 ) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m04 (
CHECK ( hashtime >= DATE '2008-04-01' AND hashtime < DATE '2008-04-30' )
 ) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m05 (
CHECK ( hashtime >= DATE '2008-05-01' AND hashtime < DATE '2008-05-31' )
) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m06 (
CHECK ( hashtime >= DATE '2008-06-01' AND hashtime < DATE '2008-06-30' )
) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m07 (
CHECK ( hashtime >= DATE '2008-07-01' AND hashtime < DATE '2008-07-31' )
) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m08 (
CHECK ( hashtime >= DATE '2008-08-01' AND hashtime < DATE '2008-08-31' )
) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m09 (
CHECK ( hashtime >= DATE '2008-09-01' AND hashtime < DATE '2008-09-30' )
) INHERITS (dinesh);
CREATE TABLE dinesh_y2017m010 (
CHECK ( hashtime >= DATE '2008-10-01' AND hashtime < DATE '2008-10-31' )
) INHERITS (dinesh);


STEP 3) Create primary key for each child tables


www.sakthidbtech.com=# ALTER TABLE dinesh_y2017m01 ADD CONSTRAINT dinesh_y2017m01_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m02 ADD CONSTRAINT dinesh_y2017m02_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m03 ADD CONSTRAINT dinesh_y2017m03_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m04 ADD CONSTRAINT dinesh_y2017m04_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m05 ADD CONSTRAINT dinesh_y2017m05_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m06 ADD CONSTRAINT dinesh_y2017m06_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m07 ADD CONSTRAINT dinesh_y2017m07_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m08 ADD CONSTRAINT dinesh_y2017m08_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m09 ADD CONSTRAINT dinesh_y2017m09_pkey PRIMARY KEY (hashtime, hash);
ALTER TABLE dinesh_y2017m010 ADD CONSTRAINT dinesh_y2017m010_pkey PRIMARY KEY (hashtime, hash);




STEP 4) Create an index for each child tables
www.sakthidbtech.com=# CREATE INDEX idx_dinesh_y2017m01_hashtime ON dinesh_y2017m01 (hashtime);
CREATE INDEX idx_dinesh_y2017m02_hashtime ON dinesh_y2017m02 (hashtime);
CREATE INDEX idx_dinesh_y2017m03_hashtime ON dinesh_y2017m03 (hashtime);
CREATE INDEX idx_dinesh_y2017m04_hashtime ON dinesh_y2017m04 (hashtime);
CREATE INDEX idx_dinesh_y2017m05_hashtime ON dinesh_y2017m05 (hashtime);
CREATE INDEX idx_dinesh_y2017m06_hashtime ON dinesh_y2017m06 (hashtime);
CREATE INDEX idx_dinesh_y2017m07_hashtime ON dinesh_y2017m07 (hashtime);
CREATE INDEX idx_dinesh_y2017m08_hashtime ON dinesh_y2017m08 (hashtime);
CREATE INDEX idx_dinesh_y2017m09_hashtime ON dinesh_y2017m09 (hashtime);
CREATE INDEX idx_dinesh_y2017m010_hashtime ON dinesh_y2017m010 (hashtime);


STEP 5) Create a trigger on mother table to redirect records into child tables.


www.sakthidbtech.com=# CREATE OR REPLACE FUNCTION dinesh_func_insert_trigger()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.hashtime >= DATE '2008-01-01' AND NEW.hashtime < DATE '2008-01-31' ) THEN
        INSERT INTO dinesh_y2017m01 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-02-01' AND NEW.hashtime < DATE '2008-02-29' ) THEN
        INSERT INTO dinesh_y2017m02 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-03-01' AND NEW.hashtime < DATE '2008-03-31' ) THEN
        INSERT INTO dinesh_y2017m03 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-04-01' AND NEW.hashtime < DATE '2008-04-30' ) THEN
        INSERT INTO dinesh_y2017m04 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-05-01' AND NEW.hashtime < DATE '2008-05-31' ) THEN
        INSERT INTO dinesh_y2017m05 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-06-01' AND NEW.hashtime < DATE '2008-06-30' ) THEN
INSERT INTO dinesh_y2017m06 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-07-01' AND NEW.hashtime < DATE '2008-07-31' ) THEN
INSERT INTO dinesh_y2017m07 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-08-01' AND NEW.hashtime < DATE '2008-08-31' ) THEN
INSERT INTO dinesh_y2017m08 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-09-01' AND NEW.hashtime < DATE '2008-09-30' ) THEN
INSERT INTO dinesh_y2017m09 VALUES (NEW.*);
    ELSIF ( NEW.hashtime >= DATE '2008-10-01' AND NEW.hashtime < DATE '2008-10-31' ) THEN
INSERT INTO dinesh_y2017m010 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

www.sakthidbtech.com=# CREATE TRIGGER trigger_dinesh_insert
    BEFORE INSERT ON dinesh
    FOR EACH ROW EXECUTE PROCEDURE dinesh_func_insert_trigger();



www.sakthidbtech.com=# explain select * from dinesh;
                                             QUERY PLAN                       
                   
--------------------------------------------------------------------------------
--------------------
 Result  (cost=0.00..216.00 rows=11601 width=40)
   ->  Append  (cost=0.00..216.00 rows=11601 width=40)
         ->  Seq Scan on dinesh  (cost=0.00..0.00 rows=1 width=40)
         ->  Seq Scan on dinesh_y2017m01 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m02 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m03 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m04 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m05 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m06 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m07 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m08 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m09 dinesh  (cost=0.00..21.60 r
ows=1160 width=40)
         ->  Seq Scan on dinesh_y2017m010 dinesh  (cost=0.00..21.60
rows=1160 width=40)
(13 rows)

managing partitions rules
managing a large number of partitions leads to a large number of rules
rules require a lot of overhead, as each rule is processed.
rules can lead to side effects(functions getting called multiple times)
adding/removing/modifying rules requires exclusive locks.
as the number of rules increases, performance decreases.

we don't need to use rules for partition we also use the partitionin using triggers.

Managing partition triggers
better control of database actions
can achieve consistent performance regardless number of partitions
modifying a trigger function requires no special locking.
not without some caveats.



Advantage triggers
different variations for insert, update,delete.
work in tangent with create trigger.
compute partition from the date on the fly.
gives consistent performance regardless of the number of partions.
low maintenance.
slower than rules(but consistent)
quoting/null painful to deal with
different variations for insert, update, delete
works in tangent with create trigger.
need to add a condition check for each partition
need to handle non-matching rows
faster than using execute method.
no quoting/ null issues.
high maintenance





www.sakthidbtech.com=# select generate_series(1,12);                                         generate_series
-----------------
               1
               2
               3
               4
               5
               6
               7
               8
               9
              10
              11
              12
(12 rows)

www.sakthidbtech.com=# select '2012-01-01'::date+x *'1 month'::interval                     from generate_series(0,11)x;
      ?column?     
--------------------
 01-JAN-12 00:00:00
 01-FEB-12 00:00:00
 01-MAR-12 00:00:00
 01-APR-12 00:00:00
 01-MAY-12 00:00:00
 01-JUN-12 00:00:00
 01-JUL-12 00:00:00
 01-AUG-12 00:00:00
 01-SEP-12 00:00:00
 01-OCT-12 00:00:00
 01-NOV-12 00:00:00
 01-DEC-12 00:00:00
(12 rows)




unique problem
work around for partition key
constraint defines unique sets
index each partition
non partition key
index per partition
big implementation hole
custom functions
querying partitions
can be tricky to get right
must query on static values
postgreSQL's definition of static




boolean partition
www.sakthidbtech.com=# create table test_x (b boolean not null);
CREATE TABLE
www.sakthidbtech.com=# create table test_x1 (check(b is true)) inherits (test_x);
CREATE TABLE
www.sakthidbtech.com=# create table test_x0 (check(b is false)) inherits (test_x);
CREATE TABLE
www.sakthidbtech.com=# set constraint_exclusion=on;
SET
www.sakthidbtech.com=# explain select * from test_x where b is true;
                                  QUERY PLAN                               

-----------------------------------------------------------------------------
-
 Result  (cost=0.00..72.60 rows=2631 width=1)
   ->  Append  (cost=0.00..72.60 rows=2631 width=1)
         ->  Seq Scan on test_x  (cost=0.00..0.00 rows=1 width=1)
               Filter: (b IS TRUE)
         ->  Seq Scan on test_x1 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS TRUE)
         ->  Seq Scan on test_x0 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS TRUE)
(8 rows)

www.sakthidbtech.com=# explain select * from test_x where b is not true;
                                  QUERY PLAN                               

-----------------------------------------------------------------------------
-
 Result  (cost=0.00..72.60 rows=2631 width=1)
   ->  Append  (cost=0.00..72.60 rows=2631 width=1)
         ->  Seq Scan on test_x  (cost=0.00..0.00 rows=1 width=1)
               Filter: (b IS NOT TRUE)
         ->  Seq Scan on test_x1 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS NOT TRUE)
         ->  Seq Scan on test_x0 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS NOT TRUE)
(8 rows)

www.sakthidbtech.com=# explain select * from test_x where not b;
                                  QUERY PLAN                               

-----------------------------------------------------------------------------
-
 Result  (cost=0.00..72.60 rows=2631 width=1)
   ->  Append  (cost=0.00..72.60 rows=2631 width=1)
         ->  Seq Scan on test_x  (cost=0.00..0.00 rows=1 width=1)
               Filter: (NOT b)
         ->  Seq Scan on test_x1 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (NOT b)
         ->  Seq Scan on test_x0 test_x  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (NOT b)
(8 rows)


www.sakthidbtech.com=# create table test_a (b boolean);
CREATE TABLE
www.sakthidbtech.com=# create table test_a1 (check(not b)) inherits (test_a);
CREATE TABLE
www.sakthidbtech.com=# drop table test_a1;
DROP TABLE
www.sakthidbtech.com=# create table test_a0(check(not b)) inherits (test_a);
CREATE TABLE
www.sakthidbtech.com=# create table test_a1(check(b)) inherits (test_a);
CREATE TABLE
www.sakthidbtech.com=# explain select * from test_a where b is true;
                                  QUERY PLAN                               

-----------------------------------------------------------------------------
-
 Result  (cost=0.00..72.60 rows=2631 width=1)
   ->  Append  (cost=0.00..72.60 rows=2631 width=1)
         ->  Seq Scan on test_a  (cost=0.00..0.00 rows=1 width=1)
               Filter: (b IS TRUE)
         ->  Seq Scan on test_a0 test_a  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS TRUE)
         ->  Seq Scan on test_a1 test_a  (cost=0.00..36.30 rows=1315 width=1)
               Filter: (b IS TRUE)
(8 rows)

www.sakthidbtech.com=# explain select * from test_a where b;
                                  QUERY PLAN                               

-----------------------------------------------------------------------------
-
 Result  (cost=0.00..36.30 rows=1316 width=1)
   ->  Append  (cost=0.00..36.30 rows=1316 width=1)
         ->  Seq Scan on test_a  (cost=0.00..0.00 rows=1 width=1)
               Filter: b
         ->  Seq Scan on test_a1 test_a  (cost=0.00..36.30 rows=1315 width=1)
               Filter: b
(6 rows)


Comments

Popular Posts