Day 13: Partitioning and Sharding in PostgreSQL

In Day 13 of our PostgreSQL learning series, we’ll explore partitioning and sharding, two techniques used to improve database scalability and manageability. Partitioning involves dividing large tables into smaller, more manageable parts, while sharding distributes data across multiple database instances to horizontally scale out. Let’s delve into each topic with detailed explanations, examples, and commands.

Partitioning

Partitioning is a technique used to divide large tables into smaller, more manageable parts called partitions based on certain criteria such as ranges, lists, or hash values. PostgreSQL supports several types of partitioning methods:

  1. Range Partitioning:
    • In range partitioning, data is divided into partitions based on specified ranges of column values.
  2. List Partitioning:
    • List partitioning divides data into partitions based on specific lists of column values.
  3. Hash Partitioning:
    • Hash partitioning distributes data across partitions based on the hash value of a specified column.

Commands for Creating Partitioned Tables:

  1. Range Partitioning:
CREATE TABLE partitioned_table (
id SERIAL,
created_at TIMESTAMP,
data TEXT
) PARTITION BY RANGE (created_at);

CREATE TABLE partition_2022 PARTITION OF partitioned_table
FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
  1. List Partitioning:
CREATE TABLE partitioned_table (
id SERIAL,
region TEXT,
data TEXT
) PARTITION BY LIST (region);

CREATE TABLE partition_east PARTITION OF partitioned_table
FOR VALUES IN ('East');
  1. Hash Partitioning:
CREATE TABLE partitioned_table (
id SERIAL,
hash_key INT,
data TEXT
) PARTITION BY HASH (hash_key);

CREATE TABLE partition_1 PARTITION OF partitioned_table
FOR VALUES WITH (MODULUS 4, REMAINDER 0);

Dropping Partitions:

To drop a partition in PostgreSQL, you can use the DROP TABLE SQL command. However, dropping a partitioned table can be a bit tricky as it might impact the entire partition structure. Here’s how you can drop a single partition:

DROP TABLE IF EXISTS partition_name;

Replace partition_name with the name of the partition you want to drop.

Example:

DROP TABLE IF EXISTS partition_east;

Checking Partitions:

To check the existing partitions in a partitioned table, you can query the pg_partitions catalog table or use the \d+ command in psql.

SELECT * FROM pg_partitions WHERE schemaname = 'schema_name' AND tablename = 'table_name';

Replace schema_name with the schema of the partitioned table and table_name with the name of the partitioned table.

Example:

SELECT * FROM pg_partitions WHERE schemaname = 'public' AND tablename = 'partitioned_table';

Alternatively, you can use the \d+ command in psql to display information about the partitions of a table:

\d+ partitioned_table

Summary:

  • Dropping a partition involves using the DROP TABLE command to remove the partition from the database.
  • You can check existing partitions using queries against the pg_partitions catalog table or using the \d+ command in psql.
  • Partitioning divides large tables into smaller partitions for easier management and improved performance.
  • PostgreSQL supports range, list, and hash partitioning methods.
  • Sharding distributes data across multiple database instances to horizontally scale out.
  • Third-party tools like Citus can be used to implement sharding in PostgreSQL.

These commands provide essential functionality for managing partitions in PostgreSQL, allowing you to drop unwanted partitions and check the partition structure effectively.

Sharding

Sharding is a technique used to horizontally partition data across multiple database instances called shards. Each shard holds a portion of the dataset, and together they form a distributed database system. PostgreSQL does not have built-in support for sharding, but you can implement sharding using third-party tools like Citus or Pgpool-II.

Example of Sharding with Citus:

  1. Install Citus Extension:
sudo apt-get install postgresql-13-citus
  1. Initialize Citus:
CREATE EXTENSION citus;
  1. Create Distributed Table:
CREATE TABLE distributed_table (
id SERIAL,
data TEXT
);

SELECT create_distributed_table('distributed_table', 'id');
  1. Add Worker Nodes:
SELECT citus_add_node('worker_node_address', 5432);
  1. Insert Data into Distributed Table:
INSERT INTO distributed_table (data) VALUES ('example_data');

Summary:

  • Sharding distributes data across multiple database instances to horizontally scale out.
  • Third-party tools like Citus can be used to implement sharding in PostgreSQL.

2 thoughts on “Day 13: Partitioning and Sharding in PostgreSQL

  1. Pingback: Title: 21 Days of PostgreSQL Learning: A Comprehensive Guide | Smart way of Technology

  2. Pingback: Title: 21 Days of PostgreSQL Learning: A Comprehensive Guide | SmartTechWays – Innovative Solutions for Smart Businesses

Leave a Reply