Introduction
PostgreSQL is a powerful open-source relational database management system that is widely used in various applications. However, when it comes to distributed SQL environments like Yugabyte, optimizing PostgreSQL performance can be a challenging task. Yugabyte is a distributed SQL database that provides the ability to scale and distribute data across multiple regions, making it a great choice for high-availability applications. In this blog post, we will discuss ten tips for optimizing PostgreSQL performance in Yugabyte distributed SQL environments, including best practices for query optimization, index creation, and more.
Define the primary key in the CREATE TABLE rather than ALTER TABLE
When creating a new table, define the primary key in the initial CREATE TABLE statement rather than adding it later with an ALTER TABLE statement. In YugabyteDB, the primary key determines how rows are stored and distributed, so adding it afterwards forces the whole table to be rewritten. Here’s an example of creating a table with a primary key:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
age INTEGER
);
This statement creates a new table called employees with columns id, name, and age. The id column is defined as a SERIAL type, which automatically generates a unique value for each new row. The PRIMARY KEY constraint on the id column makes it the primary key for the table.
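For contrast, here is the anti-pattern: when the key is added after the fact, YugabyteDB has to rewrite the entire table, because the rows must be re-stored and redistributed according to the new key (a sketch, reusing the table from the example above):

-- Anti-pattern: the table is created without a key,
-- then rewritten in full when the primary key is added later.
CREATE TABLE employees (
id SERIAL,
name TEXT,
age INTEGER
);
ALTER TABLE employees ADD PRIMARY KEY (id);

On a large table this rewrite can take a long time, which is why defining the key up front is the safer habit.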
HASH sharding distributes data well when columns are used in point queries (not range queries)
Hash sharding is a type of data partitioning where rows are distributed across nodes based on a hash function applied to a specific column in the table. This is useful when the column is frequently used in point queries, which means queries that filter by a specific value rather than a range of values. Here’s an example of creating a hash-partitioned table:
CREATE TABLE sales (
id SERIAL,
product_id INTEGER,
sale_date DATE,
amount NUMERIC,
PRIMARY KEY (product_id HASH, id ASC)
);
This statement creates a new table called sales with columns id, product_id, sale_date, and amount. The product_id and id columns together form the primary key, and the HASH keyword on product_id tells YugabyteDB to distribute rows across tablets by hashing that column. (In YugabyteDB, the first primary-key column is hash-sharded by default, so the HASH keyword here simply makes the intent explicit.)
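A point query on the hashed column, such as the one below, can be routed to the single tablet that owns that hash value, while a range predicate like product_id BETWEEN 1 AND 100 would have to fan out to all tablets (an illustrative query against the sales table above):

-- Point query: the hash of product_id identifies one tablet
SELECT * FROM sales WHERE product_id = 123;

This is why hash sharding suits equality lookups, and range sharding suits range scans.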
EXPLAIN ANALYZE is your way to understand why a query is slow
EXPLAIN ANALYZE is a PostgreSQL command that provides detailed information about the query plan and execution statistics for a given query. This is useful for understanding why a query is slow and how it can be optimized. Here’s an example of using EXPLAIN ANALYZE:
EXPLAIN ANALYZE SELECT * FROM sales WHERE product_id = 123;
This command will execute the query and provide detailed information about the query plan and execution statistics, including the estimated and actual row counts, the time spent on each operation, and the query’s total execution time.
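In YugabyteDB, the DIST option adds distributed-storage statistics to this output, such as the number of read requests sent to the storage layer and their round-trip times, which often explains latency that the plain plan does not show (illustrative; the exact counters vary by version):

EXPLAIN (ANALYZE, DIST) SELECT * FROM sales WHERE product_id = 123;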
Watch for “Rows Removed by Filter” with a high number of rows
When analyzing the EXPLAIN ANALYZE output, it’s important to watch for the “Rows Removed by Filter” statistic, which indicates how many rows were removed from the result set by a filter operation. If this number is high, it may indicate that the query is inefficient or that an index is missing. Here’s an example of the EXPLAIN ANALYZE output with a high number of rows removed by filter:
QUERY PLAN
----------------------------------------------------------
Seq Scan on sales (cost=0.00..1234.56 rows=1000000 width=100) (actual time=0.020..185.000 rows=123 loops=1)
Filter: (product_id = 123)
Rows Removed by Filter: 999877
This output shows that the query scanned 1 million rows but only returned 123, and that 999,877 rows were removed by the filter operation.
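A common fix is to create an index on the filtered column, so the plan becomes an index scan that reads only the matching rows (a sketch; the index name is arbitrary):

CREATE INDEX idx_sales_by_product ON sales (product_id);
-- The same query should now show an Index Scan with no
-- "Rows Removed by Filter" line:
EXPLAIN ANALYZE SELECT * FROM sales WHERE product_id = 123;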
Always enable yb_enable_expression_pushdown
The yb_enable_expression_pushdown configuration parameter enables pushdown of filter expressions from the PostgreSQL layer down to YugabyteDB’s storage layer. This can significantly improve query performance by reducing the number of rows that must be transferred back up to the query layer just to be filtered out. To enable it for your session, run:
set yb_enable_expression_pushdown=on;
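You can confirm the setting and observe its effect on a plan; when pushdown is active, recent YugabyteDB versions report the condition as a remote/storage filter rather than a plain Filter applied in the PostgreSQL layer (illustrative session; the exact plan labels vary by version):

SHOW yb_enable_expression_pushdown;
EXPLAIN SELECT * FROM sales WHERE product_id = 123;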
Covering indexes are important in Distributed SQL (to avoid extra RPCs)
In a distributed SQL environment, where queries may need to access data from multiple nodes, covering indexes can be very useful. A covering index is an index that includes all the columns needed to satisfy a query, so that the query can be satisfied entirely from the index without having to read any data from the table. This can significantly reduce the amount of network traffic and disk I/O needed to execute a query, which is especially important in a distributed system.
Here’s an example of creating a covering index:
CREATE INDEX idx_sales_product_id ON sales (product_id) INCLUDE (sale_date, amount);
This statement creates a new index called idx_sales_product_id on the product_id column of the sales table, and includes the sale_date and amount columns in the index as well. This makes the index a covering index for queries that only need to access these three columns.
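A query that touches only the indexed and included columns can then be answered with an Index Only Scan, with no second round trip to the table (illustrative; run EXPLAIN to verify the plan on your data):

EXPLAIN SELECT sale_date, amount FROM sales WHERE product_id = 123;
-- Expected plan node: Index Only Scan using idx_sales_product_id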
Don’t SELECT * if you don’t need all the columns
When querying a table, it’s best to only select the columns that are needed for the query rather than using SELECT *
to select all columns. This can help to reduce the amount of data that needs to be transmitted over the network and processed by the query engine. Here’s an example of a more selective SELECT statement:
SELECT id, name FROM employees WHERE age > 30;
This statement selects only the id and name columns from the employees table, and only returns rows where the age column is greater than 30.
Use Partial Index if you don’t need the index for all the rows
If an index is only needed for a subset of the rows in a table, it’s more efficient to create a partial index rather than a full index. A partial index is an index that includes only the rows that meet a certain condition, which can reduce the size of the index and improve query performance. Here’s an example of creating a partial index:
CREATE INDEX idx_sales_recent ON sales (sale_date) WHERE sale_date > '2022-01-01';
This statement creates a new index called idx_sales_recent on the sale_date column of the sales table, but includes only the rows where sale_date is after January 1, 2022.
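The planner will consider the partial index only when the query’s predicate implies the index’s WHERE clause, for example (illustrative queries):

-- Can use idx_sales_recent: the predicate implies sale_date > '2022-01-01'
SELECT sale_date, amount FROM sales WHERE sale_date > '2022-06-01';
-- Cannot use it: rows before 2022 are not in the index
SELECT sale_date, amount FROM sales WHERE sale_date > '2021-06-01';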
Check the join order in the execution plan, provide the right indexes to filter early
When executing a query that involves multiple tables, it’s important to check the join order in the execution plan to ensure that the tables are joined in the most efficient order. Additionally, providing the right indexes can help to filter rows early in the query execution and reduce the amount of data that needs to be processed. Here’s an example of using an index to filter rows early in a query:
SELECT *
FROM sales
JOIN products ON sales.product_id = products.id
WHERE products.name = 'Widget';
In this example, the query joins the sales and products tables on the product_id and id columns, respectively. If the sales table has many rows, it can be more efficient to filter rows early by adding an index on the product_id column of the sales table.
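For the query above, an index on products (name) lets the planner find the matching product first, and an index on sales (product_id), such as the covering index from the earlier tip, lets it join into sales without a full scan (a sketch; the index name is arbitrary):

-- Filter products by name early, then nested-loop into sales
CREATE INDEX idx_products_name ON products (name);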
For complex queries, pg_hint_plan can be used to control the execution plan
For complex queries that involve many tables and conditions, it can be difficult to manually optimize the execution plan. In these cases, the pg_hint_plan extension for PostgreSQL can be used to provide hints to the query planner about how to optimize the query. Here’s an example of using pg_hint_plan:
-- Load the pg_hint_plan extension
CREATE EXTENSION IF NOT EXISTS pg_hint_plan;
-- Add a hint in a comment at the start of the query
/*+ NestLoop(sales products) */
SELECT *
FROM sales
JOIN products ON sales.product_id = products.id
WHERE products.name = 'Widget';
In this example, the pg_hint_plan extension is loaded, and a hint is added using the /*+ hint */ comment syntax; note that pg_hint_plan only reads hints from a comment placed at the start of the statement. The NestLoop(sales products) hint tells the query planner to use a nested loop join between the sales and products tables. Without the hint, the query planner may choose a different join method that is less efficient.
Summary
In summary, optimizing PostgreSQL performance in Yugabyte distributed SQL environments requires a deep understanding of the database and the underlying infrastructure. By following the tips outlined in this blog post, you can improve query performance, reduce latency, and increase scalability. These tips include creating the right indexes, avoiding unnecessary data retrieval, using partial indexes, and optimizing join order, among others. Additionally, by using tools like pg_hint_plan, you can optimize complex queries and improve the overall performance of PostgreSQL in Yugabyte. By implementing these best practices, you can ensure that your applications are running efficiently and effectively in a distributed SQL environment.