Sunday, March 12, 2017

What PostgreSQL has over other open source SQL databases

You may be asking yourself "Why PostgreSQL?" There are several choices for open source relational databases out there (we looked at MySQL, MariaDB and Firebird for this article), but what does PostgreSQL have that they don't? PostgreSQL's tag line claims that it's: "The world's most advanced open source database." We'll give you a few reasons why PostgreSQL makes this claim.
In Part I of this series, we'll look at storing data - the model, structures, types and size limits. In Part II, we'll focus more on data manipulation and retrieval.
Data model
PostgreSQL isn't just relational, it's object-relational. This gives it some advantages over other open source SQL databases like MySQL, MariaDB and Firebird.
A fundamental characteristic of an object-relational database is support for user-defined objects and their behaviors including data types, functions, operators, domains and indexes. This makes PostgreSQL extremely flexible and robust. Among other things, complex data structures can be created, stored and retrieved. In some of the examples below you'll see nested and composite structures which standard RDBMS' don't support.
Data types and structures
There's an extensive list of data types that PostgreSQL supports. Besides the numeric, floating-point, string, boolean and date types you'd expect (and many options within these), PostgreSQL boasts uuid, monetary, enumerated, geometric, binary, network address, bit string, text search, xml, json, array, composite and range types, as well as some internal types for object identification and log location. To be fair, MySQL, MariaDB and Firebird each have some of these to varying degrees, but only PostgreSQL supports all of them.
Let's take a closer look at a couple of these:
Network addresses
PostgreSQL provides for storing different network address types. The CIDR (Classless Internet Domain Routing) data type follows the convention for IPv4 and IPv6 network addresses. Some examples for CIDR:
·         192.168.100.128/25
·         10.1.2.3/32
·         2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128
·         ::ffff:1.2.3.0/128
Also available for network address storage is the INET data type, used for IPv4 and IPv6 hosts where the subnet is optional. The data type MACADDR can be used for storing MAC addresses for hardware identification such as 08-00-2b-01-02-03.
MySQL and MariaDB have some INET functions to convert network addresses, but do not provide data types for natively storing network addresses. Firebird also does not have network address types.
Multi-dimensional arrays
Because PostgreSQL is an object-relational database, arrays of values can be stored for most of the existing data types. Do this by appending square brackets to the data type specification for the column or by using the ARRAY expression. An array size can be specified, but is not required. Let's look at a holiday picnic menu for demonstrating the use of arrays:
 -- create a table where the values are arrays
CREATE TABLE holiday_picnic ( 
     holiday varchar(50) -- single value
     sandwich text[], -- array
     side text[] [], -- multi-dimensional array
     dessert text ARRAY, -- array
     beverage text ARRAY[4] -- array of 4 items
);

 -- insert array values into the table
INSERT INTO holiday_picnic VALUES 
     ('Labor Day',
     '{"roast beef","veggie","turkey"}',
     '{
        {"potato salad","green salad","macaroni salad"},
        {"chips","crackers"}
     }',
     '{"fruit cocktail","berry pie","ice cream"}',
     '{"soda","juice","beer","water"}'
     );
MySQL, MariaDB, and Firebird do not have this capability. To store an array of values like these in a tradiitional relational database, a separate table with a row for each of the array values can be created as a workaround.
Geometric data
Geo data is fast becoming a core requirement for many applications. PostgreSQL has long supported a variety of geometric data types such as points, lines, circles, and polygons. The PATH data type is one of these. A path consists of multiple points in a sequence and can be open (the beginning and end points are not connected) or closed (the beginning and end points are connected). Let's use a hiking trail example as a path. In this case, my hiking trail is a loop so my beginning and end points are connected and, therefore, my path is closed. Parentheses around the set of coordinates indicate a closed path whereas square brackets would indicate an open path.
 -- create a table for trails
CREATE TABLE trails ( 
     trail_name varchar(250),
     trail_path path
);

 -- insert a trail into the table
 -- where the path is defined by lat-long coordinates
INSERT INTO trails VALUES 
     ('Dool Trail - Creeping Forest Trail Loop',
     ((37.172,-122.22261666667),
     (37.171616666667,-122.22385),
     (37.1735,-122.2236),
     (37.175416666667,-122.223),
     (37.1758,-122.22378333333),
     (37.179466666667,-122.22866666667),
     (37.18395,-122.22675),
     (37.180783333333,-122.22466666667),
     (37.176116666667,-122.2222),
     (37.1753,-122.22293333333),
     (37.173116666667,-122.22281666667)));
The PostGIS extension available for PostgreSQL augments the existing geometric data featues with additional spatial types, functions, operators and indexes. It's location aware and supports both raster and vector data. It also provides for interoperability with a variety of 3rd party open source and proprietary geo-spatial tools for working with, mapping and rendering the data. We rolled PostGIS out for Compose PostgreSQL deployments in January this year: PostGIS for All Compose PostgreSQL Deployments.
Note that in MySQL 5.7.8 and since MariaDB 5.3.3, data type extensions were added for supporting the OpenGIS standard for geographic information. That version of MySQL and subsequent MariaDB versions offer similar data type storage to the out-of-the-box PostgreSQL geometric data types. However, in MySQL and MariaDB, data values must first be converted to geometric format using simple commands before being inserted into tables. Firebird does not currently provide for geo data types.
JSON support
PostgreSQL's JSON support lets you go schema-less in a SQL database. This can be useful when the data structure requires some flexibility because it is still changing in development or when it is unknown which data fields the data object will contain.
The JSON data type enforces valid JSON which allows you to then make use of the specialized JSON operators and functions built into PostgreSQL for querying and manipulating the data. Also available is the JSONB type - a binary form of JSON where white spaces are removed, object order is not preserved but is instead stored optimally, and only the last value for duplicate keys is retained. JSONB is usually the preferred format since it requires less space per object, can be indexed, and can be processed faster since it does not require re-parsing. To learn more, check out: Is PostgreSQL Your Next JSON Database?
In MySQL 5.7.8 and MariaDB 10.0.1, support for native JSON objects was introduced. While there are a variety of functions and operators for JSON that are now available in these databases, they are not indexable the way JSONB is in PostgreSQL. Firebird hasn't joined the club yet and only supports JSON objects as text.
Create a new type
And, as if PostgreSQL's extensive list of existing data types wasn't enough, you can use the CREATE TYPE command to create new data types as composite, enumerated, range and base. Here's an example of creating and querying a new composite type:
 -- create a new composite type called "wine"
CREATE TYPE wine AS ( 
     wine_vineyard varchar(50),
     wine_type varchar(50),
     wine_year int
);

 -- create a table that uses the composite type "wine"
CREATE TABLE pairings ( 
     menu_entree varchar(50),
     wine_pairing wine
);

 -- insert data into the table using the ROW expression
INSERT INTO pairings VALUES 
     ('Lobster Tail',ROW('Stag''s Leap','Chardonnay', 2012)),
     ('Elk Medallions',ROW('Rombauer','Cabernet Sauvignon',2012));

 /*
   query from the table using the table column name
   (use parentheses followed by a period
   then the name of the field from the composite type)
 */
SELECT (wine_pairing).wine_vineyard, (wine_pairing).wine_type 
FROM pairings 
WHERE menu_entree = 'Elk Medallions'; 
Because they are not object-relational, MySQL, MariaDB, and Firebird do not provide this powerful functionality.
Data size
PostgreSQL can handle a lot of data. The current posted size limits are listed below:
Limit
Value
Maximum Database Size
Unlimited
Maximum Table Size
32 TB
Maximum Row Size
1.6 TB
Maximum Field Size
1 GB
Maximum Rows per Table
Unlimited
Maximum Columns per Table
250 - 1600 depending on column types
Maximum Indexes per Table
Unlimited
At Compose we auto-scale your deployment so you don't have to worry when your data grows. But, as every DBA knows, it's good to be wary about extremely large and unlimited capacities. We recommend you let common sense be your guide when creating your tables and performing indexing.
By comparison, MySQL and MariaDB are notorious for their 65,535 byte row size limit. Firebird also only claims a 64KB maximum row size. Typically the data size is limited by the operating system file size limit. Because PostgreSQL can store table data in multiple smaller files, it can get around this limitation - though, it is important to note that too many files may negatively impact performance. MySQL and MariaDB do, however, support more columns per table (up to 4,096 depending on the data type) and larger individual table sizes than PostgreSQL, but it is in rare conditions that the existing PostgreSQL limits would need to be exceeded.
Data integrity
PostgreSQL decidedly strives to conform to the ANSI-SQL:2008 standard, is fully ACID (Atomicity, Consistency, Isolation and Durability) compliant, and is well-known for its rock-solid referential and transactional integrity. Primary keys, restricting and cascading foreign keys, unique constraints, not null constraints, check constraints and other data integrity features ensure only validated data is stored.
MySQL and MariaDB are doing more to be SQL standard compliant with the InnoDB/XtraDB storage engines. They now offer a STRICT option using SQL modes, which determines the data validation checks that get used; however, depending on the mode you use, invalid and sometimes silently-truncated data can be inserted or created on update. Neither of these databases currently supports check constraints and there are also a host of caveats for foreign key constraints. Additionally, data integrity may suffer significantly depending on the storage engine selected. MySQL (and the MariaDB fork) has made no secret that they have long made tradeoffs for speed and efficiencyover integrity and compliance.
Summing up
PostgreSQL has a lot of capability. Built using an object-relational model, it supports complex structures and a breadth of built-in and user-defined data types. It provides extensive data capacity and is trusted for its data integrity. You may not need all of the advanced features we've reviewed here for storing data, but since data needs can evolve quickly, there is undoubtedly clear benefit to having it all at your fingertips.
If PostgreSQL feels out of proportion for your needs or you prefer to shoot more from the hip, then take a look at the NoSQL databases we offer at Compose or consider the other open source SQL databases we mentioned. They each have their own strengths. Compose firmly believes in choosing the right database for the task at hand... and sometimes that means a multiple-database solution!
Ready for more PostgreSQL? In Part II of this series, we'll look at data manipulation and retrieval in PostgreSQL, including virtual table features, query capabilities, indexing and language extensions.


No comments:

Post a Comment