You may be asking
yourself "Why PostgreSQL?" There are several choices for open source
relational databases out there (we looked at MySQL, MariaDB and Firebird for
this article), but what does PostgreSQL have that they don't? PostgreSQL's tag
line claims that it's: "The world's most advanced open source
database." We'll give you a few reasons why PostgreSQL makes this claim.
In Part I of this
series, we'll look at storing data - the model, structures, types and size
limits. In Part II, we'll focus more on data manipulation
and retrieval.
Data model
PostgreSQL isn't just
relational, it's object-relational. This gives it some advantages over other
open source SQL databases like MySQL, MariaDB and Firebird.
A fundamental
characteristic of an object-relational database is support for user-defined
objects and their behaviors including data types, functions, operators, domains
and indexes. This makes PostgreSQL extremely flexible and robust. Among other
things, complex data structures can be created, stored and retrieved. In some
of the examples below you'll see nested and composite structures which standard
RDBMS' don't support.
Data types and
structures
There's an extensive
list of data types that PostgreSQL supports. Besides the numeric,
floating-point, string, boolean and date types you'd expect (and many options
within these), PostgreSQL boasts uuid, monetary, enumerated, geometric, binary,
network address, bit string, text search, xml, json, array, composite and range
types, as well as some internal types for object identification and log
location. To be fair, MySQL, MariaDB and Firebird each have some of these to
varying degrees, but only PostgreSQL supports all of them.
Let's take a closer
look at a couple of these:
Network addresses
PostgreSQL provides
for storing different network address types. The CIDR (Classless Internet Domain Routing) data type follows the
convention for IPv4 and IPv6 network addresses. Some examples for CIDR:
·
192.168.100.128/25
·
10.1.2.3/32
·
2001:4f8:3:ba:2e0:81ff:fe22:d1f1/128
·
::ffff:1.2.3.0/128
Also available for
network address storage is the INET data
type, used for IPv4 and IPv6 hosts where the subnet is optional. The data type MACADDR can be used for storing MAC addresses for hardware
identification such as 08-00-2b-01-02-03.
MySQL and MariaDB have
some INET functions to convert network addresses, but do not provide data types
for natively storing network addresses. Firebird also does not have network
address types.
Multi-dimensional arrays
Because PostgreSQL is
an object-relational database, arrays of values can be stored for most of the
existing data types. Do this by appending square brackets to the data type
specification for the column or by using the ARRAY expression. An array size
can be specified, but is not required. Let's look at a holiday picnic menu for
demonstrating the use of arrays:
-- create a table where the values are arrays
CREATE TABLE holiday_picnic (
holiday varchar(50) --
single value
sandwich text[], -- array
side text[] [], --
multi-dimensional array
dessert text ARRAY, -- array
beverage text ARRAY[4] --
array of 4 items
);
-- insert array values into the table
INSERT INTO holiday_picnic VALUES
('Labor
Day',
'{"roast
beef","veggie","turkey"}',
'{
{"potato salad","green salad","macaroni salad"},
{"chips","crackers"}
}',
'{"fruit cocktail","berry
pie","ice cream"}',
'{"soda","juice","beer","water"}'
);
MySQL, MariaDB, and
Firebird do not have this capability. To store an array of values like these in
a tradiitional relational database, a separate table with a row for each of the
array values can be created as a workaround.
Geometric data
Geo data is fast
becoming a core requirement for many applications. PostgreSQL has long
supported a variety of geometric data types such as points, lines, circles, and
polygons. The PATH data type is one of these. A path
consists of multiple points in a sequence and can be open (the beginning and
end points are not connected) or closed (the beginning and end points are
connected). Let's use a hiking trail example as a path. In this case, my hiking
trail is a loop so my beginning and end points are connected and, therefore, my
path is closed. Parentheses around the set of coordinates indicate a closed
path whereas square brackets would indicate an open path.
-- create a table for trails
CREATE TABLE trails (
trail_name varchar(250),
trail_path path
);
-- insert a trail into the table
-- where the path is defined by lat-long coordinates
INSERT INTO trails VALUES
('Dool
Trail - Creeping Forest Trail Loop',
((37.172,-122.22261666667),
(37.171616666667,-122.22385),
(37.1735,-122.2236),
(37.175416666667,-122.223),
(37.1758,-122.22378333333),
(37.179466666667,-122.22866666667),
(37.18395,-122.22675),
(37.180783333333,-122.22466666667),
(37.176116666667,-122.2222),
(37.1753,-122.22293333333),
(37.173116666667,-122.22281666667)));
The PostGIS extension
available for PostgreSQL augments the existing geometric data featues with
additional spatial types, functions, operators and indexes. It's location aware
and supports both raster and vector data. It also provides for interoperability
with a variety of 3rd party open source and proprietary geo-spatial tools for
working with, mapping and rendering the data. We rolled PostGIS out for Compose
PostgreSQL deployments in January this year: PostGIS for All Compose PostgreSQL Deployments.
Note that in MySQL
5.7.8 and since MariaDB 5.3.3, data type extensions were added for supporting
the OpenGIS standard for geographic information. That version of MySQL and
subsequent MariaDB versions offer similar data type storage to the
out-of-the-box PostgreSQL geometric data types. However, in MySQL and MariaDB,
data values must first be converted to geometric format using simple commands
before being inserted into tables. Firebird does not currently provide for geo
data types.
JSON support
PostgreSQL's JSON
support lets you go schema-less in a SQL database. This can be useful when the
data structure requires some flexibility because it is still changing in development
or when it is unknown which data fields the data object will contain.
The JSON data type enforces valid JSON which allows you to then
make use of the specialized JSON operators and functions built into PostgreSQL
for querying and manipulating the data. Also available is the JSONB type - a binary form of JSON where white spaces are
removed, object order is not preserved but is instead stored optimally, and
only the last value for duplicate keys is retained. JSONB is usually the
preferred format since it requires less space per object, can be indexed, and
can be processed faster since it does not require re-parsing. To learn more,
check out: Is PostgreSQL Your Next JSON Database?
In MySQL 5.7.8 and
MariaDB 10.0.1, support for native JSON objects was introduced. While there are
a variety of functions and operators for JSON that are now available in these
databases, they are not indexable the way JSONB is in PostgreSQL. Firebird
hasn't joined the club yet and only supports JSON objects as text.
Create a new type
And, as if
PostgreSQL's extensive list of existing data types wasn't enough, you can use
the CREATE TYPE command to create new data types as
composite, enumerated, range and base. Here's an example of creating and
querying a new composite type:
-- create a new composite type called "wine"
CREATE TYPE wine AS (
wine_vineyard varchar(50),
wine_type varchar(50),
wine_year int
);
-- create a table that uses the composite type
"wine"
CREATE TABLE pairings (
menu_entree varchar(50),
wine_pairing wine
);
-- insert data into the table using the ROW expression
INSERT INTO pairings VALUES
('Lobster
Tail',ROW('Stag''s
Leap','Chardonnay', 2012)),
('Elk
Medallions',ROW('Rombauer','Cabernet Sauvignon',2012));
/*
query from the table using the table column
name
(use parentheses followed by a period
then the name of the field from the
composite type)
*/
SELECT (wine_pairing).wine_vineyard, (wine_pairing).wine_type
FROM pairings
WHERE menu_entree = 'Elk
Medallions';
Because they are not
object-relational, MySQL, MariaDB, and Firebird do not provide this powerful
functionality.
Data size
PostgreSQL can handle
a lot of data. The current posted size limits are listed below:
Limit
|
Value
|
Maximum Database
Size
|
Unlimited
|
Maximum Table Size
|
32 TB
|
Maximum Row Size
|
1.6 TB
|
Maximum Field Size
|
1 GB
|
Maximum Rows per
Table
|
Unlimited
|
Maximum Columns per
Table
|
250 - 1600 depending
on column types
|
Maximum Indexes per
Table
|
Unlimited
|
At Compose we auto-scale your deployment so you don't have
to worry when your data grows. But, as every DBA knows, it's good to be wary
about extremely large and unlimited capacities. We recommend you let common
sense be your guide when creating your tables and performing indexing.
By comparison, MySQL
and MariaDB are notorious for their 65,535 byte row size limit. Firebird also
only claims a 64KB maximum row size. Typically the data size is limited by the
operating system file size limit. Because PostgreSQL can store table data in
multiple smaller files, it can get around this limitation - though, it is
important to note that too many files may negatively impact performance. MySQL
and MariaDB do, however, support more columns per table (up to 4,096 depending
on the data type) and larger individual table sizes than PostgreSQL, but it is
in rare conditions that the existing PostgreSQL limits would need to be
exceeded.
Data integrity
PostgreSQL decidedly
strives to conform to the ANSI-SQL:2008 standard, is fully ACID (Atomicity,
Consistency, Isolation and Durability) compliant, and is well-known for its
rock-solid referential and transactional integrity. Primary keys, restricting
and cascading foreign keys, unique constraints, not null constraints, check
constraints and other data integrity features ensure only validated data is
stored.
MySQL and MariaDB are
doing more to be SQL standard compliant with the InnoDB/XtraDB storage engines.
They now offer a STRICT option using SQL modes, which determines
the data validation checks that get used; however, depending on the mode you
use, invalid and sometimes silently-truncated data can be inserted or created
on update. Neither of these databases currently supports check constraints and
there are also a host of caveats for foreign key constraints. Additionally,
data integrity may suffer significantly depending on the storage engine
selected. MySQL (and the MariaDB fork) has made no secret that they have long made tradeoffs for speed and efficiencyover
integrity and compliance.
Summing up
PostgreSQL has a lot
of capability. Built using an object-relational model, it supports complex
structures and a breadth of built-in and user-defined data types. It provides
extensive data capacity and is trusted for its data integrity. You may not need
all of the advanced features we've reviewed here for storing data, but since
data needs can evolve quickly, there is undoubtedly clear benefit to having it
all at your fingertips.
If PostgreSQL feels
out of proportion for your needs or you prefer to shoot more from the hip, then
take a look at the NoSQL databases we offer at Compose or
consider the other open source SQL databases we mentioned. They each have their
own strengths. Compose firmly believes in choosing the right database for the
task at hand... and sometimes that means a multiple-database solution!
Ready for more
PostgreSQL? In Part II of this series, we'll look at data
manipulation and retrieval in PostgreSQL, including virtual table features,
query capabilities, indexing and language extensions.
No comments:
Post a Comment