How to work with timestamps (timestamp format) in PostgreSQL?

the Theme work with timestamps in PostgreSQL, poorly disclosed in Russian specialized publications on the Internet and serves as a frequent source of problems in the work of the programmers. I offer translation of material from Hubert Lubaczewski, author of the popular foreign blog depesz.com. I hope the article is useful to you!

image

From time to time in IRC or on a mailing list someone asks questions that show a deep misunderstanding (or lack of understanding) of time stamps, especially those that take into account time zones. Since I already experienced this before, let me tell you what are timestamps and how they work and what the most common snags you may encounter.

We have two types of data that we can use:

the
    the
  • timestamp
  • the
  • timestamp with time zone (or timestamptz)

Type timestamp contains only a date and time, no other information. On the one hand, it may seem that it is a good idea (after all, in everyday life we use only the date and time, isn't it?), but on the other — is the utter horror.

Let's assume you have timestamp “2014-04-04 20:00:00". What's it telling you? Unfortunately, not a lot. It all depends on what part of the planet involved. Eight in the evening of the 4th of April is a different time in Los Angeles, Chicago, London, Warsaw or Moscow. The problem of time zones.

Of course, you might think: "I'll always be in the same time zone, I don't have problems with support for different time zones. In my time zone the date and time will be enough to mark any point of time, because this is what we do in "real life".

But is it really?

Imagine that you have a label ‘2013-10-27 02:00:00', and you know that your app is linked to the Polish time. In this case, you are lucky because it can be 2am Central European summer time (CEST) or more hours, according to the usual Central European time. All due to the seasonal transfer hours.

I think that using timestamp without timezone is almost always a bug, and it needs to be corrected. Problem becomes even bigger if recording in your app come from different time zones (for example, a scheduling agent).

So the most obvious solution is to use timestamp with timezone (timestamptz).

First, it does not take more disk space:

the
$ select typname, typlen from pg_type where typname ~ '^timestamp';
typname | typlen 
-------------+--------
timestamp | 8
timestamptz | 8
(2 rows)

How does it work? Mark needs to know the time zone, so why it does not take up more space?

The fact that she doesn't know the time zone. Inside, all the values in the timestamptz columns are in UTC (coordinated universal time).

At UTC has nice features it does not have a bias (he is the starting point from which values are in other time zones), and no difference between summer and winter time. So any timestamp in UTC format always guaranteed indicates only one point in time.

But if all the time to indicate UTC, how do I know the time I need the time zone?

Every time we are talking about the timestamptz values, if the time zone is not specified, then PostgreSQL uses the pre-configured time. And you can configure it in different ways:

the
    the
  • setting the timezone in postgresql.conf
  • the
  • alter database ... set timezone = ‘...'
  • the
  • alter user ... set timezone = ‘...'
  • the
  • SET timezone = ‘...'

The first parameter is used to specify in which time zone is your server. In other words, the time zone by default that will be used in the absence of other changes.

The following are two ways to change the default for the selected database and the user.
The latter method can be used if you want your database connection worked with in other settings.

Note how this changes the output of now():

the
$ select now();
now 
-------------------------------
2014-04-04 20:32:59.390583+02
(1 row)

$ set timezone = 'America/New_York';
SET

$ select now();
now 
-------------------------------
2014-04-04 14:33:06.442768-04
(1 row)

So every time you view or change the values of timestamptz, PostgreSQL converts them to/from UTC.

This means that the values can easily be compared (they are all in the same time zone, no changes in summer or winter, so the comparison is always possible).

Now let's imagine that you have a value of ‘2014-04-04 20:00:00'. And you know that this time in Los Angeles, but you want to save it in its database, which operates in a different time zone. You can check what the current offset is -7 hours, and use the value thus:

the
$ select '2014-04-04 20:00:00-07'::timestamptz;
timestamptz 
------------------------
2014-04-05 05:00:00+02

What happened? Why I don't see 8 PM?

The reason is simple – in the query I have inserted a timestamp in any timezone. Inside, the label was converted to UTC, and then converted again (perhaps even without UTC, I'm not sure) in my usual time zone, which is:

the
$ show timezone;
TimeZone 
----------
Poland
(1 row)

If I had set time zone to Los Angeles, then the query result would be:

the
$ set timezone = 'America/Los_Angeles';
SET

$ select '2014-04-04 20:00:00-07'::timestamptz;
timestamptz 
------------------------
2014-04-04 20:00:00-07
(1 row)

It is important to understand that the displayed value always takes into account the time zone setting.

There is another way to get 20:00 in Los Angeles:

the
$ set timezone = 'Poland';
SET

$ select '2014-04-04 20:00:00'::timestamp at time zone 'America/Los_Angeles';
timezone 
------------------------
2014-04-05 05:00:00+02
(1 row)

It is important to add “::timestamp" with the value, otherwise we will get something strange:

the
$ set timezone = 'Poland';
SET

$ select '2014-04-04 20:00:00' at time zone 'America/Los_Angeles';
timezone 
---------------------
2014-04-04 11:00:00
(1 row)

What happened here? Where did the 11:00?

The value in quotes (2014-04-04 20:00:00) is perceived as timestamptz, which means 8 PM in my time zone:

the
select '2014-04-04 20:00:00'::timestamptz;
timestamptz 
------------------------
2014-04-04 20:00:00+02
(1 row)

And only after the translation of the value in my time zone, PG reads “at time zone ...", which is used to display the time in selected time zone.

Thus, the timestamp at time zone gives a timestamptz value, which shows the moment when the local time in the selected time zone was the same as was specified in the command.

And timestamptz at time zone produces timestamp value that indicates what time in the selected time zone at a specified point in time.

It sounds a little confusing, so let me give examples:

the
select '2014-04-04 20:00:00'::timestamptz at time zone 'UTC';
timezone 
---------------------
2014-04-04 18:00:00
(1 row)

select '2014-04-04 20:00:00'::timestamp at time zone 'UTC';
timezone 
------------------------
2014-04-04 22:00:00+02
(1 row)

Interestingly, we can use this to translate time from one time zone to another, even if Pg is not in any of them.

Suppose we want to know what time it is in Los Angeles, when in Moscow — 8am. My local time is as follows:

the
$ show timezone;
TimeZone 
----------
Poland
(1 row)

It is of little use.

First we need to determine the point in time (in the format of timestamptz), which shows 8 am in Moscow:

the
$ select '2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow';
timezone 
------------------------
2014-04-04 06:00:00+02
(1 row)

This tells me that it is 6 am in my time zone. But we want to know the time in Los Angeles. I could write‘2014-04-04 06:00:00+02' in the time zone of ‘LA', but it can be done differently:

the
$ select ('2014-04-04 08:00:00'::timestamp at time zone 'Europe/Moscow') at time zone 'America/Los_Angeles';

---------------------
2014-04-03 21:00:00
(1 row)

Because the ‘timestamp at time zone .. 'is the same as timestamptz, we can use “at time zone" again to convert it back to timestamp (without time zone) related to some other place.

I hope now you clear. I took me some time to understand this issue, and finally understood everything :)

All this has one interesting side effect: it is not so easy to add indexes to functions with timestamptz. For example, you cannot create an index that will be used to obtain the day of the week:

the
$ create table test (i timestamptz);
CREATE TABLE

$ create index q on test (to_char(i, 'Day'));
ERROR: functions in index expression must be marked IMMUTABLE

As shown in the example above, the reason is very simple – one and the same point in time may relate to different days of the week depending on time zone. And since to_char() uses the current time zone, it might produce different values for the same source data depending on the time zone settings in the system:

the
$ set timezone = 'Europe/Warsaw';
SET

$ insert into test (i) values ('2014-04-04 06:00:00');
INSERT 0 1

$ select i, to_char(i, 'Day') from test;
i | to_char 
------------------------+-----------
2014-04-04 06:00:00+02 | Friday 
(1 row)

$ set timezone = 'Europe/Moscow';
SET

$ select i, to_char(i, 'Day') from test;
i | to_char 
------------------------+-----------
2014-04-04 08:00:00+04 | Friday 
(1 row)

$ set timezone = 'America/Los_Angeles';
SET

$ select i, to_char(i, 'Day') from test;
i | to_char 
------------------------+-----------
2014-04-03 21:00:00-07 | Thursday 
(1 row)

The same point in time, but different days. It may be different months or even different years, depending on where it was.

Timestamp (without time zone) here “is the” strong side as it has no time zone is specified, it can be safely used to extract information.

But we know how to translate timestamptz to timestamp. You just need to tell it the time zone. So we can try to do so:

the
create index q on test (to_char(i at time zone 'Poland', 'Day'));

But unfortunately, nothing comes out. The fact is that to_char is too versatile. You can use to_char like so:

the
$ select to_char(now(), 'TMMonth');
to_char 
---------
April
(1 row)

$ set lc_time = 'pl_PL.UTF-8';
SET

$ select to_char(now(), 'TMMonth');
to_char 
----------
Kwiecień
(1 row)

This time we get different results not because of the time zone and locale.

The correct solution to the problem of indexing would be to write your own function that will call to_char to brand the constant "environment", and then it is already indexed. Here it is:

the
create function day_from_ts(timestamptz) returns text as $$
select to_char( $1 at time zone 'Poland', 'Day' );
$$ language sql immutable;
CREATE FUNCTION

Now we can use it for indexing:

the
create index q on test (day_from_ts( i ));
CREATE INDEX

It is safe because the function itself causes timezone to be set to "Poland" and it calls to_char in such a way as to ignore the value of the locale (in other words, in the format of to_char does not have a prefix TM).

Of course, to use this index, all queries must also use this function:

the
select * from test where day_from_ts(i) = 'Friday';

Another important thing when working with time zones – getting Unix time, or the so-called era. Overall, it's just:

the
$ select extract(epoch from now());
date_part 
------------------
1396638868.57491
(1 row)

It is interesting that it does not depend on time zone:

the
$ begin;
BEGIN

$ show timezone;
TimeZone 
----------
Poland
(1 row)

$ select now(), extract(epoch from now());
now | date_part 
-------------------------------+------------------
2014-04-04 21:15:27.834775+02 | 1396638927.83477
(1 row)

$ set timezone = 'America/Los_Angeles';
SET

$ select now(), extract(epoch from now());
now | date_part 
-------------------------------+------------------
2014-04-04 12:15:27.834775-07 | 1396638927.83477
(1 row)

$ commit;
COMMIT

The reason is, not everybody knows, lies in the fact that Unix time is always the time zone of UTC. This means that, when you remove the epoch from timestamp timestamp, PG assumes it is in UTC. Which results in the following potential problems:
the
$ select now(), extract(epoch from now());
now | date_part 
-------------------------------+-----------------
2014-04-04 21:19:01.456205+02 | 1396639141.4562
(1 row)

$ select extract(epoch from '2014-04-04 21:19:01.456205'::timestamp);
date_part 
-----------------
1396646341.4562
(1 row)

In the first case, Pg receives a "point in time", which is internally converted to UTC (and when displayed it is converted to my time zone +2).

In the second case the timestamp is in my time zone, but it is assumed that it is UTC (no conversion!), the epoch is taken from ‘2014-04-04 21:19:01.456205 UTC', not‘2014-04-04 21:19:01.456205+02'.

Surprising.

In short, try to avoid use timestamp and timestamptz.

The last thing I would say is not a bug or a potential problem, but rather the functionality, which many do not know.

As you have seen, PostgreSQL uses a timestamp (and timestamptz) with an accuracy of microseconds. Many people insist that the accuracy was only to the second, although personally I don't like it.

And timestamp, and timestamptz (and other types of data, time-related) may have additional precision (“precision”).

Let me give a simple example:

the
$ select now(), now()::timestamptz(0), now()::timestamptz(1);
now | now | now 
-------------------------------+------------------------+--------------------------
2014-04-04 21:23:42.322315+02 | 2014-04-04 21:23:42+02 | 2014-04-04 21:23:42.3+02
(1 row)

Of course, you can use this in tables:

the
$ create table test (i timestamptz(0));
CREATE TABLE

$ insert into test(i) values (now());
INSERT 0 1

$ select * from test;
i 
------------------------
2014-04-04 21:24:16+02
(1 row)

Great! You don't need to change “now()" or anything else, just add precision to the data type, and she will adjust.

I mentioned that I don't like it. The reason is simple – in any reasonably loaded system, the second is a low level of accuracy. Moreover, the data storage accurate to microseconds costs me nothing but may be useful. On the other hand, if the data to microseconds, how can I do to make the values displayed without a fraction of a second?

It's simple: I use (SELECT queries) fouzia to_char () or date_trunc, or even cast to type timestamptz(0):

the
$ select now(),
to_char(now(), 'YYYY-MM-DD HH24:MI:SS TZ'),
date_trunc('second', now()),
now()::timestamptz(0);
now | to_char | date_trunc | now 
-------------------------------+--------------------------+------------------------+------------------------
2014-04-04 21:28:20.827763+02 | 2014-04-04 21:28:20 CEST| 2014-04-04 21:28:20+02 | 2014-04-04 21:28:21+02
(1 row)


learn More about how to work with timestamps, we are going to tell at the conference PG Day'16 Russia in July 2016! Prepare your questions, we will try to answer them.

Certainly, you can also discover useful article, written by colleagues from Mail.ru Group. It clearly shows an example, when only one is timestamptz not enough to solve the problem.
Article based on information from habrahabr.ru

Комментарии

Популярные сообщения из этого блога

When the basin is small, or it's time to choose VPS server

Performance comparison of hierarchical models, Django and PostgreSQL

From Tomsk to Silicon Valley and Back