收录一些PostgreSQL日常使用的小命令和小配置,备查。


将sql结果导入csv文件

使用\copy命令将sql结果导入csv文件, postgresql命令行中输入以下命令:

1
\copy (Select * From foo) To '/tmp/test.csv' With CSV HEADER

Reference :http://stackoverflow.com/questions/1517635/save-pl-pgsql-output-from-postgresql-to-a-csv-file


将shapefile导入postgresql的步骤

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
$ createdb -U postgres shape_test
$ psql -U postgres shape_test -c "CREATE EXTENSION postgis;";
CREATE EXTENSION
$ shp2pgsql -s 4326 SAMPLE_mt_boundary_201510.shp > shape_file.sql
Shapefile type: Polygon
Postgis type: MULTIPOLYGON[2]
$ psql -U postgres shape_test < shape_file.sql
SET
SET
BEGIN
CREATE TABLE
ALTER TABLE
addgeometrycolumn
---------------------------------------------------------------------------
public.sample_mt_boundary_201510.geom SRID:4326 TYPE:MULTIPOLYGON DIMS:2
(1 row)
INSERT 0 1
INSERT 0 1
COMMIT
$

Reference :


pg_dump多表到一个sql文件中

可以使用多个-t 参数来dump多张表

1
$ pg_dump -U postgres db_name -t table1 -t table 2 > output_sql.sql


pg_dump只dump表结构

使用--schema-only-s参数。

1
$ pg_dump --schema-only


如何不dump表结构,只dump数据

可以使用--data-only来实现

1
$ pg_dump --data-only -U postgres db_name -t table1 > output_sql.sql

如果结合--column-inserts参数使用的话,那么就会为每条记录都生成一个insert语句

使用--column-inserts的缺点:

  1. sql文件会比较大
  2. 从sql恢复(restore)的时候也会很慢.
  3. 恢复的时候是单条记录恢复,一条记录错误不会导致整个table的恢复失败, 容易造成数据不完整。

但是适合于导出的sql会用于其他non-PostgreSQL的场景

1
$ pg_dump --column-inserts --data-only -U postgres db_name -t table1 > output_sql.sql

Reference : https://www.postgresql.org/docs/9.3/static/app-pgdump.html


在postgresql中建立一个只读用户

建立用户的命令如下:

1
CREATE USER intern2019 WITH LOGIN ENCRYPTED PASSWORD 'x!&LosA&@4' VALID UNTIL 'infinity';

对用户授权只读权限的命令如下:

1
2
3
GRANT USAGE ON SCHEMA public TO intern2019;
GRANT SELECT ON ALL TABLES IN SCHEMA public To intern2019; -- only effect by current table
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO intern2019; -- effect by all new table

删除某个用户

1
2
3
4
5
ALTER DEFAULT PRIVILEGES IN SCHEMA public revoke SELECT ON TABLES FROM intern2019
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM intern2019;
REVOKE USAGE ON SCHEMA public FROM intern2019
DROP USER intern2019;

如何在group by中统计基于group by字段和其他条件的记录的总数

sum函数中,可以结合case when来统计符合某些条件的总数,在group by的时候,就可以顺带让Postgresql来检查计算符合条件的记录总数,比简单粗暴的子查询来的快速。

1
select mls_id, mls_name, SUM( CASE WHEN (trs_type='long term rentals' or trs_type='' or trs_type is null) THEN 1 ELSE 0 END) as rental_num, SUM( CASE WHEN (trs='for sale') THEN 1 ELSE 0 END) as for_sale_num from listings where active=true and validity=true group by mls_id, mls_name order by mls_id

Reference:


PG的时间函数

原字段加上interval '1 second'

1
2
3
4
5
6
7
postgres=# select NOW(), NOW()+interval '1 second';
now | ?column?
------------------------------+------------------------------
2017-01-09 14:49:38.52691+08 | 2017-01-09 14:49:39.52691+08
(1 row)
postgres=#

查看数据库中各个表上次vacuum和autovacuum时间的sql

1
2
3
4
5
6
7
8
partition_test=# select relname,last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_user_tables;
relname | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
----------+-------------------------------+-----------------+-------------------------------+------------------
user_new | 2017-05-10 09:30:06.156982+08 | | 2017-05-10 09:30:06.181667+08 |
users | 2017-05-10 09:29:51.52591+08 | | |
(2 rows)
partition_test=#

Postgis中比较geometry类型的值的方法

使用ST_Equals

1
select id, source, lat, lon, geom4326, geom, created_at, updated_at from properties where not ST_Equals(geom4326, ST_SetSRID(ST_MakePoint(lon, lat), 4326)) or not ST_Equals(geom, ST_SetSRID(ST_MakePoint(lon, lat), 4269)) limit 1;


PG中查看sql执行时间的方法

使用\timing来显示sql执行时间,再次输入\timing关闭显示时间

1
2
3
4
5
6
7
8
9
10
11
12
postgres=# \timing
Timing is on.
postgres=# select 1;
?column?
----------
1
(1 row)
Time: 0.214 ms
postgres=# \timing
Timing is off.
postgres=#


切割第一个字符串是不是数字的sql

1
select split_part('10059 Pine Glade Dr', ' ' ,1)~'^([0-9]+\.?[0-9]*|\.[0-9]+)$';

如何建索引时不锁表

Postgresql创建index时,默认是要锁表的。可以通过添加CONCURRENTLY参数,在不锁表的情况下建立索引。
但需要多次扫描,因此建索引时间会比较久

1
create index CONCURRENTLY index_feeds_on_take_id on feeds(take_id);


一些常用的psql命令

1
2
3
4
5
6
7
8
9
10
11
12
13
14
\l -- 列出所有数据库,数据库字符集等
\l+ -- 列出所有数据库, 也包含数据库size
\c DATABASE_NAME -- 使用某个数据库
\d -- 查看表列表,包含sequence
\dt -- 查看数据库中的表,不包含sequence
\d+ -- 查看表列表(带Size和Description)
\d TABLE_NAME -- 查看表结构
\d+ TABLE_NAME -- 查看表结构,更多信息,比如可显示子表等
\df -- 列出所有的function
\df+ FUNCTION_NAME -- 列出某个function的详细信息
\x -- 打开扩展显示
\q -- 退出
\dt *.* -- 查看当前数据库中所有schema中的表
\dv -- 列出当前数据库中所有的view

数据库磁盘容量命令

数据库中所有表磁盘空间的命令,包含各个表的大小,索引的大小
http://stackoverflow.com/questions/2596624/how-do-you-find-the-disk-size-of-a-postgres-postgresql-table-and-its-indexes

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
table_name,
pg_size_pretty(table_size) AS table_size,
pg_size_pretty(indexes_size) AS indexes_size,
pg_size_pretty(total_size) AS total_size
FROM (
SELECT
table_name,
pg_table_size(table_name) AS table_size,
pg_indexes_size(table_name) AS indexes_size,
pg_total_relation_size(table_name) AS total_size
FROM (
SELECT ('"' || table_schema || '"."' || table_name || '"') AS table_name
FROM information_schema.tables
) AS all_tables
ORDER BY total_size DESC
) AS pretty_sizes


PG中重建索引

Postgresql中的索引是不停增长的,当数据被反复删除,更新后,索引文件会变得异常的巨大,称之为”bloated”。
使用reindex来重建索引,可以减少磁盘空间,提升索引效率。
但直接reindex重建索引会存在锁表现象。在PostgreSQL 12中才支持reindex的CONCURRENTLY参数。
12以下的版本中,只能通过手动新建和删除索引的方法来实现。方法如下:

  1. 以schema是public,tablename是’feed_records’为例,列出该表下所有索引以及其大小的sql如下:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    SELECT
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(quote_ident(t.tablename)::text)) AS table_size,
    pg_size_pretty(pg_relation_size(quote_ident(indexrelname)::text)) AS index_size,
    CASE WHEN indisunique THEN 'Y'
    ELSE 'N'
    END AS UNIQUE,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
    FROM pg_tables t
    LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
    LEFT OUTER JOIN
    ( SELECT c.relname AS ctablename, ipg.relname AS indexname, x.indnatts AS number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch, indexrelname, indisunique FROM pg_index x
    JOIN pg_class c ON c.oid = x.indrelid
    JOIN pg_class ipg ON ipg.oid = x.indexrelid
    JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
    AS foo
    ON t.tablename = foo.ctablename
    WHERE t.schemaname='public' and t.tablename='feed_records'
    ORDER BY 1,2;
  2. 选一个负载小的时间,重建索引。步骤:

    • 新增一个同原索引内容相同的临时索引A’
    • 删除原来的索引A
    • 重新建原来的索引A
    • 删除新建的临时索引A’
      1
      2
      3
      4
      5
      6
      7
      以重建feed_records中last_update_time字段的索引为例
      \d feed_records;
      create index CONCURRENTLY xxx_index_feed_records_on_last_update_time on feed_records(last_update_time);
      drop index index_feed_records_on_last_update_time;
      create index CONCURRENTLY index_feed_records_on_last_update_time on feed_records(last_update_time);
      drop index xxx_index_feed_records_on_last_update_time;
      \d feed_records;

数据库导出和恢复

数据库导出和恢复命令

1
2
3
4
5
6
# 导出数据和表,不包含drop语句
pg_dump --host localhost --username postgres --dbname mydatabase > db.sql
# 导出数据和表,添加-c参数后,会包含drop语句
pg_dump --host localhost --username postgres --dbname mydatabase -c > db.sql
# 恢复表
psql --host localhost --username postgres --dbname mydatabase < db.sql


pg_dump和pg_restore加速

pg_dump在大数据量下dump和restore速度很慢,可以使用-j参数来起多个job并行操作。

1
2
$ pg_dump --host localhost -j8 -Fd -v -Uintern -d web_v2_staging -f web_v2_staging_dump
$ pg_restore --verbose --clean --no-acl --no-owner -h localhost -d web_v3_staging -Uintern -j8 --format=d web_v2_staging_dump


输出结果中显示NULL字串

psql的默认输出格式中,没法区分NULL和空字串,使用\pset null (null)来显示NULL

1
2
3
4
5
6
7
8
9
postgres=# \pset null (null);
Null display is "(null);".
postgres=# select NULL;
?column?
----------
(null);
(1 row)
postgres=#


explain analyze用来定位慢查询

针对慢查询,使用explainexplain analyze来定位问题。
explain只做评估,并不真正执行sql。
explain analyze会直接运行sql。
一般还需开启\timing来使用


修改用户密码

修改密码的命令

1
ALTER ROLE username WITH PASSWORD 'password';


查看所有的参数

1
show all

AWS RDS Postgresql master/replicate要修改的几个配置

Master参数修改

1
2
wal_keep_segments: 需要需要设大一点,否则写过多的话,会导致replica暂停流复制,从S3下载archived WAL来恢复数据
random_page_cost: 设为seq_page_cost一样的值,默认random_page_cost是4,seq_page_cost是1。SSD磁盘下可以认为顺序查找和随机查找开销是一样的。

Replica参数修改

1
2
hot_standby_feedback: 设为1,防止replica查询还在执行时,Master已经将受影响的数据清理掉并同步给replica,导致replica上的查询失败。
random_page_cost: 设为seq_page_cost一样的值,默认random_page_cost是4,seq_page_cost是1。SSD磁盘下可以认为顺序查找和随机查找开销是一样的。

Reference: Best practices for Amazon RDS PostgreSQL replication


查看数据库支持的扩展

查看当前数据库的pg扩展命令SELECT * FROM pg_extension;或者\dx
查看服务器支持的pg扩展SELECT * FROM pg_available_extensions;


使用SSL/TLS连接数据库

psql中通过添加sslmode指定模式以及sslrootcert指定cert路径来使用SSL/TLS

1
psql "sslmode=verify-full sslrootcert=./rds-ca-2019-root.pem host=staging-replica.xxxxxxxxxxx.us-east-1.rds.amazonaws.com port=5432 user=db_user dbname=db_name"

sslmode的说明摘录如下, 来自: https://www.postgresql.org/docs/9.6/libpq-ssl.html

sslmode Eavesdropping protection MITM protection Statement
disable No No I don’t care about security, and I don’t want to pay the overhead of encryption.
allow Maybe No I don’t care about security, but I will pay the overhead of encryption if the server insists on it.
prefer Maybe No I don’t care about encryption, but I wish to pay the overhead of encryption if the server supports it.
require Yes No I want my data to be encrypted, and I accept the overhead. I trust that the network will make sure I always connect to the server I want.
verify-ca Yes Depends on CA-policy I want my data encrypted, and I accept the overhead. I want to be sure that I connect to a server that I trust.
verify-full Yes Yes I want my data encrypted, and I accept the overhead. I want to be sure that I connect to a server I trust, and that it’s the one I specify.

刷新sequence为表的最大值

更新单表

Postgresql的主键自增是通过sequence机制来实现的。重建或者手动insert数据的时候,sequence可能不会被更新。
此时,可以手动更新sequence中的last_value为对应表的最大值。

以users表为例:

1
SELECT setval('users_id_seq', (SELECT MAX(id) FROM users));

更新全部表

https://wiki.postgresql.org上有更新全部表sequence的方法Fixing Sequences


PG的range函数

PG中自带了range类型和函数,可以用来方便的计算数值范围。具体类型,操作符和函数参见官方文档

此处简单摘录一下时间范围tsrange的用法

  1. tsrange类型的定义,使用tsrange(start_time, end_time)来定义一个时间区间,可选的第三个参数,可以来定义区间是开区间还是闭区间。

    1
    2
    3
    4
    5
    6
    7
    test=# select tsrange('2020-01-01 11:00:00', '2020-01-02 00:00:00'), tsrange('2020-01-01 11:00:00', '2020-01-02 00:00:00', '[]');
    tsrange | tsrange
    -----------------------------------------------+-----------------------------------------------
    ["2020-01-01 11:00:00","2020-01-02 00:00:00") | ["2020-01-01 11:00:00","2020-01-02 00:00:00"]
    (1 row)
    test=#
  2. 使用@>来判定值是否在区间内。

    1
    2
    3
    4
    5
    6
    7
    test=# select tsrange('2020-01-01 11:00:00', '2020-01-02 00:00:00') @> '2020-01-01 10:00:00'::timestamp as is_in_range;
    is_in_range
    -------------
    f
    (1 row)
    test=#
  3. 使用upperlower来获取区间长度

    1
    2
    3
    4
    5
    6
    7
    test=# select upper((tsrange('2020-01-01 11:00:00', '2020-01-02 00:00:00'))) - lower((tsrange('2020-01-01 11:00:00', '2020-01-02 00:00:00'))) as length;
    length
    ----------
    13:00:00
    (1 row)
    test=#
  4. 使用&&来判断区间是否重合

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    -- 有区间重合
    test=# select tsrange('2020-01-01 00:00:00', '2020-01-02 00:00:00') && tsrange('2020-01-01 23:00:00', '2020-01-02 00:00:00');
    ?column?
    ----------
    t
    (1 row)
    -- 没有区间重合
    test=# select tsrange('2020-01-01 00:00:00', '2020-01-02 00:00:00') && tsrange('2020-01-02 00:00:00', '2020-01-02 01:00:00');
    ?column?
    ----------
    f
    (1 row)
    test=#
  5. 创建索引
    使用GiST和SP-GiST来为range类型的列来创建索引,可以加速=, &&, <@, @>, <<, >>, -|-, &<, and &>这几类运算符。

    1
    2
    3
    4
    5
    6
    7
    8
    create table events (
    id int primary key,
    user_id int not null,
    event text not null,
    timerange tsrange not null
    );
    CREATE INDEX timerange_on_events_idx ON events USING GIST (timerange);
  6. 使用EXCLUDE来限制重复数据

    • 新建表时候就添加exclude

      1
      2
      3
      4
      5
      6
      7
      create table events (
      id int primary key,
      user_id int not null,
      event text not null,
      timerange tsrange not null,
      EXCLUDE USING GIST (timerange WITH &&)
      );
    • 或者表建好后,后续再使用alter来添加exclude

      1
      2
      3
      4
      5
      6
      7
      8
      create table events (
      id int primary key,
      user_id int not null,
      event text not null,
      timerange tsrange not null
      );
      alter table events add constraint timerange_on_events_excl EXCLUDE USING gist (timerange WITH &&);
    • 添加了exclude后,插入重复数据会报错。

      1
      2
      3
      4
      5
      6
      7
      8
      9
      test=# insert into events(id, user_id, event, timerange) values
      test-# (1, 1, 'aaa', tsrange('2020-01-01 00:00:00', '2020-01-02 00:00:00'));
      INSERT 0 1
      test=#
      test=# insert into events(id, user_id, event, timerange) values
      test-# (2, 1, 'aaa', tsrange('2020-01-01 10:00:00', '2020-01-01 10:10:00'));
      ERROR: conflicting key value violates exclusion constraint "timerange_on_events_excl"
      DETAIL: Key (timerange)=(["2020-01-01 10:00:00","2020-01-01 10:10:00")) conflicts with existing key (timerange)=(["2020-01-01 00:00:00","2020-01-02 00:00:00")).
      test=#

trim去除首尾空白字符

可以看出,使用了trim过后,info前后的空白都被去除了。

1
2
3
4
5
6
7
test=# select trim(' info ')||',here' as string;
string
-----------
info,here
(1 row)
test=#


清空整个表并回收空间

PostgreSQL中要清空某个表并且回收空间,可以使用TRUNCATE命令。
官网描述如下:

TRUNCATE quickly removes all rows from a set of tables. It has the same effect as an unqualified DELETE on each table, but since it does not actually scan the tables it is faster. Furthermore, it reclaims disk space immediately, rather than requiring a subsequent VACUUM operation. This is most useful on large tables.
用法:

1
TRUNCATE removed_images;

参考: TRUNCATE


创建和查看view

创建view

1
2
3
4
5
6
7
SELECT questions.id AS question_id,
string_agg(question_categories.name::text, ', '::text) AS question_category_name,
array_agg(question_categories.id) AS question_category_ids
FROM questions
JOIN question_question_categories ON question_question_categories.question_id = questions.id
JOIN question_categories ON question_categories.id = question_question_categories.question_category_id AND question_categories.deleted_at IS NULL
GROUP BY questions.id;

查看view, \d+ question_category_view


如何快速删除表中大部分数据

Reference: Best way to delete millions of rows by ID

思路:

  • 创建临时表存放需要保存的数据
  • truncate原表
  • 从临时表中把数据重新拷贝回原表中

例子:

1
2
3
4
5
6
7
8
9
10
11
12
SET temp_buffers = '1000MB'; -- or whatever you can spare temporarily
CREATE TABLE tmp AS
SELECT t.*
FROM tbl t
LEFT JOIN del_list d USING (id)
WHERE d.id IS NULL; -- copy surviving rows into temporary table
TRUNCATE tbl; -- empty table - truncate is very fast for big tables
INSERT INTO tbl
SELECT * FROM tmp; -- insert back surviving rows.

注意: 源stackoverflow中使用的是CREATE TEMP TABLE创建的临时表,临时表只对当前session有效。万一truncate后连接断掉,会导致数据丢失。因此改为CREATE TABLE tmp比较合理。


拷贝表

拷贝表的语句

1
2
CREATE TABLE new_table AS
TABLE existing_table;

如果只需要复制表结构而不包含数据,可以将 WITH NO DATA 子句添加到 CREATE TABLE 语句中。

1
2
3
CREATE TABLE new_table AS
TABLE existing_table
WITH NO DATA;

注意拷贝表只有结构和数据,索引是不能复制的,需要在新表中手动添加。

Reference: 如何在 PostgreSQL 中复制表


刷新表的sequence

手动拷贝数据时,由于拷贝的数据会直接insert into带ID的数据。此时sequence并不会跟随着变化。
拷贝完数据后,就需要刷新sequence id表为表对应的最大值。
方法摘录自: How to reset postgres’ primary key sequence when it falls out of sync?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Login to psql and run the following
-- What is the result?
SELECT MAX(id) FROM your_table;
-- Then run...
-- This should be higher than the last result.
SELECT nextval('your_table_id_seq');
-- If it's not higher... run this set the sequence last to your highest id.
-- (wise to run a quick pg_dump first...)
BEGIN;
-- protect against concurrent inserts while you update the counter
LOCK TABLE your_table IN EXCLUSIVE MODE;
-- Update the sequence
SELECT setval('your_table_id_seq', COALESCE((SELECT MAX(id)+1 FROM your_table), 1), false);
COMMIT;

Amazon Linux 2上安装Postgresql

参考帖子:
Install PSQL client on EC2 instance

大概步骤:

1
2
amazon-linux-extras install postgresql10
yum install -y postgresql-server postgresql-devel


Postgresql重置postgres密码

如果忘记本地postgres账号的密码,可以先修改数据库本地登录为trust模式,此模式下登录账号不需要密码。
方法:

  1. 修改pg_hba.conf中模式为trust

    1
    local all all trust
  2. 重启pg数据库

    1
    systemctl restart postgresql-9.3.service
  3. 登录数据库修改密码

    1
    2
    3
    psql -U postgres
    postgres=# alter user postgres with password 'NEW_PASSWORD';
    postgresl=# \q
  4. 修改pg_hba.conf中改回为md5

    1
    local all all md5
  5. 重启pg数据库, 使配置生效。

    1
    systemctl restart postgresql-9.3.service

参考: Modify The Default Administrator Password


停止一个query查询

查询所有活跃的查询

1
SELECT * FROM pg_stat_activity WHERE state = 'active';

终止某个查询

1
SELECT pg_cancel_backend(<pid of the process>)

强制杀死某个查询

1
SELECT pg_terminate_backend(<pid of the process>)

参考: How to stop/kill a query in postgresql?

留言