Paimon, 大数据

paimon-08 维表join

paimon支持lookup join语法 用于从paimon查询的数据来补充维度字段

要求一个表具有处理时间属性,而另一个表由查找源连接器支持

案例:paimon支持flink中具有主键的表和append-only的表查找联结

USE CATALOG fs_catalog;

CREATE TABLE customers (
    id INT PRIMARY KEY NOT ENFORCED,
    name STRING,
    country STRING,
    zip STRING
);

INSERT INTO customers VALUES(1, 'zs', 'ch', '123'),(2,'ls', 'ch', '456'),(3,'ww','ch', '789');

CREATE TEMPORARY TABLE Orders (
    order_id INT,
    total INT,
    customer_id INT,
    proc_time AS PROCTIME()
) WITH (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.kind'='sequence',
    'fields.order_id.start'='1',
    'fields.order_id.end'='10000',
    'fields.total.kind'='random',
    'fields.total.min'='1',
    'fields.total.max'='1000',
    'fields.customer_id.kind'='random',
    'fields.customer_id.min'='1',
    'fields.customer_id.max'='3'
);

SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id=c.id;
  • JOIN 维度表名 FOR SYSTEM_TIME AS OF 事实表的处理时间字段 AS 别名

lookup join算子会在本地维护一个rocksdb缓存并实时拉取表的最新更新,查找联结运算符只会提取必要的数据 因此过滤条件对性能非常重要

如果orders(主表)的记录join缺失因为customers(查找表)对应的数据还没有准备好,可以考虑使用flink的Delayed Retry Strategy For lookup