Blog
paimon-08 维表join
paimon支持lookup join语法 用于从paimon查询的数据来补充维度字段
要求一个表具有处理时间属性,而另一个表由查找源连接器支持
案例:paimon支持flink中具有主键的表和append-only的表查找联结
USE CATALOG fs_catalog;
CREATE TABLE customers (
id INT PRIMARY KEY NOT ENFORCED,
name STRING,
country STRING,
zip STRING
);
INSERT INTO customers VALUES(1, 'zs', 'ch', '123'),(2,'ls', 'ch', '456'),(3,'ww','ch', '789');
CREATE TEMPORARY TABLE Orders (
order_id INT,
total INT,
customer_id INT,
proc_time AS PROCTIME()
) WITH (
'connector' = 'datagen',
'rows-per-second' = '1',
'fields.order_id.kind'='sequence',
'fields.order_id.start'='1',
'fields.order_id.end'='10000',
'fields.total.kind'='random',
'fields.total.min'='1',
'fields.total.max'='1000',
'fields.customer_id.kind'='random',
'fields.customer_id.min'='1',
'fields.customer_id.max'='3'
);
SELECT o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id=c.id;
- JOIN 维度表名 FOR SYSTEM_TIME AS OF 事实表的处理时间字段 AS 别名
lookup join算子会在本地维护一个rocksdb缓存并实时拉取表的最新更新,查找联结运算符只会提取必要的数据 因此过滤条件对性能非常重要
如果orders(主表)的记录join缺失因为customers(查找表)对应的数据还没有准备好,可以考虑使用flink的Delayed Retry Strategy For lookup