Java 搞定 SQL 集合运算的最简方法
问题介绍
作为 java 程序员,用代码直接实现类似 SQL 中的交并补差的集合运算,总是要编写大量的代码,如果能有一个专门的外部数据工具,通过写简单类似 SQL 的脚本来实现,在 java 中直接调用并可以返回结果集,就再好不过了。Java 版集算器正是解决这一难题的神器,通过 SPL 脚本可以直观自然得写出运算,再使用 java 调用 SPL 脚本,使用起来简单,快捷,高效。另外,虽然 SQL 有集合概念,但对于有序集合运算提供的支持却很有限,经常要采用很费解的思路才可以完成, SPL 基于离散数据集模型,能轻松处理有序集合运算。下面我们就由浅入深,举例说明如何使用。
SPL 实现
和集
示例 1: 求重叠时间段的总天数
MySQL8:
with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9' union all select date'2010-01-15',date'2010-01-16' union all select date'2010-01-07',date'2010-01-12' union all select date'2010-01-08',date'2010-01-11'),t1(d,end) as (select start,end from t union all select d+1,end from t1 where dselect count(distinct d) from t1;with recursive t(start,end) as (select date'2010-01-07',date'2010-01-9' union all select date'2010-01-15',date'2010-01-16' union all select date'2010-01-07',date'2010-01-12' union all select date'2010-01-08',date'2010-01-11'),t1(d,end) as (select start,end from t union all select d+1,end from t1 where dselect count(distinct d) from t1;
说明:此例先将各时间段转成时间段内所有日子对应的日期,然后再求不同日期的个数
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select date'2010-01-07'start,date'2010-01-9'end union all select date'2010-01-15',date'2010-01-16'union all select date'2010-01-07',date'2010-01-12'union all select date'2010-01-08',date'2010-01-11'") |
3 | =A2.(periods(start,end)) |
4 | =A3.conj() |
5 | =A4.icount() |
A3: 对 A2 中的每一个时间段构造从 start 到 end 的日期序列
A4: 求 A3 中所有日期序列的和
A5: 求 A4 中不重复日期的个数
保存脚本文件SumSet.dfx (嵌入 Java 会用到)
差集
示例 1: 列出英语人口和法语人口均超过 5% 的国家
MySQL8:
with t1(lang) as (select 'English' union all select 'French') select name from world.country c where not exists(select * from t1 where lang not in (select language from world.countrylanguage where percentage>=5 and countrycode=c.code ));with t1(lang) as (select 'English' union all select 'French') select name from world.country c where not exists(select * from t1 where lang not in (select language from world.countrylanguage where percentage>=5 and countrycode=c.code ));
说明:此 SQL 只是演示通过双重否定实现差集为空
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query("select CountryCode,Name,Language,Percentage from world.countrylanguage cl join world.country c on cl.countrycode=c.code where percentage>5") |
3 | =A2.group(CountryCode) |
4 | =A3.select(["English","French"]\~.(Language)==[]) |
5 | =A4.new(~.Name:name) |
A4: 选出 ["English","French"] 与本组语言集合的差为空的组,意思就是选出语言集合包含 English 和 French 的组
保存脚本文件DifferenceSet.dfx (嵌入 Java 会用到)
交集
示例 1: 列出英语人口、法语人口、西班牙语人口分别超过 0.3%、0.2%、0.1% 的国家代码
MySQL8:
with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3), t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2), t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)select countrycodefrom t1 join t2 using(countrycode) join t3 using(countrycode);with t1 as (select countrycode from world.countrylanguage where language='English' and percentage>0.3), t2 as (select countrycode from world.countrylanguage where language='French' and percentage>0.2), t3 as (select countrycode from world.countrylanguage where language='Spanish' and percentage>0.1)select countrycodefrom t1 join t2 using(countrycode) join t3 using(countrycode);
说明:此例只是演示如何求解多个集合的交集
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | [English,French,Spanish] |
3 | [0.3,0.2,0.1] |
4 | =A2.(A1.query@i("select countrycode from world.countrylanguage where language=? and percentage>?",~,A3(#))) |
5 | >A1.close() |
6 | =A4.isect() |
A3: 按次序依次查询英语人口超 0.3%、法语人口超 0.2%、西班牙语超 0.1% 的国家代码,并转成序列
A5: A3 中所有序列交集
保存脚本文件IntersectionSet.dfx (嵌入 Java 会用到)
Java 调用
SPL 嵌入到 Java 应用程序十分方便,通过 JDBC 调用存储过程方式加载,用和集保存的文件SumSet.dfx,示例调用如下:
... Connection con = null; Class.forName("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection("jdbc:esproc:local://"); //调用存储过程,其中SumSet是dfx的文件名 st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call SumSet()"); //执行存储过程 st.execute(); //获取结果集 ResultSet rs = st.getResultSet(); ... ... Connection con = null; Class.forName("com.esproc.jdbc.InternalDriver"); con= DriverManager.getConnection("jdbc:esproc:local://"); //调用存储过程,其中SumSet是dfx的文件名 st =(com. esproc.jdbc.InternalCStatement)con.prepareCall("call SumSet()"); //执行存储过程 st.execute(); //获取结果集 ResultSet rs = st.getResultSet(); ...
替换成DifferenceSet.dfx或IntersectionSet.dfx是同样的道理,只需 call DifferenceSet()或者 call IntersectionSet() 即可。这里只用 Java 片段粗略解释了如何嵌入 SPL,详细步骤请参阅Java 如何调用 SPL 脚本,也非常简单,不再赘述。同时,SPL 也支持 ODBC 驱动,集成到支持 ODBC 的语言,嵌入过程类似。
扩展节选
关于集合运算除了上面讲的和差交运算,还可以获取与行号有关的计算,以及有序集合的对位运算。
根据行号取数据
示例 1: 计算招商银行 (600036) 2017 年第 3 个交易日和倒数第 3 个交易日的交易信息
MySQL8:
with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31') select tdate,open,close,volume from t where rn=3union all select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31') select tdate,open,close,volume from t where rn=3union all select tdate,open,close,volume from t where rn=(select max(rn)-2 from t);
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate") |
3 | =A2(3)|A2.m(-3) |
A3: 第 3 条记录和倒数第 3 条记录的和集
示例 2: 计算招商银行 (600036) 最近 20 个交易日的平均收盘价
MySQL8:
with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')select avg(close) avg20 from t where rn<=20;with t as (select *, row_number() over(order by tdate desc) rn from stktrade where sid='600036')select avg(close) avg20 from t where rn<=20;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='600036' order by tdate") |
3 | =A2.m(-20:) |
4 | =A3.avg(close) |
A2: 将 600036 的交易记录按日期排序
A3: 取从倒数 20 条到末尾的所有记录
A4: 求 A3 中所有记录收盘价的平均值
求满足条件的记录的行号
示例 1: 计算招商银行 (600036)2017 年经过多少交易日收盘价达到 25 元
MySQL8:
with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')select min(rn) from t where close>=25;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31')select min(rn) from t where close>=25;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate") |
3 | =A2.pselect(close>=25) |
A3: 从前往后查找第 1 个收盘价达到 25 元的记录位置
示例 2: 计算格力电器 (000651) 2017 年涨幅 (考虑停牌)
MySQL8:
with t as (select * from stktrade where sid='000651'), t1(d) as (select max(tdate) from t where tdate<'2017-01-01'), t2(d) as (select max(tdate) from t where tdate<'2018-01-01')select s2.close/s1.close-1 risefrom (select * from t,t1 where tdate=d) s1, (select * from t,t2 where tdate=d) s2;with t as (select * from stktrade where sid='000651'), t1(d) as (select max(tdate) from t where tdate<'2017-01-01'), t2(d) as (select max(tdate) from t where tdate<'2018-01-01')select s2.close/s1.close-1 risefrom (select * from t,t1 where tdate=d) s1, (select * from t,t2 where tdate=d) s2;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='000651' and tdate<'2018-01-01' order by tdate") |
3 | =A2.pselect@z(tdate < date("2017-01-01")) |
4 | =A2(A3).close |
5 | =A2.m(-1).close |
6 | =A5/A4-1 |
A2: 数据按交易日从小到大排序
A3: 从后往前查找交易日在 2017-01-01 之前的最后一条记录在序列中的行号
A4: 求 2016 年收盘价
A5: 求 2017 年收盘价,其中 A2.m(-1) 取倒数第 1 条记录,即 2017 年最后一个交易日对应的记录
示例 3: 列出 2017 年信息发展 (300469) 交易量超过 250 万股时的交易信息及各日涨幅(考虑停牌)
MySQL8:
with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='300469' and tdate<=date '2017-12-31'),t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000) select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise from t1 join t on t1.rn=t.rn+1;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='300469' and tdate<=date '2017-12-31'),t1 as (select * from t where tdate>=date'2017-01-01' and volume>=2500000) select t1.tdate, t1.close, t.volume, t1.close/t.close-1 rise from t1 join t on t1.rn=t.rn+1;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='300469' and tdate<= date '2017-12-31' order by tdate") |
3 | =A2.pselect@a(tdate>=date("2017-01-01") && volume>2500000) |
4 | =A3.new(A2(~).tdate:tdate, A2(~).close:close, A2(~).volume:volume, A2(~).close/A2(~-1).close-1:rise) |
A3: 求出 2017 年交易量超 250 万股所有记录的行号
A4: 根据行号计算相应的日期、收盘价、交易量、涨幅
求最大值或最小值所在记录的行号
示例 1: 计算招商银行 (600036) 2017 年最早的最低价与最早的最高价间隔多少交易日
MySQL8:
with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'), t1 as (select * from t where close=(select min(close) from t)), t2 as (select * from t where close=(select max(close) from t))select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) intevalfrom t1,t2;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'), t1 as (select * from t where close=(select min(close) from t)), t2 as (select * from t where close=(select max(close) from t))select abs(cast(min(t1.rn) as signed)-cast(min(t2.rn) as signed)) intevalfrom t1,t2;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate") |
3 | =A2.pmax(close) |
4 | =A2.pmin(close) |
5 | =abs(A3-A4) |
A3: 从前往后找最大收盘价在序列中的行号
A4: 从前往后找最小收盘价在序列中的行号
示例 2: 计算招商银行 (600036) 2017 年最后的最低价与最后的最高价间隔多少交易日
MySQL8:
with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'), t1 as (select * from t where close=(select min(close) from t)), t2 as (select * from t where close=(select max(close) from t))select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) intevalfrom t1,t2;with t as (select *, row_number() over(order by tdate) rn from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31'), t1 as (select * from t where close=(select min(close) from t)), t2 as (select * from t where close=(select max(close) from t))select abs(cast(max(t1.rn) as signed)-cast(max(t2.rn) as signed)) intevalfrom t1,t2;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =A1.query@x("select * from stktrade where sid='600036' and tdate between '2017-01-01' and '2017-12-31' order by tdate") |
3 | =A2.pmax@z(close) |
4 | =A2.pmin@z(close) |
5 | =abs(A3-A4) |
A3: 从后往前找最大收盘价在序列中的行号
A4: 从后往前找最小收盘价在序列中的行号
有序集合间的对位计算
示例 1: 求 2018 年 3 月 6 日到 8 日创业板指 (399006) 对深证成指 (399001) 的每日相对收益率
MySQL8:
with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'), t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')select t1.rise-t2.risefrom t1 join t2 using(tdate)where t1.rise is not null;with t1 as (select *,close/lag(close) over(order by tdate) rise from stktrade where sid='399006' and tdate between '2018-03-05' and '2018-03-08'), t2 as (select *, close/lag(close) over(order by tdate) rise from stktrade where sid='399001' and tdate between '2018-03-05' and '2018-03-08')select t1.rise-t2.risefrom t1 join t2 using(tdate)where t1.rise is not null;
集算器 SPL:
A | |
---|---|
1 | =connect("mysql") |
2 | =["399006","399001"].(A1.query("select * from stktrade where sid=? and tdate between '2018-03-05' and '2018-03-08' ",~)) |
3 | >A1.close() |
4 | =A2.(~.calc(to(2,4),close/close[-1])) |
5 | =A4(1)--A4(2) |
A2: 依次查询 399006 和 399001 从 2018 年 3 月 5 日到 8 日的交易数据
A4: 依次计算 A2 中 2 个序表从第 2 条记录到第 4 条记录的涨幅,也就是 399006 和 399001 从 2018 年 3 月 6 日到 8 日的每天涨幅
A5: 对位相减,即可算出每日相对收益率
SPL 优势
有库写 SQL,没库写 SPL
用 Java 程序直接汇总计算数据,还是比较累的,代码很长,并且不可复用,很多情况数据也不在数据库里,有了 SPL,就能像在 Java 中用 SQL 一样了,十分方便。
常用无忧,不花钱就能取得终身使用权的入门版
如果要分析的数据是一次性或临时性的,润乾集算器每个月都提供免费试用授权,可以循环免费使用。但要和 Java 应用程序集成起来部署到服务器上长期使用,定期更换试用授权还是比较麻烦,润乾提供了有终身使用权的入门版,解决了这个后顾之忧,获得方式参考 如何免费使用润乾集算器?
技术文档和社区支持
官方提供的集算器技术文档本身就有很多现成的例子,常规问题从文档里都能找到解决方法。如果获得了入门版,不仅能够使用 SPL 的常规功能,碰到任何问题都可以去乾学院上去咨询,官方通过该社区对入门版用户提供免费的技术支持。