NFK model and Surrogate Smart Order

OptExec RL

4.1 Preliminary analysis

Before comparing the performance of our general model with the LIST Smart Order strategy, we carried out a preliminary analysis to test whether the reinforcement learning approach we developed is able to address the optimal execution problem.

This preparatory study was applied to the orders released in 2018 on FCA and ENI by a “List customer specialist in equity markets”. These orders served as the test set for our RL algorithm (trained on the 2018 FTSE MIB limit order book). Each order was partitioned into child orders following the volume curve realised on that day for the corresponding stock. The parameter values used for this analysis are:

time bins (in minutes): [0, 1, 3, 6, 10, 20, 50] ($T= 6$);
inventory bins (in shares): [0, 100, 300, 600, 1000, 2000, 5000] ($I = 6$) for FCA; [0, 100, 300, 600, 1000, 2000, 8000] ($I = 6$) for ENI;
spread bins (in tick sizes): [0, 1, 2] ($S_{\rm max}=2$), in the period 01/01/2018 - 26/01/2018; [0, 1, 2, 3, 4, 5] ($S_{\rm max}=5$), in the period 29/01/2018 - 31/12/2018;
actions (in tick sizes): [-3, -2, -1, 0, 1, 2, 3];
tick size: 0.01, in the period 01/01/2018 - 26/01/2018; 0.002, in the period 29/01/2018 - 31/12/2018.
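As a rough illustration of how the bins above could turn a raw market observation into a discrete state, here is a minimal sketch for FCA in the second period. The function names, the half-open bin convention (a value falls in bin `i` when it lies in `(edges[i], edges[i+1]]`) and the clipping of out-of-range values are assumptions made for this example, not details taken from the original model.

```python
from bisect import bisect_left

# Bin edges copied from the parameter list (FCA, 29/01/2018 - 31/12/2018).
TIME_BINS = [0, 1, 3, 6, 10, 20, 50]                   # minutes, T = 6 bins
INVENTORY_BINS = [0, 100, 300, 600, 1000, 2000, 5000]  # shares, I = 6 bins
SPREAD_MAX = 5                                         # ticks, S_max = 5
ACTIONS = [-3, -2, -1, 0, 1, 2, 3]                     # ticks
TICK_SIZE = 0.002


def bin_index(value, edges):
    """Return i such that edges[i] < value <= edges[i+1] (assumed convention).

    Values at or below the first edge go to bin 0; values above the last
    edge are clipped into the last bin.
    """
    idx = bisect_left(edges, value) - 1
    return max(0, min(idx, len(edges) - 2))


def encode_state(minutes, shares, spread_price):
    """Map (time, inventory, spread in price units) to a discrete state tuple."""
    t = bin_index(minutes, TIME_BINS)
    i = bin_index(shares, INVENTORY_BINS)
    # Spread expressed in ticks and capped at S_max.
    s = min(int(round(spread_price / TICK_SIZE)), SPREAD_MAX)
    return t, i, s


# Number of (state, action) pairs implied by the bins: 6 * 6 * 6 * 7.
N_STATE_ACTIONS = (
    (len(TIME_BINS) - 1) * (len(INVENTORY_BINS) - 1) * (SPREAD_MAX + 1) * len(ACTIONS)
)
```

With these bins a tabular policy stays small, which is one reason coarse, unevenly spaced bins are attractive when the training window is only a few weeks long.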

For the reasons explained in Sec. 3.2.3 of the preceding post, the algorithm has been trained and tested separately for the two periods of the year with different tick sizes, and separately for the buy and sell cases.

1) For orders between 01/01/2018 and 26/01/2018: training from 01/01/2018 to 26/01/2018 for FCA/ENI; consecutive episodes in a day start every 15 minutes, with each episode lasting 50 minutes.

2) For orders between 29/01/2018 and 31/12/2018: training from 01/11/2018 to 31/12/2018 for FCA and from 01/07/2018 to 31/08/2018 for ENI; consecutive episodes in a day start every 30 minutes, with each episode lasting 50 minutes.
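The episode layout described in the two cases above (overlapping 50-minute episodes starting every 15 or 30 minutes) can be sketched as follows. The session open and close times passed in, and the rule that an episode must end before the close, are assumptions for this example; the original post does not state them.

```python
from datetime import datetime, timedelta


def episode_windows(session_open, session_close, step_min, length_min=50):
    """List (start, end) pairs for episodes starting every `step_min` minutes.

    Only episodes that finish by `session_close` are kept (assumed rule).
    """
    step = timedelta(minutes=step_min)
    length = timedelta(minutes=length_min)
    windows = []
    start = session_open
    while start + length <= session_close:
        windows.append((start, start + length))
        start += step
    return windows


# Hypothetical trading session used only for illustration.
session_open = datetime(2018, 1, 2, 9, 0)
session_close = datetime(2018, 1, 2, 17, 30)

first_period = episode_windows(session_open, session_close, step_min=15)
second_period = episode_windows(session_open, session_close, step_min=30)
```

A shorter step produces more, heavily overlapping episodes per day, which partly compensates for the shorter training window available in the first period.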

4.1.1 Surrogate Smart Order

To begin with a simpler case, we have tested our RL policy against a simplified stand-in for the Smart Order strategy, which we have called Surrogate Smart Order (SSO). In the SSO simulation, the quantity $q$ of each child order is executed at the end of its interval by submitting a market order. This differs from the actual Smart Order (SO) policy, which places successive limit orders at the current best price within each interval, in the hope that one or more of them will be “hit” by a market order, and only goes to the market at the end of the interval with any unexecuted shares. As for the RL strategy, we have calculated the volume-weighted average price resulting from the SSO policy for each of the orders in question (see Sec. 3.2.2 of the previous post).
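The SSO logic can be sketched in a few lines: split the parent order along a volume curve, send one market order per interval, and compute the resulting VWAP. The function names, the representation of the book as a list of `(price, size)` levels with the best price first, and the handling of rounding remainders are all assumptions for this example rather than the actual simulator.

```python
def split_by_volume_curve(total_qty, volume_curve):
    """Split a parent order into child quantities proportional to the curve."""
    total = sum(volume_curve)
    children = [int(total_qty * v / total) for v in volume_curve]
    # Assumed convention: any rounding remainder goes to the last child.
    children[-1] += total_qty - sum(children)
    return children


def market_order_fill(qty, levels):
    """Walk the opposite side of the book, best price first.

    `levels` is a list of (price, size); returns a list of (price, filled).
    """
    fills = []
    remaining = qty
    for price, size in levels:
        if remaining <= 0:
            break
        take = min(remaining, size)
        fills.append((price, take))
        remaining -= take
    if remaining > 0:
        raise ValueError("not enough displayed liquidity to fill the order")
    return fills


def vwap(fills):
    """Volume-weighted average price of a list of (price, quantity) fills."""
    volume = sum(q for _, q in fills)
    return sum(p * q for p, q in fills) / volume


def sso_vwap(total_qty, volume_curve, end_of_interval_books):
    """VWAP of a parent order executed with one market order per interval."""
    fills = []
    children = split_by_volume_curve(total_qty, volume_curve)
    for q, levels in zip(children, end_of_interval_books):
        if q > 0:
            fills.extend(market_order_fill(q, levels))
    return vwap(fills)


# Made-up example: three intervals, book snapshots taken at the end of each.
books = [
    [(10.00, 500)],
    [(10.02, 400), (10.04, 400)],
    [(10.01, 300)],
]
example_vwap = sso_vwap(1000, [1, 3, 1], books)
```

Because every child order crosses the spread, this surrogate always pays for immediacy, which is what makes it a pessimistic proxy for the real Smart Order.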

4.1.2 NFK versus SSO

Now we are ready to illustrate the outcomes of this preliminary analysis. As the criterion for comparing the performance of the two strategies, we have chosen the total Profit and Loss (P&L) of each, computed with respect to the market. To this end, we have calculated the P&L of each order, both in cash (euro) and in basis points, through the following formulas:

${\rm P\&L}_{\rm strat} \hspace{0.1cm} [{\rm cash}] = \sigma \cdot Q\cdot ({\rm VWAP}_{\rm strat} - {\rm VWAP}_{\rm market})$

and

${\rm P\&L}_{\rm strat} \hspace{0.1cm} [{\rm bps}] = \sigma \cdot \frac{{\rm VWAP}_{\rm strat} - {\rm VWAP}_{\rm market}}{{\rm VWAP}_{\rm market}}\times 10^4,$

where $\sigma$ is the sign of the order ($-$ for buy, $+$ for sell), $Q$ is the total order volume, and ${\rm VWAP}_{\rm strat}$ and ${\rm VWAP}_{\rm market}$ are the volume-weighted average prices obtained by the strategy (RL or SSO) and by the market respectively. Below we show the tables with the P&L statistics computed for the SSO (yellow) and RL (red) policies, for FCA:

and ENI:

As you can see from the tables above, for every order dataset we have calculated the mean, the standard deviation, the minimum and maximum values, and the 25th, 50th and 75th percentiles of the per-order P&L, both in euros (p&l statistics in cash) and in basis points (p&l statistics in bps). Moreover, count is the total number of orders (both buy and sell) used for the test: 217 for FCA and 342 for ENI. The field p&l total cash in euro, computed as count $\times$ mean (in cash), indicates how much we gain or lose in total with the RL/SSO strategy relative to the market, while cost in euro is the cost arising from all the order executions of the strategy.
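For readers who want to reproduce this kind of table, the two P&L formulas and the summary statistics can be sketched as below. The function names, the `"buy"`/`"sell"` labels, the choice of quartile interpolation and the sample numbers are assumptions for this example; the values are made up and are not the FCA or ENI results.

```python
import statistics


def pnl_cash(side, qty, vwap_strat, vwap_market):
    """P&L in euro: sigma * Q * (VWAP_strat - VWAP_market)."""
    sigma = -1 if side == "buy" else 1
    return sigma * qty * (vwap_strat - vwap_market)


def pnl_bps(side, vwap_strat, vwap_market):
    """P&L in basis points: sigma * (VWAP_strat - VWAP_market) / VWAP_market * 1e4."""
    sigma = -1 if side == "buy" else 1
    return sigma * (vwap_strat - vwap_market) / vwap_market * 1e4


def summarise(values):
    """Count, mean, std, min, quartiles, max and total (= count * mean)."""
    # "inclusive" interpolates linearly between data points (assumed choice).
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    count = len(values)
    mean = statistics.mean(values)
    return {
        "count": count,
        "mean": mean,
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(values),
        "total": count * mean,
    }


# Made-up per-order P&L values in euro, for illustration only.
sample = [20.0, 25.0, -5.0, 10.0]
stats = summarise(sample)
```

With the sign convention above, a buy order executed below the market VWAP and a sell order executed above it both give a positive P&L, so the cash totals of the two strategies can be compared directly.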

By comparing the results obtained for the two policies, we get:

for FCA: p&l total cash in euro (RL - SSO) = 8697.16 euros; cost in euro (RL - SSO) = 2911.20 euros;
for ENI: p&l total cash in euro (RL - SSO) = 43954.31 euros; cost in euro (RL - SSO) = 5263.20 euros.

Thus, considering the P&L only, our RL strategy earns about 12% and 28% more than the SSO policy for FCA and ENI respectively, while the cost of the order executions is higher for the RL policy than for the simulated Smart Order in both cases. In conclusion, the overall gain of the RL strategy over the SSO is about 7% for FCA and 23% for ENI.

Though further analyses would be required to confirm these results, this first study shows that a reinforcement learning approach is capable of performing significantly better than a naive strategy like the Surrogate Smart Order.

On the strength of these results, we have extended the analysis to a larger dataset in order to improve the statistics, and we have tested the performance of our RL model against the actual Smart Order policy. The next post will be dedicated to this. Don’t miss it!