By Irena Siyi Xiong

INSIGHTS Research

Systematic credit has drawn more attention in recent years as electronic trading of credit instruments has grown in both volume and market share. The increased availability of high-quality data and the growth in liquidity have made it possible for us at Arabesque AI to consider an expansion into credit instruments.

The valuations of both corporate credit and equity depend on the overall health of a company. They therefore share many performance drivers, which we have already implemented in our equity analytical models. This makes credit an interesting use case for investigating our models’ transfer-learning capabilities. Through various proofs-of-concept over the past year, we have demonstrated the ability to build analytical models for corporate credit bonds.

Naivety and Challenges

From a theoretical machine-learning perspective, and given the strong pipeline of models we have built for our equity predictions, the application to the credit universe looks like a simple problem: create a new dataset and a new label, then use them to train and evaluate a baseline model from our existing pipeline. The reality is a lot more challenging, due to 1) heterogeneity of data in clustering and across time, 2) high dataset imbalances, 3) trustworthiness of data and, last but not least, 4) entity and exchange mapping challenges. Let’s briefly look at each of these challenges.

Heterogeneity of data: When we consider equities, we can naively group them by the geographies they trade in, the sectors they conduct business in, etc. Ultimately, most of these instruments are non-preferential shares, otherwise known as common shares, so they are broadly comparable. Corporate credit is awash with small details that make bonds hard to compare. Some bonds are callable or puttable: a callable bond gives the issuer the right to redeem the bond before the maturity date, while a puttable bond gives the holder the right to demand repayment of the principal before the maturity date. In stocks, options are separate financial products and therefore don’t need to be considered in pure stock-price forecasting. Further, the maturity dates of bonds are not aligned, and one company can issue various types of bonds, such as secured bonds or convertibles. To make it even more complicated, some European bonds are eligible for the ECB’s asset purchase programme. Hence, grouping “similar assets” for training is a harder task in bonds if one wishes to adjust for all these granularities.

To make matters worse, equities can almost always be assumed to exist in perpetuity, barring corporate events. Bonds, on the other hand, can almost always be assumed to expire at some point in time, except in the occasional case of perpetual bonds. This means that the universe refresh rate is exceedingly high, which presents many challenges for machine learning algorithms, not least inconsistent dataset sizes and the unknown extent of survivorship bias vs. maturity effect. Datasets, therefore, need to be asset-agnostic to a certain degree and carefully constructed to maintain comparability.

High dataset imbalance: In equities, we can frame the problem as either a price prediction or a returns prediction, both of which can be calculated from the prices of the equities (split/dividend-adjusted, which are still just intrinsic datapoints of the equities). In bonds, we can frame the problem as either a price prediction or a credit spread prediction. The former is a bond datapoint; the latter is the difference between the bond yield and a risk-free rate, typically the yield on a US Treasury bond. Here, we are implicitly predicting “interactions” between two different assets: the bond and the risk-free rate. Moreover, when we train on a target label of a minimum spread widening/narrowing, we find stark class imbalances. These are more pronounced than in the equivalent equity setting of minimum return requirements. The imbalance often calls for readjusting the loss function, where for trading cost reasons we would value one class over the other. For example, shorting a bond is harder than going long on it, and this asymmetry is larger than in the equities world.
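As a rough illustration of the spread label and the loss reweighting described above, here is a minimal sketch. The function names, the 10 bp threshold, the toy spreads and the extra multiplier on the "narrow" class are all assumptions made for the example, not the values or code used in our pipeline.

```python
from collections import Counter

def credit_spread_bp(bond_yield_pct, risk_free_yield_pct):
    """Spread of a bond yield over a risk-free yield, in basis points (inputs in percent)."""
    return (bond_yield_pct - risk_free_yield_pct) * 100.0

def spread_label(spread_now_bp, spread_later_bp, min_move_bp=10.0):
    """Return 'narrow', 'widen' or 'flat' depending on whether the spread moved
    by at least min_move_bp (hypothetical threshold) over the horizon."""
    change = spread_later_bp - spread_now_bp
    if change <= -min_move_bp:
        return "narrow"
    if change >= min_move_bp:
        return "widen"
    return "flat"

def class_weights(labels, cost_multiplier=None):
    """Inverse-frequency class weights, optionally scaled per class to reflect
    that one class is cheaper or easier to trade than another."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    multiplier = cost_multiplier or {}
    return {c: multiplier.get(c, 1.0) * n / (k * cnt) for c, cnt in counts.items()}

# Toy spreads (bp) today and at the end of the horizon; values are made up.
pairs = [(125.0, 127.0), (90.0, 88.0), (200.0, 215.0), (150.0, 151.0),
         (110.0, 95.0), (300.0, 303.0), (80.0, 81.0), (140.0, 139.0)]
labels = [spread_label(now, later) for now, later in pairs]

# Assumption: narrowing spreads (long positions) are valued more than widening ones.
weights = class_weights(labels, cost_multiplier={"narrow": 2.0})
```

In this toy sample six of the eight labels are "flat", so the two minority classes receive much larger weights. These weights could then be passed to whatever per-class or per-sample weighting the chosen model supports.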

Trustworthiness of data: The challenges above are compounded by the deteriorating quality of data for bonds of lesser-known issuing entities or lower credit ratings. In a trading landscape where OTC trading still contributes a significant share of the liquidity, bid/ask data and volumes recorded from electronic markets are sometimes misleading and, worse, untradeable. This influences not only the training of the models but also the executability of credit trading signals. Often, this means sanity-checking the data manually. The trustworthiness of data also feeds back into the design of the trading decision horizons, and therefore of the target labels, for the credit model.

Mapping of entities: Many commercial data providers carry their own asset mapping IDs. Because bonds are issued by firms that, most of the time, have also issued their own shares, we have an incentive to link equity IDs to bond IDs. The mapping is important for understanding where the bonds lie in the capital structure and what credit risks they bear. This is less of a problem when one sources data from a single data provider, but it quickly becomes a tedious task when mapping across databases.
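A minimal sketch of such a cross-provider link is shown below. It assumes both providers expose some shared issuer-level key (for example a legal entity identifier); the field names and IDs are hypothetical. Bonds with no match, or with more than one candidate equity, are set aside for manual review rather than guessed.

```python
def link_bonds_to_equities(bonds, equities):
    """Link bond IDs from one provider to equity IDs from another via a shared issuer key.

    bonds:    list of dicts with 'bond_id' and 'issuer_key'
    equities: list of dicts with 'equity_id' and 'issuer_key'
    Returns (linked, unmatched): a bond_id -> equity_id dict, and a list of
    bond IDs with zero or several candidate equities.
    """
    equity_by_key = {}
    for eq in equities:
        equity_by_key.setdefault(eq["issuer_key"], []).append(eq["equity_id"])

    linked, unmatched = {}, []
    for bond in bonds:
        candidates = equity_by_key.get(bond["issuer_key"], [])
        if len(candidates) == 1:
            linked[bond["bond_id"]] = candidates[0]
        else:
            unmatched.append(bond["bond_id"])  # missing or ambiguous mapping
    return linked, unmatched

# Hypothetical records from two providers, "A" (bonds) and "B" (equities).
bonds = [
    {"bond_id": "A-BOND-1", "issuer_key": "LEI-001"},
    {"bond_id": "A-BOND-2", "issuer_key": "LEI-001"},
    {"bond_id": "A-BOND-3", "issuer_key": "LEI-999"},
]
equities = [{"equity_id": "B-EQ-7", "issuer_key": "LEI-001"}]

linked, unmatched = link_bonds_to_equities(bonds, equities)
```

In practice the hard part is that such a shared key is often missing or inconsistent, which is where the tedious manual work comes in.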

Measuring the quality of a systematic credit model

For any system, there must be a way to conduct quality checks. For machine learning systems, we can rely on metrics such as accuracy, error sizes, F1 scores, etc. However, these might not be sufficient for models that produce forecasts for more illiquid holdings. Over longer holding periods, it is important to understand the models from a fundamental analyst’s perspective. This means 1) understanding the behaviour of different machine learning systems and algorithms, 2) understanding the contribution and importance of different input features, and 3) understanding the variability of model outputs.

Model response to datasets: We know that different algorithms respond differently to the same dataset. Training an ARMA model will yield different outcomes than training a Gaussian process model. Therefore, we need to monitor each model’s out-of-sample predictive power on the same dataset. Given known issues with input data and potential clustering of erroneous data, it is also important to understand how the algorithms respond to corrupted data in various segments of the datasets, i.e., their response to adversarial attacks. As different models have different data requirements, e.g., i.i.d. variables for some statistical models and large enough datasets for neural nets, we also investigate the models’ performance when varying the size of the datasets. However, larger datasets sometimes come at the cost of overgeneralizing and glossing over key differentiating features of bonds. Understanding these aspects is key to choosing models given our aforementioned challenges of persistently wide datasets in the credit space.
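One simple way to probe the response to corrupted segments is to perturb one slice of the training data at a time, refit, and compare out-of-sample error against the clean fit. The sketch below does this with a toy least-squares line; the model, the data and the size of the shift are assumptions chosen purely for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def corrupt_segment(ys, start, stop, shift):
    """Copy of ys with a constant shift added on indices [start, stop)."""
    return [y + shift if start <= i < stop else y for i, y in enumerate(ys)]

# Toy data: a noiseless line, split into a training and an out-of-sample part.
train_x = list(range(20))
train_y = [2.0 * x + 1.0 for x in train_x]
test_x = list(range(20, 25))
test_y = [2.0 * x + 1.0 for x in test_x]

clean_error = mse(fit_line(train_x, train_y), test_x, test_y)

# Corrupt one block of five training points at a time and record the
# increase in out-of-sample error relative to the clean fit.
degradation = {}
for start in (0, 5, 10, 15):
    noisy_y = corrupt_segment(train_y, start, start + 5, shift=10.0)
    degradation[start] = mse(fit_line(train_x, noisy_y), test_x, test_y) - clean_error
```

Comparing the entries of `degradation` across segments, and across candidate models, gives a crude picture of which parts of the dataset each model is most sensitive to.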

Feature importance: As we vary the models, the large number of data points we feed into them makes it hard to tell which ones really contain information and which are simply noise. We can select features by comprehensively searching through perturbations of the features to identify gains in, e.g., accuracy. But this is extremely computationally expensive and only works for one instance of the {model, dataset} pair, when we could have multiple datasets over the years and across different clusters. We can easily map feature importance for an XGBoost model through LIME/SHAP algorithms, but these are not necessarily applicable to the other models; the same goes for statistical tests on model coefficients. A hack is to combine a leave-one-out algorithm with a blanket blackbox model representing the entire system of models to map from a subset of features to our produced signals.
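The leave-one-out idea can be sketched as follows. The black box is represented by a hypothetical callable, `fit_and_score`, that takes a list of feature names, fits the surrogate model on those features and returns a score; here it is replaced by a toy function with made-up contributions so the example runs on its own.

```python
def leave_one_out_importance(features, fit_and_score):
    """Importance of each feature as the drop in score when it is left out.

    fit_and_score: callable taking a list of feature names and returning a
    score (higher is better). It stands in for the blackbox surrogate.
    """
    baseline = fit_and_score(list(features))
    importance = {}
    for feature in features:
        reduced = [f for f in features if f != feature]
        importance[feature] = baseline - fit_and_score(reduced)
    return importance

# Toy stand-in for the blackbox: each feature adds a fixed, made-up amount
# to a base accuracy of 0.5. A real surrogate would be trained on data.
TOY_CONTRIBUTION = {"spread_level": 0.10, "rating": 0.05, "random_noise": 0.0}

def toy_fit_and_score(selected):
    return 0.5 + sum(TOY_CONTRIBUTION[f] for f in selected)

importance = leave_one_out_importance(list(TOY_CONTRIBUTION), toy_fit_and_score)
ranked = sorted(importance, key=importance.get, reverse=True)
```

This needs one fit per feature plus a baseline, rather than a search over all feature subsets, but it ignores interactions between features, which is part of why it is a hack.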

Variability of model outputs: Model signals can be as fickle as my mind when choosing flavours in an ice cream parlour. A common way to deal with this is to smooth signals over time through moving averages. For systematic credit strategies, however, we need to understand the fickle signals intuitively: if we have to smooth the signals, surely that means we cannot be that confident about the models’ decisions? To deal with volatile signals, we can measure the uncertainty of predictions via inductive conformal prediction, which also nicely avoids the need to constantly retrain models.
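For readers unfamiliar with inductive conformal prediction, the sketch below shows the basic classification recipe: an already-trained model is scored once on a held-out calibration set, and those scores are reused to turn each new prediction into a set of plausible labels at a chosen significance level. The nonconformity measure (one minus the predicted probability), the label names and the calibration values are assumptions for illustration.

```python
def calibrate(cal_probs, cal_labels):
    """Nonconformity scores on a held-out calibration set.

    cal_probs:  list of dicts mapping label -> predicted probability
    cal_labels: list of true labels
    Score used here (an assumption): 1 - probability assigned to the true label.
    """
    return [1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)]

def p_value(cal_scores, score):
    """Share of calibration scores at least as nonconforming as `score`,
    counting the new example itself."""
    at_least = sum(1 for s in cal_scores if s >= score)
    return (at_least + 1) / (len(cal_scores) + 1)

def prediction_set(cal_scores, probs, significance=0.2):
    """Labels whose conformal p-value exceeds the significance level."""
    return {label for label, p in probs.items()
            if p_value(cal_scores, 1.0 - p) > significance}

# Made-up calibration outputs: probability the model gave to the true label.
true_label_probs = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6, 0.75, 0.65, 0.55]
cal_probs = [{"widen": p, "narrow": 1.0 - p} for p in true_label_probs]
cal_labels = ["widen"] * len(true_label_probs)
cal_scores = calibrate(cal_probs, cal_labels)

# A new, fairly confident prediction from the same (unchanged) model.
new_probs = {"widen": 0.9, "narrow": 0.1}
labels_kept = prediction_set(cal_scores, new_probs, significance=0.2)
```

A set with a single label signals a confident call, while a set with several labels, or none, flags a signal that deserves less trust. The underlying model is never refit; only the calibration scores are reused.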

About Arabesque AI

Arabesque AI was founded in 2019 as part of the Arabesque Group. We developed our proprietary Artificial Intelligence Engine to forecast stock prices globally on a daily basis. On top of our AI, we built and launched AutoCIO, a platform that creates bespoke investment strategies. Using AI and Big Data, AutoCIO offers hyper-customization, enabling investors to align their investment and sustainability criteria. At Arabesque, AI is not only a buzzword. We advise over $450mn on our platform, proving that AI is ready to be used in practice.