18 Oct, 2021

ESG, News and the power of Natural Language Processing

Jasmeet Khalsa


Social media and online news have fundamentally changed the way people interact with companies. Posts on platforms like Twitter or LinkedIn, along with blogs and online news articles, record stakeholders’ experiences with companies and their perceptions of corporate behaviour, and they allow these views to spread rapidly. These views in turn shape stakeholder perspectives and inform stakeholder actions. As such, social media and online news quickly mirror and shape corporate reputation, societal legitimacy, social licence to operate, and stakeholder trust [1]. An illustrative example is H&M’s “trashgate” scandal, in which store personnel were found to be damaging unsold clothes and dumping them in the garbage instead of donating them. Following an article in the New York Times, public outrage quickly spread across social media [2]. The story was one of the top three trending topics on Twitter and remained so for several days. Only after this outrage did H&M address the issue. Its investigation found that the New York store in question had been violating the company’s policy of donating unsold clothing to charity. “Trashgate” was one of the first examples to show how social media could push issues into news coverage, affect corporate publicity, and force companies to change their actions [3]. It also illustrates how social media monitoring can be a powerful asset in the world of sustainability, especially in evaluating Environmental, Social and Governance (“ESG”) risk.

Over recent years, the use of ESG data and analytics has boomed in capital markets [4]. Real-time news and social media data are receiving increasing attention in cutting-edge decision-making strategies. This popularity is grounded in the ability of ESG data to provide insights that are absent from typical financial data. Traditional financial information has limited usefulness to investors today, as it is backward-looking and covers only a narrow financial base. As such, it is insufficient on its own to assess a company’s capacity for future profit. For example, financial data gave no indication of potential unethical behaviour at H&M; it reflected the resulting reputational (and financial) damage only after that damage had been done. Therefore, both retail and institutional investors increasingly focus on ESG factors to assess companies. This is supported by ESG research showing a positive relationship between a firm’s profitability and its ESG metrics [5] and indicating that ESG data can help reduce portfolio risk [6].

However, ESG data in mainstream investing faces three main challenges: most ESG data is qualitative, the landscape of corporate disclosures is incomplete and inconsistent, and disclosures are generally voluntary, so available data is sparse [7][8]. Many pertinent issues never appear in disclosures or regulatory filings and, when they do, reporting and publication cycles can leave the data out of date by the time it reaches the public domain. There is also a significant bottleneck in assessing ESG performance, due to the manual effort required to continuously source and validate disclosure data. This bottleneck is even more pronounced for large volumes of unstructured text, such as social media posts or news articles. As demand for ESG data increases, so does the need for accurate, near real-time responses to ESG issues, and the ability to detect and represent such issues through data sources beyond a company’s filings becomes paramount. In an ever-changing investment landscape, the use of news and social media data has become critical to ESG investment strategies.

To realise the full potential of news data, millions of articles must be processed daily, which calls for the power and capability of Machine Learning (“ML”). Recent advances in Natural Language Processing (“NLP”) strengthen our ability to process unstructured text data. Moving away from the predetermined text and keyword ontologies of the past [9], advances in deep learning have pushed the state of the art towards Transformer-based architectures such as BERT [10]. Their key advantage is the use of context in decision making. Language is complex: homographs, for example, are words whose meaning depends entirely on context. Without contextual understanding, false positives are likely, and many prominent classical methods are known to fall into this trap. These approaches treat text as a collection of words and weight each word by how often it occurs; in TF-IDF weighting, for instance, a word that appears in many documents of a corpus receives a lower weight. So if a corpus of articles frequently mentions the word ‘exploitation’, such techniques can systematically discount its relevance. Similarly, distinguishing ‘carbon’ in the context of greenhouse gas emissions from ‘carbon’ in a discussion of carbon allotropes is critical to understanding the text in question. In other words, “context is king”.
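The frequency-based discounting described above can be illustrated with classic TF-IDF. The following is a minimal sketch in plain Python, not a description of any production system: the toy corpus is invented, and whitespace tokenisation and the unsmoothed IDF formula log(N/df) are simplifying assumptions (libraries such as scikit-learn apply smoothing by default, so their weights differ). It shows that a word appearing in every article, here ‘exploitation’, gets an IDF of log(3/3) = 0 and therefore a weight of zero, however relevant it may be to ESG risk.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using unsmoothed TF-IDF.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    # Simplifying assumption: lowercase and split on whitespace.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Invented toy corpus: 'exploitation' appears in every article.
corpus = [
    "report alleges labour exploitation at supplier",
    "exploitation claims spread on social media",
    "regulator reviews exploitation and emissions data",
]
w = tf_idf(corpus)
print(w[0]["exploitation"])        # 0.0, because log(3 / 3) == 0
print(round(w[0]["labour"], 4))    # 0.1831, i.e. (1 / 6) * log(3)
```

Because these weights are computed from counts alone, ‘carbon’ would be weighted the same way whether an article concerns emissions or allotropes. A contextual model such as BERT instead produces a different representation for each occurrence of a word, depending on the words around it.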

Across the investment community, researchers and engineers are using machine learning in new and disruptive ways: analysing linguistic information from content, using ESG and sentiment data to determine a company’s commitment to ESG, and evaluating the impact of this commitment on stakeholders [11]. Sokolov et al. [12] show how BERT can be used as a classifier to aid in ESG scoring, with aggregation approaches applied to the classifier’s output to construct a score. Such scores allow investors to recognise and understand what drives high and low ESG performance among their holdings, informing their approach to engagement. An investor holding H&M, for instance, could reflect the impact of “trashgate” in its decision-making process. These scores also supplement brand and reputational risk management with a specific focus on sustainability issues and controversies.
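The aggregation step can be illustrated with a minimal sketch. It does not reproduce the method of Sokolov et al.; it assumes that a hypothetical classifier (for instance, a fine-tuned BERT model) has already produced, for each article, a probability that the article reports an ESG controversy. The sketch averages those probabilities per company and maps the result to a 0 to 100 score, where higher means fewer controversy signals. The function name, company names, and probabilities are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(predictions):
    """Aggregate per-article classifier outputs into one score per company.

    predictions: iterable of (company, probability) pairs, where probability is a
    hypothetical classifier's estimate that the article reports an ESG controversy.
    Returns {company: score} on a 0-100 scale; a higher score means the company's
    articles look, on average, less like controversies.
    """
    by_company = defaultdict(list)
    for company, prob in predictions:
        by_company[company].append(prob)
    # Simple mean aggregation (an assumption); recency or severity weighting
    # could be substituted here.
    return {
        company: round(100 * (1 - mean(probs)), 1)
        for company, probs in by_company.items()
    }


# Invented classifier outputs for two hypothetical companies.
example = [
    ("Company A", 0.9), ("Company A", 0.7),
    ("Company B", 0.1), ("Company B", 0.3),
]
print(aggregate_scores(example))  # {'Company A': 20.0, 'Company B': 80.0}
```

A plain mean keeps the example easy to follow, but it lets many low-relevance articles dilute a single severe controversy, which is one reason a real scoring system might weight articles differently.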

At Arabesque S-Ray, we are committed to providing innovative tools and incisive insight into ESG data to empower businesses and investors. This includes a substantial focus on applied NLP research, perfecting cutting-edge techniques and deepening sustainability expertise to provide granular insight into corporate behaviour. We are working to design the leading NLP-powered, ESG-focused tools that will transform the way investors access and use social and traditional media signals in sustainable investing and in supporting responsible business.


[1] – Aula, P. (2010). “Social media, reputation risk and ambient publicity management.” Strategy & Leadership, 38(6), 43–49.

[2] – “Reputational risk in digital publicity.” Presented at Viestinnän tutkimuksen päivät, February 12, 2010, Tampere, Helsinki.

[3] – Laaksonen, S.-M. (2017, June 16). Hybrid narratives: Organizational reputation in the hybrid media system. Publications of the Faculty of Social Sciences.

[4] – Lev, B., & Zarowin, P. (1999). The boundaries of financial reporting and how to extend them. Journal of Accounting Research, 37(2), 353-385.

[5] – Clark, G. L., Feiner, A., & Viehs, M. (2015, March 5). From the Stockholder to the Stakeholder: How Sustainability Can Drive Financial Outperformance.

[6] – Friede, G., Busch, T., & Bassen, A. (2015). ESG and financial performance: Aggregated evidence from more than 2000 empirical studies. Journal of Sustainable Finance & Investment, 5(4), 210–233.

[7] – Park, A., & Ravenel, C. (2013). Integrating sustainability into capital markets: Bloomberg LP and ESG’s quantitative legitimacy. Journal of Applied Corporate Finance.

[8] – Henriksson, R., Livnat, J., Pfeifer, P., & Stumpp, M. (2019). Integrating ESG in portfolio construction. The Journal of Portfolio Management, 45(4), 67–81.

[9] – Lee, Y. H., Tsao, W. J., & Chu, T. H. (2009). Use of ontology to support concept-based text categorization. In C. Weinhardt, S. Luckner, & J. Stößer (Eds.), Designing E-Business Systems. Markets, Services, and Networks (WEB 2008), Lecture Notes in Business Information Processing, vol. 22, pp. 201–213. Berlin, Heidelberg: Springer.

[10] – Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional Transformers for language understanding.

[11] – https://www.raconteur.net/finance/investing/how-machine-learning-is-helping-investors-find-esg-stocks/

[12] – Sokolov, A., Mostovoy, J., Ding, J., & Seco, L. (2021, January). Building machine learning systems for automated ESG scoring. The Journal of Impact and ESG Investing.