Our approach begins by delving into the core metrics recommended by leading ESG frameworks (such as TCFD, GRI, ISSB) and mandated by key regulations (such as CSRD, SFDR, EU Taxonomy). This ensures that we thoroughly understand each data theme at a granular level. We then bridge the gap between theory and practice by employing advanced NLP algorithms to analyze millions of ESG-related corporate disclosures. The art of crafting a data product lies in striking the right balance between granularity and completeness. Our mission is to provide data that is not only rich in detail but also immediately actionable and seamlessly comparable across time and entities, empowering you to make informed, impactful decisions.
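As a simplified illustration of this matching exercise (and not our production models), the sketch below pairs disclosure passages with framework metric descriptions using TF-IDF cosine similarity. All metric names, descriptions, and passages are hypothetical placeholders.

```python
# Illustrative sketch only: match disclosure passages to framework metric
# descriptions with TF-IDF cosine similarity. Names and texts are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical metric definitions phrased in framework-style language.
metrics = {
    "scope_1_emissions": "Gross direct (Scope 1) greenhouse gas emissions in tonnes of CO2 equivalent",
    "board_gender_diversity": "Percentage of women on the board of directors",
}

# Hypothetical passages pulled from corporate reports.
passages = [
    "In 2023 our direct greenhouse gas emissions amounted to 12,400 tCO2e.",
    "Women represented 42% of board members at year end.",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(list(metrics.values()) + passages)

# Similarity between each metric description and each disclosure passage.
sims = cosine_similarity(matrix[: len(metrics)], matrix[len(metrics):])
for metric, row in zip(metrics, sims):
    best = row.argmax()
    print(f"{metric}: best match -> passage {best} (score {row[best]:.2f})")
```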
Our technology infrastructure is powered by advanced web-emulation algorithms that meticulously scan thousands of websites at predetermined intervals. This ensures every piece of ESG-related content — whether a document, spreadsheet or webpage — is captured and processed without fail. Our strategy is also supported by a dedicated team of experts adept at navigating complex web architectures and websites with restrictive terms of use. This dual approach not only ensures strict compliance with legal standards but also guarantees that our data products are consistently up-to-date, typically within a three-month reporting window. For clients with urgent needs, we can accelerate this process to as fast as 12 hours, delivering data that is both exhaustive and exceptionally current.
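The sketch below illustrates the basic idea of scheduled change detection behind this kind of monitoring, assuming a plain HTTP fetcher; the URLs, crawl cadence, and in-memory storage are placeholders rather than our actual web-emulation stack.

```python
# Minimal sketch: periodically re-visit a watchlist of report pages and flag
# content that has changed since the last crawl. All values are illustrative.
import hashlib
import time
import requests

WATCHLIST = ["https://example.com/company-a/sustainability"]  # hypothetical URL
CRAWL_INTERVAL_SECONDS = 24 * 3600  # placeholder cadence, not the real schedule

seen_hashes: dict[str, str] = {}

def crawl_once() -> None:
    for url in WATCHLIST:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        digest = hashlib.sha256(response.content).hexdigest()
        if seen_hashes.get(url) != digest:
            seen_hashes[url] = digest
            print(f"New or updated content detected at {url}")
            # Downstream: hand the payload to the extraction pipeline.

if __name__ == "__main__":
    while True:
        crawl_once()
        time.sleep(CRAWL_INTERVAL_SECONDS)
```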
Our data extraction process is propelled by an AI-driven architecture, where specialized algorithms work in harmony to optimize precision. It begins with our computer vision algorithms, which convert raw data from PDF, HTML, and XLS files into a uniform, actionable format. Next, our NLP algorithms meticulously curate, translate, enrich, and rank the content, laying the groundwork for fine-tuned Large Language Models (LLMs) to perform deep, precise data extraction. This advanced, multi-stage process ensures that our data extraction is not only state-of-the-art but also exceptionally accurate and efficient, reflecting our commitment to delivering the highest-quality raw ESG data on the market.
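The skeleton below illustrates how such a multi-stage pipeline can be chained together; each stage body is a hypothetical stand-in for the conversion, NLP, and LLM components described above, not the models themselves.

```python
# Illustrative pipeline skeleton: convert -> enrich -> extract.
# Function bodies are placeholders for the production components.
from dataclasses import dataclass, field

@dataclass
class Document:
    raw_bytes: bytes
    text: str = ""
    language: str = ""
    extracted: dict = field(default_factory=dict)

def convert(doc: Document) -> Document:
    # Stand-in for computer-vision / layout parsing of PDF, HTML, and XLS files.
    doc.text = doc.raw_bytes.decode("utf-8", errors="ignore")
    return doc

def enrich(doc: Document) -> Document:
    # Stand-in for curation, translation, enrichment, and ranking by NLP models.
    doc.language = "en"
    return doc

def extract(doc: Document) -> Document:
    # Stand-in for fine-tuned LLM extraction of individual ESG data points.
    doc.extracted = {"scope_1_emissions_tco2e": None}  # placeholder schema
    return doc

def run_pipeline(raw: bytes) -> Document:
    doc = Document(raw_bytes=raw)
    for stage in (convert, enrich, extract):
        doc = stage(doc)
    return doc

result = run_pipeline(b"Scope 1 emissions: 12,400 tCO2e")
print(result.extracted)
```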
Despite ground-breaking progress in AI research, achieving 100% accuracy in automated data extraction remains elusive. We overcome this limitation with our human-in-the-loop architecture, combining advanced AI with a rigorous, multi-tiered data validation process to ensure the highest standards of accuracy and reliability. Our pipeline begins by assigning quality indicators to each piece of extracted data, predicting its accuracy and completeness. These indicators then guide our team in their review process, where each data source is typically evaluated by two independent analysts. If discrepancies are found, the data is escalated to an arbitration phase, where senior analysts reconcile the differences. This human-in-the-loop approach not only ensures that our AI models keep learning, but also guarantees that our clients receive fully auditable data that is traceable to its source.
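The routing logic behind such a review flow might be sketched as follows, with hypothetical quality thresholds and record fields; it illustrates the two-analyst review and arbitration pattern rather than our internal tooling.

```python
# Minimal sketch of human-in-the-loop routing: low-confidence extractions go to
# two independent analysts, and disagreements escalate to senior arbitration.
from dataclasses import dataclass

@dataclass
class Extraction:
    value: str
    quality_score: float  # model-predicted accuracy, 0.0 to 1.0 (hypothetical)

def arbitrate(candidate_a: str, candidate_b: str) -> str:
    # Stand-in for the senior-analyst arbitration step.
    print(f"Escalating: {candidate_a!r} vs {candidate_b!r}")
    return candidate_a

def review(extraction: Extraction, analyst_a: str, analyst_b: str) -> str:
    """Return the validated value, escalating on disagreement."""
    if extraction.quality_score >= 0.95:  # hypothetical auto-accept threshold
        return extraction.value
    if analyst_a == analyst_b:
        return analyst_a  # the two independent reviews agree
    return arbitrate(analyst_a, analyst_b)

print(review(Extraction("12,400 tCO2e", 0.71), "12,400 tCO2e", "12,400 tCO2e"))
```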
After the initial compilation, our datasets undergo an exhaustive quality control process to assess their reliability. Each dataset is evaluated through an extensive series of automated Quality Assurance (QA) tests designed to identify anomalies and potential inaccuracies. Our QA strategy combines heuristic analysis with advanced machine learning (ML) techniques. Heuristic evaluations check for common irregularities, such as negative values or unit inconsistencies across reporting years. Simultaneously, our ML-based checks use unsupervised learning to detect anomalies through time-series analyses, distribution-based outlier detection, and clustering analyses. To complement these automated checks, we conduct daily manual audits and sanity reviews of randomly selected data entries. This multi-layered approach reflects our total commitment to delivering data that meets the highest standards of quality and reliability.
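The sketch below conveys the flavour of such checks, using hypothetical field names and thresholds and an off-the-shelf outlier detector in place of our production QA suite.

```python
# Illustrative QA checks: simple heuristics (negative values, suspected unit
# jumps) plus an unsupervised outlier detector over a reported time series.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical reported emissions values for one company, by year.
emissions_tco2e = {2019: 11800.0, 2020: 10950.0, 2021: 11200.0,
                   2022: -12400.0, 2023: 11500000.0}

issues = []

# Heuristic checks: negative values and large year-over-year jumps, which often
# signal a unit inconsistency (e.g. tonnes reported one year, kilotonnes the next).
years = sorted(emissions_tco2e)
for prev, curr in zip(years, years[1:]):
    if emissions_tco2e[curr] < 0:
        issues.append(f"{curr}: negative value")
    prev_val = abs(emissions_tco2e[prev]) or 1.0
    if abs(emissions_tco2e[curr]) / prev_val > 100:
        issues.append(f"{curr}: possible unit inconsistency vs {prev}")

# Unsupervised check: flag distribution-level outliers in the series.
values = np.array([[emissions_tco2e[y]] for y in years])
flags = IsolationForest(contamination=0.2, random_state=0).fit_predict(values)
issues += [f"{y}: statistical outlier" for y, f in zip(years, flags) if f == -1]

print("\n".join(issues))
```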