SCB 10X, SCBX and Stanford CRFM Launch the ThaiExam Leaderboard in HELM: A Thai large language model benchmark derived from standardized examinations in Thailand


Bangkok, Thailand – October 8, 2024 – SCB 10X and SCBX, in collaboration with Stanford CRFM (Stanford Center for Research on Foundation Models), introduce the ThaiExam leaderboard, a public benchmark designed to assess large language models (LLMs) on Thai language scenarios. Powered by HELM (Holistic Evaluation of Language Models), an industry-leading evaluation framework, the leaderboard paves the way for more inclusive, multilingual model evaluations, with a strong focus on the Thai language.

 

The ThaiExam Leaderboard assesses language models on real-world Thai scenarios drawn from standardized high school and financial professional exams such as ONET, TGAT, A-Level, and the Investment Consultant (IC) exam. The leaderboard evaluates a range of leading models, including Typhoon, powered by SCB 10X and SCBX, and offers full transparency at the prompt level, with reproducible results through the HELM framework. This new, publicly available leaderboard is designed specifically for Thai language evaluation and aims to drive innovation in Thai language model development and evaluation.

 

“This partnership with Stanford CRFM underscores our commitment to advancing Thai NLP and setting the standard for multilingual language model assessments,” said Kasima Tharnpipitchai, Head of AI Strategy at SCB 10X. “We believe the ThaiExam leaderboard will spur innovation in Thai language models and foster collaboration across the AI research community to support underrepresented languages globally.”

 

Addressing Gaps in Multilingual Evaluations

Despite the multilingual capabilities of advanced models like GPT-4 and Claude 3, evaluations predominantly focus on English tasks. The introduction of the ThaiExam leaderboard, powered by HELM’s framework, aims to fill a critical gap. It offers a tailored evaluation system for Thai, a complex language with unique linguistic features. Through HELM’s rigorous methodology, researchers and developers can now assess their models' performance in Thai with accuracy and transparency. With original Thai texts and a comprehensive set of assessments, this initiative offers a much-needed benchmark for understanding how well language models perform in Thai.

 

Results from Evaluating 34 Models on the ThaiExam Leaderboard

Among the 34 models evaluated, Typhoon 1.5X Instruct (70B) outperformed closed-source models like GPT-4 Turbo and Claude 3 Sonnet with an accuracy of 61.7%, highlighting its strong Thai language capabilities. Even the smaller Typhoon models (8B) surpassed GPT-3.5 Turbo, while models like Claude 3 Haiku and Llama 3 (70B) also showed promising results despite not being specifically trained for Thai. These results underscore the power of Thai-centric fine-tuning in boosting local language performance.

 

Advancing Thai AI Through Global and Regional Collaboration

SCB 10X is committed to advancing AI innovation through strategic partnerships and collaborations with leading AI companies and institutions across Southeast Asia and beyond. By working closely with prominent AI players, SCB 10X leverages collective expertise to drive innovation in the Thai LLM ecosystem, elevating the quality and relevance of AI solutions tailored specifically for the Southeast Asian market. Notable initiatives include the launch of the ThaiLLM Leaderboard, in collaboration with VISTEC and the SEACrowd Project, which evaluates LLMs using 10 datasets across key tasks, including ThaiExam, to foster growth in Thai NLP research. SCB 10X also partners with researchers from international institutions like the University of Cambridge and Tsinghua University on multimodal hallucination detection with "CrossCheckGPT", and with Mahidol University to leverage AI for both private and national development. In addition, SCB 10X contributes to initiatives like SEA-LION v2 and Project SEALD, in collaboration with AI Singapore (AISG), which are aimed at advancing language models for the region. These efforts ensure that Thailand plays an active role in AI technology advancement globally.

 

About HELM and Typhoon

HELM is renowned for its comprehensive and transparent evaluation system for large language models, offering a trusted platform for model benchmarking. By adding ThaiExam to its leaderboard, HELM not only opens up new opportunities for Thai-centric model evaluation but also enhances its reputation as the gold standard for evaluating LLM multilingual capabilities globally.

 

Typhoon models, powered by SCB 10X and SCBX and optimized for Thai language tasks such as translation, summarization, and sentiment analysis, have been instrumental in advancing Thai NLP. The Typhoon-1.5X model delivers exceptional results on ThaiExam and other language benchmarks.

For more information about the ThaiExam leaderboard on HELM and to access the results, please visit [Link].