Leveraging unique secondary-distribution market data, Mercari has begun research and development on a high-precision AI search and recommendation model specialized for the Japanese market
Mercari, Inc. (“Mercari”) is pleased to announce that its research project, “Development of a High-Precision Search and Recommendation Platform Model for the Secondary-Distribution Market Using Generative Retrieval Technology,” has been selected for the “Post-5G Information and Communication System Infrastructure Enhancement R&D Project / Development of Competitive Generative AI Platform Models (GENIAC),” a Japanese government initiative led by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO).
For this project, Mercari’s research and development organization R4D will use data from over 4 billion listings accumulated on the Mercari marketplace app¹, along with other data unique to the secondary-distribution market, to research and develop a foundation model that enables high-precision AI search and recommendations using generative retrieval technology. The research is scheduled to run from July to late December 2026.
1. As of September 10, 2024: the total number of Mercari listings posted in Japan since the service launched on July 2, 2013. https://about.mercari.com/press/news/articles/20240911_4billion/ (article available only in Japanese)
The difficulty of searching for one-of-a-kind items on marketplace apps and the need for a unique AI foundation in Japan
Many of the listings on marketplace apps are for used or one-of-a-kind items that do not have model numbers. Even on Mercari, about 80% of listings are “long-tail” items that cannot be linked to existing catalogs. These items often have short descriptions or descriptions written in specialized jargon, which makes them difficult to find with conventional search methods based on keyword matching.
Moreover, the more unique a long-tail item is, the less behavioral data is available for it, so recommendation systems based on purchase history and click data do not perform well. Conventional search technologies also have difficulty handling ambiguous queries written in natural language². Together, these challenges can cause users to search repeatedly and lead to missed purchase opportunities.
Generative retrieval, a key technology in this project, can help address these challenges. With this approach, an AI model trained on vast amounts of item information learns the meaning and context of items and can respond to users’ search requests without relying solely on keyword matching.
Generative retrieval technology for e-commerce search has yet to be established in Japan, which makes developing an AI search foundation rooted in Japan’s unique culture and sensibilities an urgent need. The Japanese government has also designated AI as one of the national strategic technology areas in its 7th Science, Technology, and Innovation Basic Plan (2026–2030), and there is growing awareness that the ability to develop generative AI foundation models will affect international competitiveness.
2. Examples of ambiguous natural-language search queries include “A vintage-style chair I’d like to take with me on a weekend camping trip” and “An elegant outfit I could wear to a wedding.”
Summary of the project: Meeting the challenge of building a Japanese AI foundation using three types of data
Mercari’s research and development organization R4D is designing a training dataset that combines three types of data to develop an AI search and recommendation model specialized for the secondary-distribution market. The project aims to make accurate suggestions for long-tail items in response to user requests and to return context-aware results for ambiguous natural-language searches.
1. Mercari listing data (approximately 4 billion listings)
This dataset consists of actual listing and transaction data accumulated on the Mercari marketplace app. Reflecting real supply and demand in the secondary-distribution market, it includes item information such as price, category, condition, and text descriptions, as well as search and behavior logs.
2. Item catalog data (approximately 30 million items) from Suruga-ya, made available through a capital and business partnership agreement³
This detailed item catalog, held by Suruga-ya, is centered on hobby, entertainment, and collectors’ items. It has a classification system deeply rooted in Japan’s anime, comic, gaming, and entertainment cultures, and it functions as essential reference data for identifying unique items.
3. LLM-generated data: synthetic nuanced review text (approximately 3 million items)
This consists of synthetic, structured data generated using LLMs such as Llama 4 and Qwen. It incorporates nuanced language and impressions such as “cute,” “gives off retro vibes,” and “has a casual feel perfect for informal settings.” The data helps bridge the gap between users’ ambiguous natural-language queries and the characteristics of items.
3. Mercari, Suruga-ya Conclude Capital and Business Partnership Agreement https://about.mercari.com/press/news/articles/20251217_surugaya/ (article available only in Japanese)
Significance of this project
This project is expected to have social and business significance in the following areas.
Potential for radical improvement of the search experience
This research focuses on a technological foundation that lets users search for items intuitively in natural language, and it therefore has the potential to make marketplace services more convenient. Mercari is currently evaluating whether to implement the results of this research in its products and will make any implementation decisions based on the research outcomes.
Potential application in cross-border transactions
Japanese item data rich in cultural context could be used to train a foundation model that enhances search and recommendations in Mercari’s cross-border e-commerce business.
Acceleration of resource circulation
More precise marketplace search will give valuable items that would otherwise be buried among vast numbers of listings more opportunities to find their next owners. A more active secondary-distribution market can, in turn, contribute to more effective use of resources.
Comment from Norimasa Kobori, Head of Mercari R4D
“We found that the search experience on marketplace apps has an inherent difficulty that cannot be resolved simply by improving keyword matching. In this project, we will confront this difficulty head-on using Mercari’s unique asset, its secondary-distribution market data. Being selected for a national project like GENIAC shows that the Japanese government recognizes the importance of this challenge, and we will further commit to advancing research that enhances the search experience in the secondary-distribution market.”
About GENIAC
GENIAC (Generative AI Accelerator Challenge) is a project led by METI and NEDO to strengthen Japan’s capability to develop platform models and to encourage creativity among companies and other organizations. Through support measures such as the provision of computing resources, the project aims to accelerate the development of competitive generative AI platform models, enhance the international competitiveness of Japan’s generative AI industry, and help solve social challenges through the widespread adoption of generative AI.
https://www.meti.go.jp/policy/mono_info_service/geniac/index.html
A detailed article on this project has also been published on the R4D website: https://r4d.mercari.com/en/blog/20260604_geniac_ai_details/
Mercari will continue to connect communities across industry, academia, and government, contributing to the development and social implementation of cutting-edge technology while pioneering unseen value.