Hui2Vec: Learning Transaction Embedding Through High Utility Itemsets


NARD Intelligence is delighted to announce that a paper from a collaborative research project has been accepted for publication in the proceedings of the International Conference on Big Data Analytics (BDA 2022), to be held December 19-22, 2022, in Hyderabad, India. The project brings together researchers from the Department of Computer Science of the High Institute of Information and Communication Technologies (Tunisia) and from the Big Data Institute at Shenzhen University (China).

List of all papers accepted at the conference: BDA Accepted Papers List

This work presents a new approach for learning transaction embeddings based on high-utility itemsets (HUIs). It adapts Word2vec, the well-known Natural Language Processing (NLP) technique that trains a shallow neural network on a large text corpus to learn word embeddings.
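Word2vec learns one vector per word. Its document-level extension, doc2vec, includes a variant called PV-DBOW that instead learns one vector per document by training it to predict the document's own words. This post does not spell out Hui2Vec's exact architecture. The following is therefore only a minimal sketch, assuming a PV-DBOW-style model with negative sampling applied to transactions treated as bags of item tokens. The function name `train_transaction_vectors` and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Clamp to avoid math.exp overflow on extreme scores.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_transaction_vectors(docs, dim=8, epochs=200, lr=0.05, negatives=2, seed=0):
    """PV-DBOW-style sketch (hypothetical helper): each document (transaction)
    vector is pushed toward predicting its own tokens and away from randomly
    sampled tokens (negative sampling)."""
    rng = random.Random(seed)
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    # Small random document vectors, zero-initialised output vectors (as in word2vec).
    doc_vecs = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in docs]
    out_vecs = [[0.0] * dim for _ in vocab]
    for _epoch in range(epochs):
        for d, doc in enumerate(docs):
            dv = doc_vecs[d]
            for tok in doc:
                pos = index[tok]
                targets = [(pos, 1.0)]
                for _ in range(negatives):
                    neg = rng.randrange(len(vocab))
                    if neg != pos:  # skip samples that hit the positive token
                        targets.append((neg, 0.0))
                grad = [0.0] * dim
                for t, label in targets:
                    ov = out_vecs[t]
                    g = lr * (label - sigmoid(sum(a * b for a, b in zip(dv, ov))))
                    for k in range(dim):
                        grad[k] += g * ov[k]  # accumulate the document update
                        ov[k] += g * dv[k]    # update the output (token) vector
                for k in range(dim):
                    dv[k] += grad[k]
    return doc_vecs


docs = [["bread", "milk"], ["bread", "milk"], ["caviar", "champagne"]]
vecs = train_transaction_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))
```

Transactions that share tokens end up with similar vectors. The cosine similarity between these vectors can then feed a downstream classifier or clustering algorithm. A production setup would more likely use an existing doc2vec implementation than a hand-written loop like this one.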

Abstract

Mining frequent itemsets (FIs) in transaction databases is a very popular data mining task. It helps create meaningful and effective representations of customer transactions, which is a key step in transaction classification and clustering. To improve the quality of these representations, previous studies have adapted vector embedding methods to learn transaction embeddings from items and FIs.
However, FIs remain a simple pattern type that ignores important information about transactions, such as the purchase quantities of items and their unit profits.
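The following sketch makes the difference between the two pattern types concrete. It uses the standard definitions from the high-utility itemset mining literature:

- The utility of an item in a transaction is its purchase quantity times its unit profit.
- The utility of an itemset in a transaction is the sum of its items' utilities.
- The total utility of an itemset is summed over all transactions that contain it.

An itemset is a HUI if its total utility reaches a user-chosen threshold `min_util`. The toy transactions, profits and threshold are made up for illustration. The brute-force search is only viable for tiny data.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchase quantity.
TRANSACTIONS = [
    {"bread": 2, "milk": 1},
    {"bread": 1, "milk": 2, "caviar": 1},
    {"bread": 3, "milk": 1},
    {"caviar": 1, "champagne": 1},
]
# Hypothetical unit profit of each item.
UNIT_PROFIT = {"bread": 1, "milk": 1, "caviar": 30, "champagne": 20}


def support(itemset, transactions):
    """Number of transactions containing every item of the itemset (the FI measure)."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions, profit):
    """Sum over transactions containing the itemset of quantity x unit profit of its items."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_util):
    """Brute-force enumeration of all itemsets; real miners such as
    Two-Phase, HUI-Miner or EFIM prune this search space."""
    items = sorted(profit)
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u >= min_util:
                result[itemset] = u
    return result


print(support(("bread", "milk"), TRANSACTIONS))        # 3
print(high_utility_itemsets(TRANSACTIONS, UNIT_PROFIT, 40))
# {('caviar',): 60, ('caviar', 'champagne'): 50}
```

With a minimum support of 3, the frequent itemsets are {bread}, {milk} and {bread, milk}. At `min_util = 40`, the only HUIs are {caviar} (utility 60) and {caviar, champagne} (utility 50). These are rare but profitable purchases that a support-based pattern misses.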
To address this issue, we propose to learn transaction embeddings from items and high-utility itemsets (HUIs), a more general pattern type. Since HUIs have been shown to be more appropriate than FIs for a wide range of applications, we hypothesize that transaction embeddings learned from HUIs will be more representative and meaningful. We introduce an unsupervised method, named Hui2Vec, that learns transaction embeddings by combining both singleton items and HUIs. On four datasets, we demonstrate that the embeddings learned with the proposed method are of superior quality to those learned from items and FIs.
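The abstract does not detail how Hui2Vec combines singleton items and HUIs. The following is therefore only a hypothetical sketch of one plausible preprocessing step. Each transaction becomes a bag of tokens made of its items, plus one extra token for every multi-item HUI it contains. The function names and the `"+"`-joined token format are assumptions, not the paper's specification.

```python
def transaction_tokens(transaction, huis):
    """Turn one transaction (item -> quantity) into a token list: every
    singleton item, plus one token per multi-item HUI contained in it."""
    tokens = sorted(transaction)  # singleton item tokens
    for itemset in huis:
        # Single-item HUIs are already covered by the singleton tokens
        # (a design assumption of this sketch).
        if len(itemset) > 1 and all(item in transaction for item in itemset):
            tokens.append("+".join(sorted(itemset)))
    return tokens


def build_corpus(transactions, huis):
    """One token list ("document") per transaction."""
    return [transaction_tokens(t, huis) for t in transactions]


# Hypothetical example: two transactions and two previously mined HUIs.
transactions = [{"bread": 2, "milk": 1}, {"caviar": 1, "champagne": 1}]
huis = [("caviar",), ("caviar", "champagne")]
print(build_corpus(transactions, huis))
# [['bread', 'milk'], ['caviar', 'champagne', 'caviar+champagne']]
```

Token lists like these could then be passed to a document-embedding model to obtain one vector per transaction. Treating each HUI as a single token lets the model learn that co-purchased, high-value combinations behave as a unit.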

#word-embedding #machine-learning #ml #ai #high-utility #data-mining

AIResearch
