Lab of Media and Network

Department of Computer Science & Technology

Tsinghua University

Comparing Apples to Oranges: A Scalable Solution with Heterogeneous Hashing

M Ou, P Cui, F Wang, J Wang, W Zhu, S Yang. SIGKDD 2013


Although hashing techniques have become popular for large-scale similarity search, most existing methods for designing optimal hash functions focus on homogeneous similarity assessment. Because heterogeneous entities and relationships are also ubiquitous in real-world applications, there is an emerging need to retrieve similar or relevant data entities across multiple heterogeneous domains.

Relation-aware Heterogeneous Hashing (RaHH) provides a general framework for generating hash codes of data entities that reside in multiple heterogeneous domains. Unlike existing hashing methods that map heterogeneous data into a common Hamming space, RaHH constructs a separate Hamming space for each type of data entity and simultaneously learns optimal mappings between these spaces. Moreover, the RaHH framework encodes both homogeneous and heterogeneous relationships between data entities to design hash functions with improved accuracy.
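To make the idea of per-domain Hamming spaces linked by learned mappings more concrete, here is a minimal pure-Python sketch. It is not RaHH's optimization and makes several loud assumptions: each domain's hash function is a random sign projection rather than a learned one, the mapping between the two Hamming spaces is fit with a per-bit perceptron on hypothetical linked pairs, and relationships within a domain are ignored. The function names `make_projection_hasher`, `fit_bit_mapping`, `hamming`, and `cross_domain_search` are hypothetical illustrations, and the two domains use different code lengths only to show that the spaces need not match.

```python
import random
from typing import Callable, List, Sequence

Code = List[int]  # a hash code stored as a list of +1/-1 bits


def sign(v: float) -> int:
    # Map a real-valued score to a +1/-1 bit; ties go to +1.
    return 1 if v >= 0 else -1


def make_projection_hasher(dim: int, bits: int, seed: int) -> Callable[[Sequence[float]], Code]:
    """Per-domain hash function h(x) = sign(Wx) with a random Gaussian W.
    Assumption: RaHH learns these projections; here they are random stand-ins."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

    def h(x: Sequence[float]) -> Code:
        return [sign(sum(w_j * x_j for w_j, x_j in zip(row, x))) for row in W]

    return h


def fit_bit_mapping(src: List[Code], dst: List[Code], epochs: int = 50) -> Callable[[Code], Code]:
    """Hypothetical cross-space mapping: one linear classifier per target bit,
    dst[k] ~ sign(w_k . src + b_k), trained with the perceptron rule on linked pairs."""
    r_src, r_dst = len(src[0]), len(dst[0])
    W = [[0.0] * r_src for _ in range(r_dst)]
    b = [0.0] * r_dst
    for _ in range(epochs):
        for x, y in zip(src, dst):
            for k in range(r_dst):
                score = sum(w * xi for w, xi in zip(W[k], x)) + b[k]
                if y[k] * score <= 0:  # wrong side of (or on) the boundary: update
                    W[k] = [w + y[k] * xi for w, xi in zip(W[k], x)]
                    b[k] += y[k]

    def mapping(x: Code) -> Code:
        return [sign(sum(w * xi for w, xi in zip(W[k], x)) + b[k]) for k in range(r_dst)]

    return mapping


def hamming(a: Code, b: Code) -> int:
    # Number of positions where the two codes disagree.
    return sum(1 for u, v in zip(a, b) if u != v)


def cross_domain_search(query_code: Code, mapping: Callable[[Code], Code],
                        db_codes: List[Code], top_k: int) -> List[int]:
    """Translate a domain-p code into domain q's Hamming space, then rank
    domain-q items by Hamming distance (indices of the top_k nearest)."""
    q = mapping(query_code)
    ranked = sorted(range(len(db_codes)), key=lambda i: hamming(q, db_codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    rng = random.Random(0)
    h_p = make_projection_hasher(dim=6, bits=8, seed=1)  # domain p (hypothetical)
    h_q = make_projection_hasher(dim=4, bits=6, seed=2)  # domain q (hypothetical)
    X_p = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(20)]
    X_q = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(20)]
    # Hypothetical heterogeneous relation: X_p[i] is linked to X_q[i].
    P = [h_p(x) for x in X_p]
    Q = [h_q(x) for x in X_q]
    f = fit_bit_mapping(P, Q)
    print(cross_domain_search(P[0], f, Q, top_k=3))
```

Keeping a separate Hamming space per domain means a query is translated bit by bit into the target domain's space before ranking, instead of forcing both domains into one shared code space. In RaHH, the homogeneous relationships would additionally shape how each domain's hash functions are learned; that part is deliberately left out of this sketch.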


  • Tencent Weibo Dataset
  • References

    Ou M, Cui P, Wang F, et al. Comparing apples to oranges: a scalable solution with heterogeneous hashing[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2013: 230-238.