Graph Representation Learning


Why do we need graph representation learning? Graphs are a general data structure for encoding relational data, such as online social networks, syscall logs in cybersecurity, and many others. Intuitively, from graphs, one could derive contextual features for better decision making. For example, instead of predicting a user's interest solely from that user's own data, one may look into his/her contextual information, which in this case is the data from the user's close friends or friends' friends. By jointly considering personal and contextual features, one can make more accurate predictions even when a specific user's data are sparse. Given the significant value of contextual features, graph representation learning aims to learn vector representations for individual nodes, edges, or graphs that encode salient and discriminative contextual features for downstream analytical tasks.
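As a toy illustration of the idea above, the sketch below (hypothetical data and function names, plain Python, not any specific method from my papers) enriches a user's own feature vector with contextual features averaged over one-hop neighbors:

```python
# Toy sketch (hypothetical data): combine a node's own features with
# contextual features averaged over its one-hop neighbors.
def aggregate_features(node, features, adjacency):
    """Concatenate a node's own features with the mean of its neighbors'."""
    own = features[node]
    neighbors = adjacency.get(node, [])
    if neighbors:
        context = [sum(features[n][i] for n in neighbors) / len(neighbors)
                   for i in range(len(own))]
    else:
        context = [0.0] * len(own)
    return own + context  # personal features followed by contextual ones

# User "a" has sparse data, but friends "b" and "c" are informative.
features = {"a": [0.0, 1.0], "b": [2.0, 4.0], "c": [4.0, 0.0]}
adjacency = {"a": ["b", "c"]}
print(aggregate_features("a", features, adjacency))  # [0.0, 1.0, 3.0, 2.0]
```

Modern graph neural networks learn such aggregation functions from task labels instead of hand-coding the mean.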

Why is it hard to learn graph representations? The key barrier is the difficulty of defining contextual features. In general, contexts can be vague, and their exact forms can be task-specific or even case-specific. Traditional methods attempt to address this challenge by heavy feature engineering. For example, one could suggest that a triangle substructure is critical and use the count of triangles within one hop of a node as one of that node's contextual features. Not only is it impractical to enumerate all combinatorial possibilities, but the resulting performance may also be limited, as pre-defined features may not even be relevant to a specific task. In 2015, the question in my mind was: given labels for a specific task, why don't we let machines automatically figure out which features are informative for prediction? In short, it is the irregular structure and the permutation-invariance requirement that made it difficult to adapt the learning methods of that time to graph representation learning.

My perspectives. My research touches three dimensions of graph representation learning techniques: functionality, generalization, and interpretation. Working with my colleagues at NEC Labs, we have developed machine learning techniques that enhance all three dimensions.

  • Functionality. Our machine learning techniques have enabled inductive learning so that learned models generalize to unseen data [CIKM'18], enhanced the expressiveness of learned features by progressive neighborhood aggregation [AAAI'18], enriched contextual features by considering neighborhood interaction [WSDM'21 A], and built the foundation for inductive and unsupervised graph representation learning [ICLR'20].
  • Generalization. In this line of work, we strive to make trained models perform better on unseen testing data. Our techniques are developed around two key ideas: (1) Incorporating priors into graph representation learning by adversarial training [KDD'18][ICDM'19 A], and (2) Automating task-irrelevant graph data recognition and removal during model training [ICML'20][WSDM'21 B][ECML'20].
  • Interpretation. It is critical to make learned models transparent. We have developed techniques that provide statistical evidence highlighting important attributes/nodes/edges/substructures for interpreting decision making [NeurIPS'20][ICDM'19 B].
  • Publications

  • [WSDM'21 A] Modeling Context Pair Interaction for Pairwise Tasks on Graphs. [pdf]
  • [WSDM'21 B] Learning to Drop: Robust Graph Neural Network via Topological Denoising. [pdf]
  • [ICML'20] Robust Graph Representation Learning via Neural Sparsification. [pdf]
  • [ICLR'20] Inductive and Unsupervised Representation Learning on Graph Structured Objects. [pdf]
  • [NeurIPS'20] Parameterized Explainer for Graph Neural Network. [pdf]
  • [ECML'20] Node Classification in Temporal Graphs through Stochastic Sparsification and Temporal Structural Convolution. [pdf]
  • [ICDM'19 A] Self-Attentive Attributed Network Embedding Through Adversarial Learning. [pdf]
  • [ICDM'19 B] Adaptive Neural Network for Node Classification in Dynamic Networks. [pdf]
  • [CIKM'18] TGNet: Learning to Rank Nodes in Temporal Graphs. [pdf]
  • [KDD'18] Learning Deep Network Representations with Adversarially Regularized Autoencoders. [pdf]
  • [AAAI'18] Substructure Assembling Network for Graph Classification. [pdf]

    Natural Language Understanding


    In this broad area, my current research focuses on the following topics.

    Feature extraction from texts with multi-faceted information. Text descriptions of experts or products usually cover multiple facets of information supported by corresponding sentences. In recommendation tasks, it is critical to locate relevant information conditioned on queries. In the context of question routing [WSDM'20] and recommendation [AAAI'20 A], we developed deep architectures that leverage query-conditioned attention to recognize relevant sentences or features. Our techniques achieve state-of-the-art performance on public benchmark datasets.

    I am currently working on the following two topics. More information will come soon.

  • Efficient knowledge transfer for low-resource tasks.
  • Bringing human-understandable decision-making justification by commonsense reasoning.
  • Publications

  • [WSDM'20] Temporal Context-Aware Representation Learning for Question Routing. [pdf]
  • [AAAI'20 A] Asymmetrical Hierarchical Networks with Attentive Interactions for Interpretable Review-Based Recommendation. [pdf]

    Density Estimation and Anomaly Detection


    Density estimation is the core of anomaly detection: anomalous samples are usually far away from dense areas where normal samples reside. In this line of research, I focus on the following topics.
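To make the connection concrete, here is a minimal sketch (not any specific method from the papers below) that scores anomalies by negative log-likelihood under a 1D Gaussian fitted to normal samples; higher scores mean lower density, hence more anomalous:

```python
import math
import statistics

def gaussian_nll(x, mean, std):
    """Negative log-likelihood under a 1D Gaussian: higher = lower density."""
    return 0.5 * math.log(2 * math.pi * std ** 2) + (x - mean) ** 2 / (2 * std ** 2)

# Fit a density model to "normal" samples, then score new points.
normal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mean, std = statistics.fmean(normal), statistics.pstdev(normal)
scores = {x: gaussian_nll(x, mean, std) for x in [10.0, 15.0]}
# The point far from the dense region (15.0) gets a much higher anomaly score.
```

High-dimensional data breaks this simple recipe, which is what motivates jointly learning the reduced space and the density model.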

    Anomaly detectors. Anomaly detection for high-dimensional data is always challenging, as all samples in a high-dimensional space seem far away from each other. In [ICLR'18], we proposed a deep architecture named DAGMM that simultaneously performs dimension reduction and density estimation in a joint reduced and error space. On multiple public benchmark datasets, DAGMM achieves state-of-the-art detection accuracy. In addition, an extension of DAGMM is able to boost performance in clustering tasks [SDM'19].

  • The research paper [ICLR'18] was one of the top-10 rated submissions according to [link].
  • DAGMM serves as one of the core techniques in NEC's commercial products for securing communication networks [press coverage]. I received the Business Contribution Award in 2019 for my leadership in this achievement.
  • Anomaly detection for system log data. System logs include key information that indicates system health status. As one of the key developers, I worked with my colleagues to build a log analytics system named NGLA, which parses heterogeneous system logs without domain knowledge and detects system anomalies in real time [ICDCS'18].

  • As one of the key members in NGLA development, I received the Business Contribution Award in 2017 for its successful commercialization [press coverage].
  • In [Big-DAMA'18], we developed a deep architecture to embed IP addresses into a dense vector space. By measuring the estimated distance between IP pairs, our technique can discover potential spoofing attacks in communication networks. We received the Best Paper Award from the SIGCOMM 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks.
  • Anomaly detection for multivariate time series. We developed a deep architecture that encodes multivariate time series into a joint reduced and error space. In the learned space, our technique achieves significant improvements in detection accuracy for multivariate time series [AAAI'19].


  • [AAAI'19] A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. [pdf]
  • [SDM'19] Deep Co-Clustering. [pdf]
  • [ICLR'18] Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection. [pdf]
  • [ICDCS'18] LogLens: A Real-time Log Analysis System. [pdf]
  • [Big-DAMA'18] Deep Learning IP Network Representations. [pdf]

    Imitation Learning


    A quick intro to imitation learning. The goal of imitation learning is to let machines automatically learn skills from expert demonstrations. Unlike traditional reinforcement learning, imitation learning aims to skip, or automatically learn, the latent reward function that drives experts' decision making. Behavior cloning and inverse reinforcement learning are the two major imitation learning techniques. Behavior cloning suffers from poor generalization caused by compounding errors, while it is difficult to incorporate information from negative demonstrations into inverse reinforcement learning.
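The behavior cloning side of this comparison can be illustrated with a toy tabular sketch (hypothetical states and actions); real policies rely on function approximation, which is exactly where compounding errors on unseen states arise:

```python
from collections import Counter, defaultdict

def behavior_cloning(demonstrations):
    """Fit a tabular policy by majority vote over expert (state, action) pairs."""
    votes = defaultdict(Counter)
    for state, action in demonstrations:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

demos = [("s0", "right"), ("s0", "right"), ("s0", "left"), ("s1", "up")]
policy = behavior_cloning(demos)
print(policy["s0"])  # "right"
# On any state absent from `demos`, this policy has no answer at all:
# the tabular analogue of compounding errors on states off the expert's path.
```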

    Adversarial imitation learning. Our key insight is to improve generalization performance from positive demonstrations and avoid repeating the mistakes in negative demonstrations. In this spirit, we have developed adversarial learning techniques that push the state-action distribution of a learned policy closer to that of positive demonstrations and farther from that of negative demonstrations [WWW'20].


  • [WWW'20] Adversarial Cooperative Imitation Learning for Dynamic Treatment Regimes. [pdf]

    Application-Centric Studies


    In this line of research, we develop customized machine learning techniques for critical applications.

    Turbulence forecasting [ICDM'20]. Our main challenge comes from label sparsity. Using a pre-trained turbulence detection model, our technique generates pseudo labels. By combining true and pseudo labels, our deep learning technique achieves significant performance improvement.
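The pseudo-labeling step can be sketched as follows (hypothetical names and thresholds, not the actual T2-Net pipeline): a pre-trained detector labels unlabeled samples, and only confident predictions join the training set:

```python
# Hypothetical sketch of pseudo-labeling (names and thresholds made up).
def pseudo_label(unlabeled, detector, threshold=0.9):
    """Keep only confident predictions from a pre-trained detector."""
    labeled = []
    for x in unlabeled:
        prob = detector(x)  # detector returns P(turbulence | x)
        if prob >= threshold:
            labeled.append((x, 1))
        elif prob <= 1 - threshold:
            labeled.append((x, 0))
    return labeled  # to be combined with the true labels for training

detector = {1: 0.05, 6: 0.95, 4: 0.5}.get  # stand-in for a pre-trained model
print(pseudo_label([1, 6, 4], detector))  # [(1, 0), (6, 1)]; 4 is too uncertain
```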

    Multivariate trend prediction [AAAI'20 B]. We formulate the trend prediction problem as a multi-task learning problem. A deep learning architecture is developed to allow history/memory sharing across different tasks. Our technique achieves state-of-the-art prediction performance on benchmark datasets.

    Multivariate time series retrieval [AAAI'20 C]. We aim to encode multivariate time series segments into vector representations for retrieval. In particular, we develop clustering constraints and adversarial training techniques to enhance the discriminative power and smoothness of learned vector representations. Our technique achieves state-of-the-art performance in time series retrieval tasks.


  • [ICDM'20] T2-Net: A Semi-supervised Deep Model for Turbulence Forecasting. [pdf]
  • [AAAI'20 B] Tensorized LSTM with Adaptive Shared Memory for Learning Trends in Multivariate Time Series. [pdf]
  • [AAAI'20 C] Deep Unsupervised Binary Coding Networks for Multivariate Time Series Retrieval. [pdf]

    ============================== Before Joining NEC Lab ==============================

    Behavior Query Discovery in Temporal Graphs


    System calls generated from computer systems form large-scale heterogeneous temporal graphs that capture the evolution of system activities over time. From these temporal graphs, users can query known activity patterns related to system risks; however, these patterns are usually formed by primitive behaviors, such as remote login, file compression, and so on, which cannot be directly found in the temporal graphs of basic system entities like processes, files, sockets, and pipes. To bridge the gap, we propose using discriminative sub-traces to represent primitive behaviors and build behavior queries, where the task of mining discriminative sub-traces is formulated as a temporal graph pattern mining problem. To overcome the mining speed issue, we leverage the temporal information in graphs to develop efficient pattern-growth algorithms that prune the search space with small overhead. Our mining algorithm is up to 90 times faster than the baselines, and the discovered behavior queries achieve high precision (97%) and recall (91%).

    Collaborator: NEC Lab America


  • Bo Zong, Xusheng Xiao, Zhichun Li, Zhenyu Wu, Zhiyun Qian, Xifeng Yan, Ambuj K. Singh, and Guofei Jiang. Behavior Query Discovery in System-Generated Temporal Graphs, VLDB'16. [pdf] [full version] [slides]

    Critical Alert Mining


    Alerts serve as indicators of malfunction in system performance analysis. While complex systems nowadays can generate tens of thousands of alerts per day, users like system administrators cannot handle these alerts one by one. In this context, we propose critical alert mining, which automatically discovers critical alerts from system monitor data and helps users focus on the critical ones instead of blindly checking all alerts. In particular, we build a temporal graph that captures the dynamic dependencies among system alerts, and infer the top critical alerts from the temporal graph with cost-effective algorithms. Our approach is up to 5,000 times faster than the baselines, and it successfully discovers critical alerts from real-life system monitor data.

    Collaborator: LogicMonitor


  • Bo Zong, Yinghui Wu, Jie Song, Ambuj K. Singh, Hasan Cam, Jiawei Han, and Xifeng Yan, Towards Scalable Critical Alert Mining, KDD'14. [slides] [pdf]

    Query Placement in Distributed Stream Processing Systems


    In a distributed stream processing system, it is important to pack user-defined queries into servers such that network cost in the system is minimized and load balance is preserved. In this work, we model the subscription relations between queries and data sources as temporal bipartite graphs where query nodes can dynamically join or leave the graphs, and formulate the query placement task as a partitioning problem. Due to the NP-hardness of the problem, we propose online/offline approximation algorithms with performance guarantees as well as cost-effective heuristics. Our algorithms practically work well in a wide range of scenarios, and are adopted in Microsoft products.
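As a hedged illustration of the placement idea (a toy greedy heuristic with hypothetical data, not the approximation algorithms from the paper), one can assign each query to the server that already hosts most of its data sources, breaking ties by current load:

```python
# Toy greedy heuristic (hypothetical data): assign each query to the server
# already hosting most of its data sources, breaking ties by current load.
def place_queries(queries, num_servers):
    placement, load = {}, [0] * num_servers
    hosted = [set() for _ in range(num_servers)]
    for query, sources in queries.items():
        best = max(range(num_servers),
                   key=lambda s: (len(hosted[s] & sources), -load[s]))
        placement[query] = best
        load[best] += 1
        hosted[best] |= sources
    return placement

queries = {"q1": {"srcA"}, "q2": {"srcA", "srcB"}, "q3": {"srcC"}}
print(place_queries(queries, 2))  # q1 and q2 share srcA; q3 balances load
```

Co-locating queries that subscribe to the same sources reduces duplicated network traffic, while the load term keeps servers balanced.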

    Collaborator: Microsoft Research


  • Bo Zong, Christos Gkantsidis, and Milan Vojnovic. Herding "Small" Streaming Queries, DEBS'15. [pdf]
  • Milan Vojnovic, Christos Gkantsidis, Bo Zong. Streaming Query Resource Control, US Patent. [link].

    Cloud Service Placement


    Cloud datacenters provide a reliable way to share computation resources. In a cloud datacenter, a routine task is cloud service placement: finding a set of servers to host a user-defined cloud service. This placement is challenging, as cloud services usually include topology and computation resource requirements, and the underlying datacenter dynamically evolves over time. In this work, we formulate the cloud service placement problem as a subgraph matching problem where cloud datacenters are modeled as large temporal graphs and cloud services are represented as small query graphs. With the observation that available computation resources in a datacenter are highly dynamic while its structure is relatively stable, we develop a graph encoding scheme that separates graph structures from node/edge attributes. A flexible graph index is developed based on the encoding scheme to enable fast index search and update. The proposed graph index achieves up to 10 times speedup for subgraph search, and can process 5M updates per second.

    Collaborator: IBM Research


  • Bo Zong, Ramya Raghavendra, Mudhakar Srivatsa, Xifeng Yan, Ambuj K. Singh, and Kang-Won Lee, Cloud Service Placement via Subgraph Matching, ICDE'14. [slides] [pdf]
  • This work is included as a part of IBM System G [link].

    Inferring the Missing Structure in Information Cascades


    Information cascades are temporal graphs that model the traces of information propagation in online social media. On the one hand, they play vital roles in social science research as well as in applications like online marketing. On the other hand, it is difficult to obtain complete structures of information cascades due to privacy policies and data noise. In this work, we aim to recover missing structures in cascades, and we formulate this task as an optimization problem that searches for the optimal way to connect cascade pieces subject to temporal constraints in cascades and structural constraints in online social media. Despite the NP-hardness of this problem, we propose efficient approximation algorithms with performance guarantees. Our approach significantly outperforms the baselines in recovering missing structures in real-life cascades.


  • Bo Zong, Yinghui Wu, Ambuj K. Singh, and Xifeng Yan, Inferring the Underlying Structure of Information Cascades, ICDM'12. [slides] [pdf]

    Routing Optimization in Mobility Networks


    In a network of mobile objects like vehicle networks, it is challenging to spread information in a cost-effective way, due to the dynamic changes in network topology. In this work, we model a mobility network as a temporal graph, and formulate the information routing problem as a node reachability problem. Since an information routing request could involve millions of reachability queries, it is vital to efficiently process the queries for real-time response. To achieve this goal, we build a graph index by compressing temporal graphs into directed acyclic graphs, and process reachability queries in a batch via the graph index. Our method can dynamically find the optimal routing strategy for an incoming request in a few seconds, and the proposed graph index brings up to 100 times speedup to the process of searching optimal routing strategies.
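The batch reachability idea can be sketched as follows (a toy memoized index over an already-compressed DAG with hypothetical data, not the paper's index structure): caching each node's descendant set lets a batch of queries reuse shared traversal work:

```python
from functools import lru_cache

# Toy reachability index over an already-compressed DAG (hypothetical data):
# cache each node's descendant set so a batch of queries reuses shared work.
def make_reachability_index(dag):
    @lru_cache(maxsize=None)
    def descendants(node):
        reach = set()
        for succ in dag.get(node, ()):
            reach.add(succ)
            reach |= descendants(succ)
        return frozenset(reach)
    return lambda u, v: v in descendants(u)

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
can_reach = make_reachability_index(dag)
print(can_reach("a", "d"), can_reach("d", "a"))  # True False
```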


  • Misael Mongiovi, Konstantinos Psounis, Ambuj K. Singh, Xifeng Yan, and Bo Zong, Efficient Multicasting for Delay Tolerant Networks using Graph Indexing, INFOCOM'12. (authors ordered alphabetically) [pdf]

    Large-Scale Graph Management


    Sedge is a software framework that supports large-scale graph processing. Sedge is inspired by Google's Pregel; however, the Pregel-like model is blind to intensive inter-machine communication and unbalanced workloads. Sedge adds support for overlapping partitions, with the goal of processing local graph queries faster. Sedge is able to minimize inter-machine communication by dynamically adapting graph partitions to changes in query workload as well as in data.


  • Shengqi Yang, Xifeng Yan, Bo Zong, and Arijit Khan, Towards Effective Partition Management for Large Graphs, SIGMOD'12. [pdf]
  • More details about Sedge (including source code) can be accessed via [link]
