Duo Wang Yuan Zuo Fengzhi Li Junjie Wu
MIIT Key Laboratory of Data Intelligence and Management, Beihang University
{wangduo58, zuoyuan, lifengzhi, wujj}@buaa.edu.cn
Corresponding author.
Abstract
Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN’s representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method’s effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors. Our code is available at https://github.com/W-rudder/TEA-GLM.
1 Introduction
Graph Neural Networks (GNNs) have emerged as a pivotal framework in graph machine learning, harnessing the ability to capture intricate message-passing patterns for robust graph representation. These advancements have yielded various GNN architectures, including the Graph Convolution Network (GCN) [20], Graph Attention Network (GAT) [31], and GraphSAGE [11]. Despite their efficacy, GNNs often exhibit limited generalization capabilities, struggling to maintain consistent performance when transitioning across different datasets or downstream tasks [19]. This limitation underscores the necessity for more adaptable and universally applicable models in the graph learning domain.
To mitigate the dependency on labeled data and enhance the resilience of graph models, self-supervised learning has been widely adopted in GNN training. Techniques such as Deep Graph Infomax (DGI) [32] and GraphCL [44] have demonstrated effectiveness by leveraging mutual information maximization and contrastive learning, respectively. However, these methods typically require fine-tuning task-specific heads for downstream applications, which can be resource-intensive and limit their practicality in diverse scenarios. Moreover, graph prompt learning enhances GNN generalization by using unified task templates and meta-learning to adapt to various downstream applications [25, 29], but it often requires extensive fine-tuning and is constrained by the specificity of task types.
In recent years, the remarkable generalization capabilities of Large Language Models (LLMs) have spurred interest in their potential applications within graph machine learning. Some methods attempt to encode graph structures into text for LLM input [3, 10, 33, 23], but these approaches often lead to suboptimal outcomes [18]. Alternatively, using LLMs as enhancers to generate data or node text representations [43, 48, 39, 4, 24] has shown promise but remains constrained by the inherent reliance on GNNs for prediction. Recent efforts [30, 2] to use LLMs as predictors have demonstrated potential. However, their performance often remains unstable due to the challenge of producing transferable graph representations that work effectively for LLMs across diverse tasks and datasets.
In light of these challenges, we propose a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM). Inspired by the zero-shot capabilities of instruction-fine-tuned LLMs [34], TEA-GLM leverages LLMs as cross-dataset and cross-task zero-shot predictors for graph machine learning. The core idea is to pretrain a GNN and align its representations with the token embeddings of an LLM. This alignment enables the GNN to effectively utilize the LLM’s pretrained knowledge, allowing it to generalize across different datasets and tasks without task-specific fine-tuning. Additionally, we train a linear projector to convert graph representations into a fixed number of token embeddings, which are then incorporated into a unified instruction designed for various graph tasks at different levels. Experiments show TEA-GLM achieves superior performance in zero-shot scenarios and when encountering unseen tasks, offering a more generalized and efficient solution for graph zero-shot learning. Our contributions are summarized as follows:
- We introduce TEA-GLM, a novel framework that aligns GNN representations with LLM token embeddings, enabling cross-dataset and cross-task zero-shot learning for graph machine learning.
- We propose a linear projector that maps graph representations into a fixed number of graph token embeddings. These embeddings are incorporated into a unified instruction designed for various graph tasks at different levels, enhancing the model’s generalization capabilities.
- Our extensive experiments demonstrate that TEA-GLM significantly outperforms state-of-the-art methods on unseen datasets and tasks.
2 Methodology
In this section, we introduce TEA-GLM, a novel framework designed for cross-dataset and cross-task zero-shot graph machine learning. TEA-GLM consists of two main components: a Graph Neural Network (GNN) that derives node representations from the graph, and a Large Language Model (LLM) that performs zero-shot tasks such as node classification and link prediction. Our methodology involves two key stages: self-supervised learning of the GNN, enhanced by the proposed feature-wise contrastive learning with the LLM’s token embeddings, and training of a linear projector that maps graph representations into a fixed number of graph token embeddings, guided by an instruction designed to suit graph tasks at different levels. The framework of our proposed method is illustrated in Fig. 1.
2.1 Notations
Formally, a graph is denoted as $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \dots, v_N\}$ and $\mathcal{E}$ represent the sets of nodes and edges, respectively, with $N$ indicating the total number of nodes. The adjacency matrix is denoted as $A \in \{0, 1\}^{N \times N}$, with $A_{ij} = 1$ iff $(v_i, v_j) \in \mathcal{E}$. The feature matrix $X \in \mathbb{R}^{N \times F}$ contains the attribute or feature information associated with each node, where $x_i$ is the feature of $v_i$, and $F$ represents the dimensionality of features.
2.2 Token embeddings-aligned graph self-supervised learning
Given the increasing model sizes and data volumes in recent years, self-supervised learning has become a prominent research focus due to the scarcity of labeled data. In this context, we propose a contrastive learning method to obtain more transferable node representations suitable for use with large language models (LLMs). Our approach leverages instance-wise contrastive learning and introduces a feature-wise contrastive learning method that maps node representations to the textual embedding space of the LLM.
2.2.1 Instance-wise contrastive learning with structural information
To alleviate the need for labeled data and enhance model generalization capability, we employ self-supervised learning for pre-training. To better extract structural information from the graph, we follow the work of [52] to generate two views of $\mathcal{G}$, denoted as $\widetilde{\mathcal{G}}^1$ and $\widetilde{\mathcal{G}}^2$, for contrastive learning. Specifically, we adopt the Removing Edges (RE) and Masking Node Features (MF) methods to generate different views. The RE strategy samples a random masking matrix $R \in \{0, 1\}^{N \times N}$ to mask the raw adjacency matrix, computed as:

$$\widetilde{A} = A \odot R, \tag{1}$$

where $\odot$ denotes the Hadamard product. The MF strategy samples a random mask vector $m \in \{0, 1\}^{F}$. The generated node features are computed by:

$$\widetilde{x}_i = x_i \odot m, \quad i = 1, \dots, N. \tag{2}$$
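A minimal sketch of the RE and MF augmentations, assuming an edge-index representation of the adjacency matrix and illustrative masking probabilities (`p_e`, `p_f` are hypothetical names, not taken from the paper):

```python
import torch

def remove_edges(edge_index, p_e=0.2):
    # RE: keep each edge with probability 1 - p_e (random Bernoulli mask over edges).
    keep_mask = torch.rand(edge_index.size(1)) >= p_e
    return edge_index[:, keep_mask]

def mask_features(x, p_f=0.2):
    # MF: zero out whole feature dimensions with a random column mask.
    m = (torch.rand(x.size(1)) >= p_f).float()
    return x * m  # broadcasts the column mask over all nodes

def generate_view(x, edge_index, p_e=0.2, p_f=0.2):
    # Each call produces one corrupted view; call twice to obtain the two views.
    return mask_features(x, p_f), remove_edges(edge_index, p_e)
```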
Thus, we obtain two views of $\mathcal{G}$, denoted as $\widetilde{\mathcal{G}}^1$ and $\widetilde{\mathcal{G}}^2$. Then, we use a graph encoder $f(\cdot)$ to derive node representations:

$$H^k = f(\widetilde{X}^k, \widetilde{A}^k) \in \mathbb{R}^{N \times D}, \quad k \in \{1, 2\}, \tag{3}$$

where $D$ is the dimension size of node representations and $k$ indexes the two views of the graph.
We employ a contrastive objective to distinguish the embeddings of the same node in these two different views from other node embeddings. For node $i$, its embedding generated in one view, $u_i$, is treated as the anchor, while the embedding generated in the other view, $v_i$, forms the positive sample. Embeddings of other nodes in the same view are regarded as intra-view negative samples, while embeddings of other nodes in the other view are regarded as inter-view negative samples. The contrastive loss is defined as:

$$\ell(u_i, v_i) = -\log \frac{e^{s(u_i, v_i)/\tau}}{e^{s(u_i, v_i)/\tau} + \sum_{k=1}^{N} \mathbb{1}_{[k \neq i]} e^{s(u_i, v_k)/\tau} + \sum_{k=1}^{N} \mathbb{1}_{[k \neq i]} e^{s(u_i, u_k)/\tau}}, \tag{4}$$

where $\mathbb{1}_{[k \neq i]}$ is an indicator function that equals 1 iff $k \neq i$, $s(\cdot, \cdot)$ is the cosine similarity function, and $\tau$ is a temperature parameter. The loss $\ell(v_i, u_i)$ for the other view is defined analogously, and the overall objective is the average over all instances:

$$\mathcal{L}_{\text{ins}} = \frac{1}{2N} \sum_{i=1}^{N} \left[ \ell(u_i, v_i) + \ell(v_i, u_i) \right]. \tag{5}$$
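A compact sketch of Equations (4)–(5) in PyTorch, assuming `h1` and `h2` are the two $N \times D$ view representations (the temperature value is illustrative):

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(h1, h2, tau=0.5):
    """GRACE-style contrastive loss over two views (Eq. 4-5); h1, h2: [N, D]."""
    z1, z2 = F.normalize(h1, dim=1), F.normalize(h2, dim=1)
    inter = torch.exp(z1 @ z2.t() / tau)   # cross-view cosine similarities
    intra = torch.exp(z1 @ z1.t() / tau)   # within-view similarities (view 1)
    # Denominator: positive pair + inter-view negatives + intra-view negatives.
    denom = inter.sum(dim=1) + intra.sum(dim=1) - intra.diag()
    loss_12 = -torch.log(inter.diag() / denom)
    # Symmetric term with the roles of the two views swapped.
    inter_t = inter.t()
    intra2 = torch.exp(z2 @ z2.t() / tau)
    denom2 = inter_t.sum(dim=1) + intra2.sum(dim=1) - intra2.diag()
    loss_21 = -torch.log(inter_t.diag() / denom2)
    return 0.5 * (loss_12 + loss_21).mean()
```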
To enhance the scalability of our method for large-scale graphs, we employ the subsampling approach proposed by [11]. Both the RE and MF methods, along with the loss function described in Equation 4, are seamlessly adaptable to the sampled subgraphs.
2.2.2 Feature-wise contrastive learning with token embeddings
Instance-wise contrastive learning relies heavily on individual instances, which can cause transfer issues when transitioning to other datasets. Moreover, there is a significant gap between the obtained node representations and the semantic space of LLMs. To address these issues, we propose feature-wise contrastive learning with token embeddings.
Feature-wise contrastive loss breaks the independence between instances. For a representation matrix in the two views, we denote its $m$-th columns as $\tilde{u}_m$ and $\tilde{v}_m$, where $m \in \{1, \dots, d\}$ and $d$ is the number of feature dimensions. The loss, denoted as $\mathcal{L}_{\text{fea}}$, is calculated analogously to Equation (4), with feature columns taking the place of node instances:

$$\mathcal{L}_{\text{fea}} = \frac{1}{2d} \sum_{m=1}^{d} \left[ \ell(\tilde{u}_m, \tilde{v}_m) + \ell(\tilde{v}_m, \tilde{u}_m) \right]. \tag{6}$$
To map node representations to the semantic space of LLMs, we use the principal components of the token embeddings of LLMs as coordinate axes. This approach ensures that the representations of similar instances are closely aligned in the textual embedding space. This helps alleviate the inconsistency in optimization objectives during graph self-supervised learning due to the gap between node representations and the text embedding space.
Specifically, we first use principal component analysis (PCA) to obtain the principal components of the LLM’s token embedding matrix, denoted as $P$, where each component has the dimension size $d$ of the LLM’s token embeddings. Then, we map the node representations by:

$$\widetilde{H}^k = H^k P^{\top}, \quad k \in \{1, 2\}. \tag{7}$$

To map the node representations obtained from the GNN using the principal components, we set the output dimension of the GNN equal to the token embedding dimension (i.e., $D = d$). The columns of the mapped feature matrices $\widetilde{H}^1$ and $\widetilde{H}^2$, denoted as $\tilde{u}_m$ and $\tilde{v}_m$, are fed into $\mathcal{L}_{\text{fea}}$. Therefore, the final contrastive loss for graph self-supervised learning is the average of the instance-wise loss (Equation 5) and the feature-wise loss (Equation 6):

$$\mathcal{L} = \frac{1}{2} \left( \mathcal{L}_{\text{ins}} + \mathcal{L}_{\text{fea}} \right). \tag{8}$$
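A minimal sketch of Equations (6)–(8), reusing `instance_contrastive_loss` from the sketch above on feature columns instead of node rows. It assumes `token_emb` holds the LLM’s token embedding matrix and `h1`, `h2` the two view representations with $D = d$; the plain `torch.pca_lowrank` call stands in for whatever PCA routine the authors actually used:

```python
import torch

def pca_components(token_emb, k=None):
    # Top-k principal components (rows) of the LLM token embedding matrix.
    k = k or token_emb.size(1)
    _, _, v = torch.pca_lowrank(token_emb, q=k)  # centers internally
    return v.t()                                  # [k, d]

def total_loss(h1, h2, token_emb, tau=0.5, k=None):
    p = pca_components(token_emb, k)      # [k, d]
    z1, z2 = h1 @ p.t(), h2 @ p.t()       # Eq. 7: project onto the PCA axes
    # Eq. 6: apply the same contrastive objective to columns (features).
    loss_fea = instance_contrastive_loss(z1.t(), z2.t(), tau)
    loss_ins = instance_contrastive_loss(h1, h2, tau)   # Eq. 5
    return 0.5 * (loss_ins + loss_fea)    # Eq. 8
```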
Remark: The introduction of feature-wise contrastive learning with token embeddings addresses the semantic space discrepancy between graph node representations and LLM token embeddings. It enables the LLM to use the structural and textual information captured by the GNN directly and simply, thereby avoiding the significant generalization loss associated with complex modality-alignment training during fine-tuning. Its role in fine-tuning will be further described in Sec. 2.3.2 and validated by experiments. Additionally, the feature-wise contrastive method itself exhibits stronger generalization, allowing it to perform well on unseen instances (or tasks) rather than relying on trained instances (or tasks).
2.3 Alignment tuning
The development of LLMs has introduced a new paradigm for graph machine learning. However, existing research [18] indicates that LLMs alone cannot fully comprehend graph structures and their underlying information. To enable LLMs to capture graph information more effectively and improve their performance in cross-dataset and cross-task zero-shot learning, it is essential to design methods that incorporate graph information in a form suitable for LLMs. To this end, we propose an alignment tuning method that includes specially designed instructions for graph tasks at different levels, as well as a mechanism that converts graph representations into graph token embeddings to integrate graph information.
2.3.1 Instructions design
The instruction we design can be divided into two parts: one part provides graph information, and the other describes the task. Here, we take a citation graph as an example, where nodes are papers and edges are citations, to introduce the instruction.
Graph information provision
The graph information provision in the instructions for node-, edge-, and graph-level tasks is presented as follows: Given the representation of a paper/two papers/a paper set: <graph>, with the following information:\nTitle: First Paper: …\n, where <graph> is the placeholder for graph inputs (see Sect. 2.3.2), and the title serves as the node text information.
Note that, different from most works that use an LLM as a predictor, the instruction we design uses only the title of a paper node, excluding more extensive textual information such as its abstract or description. In fact, reducing the amount of input text does not decrease the model’s performance; it actually improves it. [18] confirmed through experiments that LLMs benefit from structural information only when the target node lacks sufficient phrases for reasonable predictions. Therefore, using only titles as text input helps LLMs extract more critical information from the graph. The complete instructions for node classification and link prediction in citation networks are shown in Appendix D.
Task description
To achieve cross-dataset capability, where the model can be trained on one graph dataset and then perform reasoning on any other dataset, the instruction is designed to include not only the task description itself but also the set of alternative answers. Using the node classification task on the Arxiv dataset (see Sect. 3.1) as an example, the instruction is structured as follows: Which arXiv CS sub-category does this paper belong to? Please directly give the most likely answer from the following sub-categories: <candidate answers>, where <candidate answers> represents the set of alternative answers, which varies across datasets. Including alternative answers enables the model to learn the task of “reasoning the answer from a given set according to the task” rather than memorizing answers for a particular dataset, thus facilitating reasoning across datasets.
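For illustration, a sketch of how such an instruction could be assembled for a node-level task; the function name, the `<graph>` placeholder string, and the argument names are hypothetical, not taken from the released code:

```python
def build_instruction(title: str, candidates: list[str]) -> str:
    # Graph information provision: the <graph> placeholder is later replaced
    # by the k graph token embeddings (Sec. 2.3.2); only the title is used as text.
    graph_part = (
        "Given the representation of a paper: <graph>, "
        f"with the following information:\nTitle: {title}\n"
    )
    # Task description with the dataset-specific set of alternative answers.
    task_part = (
        "Which arXiv CS sub-category does this paper belong to? "
        "Please directly give the most likely answer from the following "
        f"sub-categories: {', '.join(candidates)}"
    )
    return graph_part + task_part
```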
2.3.2 Graph token embeddings
The graph token embeddings mentioned previously, i.e., the tokens that replace the <graph> placeholder, are crucial for incorporating graph information and enabling the model’s generalization. We use a projector to map central node representations into graph token embeddings and replace <graph> with these tokens. Kindly note that we map the representations to a fixed number $k$ of token embeddings regardless of the task type. For example, for node-level tasks, we map the central node representation to $k$ token embeddings; for edge-level tasks, we pool the representations of the two end nodes of the target edge and then map this pooled representation to $k$ token embeddings; for graph-level tasks, a similar approach can be applied. In this way, we unify the instruction of graph tasks at different levels. Thanks to the text-aligned contrastive learning, a linear projector suffices to capture the mapping without tuning the LLM:

$$[t_1, t_2, \dots, t_k] = f_{\text{proj}}(h_c), \tag{9}$$

where $h_c$ is the (pooled) central node representation, $t_j \in \mathbb{R}^{d}$ for $j = 1, \dots, k$, $d$ is the dimension size of the token embeddings of the LLM, and $f_{\text{proj}}$ is a linear layer.
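As a rough sketch of Equation (9), assuming the projector is a single linear layer whose output is reshaped into $k$ token-sized embeddings (the class name and the mean pooling for edges are illustrative choices):

```python
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    """Map a (pooled) node representation to k LLM-token-sized embeddings."""
    def __init__(self, in_dim: int, llm_dim: int, k: int):
        super().__init__()
        self.k = k
        self.proj = nn.Linear(in_dim, k * llm_dim)

    def forward(self, h_c: torch.Tensor) -> torch.Tensor:
        # h_c: [batch, in_dim] -> graph tokens: [batch, k, llm_dim]
        out = self.proj(h_c)
        return out.view(h_c.size(0), self.k, -1)

# Edge-level tasks: pool the two end-node representations first, e.g.
# h_c = 0.5 * (gnn_out[u] + gnn_out[v]), then project as above.
```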
Remark: This approach offers three primary advantages: (i) When handling tasks at different levels, the changes to the instructions are minimal. This consistency facilitates the transfer of knowledge learned during training to unseen tasks in large language models (LLMs); (ii) The fixed number of token embeddings can be seen as a conditional soft prompt. Unlike traditional soft prompts, learning at the instance level reduces the risk of overfitting to specific datasets or tasks, thereby enhancing generalization to unseen datasets and tasks; (iii) Different from current work that includes the representations of all nodes in the subgraph, we map only the central node’s representation to tokens, since the GNN’s message passing has already aggregated sufficient neighborhood information. This method is more efficient, and it offers greater generalizability and practicality.
2.3.3 Training and evaluation strategy
To ensure compatibility and facilitate comparisons across various datasets, we map the node features into a consistent vector space. Specifically, we employ a pretrained BERT model [8] to encode the raw text associated with each node, thereby generating the node features. We then pretrain the graph model using contrastive learning with the loss function defined in Equation 8 on a single dataset. After pretraining, the model parameters are fixed. We utilize the pretrained model to obtain node representations and follow the instructions in Section 2.3.1 to train the linear projector on specific tasks within the same dataset. Finally, we evaluate the performance of our model on unseen datasets and tasks. Throughout all phases, the parameters of the language model remain fixed. We use GraphSAGE [11] as our graph encoder and Vicuna-7B-v1.5 [7] as the foundational language model.
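A minimal sketch of the node-feature encoding step; the `bert-base-uncased` checkpoint and mean pooling over non-padding tokens are assumptions for illustration, as the paper only states that a pretrained BERT model is used:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def encode_node_texts(texts, batch_size=64):
    feats = []
    for i in range(0, len(texts), batch_size):
        batch = tokenizer(texts[i:i + batch_size], padding=True,
                          truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state       # [B, L, 768]
        mask = batch["attention_mask"].unsqueeze(-1)      # [B, L, 1]
        # Mean-pool over non-padding tokens to get one feature vector per node.
        feats.append((hidden * mask).sum(1) / mask.sum(1))
    return torch.cat(feats, dim=0)                        # [N, 768]
```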
3 Experimental results
In this section, comprehensive experiments are conducted to validate the effectiveness of TEA-GLM. These experiments aim to investigate the following research questions:
- RQ1: How effective is TEA-GLM in handling the cross-dataset zero-shot learning problem?
- RQ2: How well does TEA-GLM transfer knowledge when adapted to an unseen task and dataset in a zero-shot setting?
- RQ3: What is the contribution of the feature-wise contrastive learning and graph token embeddings to the zero-shot learning ability of TEA-GLM?
3.1 Experimental setup
Datasets
We test TEA-GLM across eight widely used datasets spanning two distinct domains. Within the citation domain, we employ Arxiv [17], Pubmed [13], and an expanded version of Cora [35] with an increased range of classes and larger scale. In these datasets, each node represents an individual paper, with edges indicating citation relationships. In the e-commerce domain, we utilize datasets from the TAG benchmark [41], including Children (Book-Children), History (Book-History), Computer (Ele-Computer), Photo (Ele-Photo), and Sports (Sports-Fitness). Here, nodes represent distinct products, while edges denote co-viewing or co-purchasing between two products. Appendix A presents the statistics for these datasets.
Baselines
We conduct a comprehensive comparison of TEA-GLM with various categories of baseline methods: (i) non-graph neural network approaches, such as MLP, which employs a Multilayer Perceptron for node representation; (ii) supervised methods, including GCN [20], GraphSAGE [11], and GAT [31]; (iii) self-supervised methods like DGI [32], which maximizes mutual information to learn node representations without relying on ground-truth labels; (iv) graph knowledge distillation frameworks: GKD [42], which distills knowledge from a teacher GNN trained on a complete graph to a student GNN operating on a smaller or sparser graph, and GLNN [51], a method combining the advantages of graph neural networks and MLPs via knowledge distillation, aimed at reducing dependency on the inference graph; (v) graph transformer networks, including NodeFormer [36] and DIFFormer [37]; (vi) large language models, such as Vicuna-7B-v1.5; and (vii) the latest models equipped with transfer and zero-shot capabilities, such as OFA [24], GraphGPT [30], and LLaGA [2].
Implementation details
For datasets within the citation domain, we follow the data split methodology outlined in GraphGPT [30]. For those within the e-commerce domain, we utilize scripts provided by the TAG benchmark [41] to generate data splits. To ensure comparability among different methods, identical data splits are applied to all models. To assess the performance of TEA-GLM, we employ three commonly adopted evaluation metrics: Accuracy and Macro F1 for node classification, and AUC (Area Under the Curve) for link prediction. To ensure result robustness, we conduct five experiments with random seed values ranging from 0 to 4 and report the mean and standard deviation of the results. Due to space limitations, several experimental results, such as the Macro F1 results of node classification (Appendix B.2), the legality rate of valid answers produced by the LLM (Appendix B.1), and the parameter sensitivity analysis (Appendix C), are reported in the Appendix.
In the pre-training phase of the GNN, we set the GNN layers to 2. We use a batch size of 512 for 60 epochs and a learning rate of . During the training of the linear projector, we configure a batch size of 2 per GPU for one epoch, with a learning rate of . The Adam optimizer is employed for all approaches. For baseline models, we adjust hyperparameters and utilize the optimal settings. All experiments are conducted on 2 NVIDIA A100 GPUs with 80GB memory each, using CUDA version 11.7.
Table 1: Accuracy of cross-dataset zero-shot node classification (mean ± std over five runs). Pubmed and Cora belong to the citation domain; Children, History, Photo, and Sports belong to the e-commerce domain. “-” indicates results not reported.

| Model type | Model | Pubmed | Cora | Children | History | Photo | Sports |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | MLP | 0.323±0.027 | 0.021±0.006 | 0.029±0.037 | 0.080±0.041 | 0.110±0.070 | 0.042±0.021 |
| GNN as predictor | GCN | 0.288±0.092 | 0.017±0.004 | 0.030±0.018 | 0.063±0.042 | 0.103±0.047 | 0.042±0.025 |
| GNN as predictor | GraphSAGE | 0.316±0.058 | 0.014±0.007 | 0.008±0.007 | 0.195±0.206 | 0.056±0.055 | 0.051±0.015 |
| GNN as predictor | GAT | 0.343±0.064 | 0.016±0.004 | 0.086±0.084 | 0.172±0.098 | 0.050±0.027 | 0.142±0.138 |
| GNN as predictor | DGI | 0.329±0.103 | 0.026±0.009 | 0.082±0.035 | 0.218±0.168 | 0.224±0.127 | 0.049±0.017 |
| GNN as predictor | GKD | 0.399±0.033 | 0.042±0.008 | 0.202±0.064 | 0.339±0.138 | 0.166±0.086 | 0.208±0.077 |
| GNN as predictor | GLNN | 0.390±0.011 | 0.031±0.006 | 0.187±0.012 | 0.283±0.021 | 0.403±0.019 | 0.317±0.048 |
| GNN as predictor | NodeFormer | 0.308±0.093 | 0.016±0.007 | 0.048±0.028 | 0.168±0.127 | 0.073±0.015 | 0.165±0.057 |
| GNN as predictor | DIFFormer | 0.361±0.071 | 0.029±0.014 | 0.129±0.030 | 0.275±0.171 | 0.321±0.055 | 0.306±0.131 |
| GNN as predictor | OFA | 0.314±0.059 | 0.130±0.019 | 0.064±0.086 | 0.052±0.049 | 0.340±0.026 | 0.101±0.071 |
| LLM as predictor | Vicuna-7B-v1.5 | 0.719±0.010 | 0.156±0.001 | 0.270±0.001 | 0.363±0.001 | 0.378±0.004 | 0.370±0.001 |
| LLM as predictor | Vicuna-7B-SPT | 0.768±0.036 | 0.168±0.018 | 0.227±0.015 | 0.281±0.088 | 0.350±0.061 | 0.230±0.018 |
| LLM as predictor | GraphGPT-std | 0.701 | 0.126 | - | - | - | - |
| LLM as predictor | GraphGPT-cot | 0.521 | 0.181 | - | - | - | - |
| LLM as predictor | LLaGA | 0.793±0.036 | 0.168±0.032 | 0.199±0.007 | 0.146±0.067 | 0.276±0.069 | 0.352±0.033 |
| LLM as predictor | TEA-GLM | 0.848±0.010 | 0.202±0.014 | 0.271±0.010 | 0.528±0.058 | 0.497±0.027 | 0.404±0.010 |
3.2 Cross-dataset zero-shot ability (RQ1)
We train all methods on Arxiv and Computer, respectively, followed by an evaluation of their zero-shot performance on datasets from the same domain. Zero-shot learning presents challenges for GNN-based models, particularly regarding variations in the number of classes across different datasets. To address this, we adopt the setting outlined in GraphGPT [30]: for each target dataset, we utilize the GNN backbone trained on the source dataset along with a classifier trained on target data, typically a linear layer. Due to the considerable time cost of training and evaluating GraphGPT on e-commerce datasets, we only report its performance on citation datasets as provided in their paper. “-std” and “-cot” denote the use of the standard dual-stage graph instruction tuning procedure and of COT instruction datasets generated by an LLM, respectively. To demonstrate the difference between our work and soft prompt tuning, we fine-tuned Vicuna-7B-v1.5 using soft prompts and report the results (Vicuna-7B-SPT). The accuracy results are presented in Table 1. As mentioned earlier, we report the Macro F1 results in Appendix B.2 and the results on the two training datasets in Appendix B.3.
The results clearly demonstrate that TEA-GLM outperforms all state-of-the-art (SOTA) models, resulting in significant improvements. Comparative analysis with baseline models across all datasets highlights the robust generalization capability of TEA-GLM. Models utilizing GNN as a predictor face challenges in achieving cross-dataset transferability with traditional supervised and self-supervised learning methods. Even recently developed robust GNN-based models, such as NodeFormer, DIFFormer, and GKD, encounter similar issues. In the case of OFA, a recent framework for cross-domain learning, strong transferability is observed between topic-related datasets such as Arxiv and Cora (both related to computer science). Nevertheless, its generalization performance notably decreases on datasets with lower topic relevance, such as those in the e-commerce domain.
LLM-based solutions, such as Vicuna-7B, demonstrate consistent performance across various datasets. Nevertheless, their predictive capabilities are confined to text information alone. Vicuna-7B-SPT also fails to achieve transferability on e-commerce datasets, indicating that soft prompt tuning alone is insufficient when relying solely on node texts. This suggests that graph tokens indeed contain transferable graph information, enabling the LLM to make more accurate predictions. In contrast, GNN-LLM-combined solutions that use the LLM as a predictor demonstrate generalization ability but often face limitations. For instance, GraphGPT tends to underperform compared to Vicuna-7B due to the lack of a graph foundation model. Instead of relying on a graph foundation model, LLaGA directly maps node representations without a GNN and can generalize on citation datasets. However, it shows limited generalization across e-commerce datasets, which are more challenging because their topics are largely unrelated to the training data. TEA-GLM, on the other hand, utilizes the principal components of the LLM’s token embeddings to constrain the representations learned by the GNN, helping the graph representations transfer well to other datasets. Experimental results validate the superior generalization capabilities of TEA-GLM, achieved with less textual data and fewer parameters.
3.3 Cross-task zero-shot ability (RQ2)
We employ models trained on node classification tasks directly for link prediction without any fine-tuning. We omit the comparison with models utilizing a GNN as a predictor, as conducting cross-task evaluation of these models without fine-tuning poses a significant challenge, given that different tasks typically correspond to different task heads. Here, we contrast TEA-GLM with OFA, which similarly enables cross-task testing without the need for fine-tuning. Additionally, we compare TEA-GLM with Vicuna-7B and methods that utilize an LLM as a predictor, such as GraphGPT and LLaGA. For GraphGPT, we utilize the checkpoint released by the authors, trained on Arxiv, and report the results on citation datasets. The results are reported in Table 2.
In the case of OFA, although this framework facilitates cross-domain and cross-task learning, it exhibits negative transfer when lacking task-relevant data, particularly on unseen tasks. Benefiting from the generalization capability of large language models, both the fine-tuned and non-fine-tuned versions of Vicuna avoid negative transfer. However, due to the absence of graph information, their predictions often appear random. Conversely, GraphGPT shows transferability on familiar datasets, yet its performance declines on unseen datasets (Pubmed and Cora). Due to the absence of a GNN for filtering and aggregating graph information, LLaGA demonstrates unstable performance: while it exhibits cross-task transferability on citation datasets, its performance is poor on most e-commerce datasets. In contrast, TEA-GLM consistently outperforms all baseline methods on both unseen datasets and tasks, except for the result on Sports, indicating the stronger generalization ability of TEA-GLM.
Table 2: AUC of cross-task zero-shot link prediction. Arxiv, Pubmed, and Cora belong to the citation domain; the remaining datasets belong to the e-commerce domain. “-” indicates results not reported.

| Model | Arxiv | Pubmed | Cora | Children | History | Computer | Photo | Sports |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OFA | 0.469 | 0.481 | 0.492 | 0.484 | 0.431 | 0.461 | 0.459 | 0.517 |
| Vicuna-7B-v1.5 | 0.513 | 0.543 | 0.527 | 0.500 | 0.515 | 0.502 | 0.501 | 0.502 |
| Vicuna-7B-SPT | 0.537 | 0.535 | 0.565 | 0.544 | 0.543 | 0.509 | 0.501 | 0.508 |
| GraphGPT-std | 0.649 | 0.501 | 0.520 | - | - | - | - | - |
| LLaGA | 0.570 | 0.569 | 0.537 | 0.422 | 0.449 | 0.479 | 0.478 | 0.597 |
| TEA-GLM | 0.657 | 0.689 | 0.586 | 0.571 | 0.579 | 0.554 | 0.545 | 0.553 |
3.4 Ablation study (RQ3)
We conduct an ablation study to examine two key components of our model: feature-wise contrastive learning and graph token embeddings. Here, we directly remove these two components from our model and then test the model’s performance on cross-dataset and cross-task evaluations. The results are shown in Figure 2. “w/o FC” means that we pretrain the GNN without feature-wise contrastive learning, while “w/o GT” means predicting without graph token embeddings.
Without graph token embeddings, large language models lack crucial information from the graph, leading to a significant decline in performance on both node-level and edge-level tasks. GNNs pre-trained with feature-wise contrastive learning can obtain node representations aligned with the text space, enabling cross-dataset and cross-task generalization through a simple linear layer. When the feature-wise constraint for pre-training is absent, the model’s performance on the seen datasets (Arxiv and Computer) for the training task improves slightly. However, its performance on unseen datasets declines. Although it remains relatively stable when handling tasks of the same category, its performance decreases notably when dealing with unseen tasks (link prediction). These results indicate that alignment between graph representation and LLM’s token embeddings via feature-wise contrastive learning is important for cross-task zero-shot transfer.
4 Related work
4.1 Graph neural networks
In the field of graph machine learning, Graph Neural Networks (GNNs) have garnered significant attention [5, 22, 28, 40, 9, 6, 46, 1]. The primary strategy of most GNNs is to capture underlying message-passing patterns for graph representation. Several effective neural network architectures have been proposed, such as the Graph Attention Network (GAT) [31], Graph Convolution Network (GCN) [20], and GraphSAGE [11]. Recently, there has been a surge of interest in exploring transformer-based encoders for graph machine learning [49, 45, 36, 37]. However, a notable limitation of GNNs is their generalization capability. Typically, GNNs are trained on specific tasks within particular datasets, and when faced with new datasets or tasks, they often struggle to perform consistently well across different datasets or downstream tasks [19].
4.2 Self-supervised learning and prompt-tuning for GNNs
To alleviate the demand for labeled data and enhance the robustness of graph models, self-supervised learning is commonly employed in GNN training [38, 52, 12]. Methods like Deep Graph Infomax (DGI) [32] utilize mutual information maximization for pre-training. Other approaches, such as GraphCL [44], GCA [53], GCC [26], and JOAO [47], learn node representations by contrasting positive and negative samples. GraphMAE [15, 16], on the other hand, learns representations by generating samples that resemble the original graph structure. However, these methods typically require fine-tuning task-specific heads for downstream applications.
Various methods have explored the use of prompt techniques to enhance the generalization of GNNs. To address the inconsistency between pre-training and downstream task objectives, GraphPrompt [25] proposes a unified task template applicable to both stages. Additionally, ProG [29] reformulates various task types into a unified graph-level representation and employs meta-learning techniques to enhance multi-task learning capabilities. However, whether through self-supervised learning or graph prompt methods, fine-tuning is often necessary when handling new datasets. Moreover, when confronted with datasets containing varying numbers of categories, task heads must be retrained to achieve optimal performance.
4.3 Large language models for graphs
With the rapid advancement of Large Language Models (LLMs) and their remarkable generalization capabilities, leveraging LLMs to address transferability issues in graph machine learning has garnered significant attention [10, 14]. Some methods represent graph structure information as text input to LLMs [3, 33, 23]; however, this approach often leads to suboptimal solutions [18]. Another paradigm involves using LLMs as enhancers [43, 48, 39, 4, 24], where they generate data or node text representations. Despite this, since GNNs are ultimately used for prediction, this approach significantly limits the model’s transferability. Recently, considerable efforts have been made to utilize LLMs as predictors. For instance, GraphGPT [30] attempts to align LLMs with pre-trained Graph Transformer encoders through two-stage fine-tuning. However, the fine-tuning, conducted on specific datasets, might weaken the method’s transferability. In light of this, LLaGA [2] introduces a novel encoding method that directly translates graph data into sequences compatible with LLMs, but this approach may compromise performance due to the lack of GNN filtering and aggregation of graph information. Inspired by these challenges, we propose a pre-training strategy that enhances GNN transferability by aligning its representations with the token embeddings of LLMs, resulting in improved performance on zero-shot tasks. Notably, similar to our method, TEST [27] aligns time series representations with several selected LLM token embeddings. However, our approach differs in that we project graph representations into a feature space defined by the principal components of LLM token embeddings. This enables the LLM to function as a zero-shot learner for graph machine learning tasks, rather than merely enhancing performance on specific, seen tasks.
5 Limitations
While our TEA-GLM framework demonstrates considerable promise in enhancing zero-shot learning for graph-based tasks, it does have some limitations. Although the framework we designed can be easily applied to graph-level tasks, we have not yet explored the model’s performance through specific experiments. This will be addressed in our future work.
6 Conclusion
This paper introduces TEA-GLM, a framework that enhances zero-shot learning in graph machine learning by aligning GNN representations with LLM token embeddings. TEA-GLM uses a linear projector to map graph representations into graph token embeddings and incorporates a unified instruction design to handle various graph tasks at different levels. This approach enables consistent performance across various datasets and tasks without task-specific fine-tuning. Extensive experiments show that TEA-GLM outperforms state-of-the-art methods in accuracy and generalization, demonstrating its effectiveness and efficiency in zero-shot learning for graph tasks.
7 Acknowledgement
This work was supported by the National Key R&D Program of China (2023YFC3304700). Dr. Junjie Wu’s work was partially supported by the National Natural Science Foundation of China (72242101, 72031001) and Outstanding Young Scientist Program of Beijing Universities (JWZQ20240201002).
References
- Chen etal. [2018]Jie Chen, Tengfei Ma, and Cao Xiao.FastGCN: Fast learning with graph convolutional networks via importance sampling.In International Conference on Learning Representations, 2018.URL https://openreview.net/forum?id=rytstxWAW.
- Chen etal. [2024a]Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang.Llaga: Large language and graph assistant.In ICML, 2024a.
- Chen etal. [2023]Zhikai Chen, Haitao Mao, Hang Li, Wei Jin, Hongzhi Wen, Xiaochi Wei, Shuaiqiang Wang, Dawei Yin, Wenqi Fan, Hui Liu, and Jiliang Tang.Exploring the potential of large language models (LLMs) in learning on graph.In NeurIPS 2023 Workshop: New Frontiers in Graph Learning, 2023.URL https://openreview.net/forum?id=ScNNo7v4t0.
- Chen etal. [2024b]Zhikai Chen, Haitao Mao, Hongzhi Wen, Haoyu Han, Wei Jin, Haiyang Zhang, Hui Liu, and Jiliang Tang.Label-free node classification on graphs with large language models (LLMs).In The Twelfth International Conference on Learning Representations, 2024b.URL https://openreview.net/forum?id=hESD2NJFg8.
- Cheng etal. [2023]Jiashun Cheng, Man Li, Jia Li, and Fugee Tsung.Wiener graph deconvolutional network improves graph self-supervised learning.In AAAI, 2023.URL https://doi.org/10.1609/aaai.v37i6.25870.
- Chiang etal. [2019]Wei-Lin Chiang, Xuanqing Liu, SiSi, Yang Li, Samy Bengio, and Cho-Jui Hsieh.Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks.In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 257–266, 2019.
- Chiang etal. [2023]Wei-Lin Chiang, Zhuohan Li, ZiLin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, JosephE. Gonzalez, Ion Stoica, and EricP. Xing.Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality, 2023.URL https://lmsys.org/blog/2023-03-30-vicuna/.
- Devlin etal. [2019]Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.BERT: Pre-training of deep bidirectional transformers for language understanding.In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, 2019.URL https://aclanthology.org/N19-1423.
- Gao etal. [2018]Hongyang Gao, Zhengyang Wang, and Shuiwang Ji.Large-scale learnable graph convolutional networks.In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, page 1416–1424, 2018.URL https://doi.org/10.1145/3219819.3219947.
- Guo etal. [2023]Jiayan Guo, Lun Du, and Hengyu Liu.Gpt4graph: Can large language models understand graph structured data ? an empirical evaluation and benchmarking.ArXiv, abs/2305.15066, 2023.URL https://api.semanticscholar.org/CorpusID:258865990.
- Hamilton etal. [2017]Will Hamilton, Zhitao Ying, and Jure Leskovec.Inductive representation learning on large graphs.In Advances in Neural Information Processing Systems, 2017.URL https://proceedings.neurips.cc/paper_files/paper/2017/file/5dd9db5e033da9c6fb5ba83c7a7ebea9-Paper.pdf.
- Hassani and Khasahmadi [2020]Kaveh Hassani and AmirHosein Khasahmadi.Contrastive multi-view representation learning on graphs.In Proceedings of the 37th International Conference on Machine Learning, 2020.
- He etal. [2024]Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi.Harnessing explanations: LLM-to-LM interpreter for enhanced text-attributed graph representation learning.In The Twelfth International Conference on Learning Representations, 2024.URL https://openreview.net/forum?id=RXFVcynVe1.
- He and Hooi [2024]Yufei He and Bryan Hooi.Unigraph: Learning a cross-domain graph foundation model from natural language.arXiv preprint arXiv:2402.13630, 2024.
- Hou etal. [2022]Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang.Graphmae: Self-supervised masked graph autoencoders.In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, page 594–604, 2022.URL https://doi.org/10.1145/3534678.3539321.
- Hou etal. [2023]Zhenyu Hou, Yufei He, Yukuo Cen, Xiao Liu, Yuxiao Dong, Evgeny Kharlamov, and Jie Tang.Graphmae2: A decoding-enhanced masked self-supervised graph learner.In Proceedings of the ACM Web Conference 2023, page 737–746, 2023.URL https://doi.org/10.1145/3543507.3583379.
- Hu etal. [2020]Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec.Open graph benchmark: Datasets for machine learning on graphs.In Advances in Neural Information Processing Systems, pages 22118–22133, 2020.URL https://proceedings.neurips.cc/paper_files/paper/2020/file/fb60d411a5c5b72b2e7d3527cfc84fd0-Paper.pdf.
- Huang etal. [2023]Jin Huang, Xingjian Zhang, Qiaozhu Mei, and Jiaqi Ma.Can llms effectively leverage graph structural information: when and why.arXiv preprint arXiv:2309.16595, 2023.
- Ju etal. [2023]Mingxuan Ju, Tong Zhao, Qianlong Wen, Wenhao Yu, Neil Shah, Yanfang Ye, and Chuxu Zhang.Multi-task self-supervised graph neural networks enable stronger task generalization.In The Eleventh International Conference on Learning Representations, 2023.URL https://openreview.net/forum?id=1tHAZRqftM.
- Kipf and Welling [2017a]ThomasN. Kipf and Max Welling.Semi-supervised classification with graph convolutional networks.In International Conference on Learning Representations, 2017a.URL https://openreview.net/forum?id=SJU4ayYgl.
- Kipf and Welling [2017b]ThomasN. Kipf and Max Welling.Semi-supervised classification with graph convolutional networks.In International Conference on Learning Representations, 2017b.URL https://openreview.net/forum?id=SJU4ayYgl.
- Li etal. [2019]Jia Li, Zhichao Han, Hong Cheng, Jiao Su, Pengyun Wang, Jianfeng Zhang, and Lujia Pan.Predicting path failure in time-evolving graphs.In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, page 1279–1289, 2019.URL https://doi.org/10.1145/3292500.3330847.
- Liu and Wu [2023]Chang Liu and BoWu.Evaluating large language models on graphs: Performance insights and comparative analysis.arXiv preprint arXiv:2308.11224, 2023.
- Liu etal. [2024]Hao Liu, Jiarui Feng, Lecheng Kong, Ningyue Liang, Dacheng Tao, Yixin Chen, and Muhan Zhang.One for all: Towards training one graph model for all classification tasks.In The Twelfth International Conference on Learning Representations, 2024.URL https://openreview.net/forum?id=4IT2pgc9v6.
- Liu etal. [2023]Zemin Liu, Xingtong Yu, Yuan Fang, and Xinming Zhang.Graphprompt: Unifying pre-training and downstream tasks for graph neural networks.In Proceedings of the ACM Web Conference 2023, page 417–428, 2023.URL https://doi.org/10.1145/3543507.3583386.
- Qiu etal. [2020]Jiezhong Qiu, Qibin Chen, Yuxiao Dong, Jing Zhang, Hongxia Yang, Ming Ding, Kuansan Wang, and Jie Tang.Gcc: Graph contrastive coding for graph neural network pre-training.In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, page 1150–1160, 2020.URL https://doi.org/10.1145/3394486.3403168.
- Sun etal. [2024]Chenxi Sun, Hongyan Li, Yaliang Li, and Shenda Hong.TEST: Text prototype aligned embedding to activate LLM’s ability for time series.In The Twelfth International Conference on Learning Representations, 2024.URL https://openreview.net/forum?id=Tuh4nZVb0g.
- Sun etal. [2021]Xiangguo Sun, Hongzhi Yin, BoLiu, Hongxu Chen, Qing Meng, Wang Han, and Jiuxin Cao.Multi-level hyperedge distillation for social linking prediction on sparsely observed networks.In Proceedings of the Web Conference 2021, page 2934–2945, 2021.URL https://doi.org/10.1145/3442381.3449912.
- Sun etal. [2023]Xiangguo Sun, Hong Cheng, Jia Li, BoLiu, and Jihong Guan.All in one: Multi-task prompting for graph neural networks.In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, page 2120–2131, 2023.URL https://doi.org/10.1145/3580305.3599256.
- Tang etal. [2023]Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang.Graphgpt: Graph instruction tuning for large language models.arXiv preprint arXiv:2310.13023, 2023.
- Veličković etal. [2018]Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio.Graph attention networks.In International Conference on Learning Representations, 2018.URL https://openreview.net/forum?id=rJXMpikCZ.
- Veličković etal. [2019]Petar Veličković, William Fedus, WilliamL. Hamilton, Pietro Liò, Yoshua Bengio, and RDevon Hjelm.Deep graph infomax.In International Conference on Learning Representations, 2019.URL https://openreview.net/forum?id=rklz9iAcKQ.
- Wang etal. [2023]Heng Wang, Shangbin Feng, Tianxing He, Zhaoxuan Tan, Xiaochuang Han, and Yulia Tsvetkov.Can language models solve graph problems in natural language?In Thirty-seventh Conference on Neural Information Processing Systems, 2023.URL https://openreview.net/forum?id=UDqHhbqYJV.
- Wei etal. [2022]Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, AdamsWei Yu, Brian Lester, Nan Du, AndrewM. Dai, and QuocV Le.Finetuned language models are zero-shot learners.In International Conference on Learning Representations, 2022.URL https://openreview.net/forum?id=gEZrGCozdqR.
- Wen and Fang [2023]Zhihao Wen and Yuan Fang.Augmenting low-resource text classification with graph-grounded pre-training and prompting.In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, page 506–516, 2023.doi: 10.1145/3539618.3591641.URL https://doi.org/10.1145/3539618.3591641.
- Wu etal. [2022]Qitian Wu, Wentao Zhao, Zenan Li, David Wipf, and Junchi Yan.Nodeformer: A scalable graph structure learning transformer for node classification.In AliceH. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.URL https://openreview.net/forum?id=sMezXGG5So.
- Wu etal. [2023]Qitian Wu, Chenxiao Yang, Wentao Zhao, Yixuan He, David Wipf, and Junchi Yan.DIFFormer: Scalable (graph) transformers induced by energy constrained diffusion.In The Eleventh International Conference on Learning Representations, 2023.URL https://openreview.net/forum?id=j6zUzrapY3L.
- Xia etal. [2022]Jun Xia, Lirong Wu, Jintao Chen, Bozhen Hu, and StanZ. Li.Simgrace: A simple framework for graph contrastive learning without data augmentation.In Proceedings of the ACM Web Conference 2022, page 1070–1079, 2022.URL https://doi.org/10.1145/3485447.3512156.
- Xia etal. [2024]Lianghao Xia, Ben Kao, and Chao Huang.Opengraph: Towards open graph foundation models.arXiv preprint arXiv:2403.01121, 2024.
- Xu etal. [2019]Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka.How powerful are graph neural networks?In International Conference on Learning Representations, 2019.URL https://openreview.net/forum?id=ryGs6iA5Km.
- Yan etal. [2023]Hao Yan, Chaozhuo Li, Ruosong Long, Chao Yan, Jianan Zhao, Wenwen Zhuang, Jun Yin, Peiyan Zhang, Weihao Han, Hao Sun, Weiwei Deng, QiZhang, Lichao Sun, Xing Xie, and Senzhang Wang.A comprehensive study on text-attributed graphs: Benchmarking and rethinking.In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023.URL https://openreview.net/forum?id=m2mbfoSuJ1.
- Yang etal. [2022]Chenxiao Yang, Qitian Wu, and Junchi Yan.Geometric knowledge distillation: Topology compression for graph neural networks.In Advances in Neural Information Processing Systems, 2022.URL https://openreview.net/forum?id=7WGNT3MHyBm.
- Ye etal. [2023]Ruosong Ye, Caiqi Zhang, Runhui Wang, Shuyuan Xu, and Yongfeng Zhang.Natural language is all a graph needs.arXiv preprint arXiv:2308.07134, 2023.
- Ying etal. [2021a]Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, DiHe, Yanming Shen, and Tie-Yan Liu.Do transformers really perform badly for graph representation?In Advances in Neural Information Processing Systems, pages 28877–28888, 2021a.URL https://proceedings.neurips.cc/paper_files/paper/2021/file/f1c1592588411002af340cbaedd6fc33-Paper.pdf.
- Ying etal. [2021b]Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, DiHe, Yanming Shen, and Tie-Yan Liu.Do transformers really perform badly for graph representation?In Advances in Neural Information Processing Systems, pages 28877–28888, 2021b.URL https://proceedings.neurips.cc/paper_files/paper/2021/file/f1c1592588411002af340cbaedd6fc33-Paper.pdf.
- You etal. [2020]Y.You, T.Chen, Z.Wang, and Y.Shen.L2-gcn: Layer-wise and learned efficient training of graph convolutional networks.In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 2124–2132, 2020.URL https://doi.ieeecomputersociety.org/10.1109/CVPR42600.2020.00220.
- You etal. [2021]Yuning You, Tianlong Chen, Yang Shen, and Zhangyang Wang.Graph contrastive learning automated.In ICML, 2021.URL https://arxiv.org/abs/2106.07594.
- Yu etal. [2023]Jianxiang Yu, Yuxiang Ren, Chenghua Gong, Jiaqi Tan, Xiang Li, and Xuecang Zhang.Empower text-attributed graphs learning with large language models (llms).arXiv preprint arXiv:2310.09872, 2023.
- Yun etal. [2019]Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and HyunwooJ Kim.Graph transformer networks.In Advances in Neural Information Processing Systems, 2019.URL https://proceedings.neurips.cc/paper_files/paper/2019/file/9d63484abb477c97640154d40595a3bb-Paper.pdf.
- Zhang etal. [2024]Mengmei Zhang, Mingwei Sun, Peng Wang, Shen Fan, Yanhu Mo, Xiaoxiao Xu, Hong Liu, Cheng Yang, and Chuan Shi.Graphtranslator: Aligning graph model to large language model for open-ended tasks.In Proceedings of the ACM Web Conference 2023, 2024.
- Zhang etal. [2022]Shichang Zhang, Yozen Liu, Yizhou Sun, and Neil Shah.Graph-less neural networks: Teaching old MLPs new tricks via distillation.In International Conference on Learning Representations, 2022.URL https://openreview.net/forum?id=4p6_5HBWPCw.
- Zhu etal. [2020]Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang.Deep graph contrastive representation learning.arXiv preprint arXiv:2006.04131, 2020.
- Zhu etal. [2021]Yanqiao Zhu, Yichen Xu, Feng Yu, Qiang Liu, Shu Wu, and Liang Wang.Graph contrastive learning with adaptive augmentation.In Proceedings of the Web Conference 2021, page 2069–2080, 2021.URL https://doi.org/10.1145/3442381.3449802.
Appendix A Dataset description
Table 3: Statistics of the datasets.

| Domain | Dataset | #Nodes | #Edges | #Classes |
| --- | --- | --- | --- | --- |
| Citation | Arxiv | 169,343 | 1,166,243 | 40 |
| Citation | Pubmed | 19,717 | 44,338 | 3 |
| Citation | Cora | 25,120 | 91,140 | 70 |
| E-commerce | Ele-Computer | 87,229 | 721,081 | 10 |
| E-commerce | Ele-Photo | 48,362 | 500,928 | 12 |
| E-commerce | Book-Children | 76,875 | 1,554,578 | 24 |
| E-commerce | Book-History | 41,551 | 358,574 | 12 |
| E-commerce | Sports-Fitness | 173,055 | 1,773,500 | 13 |
Citation datasets
The Arxiv dataset [17] represents a directed citation network among Computer Science (CS) papers from the arXiv preprint server. Each node in this graph corresponds to a paper, while edges represent citation links. The PubMed dataset [13] comprises 19,717 scientific publications from the PubMed database related to diabetes, which are categorized into three distinct classes: experimentally induced diabetes, Type 1 diabetes, and Type 2 diabetes. This classification reflects the focus of each publication within the broader context of diabetes research. Lastly, the Cora dataset [35], formally known as the “Cora Research Paper Classification Dataset”, provides a comprehensive network for analyzing research paper classifications in machine learning. It is an extended version of the dataset commonly referred to in other studies [21], featuring detailed categorizations.
E-commerce datasets
All e-commerce datasets are provided in the TAG benchmark [41]. The Book-Children and Book-History datasets are extracted from the Amazon-Books dataset: Book-Children includes items with the second-level label “Children”, while Book-History includes items with the second-level label “History”; each dataset’s labels correspond to the third-level labels of the books. The Ele-Computer dataset comprises items with the second-level label “Computers”, and Ele-Photo includes items with the second-level label “Photo”; each of these datasets is labeled at the third level for electronic products. The Sports-Fitness dataset, sourced from the Amazon-Sports dataset, contains items with the second-level label “Fitness”. Nodes in this dataset represent fitness-related items, and an edge between two items indicates they are frequently co-purchased or co-viewed.
Appendix B More experimental results
B.1 Legality rate
Table 4: Legality rate (%) of the answers produced by LLM-based models.

| Model | Arxiv | Computer | Pubmed | Cora | Children | History | Photo | Sports |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vicuna-7B-v1.5 | 99.3 | 96.7 | 100.0 | 95.8 | 99.2 | 98.9 | 94.1 | 99.6 |
| LLaGA | 100.0 | 100.0 | 98.9 | 79.9 | 93.1 | 92.4 | 77.8 | 94.3 |
| TEA-GLM | 100.0 | 100.0 | 100.0 | 92.6 | 97.0 | 99.6 | 99.2 | 98.5 |
After training on specific datasets or tasks, large language models (LLMs) may produce invalid or incorrect answers to given questions. For instance, when handling unseen datasets or tasks, LLMs may generate responses that fall outside the set of acceptable answer candidates. To evaluate the impact of the training process on LLM performance, we follow the approach in [50] and use the legality rate to measure the proportion of valid answers produced by the model.
Table 4 demonstrates that the illegality rate of the LLaGA model significantly increases when it is exposed to datasets it has not previously encountered, suggesting a substantial impact of the training methodology on both the acquisition of knowledge and the model’s ability to generalize. Conversely, our model exhibits notably stable performance across diverse unseen datasets, achieving higher legality rates in several cases.
B.2 F1 score on node classification task
Table 5: Macro F1 of cross-dataset zero-shot node classification (mean ± std over five runs). “-” indicates results not reported.

| Model type | Model | Pubmed | Cora | Children | History | Photo | Sports |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | MLP | 0.246±0.042 | 0.009±0.004 | 0.007±0.007 | 0.023±0.008 | 0.041±0.023 | 0.019±0.005 |
| GNN as predictor | GCN | 0.187±0.021 | 0.007±0.001 | 0.006±0.004 | 0.024±0.013 | 0.034±0.007 | 0.017±0.009 |
| GNN as predictor | GraphSAGE | 0.257±0.084 | 0.007±0.003 | 0.005±0.003 | 0.029±0.024 | 0.020±0.011 | 0.021±0.004 |
| GNN as predictor | GAT | 0.259±0.065 | 0.006±0.001 | 0.063±0.067 | 0.159±0.117 | 0.036±0.035 | 0.091±0.090 |
| GNN as predictor | DGI | 0.213±0.127 | 0.004±0.002 | 0.012±0.004 | 0.038±0.015 | 0.045±0.015 | 0.018±0.005 |
| GNN as predictor | GKD | 0.247±0.039 | 0.004±0.001 | 0.028±0.003 | 0.060±0.008 | 0.049±0.015 | 0.050±0.008 |
| GNN as predictor | GLNN | 0.221±0.033 | 0.006±0.001 | 0.021±0.003 | 0.064±0.007 | 0.057±0.002 | 0.052±0.003 |
| GNN as predictor | NodeFormer | 0.232±0.089 | 0.008±0.003 | 0.019±0.008 | 0.046±0.031 | 0.055±0.006 | 0.049±0.009 |
| GNN as predictor | DIFFormer | 0.187±0.007 | 0.007±0.002 | 0.002±0.002 | 0.050±0.019 | 0.069±0.010 | 0.045±0.007 |
| GNN as predictor | OFA | 0.287±0.059 | 0.091±0.013 | 0.017±0.010 | 0.026±0.007 | 0.103±0.007 | 0.043±0.021 |
| LLM as predictor | Vicuna-7B-v1.5 | 0.629±0.024 | 0.109±0.002 | 0.279±0.002 | 0.349±0.003 | 0.383±0.001 | 0.410±0.002 |
| LLM as predictor | GraphGPT-std | 0.649 | 0.082 | - | - | - | - |
| LLM as predictor | GraphGPT-cot | 0.482 | 0.127 | - | - | - | - |
| LLM as predictor | LLaGA | 0.778±0.056 | 0.108±0.014 | 0.163±0.029 | 0.144±0.025 | 0.362±0.039 | 0.446±0.035 |
| LLM as predictor | TEA-GLM | 0.839±0.012 | 0.148±0.015 | 0.252±0.005 | 0.365±0.011 | 0.421±0.032 | 0.430±0.009 |
Since there is no metric that calculates the F1 score while accounting for the illegality rate, we adopt the methodology used in [50]. For the LLM-backbone models, we calculate the Macro F1 score only over legally permissible responses provided by the model. This calculation method may not fully reflect the model’s performance; therefore, we also report the legality rate in Table 4. Please note that the accuracy metric is unaffected by illegal responses, which are counted as errors.
B.3 Supervised results
Table 6: Supervised results (Accuracy and Macro F1) on the two training datasets, Arxiv and Computer. “-” indicates results not reported.

| Model type | Model | Arxiv Acc | Arxiv F1 | Computer Acc | Computer F1 |
| --- | --- | --- | --- | --- | --- |
| - | MLP | 0.546±0.004 | 0.295±0.007 | 0.420±0.006 | 0.267±0.005 |
| GNN as predictor | GCN | 0.545±0.005 | 0.317±0.006 | 0.424±0.012 | 0.386±0.014 |
| GNN as predictor | GraphSAGE | 0.556±0.006 | 0.315±0.008 | 0.534±0.037 | 0.347±0.036 |
| GNN as predictor | GAT | 0.561±0.003 | 0.339±0.005 | 0.609±0.035 | 0.598±0.039 |
| GNN as predictor | DGI | 0.342±0.024 | 0.336±0.011 | 0.594±0.004 | 0.452±0.008 |
| GNN as predictor | GKD | 0.393±0.085 | 0.164±0.029 | 0.351±0.031 | 0.155±0.016 |
| GNN as predictor | GLNN | 0.602±0.004 | 0.362±0.008 | 0.393±0.005 | 0.243±0.007 |
| GNN as predictor | NodeFormer | 0.544±0.016 | 0.297±0.029 | 0.434±0.012 | 0.288±0.012 |
| GNN as predictor | DIFFormer | 0.616±0.025 | 0.356±0.024 | 0.629±0.012 | 0.467±0.022 |
| GNN as predictor | OFA | 0.682±0.006 | 0.495±0.006 | 0.753±0.004 | 0.687±0.006 |
| LLM as predictor | Vicuna-7B-v1.5 | 0.347±0.000 | 0.164±0.001 | 0.372±0.010 | 0.304±0.002 |
| LLM as predictor | GraphGPT-std | 0.626 | 0.262 | - | - |
| LLM as predictor | GraphGPT-cot | 0.576 | 0.228 | - | - |
| LLM as predictor | LLaGA | 0.749±0.001 | 0.575±0.003 | 0.642±0.004 | 0.562±0.001 |
| LLM as predictor | TEA-GLM | 0.655±0.001 | 0.445±0.002 | 0.578±0.002 | 0.496±0.010 |
We report the supervised learning results in Table 6. The GNN-backbone models continue to demonstrate robust performance in fitting the training data. Similarly, the LLaGA model shows its efficacy in supervised learning scenarios. However, despite their strong performance on training datasets, these models exhibit limited generalization capabilities on unseen datasets, as shown in Table 1 and Table 5.
Appendix C Parameter sensitivity analysis
Number of graph token embeddings
To assess the impact of the number of graph token embeddings, we vary this number and report the results on the node classification task in Figure 3. We observe two distinct patterns on the training dataset and on unseen datasets. As the number of graph token embeddings increases, there is a slight improvement in the model’s performance on the training dataset. This suggests that in a supervised learning scenario, the model’s performance can be enhanced by increasing the quantity of graph token embeddings. Conversely, for unseen datasets, our model requires only a minimal number of graph token embeddings to achieve satisfactory performance, indicating that the number of learnable parameters in our model is significantly smaller than in concurrent works.
Number of principal components
We vary the number of principal components and discuss the results on the node classification task in Figure 4. In supervised learning scenarios, omitting contrastive learning with principal components can lead to a slight increase in accuracy; however, this often makes the model more prone to overfitting on the training datasets. When the number of principal components is too small, it adversely affects the model’s learning capability. Remarkably, once the number of principal components reaches a moderate value, the model demonstrates satisfactory performance; at this level, the principal components already capture a large share of the variance of the LLM’s token embeddings.
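For reference, a minimal sketch of how the variance captured by a given number of principal components can be computed from the LLM’s token embedding matrix (plain PyTorch; the function name is illustrative and not from the released code):

```python
import torch

def explained_variance_ratio(token_emb, k):
    # Fraction of token-embedding variance captured by the top-k principal components.
    centered = token_emb - token_emb.mean(dim=0, keepdim=True)
    s = torch.linalg.svdvals(centered)   # singular values of the centered matrix
    var = s ** 2
    return (var[:k].sum() / var.sum()).item()
```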
Appendix D Complete instructions
In node classification tasks, we provide candidate labels to facilitate the model’s learning process, focusing on discovering the correct answers rather than merely memorizing them. For link prediction, we structure the instructions in a format similar to that of node classification. This approach is designed to enhance the model’s ability to transfer learned knowledge effectively across different tasks.
Appendix E Cross-task zero-shot results with different pooling methods
Table 7: AUC of cross-task zero-shot link prediction on citation datasets with different pooling methods for TEA-GLM.

| Model | Arxiv | Pubmed | Cora |
| --- | --- | --- | --- |
| OFA | 0.469 | 0.481 | 0.492 |
| Vicuna-7B-v1.5 | 0.513 | 0.543 | 0.527 |
| Vicuna-7B-SPT | 0.537 | 0.535 | 0.565 |
| GraphGPT-std | 0.649 | 0.501 | 0.520 |
| LLaGA | 0.570 | 0.569 | 0.537 |
| TEA-GLM (max) | 0.639 | 0.650 | 0.566 |
| TEA-GLM (sum) | 0.657 | 0.689 | 0.586 |
| TEA-GLM (mean) | 0.659 | 0.690 | 0.588 |
Considering that different pooling methods may impact cross-task performance, we conducted experiments using three common pooling methods separately; the results are shown in Table 7.