Pseudo-Graph Neural Networks On Ordinary Differential Equations

In this paper, we extend the idea of continuous-depth models to pseudo graphs and present pseudo graph ordinary differential equations (PGODE), inspired by the neural ordinary differential equation (NODE) for data in the Euclidean domain. All existing graph networks have discrete depth. In PGODE, a pseudo graph neural network (PGNN) parameterizes the derivative of the hidden node states, and the output states are the solution of this ordinary differential equation (ODE). We then propose a memory-efficient framework with precise gradient estimates for free-form ODEs. In this way, we obtain a framework of continuous-depth pseudo graph neural networks (PGNNs) on ODEs that blends discrete structures and differential equations.


Introduction
Convolutional neural networks (CNNs) have excelled at a variety of tasks, including image classification and segmentation, video processing, and machine translation [1,3]. CNNs, on the other hand, are confined to data that can be represented by a grid in the Euclidean domain, such as images (2D grid) and text (1D grid), which prevents them from being used on datasets with irregular structures. In a graph data structure, objects are represented as nodes while relationships between objects are represented as edges. Social networks and knowledge graphs are examples of graphs being used to model irregularly organised data [8]. The authors of [7] proposed a new class of models called graph neural networks (GNNs). Inspired by the success of CNNs, researchers have adapted convolution techniques to graphs to collect local information. Convolution on a graph can be done in two ways: spectral methods and non-spectral approaches. Spectral approaches often compute the graph Laplacian first, then perform filtering in the spectral domain. For faster processing, other methods seek to approximate the filters without computing the graph Laplacian [4].

1 Research Scholar (Reg.No:19221072092002), PG & Research Department of Mathematics, The Madurai Diraviyam Thayumanavar Hindu College, Tirunelveli 627 010, Tamil Nadu, India. Email: vembu.iniya@gmail.com
To our knowledge, all of the above-mentioned GNN models have a discrete layer structure. Due to this discrete nature, GNNs struggle to describe continuous diffusion processes on graphs. According to the recently proposed neural ordinary differential equation (NODE) [2], a neural network can be viewed as an ordinary differential equation (ODE) whose derivative is parameterized by the network, with the output being the solution to this ODE. We present pseudo graph ordinary differential equations (PGODE), which model message transmission on a graph as an ODE, by extending NODE from the Euclidean domain to graphs.
In this work, we show that the empirical underperformance of NODEs is due to an inaccuracy in gradient estimation during training, and we present a memory-efficient framework for accurate gradient estimation. Our framework for free-form ODEs is shown to generalise to diverse model configurations and to achieve better accuracy for both NODE and PGODE. Our contributions can be summarized as follows: 1. We propose a framework for free-form NODEs to accurately estimate the gradient, which is fundamental to deep-learning models. 2. Our framework is memory-efficient for free-form ODEs. 3. We generalize ODEs to pseudo graphs and propose PGNNs on ODEs.

Differential Equations and Neural Networks
It has been proposed that neural networks be viewed as differential equations. Based on an examination of the ODE, the neural ordinary differential equation (NODE) was proposed by Chen et al. in [2], which interprets the neural network as a continuous ODE. The adjoint technique has long been utilised in optimal control and geophysical problems, and it has recently been applied to ODEs in [2]. Augmented neural ODEs were later developed to improve the expressive capability of NODEs. However, none of the methods above, to our knowledge, address the issue of erroneous gradient estimates; empirical results of NODEs remain much lower than those of state-of-the-art discrete-layer models.

Graph Neural Networks
Spectral and non-spectral GNNs are the two types of GNNs. Because spectral GNNs filter in the Fourier domain of a graph, they require information about the entire graph to calculate the graph Laplacian. Non-spectral GNNs, on the other hand, only evaluate message aggregation over neighbouring nodes, making them localised and requiring less computation. First, we review a few spectral approaches in detail. Bruna et al. [9] were the first to introduce graph convolution in the Fourier domain based on the graph Laplacian, but due to non-localized filters the computational cost is high. Defferrard et al. [4] proposed fast localised spectral filtering on graphs, applying Chebyshev expansion to approximate the filters without having to compute the graph Laplacian and its eigenvectors, which resulted in a considerable speedup.
A localised first-order approximation of graph convolution achieved improved results in semi-supervised node classification tasks. Among non-spectral approaches, graph attention networks learn distinct weights for different neighbours of a node, and the structure of the graph isomorphism network (GIN) is similar to that of the Weisfeiler-Lehman graph isomorphism test.

Discrete Models to Continuous Models
First, we investigate the models for discrete layers with residual connection, which can be expressed as follows:

$\mathbf{h}_{k+1} = \mathbf{h}_k + f_k(\mathbf{h}_k)$,    (1)

where $\mathbf{h}_k$ is the states in the $k$-th layer and $f_k(\cdot)$ is any differentiable function whose output is of the same shape as the input.
If we add further layers with shared weights and take infinitesimally small steps in Eq. 1, the difference equation is converted into a neural ordinary differential equation (NODE):

$\dfrac{d\mathbf{h}(t)}{dt} = f(\mathbf{h}(t), t)$.    (2)

We express hidden states with $\mathbf{h}(t)$ in the continuous case and $\mathbf{h}_k$ in the discrete case. The derivative parameterized by a network is denoted as $f(\cdot)$. The form of $f$ differs significantly between Eqs. 1 and 2: in the discrete case, individual layers (different values of $k$) have their own function $f_k$, whereas in the continuous case, $f$ is shared over all time $t$.
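As a concrete illustration of this correspondence, the sketch below implements the residual update of Eq. 1 as an explicit Euler step of Eq. 2. The tanh layer and its weights are illustrative choices, not the paper's model:

```python
import numpy as np

def f(h, t, W):
    # Shared-weight derivative function f(h(t), t): here a simple tanh layer.
    return np.tanh(h @ W)

def euler_integrate(h0, W, T, steps):
    # Euler discretization: h_{k+1} = h_k + dt * f(h_k, t_k). With dt = 1 this
    # is exactly the residual update of Eq. 1; as steps grows it approaches Eq. 2.
    h, dt = h0, T / steps
    for k in range(steps):
        h = h + dt * f(h, k * dt, W)
    return h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 4))
h0 = rng.normal(size=(1, 4))
print(euler_integrate(h0, W, T=1.0, steps=100).shape)  # (1, 4)
```

Doubling the number of steps changes the output only slightly, reflecting the convergence of the discrete updates to the continuous solution.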
The forward pass of a discrete-layer model can be stated as follows: $\mathbf{h}_0 = \mathbf{x}$, $\mathbf{h}_1 = \mathbf{h}_0 + f_0(\mathbf{h}_0)$, $\mathbf{h}_2 = \mathbf{h}_1 + f_1(\mathbf{h}_1)$, …, $\mathbf{h}_K = \mathbf{h}_{K-1} + f_{K-1}(\mathbf{h}_{K-1})$, where $K$ is the total number of layers. An output layer (for example, a fully-connected layer for classification) is then applied to $\mathbf{h}_K$.
The forward pass of a NODE is

$\mathbf{h}(T) = \mathbf{h}(0) + \int_0^T f(\mathbf{h}(t), t)\,dt$,

where $\mathbf{h}(0)$ is the input and $T$ is the integration time, which in the discrete case corresponds to the number of layers $K$. The solution to the NODE models the transformation of the states. The output layer is then applied to $\mathbf{h}(T)$.
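The integral above can be evaluated with any off-the-shelf ODE solver. A minimal sketch using SciPy, assuming a hand-picked linear vector field (a rotation, whose exact solution is known) in place of a learned network:

```python
import numpy as np
from scipy.integrate import solve_ivp

W = np.array([[0.0, 1.0], [-1.0, 0.0]])  # skew-symmetric: rotation dynamics

def f(t, h):
    # Derivative dh/dt = f(h(t), t), shared across all t (illustrative linear field).
    return W @ h

h0 = np.array([1.0, 0.0])                       # input states h(0)
sol = solve_ivp(f, t_span=(0.0, np.pi / 2), y0=h0, rtol=1e-8)
hT = sol.y[:, -1]                               # h(T): states fed to the output layer
print(np.round(hT, 3))  # ≈ [0, -1], since h(t) = (cos t, -sin t)
```

Any adaptive solver can play the role of the forward pass here; the "depth" of the model is the integration time $T$, not a layer count.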

Neural Ordinary Differential Equations on Pseudo Graph Networks
We start with pseudo graph neural networks with discrete layers, then move on to pseudo graph ordinary differential equations (PGODE) in the continuous case.

Message Passing in PGNN
A pseudo graph is made up of nodes (represented by circles) and edges (represented by solid lines). Each node is given its own colour to make it easier to see. Message passing on the graph can be represented by

$\mathbf{h}_p^{k} = \gamma\!\left(\mathbf{h}_p^{k-1}, \bigoplus_{q \in N(p)} \phi\!\left(\mathbf{h}_q^{k-1}, \mathbf{h}_p^{k-1}\right)\right)$,

where $\mathbf{h}_p^{k}$ is the states of the $p$-th node in the network at the $k$-th layer, and $(q, p)$ denotes the edge between nodes $p$ and $q$. The set of neighbour nodes for node $p$ is represented by $N(p)$. $\bigoplus$ represents a permutation-invariant, differentiable operation like mean, max, or sum.
The differentiable functions $\gamma$ and $\phi$ are parameterized by neural networks.
A PGNN can be thought of as a three-stage model for a specific node $p$: (1) Message passing, in which neighbour nodes $q \in N(p)$ convey information to node $p$ along the edge $(q, p)$. The message is created by a neural network parameterized by the function $\phi(\cdot)$.
(2) Message aggregation, defined as $\mathbf{m}_p^{k} = \bigoplus_{q \in N(p)} \phi(\mathbf{h}_q^{k-1}, \mathbf{h}_p^{k-1})$, in which node $p$ collects all messages from its neighbours $N(p)$. Because pseudo graphs are permutation invariant, the aggregation function is often a permutation-invariant operation like mean or sum.
(3) Update, in which a node's states are updated based on its previous states $\mathbf{h}_p^{k-1}$ and the aggregated messages $\mathbf{m}_p^{k}$, represented by $\gamma(\cdot)$.
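The three stages can be sketched as follows. This is a toy NumPy implementation: the concatenation-based $\phi$ and $\gamma$, the ReLU non-linearity, and the weight shapes are illustrative assumptions, with sum as the aggregator:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pgnn_layer(h, neighbors, W_msg, W_upd):
    """One message-passing layer on a (pseudo) graph.

    h         : (n, d) node states at layer k-1
    neighbors : dict mapping node p -> list of neighbour nodes N(p)
    """
    n, d = h.shape
    h_new = np.empty_like(h)
    for p in range(n):
        # (1) Message passing: phi builds one message per neighbour q of p.
        msgs = [relu(np.concatenate([h[q], h[p]]) @ W_msg) for q in neighbors[p]]
        # (2) Aggregation: a permutation-invariant sum over N(p).
        m_p = np.sum(msgs, axis=0)
        # (3) Update: gamma combines the previous states with the aggregate.
        h_new[p] = relu(np.concatenate([h[p], m_p]) @ W_upd)
    return h_new

rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))
neighbors = {0: [1, 2], 1: [0], 2: [0]}
W_msg = rng.normal(scale=0.5, size=(2 * d, d))
W_upd = rng.normal(scale=0.5, size=(2 * d, d))
print(pgnn_layer(h, neighbors, W_msg, W_upd).shape)  # (3, 4)
```

Because the aggregator is a sum, reordering the neighbour list of any node leaves the output unchanged, which is exactly the permutation invariance required of $\bigoplus$.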

Continuous-Time Models On Pseudo Graphs
By substituting the discrete layer update with the stated message-passing process in continuous time, we can convert a discrete-time PGNN into a continuous-time PGNN, which we term the pseudo graph ordinary differential equation (PGODE). PGODE can capture highly non-linear functions due to its nature as an ODE, and so has the potential to surpass its discrete-layer competitors. We show that PGODE's asymptotic stability is linked to the phenomenon of over-smoothing.
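For a linear choice of message passing, a PGODE reduces to a diffusion equation that can be handed to a standard solver. A minimal sketch, where the three-node graph and the linear field $d\mathbf{H}/dt = (\hat{\mathbf{A}} - \mathbf{I})\mathbf{H}$ (with $\hat{\mathbf{A}}$ the symmetrically normalised adjacency with self-loops) are illustrative assumptions:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Tiny path graph: adjacency with self-loops, symmetrically normalised.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tilde = A + np.eye(3)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

def f(t, h_flat):
    # Continuous message passing: dH/dt = (A_hat - I) H, a linear diffusion PGODE.
    H = h_flat.reshape(3, 2)
    return ((A_hat - np.eye(3)) @ H).ravel()

H0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # initial node features
sol = solve_ivp(f, (0.0, 2.0), H0.ravel(), rtol=1e-8)
print(sol.y[:, -1].reshape(3, 2))  # node states H(T) at T = 2
```

Each evaluation of `f` performs one round of (linear) neighbour aggregation, so the solver effectively applies message passing continuously in time.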
It is shown that pseudo graph convolution is a special case of Laplacian smoothing, which can be written as

$\mathbf{Y} = (1 - \lambda)\mathbf{X} + \lambda \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{A}} \mathbf{X}$,

where $\mathbf{X}$ and $\mathbf{Y}$ are the input and output of a graph-conv layer, $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ is the adjacency matrix with self-loops, $\tilde{\mathbf{D}}$ is the corresponding degree matrix of $\tilde{\mathbf{A}}$, and $\lambda$ is a positive scaling constant.
The continuous counterpart of this smoothing is the ODE $\frac{d\mathbf{H}(t)}{dt} = -\lambda \big(\mathbf{I} - \tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2}\big)\mathbf{H}(t)$. Because the symmetrically normalised Laplacian's eigenvalues are all real and non-negative, the eigenvalues governing this ODE are all real and non-positive. Assume that the normalised Laplacian's eigenvalues are all non-zero. The ODE then has only negative eigenvalues, and hence is asymptotically stable. As a result, once $t$ is large enough, all trajectories are arbitrarily close to each other. This shows that if the integration time $T$ is long enough, all nodes (from different classes) will have highly similar features, lowering the classification accuracy.
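This argument can be checked numerically on a small graph: the eigenvalues of $\hat{\mathbf{A}} - \mathbf{I}$ (with $\hat{\mathbf{A}}$ the symmetrically normalised adjacency with self-loops, and $\lambda = 1$) are non-positive, and integrating longer makes the node features collapse together. A toy NumPy/SciPy sketch, not the paper's experiment:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_tilde = A + np.eye(3)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# Eigenvalues of (A_hat - I) are real and non-positive, so the linear ODE
# dH/dt = (A_hat - I) H is stable: trajectories contract toward each other.
eigvals = np.linalg.eigvalsh(A_hat - np.eye(3))
print(np.round(eigvals, 3))

def spread_at(T, H0):
    sol = solve_ivp(lambda t, h: ((A_hat - np.eye(3)) @ h.reshape(3, 2)).ravel(),
                    (0.0, T), H0.ravel(), rtol=1e-8)
    H = sol.y[:, -1].reshape(3, 2)
    # Maximum pairwise distance between node features: a proxy for over-smoothing.
    return max(np.linalg.norm(H[i] - H[j]) for i in range(3) for j in range(3))

H0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(spread_at(1.0, H0) > spread_at(50.0, H0))  # True: features collapse as T grows
```

The shrinking spread with larger $T$ is exactly the over-smoothing behaviour predicted by the stability analysis.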

General Bijective Blocks
We show that the bijective blocks defined above are easily generalizable.

Theorem
For bijective blocks whose forward and reverse mappings are defined as above: if $g(x, y)$ is a bijective function with respect to $x$ when $y$ is given, then the block is a bijective mapping.
Proof. Proving that the forward mapping is bijective amounts to proving that it is both injective and surjective.
Then, given the bijectivity of $g$, applying the reverse mapping to the output of the forward mapping recovers the input, which establishes both properties.
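As an illustrative instance of the theorem, consider an additive coupling block (an assumption on our part, since the paper's block definition is not reproduced here): taking $g(x_2, x_1) = x_2 + f(x_1)$, which is bijective in $x_2$ for any fixed $x_1$, yields an invertible block even when $f$ itself is not invertible:

```python
import numpy as np

def f(x):
    # Arbitrary (not necessarily invertible) network applied to the first half.
    return np.tanh(x * 2.0 + 1.0)

def forward(x1, x2):
    # Additive coupling: g(x2, x1) = x2 + f(x1) is bijective in x2 for fixed x1,
    # so by the theorem the whole block is a bijective mapping.
    return x1, x2 + f(x1)

def reverse(y1, y2):
    # Inverse of g with respect to its first argument: x2 = y2 - f(y1).
    return y1, y2 - f(y1)

x1, x2 = np.array([0.3, -1.2]), np.array([2.0, 0.5])
y1, y2 = forward(x1, x2)
r1, r2 = reverse(y1, y2)
print(np.allclose(r1, x1) and np.allclose(r2, x2))  # True: round-trip is exact
```

Because the first half passes through unchanged, the reverse mapping can recompute $f(y_1)$ and subtract it, which is the constructive content of the proof above.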

Conclusion
This article proposes PGNNs on ODEs, which enable us to model continuous diffusion on pseudo graphs. We propose a memory-efficient direct backpropagation method for computing the gradient of universal free-form NODEs, and we show that it outperforms other methods on image classification and pseudo graph data. A relationship between PGNN over-smoothing and ODE asymptotic stability was also established.