graphwar.utils

Progbar

A progress bar for display.

BunchDict

Container object for datasets Dictionary-like object that exposes its keys as attributes and remembers insertion order.

CKA

Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.

topk

Returns the k largest/smallest elements and corresponding indices from an array-like input.

wrapper

Wrap a function to make some arguments have the same length.

repeat

Repeat any objects and return iterable ones.

split_nodes

Randomly split a set of nodes labeled with labels.

split_nodes_by_classes

Randomly split the training data by the number of nodes per classes.

setup_logger

Initialize the graphwar logger and set its verbosity level to "DEBUG".

get_logger

Get a logger for a given name.

singleton_filter

Filter edges that, if removed, would turn one or more nodes into singleton nodes.

SingletonFilter

Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.

LikelihoodFilter

Likelihood filter from the "Adversarial Attacks on Neural Networks for Graph Data" paper (KDD'18)

singleton_mask

Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.

add_edges

Add edges to the graph denoted as edge_index.

remove_edges

Remove edges from the graph denoted as edge_index.

flip_edges

Flip edges from the graph denoted as edge_index.

scipy_normalize

Normalize a sparse matrix according to GCN from the "Semi-supervised Classification with Graph Convolutional Networks" paper (ICLR'17)

normalize

Feature normalization function.

overlap

Compute graph overlapping according to the "Node Similarity Preserving Graph Convolutional Networks" paper (WSDM'21)

ego_graph

Returns induced subgraph of neighbors centered at node n within a given radius.

class Progbar(target: int, width: int = 30, verbose: int = 1, interval: float = 0.05, unit_name: str = 'step')[source]

A progress bar for display.

Parameters
  • target (int) – total number of steps expected.

  • width (int, optional) – progress bar width on screen, by default 30

  • verbose (int, optional) – verbosity mode, 0 (silent), 1 (verbose), 2 (semi-verbose), by default 1

  • interval (float, optional) – minimum visual progress update interval (in seconds), by default 0.05

  • unit_name (str, optional) – display name for step counts (usually “step” or “sample”), by default ‘step’

Example

>>> from graphwar.utils import Progbar
>>> pbar = Progbar(5)
>>> for i in range(5):
...     pbar.add(1, msg=f'current number {i}')
5/5 [==============================] - Total: 3.22ms - 643us/step- current number 4
>>> pbar = Progbar(5)
>>> for i in range(5):
...     pbar.update(i+1, msg=f'current number {i}')
5/5 [==============================] - Total: 3.22ms - 643us/step- current number 4
update(current: int, msg: Optional[Union[str, List, Tuple]] = None, finalize: Optional[bool] = None)[source]

Updates the progress bar using current value.

Parameters
  • current (int) – index of current step

  • msg (Optional[Union[str, List, Tuple]], optional) – (name, value_for_last_step) or string messages, by default None

  • finalize (Optional[bool], optional) – whether this is the last update for the progress bar. If None, defaults to current >= self.target, by default None

Raises

ValueError – invalid message msg for progress bar.

add(n: int, msg: Optional[Union[str, List, Tuple]] = None)[source]

Add n steps to the progress bar.

Parameters
  • n (int) – number of steps to add to the progress bar

  • msg (Optional[Union[str, List, Tuple]], optional) – (name, value_for_last_step) or string messages, by default None

static format_num(n: int) str[source]

Intelligent scientific notation (.3g).

Parameters

n (int or float or Numeric) – a Number.

Returns

out – Formatted number.

Return type

str

class BunchDict(*args, **kwargs)[source]

Container object for datasets Dictionary-like object that exposes its keys as attributes and remembers insertion order.

Examples

>>> b = BunchDict(a=1, b=2)
>>> b
Objects in BunchDict:
╒═════════╤═══════════╕
│ Names   │   Objects │
╞═════════╪═══════════╡
│ a       │         1 │
├─────────┼───────────┤
│ b       │         2 │
╘═════════╧═══════════╛
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6
>>> # Converting objects in BunchDict to `torch.Tensor` if possible.
>>> b = BunchDict(a=[1,2,3])
>>> b.to_tensor()
Objects in BunchDict:
╒═════════╤═══════════════════════════════╕
│ Names   │ Objects                       │
╞═════════╪═══════════════════════════════╡
│ a       │ Tensor, shape=torch.Size([3]) │
│         │ tensor([1, 2, 3])             │
╘═════════╧═══════════════════════════════╛
>>> b.a
tensor([1, 2, 3])
to_tensor(device: str = 'cpu', dtype=None) BunchDict[source]

Convert objects in BunchDict to torch.Tensor

Parameters
  • device (str, optional) – device of the converted tensors, by default ‘cpu’

  • dtype (_type_, optional) – data types of the converted tensors, by default None

Return type

the converted BunchDict

class CKA(model1: Module, model2: Module, model1_name: Optional[str] = None, model2_name: Optional[str] = None, model1_layers: Optional[List[str]] = None, model2_layers: Optional[List[str]] = None, device: str = 'cpu')[source]

Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.

Parameters
  • model1 (nn.Module) – model 1

  • model2 (nn.Module) – model 2

  • model1_name (str, optional) – name of model 1, by default None

  • model2_name (str, optional) – name of model 2, by default None

  • model1_layers (List[str], optional) – List of layers to extract features from, by default None

  • model2_layers (List[str], optional) – List of layers to extract features from, by default None

  • device (str, optional) – device to run the models, by default ‘cpu’

Example

>>> data = ... # get your graph
>>> m1 = ... # get your model1
>>> m2 = ... # get your model2
>>> cka = CKA(m1, m2)
>>> cka.compare(data)
>>> cka.plot_results()
compare(data1: Data, data2: Data = None) None[source]

Computes the feature similarity between the models on the given datasets.

Parameters
  • data1 (Data) – the dataset where model 1 run on.

  • data2 (Data, optional) – If given, model 2 will run on this dataset. by default None

export() Dict[source]

Exports the CKA data along with the respective model layer names. :return:

plot_results(save_path: Optional[str] = None, title: Optional[str] = None)[source]
topk(array: ndarray, k: int, largest: bool = True) topk_values_indices[source]

Returns the k largest/smallest elements and corresponding indices from an array-like input.

Parameters
  • array (np.ndarray or list) – the array-like input

  • k (int) – the k in “top-k”

  • largest (bool, optional) – controls whether to return largest or smallest elements

Returns

Returns the k largest/smallest elements and corresponding indices of the given array

Return type

namedtuple[values, indices]

Example

>>> array = [5, 3, 7, 2, 1]
>>> topk(array, 2)
topk_values_indices(values=array([7, 5]), indices=array([2, 0], dtype=int64))
>>> topk(array, 2, largest=False)
topk_values_indices(values=array([1, 2]), indices=array([4, 3], dtype=int64))
>>> array = [[1, 2], [3, 4], [5, 6]]
>>> topk(array, 2)
topk_values_indices(values=array([6, 5]), indices=(array([2, 2], dtype=int64), array([1, 0], dtype=int64)))
wrapper(func: Callable) Callable[source]

Wrap a function to make some arguments have the same length. By default, the arguments to be modified are hids and acts.

Uses can custom these arguments by setting argument

  • includes : to includes custom arguments

  • excludes : to excludes custom arguments

  • length_as : to make the length of the arguments the same as length_as, by default, it is hids.

Parameters

func (Callable) – a function to be wrapped.

Returns

a wrapped function.

Return type

Callable

Raises

TypeError – if the required arguments of the function is missing.

Example

>>> @wrapper
... def func(hids=[16], acts=None):
...     print(locals())
>>> func(100)
{'hids': [100], 'acts': [None]}
>>> func([100, 64])
{'hids': [100, 64], 'acts': [None, None]}
>>> func([100, 64], excludes=['acts'])
{'hids': [100, 64], 'acts': None}
>>> @wrapper
... def func(self, hids=[16], acts=None):
...     print(locals())
>>> func()
TypeError: The decorated function 'func' missing required argument 'self'.
>>> func('class_itself')
{'self': 'class_itself', 'hids': [16], 'acts': [None]}
>>> func('class_itself', hids=[])
{'self': 'class_itself', 'hids': [], 'acts': []}
>>> @wrapper
... def func(self, hids=[16], acts=None, heads=8):
...     print(locals())
>>> func('class_itself', hids=[100, 200])
{'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': 8}
>>> func('class_itself', hids=[100, 200], includes=['heads'])
{'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': [8, 8]}
repeat(src: Any, length: Optional[int] = None) Any[source]

Repeat any objects and return iterable ones.

Parameters
  • src (Any) – any objects

  • length (Optional[int], optional) – the length to be repeated. If None, it would return the iterable object itself, by default None

Returns

the iterable repeated object

Return type

Any

Example

>>> from graphwar.utils import repeat
# repeat for single non-iterable object
>>> repeat(1)
[1]
>>> repeat(1, 3)
[1, 1, 1]
>>> repeat('relu', 2)
['relu', 'relu']
>>> repeat(None, 2)
[None, None]
>>> # repeat for iterable object
>>> repeat([1, 2, 3], 2)
[1, 2]
>>> repeat([1, 2, 3], 5)
[1, 2, 3, 3, 3]
split_nodes(labels: Tensor, *, train: float = 0.1, test: float = 0.8, val: float = 0.1, random_state: Optional[int] = None) BunchDict[source]

Randomly split a set of nodes labeled with labels.

Parameters
  • labels (Tensor) – the labels of the nodes.

  • train (float, optional) – the percentage of the training set, by default 0.1

  • test (float, optional) – the percentage of the test set, by default 0.8

  • val (float, optional) – the percentage of the validation set, by default 0.1

  • random_state (Optional[int], optional) – random seed for the random number generator, by default None

Returns

  • train_nodes: torch.Tensor with Size [train * num_nodes]

    The indices of the training nodes

  • val_nodes: torch.Tensor with Size [val * num_nodes]

    The indices of the validation nodes

  • test_nodes torch.Tensor with Size [test * num_nodes]

    The indices of the test nodes

Return type

BunchDict with the following items

split_nodes_by_classes(labels: Tensor, n_per_class: int = 20, random_state: Optional[int] = None) BunchDict[source]

Randomly split the training data by the number of nodes per classes.

Parameters
  • labels (torch.Tensor [num_nodes]) – The class labels

  • n_per_class (int) – Number of samples per class

  • random_state (Optional[int]) – Random seed

Returns

  • train_nodes: torch.Tensor with Size [n_per_class * num_classes]

    The indices of the training nodes

  • val_nodes: torch.Tensor with Size [n_per_class * num_classes]

    The indices of the validation nodes

  • test_nodes torch.Tensor with Size [num_nodes - 2*n_per_class * num_classes]

    The indices of the test nodes

Return type

BunchDict with the following items

setup_logger(output: Optional[str] = None, distributed_rank: int = 0, *, mode: str = 'w', color: bool = True, name: str = 'graphwar', abbrev_name: Optional[str] = None)[source]

Initialize the graphwar logger and set its verbosity level to “DEBUG”.

Parameters
  • output (Optional[str], optional) – a file name or a directory to save log. If None, will not save log file. If ends with “.txt” or “.log”, assumed to be a file name. Otherwise, logs will be saved to output/log.txt.

  • distributed_rank (int, optional) – used for distributed training, by default 0

  • mode (str, optional) – mode for the output file (if output is given), by default ‘w’.

  • color (bool, optional) – whether to use color when printing, by default True

  • name (str, optional) – the root module name of this logger, by default “graphwar”

  • abbrev_name (Optional[str], optional) – an abbreviation of the module, to avoid long names in logs. Set to “” to not log the root module in logs. By default, None.

Returns

a logger

Return type

logging.Logger

Example

>>> logger = setup_logger(name='my exp')
>>> logger.info('message')
[12/19 17:01:43 my exp]: message
>>> logger.error('message')
ERROR [12/19 17:02:22 my exp]: message
>>> logger.warning('message')
WARNING [12/19 17:02:32 my exp]: message
>>> # specify output files
>>> logger = setup_logger(output='log.txt', name='my exp')
# additive, by default mode='w'
>>> logger = setup_logger(output='log.txt', name='my exp', mode='a')

# once you logger is set, you can call it by >>> logger = get_logger(name=’my exp’)

get_logger(name: str = 'GraphWar')[source]

Get a logger for a given name.

Parameters

name (str, optional) – name of the logger, by default “GraphWar”

Return type

a logger for the given name

singleton_filter(edges: ndarray, adj_matrix: csr_matrix)[source]

Filter edges that, if removed, would turn one or more nodes into singleton nodes.

Parameters
  • edges (np.array, shape [M, 2], where M is the number of input edges.) – The candidate edges to remove.

  • adj_matrix (sp.sparse_matrix, shape [num_nodes, num_nodes]) – The input adjacency matrix where edges derived from.

Returns

the edges that removed will not generate singleton nodes.

Return type

np.array, shape [M, 2],

class SingletonFilter(adj_matrix: csr_matrix)[source]

Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.

Parameters

adj_matrix (sp.csr_matrix) – the input adjacency matrix

update(u: int, v: int, edge_weight: float)[source]
class LikelihoodFilter(degree: ndarray, ll_cutoff: float = 0.004)[source]

Likelihood filter from the “Adversarial Attacks on Neural Networks for Graph Data” paper (KDD’18)

Parameters
  • degree (np.ndarray) – the degree of the nodes in the graph

  • ll_cutoff (float, optional) – likelihood cutoff, by default 0.004

update(u: int, v: int, edge_weight: float, idx: int)[source]

Update likelihood ratio test values

static compute_alpha(n, S_d, d_min)[source]

Approximate the alpha of a power law distribution.

Parameters
  • n (int or np.array of int) – Number of entries that are larger than or equal to d_min

  • S_d (float or np.array of float) – Sum of log degrees in the distribution that are larger than or equal to d_min

  • d_min (int) – The minimum degree of nodes to consider

Returns

alpha – The estimated alpha of the power law distribution

Return type

float

static update_Sx(S_old, n_old, d_old, d_new, d_min)[source]

Update on the sum of log degrees S_d and n based on degree distribution resulting from inserting or deleting a single edges.

Parameters
  • S_old (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.

  • n_old (int) – Number of entries in the old distribution that are larger than or equal to d_min.

  • d_old (np.array, shape [num_nodes,] dtype int) – The old degree sequence.

  • d_new (np.array, shape [num_nodes,] dtype int) – The new degree sequence

  • d_min (int) – The minimum degree of nodes to consider

Returns

  • new_S_d (float, the updated sum of log degrees in the distribution that are larger than or equal to d_min.)

  • new_n (int, the updated number of entries in the old distribution that are larger than or equal to d_min.)

static compute_log_likelihood(n, alpha, S_d, d_min)[source]

Compute log likelihood of the powerlaw fit.

Parameters
  • n (int) – Number of entries in the old distribution that are larger than or equal to d_min.

  • alpha (float) – The estimated alpha of the power law distribution

  • S_d (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.

  • d_min (int) – The minimum degree of nodes to consider

Returns

float

Return type

the estimated log likelihood

static filter_chisquare(ll_ratios, cutoff)[source]
singleton_mask(adj_matrix: Tensor)[source]

Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.

Parameters

adj_matrix (Tensor, shape [N, N],) – where N is the number of nodes the input adjacency matrix to compte the mask

Returns

mask – a boolean mask with shape as adj_matrix.

Return type

bool Tensor

add_edges(edge_index: Tensor, edges_to_add: Tensor, symmetric: bool = True, sort_edges: bool = True) Tensor[source]

Add edges to the graph denoted as edge_index.

Parameters
  • edge_index (Tensor) – the graph instance where edges will be removed from.

  • edges_to_add (torch.Tensor) – shape [2, M], the edges to be added into the graph.

  • symmetric (bool) – whether the output graph is symmetric, if True, it would add the edges into the graph by: edges_to_add = torch.cat([edges_to_add, edges_to_add.flip(0)], dim=1)

Returns

the graph instance edge_index with edges added.

Return type

Tensor

remove_edges(edge_index: Tensor, edges_to_remove: Tensor, symmetric: bool = True) Tensor[source]

Remove edges from the graph denoted as edge_index.

Parameters
  • edge_index (Tensor) – the graph instance where edges will be removed from.

  • edges_to_remove (torch.Tensor) – shape [2, M], the edges to be removed in the graph.

  • symmetric (bool) – whether the output graph is symmetric, if True, it would remove the edges from the graph by: edges_to_remove = torch.cat([edges_to_remove, edges_to_remove.flip(0)], dim=1)

Returns

the graph instance edge_index with edges removed.

Return type

Tensor

flip_edges(edge_index: Tensor, edges_to_flip: Tensor, symmetric: bool = True) Tensor[source]

Flip edges from the graph denoted as edge_index.

Parameters
  • edge_index (Tensor) – the graph instance where edges will be flipped from.

  • edges_to_flip (torch.Tensor) – shape [2, M], the edges to be flipped in the graph.

  • symmetric (bool) – whether the output graph is symmetric, if True, it would flip the edges from the graph by: edges_to_flip = torch.cat([edges_to_flip, edges_to_flip.flip(0)], dim=1)

Returns

the graph instance edge_index with edges flipped.

Return type

Tensor

scipy_normalize(adj_matrix: csr_matrix, add_self_loops: bool = True) csr_matrix[source]

Normalize a sparse matrix according to GCN from the “Semi-supervised Classification with Graph Convolutional Networks” paper (ICLR’17)

Parameters
  • adj_matrix (sp.csr_matrix) – the input sparse matrix denoting a graph.

  • add_self_loops (bool, optional) – whether to add self-loops, by default True

Returns

the normalized adjacency matrix.

Return type

sp.csr_matrix

normalize(feat: Tensor, norm: str = 'standardize', dim: Optional[int] = None, lim_min: float = - 1.0, lim_max: float = 1.0) Tensor[source]

Feature normalization function.

Adapted from GRB: https://github.com/THUDM/grb/blob/master/grb/dataset/dataset.py#L638

Parameters
  • feat (Tensor) – node feature matrix with shape [N, D]

  • norm (Optional[str], optional) – how to normalize feature matrix, including [“linearize”, “arctan”, “tanh”, “standardize”, “none”], by default “standardize”

  • dim (None or int, optional) – Axis along which the means or standard deviations are computed. The default is to compute the mean or standard deviations of the flattened array, by default None

  • lim_min (float, optional) – minimum limit of feature, by default -1.0

  • lim_max (float, optional) – maximum limit of feature, by default 1.0

Returns

feat – normalized feature matrix

Return type

Tensor

overlap(edge_index1: Tensor, edge_index2: Tensor, on: str = 'edge', symmetric: bool = False) float[source]

Compute graph overlapping according to the “Node Similarity Preserving Graph Convolutional Networks” paper (WSDM’21)

Parameters
  • (Tensor) (edge_index2) – a graph

  • (Tensor) – another graph

  • (str) (on) –

Returns

overlapping of the two graphs on edge or node

Return type

float

ego_graph(adj_matrix: csr_matrix, targets: Union[int, list], hops: int = 1) ego_graph[source]

Returns induced subgraph of neighbors centered at node n within a given radius.

Parameters
  • adj_matrix (sp.csr_matrix,) – a Scipy CSR sparse adjacency matrix representing a graph

  • targets (Union[int, list]) – center nodes, a single node or a list of nodes

  • hops (int number, optional) – Include all neighbors of distance<=hops from nodes.

Returns

nodes: shape [N], the nodes of the subgraph edges: shape [2, M], the edges of the subgraph

Return type

NamedTuple(nodes, edges)

Note

This is a faster implementation of networkx.ego_graph based on scipy sparse matrix and numba

See also

networkx.ego_graph, torch_geometric.utils.k_hop_subgraph