graphwar.utils¶
A progress bar for display. |
|
Container object for datasets Dictionary-like object that exposes its keys as attributes and remembers insertion order. |
|
Centered Kernel Alignment (CKA) metric, where the features of the networks are compared. |
|
Returns the k largest/smallest elements and corresponding indices from an array-like input. |
|
Wrap a function to make some arguments have the same length. |
|
Repeat any objects and return iterable ones. |
|
Randomly split a set of nodes labeled with |
|
Randomly split the training data by the number of nodes per classes. |
|
Initialize the graphwar logger and set its verbosity level to "DEBUG". |
|
Get a logger for a given name. |
|
Filter edges that, if removed, would turn one or more nodes into singleton nodes. |
|
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes. |
|
Likelihood filter from the "Adversarial Attacks on Neural Networks for Graph Data" paper (KDD'18) |
|
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes. |
|
Add edges to the graph denoted as |
|
Remove edges from the graph denoted as |
|
Flip edges from the graph denoted as |
|
Normalize a sparse matrix according to |
|
Feature normalization function. |
|
Compute graph overlapping according to the "Node Similarity Preserving Graph Convolutional Networks" paper (WSDM'21) |
|
Returns induced subgraph of neighbors centered at node n within a given radius. |
- class Progbar(target: int, width: int = 30, verbose: int = 1, interval: float = 0.05, unit_name: str = 'step')[source]¶
A progress bar for display.
- Parameters
target (int) – total number of steps expected.
width (int, optional) – progress bar width on screen, by default 30
verbose (int, optional) – verbosity mode, 0 (silent), 1 (verbose), 2 (semi-verbose), by default 1
interval (float, optional) – minimum visual progress update interval (in seconds), by default 0.05
unit_name (str, optional) – display name for step counts (usually “step” or “sample”), by default ‘step’
Example
>>> from graphwar.utils import Progbar >>> pbar = Progbar(5) >>> for i in range(5): ... pbar.add(1, msg=f'current number {i}') 5/5 [==============================] - Total: 3.22ms - 643us/step- current number 4
>>> pbar = Progbar(5) >>> for i in range(5): ... pbar.update(i+1, msg=f'current number {i}') 5/5 [==============================] - Total: 3.22ms - 643us/step- current number 4
- update(current: int, msg: Optional[Union[str, List, Tuple]] = None, finalize: Optional[bool] = None)[source]¶
Updates the progress bar using current value.
- Parameters
current (int) – index of current step
msg (Optional[Union[str, List, Tuple]], optional) –
(name, value_for_last_step)
or string messages, by default Nonefinalize (Optional[bool], optional) – whether this is the last update for the progress bar. If
None
, defaults tocurrent >= self.target
, by default None
- Raises
ValueError – invalid message
msg
for progress bar.
- class BunchDict(*args, **kwargs)[source]¶
Container object for datasets Dictionary-like object that exposes its keys as attributes and remembers insertion order.
Examples
>>> b = BunchDict(a=1, b=2) >>> b Objects in BunchDict: ╒═════════╤═══════════╕ │ Names │ Objects │ ╞═════════╪═══════════╡ │ a │ 1 │ ├─────────┼───────────┤ │ b │ 2 │ ╘═════════╧═══════════╛ >>> b['b'] 2 >>> b.b 2 >>> b.a = 3 >>> b['a'] 3 >>> b.c = 6 >>> b['c'] 6
>>> # Converting objects in BunchDict to `torch.Tensor` if possible. >>> b = BunchDict(a=[1,2,3]) >>> b.to_tensor() Objects in BunchDict: ╒═════════╤═══════════════════════════════╕ │ Names │ Objects │ ╞═════════╪═══════════════════════════════╡ │ a │ Tensor, shape=torch.Size([3]) │ │ │ tensor([1, 2, 3]) │ ╘═════════╧═══════════════════════════════╛ >>> b.a tensor([1, 2, 3])
- to_tensor(device: str = 'cpu', dtype=None) BunchDict [source]¶
Convert objects in BunchDict to
torch.Tensor
- Parameters
device (str, optional) – device of the converted tensors, by default ‘cpu’
dtype (_type_, optional) – data types of the converted tensors, by default None
- Return type
the converted BunchDict
- class CKA(model1: Module, model2: Module, model1_name: Optional[str] = None, model2_name: Optional[str] = None, model1_layers: Optional[List[str]] = None, model2_layers: Optional[List[str]] = None, device: str = 'cpu')[source]¶
Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.
- Parameters
model1 (nn.Module) – model 1
model2 (nn.Module) – model 2
model1_name (str, optional) – name of model 1, by default None
model2_name (str, optional) – name of model 2, by default None
model1_layers (List[str], optional) – List of layers to extract features from, by default None
model2_layers (List[str], optional) – List of layers to extract features from, by default None
device (str, optional) – device to run the models, by default ‘cpu’
Example
>>> data = ... # get your graph >>> m1 = ... # get your model1 >>> m2 = ... # get your model2 >>> cka = CKA(m1, m2) >>> cka.compare(data) >>> cka.plot_results()
- topk(array: ndarray, k: int, largest: bool = True) topk_values_indices [source]¶
Returns the k largest/smallest elements and corresponding indices from an array-like input.
- Parameters
- Returns
Returns the
k
largest/smallest elements and corresponding indices of the givenarray
- Return type
namedtuple[values, indices]
Example
>>> array = [5, 3, 7, 2, 1] >>> topk(array, 2) topk_values_indices(values=array([7, 5]), indices=array([2, 0], dtype=int64))
>>> topk(array, 2, largest=False) topk_values_indices(values=array([1, 2]), indices=array([4, 3], dtype=int64))
>>> array = [[1, 2], [3, 4], [5, 6]] >>> topk(array, 2) topk_values_indices(values=array([6, 5]), indices=(array([2, 2], dtype=int64), array([1, 0], dtype=int64)))
- wrapper(func: Callable) Callable [source]¶
Wrap a function to make some arguments have the same length. By default, the arguments to be modified are hids and acts.
Uses can custom these arguments by setting argument
includes : to includes custom arguments
excludes : to excludes custom arguments
length_as : to make the length of the arguments the same as length_as, by default, it is hids.
- Parameters
func (Callable) – a function to be wrapped.
- Returns
a wrapped function.
- Return type
Callable
- Raises
TypeError – if the required arguments of the function is missing.
Example
>>> @wrapper ... def func(hids=[16], acts=None): ... print(locals())
>>> func(100) {'hids': [100], 'acts': [None]}
>>> func([100, 64]) {'hids': [100, 64], 'acts': [None, None]}
>>> func([100, 64], excludes=['acts']) {'hids': [100, 64], 'acts': None}
>>> @wrapper ... def func(self, hids=[16], acts=None): ... print(locals())
>>> func() TypeError: The decorated function 'func' missing required argument 'self'.
>>> func('class_itself') {'self': 'class_itself', 'hids': [16], 'acts': [None]}
>>> func('class_itself', hids=[]) {'self': 'class_itself', 'hids': [], 'acts': []}
>>> @wrapper ... def func(self, hids=[16], acts=None, heads=8): ... print(locals())
>>> func('class_itself', hids=[100, 200]) {'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': 8}
>>> func('class_itself', hids=[100, 200], includes=['heads']) {'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': [8, 8]}
- repeat(src: Any, length: Optional[int] = None) Any [source]¶
Repeat any objects and return iterable ones.
- Parameters
src (Any) – any objects
length (Optional[int], optional) – the length to be repeated. If None, it would return the iterable object itself, by default None
- Returns
the iterable repeated object
- Return type
Any
Example
>>> from graphwar.utils import repeat # repeat for single non-iterable object >>> repeat(1) [1] >>> repeat(1, 3) [1, 1, 1] >>> repeat('relu', 2) ['relu', 'relu'] >>> repeat(None, 2) [None, None]
>>> # repeat for iterable object >>> repeat([1, 2, 3], 2) [1, 2] >>> repeat([1, 2, 3], 5) [1, 2, 3, 3, 3]
- split_nodes(labels: Tensor, *, train: float = 0.1, test: float = 0.8, val: float = 0.1, random_state: Optional[int] = None) BunchDict [source]¶
Randomly split a set of nodes labeled with
labels
.- Parameters
labels (Tensor) – the labels of the nodes.
train (float, optional) – the percentage of the training set, by default 0.1
test (float, optional) – the percentage of the test set, by default 0.8
val (float, optional) – the percentage of the validation set, by default 0.1
random_state (Optional[int], optional) – random seed for the random number generator, by default None
- Returns
- train_nodes: torch.Tensor with Size [train * num_nodes]
The indices of the training nodes
- val_nodes: torch.Tensor with Size [val * num_nodes]
The indices of the validation nodes
- test_nodes torch.Tensor with Size [test * num_nodes]
The indices of the test nodes
- Return type
BunchDict with the following items
- split_nodes_by_classes(labels: Tensor, n_per_class: int = 20, random_state: Optional[int] = None) BunchDict [source]¶
Randomly split the training data by the number of nodes per classes.
- Parameters
- Returns
- train_nodes: torch.Tensor with Size [n_per_class * num_classes]
The indices of the training nodes
- val_nodes: torch.Tensor with Size [n_per_class * num_classes]
The indices of the validation nodes
- test_nodes torch.Tensor with Size [num_nodes - 2*n_per_class * num_classes]
The indices of the test nodes
- Return type
BunchDict with the following items
- setup_logger(output: Optional[str] = None, distributed_rank: int = 0, *, mode: str = 'w', color: bool = True, name: str = 'graphwar', abbrev_name: Optional[str] = None)[source]¶
Initialize the graphwar logger and set its verbosity level to “DEBUG”.
- Parameters
output (Optional[str], optional) – a file name or a directory to save log. If None, will not save log file. If ends with “.txt” or “.log”, assumed to be a file name. Otherwise, logs will be saved to output/log.txt.
distributed_rank (int, optional) – used for distributed training, by default 0
mode (str, optional) – mode for the output file (if output is given), by default ‘w’.
color (bool, optional) – whether to use color when printing, by default True
name (str, optional) – the root module name of this logger, by default “graphwar”
abbrev_name (Optional[str], optional) – an abbreviation of the module, to avoid long names in logs. Set to “” to not log the root module in logs. By default, None.
- Returns
a logger
- Return type
Example
>>> logger = setup_logger(name='my exp')
>>> logger.info('message') [12/19 17:01:43 my exp]: message
>>> logger.error('message') ERROR [12/19 17:02:22 my exp]: message
>>> logger.warning('message') WARNING [12/19 17:02:32 my exp]: message
>>> # specify output files >>> logger = setup_logger(output='log.txt', name='my exp') # additive, by default mode='w' >>> logger = setup_logger(output='log.txt', name='my exp', mode='a')
# once you logger is set, you can call it by >>> logger = get_logger(name=’my exp’)
- get_logger(name: str = 'GraphWar')[source]¶
Get a logger for a given name.
- Parameters
name (str, optional) – name of the logger, by default “GraphWar”
- Return type
a logger for the given name
- singleton_filter(edges: ndarray, adj_matrix: csr_matrix)[source]¶
Filter edges that, if removed, would turn one or more nodes into singleton nodes.
- Parameters
edges (np.array, shape [M, 2], where M is the number of input edges.) – The candidate edges to remove.
adj_matrix (sp.sparse_matrix, shape [num_nodes, num_nodes]) – The input adjacency matrix where edges derived from.
- Returns
the edges that removed will not generate singleton nodes.
- Return type
np.array, shape [M, 2],
- class SingletonFilter(adj_matrix: csr_matrix)[source]¶
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.
- Parameters
adj_matrix (sp.csr_matrix) – the input adjacency matrix
- class LikelihoodFilter(degree: ndarray, ll_cutoff: float = 0.004)[source]¶
Likelihood filter from the “Adversarial Attacks on Neural Networks for Graph Data” paper (KDD’18)
- Parameters
degree (np.ndarray) – the degree of the nodes in the graph
ll_cutoff (float, optional) – likelihood cutoff, by default 0.004
- static compute_alpha(n, S_d, d_min)[source]¶
Approximate the alpha of a power law distribution.
- Parameters
- Returns
alpha – The estimated alpha of the power law distribution
- Return type
- static update_Sx(S_old, n_old, d_old, d_new, d_min)[source]¶
Update on the sum of log degrees S_d and n based on degree distribution resulting from inserting or deleting a single edges.
- Parameters
S_old (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.
n_old (int) – Number of entries in the old distribution that are larger than or equal to d_min.
d_old (np.array, shape [num_nodes,] dtype int) – The old degree sequence.
d_new (np.array, shape [num_nodes,] dtype int) – The new degree sequence
d_min (int) – The minimum degree of nodes to consider
- Returns
new_S_d (float, the updated sum of log degrees in the distribution that are larger than or equal to d_min.)
new_n (int, the updated number of entries in the old distribution that are larger than or equal to d_min.)
- static compute_log_likelihood(n, alpha, S_d, d_min)[source]¶
Compute log likelihood of the powerlaw fit.
- Parameters
n (int) – Number of entries in the old distribution that are larger than or equal to d_min.
alpha (float) – The estimated alpha of the power law distribution
S_d (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.
d_min (int) – The minimum degree of nodes to consider
- Returns
float
- Return type
the estimated log likelihood
- singleton_mask(adj_matrix: Tensor)[source]¶
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry have degree 1 and there is an edge between the two nodes.
- Parameters
adj_matrix (Tensor, shape [N, N],) – where N is the number of nodes the input adjacency matrix to compte the mask
- Returns
mask – a boolean mask with shape as
adj_matrix
.- Return type
bool Tensor
- add_edges(edge_index: Tensor, edges_to_add: Tensor, symmetric: bool = True, sort_edges: bool = True) Tensor [source]¶
Add edges to the graph denoted as
edge_index
.- Parameters
edge_index (Tensor) – the graph instance where edges will be removed from.
edges_to_add (torch.Tensor) – shape [2, M], the edges to be added into the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it would add the edges into the graph by:
edges_to_add = torch.cat([edges_to_add, edges_to_add.flip(0)], dim=1)
- Returns
the graph instance
edge_index
with edges added.- Return type
Tensor
- remove_edges(edge_index: Tensor, edges_to_remove: Tensor, symmetric: bool = True) Tensor [source]¶
Remove edges from the graph denoted as
edge_index
.- Parameters
edge_index (Tensor) – the graph instance where edges will be removed from.
edges_to_remove (torch.Tensor) – shape [2, M], the edges to be removed in the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it would remove the edges from the graph by:
edges_to_remove = torch.cat([edges_to_remove, edges_to_remove.flip(0)], dim=1)
- Returns
the graph instance
edge_index
with edges removed.- Return type
Tensor
- flip_edges(edge_index: Tensor, edges_to_flip: Tensor, symmetric: bool = True) Tensor [source]¶
Flip edges from the graph denoted as
edge_index
.- Parameters
edge_index (Tensor) – the graph instance where edges will be flipped from.
edges_to_flip (torch.Tensor) – shape [2, M], the edges to be flipped in the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it would flip the edges from the graph by:
edges_to_flip = torch.cat([edges_to_flip, edges_to_flip.flip(0)], dim=1)
- Returns
the graph instance
edge_index
with edges flipped.- Return type
Tensor
- scipy_normalize(adj_matrix: csr_matrix, add_self_loops: bool = True) csr_matrix [source]¶
Normalize a sparse matrix according to
GCN
from the “Semi-supervised Classification with Graph Convolutional Networks” paper (ICLR’17)- Parameters
adj_matrix (sp.csr_matrix) – the input sparse matrix denoting a graph.
add_self_loops (bool, optional) – whether to add self-loops, by default True
- Returns
the normalized adjacency matrix.
- Return type
sp.csr_matrix
- normalize(feat: Tensor, norm: str = 'standardize', dim: Optional[int] = None, lim_min: float = - 1.0, lim_max: float = 1.0) Tensor [source]¶
Feature normalization function.
Adapted from GRB: https://github.com/THUDM/grb/blob/master/grb/dataset/dataset.py#L638
- Parameters
feat (Tensor) – node feature matrix with shape [N, D]
norm (Optional[str], optional) – how to normalize feature matrix, including [“linearize”, “arctan”, “tanh”, “standardize”, “none”], by default “standardize”
dim (None or int, optional) – Axis along which the means or standard deviations are computed. The default is to compute the mean or standard deviations of the flattened array, by default None
lim_min (float, optional) – minimum limit of feature, by default -1.0
lim_max (float, optional) – maximum limit of feature, by default 1.0
- Returns
feat – normalized feature matrix
- Return type
Tensor
- overlap(edge_index1: Tensor, edge_index2: Tensor, on: str = 'edge', symmetric: bool = False) float [source]¶
Compute graph overlapping according to the “Node Similarity Preserving Graph Convolutional Networks” paper (WSDM’21)
- Parameters
(Tensor) (edge_index2) – a graph
(Tensor) – another graph
(str) (on) –
- Returns
overlapping of the two graphs on edge or node
- Return type
- ego_graph(adj_matrix: csr_matrix, targets: Union[int, list], hops: int = 1) ego_graph [source]¶
Returns induced subgraph of neighbors centered at node n within a given radius.
- Parameters
- Returns
nodes: shape [N], the nodes of the subgraph edges: shape [2, M], the edges of the subgraph
- Return type
NamedTuple(nodes, edges)
Note
This is a faster implementation of
networkx.ego_graph
based on scipy sparse matrix and numbaSee also
networkx.ego_graph
,torch_geometric.utils.k_hop_subgraph