greatx.utils
Progbar | A progress bar for display.
BunchDict | Container object for datasets. Dictionary-like object that exposes its keys as attributes and remembers insertion order.
CKA | Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.
topk | Returns the k largest/smallest elements and corresponding indices from an array-like input.
wrapper | Wrap a function to make some arguments have the same length.
repeat | Repeat any objects and return iterable ones.
split_nodes | Randomly split a set of nodes labeled with labels.
split_nodes_by_classes | Randomly split the training data by the number of nodes per class.
setup_logger | Initialize the GreatX logger and set its verbosity level to "DEBUG".
get_logger | Get a logger for a given name.
singleton_filter | Filter edges that, if removed, would turn one or more nodes into singleton nodes.
SingletonFilter | Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry has degree 1 and there is an edge between the two nodes.
LikelihoodFilter | Likelihood filter from the "Adversarial Attacks on Neural Networks for Graph Data" paper (KDD'18)
singleton_mask | Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry has degree 1 and there is an edge between the two nodes.
add_edges | Add edges to the graph denoted as edge_index.
remove_edges | Remove edges from the graph denoted as edge_index.
flip_edges | Flip edges from the graph denoted as edge_index.
flip_graph | Flip edges from the graph denoted as data.
scipy_normalize | Normalize a sparse matrix according to GCN.
normalize | Feature normalization function.
overlap | Compute graph overlapping according to the "Node Similarity Preserving Graph Convolutional Networks" paper (WSDM'21)
ego_graph | Returns induced subgraph of neighbors centered at node n within a given radius.
MissingFeature | Implementation of MissingFeature.
mark | Marks the predicted classes and corresponding probability for a better view.
- class Progbar(target: int, width: int = 20, verbose: int = 1, interval: float = 0.05, unit_name: str = 'step')[source]
A progress bar for display.
- Parameters:
target (int) – total number of steps expected.
width (int, optional) – progress bar width on screen, by default 20
verbose (int, optional) – verbosity mode, 0 (silent), 1 (verbose), 2 (semi-verbose), by default 1
interval (float, optional) – minimum visual progress update interval (in seconds), by default 0.05
unit_name (str, optional) – display name for step counts (usually “step” or “sample”), by default ‘step’
Example
>>> from greatx.utils import Progbar
>>> pbar = Progbar(5)
>>> for i in range(5):
...     pbar.add(1, msg=f'current number {i}')
5/5 [===============] - Total: 3.22ms - 643us/step- current number 4

>>> pbar = Progbar(5)
>>> for i in range(5):
...     pbar.update(i+1, msg=f'current number {i}')
5/5 [===============] - Total: 3.22ms - 643us/step- current number 4
- update(current: int, msg: Optional[Union[str, List, Tuple]] = None, finalize: Optional[bool] = None)[source]
Updates the progress bar using current value.
- Parameters:
current (int) – index of current step
msg (Optional[Union[str, List, Tuple]], optional) – (name, value_for_last_step) or string messages, by default None
finalize (Optional[bool], optional) – whether this is the last update for the progress bar. If None, defaults to current >= self.target, by default None
- Raises:
ValueError – invalid message msg for progress bar.
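The following is a minimal sketch of how a text bar like the one in the example above could be rendered from `current`, `target` and `width`. The helper `render_bar` is hypothetical and is not part of `greatx.utils`; it omits the timing and message output that `Progbar` prints.

```python
def render_bar(current: int, target: int, width: int = 20) -> str:
    """Return a text progress bar such as '2/4 [====>.....]'.

    Hypothetical helper for illustration only, not the Progbar implementation.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    current = max(0, min(current, target))  # clamp to the valid range
    filled = int(width * current / target)  # number of completed cells
    if current < target and filled > 0:
        # Unfinished bar: the last completed cell is drawn as an arrow head.
        bar = "=" * (filled - 1) + ">" + "." * (width - filled)
    else:
        bar = "=" * filled + "." * (width - filled)
    return f"{current}/{target} [{bar}]"


print(render_bar(2, 4, width=10))
print(render_bar(5, 5, width=10))
```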
- class BunchDict(*args, **kwargs)[source]
Container object for datasets. Dictionary-like object that exposes its keys as attributes and remembers insertion order.
Examples
>>> b = BunchDict(a=1, b=2)
>>> b
Objects in BunchDict:
╒═════════╤═══════════╕
│ Names   │ Objects   │
╞═════════╪═══════════╡
│ a       │ 1         │
├─────────┼───────────┤
│ b       │ 2         │
╘═════════╧═══════════╛
>>> b['b']
2
>>> b.b
2
>>> b.a = 3
>>> b['a']
3
>>> b.c = 6
>>> b['c']
6

>>> # Converting objects in BunchDict to `torch.Tensor` if possible.
>>> b = BunchDict(a=[1,2,3])
>>> b.to_tensor()
Objects in BunchDict:
╒═════════╤═══════════════════════════════╕
│ Names   │ Objects                       │
╞═════════╪═══════════════════════════════╡
│ a       │ Tensor, shape=torch.Size([3]) │
│         │ tensor([1, 2, 3])             │
╘═════════╧═══════════════════════════════╛
>>> b.a
tensor([1, 2, 3])
- to_tensor(device: str = 'cpu', dtype=None) BunchDict [source]
Convert objects in BunchDict to torch.Tensor.
- Parameters:
device (str, optional) – device of the converted tensors, by default ‘cpu’
dtype (_type_, optional) – data types of the converted tensors, by default None
- Return type:
the converted BunchDict
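The attribute-style access described for BunchDict can be illustrated with a small dictionary subclass. This is only a sketch under the assumption that attribute reads and writes map directly to dictionary keys; the class name `AttrDict` is hypothetical, and the pretty-printing and `to_tensor` behaviour of BunchDict are not reproduced.

```python
class AttrDict(dict):
    """Hypothetical stand-in for BunchDict: keys are also attributes."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Attribute assignment stores the value as a dictionary item.
        self[name] = value


b = AttrDict(a=1, b=2)
b.a = 3   # same as b['a'] = 3
b.c = 6   # new keys keep insertion order (plain dict behaviour)
print(b)
```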
- class CKA(model1: Module, model2: Module, model1_name: Optional[str] = None, model2_name: Optional[str] = None, model1_layers: Optional[List[str]] = None, model2_layers: Optional[List[str]] = None, training: bool = False, device: str = 'cpu')[source]
Centered Kernel Alignment (CKA) metric, where the features of the networks are compared.
- Parameters:
model1 (nn.Module) – model 1
model2 (nn.Module) – model 2
model1_name (str, optional) – name of model 1, by default None
model2_name (str, optional) – name of model 2, by default None
model1_layers (List[str], optional) – List of layers to extract features from, by default None
model2_layers (List[str], optional) – List of layers to extract features from, by default None
training (bool, optional) – whether to set training mode (True) or evaluation mode (False) for models. by default False.
device (str, optional) – device to run the models, by default ‘cpu’
Example
data = ...  # get your graph
m1 = ...  # get your model1
m2 = ...  # get your model2
cka = CKA(m1, m2)
cka.compare(data)
cka.plot_results()
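For orientation, the commonly used linear form of CKA is given below. It is stated here as background, assuming column-centered feature matrices $X \in \mathbb{R}^{n \times p_1}$ and $Y \in \mathbb{R}^{n \times p_2}$ extracted from one layer of each model for the same $n$ inputs; the estimator actually used by this class may differ (for example, a minibatch variant).

```latex
\mathrm{CKA}(X, Y) =
  \frac{\lVert Y^{\top} X \rVert_F^{2}}
       {\lVert X^{\top} X \rVert_F \, \lVert Y^{\top} Y \rVert_F}
```

The value lies in $[0, 1]$, and $\mathrm{CKA}(X, X) = 1$.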
Reference:
- topk(array: ndarray, k: int, largest: bool = True) topk_values_indices [source]
Returns the k largest/smallest elements and corresponding indices from an array-like input.
- Parameters:
- Returns:
Returns the k largest/smallest elements and corresponding indices of the given array
- Return type:
namedtuple[values, indices]
Example
>>> array = [5, 3, 7, 2, 1]
>>> topk(array, 2)
topk_values_indices(values=array([7, 5]), indices=array([2, 0], dtype=int64))

>>> topk(array, 2, largest=False)
topk_values_indices(values=array([1, 2]), indices=array([4, 3], dtype=int64))

>>> array = [[1, 2], [3, 4], [5, 6]]
>>> topk(array, 2)
topk_values_indices(values=array([6, 5]), indices=(array([2, 2], dtype=int64), array([1, 0], dtype=int64)))
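The one-dimensional case can be sketched in plain Python as follows. The names `TopK` and `topk_1d` are hypothetical, the sketch works on Python sequences rather than NumPy arrays, and it does not handle the multi-dimensional input shown in the last example.

```python
from typing import List, NamedTuple, Sequence


class TopK(NamedTuple):
    values: List[float]
    indices: List[int]


def topk_1d(array: Sequence[float], k: int, largest: bool = True) -> TopK:
    """Return the k largest (or smallest) values and their positions."""
    # Sort positions by value; reverse=True puts the largest values first.
    order = sorted(range(len(array)), key=lambda i: array[i], reverse=largest)[:k]
    return TopK([array[i] for i in order], order)


print(topk_1d([5, 3, 7, 2, 1], 2))
print(topk_1d([5, 3, 7, 2, 1], 2, largest=False))
```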
- wrapper(func: Callable) Callable [source]
Wrap a function to make some arguments have the same length. By default, the arguments to be modified are hids and acts.
Users can customize these arguments by setting:
includes : to include custom arguments
excludes : to exclude custom arguments
length_as : to make the length of the arguments the same as length_as, by default, it is hids.
- Parameters:
func (Callable) – a function to be wrapped.
- Returns:
a wrapped function.
- Return type:
Callable
- Raises:
TypeError – if the required arguments of the function are missing.
Example
>>> @wrapper
... def func(hids=[16], acts=None):
...     print(locals())

>>> func(100)
{'hids': [100], 'acts': [None]}

>>> func([100, 64])
{'hids': [100, 64], 'acts': [None, None]}

>>> func([100, 64], excludes=['acts'])
{'hids': [100, 64], 'acts': None}

>>> @wrapper
... def func(self, hids=[16], acts=None):
...     print(locals())

>>> func()
TypeError: The decorated function 'func' missing required argument 'self'.

>>> func('class_itself')
{'self': 'class_itself', 'hids': [16], 'acts': [None]}

>>> func('class_itself', hids=[])
{'self': 'class_itself', 'hids': [], 'acts': []}

>>> @wrapper
... def func(self, hids=[16], acts=None, heads=8):
...     print(locals())

>>> func('class_itself', hids=[100, 200])
{'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': 8}

>>> func('class_itself', hids=[100, 200], includes=['heads'])
{'self': 'class_itself', 'hids': [100, 200], 'acts': [None, None], 'heads': [8, 8]}
- repeat(src: Any, length: Optional[int] = None) Any [source]
Repeat any objects and return iterable ones.
- Parameters:
src (Any) – any objects
length (Optional[int], optional) – the length to be repeated. If None, it would return the iterable object itself, by default None
- Returns:
the iterable repeated object
- Return type:
Any
Example
>>> from greatx.utils import repeat
>>> # repeat for single non-iterable object
>>> repeat(1)
[1]
>>> repeat(1, 3)
[1, 1, 1]
>>> repeat('relu', 2)
['relu', 'relu']
>>> repeat(None, 2)
[None, None]

>>> # repeat for iterable object
>>> repeat([1, 2, 3], 2)
[1, 2]
>>> repeat([1, 2, 3], 5)
[1, 2, 3, 3, 3]
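The behaviour shown in these examples (wrap single objects, truncate long sequences, pad short ones with their last element) can be reproduced with the sketch below. `repeat_sketch` is a hypothetical name, and the handling of an empty sequence that needs padding is an assumption, since the examples do not cover it.

```python
from collections.abc import Iterable
from typing import Any, List, Optional


def repeat_sketch(src: Any, length: Optional[int] = None) -> List[Any]:
    """Mimic the documented examples of `repeat` (illustrative only)."""
    # Strings and non-iterables (including None) count as single objects.
    if isinstance(src, str) or not isinstance(src, Iterable):
        items = [src]
    else:
        items = list(src)
    if length is None:
        return items
    if len(items) >= length:
        return items[:length]  # truncate to the requested length
    if not items:
        # Assumption: an empty sequence cannot be padded.
        raise ValueError("cannot pad an empty sequence")
    # Pad with the last element until the requested length is reached.
    return items + [items[-1]] * (length - len(items))


print(repeat_sketch([1, 2, 3], 5))
```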
- split_nodes(labels: Tensor, *, train: float = 0.1, test: float = 0.8, val: float = 0.1, random_state: Optional[int] = None) BunchDict [source]
Randomly split a set of nodes labeled with labels.
- Parameters:
labels (torch.Tensor) – the labels of the nodes.
train (float, optional) – the fraction of nodes assigned to the training set, by default 0.1
test (float, optional) – the fraction of nodes assigned to the test set, by default 0.8
val (float, optional) – the fraction of nodes assigned to the validation set, by default 0.1
random_state (Optional[int], optional) – random seed for the random number generator, by default None
- Returns:
BunchDict with the following items:
- train_nodes: torch.Tensor with Size [train * num_nodes]
The indices of the training nodes
- val_nodes: torch.Tensor with Size [val * num_nodes]
The indices of the validation nodes
- test_nodes: torch.Tensor with Size [test * num_nodes]
The indices of the test nodes
- Return type:
BunchDict
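A fraction-based random split of node indices can be sketched as follows. The function `split_indices` is hypothetical; it uses plain Python lists and the standard `random` module, and it does not use the labels (for example, for stratification), which the actual function may do.

```python
import random
from typing import Dict, List, Optional


def split_indices(num_nodes: int, train: float = 0.1, test: float = 0.8,
                  val: float = 0.1,
                  seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Shuffle node indices and cut them into train/val/test parts."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("train, val and test must sum to 1")
    rng = random.Random(seed)  # local generator, reproducible for a fixed seed
    perm = list(range(num_nodes))
    rng.shuffle(perm)
    n_train = round(train * num_nodes)
    n_val = round(val * num_nodes)
    return {
        "train_nodes": sorted(perm[:n_train]),
        "val_nodes": sorted(perm[n_train:n_train + n_val]),
        "test_nodes": sorted(perm[n_train + n_val:]),  # everything left over
    }


splits = split_indices(100, seed=0)
print({name: len(nodes) for name, nodes in splits.items()})
```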
- split_nodes_by_classes(labels: Tensor, n_per_class: int = 20, random_state: Optional[int] = None) BunchDict [source]
Randomly split the training data by the number of nodes per class.
- Parameters:
labels (torch.Tensor [num_nodes]) – The class labels
n_per_class (int) – Number of samples per class
random_state (Optional[int]) – Random seed
- Returns:
BunchDict with the following items:
- train_nodes: torch.Tensor with Size [n_per_class * num_classes]
The indices of the training nodes
- val_nodes: torch.Tensor with Size [n_per_class * num_classes]
The indices of the validation nodes
- test_nodes: torch.Tensor with Size [num_nodes - 2 * n_per_class * num_classes]
The indices of the test nodes
- Return type:
BunchDict
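The sizes listed above suggest that each class contributes `n_per_class` nodes to the training set and `n_per_class` nodes to the validation set, with the remainder used for testing. The sketch below implements that reading with plain Python lists; `split_by_classes` is a hypothetical name and the sampling order is an assumption.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def split_by_classes(labels: Sequence[int], n_per_class: int = 20,
                     seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Pick n_per_class train and val nodes per class; the rest is test."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, label in enumerate(labels):
        by_class[label].append(node)  # group node indices by class label
    train, val, test = [], [], []
    for label in sorted(by_class):
        nodes = by_class[label]
        rng.shuffle(nodes)
        train += nodes[:n_per_class]
        val += nodes[n_per_class:2 * n_per_class]
        test += nodes[2 * n_per_class:]
    return {"train_nodes": sorted(train), "val_nodes": sorted(val),
            "test_nodes": sorted(test)}


labels = [0] * 10 + [1] * 10
parts = split_by_classes(labels, n_per_class=2, seed=0)
print({name: len(nodes) for name, nodes in parts.items()})
```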
- setup_logger(output: Optional[str] = None, name: str = 'GreatX', *, distributed_rank: int = 0, mode: str = 'w', color: bool = True, abbrev_name: Optional[str] = None) Logger [source]
Initialize the GreatX logger and set its verbosity level to “DEBUG”.
- Parameters:
output (Optional[str], optional) – a file name or a directory to save log. If None, will not save log file. If ends with “.txt” or “.log”, assumed to be a file name. Otherwise, logs will be saved to {output}/log.txt.
name (str, optional) – the root module name of this logger, by default “GreatX”
distributed_rank (int, optional) – used for distributed training, by default 0
mode (str, optional) – mode for the output file (if output is given), by default ‘w’.
color (bool, optional) – whether to use color when printing, the termcolor package is required, by default True
abbrev_name (Optional[str], optional) – an abbreviation of the module, to avoid long names in logs. Set to “” to not log the root module in logs. By default, None.
- Returns:
a logger
- Return type:
Logger
Example
>>> logger = setup_logger(name='my exp')
>>> logger.info('message') [12/19 17:01:43 my exp]: message
>>> logger.error('message') ERROR [12/19 17:02:22 my exp]: message
>>> logger.warning('message') WARNING [12/19 17:02:32 my exp]: message
>>> # specify output files
>>> logger = setup_logger(output='log.txt', name='my exp')
>>> # additive, by default mode='w'
>>> logger = setup_logger(output='log.txt', name='my exp', mode='a')

>>> # once your logger is set, you can call it by
>>> logger = get_logger(name='my exp')
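The set-up-once, fetch-by-name pattern shown above follows the standard `logging` module, where `logging.getLogger(name)` always returns the same object for the same name. The sketch below shows that pattern with a hypothetical helper `setup_basic_logger`; it does not reproduce the file output, colors, or abbreviation options of `setup_logger`.

```python
import logging
import sys


def setup_basic_logger(name: str = "GreatX") -> logging.Logger:
    """Configure a named logger at DEBUG level (illustrative only)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # do not forward records to the root logger
    if not logger.handlers:   # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(
            "[%(asctime)s %(name)s]: %(message)s", datefmt="%m/%d %H:%M:%S"))
        logger.addHandler(handler)
    return logger


logger = setup_basic_logger("my exp")
same_logger = logging.getLogger("my exp")  # fetches the configured logger
same_logger.info("message")
```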
- get_logger(name: str = 'GreatX') Logger [source]
Get a logger for a given name.
- Parameters:
name (str, optional) – name of the logger, by default “GreatX”
- Return type:
a logger for the given name
- singleton_filter(edges: ndarray, adj_matrix: csr_matrix)[source]
Filter edges that, if removed, would turn one or more nodes into singleton nodes.
- Parameters:
edges (np.array, shape [M, 2], where M is the number of input edges.) – The candidate edges to remove.
adj_matrix (sp.sparse_matrix, shape [num_nodes, num_nodes]) – The input adjacency matrix from which the edges are derived.
- Returns:
the edges whose removal will not generate singleton nodes.
- Return type:
np.array, shape [M, 2]
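The idea can be sketched on a plain adjacency-set representation: an existing edge is dropped from the candidates when one of its endpoints has degree 1, because removing it would isolate that node. The function name and the `neighbors` dictionary are hypothetical; the actual function works on NumPy and SciPy inputs.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def filter_singleton_edges(edges: List[Edge],
                           neighbors: Dict[int, Set[int]]) -> List[Edge]:
    """Keep candidate edges whose removal would not isolate a node."""
    kept = []
    for u, v in edges:
        exists = v in neighbors.get(u, set())
        deg_u = len(neighbors.get(u, set()))
        deg_v = len(neighbors.get(v, set()))
        # Removing an existing edge isolates any endpoint of degree 1.
        if exists and (deg_u == 1 or deg_v == 1):
            continue
        kept.append((u, v))
    return kept


# Undirected graph: node 0 hangs off node 1; nodes 1, 2, 3 form a triangle.
neighbors = {0: {1}, 1: {0, 2, 3}, 2: {1, 3}, 3: {1, 2}}
print(filter_singleton_edges([(0, 1), (1, 2), (2, 3), (0, 3)], neighbors))
```

Each candidate is checked independently against the original graph, so removing several kept edges at once could still isolate a node.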
- class SingletonFilter(adj_matrix: csr_matrix)[source]
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry has degree 1 and there is an edge between the two nodes.
- Parameters:
adj_matrix (sp.csr_matrix) – the input adjacency matrix
- class LikelihoodFilter(degree: ndarray, ll_cutoff: float = 0.004)[source]
Likelihood filter from the “Adversarial Attacks on Neural Networks for Graph Data” paper (KDD’18)
- Parameters:
degree (np.ndarray) – the degree of the nodes in the graph
ll_cutoff (float, optional) – likelihood cutoff, by default 0.004
- static compute_alpha(n, S_d, d_min)[source]
Approximate the alpha of a power law distribution.
- Parameters:
- Returns:
alpha – The estimated alpha of the power law distribution
- Return type:
- static update_Sx(S_old, n_old, d_old, d_new, d_min)[source]
Update the sum of log degrees S_d and n based on the degree distribution resulting from inserting or deleting a single edge.
- Parameters:
S_old (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.
n_old (int) – Number of entries in the old distribution that are larger than or equal to d_min.
d_old (np.array, shape [num_nodes,] dtype int) – The old degree sequence.
d_new (np.array, shape [num_nodes,] dtype int) – The new degree sequence
d_min (int) – The minimum degree of nodes to consider
- Returns:
new_S_d (float, the updated sum of log degrees in the distribution that are larger than or equal to d_min.)
new_n (int, the updated number of entries in the old distribution that are larger than or equal to d_min.)
- static compute_log_likelihood(n, alpha, S_d, d_min)[source]
Compute log likelihood of the powerlaw fit.
- Parameters:
n (int) – Number of entries in the old distribution that are larger than or equal to d_min.
alpha (float) – The estimated alpha of the power law distribution
S_d (float) – Sum of log degrees in the distribution that are larger than or equal to d_min.
d_min (int) – The minimum degree of nodes to consider
- Returns:
the estimated log likelihood
- Return type:
float
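The quantities `n`, `S_d` and `alpha` used by these static methods can be illustrated with the standard approximate maximum-likelihood estimate for a discrete power law, alpha ≈ 1 + n / Σ log(d_i / (d_min − 1/2)). The sketch below uses that textbook formula with hypothetical function names; it is an assumption that `compute_alpha` uses the same approximation.

```python
import math
from typing import Sequence, Tuple


def degree_stats(degrees: Sequence[int], d_min: int) -> Tuple[float, int]:
    """Return (S_d, n): sum of log degrees and count of degrees >= d_min."""
    kept = [d for d in degrees if d >= d_min]
    return sum(math.log(d) for d in kept), len(kept)


def estimate_alpha(n: int, S_d: float, d_min: int) -> float:
    """Approximate MLE of the power-law exponent (textbook formula)."""
    # Equivalent to 1 + n / sum(log(d_i / (d_min - 0.5))).
    return n / (S_d - n * math.log(d_min - 0.5)) + 1


degrees = [1, 2, 2, 3, 4, 8]
S_d, n = degree_stats(degrees, d_min=2)
print(n, estimate_alpha(n, S_d, d_min=2))
```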
- singleton_mask(adj_matrix: Tensor)[source]
Computes a mask for entries potentially leading to singleton nodes, i.e. one of the two nodes corresponding to the entry has degree 1 and there is an edge between the two nodes.
- Parameters:
adj_matrix (Tensor, shape [N, N], where N is the number of nodes) – the input adjacency matrix used to compute the mask
- Returns:
mask – a boolean mask with the same shape as adj_matrix.
- Return type:
bool Tensor
- add_edges(edge_index: Tensor, edges_to_add: Tensor, symmetric: bool = True, coalesce: bool = True, sort_edges: bool = True) Tensor [source]
Add edges to the graph denoted as edge_index.
- Parameters:
edge_index (torch.Tensor) – the graph instance where edges will be added to.
edges_to_add (torch.Tensor) – shape [2, M], the edges to be added into the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it will also append the reversed edges into the graph.
coalesce (bool) – whether to coalesce the output edges.
sort_edges (bool) – whether to sort the output edges.
- Returns:
the graph instance edge_index with edges added.
- Return type:
Tensor
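The roles of `symmetric`, `coalesce` and `sort_edges` can be illustrated on a list of `(source, target)` pairs. This is a sketch with a hypothetical function name; it always coalesces and sorts, and it uses Python tuples instead of a `[2, M]` tensor.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def add_edges_sketch(edges: List[Edge], new_edges: List[Edge],
                     symmetric: bool = True) -> List[Edge]:
    """Add edges, optionally with their reverses; deduplicate and sort."""
    combined = set(edges)  # a set removes duplicate edges (coalescing)
    for u, v in new_edges:
        combined.add((u, v))
        if symmetric:
            combined.add((v, u))  # also add the reversed edge
    return sorted(combined)


print(add_edges_sketch([(0, 1), (1, 0)], [(1, 2), (0, 1)]))
```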
- remove_edges(edge_index: Tensor, edges_to_remove: Tensor, symmetric: bool = True) Tensor [source]
Remove edges from the graph denoted as edge_index.
- Parameters:
edge_index (torch.Tensor) – the graph instance where edges will be removed from.
edges_to_remove (torch.Tensor) – shape [2, M], the edges to be removed in the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it will also remove the reversed edges from the graph.
- Returns:
the graph instance edge_index with edges removed.
- Return type:
Tensor
- flip_edges(edge_index: Tensor, edges_to_flip: Tensor, symmetric: bool = True) Tensor [source]
Flip edges from the graph denoted as edge_index.
- Parameters:
edge_index (torch.Tensor) – the graph instance where edges will be flipped from.
edges_to_flip (torch.Tensor) – shape [2, M], the edges to be flipped in the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it will also flip the reversed edges from the graph.
- Returns:
the graph instance edge_index with edges flipped.
- Return type:
Tensor
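Flipping means that an edge present in the graph is removed and an absent edge is added, which corresponds to a set symmetric difference. The sketch below shows this on `(source, target)` pairs; the function name is hypothetical and the output ordering is an assumption.

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def flip_edges_sketch(edges: List[Edge], to_flip: List[Edge],
                      symmetric: bool = True) -> List[Edge]:
    """Remove flipped edges that exist and add those that do not."""
    flips = set(to_flip)
    if symmetric:
        flips |= {(v, u) for u, v in to_flip}  # flip reversed edges too
    # Symmetric difference: keep edges that are in exactly one of the sets.
    return sorted(set(edges) ^ flips)


graph = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(flip_edges_sketch(graph, [(0, 1), (0, 2)]))
```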
- flip_graph(data: Data, edges_to_flip: Tensor, symmetric: bool = True) Data [source]
Flip edges from the graph denoted as data.
- Parameters:
data (Data) – the graph instance where edges will be flipped from.
edges_to_flip (torch.Tensor) – shape [2, M], the edges to be flipped in the graph.
symmetric (bool) – whether the output graph is symmetric, if True, it will also flip the reversed edges from the graph.
- Returns:
the graph instance data with edges flipped.
- Return type:
Data
Note
We currently don’t support weighted graphs, and this function will automatically set edge_weight and adj_t to None.
- scipy_normalize(adj_matrix: csr_matrix, add_self_loops: bool = True) csr_matrix [source]
Normalize a sparse matrix according to GCN from the “Semi-supervised Classification with Graph Convolutional Networks” paper (ICLR’17)
- Parameters:
adj_matrix (sp.csr_matrix) – the input sparse matrix denoting a graph.
add_self_loops (bool, optional) – whether to add self-loops, by default True
- Returns:
the normalized adjacency matrix.
- Return type:
sp.csr_matrix
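For reference, the symmetric normalization from the GCN paper is written out below for an adjacency matrix $A$ with identity $I$. It is assumed here that `add_self_loops=True` corresponds to using $\tilde{A} = A + I$, and that $\tilde{A} = A$ is used otherwise.

```latex
\tilde{A} = A + I, \qquad
\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}, \qquad
\hat{A} = \tilde{D}^{-1/2} \, \tilde{A} \, \tilde{D}^{-1/2}
```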
- normalize(feat: Tensor, norm: str = 'standardize', dim: Optional[int] = None, lim_min: float = -1.0, lim_max: float = 1.0) Tensor [source]
Feature normalization function.
Adapted from GRB: https://github.com/THUDM/grb/blob/master/grb/dataset/dataset.py#L638
- Parameters:
feat (torch.Tensor) – node feature matrix with shape [N, D]
norm (Optional[str], optional) – how to normalize feature matrix, including [“linearize”, “arctan”, “tanh”, “standardize”, “none”], by default “standardize”
dim (None or int, optional) – Axis along which the means or standard deviations are computed. The default is to compute the mean or standard deviations of the flattened array, by default None
lim_min (float, optional) – minimum limit of feature, by default -1.0
lim_max (float, optional) – maximum limit of feature, by default 1.0
- Returns:
feat – normalized feature matrix
- Return type:
Tensor
- overlap(edge_index1: Tensor, edge_index2: Tensor, on: str = 'edge', symmetric: bool = False) float [source]
Compute graph overlapping according to the “Node Similarity Preserving Graph Convolutional Networks” paper (WSDM’21)
- Parameters:
edge_index1 (torch.Tensor) – edge indices of a graph
edge_index2 (torch.Tensor) – edge indices of another graph
on (str, optional) – compute overlap on edge or node, by default ‘edge’
symmetric (bool, optional) – whether the computed overlap is symmetric, by default False
- Returns:
overlapping of the two graphs on edge or node
- Return type:
float
- ego_graph(adj_matrix: csr_matrix, targets: Union[int, list], hops: int = 1) ego_graph [source]
Returns induced subgraph of neighbors centered at node n within a given radius.
- Parameters:
- Returns:
nodes: shape [N], the nodes of the subgraph
edges: shape [2, M], the edges of the subgraph
- Return type:
NamedTuple(nodes, edges)
Note
This is a faster implementation of networkx.ego_graph based on scipy sparse matrix and numba.
See also
networkx.ego_graph, torch_geometric.utils.k_hop_subgraph
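The k-hop neighborhood extraction can be sketched with a breadth-first search over an adjacency dictionary. The function name and input format are hypothetical, and returning every edge between the collected nodes (the induced subgraph) is an assumption based on the description above.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple, Union


def k_hop_subgraph_sketch(neighbors: Dict[int, Sequence[int]],
                          targets: Union[int, List[int]],
                          hops: int = 1) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Return nodes within `hops` of the targets and the edges among them."""
    if isinstance(targets, int):
        targets = [targets]
    dist = {t: 0 for t in targets}  # hop distance from the nearest target
    queue = deque(targets)
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the requested radius
        for v in neighbors.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    nodes = sorted(dist)
    edges = sorted((u, v) for u in nodes
                   for v in neighbors.get(u, ()) if v in dist)
    return nodes, edges


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # path graph 0-1-2-3
print(k_hop_subgraph_sketch(path, 0, hops=2))
```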
- class MissingFeature(missing_rate: float = 0.5, missing_type: str = 'uniform', missing_value: float = nan)[source]
Implementation of MissingFeature from the “On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features” paper (ICLR’21)
MissingFeature generates a missing feature mask indicating whether each feature is present or missing, according to different strategies.
- Parameters:
missing_rate (float, optional) – ratio of missing features, by default 0.5
missing_type (str, optional) – type of strategies to generate missing feature mask. If type='uniform', then each feature of each node is missing uniformly at random with probability missing_rate. Instead, if type='structural', either we observe all features for a node, or we observe none. For each node there is a probability of missing_rate of not observing any feature, by default 'uniform'
missing_value (float, optional) – value to fill missing features, by default float("nan")
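The two strategies described above can be sketched as a mask generator. The function `missing_mask` is hypothetical and uses nested Python lists rather than tensors; the convention that `True` marks a missing entry is an assumption made for this example.

```python
import random
from typing import List, Optional


def missing_mask(num_nodes: int, num_feats: int, missing_rate: float = 0.5,
                 missing_type: str = "uniform",
                 seed: Optional[int] = None) -> List[List[bool]]:
    """Return a num_nodes x num_feats mask where True marks a missing entry."""
    rng = random.Random(seed)
    if missing_type == "uniform":
        # Every feature of every node is dropped independently.
        return [[rng.random() < missing_rate for _ in range(num_feats)]
                for _ in range(num_nodes)]
    if missing_type == "structural":
        # One draw per node: either all of its features are missing or none.
        return [[rng.random() < missing_rate] * num_feats
                for _ in range(num_nodes)]
    raise ValueError(f"unknown missing_type: {missing_type!r}")


for row in missing_mask(4, 3, missing_type="structural", seed=0):
    print(row)
```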
Reference:
- mark(logit: Union[ndarray, Tensor], y_true: Optional[int] = None) str [source]
Marks the predicted classes and corresponding probability for a better view.
- Parameters:
logit (Union[np.ndarray, torch.Tensor]) – the predicted class probability
y_true (Optional[int]) – the ground truth label, by default None
- Returns:
the formatted string
- Return type:
str
Examples
>>> from greatx.utils import mark
>>> import torch
>>> pred = torch.tensor([0.5, 0.3, 0.1, 0.1])
>>> print(mark(pred))
Prediction (y=0): 0.500, 0.300, 0.100, 0.100
👆(0)
>>> print(mark(pred, y_true=2))
Ground Truth (y=2): 👇(2)
Prediction (pred=0): 0.500, 0.300, 0.100, 0.100
👆(0)
Margin: -0.400