Upgrade NAS-API to v2.0:

we use an abstract class NASBenchMetaAPI to define the spec of an API; it can be inherited to support different kinds of NAS API, while keep the query interface the same.
This commit is contained in:
D-X-Y 2020-06-30 09:05:38 +00:00
parent 91ee265bd2
commit 6effb6f127
23 changed files with 1888 additions and 944 deletions

View File

@ -37,7 +37,7 @@ At the moment, this project provides the following algorithms and scripts to run
<td rowspan="6" align="center" valign="middle" halign="middle"> NAS </td>
<td align="center" valign="middle"> TAS </td>
<td align="center" valign="middle"> <a href="https://arxiv.org/abs/1905.09717">Network Pruning via Transformable Architecture Search</a> </td>
<td align="center" valign="middle"> <a href="https://github.com/D-X-Y/AutoDL-Projects/tree/master/docs/NIPS-2019-TAS.md">NIPS-2019-TAS.md</a> </td>
<td align="center" valign="middle"> <a href="https://github.com/D-X-Y/AutoDL-Projects/tree/master/docs/NeurIPS-2019-TAS.md">NeurIPS-2019-TAS.md</a> </td>
</tr>
<tr> <!-- (2-nd row) -->
<td align="center" valign="middle"> DARTS </td>

View File

@ -37,7 +37,7 @@
<td rowspan="6" align="center" valign="middle" halign="middle"> NAS </td>
<td align="center" valign="middle"> TAS </td>
<td align="center" valign="middle"> <a href="https://arxiv.org/abs/1905.09717">Network Pruning via Transformable Architecture Search</a> </td>
<td align="center" valign="middle"> <a href="https://github.com/D-X-Y/AutoDL-Projects/tree/master/docs/NIPS-2019-TAS.md">NIPS-2019-TAS.md</a> </td>
<td align="center" valign="middle"> <a href="https://github.com/D-X-Y/AutoDL-Projects/tree/master/docs/NeurIPS-2019-TAS.md">NeurIPS-2019-TAS.md</a> </td>
</tr>
<tr> <!-- (2-nd row) -->
<td align="center" valign="middle"> DARTS </td>

View File

@ -0,0 +1,13 @@
{
"scheduler": ["str", "cos"],
"eta_min" : ["float", "0.0"],
"epochs" : ["int", "200"],
"warmup" : ["int", "0"],
"optim" : ["str", "SGD"],
"LR" : ["float", "0.1"],
"decay" : ["float", "0.0005"],
"momentum" : ["float", "0.9"],
"nesterov" : ["bool", "1"],
"criterion": ["str", "Softmax"],
"batch_size": ["int", "256"]
}

View File

@ -29,7 +29,10 @@ NAS-BENCH-201-4-v1.0-archive.tar](https://drive.google.com/open?id=1X2i-JXaElsnV
- [2020.02.25] APIv1.0/FILEv1.0: Checkpoints for 3 runs of each baseline NAS algorithm are provided in [Google Drive](https://drive.google.com/open?id=1eAgLZQAViP3r6dA0_ZOOGG9zPLXhGwXi).
- [2020.03.09] APIv1.2/FILEv1.0: More robust API with more functions and descriptions
- [2020.03.16] APIv1.3/FILEv1.1: [`NAS-Bench-201-v1_1-096897.pth`](https://drive.google.com/open?id=16Y0UwGisiouVRxW-W5hEtbxmcHw_0hF_) (4.7G), where `096897` is the last six digits for this file. It contains information of more trials compared to `NAS-Bench-201-v1_0-e61699.pth`, especially all models trained by 12 epochs on all datasets are avaliable.
- [2020.06.01] APIv2.0/FILEv2.0: coming soon!
- [2020.06.30] APIv2.0: Use abstract class (NASBenchMetaAPI) for APIs of NAS-Bench-x0y.
- [2020.06.30] FILEv2.0: coming soon!
**We recommend to use `NAS-Bench-201-v1_1-096897.pth`**
The training and evaluation data used in NAS-Bench-201 can be downloaded from [Google Drive](https://drive.google.com/open?id=1L0Lzq8rWpZLPfiQGd6QR8q5xLV88emU7) or [Baidu-Wangpan (code:4fg7)](https://pan.baidu.com/s/1XAzavPKq3zcat1yBA1L2tQ).
@ -42,7 +45,8 @@ It is recommended to put these data into `$TORCH_HOME` (`~/.torch/` by default).
from nas_201_api import NASBench201API as API
api = API('$path_to_meta_nas_bench_file')
api = API('NAS-Bench-201-v1_1-096897.pth')
api = API('{:}/{:}'.format(os.environ['TORCH_HOME'], 'NAS-Bench-201-v1_1-096897.pth'))
# The default path for benchmark file is '{:}/{:}'.format(os.environ['TORCH_HOME'], 'NAS-Bench-201-v1_1-096897.pth')
api = API(None)
```
2. Show the number of architectures `len(api)` and each architecture `api[i]`:
@ -149,10 +153,12 @@ api.reload('{:}/{:}'.format(os.environ['TORCH_HOME'], 'NAS-BENCH-201-4-v1.0-arch
weights = api.get_net_param(3, 'cifar10', None) # Obtaining the weights of all trials for the 3-th architecture on cifar10. It will returns a dict, where the key is the seed and the value is the trained weights.
```
To obtain the training and evaluation information (please see the comments [here](https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/nas_201_api/api.py#L172)):
To obtain the training and evaluation information (please see the comments [here](https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/nas_201_api/api_201.py#L142)):
```
api.get_more_info(112, 'cifar10', None, False, True)
api.get_more_info(112, 'ImageNet16-120', None, False, True) # the info of last training epoch for 112-th architecture (use 200-epoch-hyper-parameter and randomly select a trial)
api.get_more_info(112, 'cifar10', None, hp='200', is_random=True)
# Query info of last training epoch for 112-th architecture
# using 200-epoch-hyper-parameter and randomly select a trial.
api.get_more_info(112, 'ImageNet16-120', None, hp='200', is_random=True)
```
Please use the following script to show the best architectures on each dataset:

View File

@ -4,6 +4,7 @@
# [2020.02.25] Initialize the API as v1.1
# [2020.03.09] Upgrade the API to v1.2
# [2020.03.16] Upgrade the API to v1.3
# [2020.06.30] Upgrade the API to v2.0
import os
from setuptools import setup
@ -15,7 +16,7 @@ def read(fname='README.md'):
setup(
name = "nas_bench_201",
version = "1.3",
version = "2.0",
author = "Xuanyi Dong",
author_email = "dongxuanyi888@gmail.com",
description = "API for NAS-Bench-201 (a benchmark for neural architecture search).",

View File

@ -22,7 +22,7 @@ def create_result_count(used_seed: int, dataset: Text, arch_config: Dict[Text, A
results: Dict[Text, Any], dataloader_dict: Dict[Text, Any]) -> ResultsCount:
xresult = ResultsCount(dataset, results['net_state_dict'], results['train_acc1es'], results['train_losses'],
results['param'], results['flop'], arch_config, used_seed, results['total_epoch'], None)
net_config = dict2config({'name': 'infer.tiny', 'C': arch_config['channel'], 'N': arch_config['num_cells'], 'genotype': CellStructure.str2structure(arch_config['arch_str']), 'num_classes':arch_config['class_num']}, None)
net_config = dict2config({'name': 'infer.tiny', 'C': arch_config['channel'], 'N': arch_config['num_cells'], 'genotype': CellStructure.str2structure(arch_config['arch_str']), 'num_classes': arch_config['class_num']}, None)
network = get_cell_based_tiny_net(net_config)
network.load_state_dict(xresult.get_net_param())
if 'train_times' in results: # new version
@ -126,7 +126,6 @@ def correct_time_related_info(arch_index: int, arch_info_full: ArchResults, arch
arch_info.reset_pseudo_eval_times('ImageNet16-120', None, 'ori-test', eval_per_sample * nums['ImageNet16-120-test'])
# arch_info_full.debug_test()
# arch_info_less.debug_test()
# import pdb; pdb.set_trace()
return arch_info_full, arch_info_less

View File

@ -0,0 +1,93 @@
###############################################################
# NAS-Bench-201, ICLR 2020 (https://arxiv.org/abs/2001.00326) #
###############################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2020.06 #
###############################################################
# Usage: python exps/NAS-Bench-201/test-nas-api-vis.py
###############################################################
import os, sys, time, torch, argparse
import numpy as np
from typing import List, Text, Dict, Any
from shutil import copyfile
from collections import defaultdict
from copy import deepcopy
from pathlib import Path
import matplotlib
import seaborn as sns
matplotlib.use('agg')
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
lib_dir = (Path(__file__).parent / '..' / '..' / 'lib').resolve()
if str(lib_dir) not in sys.path: sys.path.insert(0, str(lib_dir))
from config_utils import dict2config, load_config
from nas_201_api import NASBench201API, NASBench301API
from log_utils import time_string
from models import get_cell_based_tiny_net
def visualize_info(api, vis_save_dir, indicator):
vis_save_dir = vis_save_dir.resolve()
# print ('{:} start to visualize {:} information'.format(time_string(), api))
vis_save_dir.mkdir(parents=True, exist_ok=True)
cifar010_cache_path = vis_save_dir / '{:}-cache-{:}-info.pth'.format('cifar10', indicator)
cifar100_cache_path = vis_save_dir / '{:}-cache-{:}-info.pth'.format('cifar100', indicator)
imagenet_cache_path = vis_save_dir / '{:}-cache-{:}-info.pth'.format('ImageNet16-120', indicator)
cifar010_info = torch.load(cifar010_cache_path)
cifar100_info = torch.load(cifar100_cache_path)
imagenet_info = torch.load(imagenet_cache_path)
indexes = list(range(len(cifar010_info['params'])))
print ('{:} start to visualize relative ranking'.format(time_string()))
cifar010_ord_indexes = sorted(indexes, key=lambda i: cifar010_info['test_accs'][i])
cifar100_ord_indexes = sorted(indexes, key=lambda i: cifar100_info['test_accs'][i])
imagenet_ord_indexes = sorted(indexes, key=lambda i: imagenet_info['test_accs'][i])
cifar100_labels, imagenet_labels = [], []
for idx in cifar010_ord_indexes:
cifar100_labels.append( cifar100_ord_indexes.index(idx) )
imagenet_labels.append( imagenet_ord_indexes.index(idx) )
print ('{:} prepare data done.'.format(time_string()))
dpi, width, height = 200, 1400, 800
figsize = width / float(dpi), height / float(dpi)
LabelSize, LegendFontsize = 18, 12
resnet_scale, resnet_alpha = 120, 0.5
fig = plt.figure(figsize=figsize)
ax = fig.add_subplot(111)
plt.xlim(min(indexes), max(indexes))
plt.ylim(min(indexes), max(indexes))
# plt.ylabel('y').set_rotation(30)
plt.yticks(np.arange(min(indexes), max(indexes), max(indexes)//3), fontsize=LegendFontsize, rotation='vertical')
plt.xticks(np.arange(min(indexes), max(indexes), max(indexes)//5), fontsize=LegendFontsize)
ax.scatter(indexes, cifar100_labels, marker='^', s=0.5, c='tab:green', alpha=0.8)
ax.scatter(indexes, imagenet_labels, marker='*', s=0.5, c='tab:red' , alpha=0.8)
ax.scatter(indexes, indexes , marker='o', s=0.5, c='tab:blue' , alpha=0.8)
ax.scatter([-1], [-1], marker='o', s=100, c='tab:blue' , label='CIFAR-10')
ax.scatter([-1], [-1], marker='^', s=100, c='tab:green', label='CIFAR-100')
ax.scatter([-1], [-1], marker='*', s=100, c='tab:red' , label='ImageNet-16-120')
plt.grid(zorder=0)
ax.set_axisbelow(True)
plt.legend(loc=0, fontsize=LegendFontsize)
ax.set_xlabel('architecture ranking in CIFAR-10', fontsize=LabelSize)
ax.set_ylabel('architecture ranking', fontsize=LabelSize)
save_path = (vis_save_dir / '{:}-relative-rank.pdf'.format(indicator)).resolve()
fig.savefig(save_path, dpi=dpi, bbox_inches='tight', format='pdf')
save_path = (vis_save_dir / '{:}-relative-rank.png'.format(indicator)).resolve()
fig.savefig(save_path, dpi=dpi, bbox_inches='tight', format='png')
print ('{:} save into {:}'.format(time_string(), save_path))
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='NAS-Bench-X', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--save_dir', type=str, default='output/NAS-BENCH-202', help='Folder to save checkpoints and log.')
parser.add_argument('--check_N', type=int, default=32768, help='For safety.')
# use for train the model
args = parser.parse_args()
visualize_info(None, Path('output/vis-nas-bench/'), 'tss')
visualize_info(None, Path('output/vis-nas-bench/'), 'sss')

View File

@ -0,0 +1,283 @@
###############################################################
# NAS-Bench-201, ICLR 2020 (https://arxiv.org/abs/2001.00326) #
###############################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2020.06 #
###############################################################
# Usage: python exps/NAS-Bench-201/test-nas-api.py
###############################################################
import os, sys, time, torch, argparse
import numpy as np
from typing import List, Text, Dict, Any
from shutil import copyfile
from collections import defaultdict
from copy import deepcopy
from pathlib import Path
import matplotlib
import seaborn as sns
matplotlib.use('agg')
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
lib_dir = (Path(__file__).parent / '..' / '..' / 'lib').resolve()
if str(lib_dir) not in sys.path: sys.path.insert(0, str(lib_dir))
from config_utils import dict2config, load_config
from nas_201_api import NASBench201API, NASBench301API
from log_utils import time_string
from models import get_cell_based_tiny_net
def test_api(api, is_301=True):
print('{:} start testing the api : {:}'.format(time_string(), api))
api.clear_params(12)
api.reload(index=12)
# Query the informations of 1113-th architecture
info_strs = api.query_info_str_by_arch(1113)
print(info_strs)
info = api.query_by_index(113)
print('{:}\n'.format(info))
info = api.query_by_index(113, 'cifar100')
print('{:}\n'.format(info))
info = api.query_meta_info_by_index(115, '90' if is_301 else '200')
print('{:}\n'.format(info))
for dataset in ['cifar10', 'cifar100', 'ImageNet16-120']:
for xset in ['train', 'test', 'valid']:
best_index, highest_accuracy = api.find_best(dataset, xset)
print('')
params = api.get_net_param(12, 'cifar10', None)
# obtain the config and create the network
config = api.get_net_config(12, 'cifar10')
print('{:}\n'.format(config))
network = get_cell_based_tiny_net(config)
network.load_state_dict(next(iter(params.values())))
# obtain the cost information
info = api.get_cost_info(12, 'cifar10')
print('{:}\n'.format(info))
info = api.get_latency(12, 'cifar10')
print('{:}\n'.format(info))
# count the number of architectures
info = api.statistics('cifar100', '12')
print('{:}\n'.format(info))
# show the information of the 123-th architecture
api.show(123)
# obtain both cost and performance information
info = api.get_more_info(1234, 'cifar10')
print('{:}\n'.format(info))
print('{:} finish testing the api : {:}'.format(time_string(), api))
def visualize_sss_info(api, dataset, vis_save_dir):
vis_save_dir = vis_save_dir.resolve()
print ('{:} start to visualize {:} information'.format(time_string(), dataset))
vis_save_dir.mkdir(parents=True, exist_ok=True)
cache_file_path = vis_save_dir / '{:}-cache-sss-info.pth'.format(dataset)
if not cache_file_path.exists():
print ('Do not find cache file : {:}'.format(cache_file_path))
params, flops, train_accs, valid_accs, test_accs = [], [], [], [], []
for index in range(len(api)):
info = api.get_cost_info(index, dataset)
params.append(info['params'])
flops.append(info['flops'])
# accuracy
info = api.get_more_info(index, dataset, hp='90')
train_accs.append(info['train-accuracy'])
test_accs.append(info['test-accuracy'])
if dataset == 'cifar10':
info = api.get_more_info(index, 'cifar10-valid', hp='90')
valid_accs.append(info['valid-accuracy'])
else:
valid_accs.append(info['valid-accuracy'])
info = {'params': params, 'flops': flops, 'train_accs': train_accs, 'valid_accs': valid_accs, 'test_accs': test_accs}
torch.save(info, cache_file_path)
else:
print ('Find cache file : {:}'.format(cache_file_path))
info = torch.load(cache_file_path)
params, flops, train_accs, valid_accs, test_accs = info['params'], info['flops'], info['train_accs'], info['valid_accs'], info['test_accs']
print ('{:} collect data done.'.format(time_string()))
pyramid = ['8:16:32:48:64', '8:8:16:32:48', '8:8:16:16:32', '8:8:16:16:48', '8:8:16:16:64', '16:16:32:32:64', '32:32:64:64:64']
pyramid_indexes = [api.query_index_by_arch(x) for x in pyramid]
largest_indexes = [api.query_index_by_arch('64:64:64:64:64')]
indexes = list(range(len(params)))
dpi, width, height = 250, 8500, 1300
figsize = width / float(dpi), height / float(dpi)
LabelSize, LegendFontsize = 24, 24
# resnet_scale, resnet_alpha = 120, 0.5
xscale, xalpha = 120, 0.8
fig, axs = plt.subplots(1, 4, figsize=figsize)
# ax1, ax2, ax3, ax4, ax5 = axs
for ax in axs:
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(LabelSize)
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.0f'))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(LabelSize)
ax2, ax3, ax4, ax5 = axs
# ax1.xaxis.set_ticks(np.arange(0, max(indexes), max(indexes)//5))
# ax1.scatter(indexes, test_accs, marker='o', s=0.5, c='tab:blue')
# ax1.set_xlabel('architecture ID', fontsize=LabelSize)
# ax1.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax2.scatter(params, train_accs, marker='o', s=0.5, c='tab:blue')
ax2.scatter([params[x] for x in pyramid_indexes], [train_accs[x] for x in pyramid_indexes], marker='*', s=xscale, c='tab:orange', label='Pyramid Structure', alpha=xalpha)
ax2.scatter([params[x] for x in largest_indexes], [train_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax2.set_xlabel('#parameters (MB)', fontsize=LabelSize)
ax2.set_ylabel('train accuracy (%)', fontsize=LabelSize)
ax2.legend(loc=4, fontsize=LegendFontsize)
ax3.scatter(params, test_accs, marker='o', s=0.5, c='tab:blue')
ax3.scatter([params[x] for x in pyramid_indexes], [test_accs[x] for x in pyramid_indexes], marker='*', s=xscale, c='tab:orange', label='Pyramid Structure', alpha=xalpha)
ax3.scatter([params[x] for x in largest_indexes], [test_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax3.set_xlabel('#parameters (MB)', fontsize=LabelSize)
ax3.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax3.legend(loc=4, fontsize=LegendFontsize)
ax4.scatter(flops, train_accs, marker='o', s=0.5, c='tab:blue')
ax4.scatter([flops[x] for x in pyramid_indexes], [train_accs[x] for x in pyramid_indexes], marker='*', s=xscale, c='tab:orange', label='Pyramid Structure', alpha=xalpha)
ax4.scatter([flops[x] for x in largest_indexes], [train_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax4.set_xlabel('#FLOPs (M)', fontsize=LabelSize)
ax4.set_ylabel('train accuracy (%)', fontsize=LabelSize)
ax4.legend(loc=4, fontsize=LegendFontsize)
ax5.scatter(flops, test_accs, marker='o', s=0.5, c='tab:blue')
ax5.scatter([flops[x] for x in pyramid_indexes], [test_accs[x] for x in pyramid_indexes], marker='*', s=xscale, c='tab:orange', label='Pyramid Structure', alpha=xalpha)
ax5.scatter([flops[x] for x in largest_indexes], [test_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax5.set_xlabel('#FLOPs (M)', fontsize=LabelSize)
ax5.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax5.legend(loc=4, fontsize=LegendFontsize)
save_path = vis_save_dir / 'sss-{:}.png'.format(dataset)
fig.savefig(save_path, dpi=dpi, bbox_inches='tight', format='png')
print ('{:} save into {:}'.format(time_string(), save_path))
plt.close('all')
def visualize_tss_info(api, dataset, vis_save_dir):
vis_save_dir = vis_save_dir.resolve()
print ('{:} start to visualize {:} information'.format(time_string(), dataset))
vis_save_dir.mkdir(parents=True, exist_ok=True)
cache_file_path = vis_save_dir / '{:}-cache-tss-info.pth'.format(dataset)
if not cache_file_path.exists():
print ('Do not find cache file : {:}'.format(cache_file_path))
params, flops, train_accs, valid_accs, test_accs = [], [], [], [], []
for index in range(len(api)):
info = api.get_cost_info(index, dataset)
params.append(info['params'])
flops.append(info['flops'])
# accuracy
info = api.get_more_info(index, dataset, hp='200')
train_accs.append(info['train-accuracy'])
test_accs.append(info['test-accuracy'])
if dataset == 'cifar10':
info = api.get_more_info(index, 'cifar10-valid', hp='200')
valid_accs.append(info['valid-accuracy'])
else:
valid_accs.append(info['valid-accuracy'])
info = {'params': params, 'flops': flops, 'train_accs': train_accs, 'valid_accs': valid_accs, 'test_accs': test_accs}
torch.save(info, cache_file_path)
else:
print ('Find cache file : {:}'.format(cache_file_path))
info = torch.load(cache_file_path)
params, flops, train_accs, valid_accs, test_accs = info['params'], info['flops'], info['train_accs'], info['valid_accs'], info['test_accs']
print ('{:} collect data done.'.format(time_string()))
resnet = ['|nor_conv_3x3~0|+|none~0|nor_conv_3x3~1|+|skip_connect~0|none~1|skip_connect~2|']
resnet_indexes = [api.query_index_by_arch(x) for x in resnet]
largest_indexes = [api.query_index_by_arch('|nor_conv_3x3~0|+|nor_conv_3x3~0|nor_conv_3x3~1|+|nor_conv_3x3~0|nor_conv_3x3~1|nor_conv_3x3~2|')]
indexes = list(range(len(params)))
dpi, width, height = 250, 8500, 1300
figsize = width / float(dpi), height / float(dpi)
LabelSize, LegendFontsize = 24, 24
# resnet_scale, resnet_alpha = 120, 0.5
xscale, xalpha = 120, 0.8
fig, axs = plt.subplots(1, 4, figsize=figsize)
# ax1, ax2, ax3, ax4, ax5 = axs
for ax in axs:
for tick in ax.xaxis.get_major_ticks():
tick.label.set_fontsize(LabelSize)
ax.yaxis.set_major_formatter(ticker.FormatStrFormatter('%.0f'))
for tick in ax.yaxis.get_major_ticks():
tick.label.set_fontsize(LabelSize)
ax2, ax3, ax4, ax5 = axs
# ax1.xaxis.set_ticks(np.arange(0, max(indexes), max(indexes)//5))
# ax1.scatter(indexes, test_accs, marker='o', s=0.5, c='tab:blue')
# ax1.set_xlabel('architecture ID', fontsize=LabelSize)
# ax1.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax2.scatter(params, train_accs, marker='o', s=0.5, c='tab:blue')
ax2.scatter([params[x] for x in resnet_indexes] , [train_accs[x] for x in resnet_indexes], marker='*', s=xscale, c='tab:orange', label='ResNet', alpha=xalpha)
ax2.scatter([params[x] for x in largest_indexes], [train_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax2.set_xlabel('#parameters (MB)', fontsize=LabelSize)
ax2.set_ylabel('train accuracy (%)', fontsize=LabelSize)
ax2.legend(loc=4, fontsize=LegendFontsize)
ax3.scatter(params, test_accs, marker='o', s=0.5, c='tab:blue')
ax3.scatter([params[x] for x in resnet_indexes] , [test_accs[x] for x in resnet_indexes], marker='*', s=xscale, c='tab:orange', label='ResNet', alpha=xalpha)
ax3.scatter([params[x] for x in largest_indexes], [test_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax3.set_xlabel('#parameters (MB)', fontsize=LabelSize)
ax3.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax3.legend(loc=4, fontsize=LegendFontsize)
ax4.scatter(flops, train_accs, marker='o', s=0.5, c='tab:blue')
ax4.scatter([flops[x] for x in resnet_indexes], [train_accs[x] for x in resnet_indexes], marker='*', s=xscale, c='tab:orange', label='ResNet', alpha=xalpha)
ax4.scatter([flops[x] for x in largest_indexes], [train_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax4.set_xlabel('#FLOPs (M)', fontsize=LabelSize)
ax4.set_ylabel('train accuracy (%)', fontsize=LabelSize)
ax4.legend(loc=4, fontsize=LegendFontsize)
ax5.scatter(flops, test_accs, marker='o', s=0.5, c='tab:blue')
ax5.scatter([flops[x] for x in resnet_indexes], [test_accs[x] for x in resnet_indexes], marker='*', s=xscale, c='tab:orange', label='ResNet', alpha=xalpha)
ax5.scatter([flops[x] for x in largest_indexes], [test_accs[x] for x in largest_indexes], marker='x', s=xscale, c='tab:green', label='Largest Candidate', alpha=xalpha)
ax5.set_xlabel('#FLOPs (M)', fontsize=LabelSize)
ax5.set_ylabel('test accuracy (%)', fontsize=LabelSize)
ax5.legend(loc=4, fontsize=LegendFontsize)
save_path = vis_save_dir / 'tss-{:}.png'.format(dataset)
fig.savefig(save_path, dpi=dpi, bbox_inches='tight', format='png')
print ('{:} save into {:}'.format(time_string(), save_path))
plt.close('all')
def test_issue_81_82(api):
results = api.query_by_index(0, 'cifar10')
results = api.query_by_index(0, 'cifar10-valid', hp='200')
print(results.keys())
print(results[888].get_eval('x-valid'))
result_dict = api.get_more_info(index=0, dataset='cifar10-valid', iepoch=11, hp='200', is_random=False)
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='NAS-Bench-X', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--save_dir', type=str, default='output/NAS-BENCH-202', help='Folder to save checkpoints and log.')
parser.add_argument('--check_N', type=int, default=32768, help='For safety.')
# use for train the model
args = parser.parse_args()
api201 = NASBench201API(os.path.join(os.environ['TORCH_HOME'], 'NAS-Bench-201-v1_0-e61699.pth'), verbose=True)
test_issue_81_82(api201)
test_api(api201, False)
api201 = NASBench201API(None, verbose=True)
test_issue_81_82(api201)
visualize_tss_info(api201, 'cifar10', Path('output/vis-nas-bench'))
visualize_tss_info(api201, 'cifar100', Path('output/vis-nas-bench'))
visualize_tss_info(api201, 'ImageNet16-120', Path('output/vis-nas-bench'))
test_api(api201, False)
api301 = NASBench301API(None, verbose=True)
visualize_sss_info(api301, 'cifar10', Path('output/vis-nas-bench'))
visualize_sss_info(api301, 'cifar100', Path('output/vis-nas-bench'))
visualize_sss_info(api301, 'ImageNet16-120', Path('output/vis-nas-bench'))
test_api(api301, True)
# save_dir = '{:}/visual'.format(args.save_dir)

View File

@ -38,7 +38,6 @@ def evaluate(api, weight_dir, data: str, use_12epochs_result: bool):
final_test_accs = OrderedDict({'cifar10': [], 'cifar100': [], 'ImageNet16-120': []})
for idx in range(len(api)):
# info = api.get_more_info(idx, data, use_12epochs_result=use_12epochs_result, is_random=False)
# import pdb; pdb.set_trace()
for key in ['cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120']:
info = api.get_more_info(idx, key, use_12epochs_result=False, is_random=False)
if key == 'cifar10-valid':

View File

@ -0,0 +1,242 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2019.08 #
#####################################################
# python exps/NAS-Bench-201/xshape-collect.py
#####################################################
import os, re, sys, time, argparse, collections
import numpy as np
import torch
from tqdm import tqdm
from pathlib import Path
from collections import defaultdict, OrderedDict
from typing import Dict, Any, Text, List
lib_dir = (Path(__file__).parent / '..' / '..' / 'lib').resolve()
if str(lib_dir) not in sys.path: sys.path.insert(0, str(lib_dir))
from log_utils import AverageMeter, time_string, convert_secs2time
from config_utils import dict2config
# NAS-Bench-201 related module or function
from models import CellStructure, get_cell_based_tiny_net
from nas_201_api import NASBench301API, ArchResults, ResultsCount
from procedures import bench_pure_evaluate as pure_evaluate, get_nas_bench_loaders
def account_one_arch(arch_index: int, arch_str: Text, checkpoints: List[Text], datasets: List[Text]) -> ArchResults:
information = ArchResults(arch_index, arch_str)
for checkpoint_path in checkpoints:
try:
checkpoint = torch.load(checkpoint_path, map_location='cpu')
except:
raise ValueError('This checkpoint failed to be loaded : {:}'.format(checkpoint_path))
used_seed = checkpoint_path.name.split('-')[-1].split('.')[0]
ok_dataset = 0
for dataset in datasets:
if dataset not in checkpoint:
print('Can not find {:} in arch-{:} from {:}'.format(dataset, arch_index, checkpoint_path))
continue
else:
ok_dataset += 1
results = checkpoint[dataset]
assert results['finish-train'], 'This {:} arch seed={:} does not finish train on {:} ::: {:}'.format(arch_index, used_seed, dataset, checkpoint_path)
arch_config = {'name': 'infer.shape.tiny', 'channels': arch_str, 'arch_str': arch_str,
'genotype': results['arch_config']['genotype'],
'class_num': results['arch_config']['num_classes']}
xresult = ResultsCount(dataset, results['net_state_dict'], results['train_acc1es'], results['train_losses'],
results['param'], results['flop'], arch_config, used_seed, results['total_epoch'], None)
xresult.update_train_info(results['train_acc1es'], results['train_acc5es'], results['train_losses'], results['train_times'])
xresult.update_eval(results['valid_acc1es'], results['valid_losses'], results['valid_times'])
information.update(dataset, int(used_seed), xresult)
if ok_dataset < len(datasets): raise ValueError('{:} does find enought data : {:} vs {:}'.format(checkpoint_path, ok_dataset, len(datasets)))
return information
def correct_time_related_info(hp2info: Dict[Text, ArchResults]):
# calibrate the latency based on the number of epochs = 01, since they are trained on the same machine.
x1 = hp2info['01'].get_metrics('cifar10-valid', 'x-valid')['all_time'] / 98
x2 = hp2info['01'].get_metrics('cifar10-valid', 'ori-test')['all_time'] / 40
cifar010_latency = (x1 + x2) / 2
for hp, arch_info in hp2info.items():
arch_info.reset_latency('cifar10-valid', None, cifar010_latency)
arch_info.reset_latency('cifar10', None, cifar010_latency)
# hp2info['01'].get_latency('cifar10')
x1 = hp2info['01'].get_metrics('cifar100', 'ori-test')['all_time'] / 40
x2 = hp2info['01'].get_metrics('cifar100', 'x-test')['all_time'] / 20
x3 = hp2info['01'].get_metrics('cifar100', 'x-valid')['all_time'] / 20
cifar100_latency = (x1 + x2 + x3) / 3
for hp, arch_info in hp2info.items():
arch_info.reset_latency('cifar100', None, cifar100_latency)
x1 = hp2info['01'].get_metrics('ImageNet16-120', 'ori-test')['all_time'] / 24
x2 = hp2info['01'].get_metrics('ImageNet16-120', 'x-test')['all_time'] / 12
x3 = hp2info['01'].get_metrics('ImageNet16-120', 'x-valid')['all_time'] / 12
image_latency = (x1 + x2 + x3) / 3
for hp, arch_info in hp2info.items():
arch_info.reset_latency('ImageNet16-120', None, image_latency)
# CIFAR10 VALID
train_per_epoch_time = list(hp2info['01'].query('cifar10-valid', 777).train_times.values())
train_per_epoch_time = sum(train_per_epoch_time) / len(train_per_epoch_time)
eval_ori_test_time, eval_x_valid_time = [], []
for key, value in hp2info['01'].query('cifar10-valid', 777).eval_times.items():
if key.startswith('ori-test@'):
eval_ori_test_time.append(value)
elif key.startswith('x-valid@'):
eval_x_valid_time.append(value)
else: raise ValueError('-- {:} --'.format(key))
eval_ori_test_time = sum(eval_ori_test_time) / len(eval_ori_test_time)
eval_x_valid_time = sum(eval_x_valid_time) / len(eval_x_valid_time)
for hp, arch_info in hp2info.items():
arch_info.reset_pseudo_train_times('cifar10-valid', None, train_per_epoch_time)
arch_info.reset_pseudo_eval_times('cifar10-valid', None, 'x-valid', eval_x_valid_time)
arch_info.reset_pseudo_eval_times('cifar10-valid', None, 'ori-test', eval_ori_test_time)
# CIFAR10
train_per_epoch_time = list(hp2info['01'].query('cifar10', 777).train_times.values())
train_per_epoch_time = sum(train_per_epoch_time) / len(train_per_epoch_time)
eval_ori_test_time = []
for key, value in hp2info['01'].query('cifar10', 777).eval_times.items():
if key.startswith('ori-test@'):
eval_ori_test_time.append(value)
else: raise ValueError('-- {:} --'.format(key))
eval_ori_test_time = sum(eval_ori_test_time) / len(eval_ori_test_time)
for hp, arch_info in hp2info.items():
arch_info.reset_pseudo_train_times('cifar10', None, train_per_epoch_time)
arch_info.reset_pseudo_eval_times('cifar10', None, 'ori-test', eval_ori_test_time)
# CIFAR100
train_per_epoch_time = list(hp2info['01'].query('cifar100', 777).train_times.values())
train_per_epoch_time = sum(train_per_epoch_time) / len(train_per_epoch_time)
eval_ori_test_time, eval_x_valid_time, eval_x_test_time = [], [], []
for key, value in hp2info['01'].query('cifar100', 777).eval_times.items():
if key.startswith('ori-test@'):
eval_ori_test_time.append(value)
elif key.startswith('x-valid@'):
eval_x_valid_time.append(value)
elif key.startswith('x-test@'):
eval_x_test_time.append(value)
else: raise ValueError('-- {:} --'.format(key))
eval_ori_test_time = sum(eval_ori_test_time) / len(eval_ori_test_time)
eval_x_valid_time = sum(eval_x_valid_time) / len(eval_x_valid_time)
eval_x_test_time = sum(eval_x_test_time) / len(eval_x_test_time)
for hp, arch_info in hp2info.items():
arch_info.reset_pseudo_train_times('cifar100', None, train_per_epoch_time)
arch_info.reset_pseudo_eval_times('cifar100', None, 'x-valid', eval_x_valid_time)
arch_info.reset_pseudo_eval_times('cifar100', None, 'x-test', eval_x_test_time)
arch_info.reset_pseudo_eval_times('cifar100', None, 'ori-test', eval_ori_test_time)
# ImageNet16-120
train_per_epoch_time = list(hp2info['01'].query('ImageNet16-120', 777).train_times.values())
train_per_epoch_time = sum(train_per_epoch_time) / len(train_per_epoch_time)
eval_ori_test_time, eval_x_valid_time, eval_x_test_time = [], [], []
for key, value in hp2info['01'].query('ImageNet16-120', 777).eval_times.items():
if key.startswith('ori-test@'):
eval_ori_test_time.append(value)
elif key.startswith('x-valid@'):
eval_x_valid_time.append(value)
elif key.startswith('x-test@'):
eval_x_test_time.append(value)
else: raise ValueError('-- {:} --'.format(key))
eval_ori_test_time = sum(eval_ori_test_time) / len(eval_ori_test_time)
eval_x_valid_time = sum(eval_x_valid_time) / len(eval_x_valid_time)
eval_x_test_time = sum(eval_x_test_time) / len(eval_x_test_time)
for hp, arch_info in hp2info.items():
arch_info.reset_pseudo_train_times('ImageNet16-120', None, train_per_epoch_time)
arch_info.reset_pseudo_eval_times('ImageNet16-120', None, 'x-valid', eval_x_valid_time)
arch_info.reset_pseudo_eval_times('ImageNet16-120', None, 'x-test', eval_x_test_time)
arch_info.reset_pseudo_eval_times('ImageNet16-120', None, 'ori-test', eval_ori_test_time)
return hp2info
def simplify(save_dir, save_name, nets, total):
hps, seeds = ['01', '12', '90'], set()
for hp in hps:
sub_save_dir = save_dir / 'raw-data-{:}'.format(hp)
ckps = sorted(list(sub_save_dir.glob('arch-*-seed-*.pth')))
seed2names = defaultdict(list)
for ckp in ckps:
parts = re.split('-|\.', ckp.name)
seed2names[parts[3]].append(ckp.name)
print('DIR : {:}'.format(sub_save_dir))
nums = []
for seed, xlist in seed2names.items():
seeds.add(seed)
nums.append(len(xlist))
print(' seed={:}, num={:}'.format(seed, len(xlist)))
# assert len(nets) == total == max(nums), 'there are some missed files : {:} vs {:}'.format(max(nums), total)
print('{:} start simplify the checkpoint.'.format(time_string()))
datasets = ('cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120')
simplify_save_dir, arch2infos, evaluated_indexes = save_dir / save_name, {}, set()
simplify_save_dir.mkdir(parents=True, exist_ok=True)
end_time, arch_time = time.time(), AverageMeter()
# for index, arch_str in enumerate(nets):
for index in tqdm(range(total)):
arch_str = nets[index]
hp2info = OrderedDict()
for hp in hps:
sub_save_dir = save_dir / 'raw-data-{:}'.format(hp)
ckps = [sub_save_dir / 'arch-{:06d}-seed-{:}.pth'.format(index, seed) for seed in seeds]
ckps = [x for x in ckps if x.exists()]
if len(ckps) == 0: raise ValueError('Invalid data : index={:}, hp={:}'.format(index, hp))
arch_info = account_one_arch(index, arch_str, ckps, datasets)
hp2info[hp] = arch_info
hp2info = correct_time_related_info(hp2info)
evaluated_indexes.add(index)
to_save_data = OrderedDict({'01': hp2info['01'].state_dict(),
'12': hp2info['12'].state_dict(),
'90': hp2info['90'].state_dict()})
torch.save(to_save_data, simplify_save_dir / '{:}-FULL.pth'.format(index))
for hp in hps: hp2info[hp].clear_params()
to_save_data = OrderedDict({'01': hp2info['01'].state_dict(),
'12': hp2info['12'].state_dict(),
'90': hp2info['90'].state_dict()})
torch.save(to_save_data, simplify_save_dir / '{:}-SIMPLE.pth'.format(index))
arch2infos[index] = to_save_data
# measure elapsed time
arch_time.update(time.time() - end_time)
end_time = time.time()
need_time = '{:}'.format(convert_secs2time(arch_time.avg * (total-index-1), True))
# print('{:} {:06d}/{:06d} : still need {:}'.format(time_string(), index, total, need_time))
print('{:} {:} done.'.format(time_string(), save_name))
final_infos = {'meta_archs' : nets,
'total_archs': total,
'arch2infos' : arch2infos,
'evaluated_indexes': evaluated_indexes}
save_file_name = save_dir / '{:}.pth'.format(save_name)
torch.save(final_infos, save_file_name)
print ('Save {:} / {:} architecture results into {:}.'.format(len(evaluated_indexes), total, save_file_name))
def traverse_net(candidates: List[int], N: int):
nets = ['']
for i in range(N):
new_nets = []
for net in nets:
for C in candidates:
new_nets.append(str(C) if net == '' else "{:}:{:}".format(net,C))
nets = new_nets
return nets
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='NAS-BENCH-201', formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument('--base_save_dir', type=str, default='./output/NAS-BENCH-202', help='The base-name of folder to save checkpoints and log.')
parser.add_argument('--candidateC' , type=int, nargs='+', default=[8, 16, 24, 32, 40, 48, 56, 64], help='.')
parser.add_argument('--num_layers' , type=int, default=5, help='The number of layers in a network.')
parser.add_argument('--check_N' , type=int, default=32768, help='For safety.')
parser.add_argument('--save_name' , type=str, default='simplify', help='The save directory.')
args = parser.parse_args()
nets = traverse_net(args.candidateC, args.num_layers)
if len(nets) != args.check_N: raise ValueError('Pre-num-check failed : {:} vs {:}'.format(len(nets), args.check_N))
save_dir = Path(args.base_save_dir)
simplify(save_dir, args.save_name, nets, args.check_N)

View File

@ -22,7 +22,7 @@ from log_utils import Logger, AverageMeter, time_string, convert_secs2time
def obtain_valid_ckp(save_dir: Text, total: int):
possible_seeds = [777, 888]
possible_seeds = [777, 888, 999]
seed2ckps = defaultdict(list)
miss2ckps = defaultdict(list)
for i in range(total):
@ -33,7 +33,7 @@ def obtain_valid_ckp(save_dir: Text, total: int):
else:
miss2ckps[seed].append(i)
for seed, xlist in seed2ckps.items():
print('[{:}] [seed={:}] has {:}/{:}'.format(save_dir, seed, len(xlist), total))
print('[{:}] [seed={:}] has {:5d}/{:5d} | miss {:5d}/{:5d}'.format(save_dir, seed, len(xlist), total, total-len(xlist), total))
return dict(seed2ckps), dict(miss2ckps)

View File

@ -65,7 +65,7 @@ class MyWorker(Worker):
assert len(self.seen_archs) > 0
best_index, best_acc = -1, None
for arch_index in self.seen_archs:
info = self._nas_bench.get_more_info(arch_index, self._dataname, None, True, True)
info = self._nas_bench.get_more_info(arch_index, self._dataname, None, hp='200', is_random=True)
vacc = info['valid-accuracy']
if best_acc is None or best_acc < vacc:
best_acc = vacc
@ -77,7 +77,7 @@ class MyWorker(Worker):
start_time = time.time()
structure = self.convert_func( config )
arch_index = self._nas_bench.query_index_by_arch( structure )
info = self._nas_bench.get_more_info(arch_index, self._dataname, None, True, True)
info = self._nas_bench.get_more_info(arch_index, self._dataname, None, hp='200', is_random=True)
cur_time = info['train-all-time'] + info['valid-per-time']
cur_vacc = info['valid-accuracy']
self.real_cost_time += (time.time() - start_time)

View File

@ -42,7 +42,7 @@ def train_and_eval(arch, nas_bench, extra_info, dataname='cifar10-valid', use_01
if use_012_epoch_training and nas_bench is not None:
arch_index = nas_bench.query_index_by_arch( arch )
assert arch_index >= 0, 'can not find this arch : {:}'.format(arch)
info = nas_bench.get_more_info(arch_index, dataname, None, True)
info = nas_bench.get_more_info(arch_index, dataname, iepoch=None, hp='12', is_random=True)
valid_acc, time_cost = info['valid-accuracy'], info['train-all-time'] + info['valid-per-time']
#_, valid_acc = info.get_metrics('cifar10-valid', 'x-valid' , 25, True) # use the validation accuracy after 25 training epochs
elif not use_012_epoch_training and nas_bench is not None:
@ -51,10 +51,10 @@ def train_and_eval(arch, nas_bench, extra_info, dataname='cifar10-valid', use_01
# It did return values for cifar100 and ImageNet16-120, but it has some potential issues. (Please email me for more details)
arch_index, nepoch = nas_bench.query_index_by_arch( arch ), 25
assert arch_index >= 0, 'can not find this arch : {:}'.format(arch)
xoinfo = nas_bench.get_more_info(arch_index, 'cifar10-valid', None, True)
xocost = nas_bench.get_cost_info(arch_index, 'cifar10-valid', False)
info = nas_bench.get_more_info(arch_index, dataname, nepoch, False, True) # use the validation accuracy after 25 training epochs, which is used in our ICLR submission (not the camera ready).
cost = nas_bench.get_cost_info(arch_index, dataname, False)
xoinfo = nas_bench.get_more_info(arch_index, 'cifar10-valid', iepoch=None, hp='12')
xocost = nas_bench.get_cost_info(arch_index, 'cifar10-valid', hp='200')
info = nas_bench.get_more_info(arch_index, dataname, nepoch, hp='200', True) # use the validation accuracy after 25 training epochs, which is used in our ICLR submission (not the camera ready).
cost = nas_bench.get_cost_info(arch_index, dataname, hp='200')
# The following codes are used to estimate the time cost.
# When we build NAS-Bench-201, architectures are trained on different machines and we can not use that time record.
# When we create checkpoints for converged_LR, we run all experiments on 1080Ti, and thus the time for each architecture can be fairly compared.

View File

@ -0,0 +1,20 @@
#
# exps/experimental/test-api.py
#
import sys, time, random, argparse
from copy import deepcopy
import torchvision.models as models
from pathlib import Path
lib_dir = (Path(__file__).parent / '..' / '..' / 'lib').resolve()
if str(lib_dir) not in sys.path: sys.path.insert(0, str(lib_dir))
from nas_201_api import NASBench201API as API
def main():
api = API(None)
info = api.get_more_info(100, 'cifar100', 199, False, True)
if __name__ == '__main__':
main()

View File

@ -112,6 +112,7 @@ class Structure:
@staticmethod
def str2structure(xstr):
if isinstance(xstr, Structure): return xstr
assert isinstance(xstr, str), 'must take string (not {:}) as input'.format(type(xstr))
nodestrs = xstr.split('+')
genotypes = []

View File

@ -1,9 +1,11 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2019.08 #
#####################################################
from .api import NASBench201API
from .api import ArchResults, ResultsCount
from .api_utils import ArchResults, ResultsCount
from .api_201 import NASBench201API
from .api_301 import NASBench301API
# NAS_BENCH_201_API_VERSION="v1.1" # [2020.02.25]
# NAS_BENCH_201_API_VERSION="v1.2" # [2020.03.09]
NAS_BENCH_201_API_VERSION="v1.3" # [2020.03.16]
# NAS_BENCH_201_API_VERSION="v1.3" # [2020.03.16]
NAS_BENCH_201_API_VERSION="v2.0" # [2020.06.30]

View File

@ -1,916 +0,0 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2019.08 #
############################################################################################
# NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, ICLR 2020 #
############################################################################################
# The history of benchmark files:
# [2020.02.25] NAS-Bench-201-v1_0-e61699.pth : 6219 architectures are trained once, 1621 architectures are trained twice, 7785 architectures are trained three times. `LESS` only supports CIFAR10-VALID.
# [2020.03.16] NAS-Bench-201-v1_1-096897.pth : 2225 architectures are trained once, 5439 archiitectures are trained twice, 7961 architectures are trained three times on all training sets. For the hyper-parameters with the total epochs of 12, each model is trained on CIFAR-10, CIFAR-100, ImageNet16-120 once, and is trained on CIFAR-10-VALID twice.
#
# I'm still actively enhancing this benchmark. Please feel free to contact me if you have any question w.r.t. NAS-Bench-201.
#
import os, copy, random, torch, numpy as np
from pathlib import Path
from typing import List, Text, Union, Dict
from collections import OrderedDict, defaultdict
def print_information(information, extra_info=None, show=False):
dataset_names = information.get_dataset_names()
strings = [information.arch_str, 'datasets : {:}, extra-info : {:}'.format(dataset_names, extra_info)]
def metric2str(loss, acc):
return 'loss = {:.3f}, top1 = {:.2f}%'.format(loss, acc)
for ida, dataset in enumerate(dataset_names):
metric = information.get_compute_costs(dataset)
flop, param, latency = metric['flops'], metric['params'], metric['latency']
str1 = '{:14s} FLOP={:6.2f} M, Params={:.3f} MB, latency={:} ms.'.format(dataset, flop, param, '{:.2f}'.format(latency*1000) if latency is not None and latency > 0 else None)
train_info = information.get_metrics(dataset, 'train')
if dataset == 'cifar10-valid':
valid_info = information.get_metrics(dataset, 'x-valid')
str2 = '{:14s} train : [{:}], valid : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(valid_info['loss'], valid_info['accuracy']))
elif dataset == 'cifar10':
test__info = information.get_metrics(dataset, 'ori-test')
str2 = '{:14s} train : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
else:
valid_info = information.get_metrics(dataset, 'x-valid')
test__info = information.get_metrics(dataset, 'x-test')
str2 = '{:14s} train : [{:}], valid : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(valid_info['loss'], valid_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
strings += [str1, str2]
if show: print('\n'.join(strings))
return strings
"""
This is the class for API of NAS-Bench-201.
"""
class NASBench201API(object):
""" The initialization function that takes the dataset file path (or a dict loaded from that path) as input. """
def __init__(self, file_path_or_dict: Union[Text, Dict], verbose: bool=True):
self.filename = None
if isinstance(file_path_or_dict, str) or isinstance(file_path_or_dict, Path):
file_path_or_dict = str(file_path_or_dict)
if verbose: print('try to create the NAS-Bench-201 api from {:}'.format(file_path_or_dict))
assert os.path.isfile(file_path_or_dict), 'invalid path : {:}'.format(file_path_or_dict)
self.filename = Path(file_path_or_dict).name
file_path_or_dict = torch.load(file_path_or_dict, map_location='cpu')
elif isinstance(file_path_or_dict, dict):
file_path_or_dict = copy.deepcopy( file_path_or_dict )
else: raise ValueError('invalid type : {:} not in [str, dict]'.format(type(file_path_or_dict)))
assert isinstance(file_path_or_dict, dict), 'It should be a dict instead of {:}'.format(type(file_path_or_dict))
self.verbose = verbose # [TODO] a flag indicating whether to print more logs
keys = ('meta_archs', 'arch2infos', 'evaluated_indexes')
for key in keys: assert key in file_path_or_dict, 'Can not find key[{:}] in the dict'.format(key)
self.meta_archs = copy.deepcopy( file_path_or_dict['meta_archs'] )
self.arch2infos_less = OrderedDict()
self.arch2infos_full = OrderedDict()
for xkey in sorted(list(file_path_or_dict['arch2infos'].keys())):
all_info = file_path_or_dict['arch2infos'][xkey]
self.arch2infos_less[xkey] = ArchResults.create_from_state_dict( all_info['less'] )
self.arch2infos_full[xkey] = ArchResults.create_from_state_dict( all_info['full'] )
self.evaluated_indexes = sorted(list(file_path_or_dict['evaluated_indexes']))
self.archstr2index = {}
for idx, arch in enumerate(self.meta_archs):
#assert arch.tostr() not in self.archstr2index, 'This [{:}]-th arch {:} already in the dict ({:}).'.format(idx, arch, self.archstr2index[arch.tostr()])
assert arch not in self.archstr2index, 'This [{:}]-th arch {:} already in the dict ({:}).'.format(idx, arch, self.archstr2index[arch])
self.archstr2index[ arch ] = idx
def __getitem__(self, index: int):
return copy.deepcopy( self.meta_archs[index] )
def __len__(self):
return len(self.meta_archs)
def __repr__(self):
return ('{name}({num}/{total} architectures, file={filename})'.format(name=self.__class__.__name__, num=len(self.evaluated_indexes), total=len(self.meta_archs), filename=self.filename))
def random(self):
"""Return a random index of all architectures."""
return random.randint(0, len(self.meta_archs)-1)
# This function is used to query the index of an architecture in the search space.
# The input arch can be an architecture string such as '|nor_conv_3x3~0|+|nor_conv_3x3~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_3x3~1|skip_connect~2|'
# or an instance that has the 'tostr' function that can generate the architecture string.
# This function will return the index.
# If return -1, it means this architecture is not in the search space.
# Otherwise, it will return an int in [0, the-number-of-candidates-in-the-search-space).
def query_index_by_arch(self, arch):
if isinstance(arch, str):
if arch in self.archstr2index: arch_index = self.archstr2index[ arch ]
else : arch_index = -1
elif hasattr(arch, 'tostr'):
if arch.tostr() in self.archstr2index: arch_index = self.archstr2index[ arch.tostr() ]
else : arch_index = -1
else: arch_index = -1
return arch_index
def reload(self, archive_root: Text, index: int):
"""Overwrite all information of the 'index'-th architecture in the search space.
It will load its data from 'archive_root'.
"""
assert os.path.isdir(archive_root), 'invalid directory : {:}'.format(archive_root)
xfile_path = os.path.join(archive_root, '{:06d}-FULL.pth'.format(index))
assert 0 <= index < len(self.meta_archs), 'invalid index of {:}'.format(index)
assert os.path.isfile(xfile_path), 'invalid data path : {:}'.format(xfile_path)
xdata = torch.load(xfile_path, map_location='cpu')
assert isinstance(xdata, dict) and 'full' in xdata and 'less' in xdata, 'invalid format of data in {:}'.format(xfile_path)
if index in self.arch2infos_less: del self.arch2infos_less[index]
if index in self.arch2infos_full: del self.arch2infos_full[index]
self.arch2infos_less[index] = ArchResults.create_from_state_dict( xdata['less'] )
self.arch2infos_full[index] = ArchResults.create_from_state_dict( xdata['full'] )
def clear_params(self, index: int, use_12epochs_result: Union[bool, None]):
"""Remove the architecture's weights to save memory.
:arg
index: the index of the target architecture
use_12epochs_result: a flag to controll how to clear the parameters.
-- None: clear all the weights in both `less` and `full`, which indicates the training hyper-parameters.
-- True: clear all the weights in arch2infos_less, which by default is 12-epoch-training result.
-- False: clear all the weights in arch2infos_full, which by default is 200-epoch-training result.
"""
if use_12epochs_result is None:
self.arch2infos_less[index].clear_params()
self.arch2infos_full[index].clear_params()
else:
if use_12epochs_result: arch2infos = self.arch2infos_less
else : arch2infos = self.arch2infos_full
arch2infos[index].clear_params()
# This function is used to query the information of a specific archiitecture
# 'arch' can be an architecture index or an architecture string
# When use_12epochs_result=True, the hyper-parameters used to train a model are in 'configs/nas-benchmark/CIFAR.config'
# When use_12epochs_result=False, the hyper-parameters used to train a model are in 'configs/nas-benchmark/LESS.config'
# The difference between these two configurations are the number of training epochs, which is 200 in CIFAR.config and 12 in LESS.config.
def query_by_arch(self, arch, use_12epochs_result=False):
if isinstance(arch, int):
arch_index = arch
else:
arch_index = self.query_index_by_arch(arch)
if arch_index == -1: return None # the following two lines are used to support few training epochs
if use_12epochs_result: arch2infos = self.arch2infos_less
else : arch2infos = self.arch2infos_full
if arch_index in arch2infos:
strings = print_information(arch2infos[ arch_index ], 'arch-index={:}'.format(arch_index))
return '\n'.join(strings)
else:
print ('Find this arch-index : {:}, but this arch is not evaluated.'.format(arch_index))
return None
# This 'query_by_index' function is used to query information with the training of 12 epochs or 200 epochs.
# ------
# If use_12epochs_result=True, we train the model by 12 epochs (see config in configs/nas-benchmark/LESS.config)
# If use_12epochs_result=False, we train the model by 200 epochs (see config in configs/nas-benchmark/CIFAR.config)
# ------
# If dataname is None, return the ArchResults
# else, return a dict with all trials on that dataset (the key is the seed)
# Options are 'cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120'.
# -- cifar10-valid : training the model on the CIFAR-10 training set.
# -- cifar10 : training the model on the CIFAR-10 training + validation set.
# -- cifar100 : training the model on the CIFAR-100 training set.
# -- ImageNet16-120 : training the model on the ImageNet16-120 training set.
def query_by_index(self, arch_index: int, dataname: Union[None, Text] = None,
use_12epochs_result: bool = False):
if use_12epochs_result: basestr, arch2infos = '12epochs' , self.arch2infos_less
else : basestr, arch2infos = '200epochs', self.arch2infos_full
assert arch_index in arch2infos, 'arch_index [{:}] does not in arch2info with {:}'.format(arch_index, basestr)
archInfo = copy.deepcopy( arch2infos[ arch_index ] )
if dataname is None: return archInfo
else:
assert dataname in archInfo.get_dataset_names(), 'invalid dataset-name : {:}'.format(dataname)
info = archInfo.query(dataname)
return info
def query_meta_info_by_index(self, arch_index, use_12epochs_result=False):
if use_12epochs_result: basestr, arch2infos = '12epochs' , self.arch2infos_less
else : basestr, arch2infos = '200epochs', self.arch2infos_full
assert arch_index in arch2infos, 'arch_index [{:}] does not in arch2info with {:}'.format(arch_index, basestr)
archInfo = copy.deepcopy( arch2infos[ arch_index ] )
return archInfo
def find_best(self, dataset, metric_on_set, FLOP_max=None, Param_max=None, use_12epochs_result=False):
"""Find the architecture with the highest accuracy based on some constraints."""
if use_12epochs_result: basestr, arch2infos = '12epochs' , self.arch2infos_less
else : basestr, arch2infos = '200epochs', self.arch2infos_full
best_index, highest_accuracy = -1, None
for i, idx in enumerate(self.evaluated_indexes):
info = arch2infos[idx].get_compute_costs(dataset)
flop, param, latency = info['flops'], info['params'], info['latency']
if FLOP_max is not None and flop > FLOP_max : continue
if Param_max is not None and param > Param_max: continue
xinfo = arch2infos[idx].get_metrics(dataset, metric_on_set)
loss, accuracy = xinfo['loss'], xinfo['accuracy']
if best_index == -1:
best_index, highest_accuracy = idx, accuracy
elif highest_accuracy < accuracy:
best_index, highest_accuracy = idx, accuracy
return best_index, highest_accuracy
def arch(self, index: int):
"""Return the topology structure of the `index`-th architecture."""
assert 0 <= index < len(self.meta_archs), 'invalid index : {:} vs. {:}.'.format(index, len(self.meta_archs))
return copy.deepcopy(self.meta_archs[index])
def get_net_param(self, index, dataset, seed, use_12epochs_result=False):
"""
This function is used to obtain the trained weights of the `index`-th architecture on `dataset` with the seed of `seed`
Args [seed]:
-- None : return a dict containing the trained weights of all trials, where each key is a seed and its corresponding value is the weights.
-- a interger : return the weights of a specific trial, whose seed is this interger.
Args [use_12epochs_result]:
-- True : train the model by 12 epochs
-- False : train the model by 200 epochs
"""
if use_12epochs_result: arch2infos = self.arch2infos_less
else: arch2infos = self.arch2infos_full
arch_result = arch2infos[index]
return arch_result.get_net_param(dataset, seed)
def get_net_config(self, index: int, dataset: Text):
"""
This function is used to obtain the configuration for the `index`-th architecture on `dataset`.
Args [dataset] (4 possible options):
-- cifar10-valid : training the model on the CIFAR-10 training set.
-- cifar10 : training the model on the CIFAR-10 training + validation set.
-- cifar100 : training the model on the CIFAR-100 training set.
-- ImageNet16-120 : training the model on the ImageNet16-120 training set.
This function will return a dict.
========= Some examlpes for using this function:
config = api.get_net_config(128, 'cifar10')
"""
archresult = self.arch2infos_full[index]
all_results = archresult.query(dataset, None)
if len(all_results) == 0: raise ValueError('can not find one valid trial for the {:}-th architecture on {:}'.format(index, dataset))
for seed, result in all_results.items():
return result.get_config(None)
#print ('SEED [{:}] : {:}'.format(seed, result))
raise ValueError('Impossible to reach here!')
def get_cost_info(self, index: int, dataset: Text, use_12epochs_result: bool = False) -> Dict[Text, float]:
"""To obtain the cost metric for the `index`-th architecture on a dataset."""
if use_12epochs_result: arch2infos = self.arch2infos_less
else: arch2infos = self.arch2infos_full
arch_result = arch2infos[index]
return arch_result.get_compute_costs(dataset)
def get_latency(self, index: int, dataset: Text, use_12epochs_result: bool = False) -> float:
"""
To obtain the latency of the network (by default it will return the latency with the batch size of 256).
:param index: the index of the target architecture
:param dataset: the dataset name (cifar10-valid, cifar10, cifar100, ImageNet16-120)
:return: return a float value in seconds
"""
cost_dict = self.get_cost_info(index, dataset, use_12epochs_result)
return cost_dict['latency']
# obtain the metric for the `index`-th architecture
# `dataset` indicates the dataset:
# 'cifar10-valid' : using the proposed train set of CIFAR-10 as the training set
# 'cifar10' : using the proposed train+valid set of CIFAR-10 as the training set
# 'cifar100' : using the proposed train set of CIFAR-100 as the training set
# 'ImageNet16-120' : using the proposed train set of ImageNet-16-120 as the training set
# `iepoch` indicates the index of training epochs from 0 to 11/199.
# When iepoch=None, it will return the metric for the last training epoch
# When iepoch=11, it will return the metric for the 11-th training epoch (starting from 0)
# `use_12epochs_result` indicates different hyper-parameters for training
# When use_12epochs_result=True, it trains the network with 12 epochs and the LR decayed from 0.1 to 0 within 12 epochs
# When use_12epochs_result=False, it trains the network with 200 epochs and the LR decayed from 0.1 to 0 within 200 epochs
# `is_random`
# When is_random=True, the performance of a random architecture will be returned
# When is_random=False, the performanceo of all trials will be averaged.
def get_more_info(self, index: int, dataset, iepoch=None, use_12epochs_result=False, is_random=True):
if use_12epochs_result: basestr, arch2infos = '12epochs' , self.arch2infos_less
else : basestr, arch2infos = '200epochs', self.arch2infos_full
archresult = arch2infos[index]
# if randomly select one trial, select the seed at first
if isinstance(is_random, bool) and is_random:
seeds = archresult.get_dataset_seeds(dataset)
is_random = random.choice(seeds)
# collect the training information
train_info = archresult.get_metrics(dataset, 'train', iepoch=iepoch, is_random=is_random)
total = train_info['iepoch'] + 1
xinfo = {'train-loss' : train_info['loss'],
'train-accuracy': train_info['accuracy'],
'train-per-time': train_info['all_time'] / total,
'train-all-time': train_info['all_time']}
# collect the evaluation information
if dataset == 'cifar10-valid':
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
try:
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
valtest_info = None
else:
try: # collect results on the proposed test set
if dataset == 'cifar10':
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
test_info = archresult.get_metrics(dataset, 'x-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
try: # collect results on the proposed validation set
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
except:
valid_info = None
try:
if dataset != 'cifar10':
valtest_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
valtest_info = None
except:
valtest_info = None
if valid_info is not None:
xinfo['valid-loss'] = valid_info['loss']
xinfo['valid-accuracy'] = valid_info['accuracy']
xinfo['valid-per-time'] = valid_info['all_time'] / total
xinfo['valid-all-time'] = valid_info['all_time']
if test_info is not None:
xinfo['test-loss'] = test_info['loss']
xinfo['test-accuracy'] = test_info['accuracy']
xinfo['test-per-time'] = test_info['all_time'] / total
xinfo['test-all-time'] = test_info['all_time']
if valtest_info is not None:
xinfo['valtest-loss'] = valtest_info['loss']
xinfo['valtest-accuracy'] = valtest_info['accuracy']
xinfo['valtest-per-time'] = valtest_info['all_time'] / total
xinfo['valtest-all-time'] = valtest_info['all_time']
return xinfo
""" # The following logic is deprecated after March 15 2020, where the benchmark file upgrades from NAS-Bench-201-v1_0-e61699.pth to NAS-Bench-201-v1_1-096897.pth.
def get_more_info(self, index: int, dataset, iepoch=None, use_12epochs_result=False, is_random=True):
if use_12epochs_result: basestr, arch2infos = '12epochs' , self.arch2infos_less
else : basestr, arch2infos = '200epochs', self.arch2infos_full
archresult = arch2infos[index]
# if randomly select one trial, select the seed at first
if isinstance(is_random, bool) and is_random:
seeds = archresult.get_dataset_seeds(dataset)
is_random = random.choice(seeds)
if dataset == 'cifar10-valid':
train_info = archresult.get_metrics(dataset, 'train' , iepoch=iepoch, is_random=is_random)
valid_info = archresult.get_metrics(dataset, 'x-valid' , iepoch=iepoch, is_random=is_random)
try:
test__info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
except:
test__info = None
total = train_info['iepoch'] + 1
xifo = {'train-loss' : train_info['loss'],
'train-accuracy': train_info['accuracy'],
'train-per-time': None if train_info['all_time'] is None else train_info['all_time'] / total,
'train-all-time': train_info['all_time'],
'valid-loss' : valid_info['loss'],
'valid-accuracy': valid_info['accuracy'],
'valid-all-time': valid_info['all_time'],
'valid-per-time': None if valid_info['all_time'] is None else valid_info['all_time'] / total}
if test__info is not None:
xifo['test-loss'] = test__info['loss']
xifo['test-accuracy'] = test__info['accuracy']
return xifo
else:
train_info = archresult.get_metrics(dataset, 'train' , iepoch=iepoch, is_random=is_random)
try:
if dataset == 'cifar10':
test__info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
test__info = archresult.get_metrics(dataset, 'x-test', iepoch=iepoch, is_random=is_random)
except:
test__info = None
try:
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
except:
valid_info = None
try:
est_valid_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
except:
est_valid_info = None
xifo = {'train-loss' : train_info['loss'],
'train-accuracy': train_info['accuracy']}
if test__info is not None:
xifo['test-loss'] = test__info['loss'],
xifo['test-accuracy'] = test__info['accuracy']
if valid_info is not None:
xifo['valid-loss'] = valid_info['loss']
xifo['valid-accuracy'] = valid_info['accuracy']
if est_valid_info is not None:
xifo['est-valid-loss'] = est_valid_info['loss']
xifo['est-valid-accuracy'] = est_valid_info['accuracy']
return xifo
"""
def show(self, index: int = -1) -> None:
"""
This function will print the information of a specific (or all) architecture(s).
:param index: If the index < 0: it will loop for all architectures and print their information one by one.
else: it will print the information of the 'index'-th archiitecture.
:return: nothing
"""
if index < 0: # show all architectures
print(self)
for i, idx in enumerate(self.evaluated_indexes):
print('\n' + '-' * 10 + ' The ({:5d}/{:5d}) {:06d}-th architecture! '.format(i, len(self.evaluated_indexes), idx) + '-'*10)
print('arch : {:}'.format(self.meta_archs[idx]))
strings = print_information(self.arch2infos_full[idx])
print('>' * 40 + ' {:03d} epochs '.format(self.arch2infos_full[idx].get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
strings = print_information(self.arch2infos_less[idx])
print('>' * 40 + ' {:03d} epochs '.format(self.arch2infos_less[idx].get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
print('<' * 40 + '------------' + '<' * 40)
else:
if 0 <= index < len(self.meta_archs):
if index not in self.evaluated_indexes: print('The {:}-th architecture has not been evaluated or not saved.'.format(index))
else:
strings = print_information(self.arch2infos_full[index])
print('>' * 40 + ' {:03d} epochs '.format(self.arch2infos_full[index].get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
strings = print_information(self.arch2infos_less[index])
print('>' * 40 + ' {:03d} epochs '.format(self.arch2infos_less[index].get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
print('<' * 40 + '------------' + '<' * 40)
else:
print('This index ({:}) is out of range (0~{:}).'.format(index, len(self.meta_archs)))
def statistics(self, dataset: Text, use_12epochs_result: bool) -> Dict[int, int]:
"""
This function will count the number of total trials.
"""
valid_datasets = ['cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120']
if dataset not in valid_datasets:
raise ValueError('{:} not in {:}'.format(dataset, valid_datasets))
if use_12epochs_result: arch2infos = self.arch2infos_less
else : arch2infos = self.arch2infos_full
nums = defaultdict(lambda: 0)
for index in range(len(self)):
archInfo = arch2infos[index]
dataset_seed = archInfo.dataset_seed
if dataset not in dataset_seed:
nums[0] += 1
else:
nums[len(dataset_seed[dataset])] += 1
return dict(nums)
@staticmethod
def str2lists(arch_str: Text) -> List[tuple]:
"""
This function shows how to read the string-based architecture encoding.
It is the same as the `str2structure` func in `AutoDL-Projects/lib/models/cell_searchs/genotypes.py`
:param
arch_str: the input is a string indicates the architecture topology, such as
|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|
:return: a list of tuple, contains multiple (op, input_node_index) pairs.
:usage
arch = api.str2lists( '|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|' )
print ('there are {:} nodes in this arch'.format(len(arch)+1)) # arch is a list
for i, node in enumerate(arch):
print('the {:}-th node is the sum of these {:} nodes with op: {:}'.format(i+1, len(node), node))
"""
node_strs = arch_str.split('+')
genotypes = []
for i, node_str in enumerate(node_strs):
inputs = list(filter(lambda x: x != '', node_str.split('|')))
for xinput in inputs: assert len(xinput.split('~')) == 2, 'invalid input length : {:}'.format(xinput)
inputs = ( xi.split('~') for xi in inputs )
input_infos = tuple( (op, int(IDX)) for (op, IDX) in inputs)
genotypes.append( input_infos )
return genotypes
@staticmethod
def str2matrix(arch_str: Text,
search_space: List[Text] = ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']) -> np.ndarray:
"""
This func shows how to convert the string-based architecture encoding to the encoding strategy in NAS-Bench-101.
:param
arch_str: the input is a string indicates the architecture topology, such as
|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|
search_space: a list of operation string, the default list is the search space for NAS-Bench-201
the default value should be be consistent with this line https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/models/cell_operations.py#L24
:return
the numpy matrix (2-D np.ndarray) representing the DAG of this architecture topology
:usage
matrix = api.str2matrix( '|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|' )
This matrix is 4-by-4 matrix representing a cell with 4 nodes (only the lower left triangle is useful).
[ [0, 0, 0, 0], # the first line represents the input (0-th) node
[2, 0, 0, 0], # the second line represents the 1-st node, is calculated by 2-th-op( 0-th-node )
[0, 0, 0, 0], # the third line represents the 2-nd node, is calculated by 0-th-op( 0-th-node ) + 0-th-op( 1-th-node )
[0, 0, 1, 0] ] # the fourth line represents the 3-rd node, is calculated by 0-th-op( 0-th-node ) + 0-th-op( 1-th-node ) + 1-th-op( 2-th-node )
In NAS-Bench-201 search space, 0-th-op is 'none', 1-th-op is 'skip_connect',
2-th-op is 'nor_conv_1x1', 3-th-op is 'nor_conv_3x3', 4-th-op is 'avg_pool_3x3'.
:(NOTE)
If a node has two input-edges from the same node, this function does not work. One edge will be overlapped.
"""
node_strs = arch_str.split('+')
num_nodes = len(node_strs) + 1
matrix = np.zeros((num_nodes, num_nodes))
for i, node_str in enumerate(node_strs):
inputs = list(filter(lambda x: x != '', node_str.split('|')))
for xinput in inputs: assert len(xinput.split('~')) == 2, 'invalid input length : {:}'.format(xinput)
for xi in inputs:
op, idx = xi.split('~')
if op not in search_space: raise ValueError('this op ({:}) is not in {:}'.format(op, search_space))
op_idx, node_idx = search_space.index(op), int(idx)
matrix[i+1, node_idx] = op_idx
return matrix
class ArchResults(object):
def __init__(self, arch_index, arch_str):
self.arch_index = int(arch_index)
self.arch_str = copy.deepcopy(arch_str)
self.all_results = dict()
self.dataset_seed = dict()
self.clear_net_done = False
def get_compute_costs(self, dataset):
x_seeds = self.dataset_seed[dataset]
results = [self.all_results[ (dataset, seed) ] for seed in x_seeds]
flops = [result.flop for result in results]
params = [result.params for result in results]
latencies = [result.get_latency() for result in results]
latencies = [x for x in latencies if x > 0]
mean_latency = np.mean(latencies) if len(latencies) > 0 else None
time_infos = defaultdict(list)
for result in results:
time_info = result.get_times()
for key, value in time_info.items(): time_infos[key].append( value )
info = {'flops' : np.mean(flops),
'params' : np.mean(params),
'latency': mean_latency}
for key, value in time_infos.items():
if len(value) > 0 and value[0] is not None:
info[key] = np.mean(value)
else: info[key] = None
return info
def get_metrics(self, dataset, setname, iepoch=None, is_random=False):
"""
This `get_metrics` function is used to obtain obtain the loss, accuracy, etc information on a specific dataset.
If not specify, each set refer to the proposed split in NAS-Bench-201 paper.
If some args return None or raise error, then it is not avaliable.
========================================
Args [dataset] (4 possible options):
-- cifar10-valid : training the model on the CIFAR-10 training set.
-- cifar10 : training the model on the CIFAR-10 training + validation set.
-- cifar100 : training the model on the CIFAR-100 training set.
-- ImageNet16-120 : training the model on the ImageNet16-120 training set.
Args [setname] (each dataset has different setnames):
-- When dataset = cifar10-valid, you can use 'train', 'x-valid', 'ori-test'
------ 'train' : the metric on the training set.
------ 'x-valid' : the metric on the validation set.
------ 'ori-test' : the metric on the test set.
-- When dataset = cifar10, you can use 'train', 'ori-test'.
------ 'train' : the metric on the training + validation set.
------ 'ori-test' : the metric on the test set.
-- When dataset = cifar100 or ImageNet16-120, you can use 'train', 'ori-test', 'x-valid', 'x-test'
------ 'train' : the metric on the training set.
------ 'x-valid' : the metric on the validation set.
------ 'x-test' : the metric on the test set.
------ 'ori-test' : the metric on the validation + test set.
Args [iepoch] (None or an integer in [0, the-number-of-total-training-epochs)
------ None : return the metric after the last training epoch.
------ an integer i : return the metric after the i-th training epoch.
Args [is_random]:
------ True : return the metric of a randomly selected trial.
------ False : return the averaged metric of all avaliable trials.
------ an integer indicating the 'seed' value : return the metric of a specific trial (whose random seed is 'is_random').
"""
x_seeds = self.dataset_seed[dataset]
results = [self.all_results[ (dataset, seed) ] for seed in x_seeds]
infos = defaultdict(list)
for result in results:
if setname == 'train':
info = result.get_train(iepoch)
else:
info = result.get_eval(setname, iepoch)
for key, value in info.items(): infos[key].append( value )
return_info = dict()
if isinstance(is_random, bool) and is_random: # randomly select one
index = random.randint(0, len(results)-1)
for key, value in infos.items(): return_info[key] = value[index]
elif isinstance(is_random, bool) and not is_random: # average
for key, value in infos.items():
if len(value) > 0 and value[0] is not None:
return_info[key] = np.mean(value)
else: return_info[key] = None
elif isinstance(is_random, int): # specify the seed
if is_random not in x_seeds: raise ValueError('can not find random seed ({:}) from {:}'.format(is_random, x_seeds))
index = x_seeds.index(is_random)
for key, value in infos.items(): return_info[key] = value[index]
else:
raise ValueError('invalid value for is_random: {:}'.format(is_random))
return return_info
def show(self, is_print=False):
return print_information(self, None, is_print)
def get_dataset_names(self):
return list(self.dataset_seed.keys())
def get_dataset_seeds(self, dataset):
return copy.deepcopy( self.dataset_seed[dataset] )
def get_net_param(self, dataset: Text, seed: Union[None, int] =None):
"""
This function will return the trained network's weights on the 'dataset'.
:arg
dataset: one of 'cifar10-valid', 'cifar10', 'cifar100', and 'ImageNet16-120'.
seed: an integer indicates the seed value or None that indicates returing all trials.
"""
if seed is None:
x_seeds = self.dataset_seed[dataset]
return {seed: self.all_results[(dataset, seed)].get_net_param() for seed in x_seeds}
else:
return self.all_results[(dataset, seed)].get_net_param()
def reset_latency(self, dataset: Text, seed: Union[None, Text], latency: float) -> None:
"""This function is used to reset the latency in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].update_latency([latency])
else:
self.all_results[(dataset, seed)].update_latency([latency])
def reset_pseudo_train_times(self, dataset: Text, seed: Union[None, Text], estimated_per_epoch_time: float) -> None:
"""This function is used to reset the train-times in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].reset_pseudo_train_times(estimated_per_epoch_time)
else:
self.all_results[(dataset, seed)].reset_pseudo_train_times(estimated_per_epoch_time)
def reset_pseudo_eval_times(self, dataset: Text, seed: Union[None, Text], eval_name: Text, estimated_per_epoch_time: float) -> None:
"""This function is used to reset the eval-times in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].reset_pseudo_eval_times(eval_name, estimated_per_epoch_time)
else:
self.all_results[(dataset, seed)].reset_pseudo_eval_times(eval_name, estimated_per_epoch_time)
def get_latency(self, dataset: Text) -> float:
"""Get the latency of a model on the target dataset. [Timestamp: 2020.03.09]"""
latencies = []
for seed in self.dataset_seed[dataset]:
latency = self.all_results[(dataset, seed)].get_latency()
if not isinstance(latency, float) or latency <= 0:
raise ValueError('invalid latency of {:} for {:} with {:}'.format(dataset))
latencies.append(latency)
return sum(latencies) / len(latencies)
def get_total_epoch(self, dataset=None):
"""Return the total number of training epochs."""
if dataset is None:
epochss = []
for xdata, x_seeds in self.dataset_seed.items():
epochss += [self.all_results[(xdata, seed)].get_total_epoch() for seed in x_seeds]
elif isinstance(dataset, str):
x_seeds = self.dataset_seed[dataset]
epochss = [self.all_results[(dataset, seed)].get_total_epoch() for seed in x_seeds]
else:
raise ValueError('invalid dataset={:}'.format(dataset))
if len(set(epochss)) > 1: raise ValueError('Each trial mush have the same number of training epochs : {:}'.format(epochss))
return epochss[-1]
def query(self, dataset, seed=None):
"""Return the ResultsCount object (containing all information of a single trial) for 'dataset' and 'seed'"""
if seed is None:
x_seeds = self.dataset_seed[dataset]
return {seed: self.all_results[(dataset, seed)] for seed in x_seeds}
else:
return self.all_results[(dataset, seed)]
def arch_idx_str(self):
return '{:06d}'.format(self.arch_index)
def update(self, dataset_name, seed, result):
if dataset_name not in self.dataset_seed:
self.dataset_seed[dataset_name] = []
assert seed not in self.dataset_seed[dataset_name], '{:}-th arch alreadly has this seed ({:}) on {:}'.format(self.arch_index, seed, dataset_name)
self.dataset_seed[ dataset_name ].append( seed )
self.dataset_seed[ dataset_name ] = sorted( self.dataset_seed[ dataset_name ] )
assert (dataset_name, seed) not in self.all_results
self.all_results[ (dataset_name, seed) ] = result
self.clear_net_done = False
def state_dict(self):
state_dict = dict()
for key, value in self.__dict__.items():
if key == 'all_results': # contain the class of ResultsCount
xvalue = dict()
assert isinstance(value, dict), 'invalid type of value for {:} : {:}'.format(key, type(value))
for _k, _v in value.items():
assert isinstance(_v, ResultsCount), 'invalid type of value for {:}/{:} : {:}'.format(key, _k, type(_v))
xvalue[_k] = _v.state_dict()
else:
xvalue = value
state_dict[key] = xvalue
return state_dict
def load_state_dict(self, state_dict):
new_state_dict = dict()
for key, value in state_dict.items():
if key == 'all_results': # to convert to the class of ResultsCount
xvalue = dict()
assert isinstance(value, dict), 'invalid type of value for {:} : {:}'.format(key, type(value))
for _k, _v in value.items():
xvalue[_k] = ResultsCount.create_from_state_dict(_v)
else: xvalue = value
new_state_dict[key] = xvalue
self.__dict__.update(new_state_dict)
@staticmethod
def create_from_state_dict(state_dict_or_file):
x = ArchResults(-1, -1)
if isinstance(state_dict_or_file, str): # a file path
state_dict = torch.load(state_dict_or_file, map_location='cpu')
elif isinstance(state_dict_or_file, dict):
state_dict = state_dict_or_file
else:
raise ValueError('invalid type of state_dict_or_file : {:}'.format(type(state_dict_or_file)))
x.load_state_dict(state_dict)
return x
# This function is used to clear the weights saved in each 'result'
# This can help reduce the memory footprint.
def clear_params(self):
for key, result in self.all_results.items():
del result.net_state_dict
result.net_state_dict = None
self.clear_net_done = True
def debug_test(self):
"""This function is used for me to debug and test, which will call most methods."""
all_dataset = ['cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120']
for dataset in all_dataset:
print('---->>>> {:}'.format(dataset))
print('The latency on {:} is {:} s'.format(dataset, self.get_latency(dataset)))
for seed in self.dataset_seed[dataset]:
result = self.all_results[(dataset, seed)]
print(' ==>> result = {:}'.format(result))
print(' ==>> cost = {:}'.format(result.get_times()))
def __repr__(self):
return ('{name}(arch-index={index}, arch={arch}, {num} runs, clear={clear})'.format(name=self.__class__.__name__, index=self.arch_index, arch=self.arch_str, num=len(self.all_results), clear=self.clear_net_done))
"""
This class (ResultsCount) is used to save the information of one trial for a single architecture.
I did not write much comment for this class, because it is the lowest-level class in NAS-Bench-201 API, which will be rarely called.
If you have any question regarding this class, please open an issue or email me.
"""
class ResultsCount(object):
def __init__(self, name, state_dict, train_accs, train_losses, params, flop, arch_config, seed, epochs, latency):
self.name = name
self.net_state_dict = state_dict
self.train_acc1es = copy.deepcopy(train_accs)
self.train_acc5es = None
self.train_losses = copy.deepcopy(train_losses)
self.train_times = None
self.arch_config = copy.deepcopy(arch_config)
self.params = params
self.flop = flop
self.seed = seed
self.epochs = epochs
self.latency = latency
# evaluation results
self.reset_eval()
def update_train_info(self, train_acc1es, train_acc5es, train_losses, train_times) -> None:
self.train_acc1es = train_acc1es
self.train_acc5es = train_acc5es
self.train_losses = train_losses
self.train_times = train_times
def reset_pseudo_train_times(self, estimated_per_epoch_time: float) -> None:
"""Assign the training times."""
train_times = OrderedDict()
for i in range(self.epochs):
train_times[i] = estimated_per_epoch_time
self.train_times = train_times
def reset_pseudo_eval_times(self, eval_name: Text, estimated_per_epoch_time: float) -> None:
"""Assign the evaluation times."""
if eval_name not in self.eval_names: raise ValueError('invalid eval name : {:}'.format(eval_name))
for i in range(self.epochs):
self.eval_times['{:}@{:}'.format(eval_name,i)] = estimated_per_epoch_time
def reset_eval(self):
self.eval_names = []
self.eval_acc1es = {}
self.eval_times = {}
self.eval_losses = {}
def update_latency(self, latency):
self.latency = copy.deepcopy( latency )
def get_latency(self) -> float:
"""Return the latency value in seconds. -1 represents not avaliable ; otherwise it should be a float value"""
if self.latency is None: return -1.0
else: return sum(self.latency) / len(self.latency)
def update_eval(self, accs, losses, times): # new version
data_names = set([x.split('@')[0] for x in accs.keys()])
for data_name in data_names:
assert data_name not in self.eval_names, '{:} has already been added into eval-names'.format(data_name)
self.eval_names.append( data_name )
for iepoch in range(self.epochs):
xkey = '{:}@{:}'.format(data_name, iepoch)
self.eval_acc1es[ xkey ] = accs[ xkey ]
self.eval_losses[ xkey ] = losses[ xkey ]
self.eval_times [ xkey ] = times[ xkey ]
def update_OLD_eval(self, name, accs, losses): # old version
assert name not in self.eval_names, '{:} has already added'.format(name)
self.eval_names.append( name )
for iepoch in range(self.epochs):
if iepoch in accs:
self.eval_acc1es['{:}@{:}'.format(name,iepoch)] = accs[iepoch]
self.eval_losses['{:}@{:}'.format(name,iepoch)] = losses[iepoch]
def __repr__(self):
num_eval = len(self.eval_names)
set_name = '[' + ', '.join(self.eval_names) + ']'
return ('{name}({xname}, arch={arch}, FLOP={flop:.2f}M, Param={param:.3f}MB, seed={seed}, {num_eval} eval-sets: {set_name})'.format(name=self.__class__.__name__, xname=self.name, arch=self.arch_config['arch_str'], flop=self.flop, param=self.params, seed=self.seed, num_eval=num_eval, set_name=set_name))
def get_total_epoch(self):
return copy.deepcopy(self.epochs)
def get_times(self):
"""Obtain the information regarding both training and evaluation time."""
if self.train_times is not None and isinstance(self.train_times, dict):
train_times = list( self.train_times.values() )
time_info = {'T-train@epoch': np.mean(train_times), 'T-train@total': np.sum(train_times)}
else:
time_info = {'T-train@epoch': None, 'T-train@total': None }
for name in self.eval_names:
try:
xtimes = [self.eval_times['{:}@{:}'.format(name,i)] for i in range(self.epochs)]
time_info['T-{:}@epoch'.format(name)] = np.mean(xtimes)
time_info['T-{:}@total'.format(name)] = np.sum(xtimes)
except:
time_info['T-{:}@epoch'.format(name)] = None
time_info['T-{:}@total'.format(name)] = None
return time_info
def get_eval_set(self):
return self.eval_names
# get the training information
def get_train(self, iepoch=None):
if iepoch is None: iepoch = self.epochs-1
assert 0 <= iepoch < self.epochs, 'invalid iepoch={:} < {:}'.format(iepoch, self.epochs)
if self.train_times is not None:
xtime = self.train_times[iepoch]
atime = sum([self.train_times[i] for i in range(iepoch+1)])
else: xtime, atime = None, None
return {'iepoch' : iepoch,
'loss' : self.train_losses[iepoch],
'accuracy': self.train_acc1es[iepoch],
'cur_time': xtime,
'all_time': atime}
def get_eval(self, name, iepoch=None):
"""Get the evaluation information ; there could be multiple evaluation sets (identified by the 'name' argument)."""
if iepoch is None: iepoch = self.epochs-1
assert 0 <= iepoch < self.epochs, 'invalid iepoch={:} < {:}'.format(iepoch, self.epochs)
if isinstance(self.eval_times,dict) and len(self.eval_times) > 0:
xtime = self.eval_times['{:}@{:}'.format(name,iepoch)]
atime = sum([self.eval_times['{:}@{:}'.format(name,i)] for i in range(iepoch+1)])
else: xtime, atime = None, None
return {'iepoch' : iepoch,
'loss' : self.eval_losses['{:}@{:}'.format(name,iepoch)],
'accuracy': self.eval_acc1es['{:}@{:}'.format(name,iepoch)],
'cur_time': xtime,
'all_time': atime}
def get_net_param(self, clone=False):
if clone: return copy.deepcopy(self.net_state_dict)
else: return self.net_state_dict
def get_config(self, str2structure):
"""This function is used to obtain the config dict for this architecture."""
if str2structure is None:
return {'name': 'infer.tiny', 'C': self.arch_config['channel'],
'N' : self.arch_config['num_cells'],
'arch_str': self.arch_config['arch_str'], 'num_classes': self.arch_config['class_num']}
else:
return {'name': 'infer.tiny', 'C': self.arch_config['channel'],
'N' : self.arch_config['num_cells'],
'genotype': str2structure(self.arch_config['arch_str']), 'num_classes': self.arch_config['class_num']}
def state_dict(self):
_state_dict = {key: value for key, value in self.__dict__.items()}
return _state_dict
def load_state_dict(self, state_dict):
self.__dict__.update(state_dict)
@staticmethod
def create_from_state_dict(state_dict):
x = ResultsCount(None, None, None, None, None, None, None, None, None, None)
x.load_state_dict(state_dict)
return x

269
lib/nas_201_api/api_201.py Normal file
View File

@ -0,0 +1,269 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2019.08 #
############################################################################################
# NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, ICLR 2020 #
############################################################################################
# The history of benchmark files:
# [2020.02.25] NAS-Bench-201-v1_0-e61699.pth : 6219 architectures are trained once, 1621 architectures are trained twice, 7785 architectures are trained three times. `LESS` only supports CIFAR10-VALID.
# [2020.03.16] NAS-Bench-201-v1_1-096897.pth : 2225 architectures are trained once, 5439 archiitectures are trained twice, 7961 architectures are trained three times on all training sets. For the hyper-parameters with the total epochs of 12, each model is trained on CIFAR-10, CIFAR-100, ImageNet16-120 once, and is trained on CIFAR-10-VALID twice.
#
# I'm still actively enhancing this benchmark. Please feel free to contact me if you have any question w.r.t. NAS-Bench-201.
#
import os, copy, random, torch, numpy as np
from pathlib import Path
from typing import List, Text, Union, Dict, Optional
from collections import OrderedDict, defaultdict
from .api_utils import ArchResults
from .api_utils import NASBenchMetaAPI
from .api_utils import remap_dataset_set_names
ALL_BENCHMARK_FILES = ['NAS-Bench-201-v1_0-e61699.pth', 'NAS-Bench-201-v1_1-096897.pth']
ALL_ARCHIVE_DIRS = ['NAS-Bench-201-v1_1-archive']
def print_information(information, extra_info=None, show=False):
dataset_names = information.get_dataset_names()
strings = [information.arch_str, 'datasets : {:}, extra-info : {:}'.format(dataset_names, extra_info)]
def metric2str(loss, acc):
return 'loss = {:.3f}, top1 = {:.2f}%'.format(loss, acc)
for ida, dataset in enumerate(dataset_names):
metric = information.get_compute_costs(dataset)
flop, param, latency = metric['flops'], metric['params'], metric['latency']
str1 = '{:14s} FLOP={:6.2f} M, Params={:.3f} MB, latency={:} ms.'.format(dataset, flop, param, '{:.2f}'.format(latency*1000) if latency is not None and latency > 0 else None)
train_info = information.get_metrics(dataset, 'train')
if dataset == 'cifar10-valid':
valid_info = information.get_metrics(dataset, 'x-valid')
str2 = '{:14s} train : [{:}], valid : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(valid_info['loss'], valid_info['accuracy']))
elif dataset == 'cifar10':
test__info = information.get_metrics(dataset, 'ori-test')
str2 = '{:14s} train : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
else:
valid_info = information.get_metrics(dataset, 'x-valid')
test__info = information.get_metrics(dataset, 'x-test')
str2 = '{:14s} train : [{:}], valid : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(valid_info['loss'], valid_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
strings += [str1, str2]
if show: print('\n'.join(strings))
return strings
"""
This is the class for the API of NAS-Bench-201.
"""
class NASBench201API(NASBenchMetaAPI):
""" The initialization function that takes the dataset file path (or a dict loaded from that path) as input. """
def __init__(self, file_path_or_dict: Optional[Union[Text, Dict]]=None,
verbose: bool=True):
self.filename = None
if file_path_or_dict is None:
file_path_or_dict = os.path.join(os.environ['TORCH_HOME'], ALL_BENCHMARK_FILES[-1])
print ('Try to use the default NAS-Bench-201 path from {:}.'.format(file_path_or_dict))
if isinstance(file_path_or_dict, str) or isinstance(file_path_or_dict, Path):
file_path_or_dict = str(file_path_or_dict)
if verbose: print('try to create the NAS-Bench-201 api from {:}'.format(file_path_or_dict))
assert os.path.isfile(file_path_or_dict), 'invalid path : {:}'.format(file_path_or_dict)
self.filename = Path(file_path_or_dict).name
file_path_or_dict = torch.load(file_path_or_dict, map_location='cpu')
elif isinstance(file_path_or_dict, dict):
file_path_or_dict = copy.deepcopy(file_path_or_dict)
else: raise ValueError('invalid type : {:} not in [str, dict]'.format(type(file_path_or_dict)))
assert isinstance(file_path_or_dict, dict), 'It should be a dict instead of {:}'.format(type(file_path_or_dict))
self.verbose = verbose # [TODO] a flag indicating whether to print more logs
keys = ('meta_archs', 'arch2infos', 'evaluated_indexes')
for key in keys: assert key in file_path_or_dict, 'Can not find key[{:}] in the dict'.format(key)
self.meta_archs = copy.deepcopy( file_path_or_dict['meta_archs'] )
# This is a dict mapping each architecture to a dict, where the key is #epochs and the value is ArchResults
self.arch2infos_dict = OrderedDict()
for xkey in sorted(list(file_path_or_dict['arch2infos'].keys())):
all_info = file_path_or_dict['arch2infos'][xkey]
hp2archres = OrderedDict()
# self.arch2infos_less[xkey] = ArchResults.create_from_state_dict( all_info['less'] )
# self.arch2infos_full[xkey] = ArchResults.create_from_state_dict( all_info['full'] )
hp2archres['12'] = ArchResults.create_from_state_dict(all_info['less'])
hp2archres['200'] = ArchResults.create_from_state_dict(all_info['full'])
self.arch2infos_dict[xkey] = hp2archres
self.evaluated_indexes = sorted(list(file_path_or_dict['evaluated_indexes']))
self.archstr2index = {}
for idx, arch in enumerate(self.meta_archs):
assert arch not in self.archstr2index, 'This [{:}]-th arch {:} already in the dict ({:}).'.format(idx, arch, self.archstr2index[arch])
self.archstr2index[ arch ] = idx
def reload(self, archive_root: Text = None, index: int = None):
"""Overwrite all information of the 'index'-th architecture in the search space.
It will load its data from 'archive_root'.
"""
if archive_root is None:
archive_root = os.path.join(os.environ['TORCH_HOME'], ALL_ARCHIVE_DIRS[-1])
assert os.path.isdir(archive_root), 'invalid directory : {:}'.format(archive_root)
if index is None:
indexes = list(range(len(self)))
else:
indexes = [index]
for idx in indexes:
assert 0 <= idx < len(self.meta_archs), 'invalid index of {:}'.format(idx)
xfile_path = os.path.join(archive_root, '{:06d}-FULL.pth'.format(idx))
assert os.path.isfile(xfile_path), 'invalid data path : {:}'.format(xfile_path)
xdata = torch.load(xfile_path, map_location='cpu')
assert isinstance(xdata, dict) and 'full' in xdata and 'less' in xdata, 'invalid format of data in {:}'.format(xfile_path)
hp2archres = OrderedDict()
hp2archres['12'] = ArchResults.create_from_state_dict(xdata['less'])
hp2archres['200'] = ArchResults.create_from_state_dict(xdata['full'])
self.arch2infos_dict[idx] = hp2archres
def query_info_str_by_arch(self, arch, hp: Text='12'):
""" This function is used to query the information of a specific architecture
'arch' can be an architecture index or an architecture string
When hp=12, the hyper-parameters used to train a model are in 'configs/nas-benchmark/hyper-opts/12E.config'
When hp=200, the hyper-parameters used to train a model are in 'configs/nas-benchmark/hyper-opts/200E.config'
The difference between these three configurations are the number of training epochs.
"""
if self.verbose:
print('Call query_info_str_by_arch with arch={:} and hp={:}'.format(arch, hp))
self._query_info_str_by_arch(arch, hp, print_information)
# obtain the metric for the `index`-th architecture
# `dataset` indicates the dataset:
# 'cifar10-valid' : using the proposed train set of CIFAR-10 as the training set
# 'cifar10' : using the proposed train+valid set of CIFAR-10 as the training set
# 'cifar100' : using the proposed train set of CIFAR-100 as the training set
# 'ImageNet16-120' : using the proposed train set of ImageNet-16-120 as the training set
# `iepoch` indicates the index of training epochs from 0 to 11/199.
# When iepoch=None, it will return the metric for the last training epoch
# When iepoch=11, it will return the metric for the 11-th training epoch (starting from 0)
# `use_12epochs_result` indicates different hyper-parameters for training
# When use_12epochs_result=True, it trains the network with 12 epochs and the LR decayed from 0.1 to 0 within 12 epochs
# When use_12epochs_result=False, it trains the network with 200 epochs and the LR decayed from 0.1 to 0 within 200 epochs
# `is_random`
# When is_random=True, the performance of a random architecture will be returned
# When is_random=False, the performanceo of all trials will be averaged.
def get_more_info(self, index: int, dataset, iepoch=None, hp='12', is_random=True):
if self.verbose:
print('Call the get_more_info function with index={:}, dataset={:}, iepoch={:}, hp={:}, and is_random={:}.'.format(index, dataset, iepoch, hp, is_random))
archresult = self.arch2infos_dict[index][str(hp)]
# if randomly select one trial, select the seed at first
if isinstance(is_random, bool) and is_random:
seeds = archresult.get_dataset_seeds(dataset)
is_random = random.choice(seeds)
# collect the training information
train_info = archresult.get_metrics(dataset, 'train', iepoch=iepoch, is_random=is_random)
total = train_info['iepoch'] + 1
xinfo = {'train-loss' : train_info['loss'],
'train-accuracy': train_info['accuracy'],
'train-per-time': train_info['all_time'] / total if train_info['all_time'] is not None else None,
'train-all-time': train_info['all_time']}
# collect the evaluation information
if dataset == 'cifar10-valid':
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
try:
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
valtest_info = None
else:
try: # collect results on the proposed test set
if dataset == 'cifar10':
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
test_info = archresult.get_metrics(dataset, 'x-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
try: # collect results on the proposed validation set
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
except:
valid_info = None
try:
if dataset != 'cifar10':
valtest_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
valtest_info = None
except:
valtest_info = None
if valid_info is not None:
xinfo['valid-loss'] = valid_info['loss']
xinfo['valid-accuracy'] = valid_info['accuracy']
xinfo['valid-per-time'] = valid_info['all_time'] / total
xinfo['valid-all-time'] = valid_info['all_time']
if test_info is not None:
xinfo['test-loss'] = test_info['loss']
xinfo['test-accuracy'] = test_info['accuracy']
xinfo['test-per-time'] = test_info['all_time'] / total
xinfo['test-all-time'] = test_info['all_time']
if valtest_info is not None:
xinfo['valtest-loss'] = valtest_info['loss']
xinfo['valtest-accuracy'] = valtest_info['accuracy']
xinfo['valtest-per-time'] = valtest_info['all_time'] / total
xinfo['valtest-all-time'] = valtest_info['all_time']
return xinfo
def show(self, index: int = -1) -> None:
"""This function will print the information of a specific (or all) architecture(s)."""
self._show(index, print_information)
@staticmethod
def str2lists(arch_str: Text) -> List[tuple]:
"""
This function shows how to read the string-based architecture encoding.
It is the same as the `str2structure` func in `AutoDL-Projects/lib/models/cell_searchs/genotypes.py`
:param
arch_str: the input is a string indicates the architecture topology, such as
|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|
:return: a list of tuple, contains multiple (op, input_node_index) pairs.
:usage
arch = api.str2lists( '|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|' )
print ('there are {:} nodes in this arch'.format(len(arch)+1)) # arch is a list
for i, node in enumerate(arch):
print('the {:}-th node is the sum of these {:} nodes with op: {:}'.format(i+1, len(node), node))
"""
node_strs = arch_str.split('+')
genotypes = []
for i, node_str in enumerate(node_strs):
inputs = list(filter(lambda x: x != '', node_str.split('|')))
for xinput in inputs: assert len(xinput.split('~')) == 2, 'invalid input length : {:}'.format(xinput)
inputs = ( xi.split('~') for xi in inputs )
input_infos = tuple( (op, int(IDX)) for (op, IDX) in inputs)
genotypes.append( input_infos )
return genotypes
@staticmethod
def str2matrix(arch_str: Text,
search_space: List[Text] = ['none', 'skip_connect', 'nor_conv_1x1', 'nor_conv_3x3', 'avg_pool_3x3']) -> np.ndarray:
"""
This func shows how to convert the string-based architecture encoding to the encoding strategy in NAS-Bench-101.
:param
arch_str: the input is a string indicates the architecture topology, such as
|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|
search_space: a list of operation string, the default list is the search space for NAS-Bench-201
the default value should be be consistent with this line https://github.com/D-X-Y/AutoDL-Projects/blob/master/lib/models/cell_operations.py#L24
:return
the numpy matrix (2-D np.ndarray) representing the DAG of this architecture topology
:usage
matrix = api.str2matrix( '|nor_conv_1x1~0|+|none~0|none~1|+|none~0|none~1|skip_connect~2|' )
This matrix is 4-by-4 matrix representing a cell with 4 nodes (only the lower left triangle is useful).
[ [0, 0, 0, 0], # the first line represents the input (0-th) node
[2, 0, 0, 0], # the second line represents the 1-st node, is calculated by 2-th-op( 0-th-node )
[0, 0, 0, 0], # the third line represents the 2-nd node, is calculated by 0-th-op( 0-th-node ) + 0-th-op( 1-th-node )
[0, 0, 1, 0] ] # the fourth line represents the 3-rd node, is calculated by 0-th-op( 0-th-node ) + 0-th-op( 1-th-node ) + 1-th-op( 2-th-node )
In NAS-Bench-201 search space, 0-th-op is 'none', 1-th-op is 'skip_connect',
2-th-op is 'nor_conv_1x1', 3-th-op is 'nor_conv_3x3', 4-th-op is 'avg_pool_3x3'.
:(NOTE)
If a node has two input-edges from the same node, this function does not work. One edge will be overlapped.
"""
node_strs = arch_str.split('+')
num_nodes = len(node_strs) + 1
matrix = np.zeros((num_nodes, num_nodes))
for i, node_str in enumerate(node_strs):
inputs = list(filter(lambda x: x != '', node_str.split('|')))
for xinput in inputs: assert len(xinput.split('~')) == 2, 'invalid input length : {:}'.format(xinput)
for xi in inputs:
op, idx = xi.split('~')
if op not in search_space: raise ValueError('this op ({:}) is not in {:}'.format(op, search_space))
op_idx, node_idx = search_space.index(op), int(idx)
matrix[i+1, node_idx] = op_idx
return matrix

215
lib/nas_201_api/api_301.py Normal file
View File

@ -0,0 +1,215 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2020.06 #
############################################################################################
# NAS-Bench-301, coming soon.
############################################################################################
# The history of benchmark files:
# [2020.06.30] NAS-Bench-301-v1_0
#
import os, copy, random, torch, numpy as np
from pathlib import Path
from typing import List, Text, Union, Dict, Optional
from collections import OrderedDict, defaultdict
from .api_utils import ArchResults
from .api_utils import NASBenchMetaAPI
from .api_utils import remap_dataset_set_names
ALL_BENCHMARK_FILES = ['NAS-Bench-301-v1_0-363be7.pth']
ALL_ARCHIVE_DIRS = ['NAS-Bench-301-v1_0-archive']
def print_information(information, extra_info=None, show=False):
dataset_names = information.get_dataset_names()
strings = [information.arch_str, 'datasets : {:}, extra-info : {:}'.format(dataset_names, extra_info)]
def metric2str(loss, acc):
return 'loss = {:.3f} & top1 = {:.2f}%'.format(loss, acc)
for ida, dataset in enumerate(dataset_names):
metric = information.get_compute_costs(dataset)
flop, param, latency = metric['flops'], metric['params'], metric['latency']
str1 = '{:14s} FLOP={:6.2f} M, Params={:.3f} MB, latency={:} ms.'.format(dataset, flop, param, '{:.2f}'.format(latency*1000) if latency is not None and latency > 0 else None)
train_info = information.get_metrics(dataset, 'train')
if dataset == 'cifar10-valid':
valid_info = information.get_metrics(dataset, 'x-valid')
test__info = information.get_metrics(dataset, 'ori-test')
str2 = '{:14s} train : [{:}], valid : [{:}], test : [{:}]'.format(
dataset, metric2str(train_info['loss'], train_info['accuracy']),
metric2str(valid_info['loss'], valid_info['accuracy']),
metric2str(test__info['loss'], test__info['accuracy']))
elif dataset == 'cifar10':
test__info = information.get_metrics(dataset, 'ori-test')
str2 = '{:14s} train : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
else:
valid_info = information.get_metrics(dataset, 'x-valid')
test__info = information.get_metrics(dataset, 'x-test')
str2 = '{:14s} train : [{:}], valid : [{:}], test : [{:}]'.format(dataset, metric2str(train_info['loss'], train_info['accuracy']), metric2str(valid_info['loss'], valid_info['accuracy']), metric2str(test__info['loss'], test__info['accuracy']))
strings += [str1, str2]
if show: print('\n'.join(strings))
return strings
"""
This is the class for the API of NAS-Bench-301.
"""
class NASBench301API(NASBenchMetaAPI):
""" The initialization function that takes the dataset file path (or a dict loaded from that path) as input. """
def __init__(self, file_path_or_dict: Optional[Union[Text, Dict]]=None, verbose: bool=True):
self.filename = None
if file_path_or_dict is None:
file_path_or_dict = os.path.join(os.environ['TORCH_HOME'], ALL_BENCHMARK_FILES[-1])
if isinstance(file_path_or_dict, str) or isinstance(file_path_or_dict, Path):
file_path_or_dict = str(file_path_or_dict)
if verbose: print('try to create the NAS-Bench-201 api from {:}'.format(file_path_or_dict))
assert os.path.isfile(file_path_or_dict), 'invalid path : {:}'.format(file_path_or_dict)
self.filename = Path(file_path_or_dict).name
file_path_or_dict = torch.load(file_path_or_dict, map_location='cpu')
elif isinstance(file_path_or_dict, dict):
file_path_or_dict = copy.deepcopy( file_path_or_dict )
else: raise ValueError('invalid type : {:} not in [str, dict]'.format(type(file_path_or_dict)))
assert isinstance(file_path_or_dict, dict), 'It should be a dict instead of {:}'.format(type(file_path_or_dict))
self.verbose = verbose # [TODO] a flag indicating whether to print more logs
keys = ('meta_archs', 'arch2infos', 'evaluated_indexes')
for key in keys: assert key in file_path_or_dict, 'Can not find key[{:}] in the dict'.format(key)
self.meta_archs = copy.deepcopy( file_path_or_dict['meta_archs'] )
# This is a dict mapping each architecture to a dict, where the key is #epochs and the value is ArchResults
self.arch2infos_dict = OrderedDict()
for xkey in sorted(list(file_path_or_dict['arch2infos'].keys())):
all_infos = file_path_or_dict['arch2infos'][xkey]
hp2archres = OrderedDict()
for hp_key, results in all_infos.items():
hp2archres[hp_key] = ArchResults.create_from_state_dict(results)
self.arch2infos_dict[xkey] = hp2archres
self.evaluated_indexes = sorted(list(file_path_or_dict['evaluated_indexes']))
self.archstr2index = {}
for idx, arch in enumerate(self.meta_archs):
assert arch not in self.archstr2index, 'This [{:}]-th arch {:} already in the dict ({:}).'.format(idx, arch, self.archstr2index[arch])
self.archstr2index[ arch ] = idx
if self.verbose:
print('Create NAS-Bench-301 done with {:}/{:} architectures avaliable.'.format(len(self.evaluated_indexes), len(self.meta_archs)))
def reload(self, archive_root: Text = None, index: int = None):
"""Overwrite all information of the 'index'-th architecture in the search space, where the data will be loaded from 'archive_root'.
If index is None, overwrite all ckps.
"""
if self.verbose:
print('Call clear_params with archive_root={:} and index={:}'.format(archive_root, index))
if archive_root is None:
archive_root = os.path.join(os.environ['TORCH_HOME'], ALL_ARCHIVE_DIRS[-1])
assert os.path.isdir(archive_root), 'invalid directory : {:}'.format(archive_root)
if index is None:
indexes = list(range(len(self)))
else:
indexes = [index]
for idx in indexes:
assert 0 <= idx < len(self.meta_archs), 'invalid index of {:}'.format(idx)
xfile_path = os.path.join(archive_root, '{:06d}-FULL.pth'.format(idx))
if not os.path.isfile(xfile_path):
xfile_path = os.path.join(archive_root, '{:d}-FULL.pth'.format(idx))
assert os.path.isfile(xfile_path), 'invalid data path : {:}'.format(xfile_path)
xdata = torch.load(xfile_path, map_location='cpu')
assert isinstance(xdata, dict), 'invalid format of data in {:}'.format(xfile_path)
hp2archres = OrderedDict()
for hp_key, results in xdata.items():
hp2archres[hp_key] = ArchResults.create_from_state_dict(results)
self.arch2infos_dict[idx] = hp2archres
def query_info_str_by_arch(self, arch, hp: Text='12'):
""" This function is used to query the information of a specific architecture
'arch' can be an architecture index or an architecture string
When hp=01, the hyper-parameters used to train a model are in 'configs/nas-benchmark/hyper-opts/01E.config'
When hp=12, the hyper-parameters used to train a model are in 'configs/nas-benchmark/hyper-opts/12E.config'
When hp=90, the hyper-parameters used to train a model are in 'configs/nas-benchmark/hyper-opts/90E.config'
The difference between these three configurations are the number of training epochs.
"""
if self.verbose:
print('Call query_info_str_by_arch with arch={:} and hp={:}'.format(arch, hp))
self._query_info_str_by_arch(arch, hp, print_information)
def get_more_info(self, index: int, dataset: Text, iepoch=None, hp='12', is_random=True):
"""This function will return the metric for the `index`-th architecture
`dataset` indicates the dataset:
'cifar10-valid' : using the proposed train set of CIFAR-10 as the training set
'cifar10' : using the proposed train+valid set of CIFAR-10 as the training set
'cifar100' : using the proposed train set of CIFAR-100 as the training set
'ImageNet16-120' : using the proposed train set of ImageNet-16-120 as the training set
`iepoch` indicates the index of training epochs from 0 to 11/199.
When iepoch=None, it will return the metric for the last training epoch
When iepoch=11, it will return the metric for the 11-th training epoch (starting from 0)
`hp` indicates different hyper-parameters for training
When hp=01, it trains the network with 01 epochs and the LR decayed from 0.1 to 0 within 01 epochs
When hp=12, it trains the network with 01 epochs and the LR decayed from 0.1 to 0 within 12 epochs
When hp=90, it trains the network with 01 epochs and the LR decayed from 0.1 to 0 within 90 epochs
`is_random`
When is_random=True, the performance of a random architecture will be returned
When is_random=False, the performanceo of all trials will be averaged.
"""
if self.verbose:
print('Call the get_more_info function with index={:}, dataset={:}, iepoch={:}, hp={:}, and is_random={:}.'.format(index, dataset, iepoch, hp, is_random))
archresult = self.arch2infos_dict[index][str(hp)]
# if randomly select one trial, select the seed at first
if isinstance(is_random, bool) and is_random:
seeds = archresult.get_dataset_seeds(dataset)
is_random = random.choice(seeds)
# collect the training information
train_info = archresult.get_metrics(dataset, 'train', iepoch=iepoch, is_random=is_random)
total = train_info['iepoch'] + 1
xinfo = {'train-loss' : train_info['loss'],
'train-accuracy': train_info['accuracy'],
'train-per-time': train_info['all_time'] / total,
'train-all-time': train_info['all_time']}
# collect the evaluation information
if dataset == 'cifar10-valid':
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
try:
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
valtest_info = None
else:
try: # collect results on the proposed test set
if dataset == 'cifar10':
test_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
test_info = archresult.get_metrics(dataset, 'x-test', iepoch=iepoch, is_random=is_random)
except:
test_info = None
try: # collect results on the proposed validation set
valid_info = archresult.get_metrics(dataset, 'x-valid', iepoch=iepoch, is_random=is_random)
except:
valid_info = None
try:
if dataset != 'cifar10':
valtest_info = archresult.get_metrics(dataset, 'ori-test', iepoch=iepoch, is_random=is_random)
else:
valtest_info = None
except:
valtest_info = None
if valid_info is not None:
xinfo['valid-loss'] = valid_info['loss']
xinfo['valid-accuracy'] = valid_info['accuracy']
xinfo['valid-per-time'] = valid_info['all_time'] / total
xinfo['valid-all-time'] = valid_info['all_time']
if test_info is not None:
xinfo['test-loss'] = test_info['loss']
xinfo['test-accuracy'] = test_info['accuracy']
xinfo['test-per-time'] = test_info['all_time'] / total
xinfo['test-all-time'] = test_info['all_time']
if valtest_info is not None:
xinfo['valtest-loss'] = valtest_info['loss']
xinfo['valtest-accuracy'] = valtest_info['accuracy']
xinfo['valtest-per-time'] = valtest_info['all_time'] / total
xinfo['valtest-all-time'] = valtest_info['all_time']
return xinfo
def show(self, index: int = -1) -> None:
"""
This function will print the information of a specific (or all) architecture(s).
:param index: If the index < 0: it will loop for all architectures and print their information one by one.
else: it will print the information of the 'index'-th architecture.
:return: nothing
"""
self._show(index, print_information)

View File

@ -0,0 +1,711 @@
#####################################################
# Copyright (c) Xuanyi Dong [GitHub D-X-Y], 2019.08 #
############################################################################################
# NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search, ICLR 2020 #
############################################################################################
# In this Python file, we define NASBenchMetaAPI, the abstract class for benchmark APIs.
# We also define the class ArchResults, which contains all information of a single architecture trained by one kind of hyper-parameters on three datasets.
# We also define the class ResultsCount, which contains all information of a single trial for a single architecture.
############################################################################################
# History:
# [2020.06.30] The first version.
#
import os, abc, copy, random, torch, numpy as np
from pathlib import Path
from typing import List, Text, Union, Dict, Optional
from collections import OrderedDict, defaultdict
def remap_dataset_set_names(dataset, metric_on_set, verbose=False):
"""re-map the metric_on_set to internal keys"""
if verbose:
print('Call internal function _remap_dataset_set_names with dataset={:} and metric_on_set={:}'.format(dataset, metric_on_set))
if dataset == 'cifar10' and metric_on_set == 'valid':
dataset, metric_on_set = 'cifar10-valid', 'x-valid'
elif dataset == 'cifar10' and metric_on_set == 'test':
dataset, metric_on_set = 'cifar10', 'ori-test'
elif dataset == 'cifar10' and metric_on_set == 'train':
dataset, metric_on_set = 'cifar10', 'train'
elif (dataset == 'cifar100' or dataset == 'ImageNet16-120') and metric_on_set == 'valid':
metric_on_set = 'x-valid'
elif (dataset == 'cifar100' or dataset == 'ImageNet16-120') and metric_on_set == 'test':
metric_on_set = 'x-test'
if verbose:
print(' return dataset={:} and metric_on_set={:}'.format(dataset, metric_on_set))
return dataset, metric_on_set
class NASBenchMetaAPI(metaclass=abc.ABCMeta):
@abc.abstractmethod
def __init__(self, file_path_or_dict: Optional[Union[Text, Dict]]=None, verbose: bool=True):
"""The initialization function that takes the dataset file path (or a dict loaded from that path) as input."""
def __getitem__(self, index: int):
return copy.deepcopy(self.meta_archs[index])
def arch(self, index: int):
"""Return the topology structure of the `index`-th architecture."""
if self.verbose:
print('Call the arch function with index={:}'.format(index))
assert 0 <= index < len(self.meta_archs), 'invalid index : {:} vs. {:}.'.format(index, len(self.meta_archs))
return copy.deepcopy(self.meta_archs[index])
def __len__(self):
return len(self.meta_archs)
def __repr__(self):
return ('{name}({num}/{total} architectures, file={filename})'.format(name=self.__class__.__name__, num=len(self.evaluated_indexes), total=len(self.meta_archs), filename=self.filename))
def random(self):
"""Return a random index of all architectures."""
return random.randint(0, len(self.meta_archs)-1)
def query_index_by_arch(self, arch):
""" This function is used to query the index of an architecture in the search space.
In the topology search space, the input arch can be an architecture string such as '|nor_conv_3x3~0|+|nor_conv_3x3~0|avg_pool_3x3~1|+|skip_connect~0|nor_conv_3x3~1|skip_connect~2|';
or an instance that has the 'tostr' function that can generate the architecture string;
or it is directly an architecture index, in this case, we will check whether it is valid or not.
This function will return the index.
If return -1, it means this architecture is not in the search space.
Otherwise, it will return an int in [0, the-number-of-candidates-in-the-search-space).
"""
if self.verbose:
print('Call query_index_by_arch with arch={:}'.format(arch))
if isinstance(arch, int):
if 0 <= arch < len(self):
return arch
else:
raise ValueError('Invalid architecture index {:} vs [{:}, {:}].'.format(arch, 0, len(self)))
elif isinstance(arch, str):
if arch in self.archstr2index: arch_index = self.archstr2index[ arch ]
else : arch_index = -1
elif hasattr(arch, 'tostr'):
if arch.tostr() in self.archstr2index: arch_index = self.archstr2index[ arch.tostr() ]
else : arch_index = -1
else: arch_index = -1
return arch_index
@abc.abstractmethod
def reload(self, archive_root: Text = None, index: int = None):
"""Overwrite all information of the 'index'-th architecture in the search space, where the data will be loaded from 'archive_root'.
If index is None, overwrite all ckps.
"""
def clear_params(self, index: int, hp: Optional[Text]=None):
"""Remove the architecture's weights to save memory.
:arg
index: the index of the target architecture
hp: a flag to controll how to clear the parameters.
-- None: clear all the weights in '01'/'12'/'90', which indicates the number of training epochs.
-- '01' or '12' or '90': clear all the weights in arch2infos_dict[index][hp].
"""
if self.verbose:
print('Call clear_params with index={:} and hp={:}'.format(index, hp))
if hp is None:
for key, result in self.arch2infos_dict[index].items():
result.clear_params()
else:
if str(hp) not in self.arch2infos_dict[index]:
raise ValueError('The {:}-th architecture only has hyper-parameters of {:} instead of {:}.'.format(index, list(self.arch2infos_dict[index].keys()), hp))
self.arch2infos_dict[index][str(hp)].clear_params()
@abc.abstractmethod
def query_info_str_by_arch(self, arch, hp: Text='12'):
"""This function is used to query the information of a specific architecture."""
def _query_info_str_by_arch(self, arch, hp: Text='12', print_information=None):
arch_index = self.query_index_by_arch(arch)
if arch_index in self.arch2infos_dict:
if hp not in self.arch2infos_dict[arch_index]:
raise ValueError('The {:}-th architecture only has hyper-parameters of {:} instead of {:}.'.format(index, list(self.arch2infos_dict[arch_index].keys()), hp))
info = self.arch2infos_dict[arch_index][hp]
strings = print_information(info, 'arch-index={:}'.format(arch_index))
return '\n'.join(strings)
else:
print ('Find this arch-index : {:}, but this arch is not evaluated.'.format(arch_index))
return None
def query_meta_info_by_index(self, arch_index, hp: Text = '12'):
"""Return the ArchResults for the 'arch_index'-th architecture. This function is similar to query_by_index."""
if self.verbose:
print('Call query_meta_info_by_index with arch_index={:}, hp={:}'.format(arch_index, hp))
if arch_index in self.arch2infos_dict:
if hp not in self.arch2infos_dict[arch_index]:
raise ValueError('The {:}-th architecture only has hyper-parameters of {:} instead of {:}.'.format(arch_index, list(self.arch2infos_dict[arch_index].keys()), hp))
info = self.arch2infos_dict[arch_index][hp]
else:
raise ValueError('arch_index [{:}] does not in arch2infos'.format(arch_index))
return copy.deepcopy(info)
def query_by_index(self, arch_index: int, dataname: Union[None, Text] = None, hp: Text = '12'):
""" This 'query_by_index' function is used to query information with the training of 01 epochs, 12 epochs, 90 epochs, or 200 epochs.
------
If hp=01, we train the model by 01 epochs (see config in configs/nas-benchmark/hyper-opts/01E.config)
If hp=12, we train the model by 01 epochs (see config in configs/nas-benchmark/hyper-opts/12E.config)
If hp=90, we train the model by 01 epochs (see config in configs/nas-benchmark/hyper-opts/90E.config)
If hp=200, we train the model by 01 epochs (see config in configs/nas-benchmark/hyper-opts/200E.config)
------
If dataname is None, return the ArchResults
else, return a dict with all trials on that dataset (the key is the seed)
Options are 'cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120'.
-- cifar10-valid : training the model on the CIFAR-10 training set.
-- cifar10 : training the model on the CIFAR-10 training + validation set.
-- cifar100 : training the model on the CIFAR-100 training set.
-- ImageNet16-120 : training the model on the ImageNet16-120 training set.
"""
if self.verbose:
print('Call query_by_index with arch_index={:}, dataname={:}, hp={:}'.format(arch_index, dataname, hp))
info = self.query_meta_info_by_index(arch_index, hp)
if dataname is None: return info
else:
if dataname not in info.get_dataset_names():
raise ValueError('invalid dataset-name : {:} vs. {:}'.format(dataname, info.get_dataset_names()))
return info.query(dataname)
def find_best(self, dataset, metric_on_set, FLOP_max=None, Param_max=None, hp: Text = '12'):
"""Find the architecture with the highest accuracy based on some constraints."""
if self.verbose:
print('Call find_best with dataset={:}, metric_on_set={:}, hp={:} | with #FLOPs < {:} and #Params < {:}'.format(dataset, metric_on_set, hp, FLOP_max, Param_max))
dataset, metric_on_set = remap_dataset_set_names(dataset, metric_on_set, self.verbose)
best_index, highest_accuracy = -1, None
for i, arch_index in enumerate(self.evaluated_indexes):
arch_info = self.arch2infos_dict[arch_index][hp]
info = arch_info.get_compute_costs(dataset) # the information of costs
flop, param, latency = info['flops'], info['params'], info['latency']
if FLOP_max is not None and flop > FLOP_max : continue
if Param_max is not None and param > Param_max: continue
xinfo = arch_info.get_metrics(dataset, metric_on_set) # the information of loss and accuracy
loss, accuracy = xinfo['loss'], xinfo['accuracy']
if best_index == -1:
best_index, highest_accuracy = arch_index, accuracy
elif highest_accuracy < accuracy:
best_index, highest_accuracy = arch_index, accuracy
if self.verbose:
print(' the best architecture : [{:}] {:} with accuracy={:.3f}%'.format(best_index, self.arch(best_index), highest_accuracy))
return best_index, highest_accuracy
def get_net_param(self, index, dataset, seed: Optional[int], hp: Text = '12'):
"""
This function is used to obtain the trained weights of the `index`-th architecture on `dataset` with the seed of `seed`
Args [seed]:
-- None : return a dict containing the trained weights of all trials, where each key is a seed and its corresponding value is the weights.
-- a interger : return the weights of a specific trial, whose seed is this interger.
Args [hp]:
-- 01 : train the model by 01 epochs
-- 12 : train the model by 12 epochs
-- 90 : train the model by 90 epochs
-- 200 : train the model by 200 epochs
"""
if self.verbose:
print('Call the get_net_param function with index={:}, dataset={:}, seed={:}, hp={:}'.format(index, dataset, seed, hp))
info = self.query_meta_info_by_index(index, hp)
return info.get_net_param(dataset, seed)
def get_net_config(self, index: int, dataset: Text):
"""
This function is used to obtain the configuration for the `index`-th architecture on `dataset`.
Args [dataset] (4 possible options):
-- cifar10-valid : training the model on the CIFAR-10 training set.
-- cifar10 : training the model on the CIFAR-10 training + validation set.
-- cifar100 : training the model on the CIFAR-100 training set.
-- ImageNet16-120 : training the model on the ImageNet16-120 training set.
This function will return a dict.
========= Some examlpes for using this function:
config = api.get_net_config(128, 'cifar10')
"""
if self.verbose:
print('Call the get_net_config function with index={:}, dataset={:}.'.format(index, dataset))
if index in self.arch2infos_dict:
info = self.arch2infos_dict[index]
else:
raise ValueError('The arch_index={:} is not in arch2infos_dict.'.format(arch_index))
info = next(iter(info.values()))
results = info.query(dataset, None)
results = next(iter(results.values()))
return results.get_config(None)
def get_cost_info(self, index: int, dataset: Text, hp: Text = '12') -> Dict[Text, float]:
"""To obtain the cost metric for the `index`-th architecture on a dataset."""
if self.verbose:
print('Call the get_cost_info function with index={:}, dataset={:}, and hp={:}.'.format(index, dataset, hp))
info = self.query_meta_info_by_index(index, hp)
return info.get_compute_costs(dataset)
def get_latency(self, index: int, dataset: Text, hp: Text = '12') -> float:
"""
To obtain the latency of the network (by default it will return the latency with the batch size of 256).
:param index: the index of the target architecture
:param dataset: the dataset name (cifar10-valid, cifar10, cifar100, ImageNet16-120)
:return: return a float value in seconds
"""
if self.verbose:
print('Call the get_latency function with index={:}, dataset={:}, and hp={:}.'.format(index, dataset, hp))
cost_dict = self.get_cost_info(index, dataset, hp)
return cost_dict['latency']
@abc.abstractmethod
def show(self, index=-1):
"""This function will print the information of a specific (or all) architecture(s)."""
def _show(self, index=-1, print_information=None) -> None:
"""
This function will print the information of a specific (or all) architecture(s).
:param index: If the index < 0: it will loop for all architectures and print their information one by one.
else: it will print the information of the 'index'-th architecture.
:return: nothing
"""
if index < 0: # show all architectures
print(self)
for i, idx in enumerate(self.evaluated_indexes):
print('\n' + '-' * 10 + ' The ({:5d}/{:5d}) {:06d}-th architecture! '.format(i, len(self.evaluated_indexes), idx) + '-'*10)
print('arch : {:}'.format(self.meta_archs[idx]))
for key, result in self.arch2infos_dict[index].items():
strings = print_information(result)
print('>' * 40 + ' {:03d} epochs '.format(result.get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
print('<' * 40 + '------------' + '<' * 40)
else:
if 0 <= index < len(self.meta_archs):
if index not in self.evaluated_indexes: print('The {:}-th architecture has not been evaluated or not saved.'.format(index))
else:
arch_info = self.arch2infos_dict[index]
for key, result in self.arch2infos_dict[index].items():
strings = print_information(result)
print('>' * 40 + ' {:03d} epochs '.format(result.get_total_epoch()) + '>' * 40)
print('\n'.join(strings))
print('<' * 40 + '------------' + '<' * 40)
else:
print('This index ({:}) is out of range (0~{:}).'.format(index, len(self.meta_archs)))
def statistics(self, dataset: Text, hp: Union[Text, int]) -> Dict[int, int]:
"""This function will count the number of total trials."""
if self.verbose:
print('Call the statistics function with dataset={:} and hp={:}.'.format(dataset, hp))
valid_datasets = ['cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120']
if dataset not in valid_datasets:
raise ValueError('{:} not in {:}'.format(dataset, valid_datasets))
nums, hp = defaultdict(lambda: 0), str(hp)
for index in range(len(self)):
archInfo = self.arch2infos_dict[index][hp]
dataset_seed = archInfo.dataset_seed
if dataset not in dataset_seed:
nums[0] += 1
else:
nums[len(dataset_seed[dataset])] += 1
return dict(nums)
class ArchResults(object):
def __init__(self, arch_index, arch_str):
self.arch_index = int(arch_index)
self.arch_str = copy.deepcopy(arch_str)
self.all_results = dict()
self.dataset_seed = dict()
self.clear_net_done = False
def get_compute_costs(self, dataset):
x_seeds = self.dataset_seed[dataset]
results = [self.all_results[ (dataset, seed) ] for seed in x_seeds]
flops = [result.flop for result in results]
params = [result.params for result in results]
latencies = [result.get_latency() for result in results]
latencies = [x for x in latencies if x > 0]
mean_latency = np.mean(latencies) if len(latencies) > 0 else None
time_infos = defaultdict(list)
for result in results:
time_info = result.get_times()
for key, value in time_info.items(): time_infos[key].append( value )
info = {'flops' : np.mean(flops),
'params' : np.mean(params),
'latency': mean_latency}
for key, value in time_infos.items():
if len(value) > 0 and value[0] is not None:
info[key] = np.mean(value)
else: info[key] = None
return info
def get_metrics(self, dataset, setname, iepoch=None, is_random=False):
"""
This `get_metrics` function is used to obtain obtain the loss, accuracy, etc information on a specific dataset.
If not specify, each set refer to the proposed split in NAS-Bench-201 paper.
If some args return None or raise error, then it is not avaliable.
========================================
Args [dataset] (4 possible options):
-- cifar10-valid : training the model on the CIFAR-10 training set.
-- cifar10 : training the model on the CIFAR-10 training + validation set.
-- cifar100 : training the model on the CIFAR-100 training set.
-- ImageNet16-120 : training the model on the ImageNet16-120 training set.
Args [setname] (each dataset has different setnames):
-- When dataset = cifar10-valid, you can use 'train', 'x-valid', 'ori-test'
------ 'train' : the metric on the training set.
------ 'x-valid' : the metric on the validation set.
------ 'ori-test' : the metric on the test set.
-- When dataset = cifar10, you can use 'train', 'ori-test'.
------ 'train' : the metric on the training + validation set.
------ 'ori-test' : the metric on the test set.
-- When dataset = cifar100 or ImageNet16-120, you can use 'train', 'ori-test', 'x-valid', 'x-test'
------ 'train' : the metric on the training set.
------ 'x-valid' : the metric on the validation set.
------ 'x-test' : the metric on the test set.
------ 'ori-test' : the metric on the validation + test set.
Args [iepoch] (None or an integer in [0, the-number-of-total-training-epochs)
------ None : return the metric after the last training epoch.
------ an integer i : return the metric after the i-th training epoch.
Args [is_random]:
------ True : return the metric of a randomly selected trial.
------ False : return the averaged metric of all avaliable trials.
------ an integer indicating the 'seed' value : return the metric of a specific trial (whose random seed is 'is_random').
"""
x_seeds = self.dataset_seed[dataset]
results = [self.all_results[ (dataset, seed) ] for seed in x_seeds]
infos = defaultdict(list)
for result in results:
if setname == 'train':
info = result.get_train(iepoch)
else:
info = result.get_eval(setname, iepoch)
for key, value in info.items(): infos[key].append( value )
return_info = dict()
if isinstance(is_random, bool) and is_random: # randomly select one
index = random.randint(0, len(results)-1)
for key, value in infos.items(): return_info[key] = value[index]
elif isinstance(is_random, bool) and not is_random: # average
for key, value in infos.items():
if len(value) > 0 and value[0] is not None:
return_info[key] = np.mean(value)
else: return_info[key] = None
elif isinstance(is_random, int): # specify the seed
if is_random not in x_seeds: raise ValueError('can not find random seed ({:}) from {:}'.format(is_random, x_seeds))
index = x_seeds.index(is_random)
for key, value in infos.items(): return_info[key] = value[index]
else:
raise ValueError('invalid value for is_random: {:}'.format(is_random))
return return_info
def show(self, is_print=False):
return print_information(self, None, is_print)
def get_dataset_names(self):
return list(self.dataset_seed.keys())
def get_dataset_seeds(self, dataset):
return copy.deepcopy( self.dataset_seed[dataset] )
def get_net_param(self, dataset: Text, seed: Union[None, int] =None):
"""
This function will return the trained network's weights on the 'dataset'.
:arg
dataset: one of 'cifar10-valid', 'cifar10', 'cifar100', and 'ImageNet16-120'.
seed: an integer indicates the seed value or None that indicates returing all trials.
"""
if seed is None:
x_seeds = self.dataset_seed[dataset]
return {seed: self.all_results[(dataset, seed)].get_net_param() for seed in x_seeds}
else:
return self.all_results[(dataset, seed)].get_net_param()
def reset_latency(self, dataset: Text, seed: Union[None, Text], latency: float) -> None:
"""This function is used to reset the latency in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].update_latency([latency])
else:
self.all_results[(dataset, seed)].update_latency([latency])
def reset_pseudo_train_times(self, dataset: Text, seed: Union[None, Text], estimated_per_epoch_time: float) -> None:
"""This function is used to reset the train-times in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].reset_pseudo_train_times(estimated_per_epoch_time)
else:
self.all_results[(dataset, seed)].reset_pseudo_train_times(estimated_per_epoch_time)
def reset_pseudo_eval_times(self, dataset: Text, seed: Union[None, Text], eval_name: Text, estimated_per_epoch_time: float) -> None:
"""This function is used to reset the eval-times in all corresponding ResultsCount(s)."""
if seed is None:
for seed in self.dataset_seed[dataset]:
self.all_results[(dataset, seed)].reset_pseudo_eval_times(eval_name, estimated_per_epoch_time)
else:
self.all_results[(dataset, seed)].reset_pseudo_eval_times(eval_name, estimated_per_epoch_time)
def get_latency(self, dataset: Text) -> float:
"""Get the latency of a model on the target dataset. [Timestamp: 2020.03.09]"""
latencies = []
for seed in self.dataset_seed[dataset]:
latency = self.all_results[(dataset, seed)].get_latency()
if not isinstance(latency, float) or latency <= 0:
raise ValueError('invalid latency of {:} with seed={:} : {:}'.format(dataset, seed, latency))
latencies.append(latency)
return sum(latencies) / len(latencies)
def get_total_epoch(self, dataset=None):
"""Return the total number of training epochs."""
if dataset is None:
epochss = []
for xdata, x_seeds in self.dataset_seed.items():
epochss += [self.all_results[(xdata, seed)].get_total_epoch() for seed in x_seeds]
elif isinstance(dataset, str):
x_seeds = self.dataset_seed[dataset]
epochss = [self.all_results[(dataset, seed)].get_total_epoch() for seed in x_seeds]
else:
raise ValueError('invalid dataset={:}'.format(dataset))
if len(set(epochss)) > 1: raise ValueError('Each trial mush have the same number of training epochs : {:}'.format(epochss))
return epochss[-1]
def query(self, dataset, seed=None):
"""Return the ResultsCount object (containing all information of a single trial) for 'dataset' and 'seed'"""
if seed is None:
x_seeds = self.dataset_seed[dataset]
return {seed: self.all_results[(dataset, seed)] for seed in x_seeds}
else:
return self.all_results[(dataset, seed)]
def arch_idx_str(self):
return '{:06d}'.format(self.arch_index)
def update(self, dataset_name, seed, result):
if dataset_name not in self.dataset_seed:
self.dataset_seed[dataset_name] = []
assert seed not in self.dataset_seed[dataset_name], '{:}-th arch alreadly has this seed ({:}) on {:}'.format(self.arch_index, seed, dataset_name)
self.dataset_seed[ dataset_name ].append( seed )
self.dataset_seed[ dataset_name ] = sorted( self.dataset_seed[ dataset_name ] )
assert (dataset_name, seed) not in self.all_results
self.all_results[ (dataset_name, seed) ] = result
self.clear_net_done = False
def state_dict(self):
state_dict = dict()
for key, value in self.__dict__.items():
if key == 'all_results': # contain the class of ResultsCount
xvalue = dict()
assert isinstance(value, dict), 'invalid type of value for {:} : {:}'.format(key, type(value))
for _k, _v in value.items():
assert isinstance(_v, ResultsCount), 'invalid type of value for {:}/{:} : {:}'.format(key, _k, type(_v))
xvalue[_k] = _v.state_dict()
else:
xvalue = value
state_dict[key] = xvalue
return state_dict
def load_state_dict(self, state_dict):
new_state_dict = dict()
for key, value in state_dict.items():
if key == 'all_results': # to convert to the class of ResultsCount
xvalue = dict()
assert isinstance(value, dict), 'invalid type of value for {:} : {:}'.format(key, type(value))
for _k, _v in value.items():
xvalue[_k] = ResultsCount.create_from_state_dict(_v)
else: xvalue = value
new_state_dict[key] = xvalue
self.__dict__.update(new_state_dict)
@staticmethod
def create_from_state_dict(state_dict_or_file):
x = ArchResults(-1, -1)
if isinstance(state_dict_or_file, str): # a file path
state_dict = torch.load(state_dict_or_file, map_location='cpu')
elif isinstance(state_dict_or_file, dict):
state_dict = state_dict_or_file
else:
raise ValueError('invalid type of state_dict_or_file : {:}'.format(type(state_dict_or_file)))
x.load_state_dict(state_dict)
return x
# This function is used to clear the weights saved in each 'result'
# This can help reduce the memory footprint.
def clear_params(self):
for key, result in self.all_results.items():
del result.net_state_dict
result.net_state_dict = None
self.clear_net_done = True
def debug_test(self):
"""This function is used for me to debug and test, which will call most methods."""
all_dataset = ['cifar10-valid', 'cifar10', 'cifar100', 'ImageNet16-120']
for dataset in all_dataset:
print('---->>>> {:}'.format(dataset))
print('The latency on {:} is {:} s'.format(dataset, self.get_latency(dataset)))
for seed in self.dataset_seed[dataset]:
result = self.all_results[(dataset, seed)]
print(' ==>> result = {:}'.format(result))
print(' ==>> cost = {:}'.format(result.get_times()))
def __repr__(self):
return ('{name}(arch-index={index}, arch={arch}, {num} runs, clear={clear})'.format(name=self.__class__.__name__, index=self.arch_index, arch=self.arch_str, num=len(self.all_results), clear=self.clear_net_done))
"""
This class (ResultsCount) is used to save the information of one trial for a single architecture.
I did not write much comment for this class, because it is the lowest-level class in NAS-Bench-201 API, which will be rarely called.
If you have any question regarding this class, please open an issue or email me.
"""
class ResultsCount(object):
def __init__(self, name, state_dict, train_accs, train_losses, params, flop, arch_config, seed, epochs, latency):
self.name = name
self.net_state_dict = state_dict
self.train_acc1es = copy.deepcopy(train_accs)
self.train_acc5es = None
self.train_losses = copy.deepcopy(train_losses)
self.train_times = None
self.arch_config = copy.deepcopy(arch_config)
self.params = params
self.flop = flop
self.seed = seed
self.epochs = epochs
self.latency = latency
# evaluation results
self.reset_eval()
def update_train_info(self, train_acc1es, train_acc5es, train_losses, train_times) -> None:
self.train_acc1es = train_acc1es
self.train_acc5es = train_acc5es
self.train_losses = train_losses
self.train_times = train_times
def reset_pseudo_train_times(self, estimated_per_epoch_time: float) -> None:
"""Assign the training times."""
train_times = OrderedDict()
for i in range(self.epochs):
train_times[i] = estimated_per_epoch_time
self.train_times = train_times
def reset_pseudo_eval_times(self, eval_name: Text, estimated_per_epoch_time: float) -> None:
"""Assign the evaluation times."""
if eval_name not in self.eval_names: raise ValueError('invalid eval name : {:}'.format(eval_name))
for i in range(self.epochs):
self.eval_times['{:}@{:}'.format(eval_name,i)] = estimated_per_epoch_time
def reset_eval(self):
self.eval_names = []
self.eval_acc1es = {}
self.eval_times = {}
self.eval_losses = {}
def update_latency(self, latency):
self.latency = copy.deepcopy( latency )
def get_latency(self) -> float:
"""Return the latency value in seconds. -1 represents not avaliable ; otherwise it should be a float value"""
if self.latency is None: return -1.0
else: return sum(self.latency) / len(self.latency)
def update_eval(self, accs, losses, times): # new version
data_names = set([x.split('@')[0] for x in accs.keys()])
for data_name in data_names:
assert data_name not in self.eval_names, '{:} has already been added into eval-names'.format(data_name)
self.eval_names.append( data_name )
for iepoch in range(self.epochs):
xkey = '{:}@{:}'.format(data_name, iepoch)
self.eval_acc1es[ xkey ] = accs[ xkey ]
self.eval_losses[ xkey ] = losses[ xkey ]
self.eval_times [ xkey ] = times[ xkey ]
def update_OLD_eval(self, name, accs, losses): # old version
assert name not in self.eval_names, '{:} has already added'.format(name)
self.eval_names.append( name )
for iepoch in range(self.epochs):
if iepoch in accs:
self.eval_acc1es['{:}@{:}'.format(name,iepoch)] = accs[iepoch]
self.eval_losses['{:}@{:}'.format(name,iepoch)] = losses[iepoch]
def __repr__(self):
num_eval = len(self.eval_names)
set_name = '[' + ', '.join(self.eval_names) + ']'
return ('{name}({xname}, arch={arch}, FLOP={flop:.2f}M, Param={param:.3f}MB, seed={seed}, {num_eval} eval-sets: {set_name})'.format(name=self.__class__.__name__, xname=self.name, arch=self.arch_config['arch_str'], flop=self.flop, param=self.params, seed=self.seed, num_eval=num_eval, set_name=set_name))
def get_total_epoch(self):
return copy.deepcopy(self.epochs)
def get_times(self):
"""Obtain the information regarding both training and evaluation time."""
if self.train_times is not None and isinstance(self.train_times, dict):
train_times = list( self.train_times.values() )
time_info = {'T-train@epoch': np.mean(train_times), 'T-train@total': np.sum(train_times)}
else:
time_info = {'T-train@epoch': None, 'T-train@total': None }
for name in self.eval_names:
try:
xtimes = [self.eval_times['{:}@{:}'.format(name,i)] for i in range(self.epochs)]
time_info['T-{:}@epoch'.format(name)] = np.mean(xtimes)
time_info['T-{:}@total'.format(name)] = np.sum(xtimes)
except:
time_info['T-{:}@epoch'.format(name)] = None
time_info['T-{:}@total'.format(name)] = None
return time_info
def get_eval_set(self):
return self.eval_names
# get the training information
def get_train(self, iepoch=None):
if iepoch is None: iepoch = self.epochs-1
assert 0 <= iepoch < self.epochs, 'invalid iepoch={:} < {:}'.format(iepoch, self.epochs)
if self.train_times is not None:
xtime = self.train_times[iepoch]
atime = sum([self.train_times[i] for i in range(iepoch+1)])
else: xtime, atime = None, None
return {'iepoch' : iepoch,
'loss' : self.train_losses[iepoch],
'accuracy': self.train_acc1es[iepoch],
'cur_time': xtime,
'all_time': atime}
def get_eval(self, name, iepoch=None):
"""Get the evaluation information ; there could be multiple evaluation sets (identified by the 'name' argument)."""
if iepoch is None: iepoch = self.epochs-1
assert 0 <= iepoch < self.epochs, 'invalid iepoch={:} < {:}'.format(iepoch, self.epochs)
if isinstance(self.eval_times,dict) and len(self.eval_times) > 0:
xtime = self.eval_times['{:}@{:}'.format(name,iepoch)]
atime = sum([self.eval_times['{:}@{:}'.format(name,i)] for i in range(iepoch+1)])
else: xtime, atime = None, None
return {'iepoch' : iepoch,
'loss' : self.eval_losses['{:}@{:}'.format(name,iepoch)],
'accuracy': self.eval_acc1es['{:}@{:}'.format(name,iepoch)],
'cur_time': xtime,
'all_time': atime}
def get_net_param(self, clone=False):
if clone: return copy.deepcopy(self.net_state_dict)
else: return self.net_state_dict
def get_config(self, str2structure):
"""This function is used to obtain the config dict for this architecture."""
if str2structure is None:
# In this case, this is NAS-Bench-301
if 'name' in self.arch_config and self.arch_config['name'] == 'infer.shape.tiny':
return {'name': 'infer.shape.tiny', 'channels': self.arch_config['channels'],
'genotype': self.arch_config['genotype'], 'num_classes': self.arch_config['class_num']}
# In this case, this is NAS-Bench-201
else:
return {'name': 'infer.tiny', 'C': self.arch_config['channel'],
'N' : self.arch_config['num_cells'],
'arch_str': self.arch_config['arch_str'], 'num_classes': self.arch_config['class_num']}
else:
# In this case, this is NAS-Bench-301
if 'name' in self.arch_config and self.arch_config['name'] == 'infer.shape.tiny':
return {'name': 'infer.shape.tiny', 'channels': self.arch_config['channels'],
'genotype': str2structure(self.arch_config['genotype']), 'num_classes': self.arch_config['class_num']}
# In this case, this is NAS-Bench-201
else:
return {'name': 'infer.tiny', 'C': self.arch_config['channel'],
'N' : self.arch_config['num_cells'],
'genotype': str2structure(self.arch_config['arch_str']), 'num_classes': self.arch_config['class_num']}
def state_dict(self):
_state_dict = {key: value for key, value in self.__dict__.items()}
return _state_dict
def load_state_dict(self, state_dict):
self.__dict__.update(state_dict)
@staticmethod
def create_from_state_dict(state_dict):
x = ResultsCount(None, None, None, None, None, None, None, None, None, None)
x.load_state_dict(state_dict)
return x

View File

@ -4,7 +4,9 @@
#####################################################
# SLURM_PROCID=0 SLURM_NTASKS=6 bash ./scripts-search/X-X/train-shapes-v2.sh 12 777
#
# SLURM_PROCID=0 SLURM_NTASKS=2 bash ./scripts-search/X-X/train-shapes.sh 31000-32767 90 777
# SLURM_PROCID=0 SLURM_NTASKS=4 bash ./scripts-search/X-X/train-shapes.sh 30000-32767 90 777
# SLURM_PROCID=0 SLURM_NTASKS=4 bash ./scripts-search/X-X/train-shapes.sh 00000-09999 90 777
#
echo script name: $0
echo $# arguments
if [ "$#" -ne 2 ] ;then
@ -21,7 +23,8 @@ fi
#srange=01000-03999,04050-05000,06000-09000,11000-14500,15000-18500,20000-23500,25000-27500,29000-30000
#srange=00000-00999,04000-04049,05001-05999,09001-10999,14501-14999,18501-19999,23501-24999,27501-28999,30001-32767
srange=00000-00999,04000-04049,05001-05999,09001-10999,14501-14999,18501-19999,23501-24999,27501-28999,30001-30999
#srange=00000-09999
srange=10000-29999
opt=$1
all_seeds=$2
cpus=4

View File

@ -5,14 +5,17 @@
# [mars6] CUDA_VISIBLE_DEVICES=0 bash ./scripts-search/X-X/train-shapes.sh 00000-05000 12 777
# [mars6] bash ./scripts-search/X-X/train-shapes.sh 05001-10000 12 777
# [mars20] bash ./scripts-search/X-X/train-shapes.sh 10001-14500 12 777
# [mars20] bash ./scripts-search/X-X/train-shapes.sh 14501-19500 12 777
# [mars20] bash ./scripts-search/X-X/train-shapes.sh 14501-18000 12 777
# [saturn4] bash ./scripts-search/X-X/train-shapes.sh 18001-19500 12 777
# [saturn4] bash ./scripts-search/X-X/train-shapes.sh 19501-23500 12 777
# [saturn4] bash ./scripts-search/X-X/train-shapes.sh 23501-27500 12 777
# [saturn4] bash ./scripts-search/X-X/train-shapes.sh 27501-30000 12 777
# [saturn4] bash ./scripts-search/X-X/train-shapes.sh 30001-32767 12 777
# [x] bash ./scripts-search/X-X/train-shapes.sh 30001-32767 12 777
#
# CUDA_VISIBLE_DEVICES=2 bash ./scripts-search/X-X/train-shapes.sh 01000-03999,04050-05000,06000-09000,11000-14500,15000-18500,20000-23500,25000-27500,29000-30000 12 777
# SLURM_PROCID=1 SLURM_NTASKS=5 bash ./scripts-search/X-X/train-shapes.sh 01000-03999,04050-05000,06000-09000,11000-14500,15000-18500,20000-23500,25000-27500,29000-30000 90 777
# [GCP] bash ./scripts-search/X-X/train-shapes.sh 00000-09999 90 777
# [UTS] bash ./scripts-search/X-X/train-shapes.sh 30000-32767 90 777
echo script name: $0
echo $# arguments
if [ "$#" -ne 3 ] ;then
@ -43,4 +46,4 @@ OMP_NUM_THREADS=${cpus} python exps/NAS-Bench-201/xshapes.py \
$TORCH_HOME/cifar.python \
$TORCH_HOME/cifar.python/ImageNet16 \
--workers ${cpus} \
--seeds ${all_seeds}
--seeds ${all_seeds}