3. **Graph-denoiser**: first integrates node and edge features into graph tokens, then denoises these tokens through Transformer layers with AdaLN. In each hidden layer, AdaLN replaces the statistics (mean and variance) of the compound's representation with condition-derived ones, effectively outperforming other predictor-based and predictor-free conditioning methods.
2. Denoising Generation with Graph Diffusion Transformer
Condition encoder: the timestep t is treated as a special condition and given a D-dimensional sinusoidal encoding; each numerical/categorical condition c_i is given a D-dimensional representation via its own encoding operation. Categorical conditions use one-hot encoding; numerical conditions use clustering-based encoding: c_i is softly assigned to clusters, and the soft-assignment vector of the condition value is transformed into the representation, which can be implemented as Linear(Softmax(Linear(c_i))). The final condition representation is c = [encode(c_i) for i in 1..M]. For numerical conditions, the proposed clustering-based approach is evaluated against alternatives such as direct or interval-based encodings.
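A minimal sketch of the clustering-based numerical encoding Linear(Softmax(Linear(c_i))) described above; the class name, cluster count, output dimension, and random weights are assumptions (plain NumPy stands in for learned linear layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ClusterConditionEncoder:
    """Sketch of clustering-based numerical condition encoding:
    Linear(Softmax(Linear(c_i))). The scalar condition is softly assigned
    to K clusters, and the assignment vector is projected to D dimensions."""
    def __init__(self, num_clusters=8, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(size=(1, num_clusters))   # scalar -> cluster logits
        self.b1 = np.zeros(num_clusters)
        self.W2 = rng.normal(size=(num_clusters, dim)) # assignment -> D-dim repr
        self.b2 = np.zeros(dim)

    def __call__(self, c):
        c = np.atleast_2d(c)                           # (batch, 1)
        assign = softmax(c @ self.W1 + self.b1)        # soft cluster assignment
        return assign @ self.W2 + self.b2              # (batch, dim)

enc = ClusterConditionEncoder()
reps = enc(np.array([[0.3], [1.7]]))                   # two condition values
print(reps.shape)  # (2, 16)
```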
A common way to convert a generated graph into a molecule is to keep only the largest connected component [42], referred to as Graph DiT-LCC for our model. For Graph DiT, we instead connect all components by randomly selecting atoms to join. Compared with Graph DiT-LCC, this alters the generated structure minimally and so reflects model performance more accurately.
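The two post-processing choices (keep the largest connected component vs. bridge all components with random edges) can be sketched in plain Python; function names and the toy fragment graph are assumptions, and a real pipeline would add chemistry-aware bonds rather than arbitrary edges:

```python
import random

def connected_components(edges, nodes):
    """Connected components of an undirected graph via DFS."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps

def keep_lcc(edges, nodes):
    """Graph DiT-LCC style: keep only the largest connected component."""
    lcc = max(connected_components(edges, nodes), key=len)
    keep = set(lcc)
    return [(u, v) for u, v in edges if u in keep and v in keep], lcc

def connect_all(edges, nodes, seed=0):
    """Graph DiT style (sketch): join successive components with one edge
    between randomly chosen atoms, a minimal change to the structure."""
    rng = random.Random(seed)
    comps = connected_components(edges, nodes)
    extra = [(rng.choice(a), rng.choice(b)) for a, b in zip(comps, comps[1:])]
    return edges + extra

nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (3, 4)]          # two disconnected fragments
print(sorted(keep_lcc(edges, nodes)[1]))  # [0, 1, 2]
print(len(connect_all(edges, nodes)))     # 4 edges: one bridge added
```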
#### Related work (not an innovation point)
- Diffusion models have also been used for molecular property prediction [27], conformation generation [47], and molecule generation with atomic coordinates in 3D [18, 48, 3].
1. We extensively validate the Graph DiT for multi-conditional polymer and small molecule generation. A polymer inverse design task for gas separation with feedback from domain experts further demonstrates its practical utility.
2. Relatively lightweight: uses a simpler refinement module that can be applied for more than 100 iterations during inference without diverging, namely constructing the cost volume without warping and using fixed-resolution updates. Devon handles large displacements with a dilated cost volume, an approach that pools the correlation volume at multiple resolutions.
repetitive sampling and training of many task-irrelevant architectures.
> suffer from the high search cost
>
> proposed to utilize parameterized property predictors without training
>
> NAS wastes time exploring an extensive search space
>
> the property predictors mostly play a passive role such as evaluators that rank architecture candidates provided by a search strategy to simply filter them out during the search process.
### Innovation Point
#### Abstract
1. proposed a novel conditional **Neural Architecture Generation (NAG)** framework based on diffusion models
2. the guidance of parameterized predictors => task optimal architectures => sampling from a region that is more likely to satisfy the properties.
#### Introduction
1. diffusion generative models.
2. **train** the base diffusion generative model **without requiring expensive label information**, e.g. accuracy
3. deploy the trained diffusion model to diverse downstream tasks, while controlling the generation process with **property predictors**.
4. we leverage **gradients** of **parameterized predictors** to guide the generative model toward the space of architectures with desired properties.
5. our approach facilitates efficient search by **generating architectures that follow the specific distribution of interest within the search space**
6. utilizes the predictor for both NAG and evaluation purposes
7. we can swap out the predictors in a plug-and-play manner without retraining the base generative model
8. design a score network for neural architectures (NAs)
9. Previously, neural architectures were represented as directed acyclic graphs to model the computation flow; representing them as undirected graphs captures the structural information but completely ignores the directional relationships between nodes. They introduce a score network that encodes the **positional information** of nodes to capture **their order connected by directed edges**.
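Points 4-7 amount to classifier-style guidance: the gradient of a parameterized property predictor is added to the unconditional score, and swapping the predictor retargets generation without retraining the base model. A toy sketch (all function names and the stand-in score/predictor are assumptions, not the paper's code):

```python
import numpy as np

def guided_score(score_fn, predictor_grad, A, t, scale=1.0):
    """Predictor-guided score: add the gradient of log p(y | A_t) from a
    parameterized property predictor to the unconditional score. Swapping
    `predictor_grad` changes the target property in a plug-and-play way."""
    return score_fn(A, t) + scale * predictor_grad(A, t)

# Toy stand-ins: the unconditional score pulls toward 0, the predictor toward 1.
score_fn = lambda A, t: -A
predictor_grad = lambda A, t: -(A - 1.0)

g = guided_score(score_fn, predictor_grad, np.zeros((2, 2)), t=0.5)
print(g)  # every entry is 1.0
```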
- Neural Architecture $\mathbf{A}$ with $N$ nodes defined by **operator type matrix** $\mathbf{\mathcal{V}}\in\mathbb{R}^{N\times F}$ and upper triangular adjacency matrix $\mathbf{\mathcal{E}}$, $\mathbf{A}=(\mathbf{\mathcal{V}},\mathbf{\mathcal{E}})$, where $F$ is the number of predefined operators
$$
\mathrm{d}\mathbf{A}_t = [\mathbf{f}_t(\mathbf{A}_t)-g_t^2\nabla_{\mathbf{A}_t}\log p_t(\mathbf{A}_t)]\,\mathrm{d}\bar t + g_t\,\mathrm{d}\bar{\mathbf{w}}
$$
After generating samples by simulating the reverse diffusion process, we discretize the entries of the architecture matrices using the operator $\mathbb{1}[\cdot>0.5]$ to obtain discrete 0-1 matrices.
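The reverse-SDE sampling plus 0-1 discretization can be sketched with a simple Euler-Maruyama loop. The VE-style drift f = 0, the dummy score function, and every hyperparameter below are assumptions for illustration; the real model uses a trained score network over architecture matrices:

```python
import numpy as np

def reverse_diffusion_sample(score_fn, shape, T=1000, g=1.0, seed=0):
    """Euler-Maruyama simulation of the reverse SDE
    dA = [f(A) - g^2 * score(A, t)] dt_bar + g dw_bar, with f = 0 (VE-style)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / T
    A = rng.normal(size=shape)                  # start from the prior
    for step in range(T, 0, -1):
        t = step / T
        drift = -(g ** 2) * score_fn(A, t)      # f = 0, only the score term
        A = A - drift * dt + g * np.sqrt(dt) * rng.normal(size=shape)
    return A

# Dummy score pulling samples toward a target 0-1 pattern (illustration only).
target = np.triu(np.ones((4, 4)), k=1)
score_fn = lambda A, t: -4.0 * (A - target)

A = reverse_diffusion_sample(score_fn, shape=(4, 4))
A_discrete = (A > 0.5).astype(int)              # the 1[. > 0.5] discretization
print(A_discrete.shape)
```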
1. Transferable NAS and Bayesian Optimization (BO)-based NAS. Speedups of up to 35x
2. integrated into a BO-based algorithm, outperforms
#### Introduction
1. Transferable NAS and Bayesian Optimization (BO)-based NAS.
1. Transferable NAS use transferable dataset-aware predictors
2. DiffusionNAG demonstrates superior generation quality compared to MetaD2A
3. This is because DiffusionNAG overcomes the limitation of existing BO-based NAS, which **samples low-quality architectures during the initial phase**, by sampling from the space of the architectures that satisfy the given properties.
1. various types of NAS tasks (e.g., latency or robustness-constrained NAS)
## DiGress 2209.14734
### key research question
- discrete denoising diffusion model for generating graphs with **categorical** node and edge attributes
### Innovation point
#### Abstract
- **discrete diffusion process**, progressively edits graphs with noise
- **graph transformer** = denoiser , turn distribution learning over graphs into a sequence of node and edge classification tasks.
- **Markovian noise model**, preserves the marginal distribution of node and edge types during diffusion
- Procedure for conditioning the generation on graph-level features.
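Since the denoiser's output heads are plain classifiers, training reduces to node-type and edge-type cross-entropy against the clean graph. A sketch with random stand-in logits (all shapes and names here are assumptions):

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean categorical cross-entropy from raw logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

N, Kx, Ke = 4, 3, 2                             # nodes, node classes, edge classes
rng = np.random.default_rng(0)
node_logits = rng.normal(size=(N, Kx))          # stand-in for the node head output
edge_logits = rng.normal(size=(N * N, Ke))      # stand-in for the edge head output
node_targets = rng.integers(0, Kx, size=N)      # clean-graph node types
edge_targets = rng.integers(0, Ke, size=N * N)  # clean-graph edge types

loss = cross_entropy(node_logits, node_targets) + cross_entropy(edge_logits, edge_targets)
print(loss > 0)  # True
```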
#### Introduction
- previously: add **Gaussian noise** to node features and `adj_matrix`; continuous diffusion may destroy the graph's sparsity and create completely noisy graphs
- DiGress: noise = graph edits (edge addition or deletion)
- graph transformer denoiser predicts the clean graph from a noisy input; the result admits an ELBO for likelihood estimation
- **guidance procedure** for conditioning graph generation on **graph-level properties**
### method
noise model $q$
data point $x$
a sequence of increasingly noisy data points $(z^1,...,z^T)$, where $q(z^1,...,z^T|x) = q(z^1|x)\prod_{t=2}^Tq(z^t|z^{t-1})$
denoising neural network $\phi_\theta$
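A sketch of a marginal-preserving Markov noise model for one categorical variable (a node or edge type), using a DiGress-style transition matrix $Q^t = \alpha^t I + (1-\alpha^t)\mathbf{1}m'$; the noise schedule and the marginal $m$ below are made-up toy values:

```python
import numpy as np

def transition_matrix(alpha_t, m):
    """Q^t = alpha^t * I + (1 - alpha^t) * 1 m': each state stays with prob
    alpha^t or is resampled from the marginal m, so m stays stationary."""
    K = len(m)
    return alpha_t * np.eye(K) + (1 - alpha_t) * np.outer(np.ones(K), m)

def noise_chain(x0, alphas, m, seed=0):
    """Sample z^1, ..., z^T with q(z^t | z^{t-1}) = Cat(row of Q^t)."""
    rng = np.random.default_rng(seed)
    z, chain = x0, []
    for a in alphas:
        probs = transition_matrix(a, m)[z]   # transition row for current state
        z = rng.choice(len(m), p=probs)
        chain.append(int(z))
    return chain

m = np.array([0.7, 0.2, 0.1])                # marginal distribution of types
chain = noise_chain(x0=1, alphas=[0.99] * 50 + [0.5] * 50, m=m)
# Marginal preservation: m is a left eigenvector of every Q^t.
assert np.allclose(m @ transition_matrix(0.8, m), m)
```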
#### Diffusion models
Noise is sampled from the prior distribution, then iteratively denoised by applying the denoising network
denoising network is not trained to directly predict $z^{t-1}$
when $\int q(z^{t-1}|z^t,x)\,dp_\theta(x)$ is tractable, $x$ can be used as the target of the denoising network, which removes an important source of label noise
### experiments
#### abstract
- 3x validity improvement on a planar graph dataset
- scale to the large GuacaMol dataset containing 1.3M drug-like molecules without the use of molecule-specific representations.