<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/scripts/pretty-feed-v3.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:h="http://www.w3.org/TR/html4/"><channel><title>Pengwee Wang&apos;s blog</title><description>Time waits for no one.</description><link>https://pengwee.wang</link><item><title>SimCLR</title><link>https://pengwee.wang/blog/simclr</link><guid isPermaLink="true">https://pengwee.wang/blog/simclr</guid><description>SimCLR是自监督视觉表征对比学习算法</description><pubDate>Thu, 18 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h3&gt;整体思想&lt;/h3&gt;
&lt;p&gt;SimCLR is a contrastive learning algorithm for self-supervised visual representation learning. Its core goal is to learn image representations that are strongly discriminative and generalize well (&lt;strong&gt;invariant to data augmentation and other nuisance variation&lt;/strong&gt;: ignoring superficial distortions such as cropping, color shifts, and blur, while capturing the core semantics that distinguish categories)&lt;del&gt;, so that images of the same class naturally cluster in feature space (close together, highly similar) while images of different classes separate (far apart, less similar)&lt;/del&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;a href=&quot;https://zhuanlan.zhihu.com/p/197802321&quot;&gt;https://zhuanlan.zhihu.com/p/197802321&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;In short, the idea of contrastive learning (CL) is that if two things are similar, we want their encodings to be similar as well. In practice, most current approaches reduce dimensionality and then compute a contrastive loss. The difficulty lies in designing the contrastive loss and the contrastive samples during this dimensionality reduction.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Experimental findings:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The composition of data augmentations&lt;/strong&gt; is crucial for defining effective predictive tasks.&lt;/p&gt;
&lt;p&gt;Introducing a &lt;strong&gt;learnable nonlinear transformation&lt;/strong&gt; between the representation and the contrastive loss substantially improves representation quality.&lt;/p&gt;
&lt;p&gt;Contrastive learning benefits more from &lt;strong&gt;larger batch sizes and more training steps&lt;/strong&gt; than supervised learning does.&lt;/p&gt;
&lt;h3&gt;Implementation Details&lt;/h3&gt;
&lt;p&gt;SimCLR architecture:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20251218101807-yguwcsn.BMVbhiSi_2fVSsq.webp&quot; alt=&quot;image&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stochastic data augmentation $\mathcal{T}$&lt;/li&gt;
&lt;li&gt;Visual representation encoder $f(\cdot)$&lt;/li&gt;
&lt;li&gt;Projection head $g(\cdot)$, containing a nonlinear activation&lt;/li&gt;
&lt;li&gt;Contrastive loss &lt;strong&gt;NT-Xent (normalized temperature-scaled cross-entropy)&lt;/strong&gt;
&lt;ul&gt;
&lt;li&gt;Cosine similarity $\text{sim}(\boldsymbol{u}, \boldsymbol{v}) = \boldsymbol{u}^T \boldsymbol{v}/(\|\boldsymbol{u}\| \|\boldsymbol{v}\|)$&lt;/li&gt;
&lt;li&gt;$\ell_{i,j} = -\log \frac{\exp(\text{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\text{sim}(z_i, z_k)/\tau)}$&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Training algorithm:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/C05DDC0F-D42C-40A8-84DE-AF0BDB4DC234-20251218102900-otkliwl.DpJb2D-i_hQK6C.webp&quot; alt=&quot;{C05DDC0F-D42C-40A8-84DE-AF0BDB4DC234}&quot;&gt;&lt;/p&gt;
&lt;p&gt;For a batch of $N$ samples $\{x_k\}_{k=1}^{N}$, two augmentations $t \sim \mathcal{T}, t^{&apos;} \sim \mathcal{T}$ are sampled, producing $2N$ views, where $x_{2k-1}, x_{2k}$ are two different augmentations of the same image; passing them through $f(\cdot)$ and $g(\cdot)$ yields $z_{2k-1}, z_{2k}$.&lt;/p&gt;
&lt;p&gt;The loss is then $\mathcal{L} = \frac{1}{2N} \sum_{k=1}^{N} \left[ \ell(2k-1, 2k) + \ell(2k, 2k-1) \right]$.&lt;/p&gt;
&lt;p&gt;In the experiments, the two augmentations of the same image within a batch are treated as a positive pair, and augmentations of all other images serve as negatives. The batch size is set very large (4096 to 8192); in my view, one purpose is to dilute the effect of different images of the same class ending up as negatives (since their low-dimensional projections should be similar).&lt;/p&gt;
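&lt;p&gt;The batching scheme above can be sketched as a minimal NT-Xent computation (an illustrative sketch, not the official SimCLR code; it assumes the two views of image $k$ sit at adjacent rows of the projection matrix):&lt;/p&gt;

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z, temperature=0.5):
    """NT-Xent over 2N projections z, where rows (0, 1), (2, 3), ... are view pairs."""
    z = F.normalize(z, dim=1)            # unit norm, so dot products are cosine similarities
    sim = z @ z.T / temperature          # (2N, 2N) similarity logits
    sim.fill_diagonal_(float("-inf"))    # drop the k == i term from the denominator
    # The positive for row i is its partner view; i ^ 1 swaps indices within each pair.
    targets = torch.arange(z.size(0)) ^ 1
    return F.cross_entropy(sim, targets)  # averages l(2k-1, 2k) and l(2k, 2k-1)
```

&lt;p&gt;With perfectly aligned pairs and orthogonal negatives the per-example loss reduces to log(1 + (2N-2)·exp(-1/temperature)), which shows why larger batches (more negatives) make the pretext task harder and more informative.&lt;/p&gt;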
&lt;h4&gt;The nonlinear projection head improves the quality of the representation before it&lt;/h4&gt;
&lt;p&gt;Experiments show that the projection matrix $W$ of the nonlinear projection head $g(\cdot)$ is low-rank, meaning that a small number of dimensions carry most of the useful information.&lt;/p&gt;
&lt;p&gt;The rank of a matrix reflects the number of &lt;strong&gt;independent, informative dimensions&lt;/strong&gt; it contains.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-rank matrix: most dimensions carry distinct information, with no obvious redundancy;&lt;/li&gt;
&lt;li&gt;Low-rank matrix: a few dimensions suffice to approximately reconstruct the core information; the remaining dimensions are either redundant or meaningless noise (contributing little to the task).&lt;/li&gt;
&lt;/ul&gt;
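&lt;p&gt;To make &quot;low rank&quot; concrete: a matrix assembled from a few outer products has only a few non-negligible singular values, so a handful of directions carries essentially all of its information (an illustrative sketch, not SimCLR&apos;s actual $W$):&lt;/p&gt;

```python
import torch

torch.manual_seed(0)
# A 128 x 128 matrix of rank 4: the sum of four outer products.
W = sum(torch.outer(torch.randn(128), torch.randn(128)) for _ in range(4))
s = torch.linalg.svdvals(W)                    # singular values, largest first
effective_rank = int((s > 1e-6 * s[0]).sum())  # count the non-negligible ones
print(effective_rank)                          # 4: only four directions matter
```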
&lt;p&gt;In other words, $g(\cdot)$ filters $h$ into a representation adapted to computing the contrastive loss, while at the same time constraining and guiding $h$ to learn useful features.&lt;/p&gt;
&lt;p&gt;Why not use $z$ directly for downstream tasks? Because $g(\cdot)$ serves only as a guide: it adapts the representation to the loss function, not to downstream tasks.&lt;/p&gt;</content:encoded><h:img src="/_astro/image-20251218101807-yguwcsn.BMVbhiSi.png"/><enclosure url="/_astro/image-20251218101807-yguwcsn.BMVbhiSi.png"/></item><item><title>Annotated Transformer</title><link>https://pengwee.wang/blog/annotated-transformer</link><guid isPermaLink="true">https://pengwee.wang/blog/annotated-transformer</guid><description>The Annotated Transformer: a detailed annotated implementation of Attention Is All You Need</description><pubDate>Fri, 05 Dec 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;The original article is at &lt;a href=&quot;http://nlp.seas.harvard.edu/annotated-transformer/&quot;&gt;http://nlp.seas.harvard.edu/annotated-transformer/&lt;/a&gt;; it is reposted here only for easier access, and all rights remain with the original authors.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;v2022: Austin Huang, Suraj Subramanian, Jonathan Sum, Khalid Almubarak,
and Stella Biderman.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&lt;a href=&quot;https://nlp.seas.harvard.edu/2018/04/03/attention.html&quot;&gt;Original&lt;/a&gt;:
&lt;a href=&quot;http://rush-nlp.com/&quot;&gt;Sasha Rush&lt;/a&gt;.&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The Transformer has been on a lot of
people&apos;s minds over the last five years.
This post presents an annotated version of the paper in the
form of a line-by-line implementation. It reorders and deletes
some sections from the original paper and adds comments
throughout. This document itself is a working notebook, and should
be a completely usable implementation.
Code is available
&lt;a href=&quot;https://github.com/harvardnlp/annotated-transformer/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;Prelims&lt;/h1&gt;
&lt;p&gt;Skip&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# !pip install -r requirements.txt
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# # Uncomment for colab
# #
# !pip install -q torchdata==0.3.0 torchtext==0.12 spacy==3.2 altair GPUtil
# !python -m spacy download de_core_news_sm
# !python -m spacy download en_core_web_sm
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import os
from os.path import exists
import torch
import torch.nn as nn
from torch.nn.functional import log_softmax, pad
import math
import copy
import time
from torch.optim.lr_scheduler import LambdaLR
import pandas as pd
import altair as alt
from torchtext.data.functional import to_map_style_dataset
from torch.utils.data import DataLoader
from torchtext.vocab import build_vocab_from_iterator
import torchtext.datasets as datasets
import spacy
import GPUtil
import warnings
from torch.utils.data.distributed import DistributedSampler
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


# Set to False to skip notebook execution (e.g. for debugging)
warnings.filterwarnings(&quot;ignore&quot;)
RUN_EXAMPLES = True
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Some convenience helper functions used throughout the notebook


def is_interactive_notebook():
    return __name__ == &quot;__main__&quot;


def show_example(fn, args=[]):
    if __name__ == &quot;__main__&quot; and RUN_EXAMPLES:
        return fn(*args)


def execute_example(fn, args=[]):
    if __name__ == &quot;__main__&quot; and RUN_EXAMPLES:
        fn(*args)


class DummyOptimizer(torch.optim.Optimizer):
    def __init__(self):
        self.param_groups = [{&quot;lr&quot;: 0}]
        None

    def step(self):
        None

    def zero_grad(self, set_to_none=False):
        None


class DummyScheduler:
    def step(self):
        None
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;My comments are blockquoted. The main text is all from the paper itself.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Background&lt;/h1&gt;
&lt;p&gt;The goal of reducing sequential computation also forms the
foundation of the Extended Neural GPU, ByteNet and ConvS2S, all of
which use convolutional neural networks as basic building block,
computing hidden representations in parallel for all input and
output positions. In these models, the number of operations required
to relate signals from two arbitrary input or output positions grows
in the distance between positions, linearly for ConvS2S and
logarithmically for ByteNet. This makes it more difficult to learn
dependencies between distant positions. In the Transformer this is
reduced to a constant number of operations, albeit at the cost of
reduced effective resolution due to averaging attention-weighted
positions, an effect we counteract with Multi-Head Attention.&lt;/p&gt;
&lt;p&gt;Self-attention, sometimes called intra-attention is an attention
mechanism relating different positions of a single sequence in order
to compute a representation of the sequence. Self-attention has been
used successfully in a variety of tasks including reading
comprehension, abstractive summarization, textual entailment and
learning task-independent sentence representations. End-to-end
memory networks are based on a recurrent attention mechanism instead
of sequence-aligned recurrence and have been shown to perform well on
simple-language question answering and language modeling tasks.&lt;/p&gt;
&lt;p&gt;To the best of our knowledge, however, the Transformer is the first
transduction model relying entirely on self-attention to compute
representations of its input and output without using sequence
aligned RNNs or convolution.&lt;/p&gt;
&lt;h1&gt;Part 1: Model Architecture&lt;/h1&gt;
&lt;h1&gt;Model Architecture&lt;/h1&gt;
&lt;p&gt;Most competitive neural sequence transduction models have an
encoder-decoder structure
&lt;a href=&quot;https://arxiv.org/abs/1409.0473&quot;&gt;(cite)&lt;/a&gt;. Here, the encoder maps an
input sequence of symbol representations $(x_1, ..., x_n)$ to a
sequence of continuous representations $\mathbf{z} = (z_1, ...,
z_n)$. Given $\mathbf{z}$, the decoder then generates an output
sequence $(y_1,...,y_m)$ of symbols one element at a time. At each
step the model is auto-regressive
&lt;a href=&quot;https://arxiv.org/abs/1308.0850&quot;&gt;(cite)&lt;/a&gt;, consuming the previously
generated symbols as additional input when generating the next.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class EncoderDecoder(nn.Module):
    &quot;&quot;&quot;
    A standard Encoder-Decoder architecture. Base for this and many
    other models.
    &quot;&quot;&quot;

    def __init__(self, encoder, decoder, src_embed, tgt_embed, generator):
        super(EncoderDecoder, self).__init__()
        self.encoder = encoder
        self.decoder = decoder
        self.src_embed = src_embed
        self.tgt_embed = tgt_embed
        self.generator = generator

    def forward(self, src, tgt, src_mask, tgt_mask):
        &quot;Take in and process masked src and target sequences.&quot;
        return self.decode(self.encode(src, src_mask), src_mask, tgt, tgt_mask)

    def encode(self, src, src_mask):
        return self.encoder(self.src_embed(src), src_mask)

    def decode(self, memory, src_mask, tgt, tgt_mask):
        return self.decoder(self.tgt_embed(tgt), memory, src_mask, tgt_mask)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Generator(nn.Module):
    &quot;Define standard linear + softmax generation step.&quot;

    def __init__(self, d_model, vocab):
        super(Generator, self).__init__()
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, x):
        return log_softmax(self.proj(x), dim=-1)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The Transformer follows this overall architecture using stacked
self-attention and point-wise, fully connected layers for both the
encoder and decoder, shown in the left and right halves of Figure 1,
respectively.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/ModalNet-21.E_wcADzz_1H2Ujx.webp&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Encoder and Decoder Stacks&lt;/h2&gt;
&lt;h3&gt;Encoder&lt;/h3&gt;
&lt;p&gt;The encoder is composed of a stack of $N=6$ identical layers.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def clones(module, N):
    &quot;Produce N identical layers.&quot;
    return nn.ModuleList([copy.deepcopy(module) for _ in range(N)])
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Encoder(nn.Module):
    &quot;Core encoder is a stack of N layers&quot;

    def __init__(self, layer, N):
        super(Encoder, self).__init__()
        self.layers = clones(layer, N)
        self.norm = LayerNorm(layer.size)

    def forward(self, x, mask):
        &quot;Pass the input (and mask) through each layer in turn.&quot;
        for layer in self.layers:
            x = layer(x, mask)
        return self.norm(x)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We employ a residual connection
&lt;a href=&quot;https://arxiv.org/abs/1512.03385&quot;&gt;(cite)&lt;/a&gt; around each of the two
sub-layers, followed by layer normalization
&lt;a href=&quot;https://arxiv.org/abs/1607.06450&quot;&gt;(cite)&lt;/a&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class LayerNorm(nn.Module):
    &quot;Construct a layernorm module (See citation for details).&quot;

    def __init__(self, features, eps=1e-6):
        super(LayerNorm, self).__init__()
        self.a_2 = nn.Parameter(torch.ones(features))
        self.b_2 = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        std = x.std(-1, keepdim=True)
        return self.a_2 * (x - mean) / (std + self.eps) + self.b_2
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That is, the output of each sub-layer is $\mathrm{LayerNorm}(x +
\mathrm{Sublayer}(x))$, where $\mathrm{Sublayer}(x)$ is the function
implemented by the sub-layer itself. We apply dropout
&lt;a href=&quot;http://jmlr.org/papers/v15/srivastava14a.html&quot;&gt;(cite)&lt;/a&gt; to the
output of each sub-layer, before it is added to the sub-layer input
and normalized.&lt;/p&gt;
&lt;p&gt;To facilitate these residual connections, all sub-layers in the
model, as well as the embedding layers, produce outputs of dimension
$d_{\text{model}}=512$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class SublayerConnection(nn.Module):
    &quot;&quot;&quot;
    A residual connection followed by a layer norm.
    Note for code simplicity the norm is first as opposed to last.
    &quot;&quot;&quot;

    def __init__(self, size, dropout):
        super(SublayerConnection, self).__init__()
        self.norm = LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        &quot;Apply residual connection to any sublayer with the same size.&quot;
        return x + self.dropout(sublayer(self.norm(x)))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Each layer has two sub-layers. The first is a multi-head
self-attention mechanism, and the second is a simple, position-wise
fully connected feed-forward network.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class EncoderLayer(nn.Module):
    &quot;Encoder is made up of self-attn and feed forward (defined below)&quot;

    def __init__(self, size, self_attn, feed_forward, dropout):
        super(EncoderLayer, self).__init__()
        self.self_attn = self_attn
        self.feed_forward = feed_forward
        self.sublayer = clones(SublayerConnection(size, dropout), 2)
        self.size = size

    def forward(self, x, mask):
        &quot;Follow Figure 1 (left) for connections.&quot;
        x = self.sublayer[0](x, lambda x: self.self_attn(x, x, x, mask))
        return self.sublayer[1](x, self.feed_forward)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Decoder&lt;/h3&gt;
&lt;p&gt;The decoder is also composed of a stack of $N=6$ identical layers.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Decoder(nn.Module):
    &quot;Generic N layer decoder with masking.&quot;

    def __init__(self, layer, N):
        super(Decoder, self).__init__()
        self.layers = clones(layer, N)
        self.norm = LayerNorm(layer.size)

    def forward(self, x, memory, src_mask, tgt_mask):
        for layer in self.layers:
            x = layer(x, memory, src_mask, tgt_mask)
        return self.norm(x)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In addition to the two sub-layers in each encoder layer, the decoder
inserts a third sub-layer, which performs multi-head attention over
the output of the encoder stack. Similar to the encoder, we employ
residual connections around each of the sub-layers, followed by
layer normalization.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class DecoderLayer(nn.Module):
    &quot;Decoder is made of self-attn, src-attn, and feed forward (defined below)&quot;

    def __init__(self, size, self_attn, src_attn, feed_forward, dropout):
        super(DecoderLayer, self).__init__()
        self.size = size
        self.self_attn = self_attn
        self.src_attn = src_attn
        self.feed_forward = feed_forward
        self.sublayer = clones(SublayerConnection(size, dropout), 3)

    def forward(self, x, memory, src_mask, tgt_mask):
        &quot;Follow Figure 1 (right) for connections.&quot;
        m = memory
        x = self.sublayer[0](x, lambda x: self.self_attn(x, x, x, tgt_mask))
        x = self.sublayer[1](x, lambda x: self.src_attn(x, m, m, src_mask))
        return self.sublayer[2](x, self.feed_forward)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also modify the self-attention sub-layer in the decoder stack to
prevent positions from attending to subsequent positions. This
masking, combined with fact that the output embeddings are offset by
one position, ensures that the predictions for position $i$ can
depend only on the known outputs at positions less than $i$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def subsequent_mask(size):
    &quot;Mask out subsequent positions.&quot;
    attn_shape = (1, size, size)
    subsequent_mask = torch.triu(torch.ones(attn_shape), diagonal=1).type(
        torch.uint8
    )
    return subsequent_mask == 0
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Below the attention mask shows the position each tgt word (row) is
allowed to look at (column). Words are blocked for attending to
future words during training.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def example_mask():
    LS_data = pd.concat(
        [
            pd.DataFrame(
                {
                    &quot;Subsequent Mask&quot;: subsequent_mask(20)[0][x, y].flatten(),
                    &quot;Window&quot;: y,
                    &quot;Masking&quot;: x,
                }
            )
            for y in range(20)
            for x in range(20)
        ]
    )

    return (
        alt.Chart(LS_data)
        .mark_rect()
        .properties(height=250, width=250)
        .encode(
            alt.X(&quot;Window:O&quot;),
            alt.Y(&quot;Masking:O&quot;),
            alt.Color(&quot;Subsequent Mask:Q&quot;, scale=alt.Scale(scheme=&quot;viridis&quot;)),
        )
        .interactive()
    )


show_example(example_mask)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Attention&lt;/h3&gt;
&lt;p&gt;An attention function can be described as mapping a query and a set
of key-value pairs to an output, where the query, keys, values, and
output are all vectors. The output is computed as a weighted sum of
the values, where the weight assigned to each value is computed by a
compatibility function of the query with the corresponding key.&lt;/p&gt;
&lt;p&gt;We call our particular attention &quot;Scaled Dot-Product Attention&quot;.
The input consists of queries and keys of dimension $d_k$, and
values of dimension $d_v$. We compute the dot products of the query
with all keys, divide each by $\sqrt{d_k}$, and apply a softmax
function to obtain the weights on the values.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/ModalNet-19.C-nEKeVT_2sDvWj.webp&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;In practice, we compute the attention function on a set of queries
simultaneously, packed together into a matrix $Q$. The keys and
values are also packed together into matrices $K$ and $V$. We
compute the matrix of outputs as:&lt;/p&gt;
&lt;p&gt;$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}(\frac{QK^T}{\sqrt{d_k}})V
$$&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def attention(query, key, value, mask=None, dropout=None):
    &quot;Compute &apos;Scaled Dot Product Attention&apos;&quot;
    d_k = query.size(-1)
    scores = torch.matmul(query, key.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)
    p_attn = scores.softmax(dim=-1)
    if dropout is not None:
        p_attn = dropout(p_attn)
    return torch.matmul(p_attn, value), p_attn
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The two most commonly used attention functions are additive
attention &lt;a href=&quot;https://arxiv.org/abs/1409.0473&quot;&gt;(cite)&lt;/a&gt;, and dot-product
(multiplicative) attention. Dot-product attention is identical to
our algorithm, except for the scaling factor of
$\frac{1}{\sqrt{d_k}}$. Additive attention computes the
compatibility function using a feed-forward network with a single
hidden layer. While the two are similar in theoretical complexity,
dot-product attention is much faster and more space-efficient in
practice, since it can be implemented using highly optimized matrix
multiplication code.&lt;/p&gt;
&lt;p&gt;While for small values of $d_k$ the two mechanisms perform
similarly, additive attention outperforms dot product attention
without scaling for larger values of $d_k$
&lt;a href=&quot;https://arxiv.org/abs/1703.03906&quot;&gt;(cite)&lt;/a&gt;. We suspect that for
large values of $d_k$, the dot products grow large in magnitude,
pushing the softmax function into regions where it has extremely
small gradients (To illustrate why the dot products get large,
assume that the components of $q$ and $k$ are independent random
variables with mean $0$ and variance $1$. Then their dot product,
$q \cdot k = \sum_{i=1}^{d_k} q_ik_i$, has mean $0$ and variance
$d_k$.). To counteract this effect, we scale the dot products by
$\frac{1}{\sqrt{d_k}}$.&lt;/p&gt;
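&lt;p&gt;The variance argument is easy to check numerically: with i.i.d. mean-0, variance-1 components, dot products of $d_k$-dimensional vectors have variance close to $d_k$, and scaling by $1/\sqrt{d_k}$ restores unit variance (an illustrative sketch, not part of the original notebook):&lt;/p&gt;

```python
import torch

torch.manual_seed(0)
d_k = 512
q = torch.randn(20_000, d_k)   # components i.i.d. with mean 0, variance 1
k = torch.randn(20_000, d_k)
dots = (q * k).sum(dim=-1)     # 20k sample dot products
print(dots.var().item())                  # close to d_k = 512
print((dots / d_k ** 0.5).var().item())   # close to 1 after scaling
```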
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/ModalNet-20.BioF6ALs_Z1NuBVo.webp&quot; alt=&quot;&quot;&gt;&lt;/p&gt;
&lt;p&gt;Multi-head attention allows the model to jointly attend to
information from different representation subspaces at different
positions. With a single attention head, averaging inhibits this.&lt;/p&gt;
&lt;p&gt;$$
\mathrm{MultiHead}(Q, K, V) =
\mathrm{Concat}(\mathrm{head_1}, ..., \mathrm{head_h})W^O \\
\text{where}~\mathrm{head_i} = \mathrm{Attention}(QW^Q_i, KW^K_i, VW^V_i)
$$&lt;/p&gt;
&lt;p&gt;Where the projections are parameter matrices $W^Q_i \in
\mathbb{R}^{d_{\text{model}} \times d_k}$, $W^K_i \in
\mathbb{R}^{d_{\text{model}} \times d_k}$, $W^V_i \in
\mathbb{R}^{d_{\text{model}} \times d_v}$ and $W^O \in
\mathbb{R}^{hd_v \times d_{\text{model}}}$.&lt;/p&gt;
&lt;p&gt;In this work we employ $h=8$ parallel attention layers, or
heads. For each of these we use $d_k=d_v=d_{\text{model}}/h=64$. Due
to the reduced dimension of each head, the total computational cost
is similar to that of single-head attention with full
dimensionality.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class MultiHeadedAttention(nn.Module):
    def __init__(self, h, d_model, dropout=0.1):
        &quot;Take in model size and number of heads.&quot;
        super(MultiHeadedAttention, self).__init__()
        assert d_model % h == 0
        # We assume d_v always equals d_k
        self.d_k = d_model // h
        self.h = h
        self.linears = clones(nn.Linear(d_model, d_model), 4)
        self.attn = None
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, query, key, value, mask=None):
        &quot;Implements Figure 2&quot;
        if mask is not None:
            # Same mask applied to all h heads.
            mask = mask.unsqueeze(1)
        nbatches = query.size(0)

        # 1) Do all the linear projections in batch from d_model =&gt; h x d_k
        query, key, value = [
            lin(x).view(nbatches, -1, self.h, self.d_k).transpose(1, 2)
            for lin, x in zip(self.linears, (query, key, value))
        ]

        # 2) Apply attention on all the projected vectors in batch.
        x, self.attn = attention(
            query, key, value, mask=mask, dropout=self.dropout
        )

        # 3) &quot;Concat&quot; using a view and apply a final linear.
        x = (
            x.transpose(1, 2)
            .contiguous()
            .view(nbatches, -1, self.h * self.d_k)
        )
        del query
        del key
        del value
        return self.linears[-1](x)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Applications of Attention in our Model&lt;/h3&gt;
&lt;p&gt;The Transformer uses multi-head attention in three different ways:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;In &quot;encoder-decoder attention&quot; layers, the queries come from the
previous decoder layer, and the memory keys and values come from the
output of the encoder. This allows every position in the decoder to
attend over all positions in the input sequence. This mimics the
typical encoder-decoder attention mechanisms in sequence-to-sequence
models such as &lt;a href=&quot;https://arxiv.org/abs/1609.08144&quot;&gt;(cite)&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The encoder contains self-attention layers. In a self-attention
layer all of the keys, values and queries come from the same place,
in this case, the output of the previous layer in the encoder. Each
position in the encoder can attend to all positions in the previous
layer of the encoder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Similarly, self-attention layers in the decoder allow each
position in the decoder to attend to all positions in the decoder up
to and including that position. We need to prevent leftward
information flow in the decoder to preserve the auto-regressive
property. We implement this inside of scaled dot-product attention
by masking out (setting to $-\infty$) all values in the input of the
softmax which correspond to illegal connections.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;Position-wise Feed-Forward Networks&lt;/h2&gt;
&lt;p&gt;In addition to attention sub-layers, each of the layers in our
encoder and decoder contains a fully connected feed-forward network,
which is applied to each position separately and identically. This
consists of two linear transformations with a ReLU activation in
between.&lt;/p&gt;
&lt;p&gt;$$\mathrm{FFN}(x)=\max(0, xW_1 + b_1) W_2 + b_2$$&lt;/p&gt;
&lt;p&gt;While the linear transformations are the same across different
positions, they use different parameters from layer to
layer. Another way of describing this is as two convolutions with
kernel size 1. The dimensionality of input and output is
$d_{\text{model}}=512$, and the inner-layer has dimensionality
$d_{ff}=2048$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class PositionwiseFeedForward(nn.Module):
    &quot;Implements FFN equation.&quot;

    def __init__(self, d_model, d_ff, dropout=0.1):
        super(PositionwiseFeedForward, self).__init__()
        self.w_1 = nn.Linear(d_model, d_ff)
        self.w_2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.w_2(self.dropout(self.w_1(x).relu()))
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Embeddings and Softmax&lt;/h2&gt;
&lt;p&gt;Similarly to other sequence transduction models, we use learned
embeddings to convert the input tokens and output tokens to vectors
of dimension $d_{\text{model}}$. We also use the usual learned
linear transformation and softmax function to convert the decoder
output to predicted next-token probabilities. In our model, we
share the same weight matrix between the two embedding layers and
the pre-softmax linear transformation, similar to
&lt;a href=&quot;https://arxiv.org/abs/1608.05859&quot;&gt;(cite)&lt;/a&gt;. In the embedding layers,
we multiply those weights by $\sqrt{d_{\text{model}}}$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Embeddings(nn.Module):
    def __init__(self, d_model, vocab):
        super(Embeddings, self).__init__()
        self.lut = nn.Embedding(vocab, d_model)
        self.d_model = d_model

    def forward(self, x):
        return self.lut(x) * math.sqrt(self.d_model)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Positional Encoding&lt;/h2&gt;
&lt;p&gt;Since our model contains no recurrence and no convolution, in order
for the model to make use of the order of the sequence, we must
inject some information about the relative or absolute position of
the tokens in the sequence. To this end, we add &quot;positional
encodings&quot; to the input embeddings at the bottoms of the encoder and
decoder stacks. The positional encodings have the same dimension
$d_{\text{model}}$ as the embeddings, so that the two can be summed.
There are many choices of positional encodings, learned and fixed
&lt;a href=&quot;https://arxiv.org/pdf/1705.03122.pdf&quot;&gt;(cite)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this work, we use sine and cosine functions of different frequencies:&lt;/p&gt;
&lt;p&gt;$$PE_{(pos,2i)} = \sin(pos / 10000^{2i/d_{\text{model}}})$$&lt;/p&gt;
&lt;p&gt;$$PE_{(pos,2i+1)} = \cos(pos / 10000^{2i/d_{\text{model}}})$$&lt;/p&gt;
&lt;p&gt;where $pos$ is the position and $i$ is the dimension. That is, each
dimension of the positional encoding corresponds to a sinusoid. The
wavelengths form a geometric progression from $2\pi$ to $10000 \cdot
2\pi$. We chose this function because we hypothesized it would
allow the model to easily learn to attend by relative positions,
since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a
linear function of $PE_{pos}$.&lt;/p&gt;
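&lt;p&gt;The linear-offset property can be checked directly: for each frequency, the (sin, cos) pair at position $pos+k$ is a fixed rotation of the pair at $pos$, independent of $pos$ (a sketch using the PE definition above; variable names are illustrative):&lt;/p&gt;

```python
import math
import torch

d_model, k = 8, 3
# Frequencies as in the equations above: 1 / 10000^(2i / d_model).
w = torch.exp(torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model))


def pe(pos):
    # (sin, cos) pair per frequency at a given position
    return torch.stack([torch.sin(pos * w), torch.cos(pos * w)], dim=-1)


# A rotation by k*w maps pe(pos) to pe(pos + k) for every pos: angle-addition formulas.
c, s = torch.cos(k * w), torch.sin(k * w)
for pos in [0.0, 5.0, 42.0]:
    p = pe(torch.tensor(pos))
    rotated = torch.stack([c * p[:, 0] + s * p[:, 1],
                           -s * p[:, 0] + c * p[:, 1]], dim=-1)
    assert torch.allclose(rotated, pe(torch.tensor(pos + k)), atol=1e-5)
```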
&lt;p&gt;In addition, we apply dropout to the sums of the embeddings and the
positional encodings in both the encoder and decoder stacks. For
the base model, we use a rate of $P_{drop}=0.1$.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class PositionalEncoding(nn.Module):
    &quot;Implement the PE function.&quot;

    def __init__(self, d_model, dropout, max_len=5000):
        super(PositionalEncoding, self).__init__()
        self.dropout = nn.Dropout(p=dropout)

        # Compute the positional encodings once in log space.
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * -(math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer(&quot;pe&quot;, pe)

    def forward(self, x):
        x = x + self.pe[:, : x.size(1)].requires_grad_(False)
        return self.dropout(x)
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Below the positional encoding will add in a sine wave based on
position. The frequency and offset of the wave is different for
each dimension.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def example_positional():
    pe = PositionalEncoding(20, 0)
    y = pe.forward(torch.zeros(1, 100, 20))

    data = pd.concat(
        [
            pd.DataFrame(
                {
                    &quot;embedding&quot;: y[0, :, dim],
                    &quot;dimension&quot;: dim,
                    &quot;position&quot;: list(range(100)),
                }
            )
            for dim in [4, 5, 6, 7]
        ]
    )

    return (
        alt.Chart(data)
        .mark_line()
        .properties(width=800)
        .encode(x=&quot;position&quot;, y=&quot;embedding&quot;, color=&quot;dimension:N&quot;)
        .interactive()
    )


show_example(example_positional)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We also experimented with using learned positional embeddings
&lt;a href=&quot;https://arxiv.org/pdf/1705.03122.pdf&quot;&gt;(cite)&lt;/a&gt; instead, and found
that the two versions produced nearly identical results. We chose
the sinusoidal version because it may allow the model to extrapolate
to sequence lengths longer than the ones encountered during
training.&lt;/p&gt;
&lt;h2&gt;Full Model&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Here we define a function from hyperparameters to a full model.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def make_model(
    src_vocab, tgt_vocab, N=6, d_model=512, d_ff=2048, h=8, dropout=0.1
):
    &quot;Helper: Construct a model from hyperparameters.&quot;
    c = copy.deepcopy
    attn = MultiHeadedAttention(h, d_model)
    ff = PositionwiseFeedForward(d_model, d_ff, dropout)
    position = PositionalEncoding(d_model, dropout)
    model = EncoderDecoder(
        Encoder(EncoderLayer(d_model, c(attn), c(ff), dropout), N),
        Decoder(DecoderLayer(d_model, c(attn), c(attn), c(ff), dropout), N),
        nn.Sequential(Embeddings(d_model, src_vocab), c(position)),
        nn.Sequential(Embeddings(d_model, tgt_vocab), c(position)),
        Generator(d_model, tgt_vocab),
    )

    # Initialize parameters with Glorot / fan_avg.
    # This initialization was important in the original code.
    for p in model.parameters():
        if p.dim() &gt; 1:
            nn.init.xavier_uniform_(p)
    return model
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Inference&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Here we take a forward step to generate a prediction from the
model. We try to use our transformer to memorize the input. As you
will see, the output is random because the model is not trained
yet. In the next section we build the training function and train
the model to memorize the numbers from 1 to 10.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def inference_test():
    test_model = make_model(11, 11, 2)
    test_model.eval()
    src = torch.LongTensor([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
    src_mask = torch.ones(1, 1, 10)

    memory = test_model.encode(src, src_mask)
    ys = torch.zeros(1, 1).type_as(src)

    for i in range(9):
        out = test_model.decode(
            memory, src_mask, ys, subsequent_mask(ys.size(1)).type_as(src.data)
        )
        prob = test_model.generator(out[:, -1])
        _, next_word = torch.max(prob, dim=1)
        next_word = next_word.data[0]
        ys = torch.cat(
            [ys, torch.empty(1, 1).type_as(src.data).fill_(next_word)], dim=1
        )

    print(&quot;Example Untrained Model Prediction:&quot;, ys)


def run_tests():
    for _ in range(10):
        inference_test()


show_example(run_tests)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;Example Untrained Model Prediction: tensor([[ 0, 10,  0, 10,  0,  0,  0,  0,  0, 10]])
Example Untrained Model Prediction: tensor([[ 0,  8,  1, 10,  0,  8,  1, 10,  0,  8]])


Example Untrained Model Prediction: tensor([[ 0,  9,  0, 10,  4,  5,  3,  2,  4,  3]])
Example Untrained Model Prediction: tensor([[0, 5, 5, 5, 5, 5, 5, 5, 5, 5]])


Example Untrained Model Prediction: tensor([[0, 2, 8, 3, 8, 5, 0, 4, 0, 4]])
Example Untrained Model Prediction: tensor([[ 0, 10,  3, 10,  2,  9,  0,  3, 10,  3]])


Example Untrained Model Prediction: tensor([[0, 3, 3, 3, 3, 3, 3, 3, 3, 3]])
Example Untrained Model Prediction: tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])


Example Untrained Model Prediction: tensor([[0, 3, 2, 2, 2, 4, 0, 3, 1, 3]])
Example Untrained Model Prediction: tensor([[0, 6, 6, 6, 6, 6, 6, 6, 6, 6]])
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Part 2: Model Training&lt;/h1&gt;
&lt;h1&gt;Training&lt;/h1&gt;
&lt;p&gt;This section describes the training regime for our models.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We stop for a quick interlude to introduce some of the tools
needed to train a standard encoder decoder model. First we define a
batch object that holds the src and target sentences for training,
as well as constructing the masks.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Batches and Masking&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class Batch:
    &quot;&quot;&quot;Object for holding a batch of data with mask during training.&quot;&quot;&quot;

    def __init__(self, src, tgt=None, pad=2):  # 2 = &amp;#x3C;blank&gt;
        self.src = src
        self.src_mask = (src != pad).unsqueeze(-2)
        if tgt is not None:
            self.tgt = tgt[:, :-1]
            self.tgt_y = tgt[:, 1:]
            self.tgt_mask = self.make_std_mask(self.tgt, pad)
            self.ntokens = (self.tgt_y != pad).data.sum()

    @staticmethod
    def make_std_mask(tgt, pad):
        &quot;Create a mask to hide padding and future words.&quot;
        tgt_mask = (tgt != pad).unsqueeze(-2)
        tgt_mask = tgt_mask &amp;#x26; subsequent_mask(tgt.size(-1)).type_as(
            tgt_mask.data
        )
        return tgt_mask
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Next we create a generic training and scoring function to keep
track of loss. We pass in a generic loss compute function that
also handles parameter updates.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Training Loop&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class TrainState:
    &quot;&quot;&quot;Track number of steps, examples, and tokens processed&quot;&quot;&quot;

    step: int = 0  # Steps in the current epoch
    accum_step: int = 0  # Number of gradient accumulation steps
    samples: int = 0  # total # of examples used
    tokens: int = 0  # total # of tokens processed
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def run_epoch(
    data_iter,
    model,
    loss_compute,
    optimizer,
    scheduler,
    mode=&quot;train&quot;,
    accum_iter=1,
    train_state=TrainState(),
):
    &quot;&quot;&quot;Train a single epoch&quot;&quot;&quot;
    start = time.time()
    total_tokens = 0
    total_loss = 0
    tokens = 0
    n_accum = 0
    for i, batch in enumerate(data_iter):
        out = model.forward(
            batch.src, batch.tgt, batch.src_mask, batch.tgt_mask
        )
        loss, loss_node = loss_compute(out, batch.tgt_y, batch.ntokens)
        # loss_node = loss_node / accum_iter
        if mode == &quot;train&quot; or mode == &quot;train+log&quot;:
            loss_node.backward()
            train_state.step += 1
            train_state.samples += batch.src.shape[0]
            train_state.tokens += batch.ntokens
            if i % accum_iter == 0:
                optimizer.step()
                optimizer.zero_grad(set_to_none=True)
                n_accum += 1
                train_state.accum_step += 1
            scheduler.step()

        total_loss += loss
        total_tokens += batch.ntokens
        tokens += batch.ntokens
        if i % 40 == 1 and (mode == &quot;train&quot; or mode == &quot;train+log&quot;):
            lr = optimizer.param_groups[0][&quot;lr&quot;]
            elapsed = time.time() - start
            print(
                (
                    &quot;Epoch Step: %6d | Accumulation Step: %3d | Loss: %6.2f &quot;
                    + &quot;| Tokens / Sec: %7.1f | Learning Rate: %6.1e&quot;
                )
                % (i, n_accum, loss / batch.ntokens, tokens / elapsed, lr)
            )
            start = time.time()
            tokens = 0
        del loss
        del loss_node
    return total_loss / total_tokens, train_state
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Training Data and Batching&lt;/h2&gt;
&lt;p&gt;We trained on the standard WMT 2014 English-German dataset
consisting of about 4.5 million sentence pairs. Sentences were
encoded using byte-pair encoding, which has a shared source-target
vocabulary of about 37000 tokens. For English-French, we used the
significantly larger WMT 2014 English-French dataset consisting of
36M sentences and split tokens into a 32000 word-piece vocabulary.&lt;/p&gt;
&lt;p&gt;Sentence pairs were batched together by approximate sequence length.
Each training batch contained a set of sentence pairs containing
approximately 25000 source tokens and 25000 target tokens.&lt;/p&gt;
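&lt;p&gt;The token-count batching described above can be sketched as a simple length-bucketing loop. This is an illustrative sketch, not the actual WMT pipeline; the pair list and the token budget are toy stand-ins:&lt;/p&gt;

```python
def batch_by_tokens(pairs, max_tokens=25000):
    # Group (src, tgt) token lists so each batch holds roughly
    # max_tokens source tokens; sorting by length keeps padding minimal.
    pairs = sorted(pairs, key=lambda p: len(p[0]))
    batches, batch, n_tokens = [], [], 0
    for src, tgt in pairs:
        if batch and n_tokens + len(src) > max_tokens:
            batches.append(batch)
            batch, n_tokens = [], 0
        batch.append((src, tgt))
        n_tokens += len(src)
    if batch:
        batches.append(batch)
    return batches

# Toy sentences of lengths 3, 9, 4, 8, 5 with a 10-token budget
toy = [([0] * n, [0] * n) for n in [3, 9, 4, 8, 5]]
print(len(batch_by_tokens(toy, max_tokens=10)))  # 4 batches: [3,4], [5], [8], [9]
```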
&lt;h2&gt;Hardware and Schedule&lt;/h2&gt;
&lt;p&gt;We trained our models on one machine with 8 NVIDIA P100 GPUs. For
our base models using the hyperparameters described throughout the
paper, each training step took about 0.4 seconds. We trained the
base models for a total of 100,000 steps or 12 hours. For our big
models, step time was 1.0 seconds. The big models were trained for
300,000 steps (3.5 days).&lt;/p&gt;
&lt;h2&gt;Optimizer&lt;/h2&gt;
&lt;p&gt;We used the Adam optimizer &lt;a href=&quot;https://arxiv.org/abs/1412.6980&quot;&gt;(cite)&lt;/a&gt;
with $\beta_1=0.9$, $\beta_2=0.98$ and $\epsilon=10^{-9}$. We
varied the learning rate over the course of training, according to
the formula:&lt;/p&gt;
&lt;p&gt;$$
lrate = d_{\text{model}}^{-0.5} \cdot
\min(\text{step\_num}^{-0.5},
\text{step\_num} \cdot \text{warmup\_steps}^{-1.5})
$$&lt;/p&gt;
&lt;p&gt;This corresponds to increasing the learning rate linearly for the
first $\text{warmup\_steps}$ training steps, and decreasing it
thereafter proportionally to the inverse square root of the step
number. We used $\text{warmup\_steps}=4000$.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: this part is very important; the model needs to be trained
with this learning-rate setup.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;Below we plot example learning-rate curves for different model
sizes and warmup settings.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def rate(step, model_size, factor, warmup):
    &quot;&quot;&quot;
    We default the step to 1 in the LambdaLR function
    to avoid raising zero to a negative power.
    &quot;&quot;&quot;
    if step == 0:
        step = 1
    return factor * (
        model_size ** (-0.5) * min(step ** (-0.5), step * warmup ** (-1.5))
    )
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def example_learning_schedule():
    opts = [
        [512, 1, 4000],  # example 1
        [512, 1, 8000],  # example 2
        [256, 1, 4000],  # example 3
    ]

    dummy_model = torch.nn.Linear(1, 1)
    learning_rates = []

    # we have 3 examples in opts list.
    for idx, example in enumerate(opts):
        # run 20000 steps for each example
        optimizer = torch.optim.Adam(
            dummy_model.parameters(), lr=1, betas=(0.9, 0.98), eps=1e-9
        )
        lr_scheduler = LambdaLR(
            optimizer=optimizer, lr_lambda=lambda step: rate(step, *example)
        )
        tmp = []
        # take 20K dummy training steps, save the learning rate at each step
        for step in range(20000):
            tmp.append(optimizer.param_groups[0][&quot;lr&quot;])
            optimizer.step()
            lr_scheduler.step()
        learning_rates.append(tmp)

    learning_rates = torch.tensor(learning_rates)

    # Enable altair to handle more than 5000 rows
    alt.data_transformers.disable_max_rows()

    opts_data = pd.concat(
        [
            pd.DataFrame(
                {
                    &quot;Learning Rate&quot;: learning_rates[warmup_idx, :],
                    &quot;model_size:warmup&quot;: [&quot;512:4000&quot;, &quot;512:8000&quot;, &quot;256:4000&quot;][
                        warmup_idx
                    ],
                    &quot;step&quot;: range(20000),
                }
            )
            for warmup_idx in [0, 1, 2]
        ]
    )

    return (
        alt.Chart(opts_data)
        .mark_line()
        .properties(width=600)
        .encode(x=&quot;step&quot;, y=&quot;Learning Rate&quot;, color=&quot;model_size:warmup:N&quot;)
        .interactive()
    )


example_learning_schedule()
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Regularization&lt;/h2&gt;
&lt;h3&gt;Label Smoothing&lt;/h3&gt;
&lt;p&gt;During training, we employed label smoothing of value
$\epsilon_{ls}=0.1$ &lt;a href=&quot;https://arxiv.org/abs/1512.00567&quot;&gt;(cite)&lt;/a&gt;.
This hurts perplexity, as the model learns to be more unsure, but
improves accuracy and BLEU score.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;We implement label smoothing using the KL div loss. Instead of
using a one-hot target distribution, we create a distribution that
has &lt;code&gt;confidence&lt;/code&gt; of the correct word and the rest of the
&lt;code&gt;smoothing&lt;/code&gt; mass distributed throughout the vocabulary.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class LabelSmoothing(nn.Module):
    &quot;Implement label smoothing.&quot;

    def __init__(self, size, padding_idx, smoothing=0.0):
        super(LabelSmoothing, self).__init__()
        self.criterion = nn.KLDivLoss(reduction=&quot;sum&quot;)
        self.padding_idx = padding_idx
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.size = size
        self.true_dist = None

    def forward(self, x, target):
        assert x.size(1) == self.size
        true_dist = x.data.clone()
        true_dist.fill_(self.smoothing / (self.size - 2))
        true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        true_dist[:, self.padding_idx] = 0
        mask = torch.nonzero(target.data == self.padding_idx)
        if mask.dim() &gt; 0:
            true_dist.index_fill_(0, mask.squeeze(), 0.0)
        self.true_dist = true_dist
        return self.criterion(x, true_dist.clone().detach())
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Here we can see an example of how the mass is distributed to the
words based on confidence.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Example of label smoothing.


def example_label_smoothing():
    crit = LabelSmoothing(5, 0, 0.4)
    predict = torch.FloatTensor(
        [
            [0, 0.2, 0.7, 0.1, 0],
            [0, 0.2, 0.7, 0.1, 0],
            [0, 0.2, 0.7, 0.1, 0],
            [0, 0.2, 0.7, 0.1, 0],
            [0, 0.2, 0.7, 0.1, 0],
        ]
    )
    crit(x=predict.log(), target=torch.LongTensor([2, 1, 0, 3, 3]))
    LS_data = pd.concat(
        [
            pd.DataFrame(
                {
                    &quot;target distribution&quot;: crit.true_dist[x, y].flatten(),
                    &quot;columns&quot;: y,
                    &quot;rows&quot;: x,
                }
            )
            for y in range(5)
            for x in range(5)
        ]
    )

    return (
        alt.Chart(LS_data)
        .mark_rect(color=&quot;Blue&quot;, opacity=1)
        .properties(height=200, width=200)
        .encode(
            alt.X(&quot;columns:O&quot;, title=None),
            alt.Y(&quot;rows:O&quot;, title=None),
            alt.Color(
                &quot;target distribution:Q&quot;, scale=alt.Scale(scheme=&quot;viridis&quot;)
            ),
        )
        .interactive()
    )


show_example(example_label_smoothing)
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Label smoothing actually starts to penalize the model if it gets
very confident about a given choice.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;

def loss(x, crit):
    d = x + 3 * 1
    predict = torch.FloatTensor([[0, x / d, 1 / d, 1 / d, 1 / d]])
    return crit(predict.log(), torch.LongTensor([1])).data


def penalization_visualization():
    crit = LabelSmoothing(5, 0, 0.1)
    loss_data = pd.DataFrame(
        {
            &quot;Loss&quot;: [loss(x, crit) for x in range(1, 100)],
            &quot;Steps&quot;: list(range(99)),
        }
    ).astype(&quot;float&quot;)

    return (
        alt.Chart(loss_data)
        .mark_line()
        .properties(width=350)
        .encode(
            x=&quot;Steps&quot;,
            y=&quot;Loss&quot;,
        )
        .interactive()
    )


show_example(penalization_visualization)
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;A First Example&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;We can begin by trying out a simple copy-task. Given a random set
of input symbols from a small vocabulary, the goal is to generate
back those same symbols.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Synthetic Data&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def data_gen(V, batch_size, nbatches):
    &quot;Generate random data for a src-tgt copy task.&quot;
    for i in range(nbatches):
        data = torch.randint(1, V, size=(batch_size, 10))
        data[:, 0] = 1
        src = data.requires_grad_(False).clone().detach()
        tgt = data.requires_grad_(False).clone().detach()
        yield Batch(src, tgt, 0)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Loss Computation&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class SimpleLossCompute:
    &quot;A simple loss compute and train function.&quot;

    def __init__(self, generator, criterion):
        self.generator = generator
        self.criterion = criterion

    def __call__(self, x, y, norm):
        x = self.generator(x)
        sloss = (
            self.criterion(
                x.contiguous().view(-1, x.size(-1)), y.contiguous().view(-1)
            )
            / norm
        )
        return sloss.data * norm, sloss
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Greedy Decoding&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;This code predicts a translation using greedy decoding for simplicity.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def greedy_decode(model, src, src_mask, max_len, start_symbol):
    memory = model.encode(src, src_mask)
    ys = torch.zeros(1, 1).fill_(start_symbol).type_as(src.data)
    for i in range(max_len - 1):
        out = model.decode(
            memory, src_mask, ys, subsequent_mask(ys.size(1)).type_as(src.data)
        )
        prob = model.generator(out[:, -1])
        _, next_word = torch.max(prob, dim=1)
        next_word = next_word.data[0]
        ys = torch.cat(
            [ys, torch.zeros(1, 1).type_as(src.data).fill_(next_word)], dim=1
        )
    return ys
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Train the simple copy task.


def example_simple_model():
    V = 11
    criterion = LabelSmoothing(size=V, padding_idx=0, smoothing=0.0)
    model = make_model(V, V, N=2)

    optimizer = torch.optim.Adam(
        model.parameters(), lr=0.5, betas=(0.9, 0.98), eps=1e-9
    )
    lr_scheduler = LambdaLR(
        optimizer=optimizer,
        lr_lambda=lambda step: rate(
            step, model_size=model.src_embed[0].d_model, factor=1.0, warmup=400
        ),
    )

    batch_size = 80
    for epoch in range(20):
        model.train()
        run_epoch(
            data_gen(V, batch_size, 20),
            model,
            SimpleLossCompute(model.generator, criterion),
            optimizer,
            lr_scheduler,
            mode=&quot;train&quot;,
        )
        model.eval()
        run_epoch(
            data_gen(V, batch_size, 5),
            model,
            SimpleLossCompute(model.generator, criterion),
            DummyOptimizer(),
            DummyScheduler(),
            mode=&quot;eval&quot;,
        )[0]

    model.eval()
    src = torch.LongTensor([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
    max_len = src.shape[1]
    src_mask = torch.ones(1, 1, max_len)
    print(greedy_decode(model, src, src_mask, max_len=max_len, start_symbol=0))


# execute_example(example_simple_model)
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Part 3: A Real World Example&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;Now we consider a real-world example using the Multi30k
German-English Translation task. This task is much smaller than
the WMT task considered in the paper, but it illustrates the whole
system. We also show how to use multi-gpu processing to make it
really fast.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Data Loading&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;We will load the dataset using torchtext and spacy for
tokenization.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Load spacy tokenizer models, download them if they haven&apos;t been
# downloaded already


def load_tokenizers():

    try:
        spacy_de = spacy.load(&quot;de_core_news_sm&quot;)
    except IOError:
        os.system(&quot;python -m spacy download de_core_news_sm&quot;)
        spacy_de = spacy.load(&quot;de_core_news_sm&quot;)

    try:
        spacy_en = spacy.load(&quot;en_core_web_sm&quot;)
    except IOError:
        os.system(&quot;python -m spacy download en_core_web_sm&quot;)
        spacy_en = spacy.load(&quot;en_core_web_sm&quot;)

    return spacy_de, spacy_en
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def tokenize(text, tokenizer):
    return [tok.text for tok in tokenizer.tokenizer(text)]


def yield_tokens(data_iter, tokenizer, index):
    for from_to_tuple in data_iter:
        yield tokenizer(from_to_tuple[index])
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;

def build_vocabulary(spacy_de, spacy_en):
    def tokenize_de(text):
        return tokenize(text, spacy_de)

    def tokenize_en(text):
        return tokenize(text, spacy_en)

    print(&quot;Building German Vocabulary ...&quot;)
    train, val, test = datasets.Multi30k(language_pair=(&quot;de&quot;, &quot;en&quot;))
    vocab_src = build_vocab_from_iterator(
        yield_tokens(train + val + test, tokenize_de, index=0),
        min_freq=2,
        specials=[&quot;&amp;#x3C;s&gt;&quot;, &quot;&amp;#x3C;/s&gt;&quot;, &quot;&amp;#x3C;blank&gt;&quot;, &quot;&amp;#x3C;unk&gt;&quot;],
    )

    print(&quot;Building English Vocabulary ...&quot;)
    train, val, test = datasets.Multi30k(language_pair=(&quot;de&quot;, &quot;en&quot;))
    vocab_tgt = build_vocab_from_iterator(
        yield_tokens(train + val + test, tokenize_en, index=1),
        min_freq=2,
        specials=[&quot;&amp;#x3C;s&gt;&quot;, &quot;&amp;#x3C;/s&gt;&quot;, &quot;&amp;#x3C;blank&gt;&quot;, &quot;&amp;#x3C;unk&gt;&quot;],
    )

    vocab_src.set_default_index(vocab_src[&quot;&amp;#x3C;unk&gt;&quot;])
    vocab_tgt.set_default_index(vocab_tgt[&quot;&amp;#x3C;unk&gt;&quot;])

    return vocab_src, vocab_tgt


def load_vocab(spacy_de, spacy_en):
    if not exists(&quot;vocab.pt&quot;):
        vocab_src, vocab_tgt = build_vocabulary(spacy_de, spacy_en)
        torch.save((vocab_src, vocab_tgt), &quot;vocab.pt&quot;)
    else:
        vocab_src, vocab_tgt = torch.load(&quot;vocab.pt&quot;)
    print(&quot;Finished.\nVocabulary sizes:&quot;)
    print(len(vocab_src))
    print(len(vocab_tgt))
    return vocab_src, vocab_tgt


if is_interactive_notebook():
    # global variables used later in the script
    spacy_de, spacy_en = show_example(load_tokenizers)
    vocab_src, vocab_tgt = show_example(load_vocab, args=[spacy_de, spacy_en])
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;Finished.
Vocabulary sizes:
59981
36745
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Batching matters a ton for speed. We want to have very evenly
divided batches, with absolutely minimal padding. To do this we
have to hack a bit around the default torchtext batching. This
code patches their default batching to make sure we search over
enough sentences to find tight batches.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;Iterators&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def collate_batch(
    batch,
    src_pipeline,
    tgt_pipeline,
    src_vocab,
    tgt_vocab,
    device,
    max_padding=128,
    pad_id=2,
):
    bs_id = torch.tensor([0], device=device)  # &amp;#x3C;s&gt; token id
    eos_id = torch.tensor([1], device=device)  # &amp;#x3C;/s&gt; token id
    src_list, tgt_list = [], []
    for (_src, _tgt) in batch:
        processed_src = torch.cat(
            [
                bs_id,
                torch.tensor(
                    src_vocab(src_pipeline(_src)),
                    dtype=torch.int64,
                    device=device,
                ),
                eos_id,
            ],
            0,
        )
        processed_tgt = torch.cat(
            [
                bs_id,
                torch.tensor(
                    tgt_vocab(tgt_pipeline(_tgt)),
                    dtype=torch.int64,
                    device=device,
                ),
                eos_id,
            ],
            0,
        )
        src_list.append(
            # warning - if a sentence exceeds max_padding, the pad
            # amount is negative and the sequence is truncated
            pad(
                processed_src,
                (
                    0,
                    max_padding - len(processed_src),
                ),
                value=pad_id,
            )
        )
        tgt_list.append(
            pad(
                processed_tgt,
                (0, max_padding - len(processed_tgt)),
                value=pad_id,
            )
        )

    src = torch.stack(src_list)
    tgt = torch.stack(tgt_list)
    return (src, tgt)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def create_dataloaders(
    device,
    vocab_src,
    vocab_tgt,
    spacy_de,
    spacy_en,
    batch_size=12000,
    max_padding=128,
    is_distributed=True,
):
    # def create_dataloaders(batch_size=12000):
    def tokenize_de(text):
        return tokenize(text, spacy_de)

    def tokenize_en(text):
        return tokenize(text, spacy_en)

    def collate_fn(batch):
        return collate_batch(
            batch,
            tokenize_de,
            tokenize_en,
            vocab_src,
            vocab_tgt,
            device,
            max_padding=max_padding,
            pad_id=vocab_src.get_stoi()[&quot;&amp;#x3C;blank&gt;&quot;],
        )

    train_iter, valid_iter, test_iter = datasets.Multi30k(
        language_pair=(&quot;de&quot;, &quot;en&quot;)
    )

    train_iter_map = to_map_style_dataset(
        train_iter
    )  # DistributedSampler needs a dataset len()
    train_sampler = (
        DistributedSampler(train_iter_map) if is_distributed else None
    )
    valid_iter_map = to_map_style_dataset(valid_iter)
    valid_sampler = (
        DistributedSampler(valid_iter_map) if is_distributed else None
    )

    train_dataloader = DataLoader(
        train_iter_map,
        batch_size=batch_size,
        shuffle=(train_sampler is None),
        sampler=train_sampler,
        collate_fn=collate_fn,
    )
    valid_dataloader = DataLoader(
        valid_iter_map,
        batch_size=batch_size,
        shuffle=(valid_sampler is None),
        sampler=valid_sampler,
        collate_fn=collate_fn,
    )
    return train_dataloader, valid_dataloader
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Training the System&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def train_worker(
    gpu,
    ngpus_per_node,
    vocab_src,
    vocab_tgt,
    spacy_de,
    spacy_en,
    config,
    is_distributed=False,
):
    print(f&quot;Train worker process using GPU: {gpu} for training&quot;, flush=True)
    torch.cuda.set_device(gpu)

    pad_idx = vocab_tgt[&quot;&amp;#x3C;blank&gt;&quot;]
    d_model = 512
    model = make_model(len(vocab_src), len(vocab_tgt), N=6)
    model.cuda(gpu)
    module = model
    is_main_process = True
    if is_distributed:
        dist.init_process_group(
            &quot;nccl&quot;, init_method=&quot;env://&quot;, rank=gpu, world_size=ngpus_per_node
        )
        model = DDP(model, device_ids=[gpu])
        module = model.module
        is_main_process = gpu == 0

    criterion = LabelSmoothing(
        size=len(vocab_tgt), padding_idx=pad_idx, smoothing=0.1
    )
    criterion.cuda(gpu)

    train_dataloader, valid_dataloader = create_dataloaders(
        gpu,
        vocab_src,
        vocab_tgt,
        spacy_de,
        spacy_en,
        batch_size=config[&quot;batch_size&quot;] // ngpus_per_node,
        max_padding=config[&quot;max_padding&quot;],
        is_distributed=is_distributed,
    )

    optimizer = torch.optim.Adam(
        model.parameters(), lr=config[&quot;base_lr&quot;], betas=(0.9, 0.98), eps=1e-9
    )
    lr_scheduler = LambdaLR(
        optimizer=optimizer,
        lr_lambda=lambda step: rate(
            step, d_model, factor=1, warmup=config[&quot;warmup&quot;]
        ),
    )
    train_state = TrainState()

    for epoch in range(config[&quot;num_epochs&quot;]):
        if is_distributed:
            train_dataloader.sampler.set_epoch(epoch)
            valid_dataloader.sampler.set_epoch(epoch)

        model.train()
        print(f&quot;[GPU{gpu}] Epoch {epoch} Training ====&quot;, flush=True)
        _, train_state = run_epoch(
            (Batch(b[0], b[1], pad_idx) for b in train_dataloader),
            model,
            SimpleLossCompute(module.generator, criterion),
            optimizer,
            lr_scheduler,
            mode=&quot;train+log&quot;,
            accum_iter=config[&quot;accum_iter&quot;],
            train_state=train_state,
        )

        GPUtil.showUtilization()
        if is_main_process:
            file_path = &quot;%s%.2d.pt&quot; % (config[&quot;file_prefix&quot;], epoch)
            torch.save(module.state_dict(), file_path)
        torch.cuda.empty_cache()

        print(f&quot;[GPU{gpu}] Epoch {epoch} Validation ====&quot;, flush=True)
        model.eval()
        sloss = run_epoch(
            (Batch(b[0], b[1], pad_idx) for b in valid_dataloader),
            model,
            SimpleLossCompute(module.generator, criterion),
            DummyOptimizer(),
            DummyScheduler(),
            mode=&quot;eval&quot;,
        )
        print(sloss)
        torch.cuda.empty_cache()

    if is_main_process:
        file_path = &quot;%sfinal.pt&quot; % config[&quot;file_prefix&quot;]
        torch.save(module.state_dict(), file_path)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def train_distributed_model(vocab_src, vocab_tgt, spacy_de, spacy_en, config):
    from the_annotated_transformer import train_worker

    ngpus = torch.cuda.device_count()
    os.environ[&quot;MASTER_ADDR&quot;] = &quot;localhost&quot;
    os.environ[&quot;MASTER_PORT&quot;] = &quot;12356&quot;
    print(f&quot;Number of GPUs detected: {ngpus}&quot;)
    print(&quot;Spawning training processes ...&quot;)
    mp.spawn(
        train_worker,
        nprocs=ngpus,
        args=(ngpus, vocab_src, vocab_tgt, spacy_de, spacy_en, config, True),
    )


def train_model(vocab_src, vocab_tgt, spacy_de, spacy_en, config):
    if config[&quot;distributed&quot;]:
        train_distributed_model(
            vocab_src, vocab_tgt, spacy_de, spacy_en, config
        )
    else:
        train_worker(
            0, 1, vocab_src, vocab_tgt, spacy_de, spacy_en, config, False
        )


def load_trained_model():
    config = {
        &quot;batch_size&quot;: 32,
        &quot;distributed&quot;: False,
        &quot;num_epochs&quot;: 8,
        &quot;accum_iter&quot;: 10,
        &quot;base_lr&quot;: 1.0,
        &quot;max_padding&quot;: 72,
        &quot;warmup&quot;: 3000,
        &quot;file_prefix&quot;: &quot;multi30k_model_&quot;,
    }
    model_path = &quot;multi30k_model_final.pt&quot;
    if not exists(model_path):
        train_model(vocab_src, vocab_tgt, spacy_de, spacy_en, config)

    model = make_model(len(vocab_src), len(vocab_tgt), N=6)
    model.load_state_dict(torch.load(&quot;multi30k_model_final.pt&quot;))
    return model


if is_interactive_notebook():
    model = load_trained_model()
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;p&gt;Once trained we can decode the model to produce a set of
translations. Here we simply translate the first sentence in the
validation set. This dataset is pretty small so the translations
with greedy search are reasonably accurate.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h1&gt;Additional Components: BPE, Search, Averaging&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;So this mostly covers the transformer model itself. There are four
aspects that we didn&apos;t cover explicitly. We also have all these
additional features implemented in
&lt;a href=&quot;https://github.com/opennmt/opennmt-py&quot;&gt;OpenNMT-py&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;BPE/ Word-piece: We can use a library to first preprocess the
data into subword units. See Rico Sennrich&apos;s
&lt;a href=&quot;https://github.com/rsennrich/subword-nmt&quot;&gt;subword-nmt&lt;/a&gt;
implementation. These models will transform the training data to
look like this:&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;p&gt;▁Die ▁Protokoll datei ▁kann ▁ heimlich ▁per ▁E - Mail ▁oder ▁FTP
▁an ▁einen ▁bestimmte n ▁Empfänger ▁gesendet ▁werden .&lt;/p&gt;
&lt;blockquote&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;Shared Embeddings: When using BPE with shared vocabulary we can
share the same weight vectors between the source / target /
generator. See the &lt;a href=&quot;https://arxiv.org/abs/1608.05859&quot;&gt;(cite)&lt;/a&gt; for
details. To add this to the model simply do this:&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;if False:
    model.src_embed[0].lut.weight = model.tgt_embed[0].lut.weight
    model.generator.proj.weight = model.tgt_embed[0].lut.weight
&lt;/code&gt;&lt;/pre&gt;
&lt;blockquote&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Beam Search: This is a bit too complicated to cover here. See the
&lt;a href=&quot;https://github.com/OpenNMT/OpenNMT-py/&quot;&gt;OpenNMT-py&lt;/a&gt;
for a pytorch implementation.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Model Averaging: The paper averages the last k checkpoints to
create an ensembling effect. We can do this after the fact if we
have a bunch of models:&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def average(model, models):
    &quot;Average models into model&quot;
    with torch.no_grad():
        for ps in zip(*[m.parameters() for m in [model] + models]):
            ps[0].copy_(torch.stack(ps[1:]).mean(dim=0))
&lt;/code&gt;&lt;/pre&gt;
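&lt;p&gt;As a sanity check, the averaging routine can be exercised on toy &lt;code&gt;nn.Linear&lt;/code&gt; checkpoints (the helper &lt;code&gt;make&lt;/code&gt; below is ours, not part of the notebook):&lt;/p&gt;

```python
import torch
import torch.nn as nn


def average(model, models):
    "Average the parameters of models into model, in place."
    with torch.no_grad():
        for ps in zip(*[m.parameters() for m in [model] + models]):
            ps[0].copy_(torch.stack(ps[1:]).mean(dim=0))


def make(val):
    "A toy checkpoint: a bias-free linear layer with constant weights."
    m = nn.Linear(2, 2, bias=False)
    nn.init.constant_(m.weight, val)
    return m


# Averaging two constant-weight checkpoints gives the midpoint.
target = make(0.0)
average(target, [make(1.0), make(3.0)])
print(target.weight.data[0, 0].item())  # 2.0
```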
&lt;h1&gt;Results&lt;/h1&gt;
&lt;p&gt;On the WMT 2014 English-to-German translation task, the big
transformer model (Transformer (big) in Table 2) outperforms the
best previously reported models (including ensembles) by more than
2.0 BLEU, establishing a new state-of-the-art BLEU score of
28.4. The configuration of this model is listed in the bottom line
of Table 3. Training took 3.5 days on 8 P100 GPUs. Even our base
model surpasses all previously published models and ensembles, at a
fraction of the training cost of any of the competitive models.&lt;/p&gt;
&lt;p&gt;On the WMT 2014 English-to-French translation task, our big model
achieves a BLEU score of 41.0, outperforming all of the previously
published single models, at less than 1/4 the training cost of the
previous state-of-the-art model. The Transformer (big) model trained
for English-to-French used dropout rate Pdrop = 0.1, instead of 0.3.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;With the additional extensions in the last section, the OpenNMT-py
replication gets to 26.9 on EN-DE WMT. Here I have loaded in those
parameters to our reimplementation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Load data and model for output checks
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def check_outputs(
    valid_dataloader,
    model,
    vocab_src,
    vocab_tgt,
    n_examples=15,
    pad_idx=2,
    eos_string=&quot;&amp;#x3C;/s&gt;&quot;,
):
    results = [()] * n_examples
    for idx in range(n_examples):
        print(&quot;\nExample %d ========\n&quot; % idx)
        b = next(iter(valid_dataloader))
        rb = Batch(b[0], b[1], pad_idx)

        src_tokens = [
            vocab_src.get_itos()[x] for x in rb.src[0] if x != pad_idx
        ]
        tgt_tokens = [
            vocab_tgt.get_itos()[x] for x in rb.tgt[0] if x != pad_idx
        ]

        print(
            &quot;Source Text (Input)        : &quot;
            + &quot; &quot;.join(src_tokens).replace(&quot;\n&quot;, &quot;&quot;)
        )
        print(
            &quot;Target Text (Ground Truth) : &quot;
            + &quot; &quot;.join(tgt_tokens).replace(&quot;\n&quot;, &quot;&quot;)
        )
        model_out = greedy_decode(model, rb.src, rb.src_mask, 72, 0)[0]
        model_txt = (
            &quot; &quot;.join(
                [vocab_tgt.get_itos()[x] for x in model_out if x != pad_idx]
            ).split(eos_string, 1)[0]
            + eos_string
        )
        print(&quot;Model Output               : &quot; + model_txt.replace(&quot;\n&quot;, &quot;&quot;))
        results[idx] = (rb, src_tokens, tgt_tokens, model_out, model_txt)
    return results


def run_model_example(n_examples=5):
    global vocab_src, vocab_tgt, spacy_de, spacy_en

    print(&quot;Preparing Data ...&quot;)
    _, valid_dataloader = create_dataloaders(
        torch.device(&quot;cpu&quot;),
        vocab_src,
        vocab_tgt,
        spacy_de,
        spacy_en,
        batch_size=1,
        is_distributed=False,
    )

    print(&quot;Loading Trained Model ...&quot;)

    model = make_model(len(vocab_src), len(vocab_tgt), N=6)
    model.load_state_dict(
        torch.load(&quot;multi30k_model_final.pt&quot;, map_location=torch.device(&quot;cpu&quot;))
    )

    print(&quot;Checking Model Outputs:&quot;)
    example_data = check_outputs(
        valid_dataloader, model, vocab_src, vocab_tgt, n_examples=n_examples
    )
    return model, example_data


# execute_example(run_model_example)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Attention Visualization&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Even with a greedy decoder the translation looks pretty good. We
can further visualize it to see what is happening at each layer of
the attention&lt;/p&gt;
&lt;/blockquote&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def mtx2df(m, max_row, max_col, row_tokens, col_tokens):
    &quot;convert a dense matrix to a data frame with row and column indices&quot;
    return pd.DataFrame(
        [
            (
                r,
                c,
                float(m[r, c]),
                &quot;%.3d %s&quot;
                % (r, row_tokens[r] if len(row_tokens) &gt; r else &quot;&amp;#x3C;blank&gt;&quot;),
                &quot;%.3d %s&quot;
                % (c, col_tokens[c] if len(col_tokens) &gt; c else &quot;&amp;#x3C;blank&gt;&quot;),
            )
            for r in range(m.shape[0])
            for c in range(m.shape[1])
            if r &amp;#x3C; max_row and c &amp;#x3C; max_col
        ],
        # if float(m[r,c]) != 0 and r &amp;#x3C; max_row and c &amp;#x3C; max_col],
        columns=[&quot;row&quot;, &quot;column&quot;, &quot;value&quot;, &quot;row_token&quot;, &quot;col_token&quot;],
    )


def attn_map(attn, layer, head, row_tokens, col_tokens, max_dim=30):
    df = mtx2df(
        attn[0, head].data,
        max_dim,
        max_dim,
        row_tokens,
        col_tokens,
    )
    return (
        alt.Chart(data=df)
        .mark_rect()
        .encode(
            x=alt.X(&quot;col_token&quot;, axis=alt.Axis(title=&quot;&quot;)),
            y=alt.Y(&quot;row_token&quot;, axis=alt.Axis(title=&quot;&quot;)),
            color=&quot;value&quot;,
            tooltip=[&quot;row&quot;, &quot;column&quot;, &quot;value&quot;, &quot;row_token&quot;, &quot;col_token&quot;],
        )
        .properties(height=400, width=400)
        .interactive()
    )
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def get_encoder(model, layer):
    return model.encoder.layers[layer].self_attn.attn


def get_decoder_self(model, layer):
    return model.decoder.layers[layer].self_attn.attn


def get_decoder_src(model, layer):
    return model.decoder.layers[layer].src_attn.attn


def visualize_layer(model, layer, getter_fn, ntokens, row_tokens, col_tokens):
    # ntokens = last_example[0].ntokens
    attn = getter_fn(model, layer)
    n_heads = attn.shape[1]
    charts = [
        attn_map(
            attn,
            0,
            h,
            row_tokens=row_tokens,
            col_tokens=col_tokens,
            max_dim=ntokens,
        )
        for h in range(n_heads)
    ]
    assert n_heads == 8
    return alt.vconcat(
        charts[0]
        # | charts[1]
        | charts[2]
        # | charts[3]
        | charts[4]
        # | charts[5]
        | charts[6]
        # | charts[7]
        # layer + 1 due to 0-indexing
    ).properties(title=&quot;Layer %d&quot; % (layer + 1))
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Encoder Self Attention&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def viz_encoder_self():
    model, example_data = run_model_example(n_examples=1)
    example = example_data[
        len(example_data) - 1
    ]  # batch object for the final example

    layer_viz = [
        visualize_layer(
            model, layer, get_encoder, len(example[1]), example[1], example[1]
        )
        for layer in range(6)
    ]
    return alt.hconcat(
        layer_viz[0]
        # &amp;#x26; layer_viz[1]
        &amp;#x26; layer_viz[2]
        # &amp;#x26; layer_viz[3]
        &amp;#x26; layer_viz[4]
        # &amp;#x26; layer_viz[5]
    )


show_example(viz_encoder_self)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;Preparing Data ...


Loading Trained Model ...


Checking Model Outputs:

Example 0 ========



Source Text (Input)        : &amp;#x3C;s&gt; Mehrere Kinder heben die Hände , während sie auf einem bunten Teppich in einem Klassenzimmer sitzen . &amp;#x3C;/s&gt;
Target Text (Ground Truth) : &amp;#x3C;s&gt; Several children are raising their hands while sitting on a colorful rug in a classroom . &amp;#x3C;/s&gt;


Model Output               : &amp;#x3C;s&gt; A group of children are in their hands while sitting on a colorful carpet . &amp;#x3C;/s&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Decoder Self Attention&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def viz_decoder_self():
    model, example_data = run_model_example(n_examples=1)
    example = example_data[len(example_data) - 1]

    layer_viz = [
        visualize_layer(
            model,
            layer,
            get_decoder_self,
            len(example[1]),
            example[1],
            example[1],
        )
        for layer in range(6)
    ]
    return alt.hconcat(
        layer_viz[0]
        &amp;#x26; layer_viz[1]
        &amp;#x26; layer_viz[2]
        &amp;#x26; layer_viz[3]
        &amp;#x26; layer_viz[4]
        &amp;#x26; layer_viz[5]
    )


show_example(viz_decoder_self)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;Preparing Data ...


Loading Trained Model ...


Checking Model Outputs:

Example 0 ========



Source Text (Input)        : &amp;#x3C;s&gt; Drei Menschen wandern auf einem stark verschneiten Weg . &amp;#x3C;/s&gt;
Target Text (Ground Truth) : &amp;#x3C;s&gt; A &amp;#x3C;unk&gt; of people are hiking throughout a heavily snowed path . &amp;#x3C;/s&gt;


Model Output               : &amp;#x3C;s&gt; Three people hiking on a busy &amp;#x3C;unk&gt; . &amp;#x3C;/s&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;Decoder Src Attention&lt;/h2&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def viz_decoder_src():
    model, example_data = run_model_example(n_examples=1)
    example = example_data[len(example_data) - 1]

    layer_viz = [
        visualize_layer(
            model,
            layer,
            get_decoder_src,
            max(len(example[1]), len(example[2])),
            example[1],
            example[2],
        )
        for layer in range(6)
    ]
    return alt.hconcat(
        layer_viz[0]
        &amp;#x26; layer_viz[1]
        &amp;#x26; layer_viz[2]
        &amp;#x26; layer_viz[3]
        &amp;#x26; layer_viz[4]
        &amp;#x26; layer_viz[5]
    )


show_example(viz_decoder_src)
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;Preparing Data ...


Loading Trained Model ...


Checking Model Outputs:

Example 0 ========



Source Text (Input)        : &amp;#x3C;s&gt; Baby sieht sich die Blätter am Zweig eines Baumes an . &amp;#x3C;/s&gt;
Target Text (Ground Truth) : &amp;#x3C;s&gt; Baby looking at the leaves on a branch of a tree . &amp;#x3C;/s&gt;


Model Output               : &amp;#x3C;s&gt; A baby is looking at the leaves at a tree . &amp;#x3C;/s&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h1&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;Hopefully this code is useful for future research. Please reach
out if you have any issues.&lt;/p&gt;
&lt;p&gt;Cheers,
Sasha Rush, Austin Huang, Suraj Subramanian, Jonathan Sum, Khalid Almubarak,
Stella Biderman&lt;/p&gt;</content:encoded><h:img src="/_astro/aiayn.M7sRrIDc.png"/><enclosure url="/_astro/aiayn.M7sRrIDc.png"/></item><item><title>Flow Matching and Diffusion Models</title><link>https://pengwee.wang/blog/flow-matching-and-diffusion-models</link><guid isPermaLink="true">https://pengwee.wang/blog/flow-matching-and-diffusion-models</guid><description>Flow Matching and Diffusion Models 的介绍与对比</description><pubDate>Thu, 21 Aug 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;Introduction&lt;/h2&gt;
&lt;p&gt;Objects: data such as images, videos, and proteins can be viewed as vectors, i.e. $z \in \mathbb{R}^d$&lt;/p&gt;
&lt;p&gt;Generation: sampling from the data distribution, $z \sim p_{\text{data}}$&lt;/p&gt;
&lt;p&gt;Dataset: a finite set of samples from the data distribution, $z_1, \ldots, z_N \sim p_{\text{data}}$&lt;/p&gt;
&lt;p&gt;Conditional generation: sampling from a conditional distribution, $z \sim p_{\text{data}}(\cdot \mid y)$&lt;/p&gt;
&lt;p&gt;Goal: train a generative model that turns samples from a simple initial distribution $p_{\text{init}}$ into samples from the data distribution $p_{\text{data}}$&lt;/p&gt;
&lt;h2&gt;Flow and Diffusion Models&lt;/h2&gt;
&lt;p&gt;Simulating ordinary differential equations (ODEs) or stochastic differential equations (SDEs) can transport the initial distribution to the data distribution; the two cases correspond to flow models and diffusion models respectively&lt;/p&gt;
&lt;h3&gt;Flow Models&lt;/h3&gt;
&lt;p&gt;A flow model is described by an ODE:&lt;/p&gt;
&lt;p&gt;$$
X_0 \sim p_{\text{init}} \quad \triangleright \text{random initialization} \\
\frac{d}{dt}X_t=u_t^\theta(X_t) \quad \triangleright \text{ODE} \\
\text{Goal: } X_1 \sim p_{\text{data}} \Leftrightarrow \psi_{1}^{\theta}(X_0) \sim p_{\text{data}}
$$&lt;/p&gt;
&lt;p&gt;Here the vector field $u_t^\theta: \mathbb{R}^d\times[0,1] \rightarrow \mathbb{R}^d$ is a neural network with parameters $\theta$. $\psi^\theta_t$ denotes the flow induced by $u_t^\theta$, i.e. the collection of trajectories that solve the ODE&lt;/p&gt;
&lt;p&gt;Sampling from a flow model amounts to simulating the ODE with the Euler method to compute the flow&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20250820180506862.B1sdxzj-_1L19zg.webp&quot; alt=&quot;image-20250820180506862&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Diffusion Models&lt;/h3&gt;
&lt;p&gt;A diffusion model is described by an SDE, written in increment form rather than with a time derivative because of the randomness:&lt;/p&gt;
&lt;p&gt;$$
dX_t = u_t^\theta(X_t)dt +\sigma_t dW_t \quad \triangleright \text{SDE} \\
X_0 \sim p_{\text{init}} \quad \triangleright \text{random initialization} \\
\text{Goal: } X_1 \sim p_{\text{data}}
$$&lt;/p&gt;
&lt;p&gt;where $\sigma_t \geq 0$ is the diffusion coefficient and $W_t$ is a Brownian motion&lt;/p&gt;
&lt;p&gt;Diffusion models thus extend flow models: taking $\sigma_t = 0$ recovers a flow model&lt;/p&gt;
&lt;p&gt;Likewise, the following (Euler–Maruyama) scheme samples from a diffusion model&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20250820184041093.B27Fy-5l_2ghwU7.webp&quot; alt=&quot;image-20250820184041093&quot;&gt;&lt;/p&gt;
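&lt;p&gt;The Euler–Maruyama scheme adds a scaled Gaussian increment at each step. A minimal sketch (again with a toy constant drift; names are ours):&lt;/p&gt;

```python
import numpy as np


def euler_maruyama(drift, sigma, x0, n_steps=500, seed=0):
    "Simulate dX_t = u_t(X_t) dt + sigma_t dW_t on [0, 1]."
    rng = np.random.default_rng(seed)
    x, h = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * h
        dw = np.sqrt(h) * rng.standard_normal(x.shape)  # Brownian increment
        x = x + h * drift(t, x) + sigma(t) * dw
    return x


# With sigma_t = 0 this reduces to the deterministic Euler scheme: X_1 = X_0 + 1.
x1 = euler_maruyama(lambda t, x: np.ones_like(x), lambda t: 0.0, np.zeros(3))
print(x1)
```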
&lt;h2&gt;Training Target and Training Loss&lt;/h2&gt;
&lt;p&gt;For flow models and diffusion models,&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
X_0 \sim p_{\text{init}},\quad dX_t &amp;#x26;= u_t^\theta(X_t) dt &amp;#x26; \text{(Flow model)} \\
X_0 \sim p_{\text{init}},\quad dX_t &amp;#x26;= u_t^\theta(X_t) dt + \sigma_t dW_t &amp;#x26; \text{(Diffusion model)}
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;Training minimizes a loss of the form&lt;/p&gt;
&lt;p&gt;$$
\mathcal{L}(\theta) = \left\| u_t^\theta(x) - \underbrace{u_t^{\text{target}}(x)}_{\text{training target}} \right\|^2
$$&lt;/p&gt;
&lt;p&gt;where $u_t^\theta$ is the network and $u_t^{\text{target}}(x)$ is a target vector field whose simulation transports the initial distribution to the data distribution. To compute $\mathcal{L}(\theta)$, directly or indirectly, we must first construct $u_t^{\text{target}}(x)$.&lt;/p&gt;
&lt;h3&gt;Probability Path&lt;/h3&gt;
&lt;p&gt;A probability path is a gradual interpolation from the initial distribution to the data distribution. We distinguish the conditional probability path $p_t(\cdot \mid z)$ from the marginal probability path $p_t(\cdot)$, where:&lt;/p&gt;
&lt;p&gt;$$
p_0(\cdot \mid z) = p_{\text{init}}, \quad p_1(\cdot \mid z) = \delta_z \quad \text{for all } z \in \mathbb{R}^d
$$&lt;/p&gt;
&lt;p&gt;The marginal path $p_t(\cdot)$ is obtained via&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
&amp;#x26;z \sim p_{\text{data}},\ x \sim p_t(\cdot \mid z) \implies x \sim p_t &amp;#x26;\triangleright \text{sampling from marginal path} \\
&amp;#x26;p_t(x) = \int p_t(x \mid z) p_{\text{data}}(z)\,dz &amp;#x26;\triangleright \text{density of marginal path} \\
&amp;#x26;p_0 = p_{\text{init}} \quad \text{and} \quad p_1 = p_{\text{data}} &amp;#x26;\triangleright \text{noise-data interpolation}
\end{align*}
$$&lt;/p&gt;
&lt;h3&gt;Training Target for Flow Model&lt;/h3&gt;
&lt;p&gt;For $z \sim p_{\text{data}}$ with $z \in \mathbb{R}^d$, let $u_t^{\text{target}}(\cdot \mid z)$ denote the conditional vector field generating the conditional probability path $p_t(\cdot \mid z)$, i.e.&lt;/p&gt;
&lt;p&gt;$$
X_0 \sim p_{\text{init}},\quad \frac{\mathrm{d}}{\mathrm{d}t}X_t = u_t^{\text{target}}(X_t|z) \quad \Rightarrow \quad X_t \sim p_t(\cdot|z) \quad (0 \leq t \leq 1)
$$&lt;/p&gt;
&lt;p&gt;Then the marginal vector field $u_t^{\text{target}}(x)$ can be defined as&lt;/p&gt;
&lt;p&gt;$$
u_t^{\text{target}}(x) = \int u_t^{\text{target}}(x|z) \frac{p_t(x|z)p_{\text{data}}(z)}{p_t(x)} \,\mathrm{d}z
$$&lt;/p&gt;
&lt;p&gt;and it satisfies:&lt;/p&gt;
&lt;p&gt;$$
X_0 \sim p_{\text{init}},\quad \frac{\mathrm{d}}{\mathrm{d}t}X_t = u_t^{\text{target}}(X_t) \quad \Rightarrow \quad X_t \sim p_t \quad (0 \leq t \leq 1)
$$&lt;/p&gt;
&lt;p&gt;in particular $X_1 \sim p_{\text{data}}$.&lt;/p&gt;
&lt;p&gt;This can be proved with the &lt;strong&gt;continuity equation&lt;/strong&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Continuity Equation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For a vector field $u_t^{\text{target}}$ with $X_0 \sim p_{\text{init}}$, we have $X_t \sim p_t$ for all $0 \leq t \leq 1$ if and only if&lt;/p&gt;
&lt;p&gt;$$
\partial_t p_t(x) = -\mathrm{div}(p_t u_t^{\text{target}})(x) \quad \text{for all } x \in \mathbb{R}^d, 0 \leq t \leq 1
$$&lt;/p&gt;
&lt;p&gt;where $\partial_t p_t(x) = \frac{\mathrm{d}}{\mathrm{d}t} p_t(x)$ and $\mathrm{div}(v_t)(x) = \sum_{i=1}^d \frac{\partial}{\partial x_i} (v_t)_i(x)$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Training Target for Diffusion Model&lt;/h3&gt;
&lt;p&gt;Similarly, for diffusion models one can build on $u_t^{\text{target}}$ as follows so that $X_t \sim p_t$ holds for $0 \leq t \leq 1$:&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
&amp;#x26;X_0 \sim p_{\text{init}}, \quad \mathrm{d}X_t = \left[ u_t^{\text{target}}(X_t) + \frac{\sigma_t^2}{2} \nabla \log p_t(X_t) \right] \mathrm{d}t + \sigma_t \mathrm{d}W_t \\
&amp;#x26;\Rightarrow X_t \sim p_t \quad (0 \leq t \leq 1)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;and this remains true when $p_t(x)$ and $u_t^{\text{target}}(x)$ are replaced by $p_t(x\mid z)$ and $u_t^{\text{target}}(x \mid z)$&lt;/p&gt;
&lt;p&gt;Here $\nabla \log p_t(x)$ is called the marginal score function and $\nabla \log p_t(x \mid z)$ the conditional score function; they satisfy&lt;/p&gt;
&lt;p&gt;$$
\nabla \log p_t(x) = \frac{\nabla p_t(x)}{p_t(x)} = \frac{\nabla \int p_t(x|z) p_{\text{data}}(z) \,\mathrm{d}z}{p_t(x)} = \frac{\int \nabla p_t(x|z) p_{\text{data}}(z) \,\mathrm{d}z}{p_t(x)} = \int \nabla \log p_t(x|z) \frac{p_t(x|z) p_{\text{data}}(z)}{p_t(x)} \,\mathrm{d}z
$$&lt;/p&gt;
&lt;p&gt;This can be proved with the Fokker-Planck equation&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Fokker-Planck Equation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For the SDE given by $X_0 \sim p_{\text{init}}, \quad \mathrm{d}X_t = u_t(X_t)\,\mathrm{d}t + \sigma_t\,\mathrm{d}W_t$, we have $X_t \sim p_t$ if and only if&lt;/p&gt;
&lt;p&gt;$$
\partial_t p_t(x) = -\mathrm{div}(p_t u_t)(x) + \frac{\sigma_t^2}{2} \Delta p_t(x) \quad \text{for all } x \in \mathbb{R}^d, 0 \leq t \leq 1
$$&lt;/p&gt;
&lt;p&gt;where $\Delta w_t(x) = \sum_{i=1}^d \frac{\partial^2}{\partial x_i^2} w_t(x) = \mathrm{div}(\nabla w_t)(x)$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Remark&lt;/strong&gt; Langevin dynamics&lt;/p&gt;
&lt;/blockquote&gt;
&lt;blockquote&gt;
&lt;p&gt;When $p_t=p$ is static, i.e. the probability path does not change over time, the SDE becomes&lt;/p&gt;
&lt;p&gt;$$
\mathrm{d}X_t = \frac{\sigma_t^2}{2} \nabla \log p(X_t)\,\mathrm{d}t + \sigma_t\,\mathrm{d}W_t
$$&lt;/p&gt;
&lt;p&gt;and $X_0 \sim p \quad \Rightarrow \quad X_t \sim p \quad (t \geq 0)$; these are the Langevin dynamics&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;Gaussian probability path&lt;/h3&gt;
&lt;p&gt;Let the noise schedules $\alpha_t, \beta_t$ be monotone, continuously differentiable functions with $\alpha_0=\beta_1=0$ and $\alpha_1=\beta_0=1$, and define the Gaussian conditional probability path as&lt;/p&gt;
&lt;p&gt;$$
p_t(\cdot|z) = \mathcal{N}(\alpha_t z, \beta_t^2 I_d)
$$&lt;/p&gt;
&lt;p&gt;It satisfies $p_0(\cdot|z) = \mathcal{N}(\alpha_0 z, \beta_0^2 I_d) = \mathcal{N}(0, I_d), \quad \text{and} \quad p_1(\cdot|z) = \mathcal{N}(\alpha_1 z, \beta_1^2 I_d) = \delta_z$&lt;/p&gt;
&lt;p&gt;Sampling from the corresponding marginal path is then straightforward:&lt;/p&gt;
&lt;p&gt;$$
z \sim p_{\text{data}},\ \epsilon \sim p_{\text{init}} = \mathcal{N}(0, I_d) \Rightarrow x = \alpha_t z + \beta_t \epsilon \sim p_t
$$&lt;/p&gt;
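&lt;p&gt;This sampling recipe is one line of code. A sketch in NumPy; the schedule below ($\alpha_t = t$, $\beta_t = 1 - t$) is just one valid choice satisfying $\alpha_0=\beta_1=0$, $\alpha_1=\beta_0=1$:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_gaussian_path(z, t, alpha, beta):
    "Draw x = alpha_t * z + beta_t * eps with eps ~ N(0, I), so x ~ p_t(.|z)."
    eps = rng.standard_normal(z.shape)
    return alpha(t) * z + beta(t) * eps


z = np.full(4, 5.0)  # a fixed "data" point
x0 = sample_gaussian_path(z, 0.0, lambda t: t, lambda t: 1.0 - t)  # pure noise
x1 = sample_gaussian_path(z, 1.0, lambda t: t, lambda t: 1.0 - t)  # exactly z
print(x1)  # [5. 5. 5. 5.]
```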
&lt;p&gt;The conditional Gaussian vector field for this path can be computed in closed form:&lt;/p&gt;
&lt;p&gt;$$
u_t^{\text{target}}(x|z) = \left( \dot{\alpha}_t - \frac{\dot{\beta}_t}{\beta_t} \alpha_t \right) z + \frac{\dot{\beta}_t}{\beta_t} x
$$&lt;/p&gt;
&lt;p&gt;where $\dot{\alpha}_t = \partial_t \alpha_t$ and $\dot{\beta}_t = \partial_t \beta_t$&lt;/p&gt;
&lt;p&gt;Likewise, the conditional score function is&lt;/p&gt;
&lt;p&gt;$$
\nabla \log p_t(x|z) = -\frac{x - \alpha_t z}{\beta_t^2}
$$&lt;/p&gt;
&lt;h3&gt;Flow Matching&lt;/h3&gt;
&lt;p&gt;For flow models, define the flow matching loss as&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{FM}}(\theta) &amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, x \sim p_t}[\|u_t^\theta(x) - u_t^{\text{target}}(x)\|^2] \\
&amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, x \sim p_t(\cdot|z)}[\|u_t^\theta(x) - u_t^{\text{target}}(x)\|^2]
\end{align*}
$$&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;$z \sim p_{\text{data}},\ x \sim p_t(\cdot \mid z) \implies x \sim p_t$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;and define the conditional flow matching loss as&lt;/p&gt;
&lt;p&gt;$$
\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, x \sim p_t(\cdot|z)}[\|u_t^\theta(x) - u_t^{\text{target}}(x|z)\|^2]
$$&lt;/p&gt;
&lt;p&gt;where $u_t^{\text{target}}(x|z)$ can be constructed by hand (e.g. from a Gaussian probability path)&lt;/p&gt;
&lt;p&gt;One can show that&lt;/p&gt;
&lt;p&gt;$$
\mathcal{L}_{\text{FM}}(\theta) = \mathcal{L}_{\text{CFM}}(\theta) + C
$$&lt;/p&gt;
&lt;p&gt;and hence&lt;/p&gt;
&lt;p&gt;$$
\nabla_\theta \mathcal{L}_{\text{FM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CFM}}(\theta)
$$&lt;/p&gt;
&lt;p&gt;Therefore optimizing $\mathcal{L}_{\text{CFM}}$ also optimizes $\mathcal{L}_{\text{FM}}$, and $\mathcal{L}_{\text{CFM}}$ only requires constructing a conditional probability path. This yields a training algorithm for flow models; the whole procedure is called flow matching&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Flow Matching for Gaussian Conditional Probability Paths&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For the Gaussian probability path, we have&lt;/p&gt;
&lt;p&gt;$$
\epsilon \sim \mathcal{N}(0, I_d) \quad \Rightarrow \quad x_t = \alpha_t z + \beta_t \epsilon \sim \mathcal{N}(\alpha_t z, \beta_t^2 I_d) = p_t(\cdot|z)
$$&lt;/p&gt;
&lt;p&gt;$$
u_t^{\mathrm{target}}(x|z)=\left(\dot{\alpha}_t-\frac{\dot{\beta}_t}{\beta_t}\alpha_t\right)z+\frac{\dot{\beta}_t}{\beta_t}x
$$&lt;/p&gt;
&lt;p&gt;$$
\begin{gathered}
\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t\sim\mathrm{Unif},z\sim p_{\mathrm{data}},x\sim\mathcal{N}(\alpha_{t}z,\beta_{t}^{2}I_{d})}\left[\left\|u_{t}^{\theta}(x)-\left(\dot{\alpha}_{t}-\frac{\dot{\beta}_{t}}{\beta_{t}}\alpha_{t}\right)z-\frac{\dot{\beta}_{t}}{\beta_{t}}x\right\|^{2}\right] \\
\overset{(i)}{=}\mathbb{E}_{t\sim\mathrm{Unif},z\sim p_{\mathrm{data}},\epsilon\sim\mathcal{N}(0,I_{d})}\left[\left\|u_{t}^{\theta}(\alpha_{t}z+\beta_{t}\epsilon)-(\dot{\alpha}_{t}z+\dot{\beta}_{t}\epsilon)\right\|^{2}\right]
\end{gathered}
$$&lt;/p&gt;
&lt;p&gt;In particular, for $\alpha_t=t$ and $\beta_t=1-t$,&lt;/p&gt;
&lt;p&gt;$$
p_{t}(x|z)=\mathcal{N}(tz,(1-t)^{2}I_{d})
$$&lt;/p&gt;
&lt;p&gt;$$
\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t\sim\mathrm{Unif},z\sim p_{\mathrm{data}},\epsilon\sim\mathcal{N}(0,I_{d})}\left[\left\|u_{t}^{\theta}(tz+(1-t)\epsilon)-(z-\epsilon)\right\|^{2}\right]
$$&lt;/p&gt;
&lt;p&gt;This is called the (Gaussian) &lt;strong&gt;CondOT probability path&lt;/strong&gt;; the training procedure is shown below&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20250821093129333.HkL89N9B_1mxqLy.webp&quot; alt=&quot;image-20250821093129333&quot;&gt;&lt;/p&gt;
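&lt;p&gt;A minimal PyTorch sketch of one CondOT flow-matching training step; the tiny MLP and its input convention $u_t^\theta(x) = \mathrm{net}([x, t])$ are our own illustrative choices, not prescribed by the algorithm:&lt;/p&gt;

```python
import torch
import torch.nn as nn

# u_theta(x, t): a tiny MLP taking [x, t] as input (2-D toy data).
net = nn.Sequential(nn.Linear(3, 32), nn.SiLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)


def cfm_step(z):
    "One CondOT conditional flow matching step on a batch z ~ p_data."
    t = torch.rand(z.shape[0], 1)                  # t ~ Unif[0, 1]
    eps = torch.randn_like(z)                      # eps ~ N(0, I)
    x_t = t * z + (1.0 - t) * eps                  # x ~ N(t z, (1-t)^2 I)
    target = z - eps                               # u_t^target(x|z) = z - eps
    pred = net(torch.cat([x_t, t], dim=1))
    loss = ((pred - target) ** 2).sum(dim=1).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


z = torch.randn(64, 2) + 4.0  # toy "data" batch
loss = cfm_step(z)
print(loss >= 0.0)  # True
```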
&lt;h3&gt;Score Matching&lt;/h3&gt;
&lt;p&gt;For diffusion models, $u_t^{\text{target}}$ is hard to obtain directly, so a &lt;strong&gt;score network&lt;/strong&gt; $s_t^\theta : \mathbb{R}^d \times [0, 1] \to \mathbb{R}^d$ is used to fit the score function. Analogously, there are a score matching loss and a conditional score matching loss:&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{SM}}(\theta) &amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, x \sim p_t(\cdot|z)}[\|s_t^\theta(x) - \nabla \log p_t(x)\|^2] \quad \triangleright \text{ score matching loss} \\
\mathcal{L}_{\text{CSM}}(\theta) &amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, x \sim p_t(\cdot|z)}[\|s_t^\theta(x) - \nabla \log p_t(x|z)\|^2] \quad \triangleright \text{ conditional score matching loss}
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;As before, $\nabla \log p_t(x)$ is unknown but $\nabla \log p_t(x \mid z)$ can be constructed by hand, and&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
&amp;#x26;\mathcal{L}_{\text{SM}}(\theta) = \mathcal{L}_{\text{CSM}}(\theta) + C \\
&amp;#x26;\implies \nabla_\theta \mathcal{L}_{\text{SM}}(\theta) = \nabla_\theta \mathcal{L}_{\text{CSM}}(\theta)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;so it suffices to optimize $\mathcal{L}_{\text{CSM}}(\theta)$. Sampling then proceeds as&lt;/p&gt;
&lt;p&gt;$$
X_0 \sim p_{\text{init}}, \quad \mathrm{d}X_t = \left[ u_t^\theta(X_t) + \frac{\sigma_t^2}{2} s_t^\theta(X_t) \right] \mathrm{d}t + \sigma_t \mathrm{d}W_t \implies X_1 \sim p_{\text{data}}
$$&lt;/p&gt;
&lt;p&gt;In theory any $\sigma_t \geq 0$ yields valid samples, but because of the error from inexact SDE simulation and the training error, in practice there is an optimal $\sigma_t$. Note also that simulating this SDE requires learning $u_t^\theta$ as well; in practice a single network with two outputs can model both $u_t^\theta$ and $s_t^\theta$, and for certain probability paths the two can be converted into each other.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Denoising Diffusion Models: Score Matching for Gaussian Probability Paths&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;For Gaussian probability paths, we have&lt;/p&gt;
&lt;p&gt;$$
\nabla \log p_t(x|z) = -\frac{x - \alpha_t z}{\beta_t^2}
$$&lt;/p&gt;
&lt;p&gt;so&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{CSM}}(\theta) &amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, x \sim p_t(\cdot|z)}\left[\left\|s_t^\theta(x) + \frac{x - \alpha_t z}{\beta_t^2}\right\|^2\right] \\
&amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, \epsilon \sim \mathcal{N}(0, I_d)}\left[\left\|s_t^\theta(\alpha_t z + \beta_t \epsilon) + \frac{\epsilon}{\beta_t}\right\|^2\right] \\
&amp;#x26;= \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, \epsilon \sim \mathcal{N}(0, I_d)}\left[\frac{1}{\beta_t^2} \left\|\beta_t s_t^\theta(\alpha_t z + \beta_t \epsilon) + \epsilon\right\|^2\right]
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;Since the factor $\frac{1}{\beta^2_t}$ makes the loss blow up as $\beta_t \to 0$, it is usually dropped, and $s^\theta_t$ is reparameterized as a noise-prediction network $\epsilon_t^\theta$, which yields the DDPM loss&lt;/p&gt;
&lt;p&gt;$$
-\beta_t s_t^\theta(x) = \epsilon_t^\theta(x) \quad \Rightarrow \quad \mathcal{L}_{\text{DDPM}}(\theta) = \mathbb{E}_{t \sim \text{Unif}, z \sim p_{\text{data}}, \epsilon \sim \mathcal{N}(0, I_d)}\left[\left\|\epsilon_t^\theta(\alpha_t z + \beta_t \epsilon) - \epsilon\right\|^2\right]
$$&lt;/p&gt;
&lt;p&gt;The training procedure is shown below&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20250821102850655.tzbC0Gd3_25BmOi.webp&quot; alt=&quot;image-20250821102850655&quot;&gt;&lt;/p&gt;
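&lt;p&gt;The DDPM objective is equally compact in code; here $\epsilon_t^\theta$ is a toy MLP over $[x, t]$ and the schedule $\alpha_t = t$, $\beta_t = 1 - t$ is chosen only for illustration:&lt;/p&gt;

```python
import torch
import torch.nn as nn

# eps_theta(x, t): noise-prediction network over [x, t] (2-D toy data).
eps_net = nn.Sequential(nn.Linear(3, 32), nn.SiLU(), nn.Linear(32, 2))


def ddpm_loss(z, alpha, beta):
    "L_DDPM = E || eps_theta(alpha_t z + beta_t eps, t) - eps ||^2."
    t = torch.rand(z.shape[0], 1)
    eps = torch.randn_like(z)
    x_t = alpha(t) * z + beta(t) * eps
    pred = eps_net(torch.cat([x_t, t], dim=1))
    return ((pred - eps) ** 2).sum(dim=1).mean()


z = torch.randn(64, 2)
loss = ddpm_loss(z, alpha=lambda t: t, beta=lambda t: 1.0 - t)
print(loss.item() >= 0.0)  # True
```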
&lt;p&gt;Moreover, for Gaussian probability paths the vector field and the score can be converted into each other:&lt;/p&gt;
&lt;p&gt;$$
u_t^{\text{target}}(x|z) = \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \nabla \log p_t(x|z) + \frac{\dot{\alpha}_t}{\alpha_t} x \\
u_t^{\text{target}}(x) = \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \nabla \log p_t(x) + \frac{\dot{\alpha}_t}{\alpha_t} x
$$&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;proof&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;$$
u_t^{\text{target}}(x|z) = \left( \dot{\alpha}_t - \frac{\dot{\beta}_t}{\beta_t} \alpha_t \right) z + \frac{\dot{\beta}_t}{\beta_t} x
\stackrel{(i)}{=} \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \left( \frac{\alpha_t z - x}{\beta_t^2} \right) + \frac{\dot{\alpha}_t}{\alpha_t} x
= \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \nabla \log p_t(x|z) + \frac{\dot{\alpha}_t}{\alpha_t} x
$$&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
u_t^{\text{target}}(x) &amp;#x26;= \int u_t^{\text{target}}(x|z) \frac{p_t(x|z) p_{\text{data}}(z)}{p_t(x)} \,\mathrm{d}z \\
&amp;#x26;= \int \left[ \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \nabla \log p_t(x|z) + \frac{\dot{\alpha}_t}{\alpha_t} x \right] \frac{p_t(x|z) p_{\text{data}}(z)}{p_t(x)} \,\mathrm{d}z \\
&amp;#x26;\stackrel{(i)}{=} \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) \nabla \log p_t(x) + \frac{\dot{\alpha}_t}{\alpha_t} x
\end{align*}
$$&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;The networks $u_t^\theta$ and $s^\theta_t$ can likewise be converted into each other:&lt;/p&gt;
&lt;p&gt;$$
u_t^\theta(x) = \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t \right) s_t^\theta(x) + \frac{\dot{\alpha}_t}{\alpha_t} x
$$&lt;/p&gt;
&lt;p&gt;$$
s_t^\theta(x) = \frac{\alpha_t u_t^\theta(x) - \dot{\alpha}_t x}{\beta_t^2 \dot{\alpha}_t - \alpha_t \dot{\beta}_t \beta_t}
$$&lt;/p&gt;
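&lt;p&gt;A quick numerical check of the score-to-field conversion for the schedule $\alpha_t = t$, $\beta_t = 1 - t$ (so $\dot{\alpha}_t = 1$, $\dot{\beta}_t = -1$): plugging the conditional score into the conversion recovers $u_t^{\text{target}}(x|z)$ computed directly.&lt;/p&gt;

```python
import numpy as np


def field_from_score(score, x, t):
    "u = (beta^2 * adot/alpha - bdot*beta) * score + (adot/alpha) * x."
    a, b, adot, bdot = t, 1.0 - t, 1.0, -1.0
    return (b**2 * adot / a - bdot * b) * score + (adot / a) * x


x, z, t = np.array([0.3]), np.array([2.0]), 0.4
cond_score = -(x - t * z) / (1.0 - t) ** 2          # grad log p_t(x|z)
direct = (1.0 + t / (1.0 - t)) * z - x / (1.0 - t)  # u_t^target(x|z) directly
print(np.allclose(field_from_score(cond_score, x, t), direct))  # True
```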
&lt;p&gt;Hence for Gaussian probability paths it suffices to train either $u_t^\theta$ or $s^\theta_t$, &lt;strong&gt;and either flow matching or score matching may be used&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Finally, given a trained $s_t^\theta$, sampling from the SDE proceeds as&lt;/p&gt;
&lt;p&gt;$$
X_0 \sim p_{\text{init}}, \quad \mathrm{d}X_t = \left[ \left( \beta_t^2 \frac{\dot{\alpha}_t}{\alpha_t} - \dot{\beta}_t \beta_t + \frac{\sigma_t^2}{2} \right) s_t^\theta(X_t) + \frac{\dot{\alpha}_t}{\alpha_t} X_t \right] \mathrm{d}t + \sigma_t \mathrm{d}W_t \\
\implies X_1 \sim p_{\text{data}}
$$&lt;/p&gt;
&lt;h3&gt;Summary&lt;/h3&gt;
&lt;p&gt;In summary, flow matching is simpler and more extensible than score matching: it can transport an arbitrary initial distribution $p_{\text{init}}$ to an arbitrary data distribution $p_{\text{data}}$, whereas denoising diffusion models only apply to Gaussian initial distributions and Gaussian probability paths. Flow matching is closely related to stochastic interpolants.&lt;/p&gt;
&lt;h2&gt;Conditional (Guided) Generation&lt;/h2&gt;
&lt;p&gt;Generating an object &lt;strong&gt;conditioned on&lt;/strong&gt; &lt;strong&gt;some additional information&lt;/strong&gt; is called conditional generation; to avoid confusion with conditional vector fields it is usually called guided generation&lt;/p&gt;
&lt;p&gt;Formally, for $y \in \mathcal{Y}$ we want to sample from $p_{\text{data}}(x \mid y)$, so the model contains a guided vector field $u_t^{\theta}(\cdot \mid y)$ and has the following structure&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\text{Neural network: } &amp;#x26; u_t^\theta : \mathbb{R}^d \times \mathcal{Y} \times [0, 1] \to \mathbb{R}^d, \quad (x, y, t) \mapsto u_t^\theta(x|y) \\
\text{Fixed: } &amp;#x26; \sigma_t : [0, 1] \to [0, \infty), \quad t \mapsto \sigma_t
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;For a given $y \in \mathbb{R}^{d_y}$, sampling can be described as&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\text{Initialization:} \quad &amp;#x26; X_0 \sim p_{\text{init}} \quad &amp;#x26;\triangleright \text{ Initialize with simple distribution} \\
\text{Simulation:} \quad &amp;#x26; \mathrm{d}X_t = u_t^\theta(X_t|y)\,\mathrm{d}t + \sigma_t\,\mathrm{d}W_t \quad &amp;#x26;\triangleright \text{ Simulate SDE from } t=0 \text{ to } t=1 \\
\text{Goal:} \quad &amp;#x26; X_1 \sim p_{\text{data}}(\cdot|y) \quad &amp;#x26;\triangleright  X_1 \text{ to be distributed like } p_{\text{data}}(\cdot|y)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;With $\sigma_t=0$ this is a guided flow model&lt;/p&gt;
&lt;h3&gt;Guided Models&lt;/h3&gt;
&lt;p&gt;The training loss for guided flow models (the optimization objective, i.e. the guided conditional flow matching objective) follows directly:&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{CFM}}^{\text{guided}}(\theta) &amp;#x26;= \mathbb{E}_{(z,y) \sim p_{\text{data}}(z,y),\, t \sim \text{Unif}(0,1),\, x \sim p_t(\cdot|z)} \left[ \left\| u_t^\theta(x|y) - u_t^{\text{target}}(x|z) \right\|^2 \right]
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;Similarly, for guided diffusion models there is a guided conditional score matching objective:&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{CSM}}^{\text{guided}}(\theta) &amp;#x26;= \mathbb{E}_{\square} \left[ \left\| s_t^\theta(x|y) - \nabla \log p_t(x|z) \right\|^2 \right] \\
\square &amp;#x26;= (z, y) \sim p_{\text{data}}(z, y),\ t \sim \text{Unif}(0,1),\ x \sim p_t(\cdot|z)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;虽然理论上上述目标已足以生成标签$y$对应的样本，但实际上生成结果往往并不十分fit $y$，而且无法控制生成内容对label的fit程度。一种解决方法是人为加强$y$的作用，目前较为先进的技术是Classifier-Free Guidance。&lt;/p&gt;
&lt;h4&gt;Classifier-Free Guidance&lt;/h4&gt;
&lt;p&gt;对于Flow Models，以Gaussian probability paths为例&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
u_t^{\text{target}}(x|y) = a_t x + b_t \nabla \log p_t(x|y)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;其中&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
(a_t, b_t) = \left( \frac{\dot{\alpha}_t}{\alpha_t}, \frac{\dot{\alpha}_t \beta_t^2 - \dot{\beta}_t \beta_t \alpha_t}{\alpha_t} \right)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;又&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\nabla \log p_t(x|y) = \nabla \log \left( \frac{p_t(x) p_t(y|x)}{p_t(y)} \right) = \nabla \log p_t(x) + \nabla \log p_t(y|x)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;则&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
u_t^{\text{target}}(x|y) = a_t x + b_t (\nabla \log p_t(x) + \nabla \log p_t(y|x)) = u_t^{\text{target}}(x) + b_t \nabla \log p_t(y|x)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;可以看出，guided vector field是由unguided vector field和guided score相加得到，一种很自然的想法是对guided score进行加权，得到&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\tilde{u}_t(x|y) = u_t^{\text{target}}(x) + wb_t \nabla \log p_t(y|x)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;其中guided score $\nabla \log p_t(y|x)$可以看作一个作用在带噪样本上的类别分类器的得分（即classifier guidance），早期工作确实采用这种方法实现；进一步对guided score展开分析可得：&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\tilde{u}_t(x|y) &amp;#x26;= u_t^{\text{target}}(x) + w b_t \nabla \log p_t(y|x) \\
&amp;#x26;= u_t^{\text{target}}(x) + w b_t (\nabla \log p_t(x|y) - \nabla \log p_t(x)) \\
&amp;#x26;= u_t^{\text{target}}(x) - w (a_t x + b_t \nabla \log p_t(x)) + w (a_t x + b_t \nabla \log p_t(x|y)) \\
&amp;#x26;= (1 - w) u_t^{\text{target}}(x) + w u_t^{\text{target}}(x|y).
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;即$\tilde{u}_t(x|y)$由unguided vector field和guided vector field加权得到。进一步地，引入空标签$y = \varnothing$，训练时以人为设定的概率（超参数）$\eta$将$y$替换为$\varnothing$，从而用$u_t^{\text{target}}(x|\varnothing)$代替$u_t^{\text{target}}(x)$，具体可公式化描述为&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{CFM}}^{\text{CFG}}(\theta) &amp;#x26;= \mathbb{E}_{\square} \left[ \left\| u_t^\theta(x|y) - u_t^{\text{target}}(x|z) \right\|^2 \right] \\
\square &amp;#x26;= (z, y) \sim p_{\text{data}}(z, y),\ t \sim \text{Unif}(0,1),\ x \sim p_t(\cdot|z),\ \text{replace } y = \varnothing \text{ with prob. } \eta
\end{align*}
$$&lt;/p&gt;
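&lt;p&gt;训练中“以概率$\eta$将$y$替换为$\varnothing$”这一步可以用几行代码示意。下面是一个numpy小示例，其中&lt;code&gt;NULL&lt;/code&gt;标记与丢弃概率均为演示用的假设约定：&lt;/p&gt;

```python
import numpy as np

NULL = -1  # 演示用约定：用一个特殊标记表示空标签（真实标签取非负整数）

def drop_labels(y, eta, rng):
    # 以概率 eta 将每个标签替换为 NULL，对应目标中的 replace y with prob. eta
    y = np.array(y, copy=True)
    mask = np.less(rng.random(y.shape), eta)
    y[mask] = NULL
    return y

rng = np.random.default_rng(0)
y = np.arange(10000) % 10
y_dropped = drop_labels(y, eta=0.1, rng=rng)
frac = np.mean(y_dropped == NULL)  # 应接近 0.1
```

&lt;p&gt;这样同一个网络既学到条件向量场，也学到$y=\varnothing$对应的无条件向量场。&lt;/p&gt;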
&lt;p&gt;对于Diffusion Models，$\tilde{s}_t(x|y)$同样可改写如下&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\tilde{s}_t(x|y) &amp;#x26;= \nabla \log p_t(x) + w \nabla \log p_t(y|x) \\
&amp;#x26;= \nabla \log p_t(x) + w (\nabla \log p_t(x|y) - \nabla \log p_t(x)) \\
&amp;#x26;= (1 - w) \nabla \log p_t(x) + w \nabla \log p_t(x|y) \\
&amp;#x26;= (1 - w) \nabla \log p_t(x|\varnothing) + w \nabla \log p_t(x|y)
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;training objective如下&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\mathcal{L}_{\text{CSM}}^{\text{CFG}}(\theta) &amp;#x26;= \mathbb{E}_{\square} \left[ \left\| s_t^\theta(x|(1 - \xi)y + \xi \varnothing) - \nabla \log p_t(x|z) \right\|^2 \right] \\
\square &amp;#x26;= (z, y) \sim p_{\text{data}}(z, y),\ t \sim \text{Unif}(0,1),\ x \sim p_t(\cdot|z),\ \text{replace } y = \varnothing \text{ with prob. } \eta
\end{align*}
$$&lt;/p&gt;
&lt;p&gt;训练时，我们通常也可同时优化${s}_t^\theta(x|y)$和${u}_t^\theta(x|y)$，对应的，有&lt;/p&gt;
&lt;p&gt;$$
\begin{align*}
\tilde{s}_t^\theta(x|y) &amp;#x26;= (1 - w) s_t^\theta(x|\varnothing) + w s_t^\theta(x|y), \\
\tilde{u}_t^\theta(x|y) &amp;#x26;= (1 - w) u_t^\theta(x|\varnothing) + w u_t^\theta(x|y).
\end{align*}
$$&lt;/p&gt;
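&lt;p&gt;推理时的加权组合本身非常简单，可用numpy示意如下（&lt;code&gt;u_uncond&lt;/code&gt;、&lt;code&gt;u_cond&lt;/code&gt;代表模型在$\varnothing$与$y$条件下的输出，这里用随机向量代替真实网络输出）：&lt;/p&gt;

```python
import numpy as np

def cfg_combine(u_uncond, u_cond, w):
    # (1 - w) * 无条件输出 + w * 条件输出；w 大于 1 时为外推，进一步放大条件信息
    return (1.0 - w) * u_uncond + w * u_cond

rng = np.random.default_rng(0)
u_uncond = rng.standard_normal(4)
u_cond = rng.standard_normal(4)

u_w1 = cfg_combine(u_uncond, u_cond, w=1.0)  # 退化为纯条件向量场
u_w0 = cfg_combine(u_uncond, u_cond, w=0.0)  # 退化为无条件向量场
```

&lt;p&gt;同一函数对$\tilde{s}_t^\theta$与$\tilde{u}_t^\theta$均适用。&lt;/p&gt;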
&lt;p&gt;采样时，有&lt;/p&gt;
&lt;p&gt;$$
\mathrm{d}X_t = \left[ \tilde{u}_t^\theta(X_t|y) + \frac{\sigma_t^2}{2} \tilde{s}_t^\theta(X_t|y) \right] \mathrm{d}t + \sigma_t \mathrm{d}W_t
$$&lt;/p&gt;
&lt;h2&gt;Network architectures&lt;/h2&gt;
&lt;p&gt;网络模型的设计随建模数据的复杂程度各有差别，但都需满足&lt;/p&gt;
&lt;p&gt;$$
\text{Neural network: }  u_t^\theta : \mathbb{R}^d \times \mathcal{Y} \times [0, 1] \to \mathbb{R}^d, \quad (x, y, t) \mapsto u_t^\theta(x|y)
$$&lt;/p&gt;
&lt;h3&gt;U-Nets&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20250821130050114.riacPsTa_Z1Buxq2.webp&quot; alt=&quot;image-20250821130050114&quot;&gt;&lt;/p&gt;
&lt;h3&gt;Diffusion Transformers&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/8bb17b0d052f439725e08dfee592beb0.B7gJovgr_Z6c6Vr.webp&quot; alt=&quot;img&quot;&gt;&lt;/p&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;p&gt;[1] Peter Holderrieth and Ezra Erives. An Introduction to Flow Matching and Diffusion Models [EB/OL]. https://arxiv.org/abs/2506.02070, 2025.&lt;/p&gt;</content:encoded><h:img src="/_astro/8bb17b0d052f439725e08dfee592beb0.B7gJovgr.png"/><enclosure url="/_astro/8bb17b0d052f439725e08dfee592beb0.B7gJovgr.png"/></item><item><title>智慧树试卷导出脚本</title><link>https://pengwee.wang/blog/zhi-hui-shu-cha-juan-dao-chu</link><guid isPermaLink="true">https://pengwee.wang/blog/zhi-hui-shu-cha-juan-dao-chu</guid><description>智慧树试卷导出Tampermonkey脚本</description><pubDate>Fri, 13 Jun 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;安装&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;先安装&lt;a href=&quot;https://microsoftedge.microsoft.com/addons/detail/%E7%AF%A1%E6%94%B9%E7%8C%B4/iikmkjmpaadaobahmlepeloendndfphd&quot;&gt;Tampermonkey&lt;/a&gt;（已安装请忽略）&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;点击&lt;a href=&quot;/tools/zhihuishu_exam_export/zhihuishu_exam_export_v1.user.js&quot;&gt;这里&lt;/a&gt; 配合 &lt;a href=&quot;/tools/zhihuishu_exam_export&quot;&gt;渲染工具&lt;/a&gt;，或者直接安装&lt;a href=&quot;/tools/zhihuishu_exam_export/zhihuishu_exam_export_v2.user.js&quot;&gt;这个&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>What Is 3D Rendering? Complete Guide to 3D Visualization</title><link>https://pengwee.wang/blog/3d-rendering</link><guid isPermaLink="true">https://pengwee.wang/blog/3d-rendering</guid><description>3D imagery has the power to bring cinematic visions to life and help accurately plan tomorrow’s cityscapes. Here, 3D expert Ricardo Ortiz explains how it works.</description><pubDate>Sun, 09 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;3D rendering is all around us. From huge action movies to car commercials to previews of upcoming buildings or product designs, 3D visualization has become so widespread and realistic that you probably don’t even know it’s there.&lt;/p&gt;
&lt;p&gt;In this introductory piece, Chaos’ Ricardo Ortiz explains the basics of 3D rendering, from the computational methods that create imagery to the artistic techniques that create great computer-generated (CG) content and its various uses.&lt;/p&gt;
&lt;h2&gt;What is 3D Rendering?&lt;/h2&gt;
&lt;p&gt;Put simply, 3D rendering is the process of using a computer to generate a 2D image from a digital three-dimensional scene.&lt;/p&gt;
&lt;p&gt;To generate an image, specific methodologies and special software and hardware are used. Therefore, we need to understand that 3D rendering is a process—the one that builds the image.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/nikola-arsov-still-life-interior-design-vray-3ds-max-05-930px.DoY3_oVo_alYGQ.webp&quot; alt=&quot;alt text&quot;&gt;&lt;/p&gt;
&lt;h2&gt;Types of 3D rendering&lt;/h2&gt;
&lt;p&gt;We can create different types of rendered images; they can be realistic or non-realistic.&lt;/p&gt;
&lt;p&gt;A realistic image could be an architectural interior that looks like a photograph, a product-design image such as a piece of furniture, or an automotive rendering of a car. On the other hand, we can create a non-realistic image such as an outline-type diagram or a cartoon-style image with a traditional 2D look. Technically, we can visualize anything we can imagine.&lt;/p&gt;
&lt;h2&gt;How is 3D rendering used?&lt;/h2&gt;
&lt;p&gt;3D rendering is an essential technique for many industries including architecture, product design, advertising, video games and visual effects for film, TV and animation.&lt;/p&gt;
&lt;p&gt;In design and architecture, renders allow creative people to communicate their ideas in a clear and transparent way. A render gives them the chance to evaluate their proposals, experiment with materials, conduct studies and contextualize their designs in the real world before they are built or manufactured.&lt;/p&gt;
&lt;p&gt;For the media and entertainment industries, 3D rendering is fundamental to the creation of sequences and animations that tell stories, whether we’re watching an animated movie, a period drama, or an action sequence with explosions, ships from the future, exotic locales, or extraterrestrial creatures.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/thanos-dd-single-image-004a.DUX4VGf-_1A3bTN.webp&quot; alt=&quot;alt text&quot;&gt;&lt;/p&gt;
&lt;p&gt;Over the past few years, the evolution of computer graphics in these industries has replaced traditional techniques. For example, special effects are being replaced by visual effects, which means stunt people no longer risk their lives in car crashes.&lt;/p&gt;
&lt;p&gt;In advertising, I would dare to say that 90% of automotive commercials are CG—or even more. In the architecture industry, many traditional techniques to create representations, such as scale models, have been replaced with photorealistic imagery to ensure we can see exactly how something will look once it’s built.&lt;/p&gt;
&lt;p&gt;Accelerating processes, reducing costs and the demand for better quality results have helped technology evolve. Hardware is more powerful than ever and the switch to CG was inevitable.&lt;/p&gt;
&lt;h2&gt;How is a 3D rendered image generated?&lt;/h2&gt;
&lt;p&gt;Two pieces of software, with different characteristics, are used to computer-generate images and animations: render engines and game engines. Render engines use a technique called ray tracing, while game engines use a technique called rasterization—and some engines mix both techniques, but we will talk about that later on.&lt;/p&gt;</content:encoded><h:img src="/_astro/thumbnail.DzZDiYKA.jpg"/><enclosure url="/_astro/thumbnail.DzZDiYKA.jpg"/></item><item><title>FastAPI项目开发与部署</title><link>https://pengwee.wang/blog/fastapi</link><guid isPermaLink="true">https://pengwee.wang/blog/fastapi</guid><description>FastAPI 项目开发与部署笔记，包含模块化设计、路由定义和Docker部署</description><pubDate>Sun, 16 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;hr&gt;
&lt;h1&gt;&lt;strong&gt;FastAPI 项目开发与部署笔记&lt;/strong&gt;&lt;/h1&gt;
&lt;p&gt;本笔记总结了如何使用 FastAPI 构建一个模块化、可扩展的 API 系统，并通过 Docker 和 Docker Compose 实现高效的开发和部署流程。&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;1. 项目结构设计&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;为了构建一个清晰、易维护的项目，项目采用以下结构：&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;project/
├── main.py          # 主入口文件
├── routers/         # 路由模块
│   ├── __init__.py
│   ├── xiaohongshu/ # 小红书 API 文件夹
│   │   ├── __init__.py
│   │   └── image.py # 小红书图片解析 API
│   └── other_api/   # 其他功能 API 文件夹（未来扩展）
│       ├── __init__.py
│       └── example.py
├── utils/           # 工具模块
│   ├── __init__.py
│   └── parser.py    # 解析工具函数
└── models/          # 数据模型（如果需要）
    ├── __init__.py
    └── example.py
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;特点&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;每个功能模块独立封装在 &lt;code&gt;routers&lt;/code&gt; 文件夹下的子文件夹中。&lt;/li&gt;
&lt;li&gt;动态加载路由，支持灵活扩展。&lt;/li&gt;
&lt;/ul&gt;
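&lt;p&gt;上面提到的“动态加载路由”可以用标准库&lt;code&gt;pkgutil&lt;/code&gt;与&lt;code&gt;importlib&lt;/code&gt;实现。下面是一个最小示意（假设&lt;code&gt;routers/&lt;/code&gt;下每个子模块都暴露一个名为&lt;code&gt;router&lt;/code&gt;的对象，&lt;code&gt;load_routers&lt;/code&gt;为演示用的假想函数名）：&lt;/p&gt;

```python
import importlib
import pkgutil

def load_routers(package_name):
    # 递归遍历 package_name 包下的所有子模块，收集其中名为 router 的对象
    package = importlib.import_module(package_name)
    routers = []
    for info in pkgutil.walk_packages(package.__path__, prefix=package_name + '.'):
        if info.ispkg:
            continue  # 子包本身跳过，只导入具体模块
        module = importlib.import_module(info.name)
        router = getattr(module, 'router', None)
        if router is not None:
            routers.append(router)
    return routers
```

&lt;p&gt;在&lt;code&gt;main.py&lt;/code&gt;中对返回的每个router调用&lt;code&gt;app.include_router(...)&lt;/code&gt;即可完成注册。&lt;/p&gt;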
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;2. FastAPI 核心功能实现&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;(1) 路由定义&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;使用 &lt;code&gt;APIRouter&lt;/code&gt; 定义模块化路由。例如：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from fastapi import APIRouter
from utils.parser import HongshuParser  # 解析函数位于 utils/parser.py

router = APIRouter(prefix=&quot;/image&quot;, tags=[&quot;Image Parsing&quot;])

@router.get(&quot;/&quot;)
async def parse_image(url: str):
    result = HongshuParser(url)
    return result
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;&lt;strong&gt;(2) 自动化文档&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;FastAPI 自动生成交互式文档页面：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Swagger UI: &lt;code&gt;/docs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;ReDoc: &lt;code&gt;/redoc&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;可以通过以下方式定制文档页面：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;修改标题：自定义 HTML 模板。&lt;/li&gt;
&lt;li&gt;添加品牌化元素：如 Logo 和样式。&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;3. Docker 部署&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;(1) Dockerfile&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;Dockerfile&lt;/code&gt; 示例：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-dockerfile&quot;&gt;# syntax=docker/dockerfile:1.4
FROM --platform=$BUILDPLATFORM python:3.11

WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

COPY requirements.txt /app
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 4725/tcp
CMD [&quot;gunicorn&quot;, &quot;-w&quot;, &quot;4&quot;, &quot;-k&quot;, &quot;uvicorn.workers.UvicornWorker&quot;, &quot;--bind&quot;, &quot;0.0.0.0:4725&quot;, &quot;app:app&quot;]
&lt;/code&gt;&lt;/pre&gt;
&lt;hr&gt;
&lt;h3&gt;&lt;strong&gt;(2) Docker Compose&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;通过 &lt;code&gt;docker-compose.yml&lt;/code&gt; 简化多容器管理：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;version: &apos;3.9&apos;
services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - &apos;4725:4725&apos;
    volumes:
      - .:/app
    command: &gt;
      gunicorn -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:4725 app:app
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;优点&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;使用 &lt;code&gt;volumes&lt;/code&gt; 挂载本地代码，实时同步代码更改。&lt;/li&gt;
&lt;li&gt;支持多服务管理（如数据库、缓存等）。&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;4. 更新代码后的重新运行&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;(1) 手动更新&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;停止并删除旧容器：
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docker stop &amp;#x3C;docker 容器 ID | docker 容器名&gt;
docker rm &amp;#x3C;docker 容器 ID | docker 容器名&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;重新构建镜像并运行：
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docker build -t &amp;#x3C;docker 容器名&gt; .
docker run -d -p 4725:4725 --name &amp;#x3C;docker 容器名&gt; &amp;#x3C;docker 镜像名&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong&gt;(2) 使用 Docker Compose&lt;/strong&gt;&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;重新构建并启动：
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docker-compose up --build -d # -d 表示后台运行
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;如果挂载了本地代码，只需保存代码更改即可自动生效。&lt;/li&gt;
&lt;/ol&gt;
&lt;hr&gt;
&lt;h2&gt;&lt;strong&gt;5. 项目拓展&lt;/strong&gt;&lt;/h2&gt;
&lt;h3&gt;&lt;strong&gt;(1) 添加新功能&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;新增功能非常简单，只需在 &lt;code&gt;routers&lt;/code&gt; 文件夹下创建新的子文件夹，并按照以下步骤操作：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;创建模块文件夹。&lt;/li&gt;
&lt;li&gt;定义路由。&lt;/li&gt;
&lt;li&gt;初始化模块。&lt;/li&gt;
&lt;li&gt;测试新功能。&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;&lt;strong&gt;(2) 集成外部工具&lt;/strong&gt;&lt;/h3&gt;
&lt;p&gt;通过依赖注入的方式集成外部工具或服务（如数据库、缓存等）。例如：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from fastapi import FastAPI, Depends

app = FastAPI()

def get_db():
    db = &quot;Database Connection&quot;
    return db

@app.get(&quot;/example-with-db&quot;)
async def example_with_db(db=Depends(get_db)):
    return {&quot;db&quot;: db}
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;&lt;strong&gt;项目代码&lt;/strong&gt;&lt;/h2&gt;
&lt;p&gt;项目代码已上传至 &lt;a href=&quot;https://github.com/Snape-max/api&quot;&gt;Qiumo api&lt;/a&gt;, 部署至 &lt;a href=&quot;https://api.qiumo.fun/&quot;&gt;Qiumo.fun&lt;/a&gt;&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>Time Machine</title><link>https://pengwee.wang/blog/time-machine</link><guid isPermaLink="true">https://pengwee.wang/blog/time-machine</guid><description>Time Machine 歌词</description><pubDate>Mon, 03 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/OIP-C.DvJY6ozT_Z1g9jVR.webp&quot; alt=&quot;machine&quot;&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Everyone want to go back, but time waits for no one.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Staring at stars&lt;/p&gt;
&lt;p&gt;Watching the moon&lt;/p&gt;
&lt;p&gt;Hoping that one day they&apos;ll lead me to you&lt;/p&gt;
&lt;p&gt;Wait every night&lt;/p&gt;
&lt;p&gt;Cause if a star falls&lt;/p&gt;
&lt;p&gt;I&apos;ll wish to go back to the times that I loved&lt;/p&gt;
&lt;p&gt;Why do the stars shine so bright in the sky&lt;/p&gt;
&lt;p&gt;If most of the people are sleeping at night&lt;/p&gt;
&lt;p&gt;Why do we only have one chance at life&lt;/p&gt;
&lt;p&gt;I wish I could go back in time&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;Pictures remind me of the things I forget&lt;/p&gt;
&lt;p&gt;But also of all of the things that I&apos;ve lost&lt;/p&gt;
&lt;p&gt;Can&apos;t get them back they won&apos;t fall from above&lt;/p&gt;
&lt;p&gt;So I try to forget all the times that I loved&lt;/p&gt;
&lt;p&gt;Why do we remember beautiful lies&lt;/p&gt;
&lt;p&gt;We end up regretting them most of our lives&lt;/p&gt;
&lt;p&gt;Why do we only have one chance to try&lt;/p&gt;
&lt;p&gt;I wish I could go back in time&lt;/p&gt;
&lt;p&gt;Each time I fall asleep&lt;/p&gt;
&lt;p&gt;I always see you there in my dreams&lt;/p&gt;
&lt;p&gt;It&apos;s like going back in a time machine&lt;/p&gt;
&lt;p&gt;I know when I wake up your time with me will end&lt;/p&gt;
&lt;p&gt;So don&apos;t let me fall asleep&lt;/p&gt;
&lt;p&gt;I don&apos;t wanna meet you there in my dreams&lt;/p&gt;
&lt;p&gt;I know that we&apos;ll never build a time machine&lt;/p&gt;
&lt;p&gt;It&apos;s time for me to try and wake up again&lt;/p&gt;
&lt;p&gt;I fall asleep&lt;/p&gt;
&lt;p&gt;But honestly&lt;/p&gt;
&lt;p&gt;I wanna see you in my dreams&lt;/p&gt;
&lt;p&gt;I&apos;m trying to wake up again&lt;/p&gt;</content:encoded><h:img src="/_astro/OIP-C.DvJY6ozT.jpg"/><enclosure url="/_astro/OIP-C.DvJY6ozT.jpg"/></item><item><title>千与千寻——只存在于梦中的童话故事</title><link>https://pengwee.wang/blog/qian-yu-qian-xun</link><guid isPermaLink="true">https://pengwee.wang/blog/qian-yu-qian-xun</guid><description>千与千寻影评——关于本真、友情和爱情的童话故事</description><pubDate>Sun, 02 Feb 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/qyqx.D0Ufl47m_2txG12.webp&quot; alt=&quot;千与千寻&quot;&gt;&lt;/p&gt;
&lt;p&gt;标准的 &lt;code&gt;HE&lt;/code&gt; 故事，故事情节很精彩，主旨也很丰富。但是童话终究是童话。&lt;/p&gt;
&lt;p&gt;比较深刻的一部分是关于无脸男。看电影前刚讨论过关于有的男生结婚前很好，但是结婚后会变坏；有的男生结婚前很坏，但是结婚后会变好。虽然变好变坏难以界定，但是仔细分析来说其变化无非出于内因和外因。无脸男只是一个容器，外界装入什么，就表现出什么。大浴场贪婪，其也就变得贪婪。钱婆婆和千寻善良，在她们的影响下他也不再贪婪。&lt;/p&gt;
&lt;p&gt;这样来说，我们就很需要有非常强大的内心，和刚毅的坚守。像千寻一样，不被外界的贪婪干扰，再具体一点，就是保存本真。&lt;/p&gt;
&lt;p&gt;是的，本真。忘记了自己的名字就忘记了自己是谁，忘记了自己的本真就会被别人控制，困扰。汤婆婆是这样来控制他人的。因此要记得自己的本心。&lt;/p&gt;
&lt;p&gt;但是，世事无常，人生魔幻缤纷，能够保存自己本心的人少之又少，大抵到最后都拜倒在诱惑或者屈膝于生存。这样想来就有一种绝望感。&lt;/p&gt;
&lt;p&gt;不过，宫崎骏像是给出了自己的答案，友情和爱情，千寻拯救无脸男，琥珀川和千寻相互救赎，最后事情都被解决，我们都有美好的未来。&lt;/p&gt;
&lt;p&gt;然而，这仔细想来更让人有点失落，知己难以遇到，真爱更是如此，如果孤身一人看这部电影，初看很温馨，但是回过味来便很让人哭泣。带入千寻，或许当初掉进琥珀川的时候可能就被淹死了，闯进神明世界的时候就独自消失了，进到汤屋的时候就被变成了猪或者煤球。一切的一切都像是巧合，都只是童话。&lt;/p&gt;
&lt;p&gt;这样去想又像是个消极主义者了，是我是消极主义者，还是世界影响着我让我变成了消极主义者？我是我，还是世界造就了我？&lt;/p&gt;
&lt;p&gt;或许应该乐观点，自信点，宏大点。如果不能被理解，可以去理解他人；如果不能被救赎，去努力救赎他人。成为一个理想主义者，观察者和记录者。&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;电影中的名字也挺有特色，汤婆婆喜欢钱却却姓汤，钱婆婆不痴迷钱却姓钱，无脸男有脸却无心；千寻千寻，寻找的既是自己，也是自己爱的人。每个人的名字都很有意义，我要取个什么名字？&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;思考真的可以产生热量，刚刚还非常冷的手和胸膛，现在也火热了起来。&lt;/p&gt;</content:encoded><h:img src="/_astro/qyqx.D0Ufl47m.jpg"/><enclosure url="/_astro/qyqx.D0Ufl47m.jpg"/></item><item><title>空山新雨后</title><link>https://pengwee.wang/blog/kong-shan-xin-yu-hou</link><guid isPermaLink="true">https://pengwee.wang/blog/kong-shan-xin-yu-hou</guid><description>空山新雨后 - 经典古诗词</description><pubDate>Mon, 20 Jan 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image.Feci_j7Q_1GGxbr.webp&quot; alt=&quot;image&quot;&gt;&lt;/p&gt;
&lt;h1&gt;空山新雨后&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;空山新雨后，天气晚来秋&lt;/p&gt;
&lt;p&gt;明月松间照，清泉石上流&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;山峰轻摆尾&lt;/p&gt;
&lt;p&gt;卷下落花随流水&lt;/p&gt;
&lt;p&gt;路过擦拭曾经 用你柔情 换我的眼泪&lt;/p&gt;
&lt;p&gt;当爱恨都败退&lt;/p&gt;
&lt;p&gt;没谢幕的人啊&lt;/p&gt;
&lt;p&gt;井中月 举杯砸碎 佐一场宿醉&lt;/p&gt;
&lt;p&gt;抽签的玫瑰&lt;/p&gt;
&lt;p&gt;作熏香还(hai)能余味&lt;/p&gt;
&lt;p&gt;猜测无解答案 算了满地 也是种浪费&lt;/p&gt;
&lt;p&gt;我才终于明白&lt;/p&gt;
&lt;p&gt;终于明白&lt;/p&gt;
&lt;p&gt;不能被施舍的是爱&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;取下褪漆的钗&lt;/p&gt;
&lt;p&gt;就化作尘埃&lt;/p&gt;
&lt;p&gt;喝多少暖身的酒&lt;/p&gt;
&lt;p&gt;暖不了心口&lt;/p&gt;
&lt;p&gt;待空山新雨后&lt;/p&gt;
&lt;p&gt;放一叶小舟&lt;/p&gt;
&lt;p&gt;载上无人问津的温柔&lt;/p&gt;
&lt;p&gt;摆渡寻处去忘忧&lt;/p&gt;
&lt;p&gt;抽签的玫瑰&lt;/p&gt;
&lt;p&gt;作熏香还能余味&lt;/p&gt;
&lt;p&gt;猜测无解答案 算了满地 也是种浪费&lt;/p&gt;
&lt;p&gt;我才终于明白&lt;/p&gt;
&lt;p&gt;终于明白&lt;/p&gt;
&lt;p&gt;不能被施舍的是爱&lt;/p&gt;
&lt;p&gt;取下褪漆的钗&lt;/p&gt;
&lt;p&gt;就化作尘埃&lt;/p&gt;
&lt;p&gt;喝多少暖身的酒&lt;/p&gt;
&lt;p&gt;暖不了心口&lt;/p&gt;
&lt;p&gt;待空山新雨后&lt;/p&gt;
&lt;p&gt;放一叶小舟&lt;/p&gt;
&lt;p&gt;载上无人问津的温柔&lt;/p&gt;
&lt;p&gt;摆渡寻处去忘忧&lt;/p&gt;</content:encoded><h:img src="/_astro/image.Feci_j7Q.png"/><enclosure url="/_astro/image.Feci_j7Q.png"/></item><item><title>ax650交叉编译ax-pipeline</title><link>https://pengwee.wang/blog/jiao-cha-bian-yi</link><guid isPermaLink="true">https://pengwee.wang/blog/jiao-cha-bian-yi</guid><description>ax650交叉编译ax-pipeline教程</description><pubDate>Wed, 19 Jun 2024 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;ax650交叉编译ax-pipeline&lt;/h1&gt;
&lt;h2&gt;编译前准备&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;x86 Linux&lt;/code&gt;系统，虚拟机或者实体机，推荐选择&lt;code&gt;Ubuntu 22.04&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;稳定网络环境(需要连接&lt;code&gt;github&lt;/code&gt;)，若下载出现问题可参考&lt;a href=&quot;#github%E9%95%9C%E5%83%8F%E5%8A%A0%E9%80%9F%E4%B8%8B%E8%BD%BD&quot;&gt;此处&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;U盘&lt;/li&gt;
&lt;li&gt;安装基础编译包&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;sudo apt update
sudo apt install build-essential libopencv-dev cmake
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;交叉编译&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;拉取ax-pipeline源码及子模块&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone --recursive https://github.com/AXERA-TECH/ax-pipeline.git
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;下载sdk及设置650n_bsp_sdk版本&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cd ax-pipeline
./download_ax_bsp.sh ax650
./switch_version_ax650.sh 1.45
cd ax650n_bsp_sdk
wget https://github.com/ZHEQIUSHUI/assets/releases/download/ax650/drm.zip
mkdir third-party
unzip drm.zip -d third-party
cd ..
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;下载opencv&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;mkdir 3rdparty
cd 3rdparty
wget https://github.com/ZHEQIUSHUI/assets/releases/download/ax650/libopencv-4.5.5-aarch64.zip
unzip libopencv-4.5.5-aarch64.zip
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;配置交叉编译器&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;wget https://developer.arm.com/-/media/Files/downloads/gnu-a/9.2-2019.12/binrel/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
tar -xvf gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
export PATH=$PATH:$PWD/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/bin/
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;源码编译&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cd ax-pipeline
mkdir build
cd build
cmake -DAXERA_TARGET_CHIP=AX650 -DBSP_MSP_DIR=$PWD/../ax650n_bsp_sdk/msp/out -DOpenCV_DIR=$PWD/../3rdparty/libopencv-4.5.5-aarch64/lib/cmake/opencv4 -DSIPY_BUILD=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-none-linux-gnu.toolchain.cmake -DCMAKE_INSTALL_PREFIX=install ..
make -j12
make install
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;获得bin文件如下所示&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;bin
├── config
│   ├── custom_model.json
│   ├── dinov2.json
│   ├── dinov2_depth.json
│   ├── glpdepth.json
│   ├── ppyoloe.json
│   ├── scrfd.json
│   ├── scrfd_recognition.json
│   ├── yolo_nas.json
│   ├── yolov5_seg.json
│   ├── yolov5s.json
│   ├── yolov5s_face.json
│   ├── yolov5s_face_recognition.json
│   ├── yolov6.json
│   ├── yolov7.json
│   ├── yolov7_face.json
│   ├── yolov8.json
│   ├── yolov8_pose.json
│   └── yolox.json
├── sample_demux_ivps_npu_hdmi_vo
├── sample_demux_ivps_npu_rtsp
├── sample_demux_ivps_npu_rtsp_hdmi_vo
├── sample_multi_demux_ivps_npu_hdmi_vo
├── sample_multi_demux_ivps_npu_multi_rtsp
├── sample_multi_demux_ivps_npu_multi_rtsp_hdmi_vo
├── sample_vin_ivps_npu_hdmi_vo
└── sample_vin_ivps_npu_venc_rtsp
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;移动到开发板&lt;/h2&gt;
&lt;p&gt;由于编译后文件较大，因此推荐使用U盘进行数据传输&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;将编译后bin文件移动到U盘中&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;U盘插入板卡中&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;查看U盘所在分区&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20240619004815857.w4CFLluT_z3WNt.webp&quot; alt=&quot;image-20240619004815857&quot;&gt;&lt;/p&gt;
&lt;p&gt;如图所示，我的U盘所在分区为&lt;code&gt;/dev/sda1&lt;/code&gt; (根据大小或者其他来判断)&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;挂载到文件夹中(此处挂载到了&lt;code&gt;/mnt/usb&lt;/code&gt;文件夹下)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;mkdir /mnt/usb
mount /dev/sda1 /mnt/usb
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;em&gt;可能会有以下提示，不影响&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20240619005631945.nonEXb9N_12yeH9.webp&quot; alt=&quot;image-20240619005631945&quot;&gt;&lt;/p&gt;
&lt;p&gt;查看是否挂载&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20240619005701612.DfIX9kq2_4utiM.webp&quot; alt=&quot;image-20240619005701612&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;移动文件到板卡中(此处创建了&lt;code&gt;~/data目录&lt;/code&gt;，并将文件移动到了&lt;code&gt;~/data/&lt;/code&gt;下)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code&gt;mkdir ~/data
cp /mnt/usb/bin ~/data -r
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;查看文件&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/image-20240619005844205.DMYMyDJl_ZbeDir.webp&quot; alt=&quot;image-20240619005844205&quot;&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;运行默认示例，不传入模型参数(记得&lt;code&gt;kill fb_vo&lt;/code&gt;进程)&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;cd ~/data/bin
./sample_vin_ivps_npu_hdmi_vo
&lt;/code&gt;&lt;/pre&gt;
&lt;ul&gt;
&lt;li&gt;移除U盘&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;卸载U盘&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;umount /mnt/usb
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;即可拔掉U盘&lt;/p&gt;
&lt;h2&gt;github镜像加速下载&lt;/h2&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;git&lt;/code&gt;拉取&lt;code&gt;ax-pipeline&lt;/code&gt;源码加速&lt;/li&gt;
&lt;/ol&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone https://kkgithub.com/AXERA-TECH/ax-pipeline.git
cd ax-pipeline
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;修改&lt;code&gt;ax-pipeline&lt;/code&gt;下&lt;code&gt;.gitmodules&lt;/code&gt;文件， 将&lt;code&gt;url =&lt;/code&gt;中所有&lt;code&gt;github.com&lt;/code&gt;换为&lt;code&gt;kkgithub.com&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;拉取子模块&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git submodule update --init
./download_ax_bsp.sh ax650
&lt;/code&gt;&lt;/pre&gt;
&lt;ol start=&quot;2&quot;&gt;
&lt;li&gt;&lt;code&gt;wget&lt;/code&gt;文件加速&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;替换&lt;code&gt;wget&lt;/code&gt;下载链接中&lt;code&gt;github.com&lt;/code&gt;为&lt;code&gt;kkgithub.com&lt;/code&gt;&lt;/p&gt;</content:encoded><h:img src="/_astro/image-20240619005844205.DMYMyDJl.png"/><enclosure url="/_astro/image-20240619005844205.DMYMyDJl.png"/></item><item><title>侧耳倾听——阅读、爱情与理想</title><link>https://pengwee.wang/blog/ceer-qingting</link><guid isPermaLink="true">https://pengwee.wang/blog/ceer-qingting</guid><description>侧耳倾听影评——关于阅读、爱情与理想</description><pubDate>Fri, 25 Aug 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/ceer.BokHtzWC_Z1g6jQM.webp&quot; alt=&quot;img&quot;&gt;&lt;/p&gt;
&lt;p&gt;我喜欢上了你努力的样子，因此我也变得越加的努力吸引你的注意。&lt;/p&gt;
&lt;p&gt;两个互相振奋的灵魂、一个变得更加优秀的约定、一份纯洁无暇的爱情。&lt;/p&gt;
&lt;p&gt;就让这些种下未来的种子，在约定的时刻我们相见。&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;想变得更加优秀，在未来的某个时间点上和她一起&lt;/p&gt;
&lt;p&gt;便如此这般&lt;/p&gt;</content:encoded><h:img src="/_astro/ceer.BokHtzWC.png"/><enclosure url="/_astro/ceer.BokHtzWC.png"/></item><item><title>萤火之森——终将别离的爱恋</title><link>https://pengwee.wang/blog/ying-huo-zhi-sen</link><guid isPermaLink="true">https://pengwee.wang/blog/ying-huo-zhi-sen</guid><description>萤火之森影评——终将别离的爱恋</description><pubDate>Wed, 23 Aug 2023 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/Yin.DxXKdJu2_ZhqtCl.webp&quot; alt=&quot;img&quot;&gt;&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;时光终有一天会将我们分开。但是，即使如此，在那日降临之前，让我们一直在一起吧。&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;如果我和爱的人无法触碰，无法拥抱，那么我大概是要发疯的。&lt;/p&gt;
&lt;p&gt;然而如果触碰便意味着别离，那么失去或许是成全。&lt;/p&gt;
&lt;p&gt;不过如同开头所说的那句话，我们终将分离，但是在我们仍未分离的时光里，快乐的生活着吧。&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;若我是萤，面对终将分离的爱情时，我大概率是会离开的吧，不想面对，别离时的悲伤。&lt;/p&gt;
&lt;p&gt;如果终将分离，倒不如让情还未深的时候结束，让痛苦来的更早些短暂些。&lt;/p&gt;
&lt;p&gt;这是现在的我。&lt;/p&gt;</content:encoded><h:img src="/_astro/Yin.DxXKdJu2.jpg"/><enclosure url="/_astro/Yin.DxXKdJu2.jpg"/></item><item><title>使用PyQt5开发应用程序总结</title><link>https://pengwee.wang/blog/pyqt5</link><guid isPermaLink="true">https://pengwee.wang/blog/pyqt5</guid><description>PyQt5 是一个用于创建图形用户界面的 Python 框架</description><pubDate>Fri, 11 Aug 2023 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;PyQt5 使用笔记&lt;/h1&gt;
&lt;p&gt;PyQt5 是一个用于创建图形用户界面(GUI)的 Python 框架，基于 Qt 库开发而来。它提供了丰富的工具和组件，使开发者能够轻松地创建各种强大的桌面应用程序。本文将介绍 PyQt5 的基本用法，并提供一些示例代码帮助你入门。&lt;/p&gt;
&lt;h2&gt;安装 PyQt5&lt;/h2&gt;
&lt;p&gt;首先，需要安装 PyQt5 模块。你可以使用 pip 命令来安装：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pip install PyQt5
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;创建一个基本的 PyQt5 窗口&lt;/h2&gt;
&lt;p&gt;在 PyQt5 中，你可以通过两种方法来创建窗口：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;面向对象编程：&lt;/strong&gt; 这种方法涉及创建一个继承自特定窗口类的新类，并在新类中重写需要的方法来配置界面和处理事件。这种方法更加面向对象，可以更好地组织和管理代码。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;直接编写代码：&lt;/strong&gt; 这种方法涉及直接编写代码来创建窗口和组件，然后配置属性和信号槽等。这种方法更加直接，适用于一些简单的界面或快速原型开发。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;下面分别展示了这两种方法的示例：&lt;/p&gt;
&lt;h3&gt;面向对象编程&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import sys
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton

class MyWindow(QMainWindow):
    def __init__(self):
        super().__init__()

        self.setWindowTitle(&quot;My Window&quot;)

        self.button = QPushButton(&quot;Click me&quot;, self)
        self.button.setGeometry(50, 50, 100, 30)
        self.button.clicked.connect(self.on_button_click)

    def on_button_click(self):
        print(&quot;Button clicked&quot;)

app = QApplication(sys.argv)
window = MyWindow()
window.show()
sys.exit(app.exec_())
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;直接编写代码&lt;/h3&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import sys
from PyQt5.QtWidgets import QApplication, QMainWindow, QPushButton

app = QApplication(sys.argv)
window = QMainWindow()
window.setWindowTitle(&quot;My Window&quot;)

button = QPushButton(&quot;Click me&quot;, window)
button.setGeometry(50, 50, 100, 30)
button.clicked.connect(lambda: print(&quot;Button clicked&quot;))

window.show()
sys.exit(app.exec_())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;无论你选择哪种方法，都可以根据项目需求来灵活调整和扩展代码。如果界面较为复杂或需要更好的代码组织，建议使用面向对象编程。如果界面简单且直接，可以选择直接编写代码。&lt;/p&gt;
&lt;p&gt;以下是一个使用面向对象编程的简单示例代码，展示了如何创建一个基本的 PyQt5 窗口：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import sys
from PyQt5.QtWidgets import QApplication, QMainWindow

class MyWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.setWindowTitle(&quot;My PyQt5 Window&quot;)
        self.setGeometry(100, 100, 800, 600)

if __name__ == &quot;__main__&quot;:
    app = QApplication(sys.argv)
    window = MyWindow()
    window.show()
    sys.exit(app.exec_())
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;在这个示例中，我们首先导入了必要的模块，然后创建了一个继承自 &lt;code&gt;QMainWindow&lt;/code&gt; 的自定义窗口类 &lt;code&gt;MyWindow&lt;/code&gt;。在 &lt;code&gt;__init__&lt;/code&gt; 构造函数中，我们设置了窗口的标题和初始大小。最后，我们创建了一个应用对象并显示窗口。&lt;/p&gt;
&lt;h2&gt;常用的 PyQt5 组件&lt;/h2&gt;
&lt;p&gt;当使用 PyQt5 创建图形用户界面时，会涉及多种常用的组件，每个组件都有其特定的属性和用法。以下是一些常用组件的用法：&lt;/p&gt;
&lt;h3&gt;QLabel（标签）&lt;/h3&gt;
&lt;p&gt;标签用于显示文本或图像，可以用来展示信息、标题、说明等。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setText(text)&lt;/code&gt;：设置标签的文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text()&lt;/code&gt;：获取标签的文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setPixmap(pixmap)&lt;/code&gt;：设置标签显示的图像。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setAlignment(alignment)&lt;/code&gt;：设置文本对齐方式。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setFont(font)&lt;/code&gt;：设置字体。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QLabel
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QFont

label = QLabel(&quot;Hello, PyQt5&quot;)
label.setAlignment(Qt.AlignCenter)
label.setFont(QFont(&quot;Arial&quot;, 12, QFont.Bold))
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QLineEdit（单行文本输入框）&lt;/h3&gt;
&lt;p&gt;单行文本输入框用于接收用户输入的文本，例如用户名、密码等。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setText(text)&lt;/code&gt;：设置文本框的初始文本。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text()&lt;/code&gt;：获取用户输入的文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setPlaceholderText(text)&lt;/code&gt;：设置提示文本。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QLineEdit

line_edit = QLineEdit()
line_edit.setPlaceholderText(&quot;Enter your name&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QTextEdit（多行文本输入框）&lt;/h3&gt;
&lt;p&gt;多行文本输入框用于接收多行文本输入，支持富文本格式。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setText(text)&lt;/code&gt;：设置文本框的初始文本。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toPlainText()&lt;/code&gt;：获取用户输入的纯文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;insertHtml(html)&lt;/code&gt;：插入富文本内容。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QTextEdit

text_edit = QTextEdit()
text_edit.insertHtml(&quot;&amp;#x3C;b&gt;Hello&amp;#x3C;/b&gt;, &amp;#x3C;i&gt;PyQt5&amp;#x3C;/i&gt;&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QComboBox（下拉框）&lt;/h3&gt;
&lt;p&gt;下拉框提供了一组选项供用户选择。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;addItem(item)&lt;/code&gt;：添加选项。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;addItems(items)&lt;/code&gt;：批量添加选项。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;currentIndex()&lt;/code&gt;：获取当前选中的选项索引。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;currentText()&lt;/code&gt;：获取当前选中的选项文本。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;activated.connect(slot)&lt;/code&gt;：连接选项激活的信号。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QComboBox

def on_combo_box_activated(index):
    print(&quot;Activated:&quot;, index)

combo_box = QComboBox()
combo_box.addItem(&quot;Option 1&quot;)
combo_box.addItems([&quot;Option 2&quot;, &quot;Option 3&quot;])
selected_index = combo_box.currentIndex()
selected_text = combo_box.currentText()
combo_box.activated.connect(on_combo_box_activated)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QPushButton（按钮）&lt;/h3&gt;
&lt;p&gt;按钮用于触发特定操作或事件。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setText(text)&lt;/code&gt;：设置按钮显示的文本。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;clicked.connect(slot)&lt;/code&gt;：连接按钮点击事件的信号。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QPushButton

def on_button_click():
    print(&quot;Button clicked&quot;)

button = QPushButton(&quot;Click me&quot;)
button.clicked.connect(on_button_click)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QCheckBox（复选框）&lt;/h3&gt;
&lt;p&gt;复选框用于表示选中/未选中两种状态的开关选项。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;isChecked()&lt;/code&gt;：检查复选框是否被选中。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text()&lt;/code&gt;：获取复选框的文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toggled.connect(slot)&lt;/code&gt;：连接复选框状态变化的信号。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QCheckBox

def on_check_box_toggled(checked):
    print(&quot;Toggled:&quot;, checked)

check_box = QCheckBox(&quot;Check me&quot;)
checked = check_box.isChecked()
check_box.toggled.connect(on_check_box_toggled)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QRadioButton（单选按钮）&lt;/h3&gt;
&lt;p&gt;单选按钮用于从多个选项中选择一个。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;isChecked()&lt;/code&gt;：检查单选按钮是否被选中。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;text()&lt;/code&gt;：获取单选按钮的文本内容。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;toggled.connect(slot)&lt;/code&gt;：连接单选按钮状态变化的信号。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QRadioButton

def on_radio_button_toggled(checked):
    print(&quot;Toggled:&quot;, checked)

radio_button = QRadioButton(&quot;Option 1&quot;)
checked = radio_button.isChecked()
radio_button.toggled.connect(on_radio_button_toggled)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QSlider（滑块）&lt;/h3&gt;
&lt;p&gt;滑块用于选择一个范围内的值。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setRange(minimum, maximum)&lt;/code&gt;：设置滑块的范围。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setValue(value)&lt;/code&gt;：设置滑块的当前值。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;value()&lt;/code&gt;：获取滑块的当前值。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;sliderMoved.connect(slot)&lt;/code&gt;：连接滑块移动事件的信号。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QSlider
from PyQt5.QtCore import Qt

def on_slider_moved(value):
    print(&quot;Slider value:&quot;, value)

slider = QSlider(Qt.Horizontal)
slider.setRange(0, 100)
slider.setValue(50)
slider.sliderMoved.connect(on_slider_moved)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QProgressBar（进度条）&lt;/h3&gt;
&lt;p&gt;进度条用于显示任务的进度。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setRange(minimum, maximum)&lt;/code&gt;：设置进度条的范围。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setValue(value)&lt;/code&gt;：设置进度条的当前值。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;value()&lt;/code&gt;：获取进度条的当前值。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QProgressBar

progress_bar = QProgressBar()
progress_bar.setRange(0, 100)
progress_bar.setValue(75)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QSpinBox（数值输入框）&lt;/h3&gt;
&lt;p&gt;数值输入框用于输入整数值。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setRange(minimum, maximum)&lt;/code&gt;：设置数值输入框的范围。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setValue(value)&lt;/code&gt;：设置数值输入框的当前值。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;value()&lt;/code&gt;：获取数值输入框的当前值。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QSpinBox

spin_box = QSpinBox()
spin_box.setRange(0, 100)
spin_box.setValue(50)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QDateTimeEdit（日期时间输入框）&lt;/h3&gt;
&lt;p&gt;日期时间输入框用于输入日期和时间。常用属性和方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;setDateTime(datetime)&lt;/code&gt;：设置日期时间输入框的日期时间。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dateTime()&lt;/code&gt;：获取日期时间输入框的日期时间。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QDateTimeEdit
from PyQt5.QtCore import QDateTime

date_time_edit = QDateTimeEdit()
date_time_edit.setDateTime(QDateTime.currentDateTime())
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QFileDialog（文件对话框）&lt;/h3&gt;
&lt;p&gt;文件对话框用于选择文件或目录。常用方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;getOpenFileName()&lt;/code&gt;：打开文件选择对话框并返回选择的文件路径。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;getSaveFileName()&lt;/code&gt;：打开文件保存对话框并返回选择的文件路径。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;getExistingDirectory()&lt;/code&gt;：打开目录选择对话框并返回选择的目录路径。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QFileDialog

file_path, _ = QFileDialog.getOpenFileName(None, &quot;Open File&quot;, &quot;&quot;, &quot;All Files (*.*)&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QMessageBox（消息框）&lt;/h3&gt;
&lt;p&gt;消息框用于显示提示、警告或错误信息。常用方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;information(parent, title, text)&lt;/code&gt;：显示信息提示框。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;warning(parent, title, text)&lt;/code&gt;：显示警告提示框。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;critical(parent, title, text)&lt;/code&gt;：显示错误提示框。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;question(parent, title, text)&lt;/code&gt;：显示询问提示框。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QMessageBox

QMessageBox.information(None, &quot;Info&quot;, &quot;This is an information message.&quot;)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;布局管理&lt;/h2&gt;
&lt;p&gt;在 PyQt5 中，布局管理用于自动排列和定位组件，以便适应不同窗口大小。以下是一些常用的布局类型和使用示例：&lt;/p&gt;
&lt;h3&gt;QGridLayout（网格布局）&lt;/h3&gt;
&lt;p&gt;网格布局将组件按照行和列的方式排列。常用方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;addWidget(widget, row, column, rowSpan, columnSpan)&lt;/code&gt;：将组件添加到指定行列位置，可跨行列。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QGridLayout

grid = QGridLayout()
grid.addWidget(label, 0, 0)
grid.addWidget(line_edit, 1, 0)
grid.addWidget(text_edit, 2, 0, 2, 1)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QVBoxLayout（垂直布局）&lt;/h3&gt;
&lt;p&gt;垂直布局将组件按垂直方向排列。常用方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;addWidget(widget)&lt;/code&gt;：将组件按顺序添加到布局。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QVBoxLayout

vbox = QVBoxLayout()
vbox.addWidget(button1)
vbox.addWidget(button2)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;QHBoxLayout（水平布局）&lt;/h3&gt;
&lt;p&gt;水平布局将组件按水平方向排列。常用方法包括：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;addWidget(widget)&lt;/code&gt;：将组件按顺序添加到布局。&lt;/li&gt;
&lt;/ul&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtWidgets import QHBoxLayout

hbox = QHBoxLayout()
hbox.addWidget(button1)
hbox.addWidget(button2)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;这些是一些常用的 PyQt5 组件和布局，通过合理地使用它们，你可以创建出丰富多彩的图形用户界面。根据项目的需求，你可以灵活地选择合适的组件和布局方式。&lt;/p&gt;
&lt;p&gt;布局管理使得窗口中的组件自动适应并排列，无需手动调整位置和大小。&lt;/p&gt;
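&lt;p&gt;上面的布局示例只创建了布局对象本身；要让布局生效，还需要把它附加到某个容器控件上，再把容器放入窗口。下面是笔者补充的一个最小示意（其中 offscreen 平台设置只是为了便于在无显示环境下运行，属于假设性的演示写法）：&lt;/p&gt;

```python
import os
os.environ.setdefault("QT_QPA_PLATFORM", "offscreen")  # 无显示环境下也能创建控件

from PyQt5.QtWidgets import (QApplication, QMainWindow, QWidget,
                             QVBoxLayout, QPushButton)

app = QApplication([])
window = QMainWindow()

container = QWidget()
vbox = QVBoxLayout()
vbox.addWidget(QPushButton("OK"))
vbox.addWidget(QPushButton("Cancel"))
container.setLayout(vbox)           # 将布局附加到容器控件
window.setCentralWidget(container)  # 将容器设为主窗口的中心部件
```

&lt;p&gt;在实际 GUI 程序中，再调用 window.show() 和 app.exec_() 进入事件循环即可显示窗口。&lt;/p&gt;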
&lt;h2&gt;多线程与线程间通信&lt;/h2&gt;
&lt;h3&gt;创建线程&lt;/h3&gt;
&lt;p&gt;在 PyQt5 中，可以使用 &lt;code&gt;QThread&lt;/code&gt; 类来创建线程。为了创建一个自定义线程，需要继承 &lt;code&gt;QThread&lt;/code&gt; 并重写其 &lt;code&gt;run&lt;/code&gt; 方法，将耗时操作放在 &lt;code&gt;run&lt;/code&gt; 方法中执行。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import QThread

class MyThread(QThread):
    def run(self):
        # 耗时操作
        pass
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;在线程间传递信号&lt;/h3&gt;
&lt;p&gt;在多线程应用中，线程之间的通信是常见的需求。PyQt5 提供了信号与槽机制来实现线程间的通信。可以通过自定义信号，在一个线程中发射信号，然后在另一个线程中连接该信号到槽函数来接收信号。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import QThread, pyqtSignal

class MyThread(QThread):
    my_signal = pyqtSignal(str)  # 自定义信号，传递参数为 str 类型

    def run(self):
        # 耗时操作
        result = &quot;耗时操作的结果&quot;
        self.my_signal.emit(result)  # 发射信号
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;主线程接收信号&lt;/h3&gt;
&lt;p&gt;主线程可以连接自定义信号的槽函数，以接收在子线程中发射的信号。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class MainWindow(QMainWindow):
    def __init__(self):
        super().__init__()
        self.thread = MyThread()
        self.init_ui()

    def init_ui(self):
        # ... 初始化界面 ...
        self.thread.my_signal.connect(self.update_label)  # 连接信号和槽函数
        self.thread.start()  # 启动子线程

    def update_label(self, result):
        print(result)  # 在这里更新界面，例如 self.label.setText(result)
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;安全退出子线程&lt;/h3&gt;
&lt;p&gt;为了确保线程的安全退出，可以在窗口关闭事件中停止子线程并等待其完成。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;class MainWindow(QMainWindow):
    # ... 其他代码 ...

    def closeEvent(self, event):
        if self.thread.isRunning():
            self.thread.quit()  # 停止线程
            self.thread.wait()  # 等待线程完成
        event.accept()
&lt;/code&gt;&lt;/pre&gt;
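&lt;p&gt;把上面几段代码合在一起，下面给出一个可独立运行的最小完整示意。为便于脱离图形环境演示，这里用无界面的 QCoreApplication 代替 QMainWindow（这是笔者的简化假设，信号槽的跨线程投递机制与 GUI 程序一致）：&lt;/p&gt;

```python
from PyQt5.QtCore import QCoreApplication, QObject, QThread, pyqtSignal

class Worker(QThread):
    my_signal = pyqtSignal(str)

    def run(self):
        # 模拟耗时操作后发射结果
        self.my_signal.emit("耗时操作的结果")

class Receiver(QObject):
    def __init__(self, app):
        super().__init__()
        self.app = app
        self.results = []

    def update_label(self, result):
        self.results.append(result)  # GUI 程序中在这里更新界面
        self.app.quit()              # 收到结果后退出事件循环

app = QCoreApplication([])
receiver = Receiver(app)
worker = Worker()
worker.my_signal.connect(receiver.update_label)
worker.start()
app.exec_()    # 跨线程信号经事件循环排队投递到主线程
worker.wait()  # 等待子线程结束，安全退出
```

&lt;p&gt;由于 receiver 属于主线程，子线程发射的信号会以排队方式投递，槽函数始终在主线程中执行，这正是更新界面所需要的行为。&lt;/p&gt;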
&lt;h3&gt;进一步解释：&lt;/h3&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;self.my_signal.emit(result)&lt;/code&gt;：这行代码在子线程中发射了一个自定义信号 &lt;code&gt;my_signal&lt;/code&gt;，并传递了参数 &lt;code&gt;result&lt;/code&gt;。这个信号可以携带任意数量和类型的参数，这里我们传递了一个字符串 &lt;code&gt;result&lt;/code&gt;。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;self.thread.my_signal.connect(self.update_label)&lt;/code&gt;：这行代码在主线程中连接了子线程发射的信号 &lt;code&gt;my_signal&lt;/code&gt; 到主线程的槽函数 &lt;code&gt;update_label&lt;/code&gt;。这样一旦子线程发射了信号，主线程就会调用 &lt;code&gt;update_label&lt;/code&gt; 方法来处理这个信号。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;def update_label(self, result):&lt;/code&gt;：这是主线程中的槽函数。当子线程发射信号时，主线程会调用这个函数，并将子线程传递的参数 &lt;code&gt;result&lt;/code&gt; 作为参数传递给这个函数。因此，&lt;code&gt;result&lt;/code&gt; 确实代表了子线程传递的 &lt;code&gt;result&lt;/code&gt;。&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;关于&lt;code&gt;def update_label(self, result):&lt;/code&gt; 中的参数名
参数名只是一个标识符，它并不影响信号的传递和槽函数的调用。&lt;/p&gt;
&lt;p&gt;例如，你可以这样修改函数定义：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def update_label(self, data):
    print(data)  # 使用 data 参数进行处理
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;连接信号的写法保持不变：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;self.thread.my_signal.connect(self.update_label)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;只要信号和槽函数的参数类型匹配，无论参数名是什么，信号传递的参数都能够成功传递给槽函数进行处理。&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
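&lt;p&gt;上述第 4 点可以用下面这个可独立运行的小片段验证（笔者补充的假设性示例；在同一线程内，信号默认以直接连接方式同步调用槽函数，因此不需要事件循环）：&lt;/p&gt;

```python
from PyQt5.QtCore import QObject, pyqtSignal

class Emitter(QObject):
    my_signal = pyqtSignal(str)

received = []

def update_label(data):  # 参数名换成 data 同样工作
    received.append(data)

emitter = Emitter()
emitter.my_signal.connect(update_label)
emitter.my_signal.emit("耗时操作的结果")
```

&lt;p&gt;发射后 received 中即包含传入的字符串，说明参数按位置传递，与参数名无关。&lt;/p&gt;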
&lt;h3&gt;关于传递参数的类型&lt;/h3&gt;
&lt;p&gt;在 PyQt5 中，你可以使用自定义信号来传递多种类型的参数。除了 &lt;code&gt;str&lt;/code&gt; 类型，还可以传递以下常用的参数类型：&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;int&lt;/code&gt;：整数类型。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;float&lt;/code&gt;：浮点数类型。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bool&lt;/code&gt;：布尔类型。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;list&lt;/code&gt; 或 &lt;code&gt;tuple&lt;/code&gt;：列表或元组类型，可以传递多个参数。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;object&lt;/code&gt;：Python 对象，可以传递任意类型的参数。&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;需要注意的是，信号和槽函数的参数类型必须匹配，否则会引发错误。当然，你也可以使用 &lt;code&gt;pyqtSignal(object)&lt;/code&gt; 来传递任意类型的参数，但在槽函数内部需要根据参数类型进行适当的处理。&lt;/p&gt;
&lt;p&gt;以下是一个示例，展示了如何使用不同类型的参数传递自定义信号：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import pyqtSignal, QObject

class MyObject(QObject):
    my_signal_int = pyqtSignal(int)
    my_signal_float = pyqtSignal(float)
    my_signal_bool = pyqtSignal(bool)
    my_signal_list = pyqtSignal(list)
    my_signal_object = pyqtSignal(object)

    def send_signals(self):
        self.my_signal_int.emit(42)
        self.my_signal_float.emit(3.14)
        self.my_signal_bool.emit(True)
        self.my_signal_list.emit([1, 2, 3])
        self.my_signal_object.emit(&quot;Hello from signal!&quot;)

def my_slot(data):
    print(&quot;Received:&quot;, data)

obj = MyObject()
obj.my_signal_int.connect(my_slot)
obj.my_signal_float.connect(my_slot)
obj.my_signal_bool.connect(my_slot)
obj.my_signal_list.connect(my_slot)
obj.my_signal_object.connect(my_slot)

obj.send_signals()
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;在上述示例中，我们定义了一个 &lt;code&gt;MyObject&lt;/code&gt; 类，它包含了不同类型的自定义信号。然后，我们通过连接这些信号到同一个槽函数 &lt;code&gt;my_slot&lt;/code&gt; 来展示如何传递不同类型的参数。在槽函数内部，我们可以根据参数的类型来进行相应的处理。&lt;/p&gt;
&lt;h3&gt;多线程进阶&lt;/h3&gt;
&lt;p&gt;当涉及多线程编程和线程间通信时，以下是一些重要的概念和技术&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;互斥锁和信号量&lt;/strong&gt;：&lt;/p&gt;
&lt;p&gt;互斥锁用于保护共享资源，以确保在任何时候只有一个线程可以访问资源。信号量用于限制同时访问资源的线程数量。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import QMutex, QSemaphore, QThread

class SharedResource:
    def __init__(self):
        self.mutex = QMutex()  # 创建互斥锁
        self.semaphore = QSemaphore(3)  # 创建信号量，允许3个线程同时访问

    def access_resource(self):
        self.semaphore.acquire()  # 获取信号量
        self.mutex.lock()  # 上锁
        # 访问和操作共享资源
        self.mutex.unlock()  # 解锁
        self.semaphore.release()  # 释放信号量

class WorkerThread(QThread):
    def __init__(self, resource):
        super().__init__()
        self.resource = resource

    def run(self):
        self.resource.access_resource()

resource = SharedResource()
threads = [WorkerThread(resource) for _ in range(5)]

for thread in threads:
    thread.start()
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;线程池&lt;/strong&gt;：&lt;/p&gt;
&lt;p&gt;线程池可以有效地管理和调度多个线程执行任务，避免频繁地创建和销毁线程。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import QThreadPool, QRunnable, QThread

class Task(QRunnable):
    def __init__(self, task_id):
        super().__init__()
        self.task_id = task_id

    def run(self):
        print(f&quot;Task {self.task_id} is running in thread {int(QThread.currentThreadId())}&quot;)

pool = QThreadPool.globalInstance()

for i in range(5):
    task = Task(i)
    pool.start(task)
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;定时器和延迟&lt;/strong&gt;：&lt;/p&gt;
&lt;p&gt;使用定时器可以在一段时间后触发任务，避免阻塞线程。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from PyQt5.QtCore import QTimer, pyqtSlot

class TimerExample:
    def __init__(self):
        self.timer = QTimer()
        self.timer.timeout.connect(self.on_timer_timeout)
        self.timer.start(1000)  # 每秒触发一次

    @pyqtSlot()
    def on_timer_timeout(self):
        print(&quot;Timer triggered&quot;)

example = TimerExample()
# 注意：QTimer 依赖 Qt 事件循环，需在 QApplication 运行（app.exec_()）期间才会触发
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;线程间通信的其他方式&lt;/strong&gt;：&lt;/p&gt;
&lt;p&gt;除了信号和槽函数，还可以使用队列来在线程之间传递数据。&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import queue
from PyQt5.QtCore import QThread

class QueueExample(QThread):
    def __init__(self):
        super().__init__()
        self.message_queue = queue.Queue()

    def run(self):
        while True:
            message = self.message_queue.get()
            if message == &quot;exit&quot;:
                break
            print(f&quot;Received message: {message}&quot;)

    def send_message(self, message):
        self.message_queue.put(message)

example = QueueExample()
example.start()
example.send_message(&quot;Hello&quot;)
example.send_message(&quot;World&quot;)
example.send_message(&quot;exit&quot;)
example.wait()  # 等待线程处理完消息后安全退出
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;总结&lt;/h2&gt;
&lt;p&gt;本文介绍了如何使用 PyQt5 创建常见的 GUI 组件，包括标签、按钮、文本框、下拉框、复选框、滑块、进度条等，如何使用布局管理来排列这些组件，以及如何通过 QThread 与信号槽机制实现多线程和线程间通信。&lt;/p&gt;
&lt;h2&gt;应用&lt;/h2&gt;
&lt;p&gt;应用以上方法，笔者试着写了一个简易串口调试助手 &lt;a href=&quot;https://github.com/Snape-max/MA-SerialDebugger/&quot;&gt;MA-SerialDebugger&lt;/a&gt;，欢迎使用并提出改进意见。&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>MATLAB中的算数运算命令</title><link>https://pengwee.wang/blog/suan-shu-yun-suan-ming-ling</link><guid isPermaLink="true">https://pengwee.wang/blog/suan-shu-yun-suan-ming-ling</guid><description>MATLAB中的算数运算命令</description><pubDate>Sun, 05 Jun 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://pengwee.wang/_astro/tab1.ldAD-WiB_MCadb.webp&quot; alt=&quot;tab1&quot;&gt;&lt;/p&gt;</content:encoded><h:img src="/_astro/tab1.ldAD-WiB.jpeg"/><enclosure url="/_astro/tab1.ldAD-WiB.jpeg"/></item><item><title>MATLAB中集合操作</title><link>https://pengwee.wang/blog/ji-he-cao-zuo</link><guid isPermaLink="true">https://pengwee.wang/blog/ji-he-cao-zuo</guid><description>MATLAB中集合操作函数</description><pubDate>Fri, 03 Jun 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;函数&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;intersect(A,B)&lt;/code&gt;：求两个数组的交集，返回 A 和 B 共有的值，结果按排序顺序排列。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;intersect(A,B,&apos;rows&apos;)&lt;/code&gt;：将 A 和 B 的每一行视为单个实体，返回两者的公共行，返回矩阵的行按排序顺序排列。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ismember(A,B)&lt;/code&gt;：返回与 A 大小相同的逻辑数组，A 的元素出现在 B 中时对应位置为 1（true），否则为 0（false）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ismember(A,B,&apos;rows&apos;)&lt;/code&gt;：将每一行视为单个实体，若 A 的某行同时也是 B 的行，对应位置返回 1（true），否则返回 0（false）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;issorted(A)&lt;/code&gt;：若 A 的元素已按排序顺序排列则返回逻辑 1（true），否则返回 0（false）。A 可以是向量，也可以是 N×1 或 1×N 的字符串数组；当 A 与 sort(A) 的输出相等时认为 A 已排序。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;issorted(A,&apos;rows&apos;)&lt;/code&gt;：若二维矩阵 A 的行已按排序顺序排列则返回逻辑 1（true），否则返回 0（false）；当 A 与 sortrows(A) 的输出相等时认为 A 已排序。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setdiff(A,B)&lt;/code&gt;：求两个数组的差集，返回在 A 中而不在 B 中的值，结果按排序顺序排列。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setdiff(A,B,&apos;rows&apos;)&lt;/code&gt;：将每一行视为单个实体，返回在 A 中而不在 B 中的行，返回矩阵的行按排序顺序排列。&lt;code&gt;&apos;rows&apos;&lt;/code&gt; 选项不支持元胞数组。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;setxor(A,B)&lt;/code&gt;：求两个数组的对称差（异或）。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;union(A,B)&lt;/code&gt;：求两个数组的并集。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;unique(A)&lt;/code&gt;：返回数组中不重复的值。&lt;/li&gt;
&lt;/ul&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>MATLAB中函数详解</title><link>https://pengwee.wang/blog/han-shu</link><guid isPermaLink="true">https://pengwee.wang/blog/han-shu</guid><description>MATLAB中函数详解</description><pubDate>Thu, 02 Jun 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;em&gt;函数定义在单独的文件中，函数名和文件名应该相同。&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;函数语句的语法是：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;function [out1,out2, ..., outN] = myfun(in1,in2,in3, ..., inN)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;其中 &lt;code&gt;in1,in2...&lt;/code&gt; 是输入参数，&lt;code&gt;out1,out2...&lt;/code&gt; 是输出参数。&lt;/p&gt;
&lt;p&gt;例如：下面的 mymax 函数接收五个数字作为参数，并返回其中最大的数字。&lt;/p&gt;
&lt;p&gt;建立函数文件，命名为 mymax.m 并输入下面的代码：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;function max = mymax(n1, n2, n3, n4, n5)
%This function calculates the maximum of the
% five numbers given as input
max =  n1;
if(n2 &gt; max)
    max = n2;
end
if(n3 &gt; max)
   max = n3;
end
if(n4 &gt; max)
    max = n4;
end
if(n5 &gt; max)
    max = n5;
end
&lt;/code&gt;&lt;/pre&gt;
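&lt;p&gt;定义好 mymax.m 后，即可在命令行窗口直接调用（假设该文件位于当前工作路径下）：&lt;/p&gt;

```matlab
mymax(34, 78, 89, 23, 11)
% ans = 89
```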
&lt;h2&gt;MATLAB匿名函数&lt;/h2&gt;
&lt;p&gt;匿名函数类似于传统编程语言中的内联函数，可以在单条 MATLAB 语句中定义。&lt;/p&gt;
&lt;p&gt;它由一个 MATLAB 表达式以及任意数量的输入和输出参数组成。&lt;/p&gt;
&lt;p&gt;匿名函数可以在 MATLAB 命令行中定义，也可以在函数或脚本中定义。&lt;/p&gt;
&lt;p&gt;通过这种方式，可以创建简单的函数，而不必为它们单独创建文件。&lt;/p&gt;
&lt;p&gt;建立一个匿名函数表达式的语法如下：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;f = @(arglist)expression
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;详细例子&lt;/h2&gt;
&lt;p&gt;在这个例子中，我们将编写一个匿名函数 power，它接收两个数字作为输入，返回第一个数字的第二个数字次幂。&lt;/p&gt;
&lt;p&gt;在MATLAB中建立一个脚本文件，并输入下述代码：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;power = @(x, n) x.^n;
result1 = power(7, 3)
result2 = power(49, 0.5)
result3 = power(10, -10)
result4 = power (4.5, 1.5)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;运行该文件时，显示结果：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;result1 =
   343
result2 =
     7
result3 =
   1.0000e-10
result4 =
    9.5459
&lt;/code&gt;&lt;/pre&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>MATLAB中矩阵的使用</title><link>https://pengwee.wang/blog/ju-zhen</link><guid isPermaLink="true">https://pengwee.wang/blog/ju-zhen</guid><description>MATLAB中矩阵的使用</description><pubDate>Wed, 01 Jun 2022 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;创建矩阵&lt;/h2&gt;
&lt;p&gt;在MATLAB中创建矩阵有以下规则：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;矩阵元素必须在 “&lt;strong&gt;[ ]&lt;/strong&gt;” 内；&lt;/li&gt;
&lt;li&gt;矩阵的同行元素之间用空格（或 “&lt;strong&gt;,&lt;/strong&gt;”）隔开；&lt;/li&gt;
&lt;li&gt;矩阵的行与行之间用 “&lt;strong&gt;;&lt;/strong&gt;”（或回车符）隔开；&lt;/li&gt;
&lt;li&gt;矩阵的元素可以是数值、变量、表达式或函数；&lt;/li&gt;
&lt;li&gt;矩阵的尺寸不必预先定义。&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;矩阵索引&lt;/h2&gt;
&lt;p&gt;如果要引用矩阵第 m 行、第 n 列的元素，写法如下：&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;mx(m, n);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;索引整列&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;a = [ 1 2 3 4 5; 2 3 4 5 6; 3 4 5 6 7; 4 5 6 7 8];
v = a(:,4)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;返回&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;v =
     4
     5
     6
     7
&lt;/code&gt;&lt;/pre&gt;
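&lt;p&gt;冒号索引同样可以同时作用于行和列，从矩阵中提取子矩阵。延续上面的矩阵 a，示意如下：&lt;/p&gt;

```matlab
a = [ 1 2 3 4 5; 2 3 4 5 6; 3 4 5 6 7; 4 5 6 7 8];
sub = a(2:3, 2:4)   % 提取第 2~3 行、第 2~4 列
% sub = [3 4 5; 4 5 6]
```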
&lt;p&gt;矩阵赋值&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>MATLAB中多项式详解</title><link>https://pengwee.wang/blog/duo-xiang-shi</link><guid isPermaLink="true">https://pengwee.wang/blog/duo-xiang-shi</guid><description>MATLAB中多项式详解</description><pubDate>Wed, 01 Jun 2022 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;strong&gt;&lt;em&gt;MATLAB表示多项式为包含由下降幂排列的系数的行向量。&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;计算多项式的值&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;polyval()&lt;/code&gt;函数
&lt;strong&gt;eg:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;p = [1 7 0 -5 9];
polyval(p,4)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;polyvalm()&lt;/code&gt;函数用于计算矩阵多项式的值（以方阵作为自变量代入多项式）
&lt;strong&gt;eg:&lt;/strong&gt;&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;p = [1 7 0 -5 9];
X = [1 2 -3 4; 2 -5 6 3; 3 1 0 2; 5 -7 3 8];
polyvalm(p, X)
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;计算多项式的根&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;roots&lt;/code&gt;函数计算多项式的根。 例如，要计算多项式&lt;code&gt;p&lt;/code&gt;的根，可参考以下语法 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;p = [1 7 0  -5 9];
r = roots(p)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;poly&lt;/code&gt;函数是&lt;code&gt;roots&lt;/code&gt;函数的逆运算，根据多项式的根返回多项式系数。例如 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;p = [1 7 0  -5 9];
r = roots(p)
p2 = poly(r)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;MATLAB执行上述代码语句返回以下结果 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;Trial&gt;&gt; p = [1 7 0  -5 9];
r = roots(p)
p2 = poly(r)

r =

  -6.8661 + 0.0000i
  -1.4247 + 0.0000i
   0.6454 + 0.7095i
   0.6454 - 0.7095i


p2 =

    1.0000    7.0000    0.0000   -5.0000    9.0000
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;多项式曲线拟合&lt;/h2&gt;
&lt;p&gt;&lt;code&gt;polyfit&lt;/code&gt;函数用来查找一个多项式的系数，它符合最小二乘法中的一组数据。 如果&lt;code&gt;x&lt;/code&gt;和&lt;code&gt;y&lt;/code&gt;包含要拟合到&lt;code&gt;n&lt;/code&gt;度多项式的&lt;code&gt;x&lt;/code&gt;和&lt;code&gt;y&lt;/code&gt;数据的两个向量，则得到通过拟合数据的多项式，参考代码 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;p = polyfit(x,y,n)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;strong&gt;示例&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;创建脚本文件并键入以下代码 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-matlab&quot;&gt;x = [1 2 3 4 5 6]; y = [5.5 43.1 128 290.7 498.4 978.67];  %data
p = polyfit(x,y,4)   %get the polynomial
% Compute the values of the polyfit estimate over a finer range,
% and plot the estimate over the real data values for comparison:
x2 = 1:.1:6;
y2 = polyval(p,x2);
plot(x,y,&apos;o&apos;,x2,y2)
grid on
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;MATLAB执行上述代码语句返回以下结果 -&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-shell&quot;&gt;Trial&gt;&gt; x = [1 2 3 4 5 6]; y = [5.5 43.1 128 290.7 498.4 978.67];  %data
p = polyfit(x,y,4)   %get the polynomial
% Compute the values of the polyfit estimate over a finer range,
% and plot the estimate over the real data values for comparison:
x2 = 1:.1:6;
y2 = polyval(p,x2);
plot(x,y,&apos;o&apos;,x2,y2)
grid on

p =

    4.1056  -47.9607  222.2598 -362.7453  191.1250
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;同时还输出一个图形 -&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://www.yiibai.com/uploads/images/201710/0810/631081057_19222.png&quot; alt=&quot;&quot;&gt;&lt;/p&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item><item><title>MATLAB入门</title><link>https://pengwee.wang/blog/matlab</link><guid isPermaLink="true">https://pengwee.wang/blog/matlab</guid><description>MATLAB入门基础向量和矩阵操作</description><pubDate>Mon, 16 May 2022 00:00:00 GMT</pubDate><content:encoded>&lt;h1&gt;向量&lt;/h1&gt;
&lt;p&gt;&lt;strong&gt;列向量&lt;/strong&gt; x = [1 ; 2 ; 3 ; 4 ; 5]
以分号分隔每一列&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;行向量&lt;/strong&gt;x = [1 2 3 4 5]或者[1,2,3,4,5]
以空格或者逗号分隔&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;矩阵&lt;/strong&gt;x = [1 2 3;4 5 6;7 8 9]&lt;/p&gt;
&lt;h1&gt;Matlab运算符&lt;/h1&gt;
&lt;p&gt;| 运算符 |             目的             |
| :----: | :--------------------------: |
|   +    |          加法运算符          |
|   -    |          减法运算符          |
|   *    |        标量和矩阵乘法        |
|   .*   |      数组乘法（逐元素）      |
|   ^    |        标量和矩阵求幂        |
|   .^   |           数组求幂           |
|   \    |           矩阵左除           |
|   /    |           矩阵右除           |
|   .\   |           阵列左除           |
|   ./   |           阵列右除           |
|   :    |     向量生成；子阵列提取     |
|   .    | 小数点；与运算符组合构成数组运算（如 .*、./） |
|  ...   |            续行符            |
|   ,    |      分行符（结果显示）      |
|   ;    | 语句结束；分行符（结果不显示） |
|   %    |            注释符            |
|   &apos;    |         引用和转置符         |
|   .&apos;   |          非共轭转置          |
|   ()   |      下标运算；参数定义      |&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Matlab特殊变量与常量&lt;/strong&gt;
|Name|Meaning|
|:-----:|:----:|
|ans|计算结果的变量名|
|eps|浮点数的相对误差|
|i,j|虚数单位，$i^2 = j^2 = -1$|
|inf|无穷大|
|NaN|不定值|
|pi|圆周率|
&lt;strong&gt;Matlab命令&lt;/strong&gt;
|命令|作用|
|:---:|:---:|
|clc|清除命令窗口|
|clear|从内存中删除变量|
|exist|检查存在的文件或变量|
|global|声明全局变量|
|disp|显示一个数组或字符串的内容|
|fscanf|阅读从文件格式的数据|
|format|控制屏幕显示的格式|
|fprintf|格式化输出屏幕或文件|
|input|显示提示并等待用户输入|
|;|抑制结果在屏幕上显示|&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;运算命令&lt;/strong&gt;
|命令|作用/目的|
|:----:|:----:|
|cat|连接数组|
|find|查找非零元素的索引|
|length|计算元素数量|
|linspace|创建间隔向量|
|logspace|创建对数间隔向量|
|max|返回最大元素|
|min|返回最小元素 |
|prod|计算数组元素的连乘积|
|reshape|重新调整矩阵的行数、列数、维数|
|size|计算数组大小|
|sort|排序每个列|
|sum|每列相加|
|eye|创建一个单位矩阵|
|ones|生成全1矩阵|
|zeros|生成零矩阵|
|cross|计算矩阵交叉乘积|
|dot|计算矩阵点积|
|det|计算数组的行列式|
|inv|计算矩阵的逆|
|pinv|计算矩阵的伪逆|
|rank|计算矩阵的秩|
|rref|将矩阵化成行最简形|
|cell|创建单元数组|
|celldisp|显示单元数组|
|cellplot|显示单元数组的图形表示|
|num2cell|将数值阵列转化为异质阵列|
|deal|匹配输入和输出列表|
|iscell|判断是否为元胞类型|&lt;/p&gt;
&lt;h2&gt;MATLAB绘图命令&lt;/h2&gt;
&lt;p&gt;|   命令    |         作用/目的          |
| :-------: | :------------------------: |
|   axis    |     人工选择坐标轴尺寸     |
|   fplot   |        智能绘图功能        |
|   grid    |         显示网格线         |
|   plot    |          生成XY图          |
|   print   |      打印或绘图到文件      |
|   title   |       把文字置于顶部       |
|  xlabel   |    将文本标签添加到x轴     |
|  ylabel   |    将文本标签添加到y轴     |
|   axes    |         创建轴对象         |
|   close   |       关闭当前的绘图       |
| close all |        关闭所有绘图        |
|  figure   |    打开一个新的图形窗口    |
|   gtext   |  通过鼠标在指定位置放注文  |
|   hold    |        保持当前图形        |
|  legend   |        鼠标放置图例        |
|  refresh  |    重新绘制当前图形窗口    |
|    set    |    指定对象的属性，如轴    |
|  subplot  |      在子窗口中创建图      |
|   text    |        在图上做标记        |
|    bar    |         创建条形图         |
|  loglog   |        创建双对数图        |
|   polar   |       创建极坐标图像       |
| semilogx  | 创建半对数图（对数横坐标） |
| semilogy  | 创建半对数图（对数纵坐标） |
|  stairs   |         创建阶梯图         |
|   stem    |         创建针状图         |&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;数据类型转换函数&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;a2b()&lt;/code&gt; &lt;code&gt;a&lt;/code&gt;是要转换的数据类型，&lt;code&gt;b&lt;/code&gt;是要转化为的类型&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;数据类型确定函数&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;isa()&lt;/code&gt; a是要确定的数据类型&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;运算符&lt;/strong&gt;：&lt;code&gt;~=&lt;/code&gt; 表示不等于&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;操作符&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;+&lt;/code&gt;：加法或一元加号。A + B 将 A 与 B 相加。除非其中一个是标量，否则 A 和 B 必须具有相同的尺寸；标量可以与任意大小的矩阵相加。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;-&lt;/code&gt;：减法或一元减号。A - B 从 A 中减去 B。除非其中一个是标量，否则 A 和 B 必须具有相同的尺寸；任意大小的矩阵都可以减去一个标量。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;*&lt;/code&gt;：矩阵乘法。A*B 是矩阵 A 和 B 的线性代数乘积。对于非标量的 A 和 B，A 的列数必须等于 B 的行数；标量可以与任意大小的矩阵相乘。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.*&lt;/code&gt;：数组乘法。A.*B 是 A 和 B 的逐元素乘积。除非其中一个是标量，否则 A 和 B 必须具有相同的大小。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;/&lt;/code&gt;：斜杠或矩阵右除。B/A 与 B*inv(A) 大致相同。更确切地说，B/A = (A&apos;\B&apos;)&apos;。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;./&lt;/code&gt;：数组右除。A./B 是由元素 A(i,j)/B(i,j) 组成的矩阵。除非其中一个是标量，否则 A 和 B 必须具有相同的大小。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;\&lt;/code&gt;：反斜杠或矩阵左除。如果 A 是方阵，A\B 与 inv(A)*B 大致相同（但计算方式不同）。如果 A 是 n×n 矩阵，B 是 n 维列向量（或由若干这样的列组成的矩阵），则 X = A\B 是方程 AX = B 的解；若 A 接近奇异或条件数很差，MATLAB 会给出警告。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.\&lt;/code&gt;：数组左除。A.\B 是由元素 B(i,j)/A(i,j) 组成的矩阵。除非其中一个是标量，否则 A 和 B 必须具有相同的大小。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;^&lt;/code&gt;：矩阵求幂。X^p 表示 X 的 p 次幂。若 p 为整数，通过重复平方计算；若整数为负，先对 X 求逆。对于其他 p 值，计算涉及特征值和特征向量：若 [V,D] = eig(X)，则 X^p = V*D.^p/V。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.^&lt;/code&gt;：数组求幂。A.^B 是 A 的每个元素的对应 B 次幂。除非其中一个是标量，否则 A 和 B 必须具有相同的大小。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&apos;&lt;/code&gt;：矩阵转置。A&apos; 是矩阵 A 的线性代数转置；对复数矩阵，这是复共轭转置。&lt;/li&gt;
&lt;li&gt;&lt;code&gt;.&apos;&lt;/code&gt;：数组转置。A.&apos; 是数组 A 的转置；对复数矩阵，不取共轭。&lt;/li&gt;
&lt;/ul&gt;</content:encoded><h:img src="undefined"/><enclosure url="undefined"/></item></channel></rss>