Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

NaN problem of nn.MultiHeadAttention in PyTorch

1 minute read

Published:

The MultiHeadAttention NaN Problem

nn.MultiheadAttention causes gradients to become NaN under some use cases · Issue #41508 · pytorch/pytorch · GitHub. Over the past few days I have been tracking down the NaN problem that appears when computing with PyTorch's nn.MultiHeadAttention. The root cause is that the tokenizer adds padding tokens on the left (they can only be added on the left; adding them on the right is wrong, because an LLM generates autoregressively and cannot continue generating after padding tokens). As a result, once the causal mask and the padding mask are merged, the first few rows of the attention matrix are entirely masked. PyTorch fills masked positions with float("-inf"), so after the softmax those rows become all NaN.
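A minimal sketch that reproduces the situation described above (shapes and sizes are arbitrary choices for illustration): with left padding, the merged causal mask and key padding mask leave the first query rows with no attendable position, and the softmax over an all-`-inf` row produces NaN.

```python
import torch
import torch.nn as nn

embed_dim, num_heads, seq_len = 8, 2, 4
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

x = torch.randn(1, seq_len, embed_dim)

# Causal mask: True marks positions a query may NOT attend to.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
# Left padding: the first two tokens are padding (True = ignore this key).
padding_mask = torch.tensor([[True, True, False, False]])

# Queries 0 and 1 can only see keys 0-1 (causal), but those keys are all
# padding, so their attention rows are fully masked -> filled with -inf
# -> softmax yields NaN for those rows.
out, _ = mha(x, x, x, attn_mask=causal_mask, key_padding_mask=padding_mask)
print(out.isnan().any())  # the padded query rows come out as NaN
```

Queries 2 and 3 still have at least one unmasked key, so their outputs stay finite; only the fully masked rows are poisoned, which is why the NaN then propagates into the gradients.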

How to build a personal homepage with academicpages?

2 minute read

Published:

This post walks through the basic steps of building a personal homepage with the [academicpages][2] template, along with some settings I personally recommend. For any configuration the post does not describe clearly enough, you can visit the GitHub repository behind this homepage, wangzhen0518.github.io, and read the corresponding files for the details.

Hello!

less than 1 minute read

Published:

Hello! Welcome to my personal homepage.

portfolio

publications

QCIR: Pattern Matching Based Universal Quantum Circuit Rewriting Framework

International Conference on Computer-Aided Design (ICCAD), 2022

Due to multiple limitations of quantum computers in the NISQ era, quantum compilation efforts are required to execute quantum algorithms efficiently on NISQ devices. Program rewriting based on pattern matching can improve the generalization ability of compiler optimization. However, it has rarely been explored for quantum circuit optimization, let alone with the physical features of target devices taken into account. In this paper, we propose QCIR, a pattern-matching based quantum circuit optimization framework with a novel pattern description format that enables user-configured cost models and two categories of patterns, i.e., generic patterns and folding patterns. To achieve better compilation latency, we propose a DAG representation of quantum circuits called QCIR-DAG, and the QVF algorithm for subcircuit matching. We implement a continuous single-qubit optimization pass built with QCIR, achieving 10% and 20% optimization rates on benchmarks from Qiskit and ScaffCC, respectively. The practicality of QCIR is demonstrated by execution time and experimental results on quantum simulators and quantum devices.

Recommended citation:
Mingyu Chen, Yu Zhang, Yongshang Li, Zhen Wang, Jun Li, and Xiangyang Li, QCIR: Pattern Matching Based Universal Quantum Circuit Rewriting Framework, in Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, San Diego California: ACM, Oct. 2022.
Download Paper

Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling

International Conference on Learning Representations (ICLR), 2023

Learning neural operators for solving partial differential equations (PDEs) has attracted great attention due to its high inference efficiency. However, training such operators requires generating a substantial amount of labeled data, i.e., PDE problems together with their solutions. The data generation process is exceptionally time-consuming, as it involves solving numerous systems of linear equations to obtain numerical solutions to the PDEs. Many existing methods solve these systems independently, without considering their inherent similarities, resulting in extremely redundant computation. To tackle this problem, we propose a novel method, namely Sorting Krylov Recycling (SKR), to boost the efficiency of solving these systems, thus significantly accelerating data generation for neural operator training. To the best of our knowledge, SKR is the first attempt to address the time-consuming nature of data generation for learning neural operators. The workhorse of SKR is Krylov subspace recycling, a powerful technique for solving a series of interrelated systems by leveraging their inherent similarities. Specifically, SKR employs a sorting algorithm to arrange these systems in a sequence where adjacent systems exhibit high similarity. It then equips a solver with Krylov subspace recycling to solve the systems sequentially rather than independently, effectively enhancing solving efficiency. Both theoretical analysis and extensive experiments demonstrate that SKR can significantly accelerate neural operator data generation, achieving a remarkable speedup of up to 13.9 times.
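The sort-then-solve-sequentially idea can be sketched as follows. This is an illustration only: the paper uses true Krylov subspace recycling, whereas this sketch substitutes simple warm starting (reusing the previous solution as the initial guess) to show how ordering similar systems and solving them in sequence lets each solve benefit from the last. The matrix family and perturbation sizes are invented for the example.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n = 200
M = rng.standard_normal((n, n))
A0 = M @ M.T + n * np.eye(n)      # SPD base matrix, so CG applies
b = rng.standard_normal(n)

# A family of similar systems: small perturbations of the base matrix,
# already "sorted" so that adjacent systems are most alike.
epsilons = [0.0, 0.05, 0.1, 0.15]

x_prev, solutions = None, []
for eps in epsilons:
    A = A0 + eps * np.eye(n)
    # Warm start: the previous system's solution seeds the next solve,
    # standing in for the subspace information that recycling would carry.
    x, info = cg(A, b, x0=x_prev)
    assert info == 0              # 0 means the solver converged
    solutions.append((A, x))
    x_prev = x
```

Genuine recycling methods (e.g. GCRO-DR) carry over an approximate invariant subspace between solves rather than just a starting vector, which is where the larger speedups reported in the paper come from.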

Recommended citation:
Hong Wang, Zhongkai Hao, Jie Wang, Zijie Geng, Zhen Wang, Bin Li, Feng Wu, Accelerating Data Generation for Neural Operators via Krylov Subspace Recycling, presented at The Twelfth International Conference on Learning Representations, Oct. 2023.
Download Paper

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.