Articles
08/28/2025--
08/14/2025
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
Large language models (LLMs) struggle with cross-lingual knowledge transfer:
they hallucinate when asked in one language about facts expressed in a
different language during training. This work introduces a controlled setting
to study the causes and dynamics of this phenomenon by training small
Transformer models from scratch on synthetic multilingual datasets. We identify
a learning phase wherein a model develops either separate or unified
representations of the same facts across languages, and show that unification
is essential for cross-lingual transfer. We also show that the degree of
unification depends on the mutual information between facts and the language of
the training data, and on how easy that language is to extract. Based on these
insights, we develop methods to modulate the level of cross-lingual transfer by
manipulating data distribution and tokenization, and we introduce metrics and
visualizations to formally characterize their effects on unification. Our work
shows how controlled settings can shed light on pre-training dynamics and
suggests new directions for improving cross-lingual transfer in LLMs.
Carter Blum
Katja Filippova
Ann Yuan
Asma Ghandeharioun
Julian Zimmert
Fred Zhang
Jessica Hoffmann
Tal Linzen
Martin Wattenberg
Lucas Dixon
Mor Geva
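As a minimal illustration of the quantity the abstract points to, the sketch below estimates the mutual information between fact identity and language in a toy synthetic corpus; the fact and language labels are invented for illustration, not the paper's data.

```python
# Hypothetical sketch: estimate I(fact; language) from empirical co-occurrence
# counts in a synthetic training set. High MI = facts confounded with language,
# the regime the paper links to separate (non-unified) representations.
from collections import Counter
import math

def mutual_information(pairs):
    """I(fact; language) in bits from (fact, language) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    facts = Counter(f for f, _ in pairs)
    langs = Counter(l for _, l in pairs)
    mi = 0.0
    for (f, l), c in joint.items():
        p_fl = c / n
        mi += p_fl * math.log2(p_fl / ((facts[f] / n) * (langs[l] / n)))
    return mi

# Perfectly confounded: each fact appears in only one language -> 1.0 bit.
confounded = [("paris_capital", "en"), ("berlin_capital", "fr")] * 50
# Balanced: every fact appears in every language -> 0.0 bits.
balanced = [(f, l) for f in ("paris_capital", "berlin_capital")
            for l in ("en", "fr")] * 25

print(mutual_information(confounded))
print(mutual_information(balanced))
```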
10/24/2024--
10/24/2024
Spatial-Temporal Search for Spiking Neural Networks
Spiking Neural Networks (SNNs) are considered a potential candidate for the
next generation of artificial intelligence, with appealing characteristics
such as sparse computation and inherent temporal dynamics. By adopting
architectures of Artificial Neural Networks (ANNs), SNNs achieve competitive
performance on benchmark tasks like image classification. However, successful
architectures of ANNs are not optimal for SNNs. In this work, we apply Neural
Architecture Search (NAS) to find suitable architectures for SNNs. Previous NAS
methods for SNNs focus primarily on the spatial dimension, with a notable lack
of consideration for the temporal dynamics that are of critical importance for
SNNs. Drawing inspiration from the heterogeneity of biological neural networks,
we propose a differentiable approach to optimize SNNs on both the spatial and
temporal dimensions. At the spatial level, we develop a spike-based
differentiable hierarchical search (SpikeDHS) framework, where spike-based
operations are optimized at both the cell and the layer level under
computational constraints. We further propose a differentiable surrogate
gradient search (DGS) method to evolve local surrogate gradient (SG) functions
independently during training. At the temporal level, we explore an optimal
configuration of diverse temporal dynamics on different types of spiking
neurons by evolving their time constants; building on this, we develop hybrid
networks combining SNNs and ANNs that balance accuracy and efficiency. Our
methods achieve comparable classification performance on CIFAR10/100 and
ImageNet, with accuracies of
96.43%, 78.96%, and 70.21%, respectively. On event-based deep stereo, our
methods find optimal layer variation and surpass the accuracy of specially
designed ANNs with 26$\times$ lower computational cost ($6.7\mathrm{mJ}$),
demonstrating the potential of SNN in processing highly sparse and dynamic
signals.
Kaiwei Che
Zhaokun Zhou
Li Yuan
Jianguo Zhang
Yonghong Tian
Luziwei Leng
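For readers unfamiliar with surrogate gradients, here is a minimal PyTorch sketch of a spike activation with a parameterized surrogate gradient, the kind of local SG function a search like DGS could evolve per layer; the rectangular surrogate and the parameter `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumed formulation): Heaviside spike in the forward pass,
# parameterized rectangular surrogate in the backward pass. A search procedure
# could propose different values of alpha per layer during training.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, alpha):
        ctx.save_for_backward(v, alpha)
        return (v >= 0.0).float()          # spike when membrane potential crosses 0

    @staticmethod
    def backward(ctx, grad_out):
        v, alpha = ctx.saved_tensors
        # Rectangular surrogate of width 1/alpha; alpha is the searchable shape.
        sg = alpha * (v.abs() < 0.5 / alpha).float()
        return grad_out * sg, None

v = torch.randn(8, requires_grad=True)
alpha = torch.tensor(2.0)                  # candidate value proposed by the search
spikes = SpikeFn.apply(v, alpha)
spikes.sum().backward()
print(v.grad)                              # nonzero only near the threshold
```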
07/01/1999--
07/01/1999
Doubling properties for second order parabolic equations
We prove the doubling property of the L-caloric measure corresponding to a
second order parabolic equation in the whole space and in Lipschitz domains.
For parabolic equations in divergence form, a weaker form of the doubling
property follows easily from a recent result, the backward Harnack inequality,
together with known estimates of Green's function. Our method works in both the
divergence and nondivergence cases. Moreover, the backward Harnack inequality
and estimates of Green's function are not needed in the course of the proof.
Mikhail V. Safonov
Yu Yuan
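For concreteness, the doubling property in question can be stated as follows, in notation assumed here for illustration (surface balls on the parabolic boundary, measured at parabolic scaling):

```latex
% Sketch of the statement: for the L-caloric measure \omega of a parabolic
% operator L on a domain \Omega, doubling means that surface balls of twice
% the (parabolic) radius carry comparable measure,
\[
  \omega\bigl(\Delta_{2r}(x,t)\bigr) \;\le\; C\,\omega\bigl(\Delta_r(x,t)\bigr),
  \qquad
  \Delta_r(x,t) = \bigl\{(y,s)\in\partial_p\Omega : |y-x|<r,\ |s-t|<r^2\bigr\},
\]
% with C independent of the center (x,t) and the scale r.
```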
04/14/2021--
04/14/2021
An Interpretability Illusion for BERT
We describe an "interpretability illusion" that arises when analyzing the
BERT model. Activations of individual neurons in the network may spuriously
appear to encode a single, simple concept, when in fact they are encoding
something far more complex. The same effect holds for linear combinations of
activations. We trace the source of this illusion to geometric properties of
BERT's embedding space as well as the fact that common text corpora represent
only narrow slices of possible English sentences. We provide a taxonomy of
model-learned concepts and discuss methodological implications for
interpretability research, especially the importance of testing hypotheses on
multiple data sets.
Tolga Bolukbasi
Adam Pearce
Ann Yuan
Andy Coenen
Emily Reif
Fernanda Viégas
Martin Wattenberg
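A minimal sketch of the kind of check the paper motivates: list a neuron's top-activating sentences on two different corpora and see whether the apparent concept survives the dataset change. The model name, layer, neuron index, and toy corpora are placeholders, and this assumes the HuggingFace transformers API rather than the authors' tooling.

```python
# Illustrative sanity check: the same neuron often looks like it encodes
# different "concepts" depending on which corpus supplies the top activations.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

def top_sentences(sentences, layer, neuron, k=3):
    scores = []
    for s in sentences:
        with torch.no_grad():
            out = model(**tok(s, return_tensors="pt"))
        acts = out.hidden_states[layer][0, :, neuron]  # per-token activation
        scores.append(acts.max().item())               # peak activation in sentence
    return sorted(zip(scores, sentences), reverse=True)[:k]

corpus_a = ["The movie was wonderful.", "I loved the acting.", "Terrible plot."]
corpus_b = ["Insert the bolt into the flange.", "Tighten to 12 Nm.", "Remove the cap."]
print(top_sentences(corpus_a, layer=8, neuron=300))
print(top_sentences(corpus_b, layer=8, neuron=300))  # often a different "concept"
```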
08/18/2022--
08/18/2022
Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion
To recommend relevant merchandise for seasonal retail events, we rely on
item retrieval from marketplace inventory. Using feedback to expand the query
scope, we discuss keyword-expansion candidate selection based on
word-embedding similarity, and an enhanced tf-idf formula for weighting
expanded words in search ranking.
Ted Tao Yuan
Zezhong Zhang
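A minimal sketch of the two ideas in the abstract, under assumed details: expansion candidates are weighted into the tf-idf ranking by their embedding similarity to the seed query. The `embed` table, the toy corpus, and the max-similarity discount are illustrative choices, not the paper's exact formula.

```python
# Sketch: seed query terms score with plain tf-idf; expanded terms contribute
# tf-idf discounted by embedding similarity to the closest seed term.
import math

def tfidf(term, doc, docs):
    tf = doc.count(term) / max(len(doc), 1)
    df = sum(term in d for d in docs)
    return tf * math.log((1 + len(docs)) / (1 + df))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def score(query_terms, expansions, doc, docs, embed):
    s = sum(tfidf(t, doc, docs) for t in query_terms)
    for t in expansions:
        sim = max(cosine(embed[t], embed[q]) for q in query_terms)
        s += sim * tfidf(t, doc, docs)      # expanded term counts, but discounted
    return s

# Toy usage with 2-d stand-in embeddings.
embed = {"dress": [1.0, 0.2], "gown": [0.9, 0.3], "shoe": [0.1, 1.0]}
docs = [["gown", "silk"], ["shoe", "leather"]]
print(score(["dress"], ["gown"], docs[0], docs, embed))
```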
04/09/2023--
04/24/2022
An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models
Nowadays, the interpretation of why a machine learning (ML) model makes
certain inferences is as crucial as the accuracy of such inferences. Some ML
models like the decision tree possess inherent interpretability that can be
directly comprehended by humans. Others like artificial neural networks (ANN),
however, rely on external methods to uncover the deduction mechanism. SHapley
Additive exPlanations (SHAP) is one such external method, and it requires a
background dataset when interpreting ANNs. Generally, a background dataset
consists of instances randomly sampled from the training dataset. However, the
sampling size and its effect on SHAP remain unexplored. In our empirical
study on the MIMIC-III dataset, we show that the two core explanations - SHAP
values and variable rankings - fluctuate when using different background
datasets acquired from random sampling, indicating that users cannot
unquestioningly trust the one-shot interpretation from SHAP. Fortunately, such
fluctuation decreases as the background dataset size increases. We also notice
a U-shape in the stability assessment of SHAP variable rankings, demonstrating
that SHAP is more reliable in ranking the most and least important variables
compared to moderately important ones. Overall, our results suggest that users
should take into account how background data affects SHAP results, with
improved SHAP stability as the background sample size increases.
Han Yuan
Mingxuan Liu
Lican Kang
Chenkui Miao
Ying Wu
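The shape of such an experiment can be sketched as follows (not the paper's exact setup, which uses MIMIC-III and deep models): compute SHAP values for the same instances under several random background samples of a given size and measure how much the attributions fluctuate. The synthetic data, the model, and the `KernelExplainer` choice are stand-ins.

```python
# Sketch: attribution spread across random backgrounds should shrink as the
# background sample size grows, mirroring the stability trend in the abstract.
import numpy as np
import shap
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)
X_explain = X[:5]
rng = np.random.default_rng(0)

def attribution_spread(background_size, n_repeats=5):
    runs = []
    for _ in range(n_repeats):
        bg = X[rng.choice(len(X), size=background_size, replace=False)]
        explainer = shap.KernelExplainer(lambda z: model.predict_proba(z)[:, 1], bg)
        runs.append(explainer.shap_values(X_explain))
    return np.std(np.stack(runs), axis=0).mean()   # fluctuation across backgrounds

for size in (10, 50, 200):
    print(size, attribution_spread(size))
```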
09/27/2023--
09/27/2023
Highly Efficient SNNs for High-speed Object Detection
The strong biological plausibility and low energy consumption of Spiking Neural
Networks (SNNs) have attracted much attention in recent years. However,
converted SNNs generally need large time steps to achieve satisfactory
performance, which results in high inference latency and increased
computational resources. In this work, we propose a highly efficient and fast
SNN for object detection. First, we build an initial compact ANN using a
quantization training method that folds batch normalization layers into the
preceding convolution layers, together with network modifications. Second, we
theoretically analyze how to correctly obtain a low-complexity SNN. We then
propose a scale-aware pseudo-quantization scheme to guarantee the correctness
of the compact ANN-to-SNN conversion. Third, we propose a continuous inference
scheme that uses a Feed-Forward Integrate-and-Fire (FewdIF) neuron to realize
high-speed object detection. Experimental results show that our efficient SNN
achieves a 118X speedup on GPU with only 1.5MB of parameters for object
detection tasks. We further verify our SNN on an FPGA platform, where the
proposed model achieves 800+ FPS object detection with extremely low latency.
Nemin Qiu
Zhiguo Li
Yuan Li
Chuang Zhu
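Batch-normalization folding, the standard transform the quantization training step builds on, can be sketched generically as below: the BN scale and shift are absorbed into the preceding convolution so the fused layer can be quantized as one. This is a textbook fusion in PyTorch, not the authors' code.

```python
# Fold y = BN(conv(x)) into a single Conv2d. BN must be in eval mode so its
# running statistics are the ones being absorbed.
import torch

@torch.no_grad()
def fold_bn_into_conv(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d):
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std                         # per-output-channel factor
    fused = torch.nn.Conv2d(conv.in_channels, conv.out_channels,
                            conv.kernel_size, conv.stride, conv.padding, bias=True)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused

conv = torch.nn.Conv2d(3, 8, 3, padding=1)
bn = torch.nn.BatchNorm2d(8).eval()
x = torch.randn(1, 3, 16, 16)
with torch.no_grad():
    assert torch.allclose(bn(conv(x)), fold_bn_into_conv(conv, bn)(x), atol=1e-5)
```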
11/21/2024--
11/21/2024
Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices
In this paper, we present an experimental comparison of various graph-based
approximate nearest neighbor (ANN) search algorithms deployed on edge devices
for real-time nearest neighbor search applications, such as smart city
infrastructure and autonomous vehicles. To the best of our knowledge, this
specific comparative analysis has not been previously conducted. While existing
research has explored graph-based ANN algorithms, it has often been limited to
single-threaded implementations on standard commodity hardware. Our study
leverages the full computational and storage capabilities of edge devices,
incorporating additional metrics such as insertion and deletion latency of new
vectors and power consumption. This comprehensive evaluation aims to provide
valuable insights into the performance and suitability of these algorithms for
edge-based real-time tracking systems enhanced by nearest-neighbor search
algorithms.
Ali Ganbarov
Jicheng Yuan
Anh Le-Tuan
Manfred Hauswirth
Danh Le-Phuoc
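A micro-benchmark in the spirit of the study might look like the following, here with hnswlib as one representative graph-based library; the dataset, index parameters, and metrics are placeholders rather than the paper's settings, which also cover deletion latency and power draw on edge hardware.

```python
# Sketch: measure insertion throughput and per-query latency for a graph-based
# ANN index on random data. Parameters (M, ef) are illustrative defaults.
import time
import numpy as np
import hnswlib

dim, n, n_queries = 64, 10_000, 1_000
data = np.random.default_rng(0).random((n, dim), dtype=np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=100, M=16)

t0 = time.perf_counter()
index.add_items(data, np.arange(n))
build_s = time.perf_counter() - t0

index.set_ef(50)
t0 = time.perf_counter()
labels, dists = index.knn_query(data[:n_queries], k=10)
per_query_ms = (time.perf_counter() - t0) * 1e3 / n_queries

print(f"insert: {n / build_s:.0f} vectors/s, query: {per_query_ms:.3f} ms/query")
```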
08/04/2025--
08/04/2025
Physics-informed Fourier Basis Neural Network for Fluid Mechanics
Solving partial differential equations (PDEs) is an important yet challenging
task in fluid mechanics. In this study, we embed an improved Fourier series
into neural networks and propose a physics-informed Fourier basis neural
network (FBNN) by incorporating physical information to solve canonical PDEs in
fluid mechanics. The results demonstrate that the proposed framework exhibits
a strong nonlinear fitting capability and exceptional periodic modeling
performance. In particular, our model shows significant advantages for the
Burgers equation with discontinuous solutions and the Helmholtz equation with
strong periodicity. By directly introducing sparse distributed data to
reconstruct the entire flow field, we further validate the superiority of the
FBNN over conventional artificial neural networks (ANNs), as well as the
benefits of incorporating physical information into the network. By adjusting
the activation functions of the networks and comparing with an ANN and a
conventional physics-informed neural network, we show that the performance of
the proposed FBNN architecture is not highly sensitive to the choice of
activation function. The nonlinear fitting capability of the FBNN avoids
excessive reliance on activation functions, thereby mitigating the risk of
suboptimal outcomes or training failures stemming from unsuitable activation
function choices. These results highlight the potential of the PIFBNN as a
powerful tool in computational fluid dynamics.
Chao Wang
Shilong Li
Zelong Yuan
Chunyu Guo
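A minimal sketch of the core idea, under assumed architectural details: inputs pass through a Fourier basis of learnable frequencies (sines and cosines) before a small dense network, so periodic structure is built into the model; the PDE residual would then be formed from autograd derivatives of the output.

```python
# Illustrative Fourier-basis front end, not the paper's exact architecture.
import torch

class FourierBasis(torch.nn.Module):
    def __init__(self, in_dim, n_modes):
        super().__init__()
        # Learnable frequencies, randomly initialized, one row per mode.
        self.freq = torch.nn.Parameter(torch.randn(n_modes, in_dim))

    def forward(self, x):                      # x: (batch, in_dim)
        phase = x @ self.freq.T                # (batch, n_modes)
        return torch.cat([torch.sin(phase), torch.cos(phase)], dim=-1)

model = torch.nn.Sequential(
    FourierBasis(in_dim=2, n_modes=32),        # (x, t) -> 64 Fourier features
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),                    # predicted field u(x, t)
)
u = model(torch.rand(128, 2))
print(u.shape)                                 # (128, 1)
```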
12/19/1999--
12/19/1999
(Teff,log g,[Fe/H]) Classification of Low-Resolution Stellar Spectra using Artificial Neural Networks
New generation large-aperture telescopes, multi-object spectrographs, and
large format detectors are making it possible to acquire very large samples of
stellar spectra rapidly. In this context, traditional star-by-star
spectroscopic analysis is no longer practical. New tools are required that are
capable of quickly extracting, with reasonable accuracy, the important basic
stellar parameters encoded in the spectra. Recent analyses of Artificial Neural
Networks (ANNs) applied to the classification of astronomical spectra have
demonstrated the ability of this concept to derive estimates of temperature and
luminosity. We have adapted the back-propagation ANN technique developed by von
Hippel et al. (1994) to predict effective temperatures, gravities and overall
metallicities from spectra with resolving power ~ 2000 and low signal-to-noise
ratio. We show that ANN techniques are very effective in executing a
three-parameter (Teff,log g,[Fe/H]) stellar classification. The preliminary
results show that the technique is even capable of identifying outliers from
the training sample.
Shawn Snider
Yuan Qu
Carlos Allende Prieto
Ted von Hippel
Timothy C. Beers
Christopher Sneden
David L. Lambert
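Schematically, and with all details assumed rather than taken from the paper, the approach amounts to a small feed-forward network trained by backpropagation to map a low-resolution spectrum to the three parameters:

```python
# Stand-in data and a generic multi-output MLP regressor; the real study uses
# observed spectra at resolving power ~2000 and calibrated parameter labels.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_stars, n_pixels = 2000, 300
spectra = rng.random((n_stars, n_pixels))          # stand-in for flux vectors
params = rng.random((n_stars, 3))                  # scaled (Teff, log g, [Fe/H])

ann = MLPRegressor(hidden_layer_sizes=(50,), activation="logistic",
                   solver="adam", max_iter=300, random_state=0)
ann.fit(spectra[:1500], params[:1500])
pred = ann.predict(spectra[1500:])                 # (500, 3) parameter estimates
print(pred.shape)
```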