Articles

08/28/2025 -- 08/14/2025

Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics

Large language models (LLMs) struggle with cross-lingual knowledge transfer: they hallucinate when asked in one language about facts expressed in a different language during training. This work introduces a controlled setting to study the causes and dynamics of this phenomenon by training small Transformer models from scratch on synthetic multilingual datasets. We identify a learning phase wherein a model develops either separate or unified representations of the same facts across languages, and show that unification is essential for cross-lingual transfer. We also show that the degree of unification depends on the mutual information between facts and the language of the training data, and on how easily that language can be extracted. Based on these insights, we develop methods to modulate the level of cross-lingual transfer by manipulating data distribution and tokenization, and we introduce metrics and visualizations to formally characterize their effects on unification. Our work shows how controlled settings can shed light on pre-training dynamics and suggests new directions for improving cross-lingual transfer in LLMs.
Carter Blum, Katja Filippova, Ann Yuan, Asma Ghandeharioun, Julian Zimmert, Fred Zhang, Jessica Hoffmann, Tal Linzen, Martin Wattenberg, Lucas Dixon, Mor Geva
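
As a toy illustration of the unification signal described above, here is a minimal sketch of estimating the mutual information between fact identity and training-data language; the corpus, fact IDs, and language codes are invented for illustration and are not the paper's actual setup:

```python
# Sketch: estimate I(fact; language) over a toy multilingual training set.
# Names and data are illustrative stand-ins, not the paper's pipeline.
import math
from collections import Counter

# (fact_id, language) pairs as they might appear in training data.
pairs = [("capital_fr", "en"), ("capital_fr", "fr"),
         ("capital_de", "de"), ("capital_de", "de"),
         ("capital_jp", "en"), ("capital_jp", "ja")]

n = len(pairs)
p_xy = Counter(pairs)
p_x = Counter(f for f, _ in pairs)
p_y = Counter(l for _, l in pairs)

mi = sum((c / n) * math.log2((c / n) / ((p_x[f] / n) * (p_y[l] / n)))
         for (f, l), c in p_xy.items())
# Per the abstract, lower I(fact; language) should favor unified
# cross-lingual representations.
print(f"I(fact; language) = {mi:.3f} bits")
```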
10/24/2024 -- 10/24/2024

Spatial-Temporal Search for Spiking Neural Networks

Spiking Neural Networks (SNNs) are considered a potential candidate for the next generation of artificial intelligence, with appealing characteristics such as sparse computation and inherent temporal dynamics. By adopting architectures of Artificial Neural Networks (ANNs), SNNs achieve competitive performance on benchmark tasks such as image classification. However, architectures that are successful for ANNs are not optimal for SNNs. In this work, we apply Neural Architecture Search (NAS) to find suitable architectures for SNNs. Previous NAS methods for SNNs focus primarily on the spatial dimension, with a notable lack of consideration for the temporal dynamics that are of critical importance to SNNs. Drawing inspiration from the heterogeneity of biological neural networks, we propose a differentiable approach to optimize SNNs on both the spatial and temporal dimensions. At the spatial level, we develop a spike-based differentiable hierarchical search (SpikeDHS) framework, where spike-based operations are optimized at both the cell and the layer level under computational constraints. We further propose a differentiable surrogate gradient search (DGS) method to evolve local SG functions independently during training. At the temporal level, we explore an optimal configuration of diverse temporal dynamics for different types of spiking neurons by evolving their time constants, based on which we further develop hybrid networks combining SNNs and ANNs, balancing accuracy and efficiency. Our methods achieve classification accuracies of 96.43%, 78.96%, and 70.21% on CIFAR10, CIFAR100, and ImageNet, respectively. On event-based deep stereo, our methods find optimal layer variations and surpass the accuracy of specially designed ANNs at 26$\times$ lower computational cost ($6.7\mathrm{mJ}$), demonstrating the potential of SNNs for processing highly sparse and dynamic signals.
Kaiwei Che, Zhaokun Zhou, Li Yuan, Jianguo Zhang, Yonghong Tian, Luziwei Leng
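
A minimal sketch of the two temporal-level ingredients searched over, a learnable per-neuron time constant and a surrogate spike gradient, assuming PyTorch; the rectangular surrogate, shapes, and constants are illustrative rather than the paper's exact DGS search space:

```python
# Sketch (PyTorch): a LIF neuron with learnable, heterogeneous time
# constants and a rectangular surrogate gradient for the spike function.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v, threshold=1.0):
        ctx.save_for_backward(v)
        ctx.threshold = threshold
        return (v >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        v, = ctx.saved_tensors
        # Rectangular surrogate: pass gradient only near the threshold.
        sg = ((v - ctx.threshold).abs() < 0.5).float()
        return grad_out * sg, None

class LIF(torch.nn.Module):
    def __init__(self, size):
        super().__init__()
        # One learnable time constant per neuron (heterogeneous dynamics).
        self.log_tau = torch.nn.Parameter(torch.zeros(size))

    def forward(self, inputs):               # inputs: [T, batch, size]
        decay = torch.sigmoid(self.log_tau)  # keep membrane decay in (0, 1)
        v = torch.zeros_like(inputs[0])
        spikes = []
        for x in inputs:
            v = decay * v + x
            s = SpikeFn.apply(v)
            v = v * (1 - s)                  # hard reset after a spike
            spikes.append(s)
        return torch.stack(spikes)

out = LIF(8)(torch.randn(4, 2, 8))           # 4 time steps, batch of 2
print(out.shape)                             # torch.Size([4, 2, 8])
```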
07/01/1999 -- 07/01/1999

Doubling properties for second order parabolic equations

We prove the doubling property of the L-caloric measure corresponding to a second order parabolic equation, both in the whole space and in Lipschitz domains. For parabolic equations in divergence form, a weaker form of the doubling property follows easily from a recent result, the backward Harnack inequality, together with known estimates of the Green's function. Our method works in both the divergence and nondivergence cases; moreover, the backward Harnack inequality and estimates of the Green's function are not needed in the course of the proof.
Mikhail V. Safonov, Yu Yuan
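
In schematic form, the doubling property asserts that the caloric measure $\omega$ of a doubled surface box is controlled by that of the original box: $\omega(\Delta_{2r}(x,t)) \le C\,\omega(\Delta_{r}(x,t))$ for all sufficiently small $r$, with $C$ independent of the box; the precise parabolic scaling of the boxes $\Delta_r$ is as in the paper.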
04/14/2021 -- 04/14/2021

An Interpretability Illusion for BERT

We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far more complex. The same effect holds for linear combinations of activations. We trace the source of this illusion to geometric properties of BERT's embedding space as well as the fact that common text corpora represent only narrow slices of possible English sentences. We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research, especially the importance of testing hypotheses on multiple data sets.
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg
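
To make the setup concrete, here is a minimal sketch of the kind of analysis that produces the illusion, assuming the Hugging Face transformers library: rank sentences from two corpora by a single neuron's activation and compare the apparent "concepts". The layer and neuron indices and the toy corpora are arbitrary choices for illustration:

```python
# Sketch (transformers): find the top-activating sentence for one neuron
# in two different corpora. Divergent top lists across corpora are the
# kind of evidence behind the "interpretability illusion".
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

LAYER, NEURON = 8, 314  # arbitrary indices for illustration

def neuron_activation(sentence):
    inputs = tok(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states[LAYER][0]  # [tokens, 768]
    return hidden[:, NEURON].max().item()  # strongest token activation

corpus_a = ["The cat sat on the mat.", "Stocks fell sharply on Monday."]
corpus_b = ["He signed the treaty in 1948.", "Mix the flour with cold water."]

for name, corpus in [("corpus_a", corpus_a), ("corpus_b", corpus_b)]:
    top = max(corpus, key=neuron_activation)
    print(name, "->", top)
```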
08/18/2022 -- 08/18/2022

Merchandise Recommendation for Retail Events with Word Embedding Weighted Tf-idf and Dynamic Query Expansion

To recommend relevant merchandise for seasonal retail events, we rely on item retrieval from marketplace inventory. Using feedback to expand query scope, we discuss the selection of keyword expansion candidates based on word embedding similarity, and an enhanced tf-idf formula that weights expanded words in search ranking.
Ted Tao Yuan, Zezhong Zhang
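
A minimal sketch of the two pieces described above, embedding-based expansion-candidate selection and a tf-idf score that weights expanded words by their similarity to the original keyword; the embeddings, documents, and weighting are toy stand-ins, not the production formula:

```python
# Sketch: expand a query term with its embedding-nearest words, then rank
# documents with tf-idf where expanded terms are down-weighted by similarity.
import math
import numpy as np

emb = {"jersey": np.array([1.0, 0.2]), "shirt": np.array([0.9, 0.3]),
       "mug": np.array([0.1, 1.0])}  # toy word embeddings

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(term, k=1):
    cands = [(w, cosine(emb[term], v)) for w, v in emb.items() if w != term]
    return sorted(cands, key=lambda x: -x[1])[:k]

docs = [["jersey", "team"], ["shirt", "cotton"], ["mug", "ceramic"]]
n_docs = len(docs)

def idf(w):
    df = sum(w in d for d in docs)
    return math.log((n_docs + 1) / (df + 1)) + 1

def score(doc, query_term):
    s = doc.count(query_term) * idf(query_term)
    for w, sim in expand(query_term):
        s += sim * doc.count(w) * idf(w)  # expanded word weighted by similarity
    return s

print(sorted(range(n_docs), key=lambda i: -score(docs[i], "jersey")))
```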
04/09/2023 -- 04/24/2022

An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models

Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of those inferences. Some ML models, like decision trees, possess inherent interpretability that can be directly comprehended by humans. Others, like artificial neural networks (ANNs), rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanations (SHAP) is one such external method, and it requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanation outputs, SHAP values and variable rankings, fluctuate when different background datasets are acquired from random sampling, indicating that users cannot unquestioningly trust the one-shot interpretation from SHAP. Fortunately, such fluctuation decreases as the background dataset grows. We also notice a U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables than moderately important ones. Overall, our results suggest that users should take into account how background data affects SHAP results, with SHAP stability improving as the background sample size increases.
Han Yuan, Mingxuan Liu, Lican Kang, Chenkui Miao, Ying Wu
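
A minimal sketch of the experimental protocol, assuming the shap library and a small PyTorch model as a synthetic stand-in for the MIMIC-III setup: draw two random background samples of the same size and measure how much the resulting SHAP values disagree, repeating across sizes:

```python
# Sketch (shap + PyTorch): compare SHAP values from two random background
# samples of the same size; sweeping the size reproduces the stability study.
import numpy as np
import shap
import torch

torch.manual_seed(0)
X = torch.randn(500, 10)  # synthetic stand-in for tabular clinical data
model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 1))

def shap_values(background_size, seed):
    idx = np.random.default_rng(seed).choice(len(X), background_size, replace=False)
    explainer = shap.DeepExplainer(model, X[torch.as_tensor(idx)])
    return explainer.shap_values(X[:50])

for size in (10, 100):
    a, b = shap_values(size, seed=1), shap_values(size, seed=2)
    drift = np.abs(np.asarray(a) - np.asarray(b)).mean()
    print(f"background={size:4d}  mean |difference| between runs: {drift:.4f}")
```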
09/27/2023 -- 09/27/2023

Highly Efficient SNNs for High-speed Object Detection

The high biological plausibility and low energy consumption of Spiking Neural Networks (SNNs) have attracted much attention in recent years. However, converted SNNs generally need large time steps to achieve satisfactory performance, which results in high inference latency and increased computational cost. In this work, we propose a highly efficient and fast SNN for object detection. First, we build an initial compact ANN using a quantization-aware training method that folds batch normalization layers into convolution layers, together with network modifications. Second, we theoretically analyze how to correctly obtain a low-complexity SNN, and propose a scale-aware pseudo-quantization scheme to guarantee the correctness of the compact ANN-to-SNN conversion. Third, we propose a continuous inference scheme using a Feed-Forward Integrate-and-Fire (FewdIF) neuron to realize high-speed object detection. Experimental results show that our efficient SNN achieves a 118$\times$ speedup on GPU with only 1.5MB of parameters for object detection tasks. We further verify our SNN on an FPGA platform, where the proposed model achieves 800+ FPS object detection with extremely low latency.
Nemin Qiu, Zhiguo Li, Yuan Li, Chuang Zhu
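
A minimal sketch of the batch-normalization folding step mentioned above, assuming PyTorch; this is the standard folding identity, not the paper's full quantization-aware pipeline:

```python
# Sketch (PyTorch): fold a BatchNorm layer into the preceding convolution,
# so the fused conv alone reproduces conv -> BN at inference time.
import torch

def fold_bn(conv: torch.nn.Conv2d, bn: torch.nn.BatchNorm2d) -> torch.nn.Conv2d:
    fused = torch.nn.Conv2d(conv.in_channels, conv.out_channels,
                            conv.kernel_size, conv.stride, conv.padding,
                            bias=True)
    # w' = w * gamma / sqrt(var + eps);  b' = (b - mean) * scale + beta
    scale = (bn.weight / torch.sqrt(bn.running_var + bn.eps)).detach()
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused

conv, bn = torch.nn.Conv2d(3, 8, 3, bias=False), torch.nn.BatchNorm2d(8)
bn.eval()  # folding uses the running statistics
x = torch.randn(1, 3, 16, 16)
print(torch.allclose(bn(conv(x)), fold_bn(conv, bn)(x), atol=1e-5))  # True
```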
11/21/2024 -- 11/21/2024

Experimental comparison of graph-based approximate nearest neighbor search algorithms on edge devices

In this paper, we present an experimental comparison of various graph-based approximate nearest neighbor (ANN) search algorithms deployed on edge devices for real-time nearest neighbor search applications, such as smart city infrastructure and autonomous vehicles. To the best of our knowledge, this specific comparative analysis has not been previously conducted. While existing research has explored graph-based ANN algorithms, it has often been limited to single-threaded implementations on standard commodity hardware. Our study leverages the full computational and storage capabilities of edge devices, incorporating additional metrics such as insertion and deletion latency of new vectors and power consumption. This comprehensive evaluation aims to provide valuable insights into the performance and suitability of these algorithms for edge-based real-time tracking systems enhanced by nearest-neighbor search algorithms.
Ali Ganbarov, Jicheng Yuan, Anh Le-Tuan, Manfred Hauswirth, Danh Le-Phuoc
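
A minimal sketch of the kind of measurement loop such a comparison relies on, using hnswlib (one of the common graph-based ANN libraries) to time insertions and queries; sizes and parameters are illustrative, and on an edge device the same loop would be paired with a power meter:

```python
# Sketch (hnswlib): time per-vector insertion and per-query search latency
# for an HNSW index, two of the metrics discussed above.
import time
import numpy as np
import hnswlib

dim, n, k = 64, 10_000, 10
data = np.random.rand(n, dim).astype(np.float32)

index = hnswlib.Index(space="l2", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)

t0 = time.perf_counter()
index.add_items(data, np.arange(n))
insert_ms = (time.perf_counter() - t0) / n * 1e3

index.set_ef(50)  # search-time accuracy/speed trade-off
queries = np.random.rand(100, dim).astype(np.float32)
t0 = time.perf_counter()
labels, distances = index.knn_query(queries, k=k)
query_ms = (time.perf_counter() - t0) / len(queries) * 1e3

print(f"insert: {insert_ms:.3f} ms/vector, query: {query_ms:.3f} ms/query")
```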
08/04/2025 -- 08/04/2025

Physics-informed Fourier Basis Neural Network for Fluid Mechanics

Solving partial differential equations (PDEs) is an important yet challenging task in fluid mechanics. In this study, we embed an improved Fourier series into neural networks and propose a physics-informed Fourier basis neural network (FBNN) that incorporates physical information to solve canonical PDEs in fluid mechanics. The results demonstrate that the proposed framework exhibits strong nonlinear fitting capability and exceptional periodic modeling performance. In particular, our model shows significant advantages on the Burgers equation, with its discontinuous solutions, and the Helmholtz equation, with its strong periodicity. By introducing sparsely distributed data to reconstruct the entire flow field, we further validate the superiority of the FBNN over conventional artificial neural networks (ANNs), as well as the benefits of incorporating physical information into the network. By varying the activation functions of the networks and comparing against an ANN and a conventional physics-informed neural network, we show that the performance of the proposed FBNN architecture is not highly sensitive to the choice of activation function. The nonlinear fitting capability of the FBNN avoids excessive reliance on activation functions, thereby mitigating the risk of suboptimal outcomes or training failures stemming from unsuitable activation function choices. These results highlight the potential of the FBNN as a powerful tool in computational fluid dynamics.
Chao Wang, Shilong Li, Zelong Yuan, Chunyu Guo
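
A minimal sketch of the core idea, assuming PyTorch: a learnable Fourier-basis first layer feeding a small MLP, trained against a physics-informed residual for the 1D viscous Burgers equation $u_t + u u_x = \nu u_{xx}$. The frequencies, sizes, and sampling are illustrative, not the paper's exact architecture:

```python
# Sketch (PyTorch): Fourier-basis features + a physics-informed residual
# loss for 1D viscous Burgers. Data/boundary terms are omitted for brevity.
import torch

class FourierBasis(torch.nn.Module):
    def __init__(self, in_dim, n_freq=16):
        super().__init__()
        self.B = torch.nn.Parameter(torch.randn(in_dim, n_freq))  # learnable frequencies

    def forward(self, x):
        z = x @ self.B
        return torch.cat([torch.sin(z), torch.cos(z)], dim=-1)

net = torch.nn.Sequential(FourierBasis(2), torch.nn.Linear(32, 32),
                          torch.nn.Tanh(), torch.nn.Linear(32, 1))
nu = 0.01 / torch.pi

def pde_residual(xt):                       # xt columns: [x, t]
    xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t + u * u_x - nu * u_xx

# Collocation points: x in [-1, 1], t in [0, 1].
xt = torch.rand(256, 2) * torch.tensor([2.0, 1.0]) - torch.tensor([1.0, 0.0])
loss = pde_residual(xt).pow(2).mean()       # add data/boundary terms in practice
loss.backward()
print(float(loss))
```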
12/19/1999 -- 12/19/1999

(Teff, log g, [Fe/H]) Classification of Low-Resolution Stellar Spectra using Artificial Neural Networks

New generation large-aperture telescopes, multi-object spectrographs, and large-format detectors are making it possible to acquire very large samples of stellar spectra rapidly. In this context, traditional star-by-star spectroscopic analysis is no longer practical. New tools are required that are capable of quickly extracting, with reasonable accuracy, the basic stellar parameters encoded in the spectra. Recent analyses of Artificial Neural Networks (ANNs) applied to the classification of astronomical spectra have demonstrated the ability of this concept to derive estimates of temperature and luminosity. We have adapted the back-propagation ANN technique developed by von Hippel et al. (1994) to predict effective temperatures, gravities, and overall metallicities from spectra with resolving power ~2000 and low signal-to-noise ratio. We show that ANN techniques are very effective at performing a three-parameter (Teff, log g, [Fe/H]) stellar classification. Preliminary results show that the technique is even capable of identifying outliers from the training sample.
Shawn Snider, Yuan Qu, Carlos Allende Prieto, Ted von Hippel, Timothy C. Beers, Christopher Sneden, David L. Lambert
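
A minimal sketch of the approach, assuming scikit-learn: a back-propagation network regressing (Teff, log g, [Fe/H]) from flux vectors. The random spectra and parameter ranges are stand-ins for real training data:

```python
# Sketch (scikit-learn): a back-propagation network mapping low-resolution
# spectra to three stellar parameters at once.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_stars, n_pixels = 400, 200
spectra = rng.random((n_stars, n_pixels))                   # stand-in flux vectors
params = rng.random((n_stars, 3)) * [4000, 4, 3] + [4000, 1, -3]  # Teff, log g, [Fe/H]

X = StandardScaler().fit_transform(spectra)
y = StandardScaler().fit_transform(params)                  # scale each target

model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X[:300], y[:300])
print("held-out R^2:", model.score(X[300:], y[300:]))
```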


with thanks to arxiv.org/