Paper Group ANR 420
The generalization error of random features regression: Precise asymptotics and double descent curve
Title | The generalization error of random features regression: Precise asymptotics and double descent curve |
Authors | Song Mei, Andrea Montanari |
Abstract | Deep learning methods operate in regimes that defy the traditional statistical mindset. The neural network architectures often contain more parameters than training samples, and are so rich that they can interpolate the observed labels, even if the latter are replaced by pure noise. Despite their huge complexity, the same architectures achieve small generalization error on real data. This phenomenon has been rationalized in terms of a so-called “double descent” curve. As the model complexity increases, the generalization error follows the usual U-shaped curve at the beginning, first decreasing and then peaking around the interpolation threshold (when the model achieves vanishing training error). However, it descends again as model complexity exceeds this threshold. The global minimum of the generalization error is found in this overparametrized regime, often when the number of parameters is much larger than the number of samples. Far from being a peculiar property of deep neural networks, elements of this behavior have been demonstrated in much simpler settings, including linear regression with random covariates. In this paper we consider the problem of learning an unknown function over the $d$-dimensional sphere $\mathbb S^{d-1}$, from $n$ i.i.d. samples $(\boldsymbol x_i, y_i) \in \mathbb S^{d-1} \times \mathbb R$, $i \le n$. We perform ridge regression on $N$ random features of the form $\sigma(\boldsymbol w_a^{\mathsf T}\boldsymbol x)$, $a \le N$. This can be equivalently described as a two-layer neural network with random first-layer weights. We compute the precise asymptotics of the generalization error, in the limit $N, n, d \to \infty$ with $N/d$ and $n/d$ fixed. This provides the first analytically tractable model that captures all the features of the double descent phenomenon without assuming ad hoc misspecification structures. |
Tasks | |
Published | 2019-08-14 |
URL | https://arxiv.org/abs/1908.05355v3 |
https://arxiv.org/pdf/1908.05355v3.pdf | |
PWC | https://paperswithcode.com/paper/the-generalization-error-of-random-features |
Repo | |
Framework | |
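The setup described in the abstract above (ridge regression on $N$ random features $\sigma(\boldsymbol w_a^{\mathsf T}\boldsymbol x)$, i.e. a two-layer network whose first layer is random and frozen) can be sketched in a few lines. This is only a minimal illustration, not the authors' code: the ReLU activation, the unit-sphere normalization, the synthetic linear target, and the names `random_features_ridge` and `predict` are all assumptions made for the example, and the paper's exact scaling conventions are not reproduced.

```python
import numpy as np

def random_features_ridge(X, y, N, lam, rng):
    """Fit ridge regression on N random ReLU features (illustrative sketch)."""
    n, d = X.shape
    # Random first-layer weights w_a, normalized to the unit sphere (assumed choice).
    W = rng.standard_normal((N, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Feature matrix Z[i, a] = sigma(w_a^T x_i), with sigma = ReLU (assumed choice).
    Z = np.maximum(X @ W.T, 0.0)
    # Closed-form ridge solution for the second-layer coefficients.
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(N), Z.T @ y)
    return W, a

def predict(X, W, a):
    """Evaluate the fitted random-features model on the rows of X."""
    return np.maximum(X @ W.T, 0.0) @ a

rng = np.random.default_rng(0)
d, n, N = 20, 100, 300                      # overparametrized: N > n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # inputs on the sphere S^{d-1}
beta = rng.standard_normal(d)
y = X @ beta + 0.1 * rng.standard_normal(n)     # synthetic noisy linear target

W, a = random_features_ridge(X, y, N, lam=1e-3, rng=rng)
train_mse = float(np.mean((predict(X, W, a) - y) ** 2))
```

Sweeping `N` at fixed `n` and plotting the test error against `N / n` is the kind of experiment in which a double descent curve would be looked for; this sketch only fits a single model.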
Unpaired Pose Guided Human Image Generation
Title | Unpaired Pose Guided Human Image Generation |
Authors | Xu Chen, Jie Song, Otmar Hilliges |
Abstract | This paper studies the task of full generative modelling of realistic images of humans, guided only by a coarse sketch of the pose, while providing control over the specific instance or type of outfit worn by the user. This is a difficult problem because the input and output domains are very different and direct image-to-image translation becomes infeasible. We propose an end-to-end trainable network under the generative adversarial framework that provides detailed control over the final appearance while not requiring paired training data, and hence allows us to forgo the challenging problem of fitting 3D poses to 2D images. The model can generate novel samples conditioned on either an image taken from the target domain or a class label indicating the style of clothing (e.g., t-shirt). We thoroughly evaluate the architecture and the contributions of the individual components experimentally. Finally, we show in a large-scale perceptual study that our approach can generate realistic-looking images and that participants struggle to distinguish fake images from real samples, especially if faces are blurred. |
Tasks | Image Generation, Image-to-Image Translation |
Published | 2019-01-08 |
URL | https://arxiv.org/abs/1901.02284v2 |
https://arxiv.org/pdf/1901.02284v2.pdf | |
PWC | https://paperswithcode.com/paper/unpaired-pose-guided-human-image-generation |
Repo | |
Framework | |
Optimizing Generalized Rate Metrics through Game Equilibrium
Title | Optimizing Generalized Rate Metrics through Game Equilibrium |
Authors | Harikrishna Narasimhan, Andrew Cotter, Maya Gupta |
Abstract | We present a general framework for solving a large class of learning problems with non-linear functions of classification rates. This includes problems where one wishes to optimize a non-decomposable performance metric such as the F-measure or G-mean, and constrained training problems where the classifier needs to satisfy non-linear rate constraints such as predictive parity fairness, distribution divergences or churn ratios. We extend previous two-player game approaches for constrained optimization to a game between three players to decouple the classifier rates from the non-linear objective, and seek to find an equilibrium of the game. Our approach generalizes many existing algorithms, and makes possible new algorithms with more flexibility and tighter handling of non-linear rate constraints. We provide convergence guarantees for convex functions of rates, and show how our methodology can be extended to handle sums of ratios of rates. Experiments on different fairness tasks confirm the efficacy of our approach. |
Tasks | |
Published | 2019-09-06 |
URL | https://arxiv.org/abs/1909.02939v1 |
https://arxiv.org/pdf/1909.02939v1.pdf | |
PWC | https://paperswithcode.com/paper/optimizing-generalized-rate-metrics-through |
Repo | |
Framework | |
Neural View-Interpolation for Sparse Light Field Video
Title | Neural View-Interpolation for Sparse Light Field Video |
Authors | Mojtaba Bemana, Karol Myszkowski, Hans-Peter Seidel, Tobias Ritschel |
Abstract | We suggest representing light field (LF) videos as “one-off” neural networks (NN), i.e., a learned mapping from view-plus-time coordinates to high-resolution color values, trained on sparse views. Initially, this sounds like a bad idea for three main reasons: First, an NN LF will likely have lower quality than a same-sized pixel basis representation. Second, only a few training data, e.g., 9 exemplars per frame, are available for sparse LF videos. Third, there is no generalization across LFs, but across view and time instead. Consequently, a network needs to be trained for each LF video. Surprisingly, these problems can turn into substantial advantages: Unlike the linear pixel basis, an NN has to come up with a compact, non-linear, i.e., more intelligent, explanation of color, conditioned on the sparse view and time coordinates. As observed for many NNs, however, this representation is now interpolatable: if the image output for sparse view coordinates is plausible, it is for all intermediate, continuous coordinates as well. Our specific network architecture involves a differentiable occlusion-aware warping step, which leads to a compact set of trainable parameters and consequently fast learning and fast execution. |
Tasks | |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13921v2 |
https://arxiv.org/pdf/1910.13921v2.pdf | |
PWC | https://paperswithcode.com/paper/neural-view-interpolation-for-sparse |
Repo | |
Framework | |
Growing a Brain: Fine-Tuning by Increasing Model Capacity
Title | Growing a Brain: Fine-Tuning by Increasing Model Capacity |
Authors | Yu-Xiong Wang, Deva Ramanan, Martial Hebert |
Abstract | CNNs have made an undeniable impact on computer vision through the ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset. This is usually accomplished through fine-tuning a fixed-size network on new target data. Indeed, virtually every contemporary visual recognition system makes use of fine-tuning to transfer knowledge from ImageNet. In this work, we analyze what components and parameters change during fine-tuning, and discover that increasing model capacity allows for more natural model adaptation through fine-tuning. By making an analogy to developmental learning, we demonstrate that “growing” a CNN with additional units, either by widening existing layers or deepening the overall network, significantly outperforms classic fine-tuning approaches. But in order to properly grow a network, we show that newly-added units must be appropriately normalized to allow for a pace of learning that is consistent with existing units. We empirically validate our approach on several benchmark datasets, producing state-of-the-art results. |
Tasks | |
Published | 2019-07-18 |
URL | https://arxiv.org/abs/1907.07844v1 |
https://arxiv.org/pdf/1907.07844v1.pdf | |
PWC | https://paperswithcode.com/paper/growing-a-brain-fine-tuning-by-increasing-1 |
Repo | |
Framework | |
Efficient training and design of photonic neural network through neuroevolution
Title | Efficient training and design of photonic neural network through neuroevolution |
Authors | Tian Zhang, Jia Wang, Yihang Dan, Yuxiang Lanqiu, Jian Dai, Xu Han, Xiaojuan Sun, Kun Xu |
Abstract | Recently, optical neural networks (ONNs) integrated in photonic chips have received extensive attention because they are expected to implement the same pattern recognition tasks as electronic platforms, with high efficiency and low power consumption. However, the current lack of learning algorithms to train the ONNs obstructs their further development. In this article, we propose a novel learning strategy based on neuroevolution to design and train the ONNs. Two typical neuroevolution algorithms are used to determine the hyper-parameters of the ONNs and to optimize the weights (phase shifters) in the connections. In order to demonstrate the effectiveness of the training algorithms, the trained ONNs are applied to classification tasks on the iris plants dataset, the wine recognition dataset, and modulation format recognition. The calculated results show that the training algorithms based on neuroevolution are competitive with other traditional learning algorithms in both accuracy and stability. Compared with previous works, we introduce an efficient training method for the ONNs and demonstrate their broad application prospects in pattern recognition, reinforcement learning and so on. |
Tasks | |
Published | 2019-08-04 |
URL | https://arxiv.org/abs/1908.08012v1 |
https://arxiv.org/pdf/1908.08012v1.pdf | |
PWC | https://paperswithcode.com/paper/efficient-training-and-design-of-photonic |
Repo | |
Framework | |
Domain-Symmetric Networks for Adversarial Domain Adaptation
Title | Domain-Symmetric Networks for Adversarial Domain Adaptation |
Authors | Yabin Zhang, Hui Tang, Kui Jia, Mingkui Tan |
Abstract | Unsupervised domain adaptation aims to learn a classifier model for unlabeled samples on the target domain, given training data of labeled samples on the source domain. Impressive progress has been made recently by learning invariant features via domain-adversarial training of deep networks. In spite of the recent progress, domain adaptation is still limited in achieving the invariance of feature distributions at a finer category level. To this end, we propose in this paper a new domain adaptation method called Domain-Symmetric Networks (SymNets). The proposed SymNet is based on a symmetric design of source and target task classifiers, based on which we also construct an additional classifier that shares with them its layer neurons. To train the SymNet, we propose a novel adversarial learning objective whose key design is based on a two-level domain confusion scheme, where the category-level confusion loss improves over the domain-level one by driving the learning of intermediate network features to be invariant at the corresponding categories of the two domains. Both domain discrimination and domain confusion are implemented based on the constructed additional classifier. Since target samples are unlabeled, we also propose a scheme of cross-domain training to help learn the target classifier. Careful ablation studies show the efficacy of our proposed method. In particular, based on commonly used base networks, our SymNets achieve the new state of the art on three benchmark domain adaptation datasets. |
Tasks | Domain Adaptation, Unsupervised Domain Adaptation |
Published | 2019-04-09 |
URL | https://arxiv.org/abs/1904.04663v2 |
https://arxiv.org/pdf/1904.04663v2.pdf | |
PWC | https://paperswithcode.com/paper/domain-symmetric-networks-for-adversarial |
Repo | |
Framework | |
Granular Multimodal Attention Networks for Visual Dialog
Title | Granular Multimodal Attention Networks for Visual Dialog |
Authors | Badri N. Patro, Shivansh Patel, Vinay P. Namboodiri |
Abstract | Vision and language tasks have benefited from attention. There have been a number of different attention models proposed. However, the scale at which attention needs to be applied has not been well examined. In this work, we propose a new method, Granular Multi-modal Attention, which aims to address the question of the right granularity at which one needs to attend while solving the Visual Dialog task. The proposed method shows improvement in both image and text attention networks. We then propose a granular Multi-modal Attention network that jointly attends on the image and text granules and shows the best performance. With this work, we observe that obtaining granular attention and doing exhaustive Multi-modal Attention appear to be the best way to attend while solving visual dialog. |
Tasks | Visual Dialog |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05728v1 |
https://arxiv.org/pdf/1910.05728v1.pdf | |
PWC | https://paperswithcode.com/paper/granular-multimodal-attention-networks-for |
Repo | |
Framework | |
A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability
Title | A Control Lyapunov Perspective on Episodic Learning via Projection to State Stability |
Authors | Andrew J. Taylor, Victor D. Dorobantu, Meera Krishnamoorthy, Hoang M. Le, Yisong Yue, Aaron D. Ames |
Abstract | The goal of this paper is to understand the impact of learning on control synthesis from a Lyapunov function perspective. In particular, rather than consider uncertainties in the full system dynamics, we employ Control Lyapunov Functions (CLFs) as low-dimensional projections. To understand and characterize the uncertainty that these projected dynamics introduce in the system, we introduce a new notion: Projection to State Stability (PSS). PSS can be viewed as a variant of Input to State Stability defined on projected dynamics, and enables characterizing robustness of a CLF with respect to the data used to learn system uncertainties. We use PSS to bound uncertainty in affine control, and demonstrate that a practical episodic learning approach can use PSS to characterize uncertainty in the CLF for robust control synthesis. |
Tasks | |
Published | 2019-03-18 |
URL | http://arxiv.org/abs/1903.07214v1 |
http://arxiv.org/pdf/1903.07214v1.pdf | |
PWC | https://paperswithcode.com/paper/a-control-lyapunov-perspective-on-episodic |
Repo | |
Framework | |
Self-Play Learning Without a Reward Metric
Title | Self-Play Learning Without a Reward Metric |
Authors | Dan Schmidt, Nick Moran, Jonathan S. Rosenfeld, Jonathan Rosenthal, Jonathan Yedidia |
Abstract | The AlphaZero algorithm for the learning of strategy games via self-play, which has produced superhuman ability in the games of Go, chess, and shogi, uses a quantitative reward function for game outcomes, requiring the users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to perform any quantitative balancing of reward components. We demonstrate that this system learns optimal play in a comparable amount of time to AlphaZero on a sample game. |
Tasks | |
Published | 2019-12-16 |
URL | https://arxiv.org/abs/1912.07557v1 |
https://arxiv.org/pdf/1912.07557v1.pdf | |
PWC | https://paperswithcode.com/paper/self-play-learning-without-a-reward-metric |
Repo | |
Framework | |
Softmax Optimizations for Intel Xeon Processor-based Platforms
Title | Softmax Optimizations for Intel Xeon Processor-based Platforms |
Authors | Jacek Czaja, Michal Gallus, Tomasz Patejko, Jian Tang |
Abstract | Softmax is a popular normalization method used in machine learning. Deep learning solutions like Transformer or BERT use the softmax function intensively, so it is worthwhile to optimize its performance. This article presents our optimization methodology and the results of applying it to softmax. By presenting this methodology, we hope to increase interest in deep learning optimizations for CPUs. We believe that the optimization process presented here could be transferred to other deep learning frameworks such as TensorFlow or PyTorch. |
Tasks | |
Published | 2019-04-28 |
URL | https://arxiv.org/abs/1904.12380v2 |
https://arxiv.org/pdf/1904.12380v2.pdf | |
PWC | https://paperswithcode.com/paper/softmax-optimizations-for-intel-xeon |
Repo | |
Framework | |
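For readers unfamiliar with the function being optimized in the entry above, a minimal reference implementation of softmax follows. It is a plain-Python sketch of the textbook, numerically stable formulation (subtracting the maximum before exponentiating), not the vectorized CPU kernels the paper discusses; the function name `softmax` and the sample inputs are chosen here for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Large logits would overflow a naive exp(); the shifted version does not.
probs = softmax([1000.0, 1001.0, 1002.0])
small = softmax([0.0, 1.0, 2.0])
```

Because softmax is invariant to adding a constant to every input, `probs` and `small` agree up to floating-point rounding.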
Adversarial Multimodal Network for Movie Question Answering
Title | Adversarial Multimodal Network for Movie Question Answering |
Authors | Zhaoquan Yuan, Siyuan Sun, Lixin Duan, Xiao Wu, Changsheng Xu |
Abstract | Visual question answering by using information from multiple modalities has attracted more and more attention in recent years. However, it is a very challenging task, as the visual content and natural language have quite different statistical properties. In this work, we present a method called Adversarial Multimodal Network (AMN) to better understand video stories for question answering. In AMN, as inspired by generative adversarial networks, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions). Moreover, we introduce a self-attention mechanism to enforce the so-called consistency constraints in order to preserve the self-correlation of visual cues of the original video clips in the learned multimodal representations. Extensive experiments on the MovieQA dataset show the effectiveness of our proposed AMN over other published state-of-the-art methods. |
Tasks | Question Answering, Video Question Answering, Visual Question Answering |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.09844v2 |
https://arxiv.org/pdf/1906.09844v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-multimodal-network-for-movie |
Repo | |
Framework | |
Large-scale Kernel Methods and Applications to Lifelong Robot Learning
Title | Large-scale Kernel Methods and Applications to Lifelong Robot Learning |
Authors | Raffaello Camoriano |
Abstract | As the size and richness of available datasets grow larger, the opportunities for solving increasingly challenging problems with algorithms learning directly from data grow at the same pace. Consequently, the capability of learning algorithms to work with large amounts of data has become a crucial scientific and technological challenge for their practical applicability. Hence, it is no surprise that large-scale learning is currently drawing plenty of research effort in the machine learning research community. In this thesis, we focus on kernel methods, a theoretically sound and effective class of learning algorithms yielding nonparametric estimators. Kernel methods, in their classical formulations, are accurate and efficient on datasets of limited size, but do not scale up in a cost-effective manner. Recent research has shown that approximate learning algorithms, for instance random subsampling methods like Nyström and random features, with time-memory-accuracy trade-off mechanisms are more scalable alternatives. In this thesis, we provide analyses of the generalization properties and computational requirements of several types of such approximation schemes. In particular, we expose the tight relationship between statistics and computations, with the goal of tailoring the accuracy of the learning process to the available computational resources. Our results are supported by experimental evidence on large-scale datasets and numerical simulations. We also study how large-scale learning can be applied to enable accurate, efficient, and reactive lifelong learning for robotics. In particular, we propose algorithms allowing robots to learn continuously from experience and adapt to changes in their operational environment. The proposed methods are validated on the iCub humanoid robot in addition to other benchmarks. |
Tasks | |
Published | 2019-12-11 |
URL | https://arxiv.org/abs/1912.05629v1 |
https://arxiv.org/pdf/1912.05629v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-kernel-methods-and-applications |
Repo | |
Framework | |
Emotion Detection with Neural Personal Discrimination
Title | Emotion Detection with Neural Personal Discrimination |
Authors | Xiabing Zhou, Zhongqing Wang, Shoushan Li, Guodong Zhou, Min Zhang |
Abstract | There has been a recent line of work on automatically predicting the emotions of posts in social media. Existing approaches consider the posts individually and predict their emotions independently. Unlike previous research, we explore the dependence among relevant posts via the authors’ backgrounds, since authors with similar backgrounds, e.g., gender, location, tend to express similar emotions. However, such personal attributes are not easy to obtain on most social media websites, and it is hard to capture attribute-aware words to connect similar people. Accordingly, we propose a Neural Personal Discrimination (NPD) approach to address the above challenges by determining personal attributes from posts, and connecting relevant posts with similar attributes to jointly learn their emotions. In particular, we employ adversarial discriminators to determine the personal attributes, with attention mechanisms to aggregate attribute-aware words. In this way, social correlations among different posts can be better addressed. Experimental results show the usefulness of personal attributes, and the effectiveness of our proposed NPD approach in capturing such personal attributes with significant gains over the state-of-the-art models. |
Tasks | |
Published | 2019-08-28 |
URL | https://arxiv.org/abs/1908.10703v1 |
https://arxiv.org/pdf/1908.10703v1.pdf | |
PWC | https://paperswithcode.com/paper/emotion-detection-with-neural-personal |
Repo | |
Framework | |
Graph- and finite element-based total variation models for the inverse problem in diffuse optical tomography
Title | Graph- and finite element-based total variation models for the inverse problem in diffuse optical tomography |
Authors | Wenqi Lu, Jinming Duan, David Orive-Miguel, Lionel Herve, Iain B Styles |
Abstract | Total variation (TV) is a powerful regularization method that has been widely applied in different imaging applications, but is difficult to apply to diffuse optical tomography (DOT) image reconstruction (inverse problem) due to complex and unstructured geometries, non-linearity of the data fitting and regularization terms, and non-differentiability of the regularization term. We develop several approaches to overcome these difficulties by: i) defining discrete differential operators for unstructured geometries using both finite element and graph representations; ii) developing an optimization algorithm based on the alternating direction method of multipliers (ADMM) for the non-differentiable and non-linear minimization problem; iii) investigating isotropic and anisotropic variants of TV regularization, and comparing their finite element- and graph-based implementations. These approaches are evaluated in experiments on simulated data and real data acquired from a tissue phantom. Our results show that both FEM- and graph-based TV regularization are able to accurately reconstruct both sparse and non-sparse distributions without the over-smoothing effect of Tikhonov regularization and the over-sparsifying effect of L$_1$ regularization. The graph representation was found to outperform the FEM method for low-resolution meshes, and the FEM method was found to be more accurate for high-resolution meshes. |
Tasks | Image Reconstruction |
Published | 2019-01-07 |
URL | http://arxiv.org/abs/1901.01969v2 |
http://arxiv.org/pdf/1901.01969v2.pdf | |
PWC | https://paperswithcode.com/paper/graph-and-finite-element-based-total |
Repo | |
Framework | |
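The distinction between isotropic and anisotropic TV on a graph, mentioned in the abstract above, can be made concrete with a small sketch. The definitions used here are the common node-based ones with unit edge weights (anisotropic: sum of absolute differences to neighbours; isotropic: Euclidean norm of those differences at each node); they are an assumption for illustration and may differ from the operators and weights defined in the paper. The function name `graph_tv` and the path-graph example are hypothetical.

```python
import math

def graph_tv(values, edges, isotropic=True):
    """Total variation of a node signal on an undirected, unweighted graph.

    values: list of floats, one per node.
    edges:  list of (i, j) node-index pairs.
    """
    # Collect, for every node, the differences to each of its neighbours.
    diffs = [[] for _ in values]
    for i, j in edges:
        diffs[i].append(values[j] - values[i])
        diffs[j].append(values[i] - values[j])
    if isotropic:
        # L2 norm of the local difference vector at each node, summed over nodes.
        return sum(math.sqrt(sum(g * g for g in node)) for node in diffs)
    # L1 norm of the local difference vector at each node, summed over nodes.
    return sum(sum(abs(g) for g in node) for node in diffs)

# Path graph 0 - 1 - 2 carrying the signal [0, 1, 3].
path_edges = [(0, 1), (1, 2)]
signal = [0.0, 1.0, 3.0]
tv_iso = graph_tv(signal, path_edges, isotropic=True)
tv_aniso = graph_tv(signal, path_edges, isotropic=False)
```

Under these definitions the isotropic value never exceeds the anisotropic one, since the L2 norm of each local difference vector is bounded by its L1 norm, and both vanish on a constant signal.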