Paper Group ANR 1598
Frame and Feature-Context Video Super-Resolution
Title | Frame and Feature-Context Video Super-Resolution |
Authors | Bo Yan, Chuming Lin, Weimin Tan |
Abstract | For video super-resolution, current state-of-the-art approaches either process multiple low-resolution (LR) frames to produce each output high-resolution (HR) frame separately in a sliding window fashion or recurrently exploit the previously estimated HR frames to super-resolve the following frame. The main weaknesses of these approaches are: 1) separately generating each output frame may obtain high-quality HR estimates while resulting in unsatisfactory flickering artifacts, and 2) combining previously generated HR frames can produce temporally consistent results in the case of short information flow, but it will cause significant jitter and jagged artifacts because the previous super-resolving errors are constantly accumulated to the subsequent frames. In this paper, we propose a fully end-to-end trainable frame and feature-context video super-resolution (FFCVSR) network that consists of two key sub-networks: local network and context network, where the first one explicitly utilizes a sequence of consecutive LR frames to generate local feature and local SR frame, and the other combines the outputs of local network and the previously estimated HR frames and features to super-resolve the subsequent frame. Our approach takes full advantage of the inter-frame information from multiple LR frames and the context information from previously predicted HR frames, producing temporally consistent high-quality results while maintaining real-time speed by directly reusing previous features and frames. Extensive evaluations and comparisons demonstrate that our approach produces state-of-the-art results on a standard benchmark dataset, with advantages in terms of accuracy, efficiency, and visual quality over the existing approaches. |
Tasks | Super-Resolution, Video Super-Resolution |
Published | 2019-09-28 |
URL | https://arxiv.org/abs/1909.13057v1 |
PDF | https://arxiv.org/pdf/1909.13057v1.pdf |
PWC | https://paperswithcode.com/paper/frame-and-feature-context-video-super |
Repo | |
Framework | |
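The two-sub-network design described in the abstract can be pictured with a minimal PyTorch sketch: a local network maps a stack of consecutive LR frames to a local feature and a local SR frame, and a context network fuses those with the previous step's HR frame and feature. All module names, channel counts, and the fusion strategy below are illustrative assumptions, not the paper's actual FFCVSR architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalNet(nn.Module):
    """Consecutive LR frames -> (local feature, local SR frame)."""
    def __init__(self, n_frames=3, ch=32, scale=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3 * n_frames, ch, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(True))
        self.up = nn.Sequential(nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale))

    def forward(self, lr_stack):
        feat = self.body(lr_stack)
        return feat, self.up(feat)

class ContextNet(nn.Module):
    """Fuses the local outputs with the previously estimated HR frame and feature."""
    def __init__(self, ch=32, scale=4):
        super().__init__()
        self.scale = scale
        in_ch = 2 * ch + 2 * 3 * scale ** 2   # two features + two space-to-depth HR frames
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(ch, 3 * scale ** 2, 3, padding=1), nn.PixelShuffle(scale))

    def forward(self, local_feat, local_sr, prev_feat, prev_sr):
        x = torch.cat([local_feat, prev_feat,
                       F.pixel_unshuffle(local_sr, self.scale),
                       F.pixel_unshuffle(prev_sr, self.scale)], dim=1)
        return self.fuse(x)
```

At run time one would slide over the video, feeding frames t-1..t+1 to `LocalNet` and fusing with the cached feature and SR frame from step t-1; reusing those cached tensors is what the abstract credits for the real-time speed.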
Analytic Continued Fractions for Regression: A Memetic Algorithm Approach
Title | Analytic Continued Fractions for Regression: A Memetic Algorithm Approach |
Authors | Pablo Moscato, Haoyuan Sun, Mohammad Nazmul Haque |
Abstract | We present an approach for regression problems that employs analytic continued fractions as a novel representation, and we report comparative computational results obtained with a memetic algorithm. Our experiments included fifteen other machine learning approaches: five genetic programming methods for symbolic regression and ten further machine learning methods. Training and test generalization were compared on 94 datasets of the Penn State Machine Learning Benchmark. The statistical tests showed that the generalization results obtained with analytic continued fractions provide a powerful and interesting new alternative in the quest for compact and interpretable mathematical models for artificial intelligence. |
Tasks | |
Published | 2019-12-18 |
URL | https://arxiv.org/abs/2001.00624v1 |
PDF | https://arxiv.org/pdf/2001.00624v1.pdf |
PWC | https://paperswithcode.com/paper/analytic-continued-fractions-for-regression-a |
Repo | |
Framework | |
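For a concrete picture of the representation, the model is a truncated continued fraction whose partial numerators and denominators are simple functions of the features; the memetic algorithm then searches over their coefficients. The affine form of each term and the helper below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def eval_continued_fraction(x, terms):
    """f(x) = a0(x) + b1(x)/(a1(x) + b2(x)/(a2(x) + ...)), where every a_i and b_i
    is an affine function w.x + c of the feature vector x.
    `terms` = [a0, b1, a1, b2, a2, ...] as (w, c) pairs, so its length is odd."""
    def affine(params):
        w, c = params
        return x @ w + c

    value = affine(terms[-1])                    # innermost denominator
    for i in range(len(terms) - 3, -1, -2):      # work outwards: a_i + b_{i+1} / value
        value = affine(terms[i]) + affine(terms[i + 1]) / (value + 1e-12)  # crude zero guard
    return value

# Depth-2 example in two features.
x = np.array([1.5, -0.3])
terms = [(np.array([0.2, 0.1]), 1.0),   # a0
         (np.array([0.5, 0.0]), 0.3),   # b1
         (np.array([1.0, 1.0]), 2.0),   # a1
         (np.array([0.0, 0.4]), 0.1),   # b2
         (np.array([0.3, 0.3]), 1.5)]   # a2
print(eval_continued_fraction(x, terms))
```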
End-to-End Code-Switching ASR for Low-Resourced Language Pairs
Title | End-to-End Code-Switching ASR for Low-Resourced Language Pairs |
Authors | Xianghu Yue, Grandee Lee, Emre Yılmaz, Fang Deng, Haizhou Li |
Abstract | Despite the significant progress in end-to-end (E2E) automatic speech recognition (ASR), E2E ASR for low-resourced code-switching (CS) speech has not been well studied. In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high-resourced language. Low-resourcedness in acoustic data hinders the performance of E2E ASR systems more severely than it does conventional ASR systems. To mitigate this problem in the transcription of archives with code-switching Frisian-Dutch speech, we integrate a designated decoding scheme and perform rescoring with neural network-based language models to enable better utilization of the available textual resources. We first incorporate a multi-graph decoding approach which creates parallel search spaces for the monolingual and mixed recognition tasks to maximize the utilization of the textual resources from each language. Further, language model rescoring is performed using a recurrent neural network pre-trained with cross-lingual embeddings and further adapted with the limited amount of in-domain CS text. The ASR experiments demonstrate the effectiveness of the described techniques in improving the recognition performance of an E2E CS ASR system in a low-resourced scenario. |
Tasks | Language Modelling, Speech Recognition |
Published | 2019-09-27 |
URL | https://arxiv.org/abs/1909.12681v2 |
PDF | https://arxiv.org/pdf/1909.12681v2.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-code-switching-asr-for-low |
Repo | |
Framework | |
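The rescoring step described above amounts to re-ranking first-pass hypotheses with a neural language model. The sketch below shows only that generic score interpolation; the paper's multi-graph decoding and cross-lingually pre-trained RNN LM are not reproduced, and the weight value and hypothesis strings are arbitrary assumptions.

```python
def rescore_nbest(hypotheses, lm_logprob, lm_weight=0.3):
    """Re-rank n-best hypotheses from the first-pass decoder with a neural LM.
    `hypotheses` is a list of (text, first_pass_score); `lm_logprob(text)` returns
    the language model log-probability of the hypothesis."""
    rescored = [(text, score + lm_weight * lm_logprob(text)) for text, score in hypotheses]
    return max(rescored, key=lambda h: h[1])

# Toy illustration with a stand-in LM that prefers hypothesis a.
nbest = [("hypothesis a", -12.1), ("hypothesis b", -11.8)]
fake_lm = {"hypothesis a": -4.0, "hypothesis b": -9.0}.get
print(rescore_nbest(nbest, fake_lm))
```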
Algorithm for Training Neural Networks on Resistive Device Arrays
Title | Algorithm for Training Neural Networks on Resistive Device Arrays |
Authors | Tayfun Gokmen, Wilfried Haensch |
Abstract | Hardware architectures composed of resistive cross-point device arrays can provide significant power and speed benefits for deep neural network training workloads using the stochastic gradient descent (SGD) and backpropagation (BP) algorithm. The training accuracy on this imminent analog hardware, however, strongly depends on the switching characteristics of the cross-point elements. One of the key requirements is that these resistive devices must change conductance in a symmetrical fashion when subjected to positive or negative pulse stimuli. Here, we present a new training algorithm, the so-called “Tiki-Taka” algorithm, which eliminates this stringent symmetry requirement. We show that device asymmetry introduces an unintentional implicit cost term into the SGD algorithm, whereas in the “Tiki-Taka” algorithm a coupled dynamical system simultaneously minimizes the original objective function of the neural network and the unintentional cost term due to device asymmetry in a self-consistent fashion. We tested the validity of this new algorithm on a range of network architectures such as fully connected, convolutional and LSTM networks. Simulation results on these various networks show that whatever accuracy is achieved using the conventional SGD algorithm with symmetric (ideal) device switching characteristics, the same accuracy is also achieved using the “Tiki-Taka” algorithm with non-symmetric (non-ideal) device switching characteristics. Moreover, all the operations performed on the arrays are still parallel, and therefore the implementation cost of this new algorithm on array architectures is minimal while it maintains the aforementioned power and speed benefits. These algorithmic improvements are crucial for relaxing the material specification and realizing technologically viable resistive crossbar arrays that outperform digital accelerators for similar training tasks. |
Tasks | |
Published | 2019-09-17 |
URL | https://arxiv.org/abs/1909.07908v1 |
PDF | https://arxiv.org/pdf/1909.07908v1.pdf |
PWC | https://paperswithcode.com/paper/algorithm-for-training-neural-networks-on |
Repo | |
Framework | |
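The coupled-system idea can be caricatured numerically: gradient pulses land on one (asymmetric) array while a second array slowly absorbs its content, and the network reads its weights from a combination of the two. Everything below, the asymmetry model, the mixing factor gamma, the transfer schedule and the decay, is a toy assumption for illustration and not the authors' update rules.

```python
import numpy as np

rng = np.random.default_rng(0)

def asymmetric_update(w, dw, up=1.0, down=0.6):
    """Device-level write: positive and negative pulses change the stored value
    with different efficiencies (the asymmetry the paper is concerned with)."""
    return w + np.where(dw > 0, up * dw, down * dw)

# Toy linear regression y = W x solved with asymmetric writes.
X = rng.normal(size=(200, 8))
W_true = rng.normal(size=(1, 8))
Y = X @ W_true.T

A = np.zeros((1, 8))            # fast array: receives the gradient pulses
C = np.zeros((1, 8))            # slow array: accumulates transfers from A
gamma, lr, transfer_lr = 0.5, 0.05, 0.1

for step in range(2000):
    i = rng.integers(len(X))
    x, y = X[i:i + 1], Y[i:i + 1]
    W = gamma * A + C                          # effective weight read from both arrays
    grad = (W @ x.T - y) @ x                   # plain SGD gradient for this sample
    A = asymmetric_update(A, -lr * grad)       # gradient pulses hit the asymmetric array
    if step % 10 == 0:                         # occasionally transfer A's content into C
        C = asymmetric_update(C, transfer_lr * A)
        A *= 0.9                               # crude stand-in for drift toward a symmetry point

print("max weight error:", np.abs(gamma * A + C - W_true).max())
```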
Product Image Recognition with Guidance Learning and Noisy Supervision
Title | Product Image Recognition with Guidance Learning and Noisy Supervision |
Authors | Qing Li, Xiaojiang Peng, Liangliang Cao, Wenbin Du, Hao Xing, Yu Qiao |
Abstract | This paper considers recognizing products from daily photos, which is an important problem in real-world applications but also challenging due to background clutter, category diversity, noisy labels, etc. We address this problem with two contributions. First, we introduce a novel large-scale product image dataset, termed Product-90. Instead of collecting product images by labor- and time-intensive image capturing, we take advantage of the web and download images from the reviews of several e-commerce websites, where the images are casually captured by consumers. Labels are assigned automatically from the categories of the e-commerce websites. In total, Product-90 consists of more than 140K images in 90 categories. Because consumers may upload unrelated images, it is inevitable that Product-90 contains noisy labels. As the second contribution, we develop a simple yet efficient guidance learning (GL) method for training convolutional neural networks (CNNs) with noisy supervision. The GL method first trains an initial teacher network with the full noisy dataset, and then trains a target/student network with both the large-scale noisy set and a small manually verified clean set in a multi-task manner. Specifically, in the student network training stage, the large-scale noisy data is supervised by its guidance knowledge, which is the combination of its given noisy label and the softened label from the teacher network. We conduct extensive experiments on our Product-90 and public datasets, namely Food101, Food-101N, and Clothing1M. Our guidance learning method achieves performance superior to state-of-the-art methods on these datasets. |
Tasks | |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11384v1 |
PDF | https://arxiv.org/pdf/1907.11384v1.pdf |
PWC | https://paperswithcode.com/paper/product-image-recognition-with-guidance |
Repo | |
Framework | |
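The core of the student stage is a loss whose target blends the given noisy label with the teacher's softened prediction, plus ordinary cross-entropy on the small clean set. A minimal PyTorch sketch is below; the blend weight `alpha`, the task weight `lam`, and the function names are assumptions for illustration rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def guidance_loss(student_logits, noisy_labels, teacher_logits, alpha=0.5):
    """Cross-entropy of the student against a 'guidance' target that blends the
    given (possibly wrong) label with the teacher's softened prediction."""
    num_classes = student_logits.size(1)
    one_hot = F.one_hot(noisy_labels, num_classes).float()
    guidance = alpha * one_hot + (1 - alpha) * F.softmax(teacher_logits, dim=1)
    return -(guidance * F.log_softmax(student_logits, dim=1)).sum(dim=1).mean()

def total_loss(student, teacher, noisy_batch, clean_batch, alpha=0.5, lam=1.0):
    """Multi-task objective: guidance loss on the large noisy batch plus standard
    cross-entropy on the small manually verified clean batch."""
    (xn, yn), (xc, yc) = noisy_batch, clean_batch
    with torch.no_grad():
        teacher_logits = teacher(xn)           # teacher is frozen during student training
    return guidance_loss(student(xn), yn, teacher_logits, alpha) + lam * F.cross_entropy(student(xc), yc)
```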
Exploiting Hierarchy for Learning and Transfer in KL-regularized RL
Title | Exploiting Hierarchy for Learning and Transfer in KL-regularized RL |
Authors | Dhruva Tirumala, Hyeonwoo Noh, Alexandre Galashov, Leonard Hasenclever, Arun Ahuja, Greg Wayne, Razvan Pascanu, Yee Whye Teh, Nicolas Heess |
Abstract | As reinforcement learning agents are tasked with solving more challenging and diverse tasks, the ability to incorporate prior knowledge into the learning system and to exploit reusable structure in solution space is likely to become increasingly important. The KL-regularized expected reward objective constitutes one possible tool to this end. It introduces an additional component, a default or prior behavior, which can be learned alongside the policy and as such partially transforms the reinforcement learning problem into one of behavior modelling. In this work we consider the implications of this framework in cases where both the policy and default behavior are augmented with latent variables. We discuss how the resulting hierarchical structures can be used to implement different inductive biases and how their modularity can benefit transfer. Empirically we find that they can lead to faster learning and transfer on a range of continuous control tasks. |
Tasks | Continuous Control |
Published | 2019-03-18 |
URL | https://arxiv.org/abs/1903.07438v2 |
PDF | https://arxiv.org/pdf/1903.07438v2.pdf |
PWC | https://paperswithcode.com/paper/exploiting-hierarchy-for-learning-and |
Repo | |
Framework | |
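For reference, the KL-regularized expected reward objective mentioned in the abstract is commonly written as below; the paper additionally augments both the policy and the default behavior with latent variables to obtain hierarchical structure, which this standard form does not show.

```latex
% KL-regularized expected reward: maximize task reward while staying close to a
% default (prior) behavior \pi_0, which can itself be learned alongside the policy.
\mathcal{J}(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t}\gamma^{t}\Big(r(s_t,a_t)
  \;-\; \alpha\,\mathrm{KL}\big(\pi(\cdot\mid s_t)\,\|\,\pi_0(\cdot\mid s_t)\big)\Big)\right]
```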
Uneven illumination surface defects inspection based on convolutional neural network
Title | Uneven illumination surface defects inspection based on convolutional neural network |
Authors | Hao Wu, Xiangrong Xu, Wenbin Gao |
Abstract | Surface defect inspection based on machine vision is often affected by uneven illumination. To improve the detection rate of surface defects under uneven illumination, this paper proposes a convolutional neural network-based method for detecting defects in surface images, which adjusts the training parameters and the structure of the network so as to accurately identify various defects. Experiments on defect inspection of copper strip and steel images show that the convolutional neural network can automatically learn features without preprocessing the images and correctly identify various types of image defects affected by uneven illumination, thus overcoming the drawbacks of traditional machine vision inspection methods under uneven illumination. |
Tasks | |
Published | 2019-05-16 |
URL | https://arxiv.org/abs/1905.06683v2 |
PDF | https://arxiv.org/pdf/1905.06683v2.pdf |
PWC | https://paperswithcode.com/paper/uneven-illumination-surface-defects |
Repo | |
Framework | |
“Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations
Title | “Why Should You Trust My Explanation?” Understanding Uncertainty in LIME Explanations |
Authors | Yujia Zhang, Kuangyan Song, Yiming Sun, Sarah Tan, Madeleine Udell |
Abstract | Methods for interpreting machine learning black-box models increase the outcomes’ transparency and in turn generate insight into the reliability and fairness of the algorithms. However, the interpretations themselves could contain significant uncertainty that undermines trust in the outcomes and raises concerns about the model’s reliability. Focusing on the method “Local Interpretable Model-agnostic Explanations” (LIME), we demonstrate the presence of two sources of uncertainty, namely the randomness in its sampling procedure and the variation of interpretation quality across different input data points. Such uncertainty is present even in models with high training and test accuracy. We apply LIME to synthetic data and two public data sets, text classification in 20 Newsgroup and recidivism risk-scoring in COMPAS, to support our argument. |
Tasks | Text Classification |
Published | 2019-04-29 |
URL | https://arxiv.org/abs/1904.12991v2 |
PDF | https://arxiv.org/pdf/1904.12991v2.pdf |
PWC | https://paperswithcode.com/paper/why-should-you-trust-my-interpretation |
Repo | |
Framework | |
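One simple way to expose the sampling-induced uncertainty the paper discusses is to run LIME repeatedly on the same instance and look at the spread of the returned feature weights. The sketch below uses the public `lime` package on synthetic data; the dataset, model, and number of repetitions are arbitrary choices and not the paper's experimental setup.

```python
import numpy as np
from collections import defaultdict
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, mode="classification")
weights = defaultdict(list)
for _ in range(30):                     # repeat LIME's random sampling procedure
    exp = explainer.explain_instance(X[0], model.predict_proba, num_features=8)
    for feature, w in exp.as_list():
        weights[feature].append(w)

for feature, ws in weights.items():     # spread across runs = sampling uncertainty
    print(f"{feature:30s} mean={np.mean(ws):+.3f} std={np.std(ws):.3f}")
```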
Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction
Title | Conv-MPN: Convolutional Message Passing Neural Network for Structured Outdoor Architecture Reconstruction |
Authors | Fuyang Zhang, Nelson Nauata, Yasutaka Furukawa |
Abstract | This paper proposes a novel message passing neural (MPN) architecture, Conv-MPN, which reconstructs an outdoor building as a planar graph from a single RGB image. Conv-MPN is specifically designed for cases where nodes of a graph have explicit spatial embedding. In our problem, nodes correspond to building edges in an image. Conv-MPN differs from MPN in that 1) the feature associated with a node is represented as a feature volume instead of a 1D vector; and 2) messages are encoded by convolutions instead of fully connected layers. Conv-MPN learns to select a true subset of nodes (i.e., building edges) to reconstruct a building planar graph. Our qualitative and quantitative evaluations over 2,000 buildings show that Conv-MPN makes significant improvements over the existing fully neural solutions. We believe that the paper has the potential to open a new line of graph neural network research for structured geometry reconstruction. |
Tasks | |
Published | 2019-12-04 |
URL | https://arxiv.org/abs/1912.01756v3 |
PDF | https://arxiv.org/pdf/1912.01756v3.pdf |
PWC | https://paperswithcode.com/paper/conv-mpn-convolutional-message-passing-neural |
Repo | |
Framework | |
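The two distinguishing ingredients, per-node feature volumes and convolutional message updates, can be sketched as a single message-passing step. Channel counts, the averaging aggregation, and the residual update below are illustrative assumptions; the actual Conv-MPN differs in how it aggregates neighbor features and in how many steps it stacks.

```python
import torch
import torch.nn as nn

class ConvMessagePassing(nn.Module):
    """One message-passing step where every node carries a feature *volume*
    and the update is a convolution over [own volume, pooled neighbor volumes]."""
    def __init__(self, ch=16):
        super().__init__()
        self.update = nn.Sequential(
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, node_feats, adj):
        # node_feats: (N, C, H, W); adj: (N, N) float adjacency matrix, no self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)                    # (N, 1)
        msgs = torch.einsum("ij,jchw->ichw", adj, node_feats) / deg[..., None, None]
        return node_feats + self.update(torch.cat([node_feats, msgs], dim=1))
```

A per-node selection head (e.g. a small conv followed by pooling and a sigmoid) would then score each candidate building edge; that part is omitted here.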
HAUAR: Home Automation Using Action Recognition
Title | HAUAR: Home Automation Using Action Recognition |
Authors | Shashank Kotyan, Nishant Kumar, Pankaj Kumar Sahu, Venkanna Udutalapally |
Abstract | Today, most deployed home automation systems are controlled by humans, and this control restricts the automation of home appliances to an extent. Also, most deployed home automation systems use Internet of Things technology to control the appliances. In this paper, we propose a system that uses action recognition to fully automate home appliances. We recognize three actions of a person (sitting, standing and lying) along with the recognition of an empty room. The accuracy of the system was 90% in real-life test experiments. With this system, we remove human intervention from controlling home appliances, while ensuring data privacy and reducing energy consumption by using home appliances efficiently and optimally. |
Tasks | Temporal Action Localization |
Published | 2019-04-23 |
URL | http://arxiv.org/abs/1904.10354v2 |
PDF | http://arxiv.org/pdf/1904.10354v2.pdf |
PWC | https://paperswithcode.com/paper/hauar-home-automation-using-action |
Repo | |
Framework | |
Pixel Adaptive Filtering Units
Title | Pixel Adaptive Filtering Units |
Authors | Filippos Kokkinos, Ioannis Marras, Matteo Maggioni, Gregory Slabaugh, Stefanos Zafeiriou |
Abstract | State-of-the-art methods for computer vision rely heavily on the translation equivariance and spatial sharing properties of convolutional layers without explicitly taking into consideration the input content. Modern techniques employ deep sophisticated architectures in order to circumvent this issue. In this work, we propose a Pixel Adaptive Filtering Unit (PAFU) which introduces a differentiable kernel selection mechanism paired with a discrete, learnable and decorrelated group of kernels to allow for content-based spatial adaptation. First, we demonstrate the applicability of the technique in applications where runtime is of importance. Next, we employ PAFU in deep neural networks as a replacement of standard convolutional layers to enhance the original architectures with spatially varying computations to achieve considerable performance improvements. Finally, diverse and extensive experimentation provides strong empirical evidence in favor of the proposed content-adaptive processing scheme across different image processing and high-level computer vision tasks. |
Tasks | |
Published | 2019-11-24 |
URL | https://arxiv.org/abs/1911.10581v1 |
PDF | https://arxiv.org/pdf/1911.10581v1.pdf |
PWC | https://paperswithcode.com/paper/pixel-adaptive-filtering-units |
Repo | |
Framework | |
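The kernel-selection idea can be sketched as a soft, per-pixel mixture over a small bank of learnable kernels. This shows only the differentiable combination; the discrete selection and the decorrelation of the kernel bank that the abstract mentions are omitted, and all sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAdaptiveFiltering(nn.Module):
    """Per-pixel soft selection over a small, learnable bank of kernels."""
    def __init__(self, ch, n_kernels=4, k=3):
        super().__init__()
        self.bank = nn.ModuleList([nn.Conv2d(ch, ch, k, padding=k // 2) for _ in range(n_kernels)])
        self.selector = nn.Conv2d(ch, n_kernels, 1)    # predicts selection logits per pixel

    def forward(self, x):
        responses = torch.stack([conv(x) for conv in self.bank], dim=1)    # (B, K, C, H, W)
        weights = F.softmax(self.selector(x), dim=1).unsqueeze(2)          # (B, K, 1, H, W)
        return (weights * responses).sum(dim=1)                            # (B, C, H, W)
```

Used as a drop-in replacement for a standard convolutional layer, the content-dependent `weights` give each pixel its own effective filter.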
Doppler Spectrum Classification with CNNs via Heatmap Location Encoding and a Multi-head Output Layer
Title | Doppler Spectrum Classification with CNNs via Heatmap Location Encoding and a Multi-head Output Layer |
Authors | Andrew Gilbert, Marit Holden, Line Eikvil, Mariia Rakhmail, Aleksandar Babic, Svein Arne Aase, Eigil Samset, Kristin McLeod |
Abstract | Spectral Doppler measurements are an important part of the standard echocardiographic examination. These measurements give important insight into myocardial motion and blood flow providing clinicians with parameters for diagnostic decision making. Many of these measurements can currently be performed automatically with high accuracy, increasing the efficiency of the diagnostic pipeline. However, full automation is not yet available because the user must manually select which measurement should be performed on each image. In this work we develop a convolutional neural network (CNN) to automatically classify cardiac Doppler spectra into measurement classes. We show how the multi-modal information in each spectral Doppler recording can be combined using a meta parameter post-processing mapping scheme and heatmaps to encode coordinate locations. Additionally, we experiment with several state-of-the-art network architectures to examine the tradeoff between accuracy and memory usage for resource-constrained environments. Finally, we propose a confidence metric using the values in the last fully connected layer of the network. We analyze example images that fall outside of our proposed classes to show our confidence metric can prevent many misclassifications. Our algorithm achieves 96% accuracy on a test set drawn from a separate clinical site, indicating that the proposed method is suitable for clinical adoption and enabling a fully automatic pipeline from acquisition to Doppler spectrum measurements. |
Tasks | Decision Making |
Published | 2019-11-06 |
URL | https://arxiv.org/abs/1911.02407v2 |
PDF | https://arxiv.org/pdf/1911.02407v2.pdf |
PWC | https://paperswithcode.com/paper/user-intended-doppler-measurement-type |
Repo | |
Framework | |
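The confidence metric is described only loosely in the abstract; one plausible reading, sketched below, scores each prediction by the margin between the two largest values of the last fully connected layer and defers to the user when the margin is small. The margin definition and threshold are assumptions, not the paper's exact metric.

```python
import numpy as np

def confidence_and_class(logits, threshold=2.0):
    """Return (predicted class or None, confidence). The prediction is rejected
    (None = defer to the clinician) when the gap between the two largest
    last-layer values is below the threshold."""
    order = np.argsort(logits)[::-1]
    confidence = logits[order[0]] - logits[order[1]]
    label = int(order[0]) if confidence >= threshold else None
    return label, confidence

print(confidence_and_class(np.array([8.1, 1.2, 0.4, 7.9])))   # small margin -> deferred
```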
S-RASTER: Contraction Clustering for Evolving Data Streams
Title | S-RASTER: Contraction Clustering for Evolving Data Streams |
Authors | Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, Mats Jirstrand |
Abstract | Contraction Clustering (RASTER) is a very fast algorithm for density-based clustering, which requires only a single pass. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. Yet, RASTER is limited to batch processing. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e. single-pass linear time cost and constant memory requirements for each discrete time step in the sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. It is very well suited to real-world scenarios where clustering does not happen continually but only periodically. We describe the algorithm, including a discussion of implementation details. |
Tasks | |
Published | 2019-11-21 |
URL | https://arxiv.org/abs/1911.09447v2 |
PDF | https://arxiv.org/pdf/1911.09447v2.pdf |
PWC | https://paperswithcode.com/paper/s-raster-contraction-clustering-for-evolving |
Repo | |
Framework | |
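A heavily simplified sketch of the streaming idea: points are contracted onto grid tiles, each discrete time step keeps its own tile counts, and a sliding window prunes old steps so memory stays bounded. Parameter names are invented, and the final merging of adjacent significant tiles into clusters is omitted; consult the paper for the real algorithm.

```python
from collections import defaultdict, deque

class SRasterSketch:
    """Toy stream clustering in the spirit of S-RASTER (contraction + sliding window)."""
    def __init__(self, precision=1.0, window=5, min_count=4):
        self.precision, self.window, self.min_count = precision, window, min_count
        self.steps = deque()                      # one {tile: count} dict per time step

    def new_step(self):
        self.steps.append(defaultdict(int))
        if len(self.steps) > self.window:         # prune the oldest time step
            self.steps.popleft()

    def add(self, x, y):
        tile = (int(x // self.precision), int(y // self.precision))
        self.steps[-1][tile] += 1                 # contraction: point -> grid tile

    def significant_tiles(self):
        total = defaultdict(int)
        for step in self.steps:
            for tile, c in step.items():
                total[tile] += c
        return {t for t, c in total.items() if c >= self.min_count}

s = SRasterSketch(precision=1.0, window=3, min_count=3)
for t in range(4):
    s.new_step()
    for _ in range(3):
        s.add(0.2 + t, 0.3)       # a slowly drifting hot spot
print(s.significant_tiles())      # tiles from the oldest step have been pruned
```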
ProxIQA: A Proxy Approach to Perceptual Optimization of Learned Image Compression
Title | ProxIQA: A Proxy Approach to Perceptual Optimization of Learned Image Compression |
Authors | Li-Heng Chen, Christos G. Bampis, Zhi Li, Andrey Norkin, Alan C. Bovik |
Abstract | The use of $\ell_p$ $(p=1,2)$ norms has largely dominated the measurement of loss in neural networks due to their simplicity and analytical properties. However, when used to assess the loss of visual information, these simple norms are not very consistent with human perception. Here, we describe a different “proximal” approach to optimize image analysis networks against quantitative perceptual models. Specifically, we construct a proxy network, broadly termed ProxIQA, which mimics the perceptual model while serving as a loss layer of the network. We experimentally demonstrate how this optimization framework can be applied to train an end-to-end optimized image compression network. By building on top of an existing deep image compression model, we are able to demonstrate a bitrate reduction of as much as 31% over MSE optimization, given a specified perceptual quality (VMAF) level. |
Tasks | Image Compression |
Published | 2019-10-19 |
URL | https://arxiv.org/abs/1910.08845v1 |
PDF | https://arxiv.org/pdf/1910.08845v1.pdf |
PWC | https://paperswithcode.com/paper/proxiqa-a-proxy-approach-to-perceptual |
Repo | |
Framework | |
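The mechanism can be summarized as: a small proxy network is fit to the non-differentiable perceptual metric and then plugged in as the distortion term of the compression loss. The module below is a deliberately tiny stand-in; the architecture, sign convention, and alternation schedule are assumptions, not the paper's ProxIQA design.

```python
import torch
import torch.nn as nn

proxy = nn.Sequential(                       # tiny stand-in for the ProxIQA network
    nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(True),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1))

def perceptual_proxy_loss(reconstruction, reference):
    """Differentiable surrogate for the perceptual score: the proxy sees the
    (reference, reconstruction) pair and predicts the metric value (higher = better)."""
    pair = torch.cat([reference, reconstruction], dim=1)     # (B, 6, H, W)
    return -proxy(pair).mean()                               # maximizing quality = minimizing loss

# Training would alternate between (a) regressing `proxy` onto the real metric
# (e.g. VMAF) computed on recent reconstructions, and (b) updating the compression
# network with `perceptual_proxy_loss` plus a rate term.
```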
Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models
Title | Densifying Assumed-sparse Tensors: Improving Memory Efficiency and MPI Collective Performance during Tensor Accumulation for Parallelized Training of Neural Machine Translation Models |
Authors | Derya Cavdar, Valeriu Codreanu, Can Karakus, John A. Lockman III, Damian Podareanu, Vikram Saletore, Alexander Sergeev, Don D. Smith II, Victor Suthichai, Quy Ta, Srinivas Varadharajan, Lucas A. Wilson, Rengan Xu, Pei Yang |
Abstract | Neural machine translation - using neural networks to translate human language - is an area of active research exploring new neuron types and network topologies with the goal of dramatically improving machine translation performance. Current state-of-the-art approaches, such as the multi-head attention-based transformer, require very large translation corpuses and many epochs to produce models of reasonable quality. Recent attempts to parallelize the official TensorFlow “Transformer” model across multiple nodes have hit roadblocks due to excessive memory use and resulting out of memory errors when performing MPI collectives. This paper describes modifications made to the Horovod MPI-based distributed training framework to reduce memory usage for transformer models by converting assumed-sparse tensors to dense tensors, and subsequently replacing sparse gradient gather with dense gradient reduction. The result is a dramatic increase in scale-out capability, with CPU-only scaling tests achieving 91% weak scaling efficiency up to 1200 MPI processes (300 nodes), and up to 65% strong scaling efficiency up to 400 MPI processes (200 nodes) using the Stampede2 supercomputer. |
Tasks | Machine Translation |
Published | 2019-05-10 |
URL | https://arxiv.org/abs/1905.04035v1 |
PDF | https://arxiv.org/pdf/1905.04035v1.pdf |
PWC | https://paperswithcode.com/paper/densifying-assumed-sparse-tensors-improving |
Repo | |
Framework | |
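In Horovod terms, the fix described here corresponds to converting gradients that arrive as `tf.IndexedSlices` (assumed sparse, e.g. from embedding lookups) into dense tensors so they go through allreduce rather than allgather. The snippet below sketches this for TF1-style code; the `sparse_as_dense` flag exists in recent Horovod releases but should be verified against your version, and the learning-rate scaling is only a convention.

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()
opt = tf.train.AdamOptimizer(1e-4 * hvd.size())

# Wrap the optimizer so Horovod averages gradients across workers; densifying
# assumed-sparse gradients lets it use a dense allreduce instead of an allgather.
opt = hvd.DistributedOptimizer(opt, sparse_as_dense=True)

def densify(grad):
    """Manual equivalent of the same idea for a single gradient tensor."""
    return tf.convert_to_tensor(grad) if isinstance(grad, tf.IndexedSlices) else grad
```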