October 21, 2019

3167 words 15 mins read

Paper Group AWR 52

Design Challenges and Misconceptions in Neural Sequence Labeling. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. SlowFast Networks for Video Recognition. Fast and Robust Multiple ColorChecker Detection using Deep Convolutional Neural Networks. Learning to Separate Object Sounds by Watching Unlabeled Video. LaneNet …

Design Challenges and Misconceptions in Neural Sequence Labeling

Title Design Challenges and Misconceptions in Neural Sequence Labeling
Authors Jie Yang, Shuailong Liang, Yue Zhang
Abstract We investigate the design challenges of constructing effective and efficient neural sequence labeling systems by reproducing twelve neural sequence labeling models, which include most of the state-of-the-art structures, and conducting a systematic model comparison on three benchmarks (i.e. NER, Chunking, and POS tagging). Misconceptions and inconsistent conclusions in existing literature are examined and clarified with statistical experiments. In the comparison and analysis process, we reach several practical conclusions which can be useful to practitioners.
Tasks Chunking
Published 2018-06-12
URL http://arxiv.org/abs/1806.04470v2
PDF http://arxiv.org/pdf/1806.04470v2.pdf
PWC https://paperswithcode.com/paper/design-challenges-and-misconceptions-in
Repo https://github.com/jiesutd/NCRFpp
Framework pytorch
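
The twelve systems the paper compares are, roughly, combinations of character and word encoders (LSTM or CNN) with a softmax or CRF inference layer. As a rough illustration of one such design point, here is a minimal word-level BiLSTM tagger with a softmax output layer in PyTorch; the vocabulary size, dimensions, and tag count are placeholders, and the paper's NCRF++ toolkit implements the full set of configurations.

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Word-level BiLSTM sequence labeler: one of the design points the paper compares."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(hidden_dim, num_tags)  # softmax inference layer; a CRF layer is the alternative

    def forward(self, token_ids):
        embeddings = self.embed(token_ids)       # (batch, seq_len, emb_dim)
        hidden, _ = self.encoder(embeddings)     # (batch, seq_len, hidden_dim)
        return self.classify(hidden)             # per-token tag scores

# toy usage: a batch of 2 sentences, 5 tokens each, 9 possible tags
model = BiLSTMTagger(vocab_size=1000, num_tags=9)
scores = model(torch.randint(0, 1000, (2, 5)))
print(scores.shape)  # torch.Size([2, 5, 9])
```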

Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Title Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Authors Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck
Abstract Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
Tasks Music Modeling, Piano Music Modeling
Published 2018-10-29
URL http://arxiv.org/abs/1810.12247v5
PDF http://arxiv.org/pdf/1810.12247v5.pdf
PWC https://paperswithcode.com/paper/enabling-factorized-piano-music-modeling-and
Repo https://github.com/BShakhovsky/PolyphonicPianoTranscription
Framework tf

SlowFast Networks for Video Recognition

Title SlowFast Networks for Video Recognition
Authors Christoph Feichtenhofer, Haoqi Fan, Jitendra Malik, Kaiming He
Abstract We present SlowFast networks for video recognition. Our model involves (i) a Slow pathway, operating at low frame rate, to capture spatial semantics, and (ii) a Fast pathway, operating at high frame rate, to capture motion at fine temporal resolution. The Fast pathway can be made very lightweight by reducing its channel capacity, yet can learn useful temporal information for video recognition. Our models achieve strong performance for both action classification and detection in video, and large improvements are pin-pointed as contributions by our SlowFast concept. We report state-of-the-art accuracy on major video recognition benchmarks, Kinetics, Charades and AVA. Code has been made available at: https://github.com/facebookresearch/SlowFast
Tasks Action Classification, Action Detection, Video Recognition
Published 2018-12-10
URL https://arxiv.org/abs/1812.03982v3
PDF https://arxiv.org/pdf/1812.03982v3.pdf
PWC https://paperswithcode.com/paper/slowfast-networks-for-video-recognition
Repo https://github.com/Guocode/SlowFast-Networks
Framework none
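
The core idea is easy to sketch: feed a temporally strided subset of frames to a wide Slow pathway and all frames to a narrow Fast pathway. A minimal PyTorch illustration of the two input stems follows; the frame-rate ratio and channel ratio follow the paper's reported defaults (alpha=8, beta=1/8), but the layer shapes are placeholders and the real model adds lateral connections and full 3D ResNet backbones.

```python
import torch
import torch.nn as nn

# Illustrative SlowFast front end: the Slow pathway sees a temporally strided subset of
# frames with many channels; the Fast pathway sees all frames with ~1/8 of the channels.
ALPHA, BETA = 8, 1 / 8   # frame-rate ratio and channel ratio (paper defaults; stems below are placeholders)

slow_stem = nn.Conv3d(3, 64, kernel_size=(1, 7, 7), stride=(1, 2, 2), padding=(0, 3, 3))
fast_stem = nn.Conv3d(3, int(64 * BETA), kernel_size=(5, 7, 7), stride=(1, 2, 2), padding=(2, 3, 3))

clip = torch.randn(1, 3, 64, 224, 224)      # (batch, channels, frames, height, width)
slow_out = slow_stem(clip[:, :, ::ALPHA])   # 64 / 8 = 8 frames enter the Slow pathway
fast_out = fast_stem(clip)                  # all 64 frames enter the Fast pathway
print(slow_out.shape, fast_out.shape)       # lateral connections would fuse these at each stage
```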

Fast and Robust Multiple ColorChecker Detection using Deep Convolutional Neural Networks

Title Fast and Robust Multiple ColorChecker Detection using Deep Convolutional Neural Networks
Authors Pedro D. Marrero Fernandez, Fidel A. Guerrero-Peña, Tsang Ing Ren, Jorge J. G. Leandro
Abstract ColorCheckers are reference standards that professional photographers and filmmakers use to ensure predictable results under every lighting condition. The objective of this work is to propose a new fast and robust method for automatic ColorChecker detection. The process is divided into two steps: (1) ColorChecker localization and (2) ColorChecker patch recognition. For the ColorChecker localization, we trained a detection convolutional neural network using synthetic images. The synthetic images are created with the 3D models of the ColorChecker and different background images. The output of the neural network is a bounding box for each possible ColorChecker candidate in the input image. Each bounding box defines a cropped image which is evaluated by a recognition system, and each image is canonized with regard to color and dimensions. Subsequently, all possible color patches are extracted and grouped with respect to the distance between their centers. Each group is evaluated as a candidate for a ColorChecker part, and its position in the scene is estimated. Finally, a cost function is applied to evaluate the accuracy of the estimation. The method is tested using real and synthetic images. The proposed method is fast, robust to overlaps, and invariant to affine projections. The algorithm also performs well in the case of multiple ColorChecker detection.
Tasks
Published 2018-10-19
URL http://arxiv.org/abs/1810.08639v1
PDF http://arxiv.org/pdf/1810.08639v1.pdf
PWC https://paperswithcode.com/paper/fast-and-robust-multiple-colorchecker
Repo https://github.com/pedrodiamel/colorchacker-detection
Framework none
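
The recognition stage groups detected patch centers by mutual distance before evaluating each group as a ColorChecker candidate. The snippet below is a deliberately simplified, greedy stand-in for that grouping step; the distance threshold and coordinates are made up, and the actual pipeline operates on canonized crops produced by the detection network.

```python
import numpy as np

def group_patches_by_center(centers, max_dist=60.0):
    """Greedy grouping of patch centers: a patch joins an existing group if it lies within
    max_dist of that group's mean center, otherwise it starts a new group.
    A simplified stand-in for the grouping step described in the abstract."""
    groups = []
    for c in centers:
        placed = False
        for g in groups:
            if np.linalg.norm(c - np.mean(g, axis=0)) < max_dist:
                g.append(c)
                placed = True
                break
        if not placed:
            groups.append([c])
    return groups

centers = np.array([[10, 10], [40, 12], [70, 11], [400, 300], [430, 305]], dtype=float)
print([len(g) for g in group_patches_by_center(centers)])  # e.g. [3, 2] -> two ColorChecker candidates
```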

Learning to Separate Object Sounds by Watching Unlabeled Video

Title Learning to Separate Object Sounds by Watching Unlabeled Video
Authors Ruohan Gao, Rogerio Feris, Kristen Grauman
Abstract Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to learn audio-visual object models from unlabeled video, then exploit the visual context to perform audio source separation in novel videos. Our approach relies on a deep multi-instance multi-label learning framework to disentangle the audio frequency bases that map to individual visual objects, even without observing/hearing those objects in isolation. We show how the recovered disentangled bases can be used to guide audio source separation to obtain better-separated, object-level sounds. Our work is the first to learn audio source separation from large-scale “in the wild” videos containing multiple audio sources per video. We obtain state-of-the-art results on visually-aided audio source separation and audio denoising. Our video results: http://vision.cs.utexas.edu/projects/separating_object_sounds/
Tasks Audio Denoising, Denoising, Multi-Label Learning
Published 2018-04-05
URL http://arxiv.org/abs/1804.01665v2
PDF http://arxiv.org/pdf/1804.01665v2.pdf
PWC https://paperswithcode.com/paper/learning-to-separate-object-sounds-by
Repo https://github.com/rhgao/Deep-MIML-Network
Framework pytorch
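
The separation stage can be pictured as follows: factorize the mixture spectrogram into frequency bases and activations with NMF, assign bases to visual objects (in the paper this assignment comes from the learned MIML network), and reconstruct per-object sources with soft masks. The sketch below hard-codes a hypothetical basis-to-object assignment purely to show the mechanics; it is not the authors' pipeline.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical mixture spectrogram (freq x time). The basis-to-object assignment below is
# hard-coded for illustration only; in the paper it is produced by the MIML network.
rng = np.random.default_rng(0)
spectrogram = rng.random((128, 200))

nmf = NMF(n_components=8, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(spectrogram)      # frequency bases   (128 x 8)
H = nmf.components_                     # activations       (8 x 200)

basis_to_object = {"guitar": [0, 1, 2, 3], "dog": [4, 5, 6, 7]}   # assumed, not learned here
separated = {}
for obj, idx in basis_to_object.items():
    estimate = W[:, idx] @ H[idx, :]              # object-level spectrogram estimate
    mask = estimate / (W @ H + 1e-8)              # soft, Wiener-style mask
    separated[obj] = mask * spectrogram
print({name: s.shape for name, s in separated.items()})
```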

LaneNet: Real-Time Lane Detection Networks for Autonomous Driving

Title LaneNet: Real-Time Lane Detection Networks for Autonomous Driving
Authors Ze Wang, Weiqiang Ren, Qiang Qiu
Abstract Lane detection is to detect lanes on the road and provide the accurate location and shape of each lane. It serves as one of the key techniques enabling modern assisted and autonomous driving systems. However, several unique properties of lanes challenge detection methods. The lack of distinctive features makes lane detection algorithms prone to being confused by other objects with similar local appearance. Moreover, the inconsistent number of lanes on a road, as well as diverse lane line patterns, e.g. solid, broken, single, double, merging, and splitting lines, further hampers performance. In this paper, we propose a deep neural network based method, named LaneNet, that breaks down lane detection into two stages: lane edge proposal and lane line localization. Stage one uses a lane edge proposal network for pixel-wise lane edge classification, and the lane line localization network in stage two then detects lane lines based on the lane edge proposals. Note that our LaneNet is built to detect lane lines only, which makes it harder to suppress false detections on similar lane marks on the road such as arrows and characters. Despite these difficulties, our method is shown to be robust in both highway and urban road scenarios without relying on any assumptions about the number of lanes or the lane line patterns. The high running speed and low computational cost endow our LaneNet with the capability of being deployed on vehicle-based systems. Experiments validate that our LaneNet consistently delivers outstanding performance on real-world traffic scenarios.
Tasks Autonomous Driving, Lane Detection
Published 2018-07-04
URL http://arxiv.org/abs/1807.01726v1
PDF http://arxiv.org/pdf/1807.01726v1.pdf
PWC https://paperswithcode.com/paper/lanenet-real-time-lane-detection-networks-for
Repo https://github.com/klintan/pytorch-lanenet
Framework pytorch
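
The two-stage decomposition can be sketched as a pipeline: a proposal network produces a pixel-wise lane-edge map, which a second network consumes to localize lane lines. The toy PyTorch code below only mirrors that data flow; both networks are single-layer stand-ins, and the "4 lanes x 3 coefficients" output shape is an assumption, not the paper's parameterization.

```python
import torch
import torch.nn as nn

# Toy two-stage pipeline mirroring the paper's decomposition; both "networks" are tiny
# placeholders here purely to show the data flow, not the actual architectures.
edge_proposal_net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(16, 1, 1), nn.Sigmoid())    # pixel-wise lane-edge probability
lane_localization_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(16, 4 * 3))             # e.g. 4 lanes x 3 curve coefficients (assumed)

image = torch.randn(1, 3, 256, 512)
edge_map = edge_proposal_net(image)            # stage 1: lane edge proposals
lane_params = lane_localization_net(edge_map)  # stage 2: lane line localization from the proposals
print(edge_map.shape, lane_params.shape)
```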

Model compression via distillation and quantization

Title Model compression via distillation and quantization
Authors Antonio Polino, Razvan Pascanu, Dan Alistarh
Abstract Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, which jointly leverage weight quantization and distillation of larger teacher networks into smaller student networks. The first method we propose is called quantized distillation and leverages distillation during the training process, by incorporating distillation loss, expressed with respect to the teacher, into the training of a student network whose weights are quantized to a limited set of levels. The second method, differentiable quantization, optimizes the location of quantization points through stochastic gradient descent, to better fit the behavior of the teacher model. We validate both methods through experiments on convolutional and recurrent architectures. We show that quantized shallow students can reach similar accuracy levels to full-precision teacher models, while providing order of magnitude compression, and inference speedup that is linear in the depth reduction. In sum, our results enable DNNs for resource-constrained environments to leverage architecture and accuracy advances developed on more powerful devices.
Tasks Model Compression, Quantization
Published 2018-02-15
URL http://arxiv.org/abs/1802.05668v1
PDF http://arxiv.org/pdf/1802.05668v1.pdf
PWC https://paperswithcode.com/paper/model-compression-via-distillation-and
Repo https://github.com/NervanaSystems/distiller
Framework pytorch
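
A rough sketch of the quantized-distillation idea: quantize the student's weights to a small set of uniform levels in the forward pass (with a straight-through estimator so gradients still reach the full-precision weights) and train against a blend of the teacher's softened outputs and the hard labels. The number of levels, temperature, and mixing weight below are illustrative, and the paper's differentiable-quantization variant (learned quantization points) is not shown.

```python
import torch
import torch.nn.functional as F

def uniform_quantize(w, num_levels=16):
    """Quantize weights to a uniform grid over [min, max]; straight-through estimator for gradients."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (num_levels - 1)
    w_q = torch.round((w - lo) / scale) * scale + lo
    return w + (w_q - w).detach()       # forward: quantized values, backward: identity

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of soft-target KL (teacher) and hard-label cross entropy, as in standard distillation."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# toy step: a linear "student" whose weights are quantized in the forward pass
w = torch.randn(10, 32, requires_grad=True)
x, labels = torch.randn(8, 32), torch.randint(0, 10, (8,))
teacher_logits = torch.randn(8, 10)                  # stand-in for a full-precision teacher
student_logits = x @ uniform_quantize(w).t()
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()                                      # gradients flow to the full-precision weights
```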

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

Title AMC: AutoML for Model Compression and Acceleration on Mobile Devices
Authors Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han
Abstract Model compression is a critical technique to efficiently deploy neural network models on mobile devices, which have limited computation resources and tight power budgets. Conventional model compression techniques rely on hand-crafted heuristics and rule-based policies that require domain experts to explore the large design space trading off among model size, speed, and accuracy, which is usually sub-optimal and time-consuming. In this paper, we propose AutoML for Model Compression (AMC), which leverages reinforcement learning to provide the model compression policy. This learning-based compression policy outperforms conventional rule-based compression policies by achieving a higher compression ratio, better preserving accuracy, and freeing human labor. Under 4x FLOPs reduction, we achieved 2.7% better accuracy than the handcrafted model compression policy for VGG-16 on ImageNet. We applied this automated, push-the-button compression pipeline to MobileNet and achieved 1.81x speedup of measured inference latency on an Android phone and 1.43x speedup on the Titan XP GPU, with only 0.1% loss of ImageNet Top-1 accuracy.
Tasks AutoML, Model Compression, Neural Architecture Search
Published 2018-02-10
URL http://arxiv.org/abs/1802.03494v4
PDF http://arxiv.org/pdf/1802.03494v4.pdf
PWC https://paperswithcode.com/paper/amc-automl-for-model-compression-and
Repo https://github.com/NervanaSystems/distiller
Framework pytorch
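
The search can be pictured as a loop in which an agent emits one compression ratio per layer, the compressed model is quickly evaluated, and a FLOPs-aware reward guides the next proposal. The toy loop below uses random search as a stand-in for the paper's DDPG agent, and the evaluation proxy, per-layer FLOPs, and exact reward form are invented for illustration.

```python
import numpy as np

# Toy search loop with the same interface AMC exposes: per-layer pruning ratios in,
# (accuracy, FLOPs)-based reward out. A random "agent" stands in for the DDPG agent.
rng = np.random.default_rng(0)
layer_flops = np.array([80e6, 150e6, 150e6, 40e6])      # assumed per-layer FLOPs of a small CNN

def evaluate(ratios):
    """Stand-in proxy: more pruning -> fewer FLOPs but higher error."""
    flops = float(np.sum(layer_flops * (1 - ratios)))
    error = 0.08 + 0.25 * float(np.mean(ratios) ** 2)
    return error, flops

best = None
for episode in range(200):
    ratios = rng.uniform(0.1, 0.8, size=len(layer_flops))   # action: per-layer pruning ratios
    error, flops = evaluate(ratios)
    reward = -error * np.log(flops)                          # FLOPs-aware reward, in the spirit of the paper
    if best is None or reward > best[0]:
        best = (reward, ratios, error, flops)
print("best ratios:", np.round(best[1], 2), "error:", round(best[2], 3), "GFLOPs:", round(best[3] / 1e9, 3))
```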

Object-Oriented Dynamics Predictor

Title Object-Oriented Dynamics Predictor
Authors Guangxiang Zhu, Zhiao Huang, Chongjie Zhang
Abstract Generalization has been one of the major challenges for learning dynamics models in model-based reinforcement learning. However, previous work on action-conditioned dynamics prediction focuses on learning the pixel-level motion and thus does not generalize well to novel environments with different object layouts. In this paper, we present a novel object-oriented framework, called object-oriented dynamics predictor (OODP), which decomposes the environment into objects and predicts the dynamics of objects conditioned on both actions and object-to-object relations. It is an end-to-end neural network and can be trained in an unsupervised manner. To enable the generalization ability of dynamics learning, we design a novel CNN-based relation mechanism that is class-specific (rather than object-specific) and exploits the locality principle. Empirical results show that OODP significantly outperforms previous methods in terms of generalization over novel environments with various object layouts. OODP is able to learn from very few environments and accurately predict dynamics in a large number of unseen environments. In addition, OODP learns semantically and visually interpretable dynamics models.
Tasks
Published 2018-05-25
URL http://arxiv.org/abs/1806.07371v3
PDF http://arxiv.org/pdf/1806.07371v3.pdf
PWC https://paperswithcode.com/paper/object-oriented-dynamics-predictor
Repo https://github.com/mig-zh/OODP
Framework tf

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

Title Deep-FSMN for Large Vocabulary Continuous Speech Recognition
Authors Shiliang Zhang, Ming Lei, Zhijie Yan, Lirong Dai
Abstract In this paper, we present an improved feedforward sequential memory network (FSMN) architecture, namely Deep-FSMN (DFSMN), by introducing skip connections between memory blocks in adjacent layers. These skip connections enable information flow across different layers and thus alleviate the gradient vanishing problem when building very deep structures. As a result, DFSMN significantly benefits from these skip connections and the deep structure. We have compared the performance of DFSMN to BLSTM both with and without lower frame rate (LFR) on several large speech recognition tasks, including English and Mandarin. Experimental results show that DFSMN can consistently outperform BLSTM with a dramatic gain, especially when trained with LFR using CD-Phone as modeling units. On the 2000-hour Fisher (FSH) task, the proposed DFSMN can achieve a word error rate of 9.4% by purely using the cross-entropy criterion and decoding with a 3-gram language model, which is a 1.5% absolute improvement compared to the BLSTM. On a 20000-hour Mandarin recognition task, the LFR-trained DFSMN can achieve more than 20% relative improvement compared to the LFR-trained BLSTM. Moreover, we can easily design the lookahead filter order of the memory blocks in DFSMN to control the latency for real-time applications.
Tasks Language Modelling, Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-03-04
URL http://arxiv.org/abs/1803.05030v1
PDF http://arxiv.org/pdf/1803.05030v1.pdf
PWC https://paperswithcode.com/paper/deep-fsmn-for-large-vocabulary-continuous
Repo https://github.com/yangxueruivs/DFSMN
Framework tf
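
A memory block in an FSMN augments each frame with a learnable weighted sum of past (and optionally future) frames, and DFSMN's contribution is the skip connection that adds the previous block's memory into the next. The NumPy sketch below is a heavily simplified, unlearned version of that computation; filter orders, strides, and the random "weights" are placeholders, and in the real model each block operates on a projection of the previous layer rather than the raw features reused here.

```python
import numpy as np

def dfsmn_memory_block(hidden, prev_memory, look_back=4, look_ahead=1, stride=1, rng=None):
    """Simplified DFSMN memory block: each frame's memory is the current frame plus weighted
    past/future frames, plus a skip connection from the previous block's memory.
    hidden and prev_memory have shape (T, dim); the filter taps here are random placeholders."""
    rng = rng or np.random.default_rng(0)
    T, dim = hidden.shape
    a = rng.standard_normal((look_back, dim)) * 0.1    # look-back filter taps
    c = rng.standard_normal((look_ahead, dim)) * 0.1   # look-ahead filter taps (controls latency)
    memory = hidden.copy()
    for t in range(T):
        for i in range(1, look_back + 1):
            if t - i * stride >= 0:
                memory[t] += a[i - 1] * hidden[t - i * stride]
        for j in range(1, look_ahead + 1):
            if t + j * stride < T:
                memory[t] += c[j - 1] * hidden[t + j * stride]
    return memory + prev_memory                        # skip connection between adjacent memory blocks

frames = np.random.default_rng(1).standard_normal((50, 64))     # 50 frames of 64-dim features
block1 = dfsmn_memory_block(frames, prev_memory=np.zeros_like(frames))
block2 = dfsmn_memory_block(frames, prev_memory=block1)          # the deeper block reuses block1 via the skip path
print(block2.shape)
```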

Malthusian Reinforcement Learning

Title Malthusian Reinforcement Learning
Authors Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel
Abstract Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation. In Malthusian RL, increases in a subpopulation’s average return drive subsequent increases in its size, just as Thomas Malthus argued in 1798 was the relationship between preindustrial income levels and population growth. Malthusian reinforcement learning harnesses the competitive pressures arising from growing and shrinking population size to drive agents to explore regions of state and policy spaces that they could not otherwise reach. Furthermore, in environments where there are potential gains from specialization and division of labor, we show that Malthusian reinforcement learning is better positioned to take advantage of such synergies than algorithms based on self-play.
Tasks Multi-agent Reinforcement Learning
Published 2018-12-17
URL http://arxiv.org/abs/1812.07019v2
PDF http://arxiv.org/pdf/1812.07019v2.pdf
PWC https://paperswithcode.com/paper/malthusian-reinforcement-learning
Repo https://github.com/AbhijeetPendyala/Knowledge_base_ML
Framework tf
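
The Malthusian coupling itself is simple to illustrate: after each generation, a subpopulation's size is scaled by how its average return compares to the others. The toy update below invents the exact growth rule and the return statistics purely to show the feedback loop between return and population size.

```python
import numpy as np

# Minimal sketch of the Malthusian coupling: each subpopulation's size grows or shrinks with
# its recent average return, so successful niches attract more agents. Numbers and the exact
# update rule are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
sizes = np.array([10.0, 10.0, 10.0])                             # three subpopulations (species)

for generation in range(20):
    avg_returns = rng.normal(loc=[1.0, 1.5, 0.5], scale=0.2)    # stand-in for per-species average episode return
    growth = avg_returns / avg_returns.mean()                    # fitness-linked growth factor
    sizes = np.clip(sizes * growth, 1.0, 100.0)                  # population sizes track returns

print(np.round(sizes, 1))   # the higher-return species ends up with the largest population
```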

Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

Title Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points
Authors Fabien Baradel, Christian Wolf, Julien Mille, Graham W. Taylor
Abstract We propose a method for human activity recognition from RGB data that does not rely on any pose information during test time and does not explicitly calculate pose information internally. Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities. No spatial coherence is forced on the glimpse locations, which gives the module liberty to explore different points at each frame and better optimize the process of scrutinizing visual information. Tracking and sequentially integrating this kind of unstructured data is a challenge, which we address by separating the set of glimpses from a set of recurrent tracking/recognition workers. These workers receive glimpses, jointly performing subsequent motion tracking and activity prediction. The glimpses are soft-assigned to the workers, optimizing coherence of the assignments in space, time and feature space using an external memory module. No hard decisions are taken, i.e. each glimpse point is assigned to all existing workers, albeit with different importance. Our methods outperform state-of-the-art methods on the largest human activity recognition dataset available to date, the NTU RGB+D Dataset, and on a smaller human action recognition dataset, the Northwestern-UCLA Multiview Action 3D Dataset. Our code is publicly available at https://github.com/fabienbaradel/glimpse_clouds.
Tasks Action Recognition In Videos, Activity Prediction, Activity Recognition, Human Activity Recognition, Skeleton Based Action Recognition, Temporal Action Localization
Published 2018-02-22
URL http://arxiv.org/abs/1802.07898v4
PDF http://arxiv.org/pdf/1802.07898v4.pdf
PWC https://paperswithcode.com/paper/glimpse-clouds-human-activity-recognition
Repo https://github.com/fabienbaradel/glimpse_clouds
Framework pytorch
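
The soft-assignment step can be sketched as an attention distribution: every glimpse receives a softmax weight over all workers, and each worker integrates its weighted sum of glimpse features, so no hard assignment is ever made. The NumPy sketch below shows only the shapes and the weighting; the paper additionally uses an external memory module and learns the similarity, both of which are omitted here.

```python
import numpy as np

def soft_assign(glimpses, worker_states, temperature=1.0):
    """Soft-assign each glimpse to every worker with softmax similarity weights (no hard
    decisions); each worker then integrates its weighted sum of glimpse features.
    A simplified, shapes-only sketch of the paper's assignment mechanism."""
    sim = glimpses @ worker_states.T / temperature               # (num_glimpses, num_workers)
    sim -= sim.max(axis=1, keepdims=True)                        # numerical stability
    weights = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    worker_inputs = weights.T @ glimpses                         # (num_workers, feat_dim)
    return worker_inputs, weights

rng = np.random.default_rng(0)
glimpses = rng.standard_normal((3, 128))       # 3 glimpse features extracted from one frame
workers = rng.standard_normal((4, 128))        # states of 4 tracking/recognition workers
inputs, weights = soft_assign(glimpses, workers)
print(inputs.shape, weights.sum(axis=1))       # each glimpse's weights over the workers sum to 1
```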

Deep reinforcement learning for time series: playing idealized trading games

Title Deep reinforcement learning for time series: playing idealized trading games
Authors Xiang Gao
Abstract Deep Q-learning is investigated as an end-to-end solution to estimate the optimal strategies for acting on time series input. Experiments are conducted on two idealized trading games. 1) Univariate: the only input is a wave-like price time series, and 2) Bivariate: the input includes a random stepwise price time series and a noisy signal time series, which is positively correlated with future price changes. The Univariate game tests whether the agent can capture the underlying dynamics, and the Bivariate game tests whether the agent can utilize the hidden relation among the inputs. Stacked Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM) units, Convolutional Neural Network (CNN), and multi-layer perceptron (MLP) are used to model Q values. For both games, all agents successfully find a profitable strategy. The GRU-based agents show the best overall performance in the Univariate game, while the MLP-based agents outperform others in the Bivariate game.
Tasks Q-Learning, Time Series
Published 2018-03-11
URL http://arxiv.org/abs/1803.03916v1
PDF http://arxiv.org/pdf/1803.03916v1.pdf
PWC https://paperswithcode.com/paper/deep-reinforcement-learning-for-time-series
Repo https://github.com/golsun/deep-RL-time-series
Framework none
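
In the spirit of the Univariate game, here is a compact deep Q-learning loop on a synthetic sine-wave price with two actions (stay out or hold long) and reward equal to the position times the next price change. The MLP Q-network, window length, and one-step update without a replay buffer are simplifications chosen for brevity, not the paper's exact training setup.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
prices = np.sin(np.linspace(0, 40 * np.pi, 4000))   # wave-like price series, as in the Univariate game
WINDOW, GAMMA, EPS = 8, 0.95, 0.1

q_net = nn.Sequential(nn.Linear(WINDOW, 32), nn.ReLU(), nn.Linear(32, 2))   # Q-values for {flat, long}
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for t in range(WINDOW, len(prices) - 1):
    state = torch.tensor(prices[t - WINDOW:t], dtype=torch.float32)
    q_values = q_net(state)
    action = int(torch.randint(0, 2, (1,))) if np.random.rand() < EPS else int(q_values.argmax())
    reward = float(action * (prices[t + 1] - prices[t]))          # profit of holding one unit
    next_state = torch.tensor(prices[t - WINDOW + 1:t + 1], dtype=torch.float32)
    with torch.no_grad():
        target = reward + GAMMA * q_net(next_state).max()
    loss = (q_values[action] - target) ** 2                       # one-step TD error (no replay, for brevity)
    opt.zero_grad(); loss.backward(); opt.step()

print("greedy action on the last window:", int(q_net(torch.tensor(prices[-WINDOW:], dtype=torch.float32)).argmax()))
```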

DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras

Title DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras
Authors Ruichao Xiao, Wenxiu Sun, Jiahao Pang, Qiong Yan, Jimmy Ren
Abstract With the developments of dual-lens camera modules, depth information representing the third dimension of the captured scenes becomes available for smartphones. It is estimated by stereo matching algorithms, taking as input the two views captured by dual-lens cameras at slightly different viewpoints. Depth-of-field rendering (also referred to as synthetic defocus or bokeh) is one of the trending depth-based applications. However, to achieve fast depth estimation on smartphones, the stereo pairs need to be rectified in the first place. In this paper, we propose a cost-effective solution to perform stereo rectification for dual-lens cameras called direct self-rectification, short for DSR. It removes the need for individual offline calibration for every pair of dual-lens cameras. In addition, the proposed solution is robust to the slight movements, e.g., due to collisions, of the dual-lens cameras after fabrication. Different from existing self-rectification approaches, our approach computes the homography in a novel way with zero geometric distortions introduced to the master image. It is achieved by directly minimizing the vertical displacements of corresponding points between the original master image and the transformed slave image. Our method is evaluated on both realistic and synthetic stereo image pairs, and produces superior results compared to the calibrated rectification or other self-rectification approaches.
Tasks Calibration, Stereo Matching, Stereo Matching Hand
Published 2018-09-26
URL http://arxiv.org/abs/1809.09763v1
PDF http://arxiv.org/pdf/1809.09763v1.pdf
PWC https://paperswithcode.com/paper/dsr-direct-self-rectification-for
Repo https://github.com/garroud/self-rectification
Framework none
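
The objective can be illustrated with a much-simplified version of the rectification step: keep the master image fixed and fit a transform of the slave image that minimizes the vertical displacement of matched points. Below, the slave transform is restricted to an affine row map solved in closed form by least squares on synthetic matches; the paper instead estimates a constrained homography from real correspondences.

```python
import numpy as np

# Simplified version of the DSR objective: leave the master image untouched and find a
# transform of the slave image that minimizes vertical displacements between matched points.
# The slave transform is restricted here to y' = a*x + b*y + c; the misalignment is synthetic.
rng = np.random.default_rng(0)
master_pts = rng.uniform(0, 1000, size=(200, 2))
true_shift = np.array([12.0, -7.5])                        # assumed misalignment of the slave camera
slave_pts = master_pts + true_shift + rng.normal(0, 0.5, size=(200, 2))

A = np.column_stack([slave_pts[:, 0], slave_pts[:, 1], np.ones(len(slave_pts))])
coeffs, *_ = np.linalg.lstsq(A, master_pts[:, 1], rcond=None)   # fit y' = a*x + b*y + c to the master rows

residual = A @ coeffs - master_pts[:, 1]
print("mean |vertical displacement| after rectification: %.3f px" % np.abs(residual).mean())
```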

End-to-End Learning of Communications Systems Without a Channel Model

Title End-to-End Learning of Communications Systems Without a Channel Model
Authors Fayçal Ait Aoudia, Jakob Hoydis
Abstract The idea of end-to-end learning of communications systems through neural network-based autoencoders has the shortcoming that it requires a differentiable channel model. We present in this paper a novel learning algorithm which alleviates this problem. The algorithm iterates between supervised training of the receiver and reinforcement learning-based training of the transmitter. We demonstrate that this approach works as well as fully supervised methods on additive white Gaussian noise (AWGN) and Rayleigh block-fading (RBF) channels. Surprisingly, while our method converges slower on AWGN channels than supervised training, it converges faster on RBF channels. Our results are a first step towards learning of communications systems over any type of channel without prior assumptions.
Tasks
Published 2018-04-06
URL http://arxiv.org/abs/1804.02276v3
PDF http://arxiv.org/pdf/1804.02276v3.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-communications-systems
Repo https://github.com/Aithu-Snehith/End-to-End-Learning-of-Communications-Systems-Without-a-Channel-Model
Framework tf
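
The alternating scheme can be sketched as follows: the receiver is trained with ordinary supervised learning on channel outputs, while the transmitter is trained from scalar loss feedback using Gaussian perturbations of its symbols and a REINFORCE-style gradient, so no channel gradient is ever required. The toy PyTorch loop below invents the alphabet size, network shapes, and noise levels, and omits a reward baseline and power normalization; it illustrates the training interface rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
M, NOISE_STD, PERTURB_STD = 4, 0.3, 0.15                 # message alphabet size and noise levels (assumed)

transmitter = nn.Sequential(nn.Linear(M, 16), nn.ReLU(), nn.Linear(16, 2))   # message -> 2D symbol
receiver = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, M))      # noisy symbol -> message logits
opt_tx = torch.optim.Adam(transmitter.parameters(), lr=1e-2)
opt_rx = torch.optim.Adam(receiver.parameters(), lr=1e-2)

for step in range(2000):
    msgs = torch.randint(0, M, (64,))
    onehot = F.one_hot(msgs, M).float()

    # (1) supervised receiver training: gradients flow only through the receiver
    with torch.no_grad():
        received = transmitter(onehot) + NOISE_STD * torch.randn(64, 2)       # AWGN channel
    rx_loss = F.cross_entropy(receiver(received), msgs)
    opt_rx.zero_grad(); rx_loss.backward(); opt_rx.step()

    # (2) RL-based transmitter training: perturb the symbols, treat the per-example receiver
    # loss as a cost, and use the Gaussian policy's score function instead of a channel gradient
    symbols = transmitter(onehot)
    sampled = symbols.detach() + PERTURB_STD * torch.randn_like(symbols)      # exploratory symbols
    with torch.no_grad():
        received = sampled + NOISE_STD * torch.randn_like(sampled)
        cost = F.cross_entropy(receiver(received), msgs, reduction="none")    # only scalar feedback is used
    log_prob = -((sampled - symbols) ** 2).sum(dim=1) / (2 * PERTURB_STD ** 2)
    tx_loss = (cost * log_prob).mean()                                        # REINFORCE surrogate (no baseline)
    opt_tx.zero_grad(); tx_loss.backward(); opt_tx.step()

print("receiver loss after training: %.3f" % rx_loss.item())
```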