February 1, 2020

Paper Group AWR 326

Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning. Rank-consistent Ordinal Regression for Neural Networks. Learning Spatio-Temporal Representation with Local and Global Diffusion. Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks. An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric …

Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning

Title Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning
Authors Tianchi Huang, Chao Zhou, Rui-Xiao Zhang, Chenglei Wu, Xin Yao, Lifeng Sun
Abstract Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn strong strategies without any presumptions, have become one of the research hotspots in adaptive streaming. However, they typically suffer from several issues, i.e., low sample efficiency and a lack of awareness of video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that greatly improves learning-based methods by tackling the above issues. Comyco trains its policy by imitating expert trajectories given by an instant solver, which not only avoids redundant exploration but also makes better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video quality rather than higher video bitrate. To achieve this, we construct Comyco’s neural network architecture, video datasets and QoE metrics around video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements in Comyco’s sample efficiency over prior work, requiring 1700x fewer samples and 16x less training time. Moreover, results show that Comyco outperforms previously proposed methods, with improvements in average QoE of 7.5% - 16.79%. In particular, Comyco surpasses the state-of-the-art approach Pensieve by 7.37% in average video quality under the same rebuffering time.
Tasks Imitation Learning
Published 2019-08-06
URL https://arxiv.org/abs/1908.02270v2
PDF https://arxiv.org/pdf/1908.02270v2.pdf
PWC https://paperswithcode.com/paper/comyco-quality-aware-adaptive-video-streaming
Repo https://github.com/thu-media/Comyco
Framework tf
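
A minimal sketch of the imitation-learning loop the abstract describes, independent of the linked TensorFlow repo and not Comyco’s actual architecture: the policy is trained with a cross-entropy loss against the action an “instant solver” would choose. The state layout, network sizes and the expert_action() placeholder are assumptions for illustration.

```python
# Hedged sketch of quality-aware ABR via imitation learning (not Comyco's code).
import torch
import torch.nn as nn

N_LEVELS = 6          # candidate bitrate/quality levels per chunk (assumed)
STATE_DIM = 20        # flattened state features, e.g. throughput/buffer/quality (assumed)

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, N_LEVELS),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()

def expert_action(state):
    """Placeholder for the 'instant solver' that returns the QoE-optimal level."""
    return torch.randint(0, N_LEVELS, (state.shape[0],))

for step in range(100):
    state = torch.randn(32, STATE_DIM)      # batch of observed streaming states
    target = expert_action(state)           # expert (solver) decisions
    loss = ce(policy(state), target)        # imitate the expert directly
    opt.zero_grad(); loss.backward(); opt.step()
```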

Rank-consistent Ordinal Regression for Neural Networks

Title Rank-consistent Ordinal Regression for Neural Networks
Authors Wenzhi Cao, Vahid Mirjalili, Sebastian Raschka
Abstract Extraordinary progress has been made towards developing neural network architectures for classification tasks. However, commonly used loss functions such as the multi-category cross entropy loss are inadequate for ranking and ordinal regression problems. Hence, approaches that utilize neural networks for ordinal regression tasks transform ordinal target variables into a series of binary classification tasks but suffer from inconsistencies among the different binary classifiers. Thus, we propose a new framework (Consistent Rank Logits, CORAL) with theoretical guarantees for rank-monotonicity and consistent confidence scores. Through parameter sharing, our framework also benefits from lower training complexity and can easily be implemented to extend conventional convolutional neural network classifiers for ordinal regression tasks. Furthermore, the empirical evaluation of our method on a range of face image datasets for age prediction shows a substantial improvement compared to the current state-of-the-art ordinal regression method.
Tasks Age And Gender Classification, Age Estimation, Gender Prediction
Published 2019-01-20
URL https://arxiv.org/abs/1901.07884v4
PDF https://arxiv.org/pdf/1901.07884v4.pdf
PWC https://paperswithcode.com/paper/consistent-rank-logits-for-ordinal-regression
Repo https://github.com/mshehrozsajjad/Age-Classification
Framework pytorch
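
The rank-consistency construction is concrete enough to sketch: a single shared weight vector feeds K-1 binary classifiers that differ only in their bias terms, which keeps the predicted binary probabilities monotonically ordered. The sketch below follows that construction but omits CORAL’s optional task-importance weights; the feature dimension and number of ranks are arbitrary.

```python
# Minimal CORAL-style output layer, loss and rank decoding (simplified sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 5  # example: 5 ordinal ranks

class CoralHead(nn.Module):
    """CORAL output layer: one shared weight vector, K-1 independent biases."""
    def __init__(self, in_features, num_classes):
        super().__init__()
        self.fc = nn.Linear(in_features, 1, bias=False)   # shared weights
        self.biases = nn.Parameter(torch.zeros(num_classes - 1))

    def forward(self, x):
        return self.fc(x) + self.biases                   # (batch, K-1) binary logits

def levels_from_labels(y, num_classes):
    """Extend a rank label r into K-1 binary targets [1,...,1,0,...,0]."""
    return (torch.arange(num_classes - 1) < y.unsqueeze(1)).float()

def coral_loss(logits, levels):
    return F.binary_cross_entropy_with_logits(logits, levels)

def predict_rank(logits):
    return (torch.sigmoid(logits) > 0.5).sum(dim=1)       # rank in {0, ..., K-1}

# toy usage on random 128-d features
head = CoralHead(128, NUM_CLASSES)
x, y = torch.randn(4, 128), torch.tensor([0, 2, 3, 4])
loss = coral_loss(head(x), levels_from_labels(y, NUM_CLASSES))
ranks = predict_rank(head(x))
```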

Learning Spatio-Temporal Representation with Local and Global Diffusion

Title Learning Spatio-Temporal Representation with Local and Global Diffusion
Authors Zhaofan Qiu, Ting Yao, Chong-Wah Ngo, Xinmei Tian, Tao Mei
Abstract Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even worse for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning, called Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. The diffusions effectively let the two aspects of information, localized and holistic, interact, yielding a more powerful representation. Furthermore, a kernelized classifier is introduced to combine the representations from the two aspects for video recognition. Our LGD networks achieve clear improvements on the large-scale Kinetics-400 and Kinetics-600 video classification datasets over the best competitors, by 3.5% and 0.7% respectively. We further examine the generalization of both the global and local representations produced by our pre-trained LGD networks on four different benchmarks for video action recognition and spatio-temporal action detection. Superior performance over several state-of-the-art techniques on these benchmarks is reported. Code is available at: https://github.com/ZhaofanQiu/local-and-global-diffusion-networks.
Tasks Action Detection, Representation Learning, Temporal Action Localization, Video Classification, Video Recognition
Published 2019-06-13
URL https://arxiv.org/abs/1906.05571v1
PDF https://arxiv.org/pdf/1906.05571v1.pdf
PWC https://paperswithcode.com/paper/learning-spatio-temporal-representation-with-3
Repo https://github.com/ZhaofanQiu/local-and-global-diffusion-networks
Framework none
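
An illustrative local/global diffusion block under stated assumptions, not the paper’s exact equations: a 3D-convolutional local path and a global vector update each other in parallel, with the global vector broadcast into the local feature map and the pooled local map folded back into the global vector.

```python
# Hedged sketch of a local/global diffusion block (illustrative, not LGD's exact form).
import torch
import torch.nn as nn

class LGDBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.local_conv = nn.Conv3d(channels, channels, 3, padding=1)
        self.g2l = nn.Linear(channels, channels)   # diffuse global into local
        self.l2g = nn.Linear(channels, channels)   # diffuse pooled local into global
        self.relu = nn.ReLU()

    def forward(self, local, global_vec):
        # local: (B, C, T, H, W); global_vec: (B, C)
        broadcast = self.g2l(global_vec)[:, :, None, None, None]
        new_local = self.relu(self.local_conv(local) + broadcast)
        pooled = new_local.mean(dim=(2, 3, 4))             # global average pool
        new_global = self.relu(global_vec + self.l2g(pooled))
        return new_local, new_global

block = LGDBlock(64)
loc, glb = torch.randn(2, 64, 8, 14, 14), torch.randn(2, 64)
loc, glb = block(loc, glb)
```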

Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks

Title Robust Lane Detection from Continuous Driving Scenes Using Deep Neural Networks
Authors Qin Zou, Hanwen Jiang, Qiyu Dai, Yuanhao Yue, Long Chen, Qian Wang
Abstract Lane detection in driving scenes is an important module for autonomous vehicles and advanced driver assistance systems. In recent years, many sophisticated lane detection methods have been proposed. However, most methods focus on detecting the lane from a single image, and often perform unsatisfactorily in extremely bad situations such as heavy shadow, severe mark degradation, and serious vehicle occlusion. In fact, lanes are continuous line structures on the road. Consequently, a lane that cannot be accurately detected in the current frame may be inferred by incorporating information from previous frames. To this end, we investigate lane detection using multiple frames of a continuous driving scene, and propose a hybrid deep architecture that combines a convolutional neural network (CNN) with a recurrent neural network (RNN). Specifically, information from each frame is abstracted by a CNN block, and the CNN features of multiple continuous frames, which naturally form a time series, are then fed into the RNN block for feature learning and lane prediction. Extensive experiments on two large-scale datasets demonstrate that the proposed method outperforms competing methods in lane detection, especially in handling difficult situations.
Tasks Autonomous Vehicles, Lane Detection, Time Series
Published 2019-03-06
URL http://arxiv.org/abs/1903.02193v1
PDF http://arxiv.org/pdf/1903.02193v1.pdf
PWC https://paperswithcode.com/paper/robust-lane-detection-from-continuous-driving
Repo https://github.com/NickLucche/lane-detection
Framework pytorch
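
A simplified sketch of the CNN-plus-RNN idea, assuming a small per-frame encoder, an LSTM over the frame features, and a coarse mask decoder; the paper itself uses a full encoder-decoder segmentation network with a ConvLSTM in between, which is not reproduced here.

```python
# Hedged sketch: per-frame CNN features fused over time by an LSTM (simplified).
import torch
import torch.nn as nn

class CNNLSTMLaneNet(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 16)),
        )
        self.lstm = nn.LSTM(32 * 8 * 16, hidden, batch_first=True)
        self.decoder = nn.Sequential(
            nn.Linear(hidden, 8 * 16), nn.Sigmoid(),   # coarse lane probability map
        )

    def forward(self, frames):                  # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1)    # (B*T, 32*8*16)
        seq_out, _ = self.lstm(feats.view(b, t, -1))
        return self.decoder(seq_out[:, -1]).view(b, 1, 8, 16)    # mask for last frame

net = CNNLSTMLaneNet()
mask = net(torch.randn(2, 5, 3, 128, 256))      # 5-frame clip -> (2, 1, 8, 16)
```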

An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval

Title An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval
Authors Adam D. Cobb, Michael D. Himes, Frank Soboczenski, Simone Zorzan, Molly D. O’Beirne, Atılım Güneş Baydin, Yarin Gal, Shawn D. Domagal-Goldman, Giada N. Arney, Daniel Angerhausen
Abstract Machine learning is now used in many areas of astrophysics, from detecting exoplanets in Kepler transit signals to removing telescope systematics. Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, plan-net, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra. We demonstrate that an ensemble provides greater accuracy and more robust uncertainties than a single model. In addition to being the first to use Bayesian neural networks for atmospheric retrieval, we also introduce a new loss function for Bayesian neural networks that learns correlations between the model outputs. Importantly, we show that designing machine learning models to explicitly incorporate domain-specific knowledge both improves performance and provides additional insight by inferring the covariance of the retrieved atmospheric parameters. We apply plan-net to the Hubble Space Telescope Wide Field Camera 3 transmission spectrum for WASP-12b and retrieve an isothermal temperature and water abundance consistent with the literature. We highlight that our method is flexible and can be expanded to higher-resolution spectra and a larger number of atmospheric parameters.
Tasks
Published 2019-05-25
URL https://arxiv.org/abs/1905.10659v1
PDF https://arxiv.org/pdf/1905.10659v1.pdf
PWC https://paperswithcode.com/paper/an-ensemble-of-bayesian-neural-networks-for
Repo https://github.com/exoml/plan-net
Framework tf
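
A hedged sketch of the ensemble idea using MC-dropout networks as stand-ins for the Bayesian members; the input/output dimensions are placeholders, and plan-net’s correlation-aware loss is not reproduced here.

```python
# Hedged sketch: an ensemble of MC-dropout networks pooled for mean and spread.
import torch
import torch.nn as nn

def make_bnn(in_dim, out_dim):
    """A small MC-dropout network standing in for one Bayesian ensemble member."""
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.1),
        nn.Linear(64, out_dim),
    )

ensemble = [make_bnn(13, 5) for _ in range(5)]   # e.g. 13 spectral bins -> 5 parameters

def ensemble_predict(x, n_samples=50):
    """Draw MC-dropout samples from every member and pool them."""
    draws = []
    for net in ensemble:
        net.train()                              # keep dropout active at test time
        with torch.no_grad():
            draws += [net(x) for _ in range(n_samples)]
    draws = torch.stack(draws)                   # (members * n_samples, B, 5)
    return draws.mean(0), draws.std(0)           # predictive mean and spread

mean, std = ensemble_predict(torch.randn(8, 13))
```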

Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks

Title Stein Variational Online Changepoint Detection with Applications to Hawkes Processes and Neural Networks
Authors Gianluca Detommaso, Hanne Hoitzing, Tiangang Cui, Ardavan Alamir
Abstract Bayesian online changepoint detection (BOCPD) (Adams & MacKay, 2007) offers a rigorous and viable way to identify changepoints in complex systems. In this work, we introduce a Stein variational online changepoint detection (SVOCD) method to provide a computationally tractable generalization of BOCPD beyond the exponential family of probability distributions. We integrate the recently developed Stein variational Newton (SVN) method (Detommaso et al., 2018) and BOCPD to offer a full online Bayesian treatment for a large number of situations with significant importance in practice. We apply the resulting method to two challenging and novel applications: Hawkes processes and long short-term memory (LSTM) neural networks. In both cases, we successfully demonstrate the efficacy of our method on real data.
Tasks
Published 2019-01-23
URL https://arxiv.org/abs/1901.07987v2
PDF https://arxiv.org/pdf/1901.07987v2.pdf
PWC https://paperswithcode.com/paper/stein-variational-online-changepoint
Repo https://github.com/gianlucadetommaso/Stein-variational-samplers
Framework none
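
For reference, a minimal implementation of the underlying BOCPD run-length recursion of Adams & MacKay (2007) with a conjugate Gaussian predictive model and a constant hazard rate; the paper’s contribution, the Stein variational generalization beyond conjugate models, is not shown here.

```python
# Classical BOCPD recursion (Adams & MacKay, 2007); Gaussian with known variance.
import numpy as np
from scipy.stats import norm

def bocpd(data, hazard=1 / 100, mu0=0.0, var0=1.0, var_x=1.0):
    T = len(data)
    R = np.zeros((T + 1, T + 1))        # R[t, r] = P(run length r after t points)
    R[0, 0] = 1.0
    mu, var = np.array([mu0]), np.array([var0])
    for t, x in enumerate(data, start=1):
        pred = norm.pdf(x, mu, np.sqrt(var + var_x))     # predictive per run length
        growth = R[t - 1, :t] * pred * (1 - hazard)       # run length grows by one
        cp = (R[t - 1, :t] * pred * hazard).sum()         # changepoint: reset to 0
        R[t, 1:t + 1], R[t, 0] = growth, cp
        R[t] /= R[t].sum()
        # conjugate posterior update, shifted by one run length
        new_var = 1.0 / (1.0 / var + 1.0 / var_x)
        new_mu = new_var * (mu / var + x / var_x)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R

R = bocpd(np.concatenate([np.random.randn(50), 5 + np.random.randn(50)]))
```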

Learning with Fenchel-Young Losses

Title Learning with Fenchel-Young Losses
Authors Mathieu Blondel, André F. T. Martins, Vlad Niculae
Abstract Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their strengths. In this paper, we introduce Fenchel-Young losses, a generic way to construct a convex loss function for a regularized prediction function. We provide an in-depth study of their properties in a very broad setting, covering all the aforementioned supervised learning tasks, and revealing new connections between sparsity, generalized entropies, and separation margins. We show that Fenchel-Young losses unify many well-known loss functions and make it easy to create useful new ones. Finally, we derive efficient predictive and training algorithms, making Fenchel-Young losses appealing both in theory and practice.
Tasks Structured Prediction
Published 2019-01-08
URL https://arxiv.org/abs/1901.02324v2
PDF https://arxiv.org/pdf/1901.02324v2.pdf
PWC https://paperswithcode.com/paper/learning-with-fenchel-young-losses
Repo https://github.com/mblondel/projection-losses
Framework none
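
The construction is compact enough to state directly: the Fenchel-Young loss generated by a regularizer Ω is L_Ω(θ; y) = Ω*(θ) + Ω(y) − ⟨θ, y⟩. The sketch below instantiates Ω as the negative Shannon entropy on the simplex, whose conjugate is log-sum-exp, and checks that for a one-hot target the loss reduces to the familiar softmax cross-entropy.

```python
# Fenchel-Young loss with the negative-entropy regularizer (sanity-check sketch).
import numpy as np
from scipy.special import logsumexp

def neg_entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(np.sum(nz * np.log(nz)))

def fenchel_young_loss(theta, y):
    """L_Omega(theta; y) = Omega*(theta) + Omega(y) - <theta, y>."""
    return logsumexp(theta) + neg_entropy(y) - np.dot(theta, y)

theta = np.array([2.0, 0.5, -1.0])
y = np.array([1.0, 0.0, 0.0])                          # one-hot target
ce = -np.log(np.exp(theta[0]) / np.exp(theta).sum())   # softmax cross-entropy
assert np.isclose(fenchel_young_loss(theta, y), ce)
```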

Combining Experience Replay with Exploration by Random Network Distillation

Title Combining Experience Replay with Exploration by Random Network Distillation
Authors Francesco Sovrano
Abstract Our work is a simple extension of the paper “Exploration by Random Network Distillation”. In more detail, we show how to efficiently combine intrinsic rewards with experience replay in order to achieve more efficient and robust exploration (with respect to PPO/RND) and, consequently, better results in terms of agent performance and sample efficiency. We do so with a new technique named Prioritized Oversampled Experience Replay (POER), built upon a definition of which experience is important and useful to replay. Finally, we evaluate our technique on the famous Atari game Montezuma’s Revenge and some other hard-exploration Atari games.
Tasks Atari Games, Montezuma’s Revenge
Published 2019-05-18
URL https://arxiv.org/abs/1905.07579v1
PDF https://arxiv.org/pdf/1905.07579v1.pdf
PWC https://paperswithcode.com/paper/combining-experience-replay-with-exploration
Repo https://github.com/Francesco-Sovrano/Combining--experience-replay--with--exploration-by-random-network-distillation-
Framework tf
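
A hedged sketch of how an RND intrinsic reward can drive oversampling from a replay buffer: the bonus is the prediction error against a fixed random network, and more novel transitions are sampled more often. The novelty-based priority used here is purely illustrative, since POER’s own definition of “important experience” is the paper’s contribution.

```python
# RND bonus plus novelty-weighted replay sampling (illustrative, not POER's rule).
import torch
import torch.nn as nn

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

obs_dim, emb_dim = 8, 16
target = mlp(obs_dim, emb_dim)          # fixed, randomly initialized network
predictor = mlp(obs_dim, emb_dim)       # trained to imitate the target
for p in target.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs):
    """RND novelty bonus: prediction error against the fixed random network."""
    with torch.no_grad():
        return (predictor(obs) - target(obs)).pow(2).mean(dim=1)

# Oversample transitions whose intrinsic reward (novelty) is high.
obs_batch = torch.randn(256, obs_dim)
priority = intrinsic_reward(obs_batch) + 1e-6
idx = torch.multinomial(priority / priority.sum(), 64, replacement=True)
replay_batch = obs_batch[idx]

# Train the predictor on the replayed observations (shrinks future bonuses).
loss = (predictor(replay_batch) - target(replay_batch)).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```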

Visualizing the PHATE of Neural Networks

Title Visualizing the PHATE of Neural Networks
Authors Scott Gigante, Adam S. Charles, Smita Krishnaswamy, Gal Mishne
Abstract Understanding why and how certain neural networks outperform others is key to guiding future development of network architectures and optimization methods. To this end, we introduce a novel visualization algorithm that reveals the internal geometry of such networks: Multislice PHATE (M-PHATE), the first method designed explicitly to visualize how a neural network’s hidden representations of data evolve throughout the course of training. We demonstrate that our visualization provides intuitive, detailed summaries of the learning dynamics beyond simple global measures (i.e., validation loss and accuracy), without the need to access validation data. Furthermore, M-PHATE better captures both the dynamics and community structure of the hidden units as compared to visualization based on standard dimensionality reduction methods (e.g., ISOMAP, t-SNE). We demonstrate M-PHATE with two vignettes: continual learning and generalization. In the former, the M-PHATE visualizations display the mechanism of “catastrophic forgetting” which is a major challenge for learning in task-switching contexts. In the latter, our visualizations reveal how increased heterogeneity among hidden units correlates with improved generalization performance. An implementation of M-PHATE, along with scripts to reproduce the figures in this paper, is available at https://github.com/scottgigante/M-PHATE.
Tasks Continual Learning, Dimensionality Reduction
Published 2019-08-07
URL https://arxiv.org/abs/1908.02831v1
PDF https://arxiv.org/pdf/1908.02831v1.pdf
PWC https://paperswithcode.com/paper/visualizing-the-phate-of-neural-networks
Repo https://github.com/syyunn/awesome-nn-visualization
Framework none
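
A sketch of the data M-PHATE consumes: hidden-unit activations on a fixed probe set, recorded once per epoch and stacked into a (time, units, samples) tensor. The commented-out m_phate call assumes the API shown in the linked repository and is not verified here.

```python
# Collecting per-epoch hidden-unit activations for an M-PHATE-style visualization.
import numpy as np
import torch
import torch.nn as nn

probe = torch.randn(100, 20)                        # fixed probe inputs (assumed shape)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
snapshots = []

for epoch in range(5):
    # ... one epoch of training would go here ...
    with torch.no_grad():
        hidden = model[1](model[0](probe))           # (samples, units) activations
    snapshots.append(hidden.T.numpy())               # -> (units, samples)

trace = np.stack(snapshots)                          # (epochs, units, samples)

# import m_phate                                              # assumed API, see repo
# embedding = m_phate.M_PHATE().fit_transform(trace)          # 2-D point per (epoch, unit)
```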

High Speed and High Dynamic Range Video with an Event Camera

Title High Speed and High Dynamic Range Video with an Event Camera
Authors Henri Rebecq, René Ranftl, Vladlen Koltun, Davide Scaramuzza
Abstract Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous “events” instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality (> 20%), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos (> 5,000 frames per second) of high-speed phenomena (e.g. a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. We also demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data.
Tasks Object Classification
Published 2019-06-15
URL https://arxiv.org/abs/1906.07165v1
PDF https://arxiv.org/pdf/1906.07165v1.pdf
PWC https://paperswithcode.com/paper/high-speed-and-high-dynamic-range-video-with
Repo https://github.com/uzh-rpg/rpg_e2vid
Framework pytorch
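
A sketch of a common pre-processing step in this line of work: binning the asynchronous event stream into a voxel grid before handing it to a recurrent reconstruction network. The nearest-bin accumulation below is a simplification; the grids used in the rpg_e2vid code apply temporal bilinear weighting of each event across neighbouring bins.

```python
# Simplified event-to-voxel-grid conversion (illustrative pre-processing only).
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events (t, x, y, polarity) into a (num_bins, H, W) grid."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 0]
    x, y, p = events[:, 1].astype(int), events[:, 2].astype(int), events[:, 3]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)     # scale time to [0, 1]
    b = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    np.add.at(grid, (b, y, x), p)                             # signed accumulation
    return grid

# toy stream: 10k events with timestamps, pixel coordinates and +/-1 polarity
rng = np.random.default_rng(0)
events = np.column_stack([
    np.sort(rng.uniform(0, 0.05, 10_000)),      # timestamps (s)
    rng.integers(0, 240, 10_000),               # x
    rng.integers(0, 180, 10_000),               # y
    rng.choice([-1.0, 1.0], 10_000),            # polarity
])
voxels = events_to_voxel_grid(events, num_bins=5, height=180, width=240)
# `voxels` (plus the recurrent state) is the kind of per-window input a recurrent
# reconstruction network such as the one in the rpg_e2vid repo operates on.
```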

Distilling Effective Supervision from Severe Label Noise

Title Distilling Effective Supervision from Severe Label Noise
Authors Zizhao Zhang, Han Zhang, Sercan O. Arik, Honglak Lee, Tomas Pfister
Abstract Collecting large-scale data with clean labels for supervised training of neural networks is practically challenging. Although noisy labels are usually cheap to acquire, existing methods suffer considerably from label noise. This paper targets the challenge of robust training in high label noise regimes. The key insight to achieve this goal is to wisely leverage a small trusted set to estimate exemplar weights and pseudo labels for noisy data in order to reuse them for supervised training. We present a holistic framework to train deep neural networks in a way that is highly invulnerable to label noise. Our method sets the new state of the art on various types of label noise and achieves excellent performance on large-scale datasets with real-world label noise. For instance, on CIFAR100 with a 40% uniform noise ratio and only 10 trusted labeled examples per class, our method achieves 80.2±0.3% classification accuracy, where the error rate is only 1.4% higher than that of a neural network trained without label noise. Moreover, when the noise ratio increases to 80%, our method still maintains a high accuracy of 75.5±0.2%, compared to the previous best of 48.2%. Source code available: https://github.com/google-research/google-research/tree/master/ieg
Tasks Image Classification
Published 2019-10-01
URL https://arxiv.org/abs/1910.00701v4
PDF https://arxiv.org/pdf/1910.00701v4.pdf
PWC https://paperswithcode.com/paper/ieg-robust-neural-network-training-to-tackle
Repo https://github.com/google-research/google-research/tree/master/ieg
Framework tf
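
A hedged sketch of the trusted-set reweighting idea the abstract mentions, in the spirit of meta learning-to-reweight: per-example weights on a noisy batch are chosen via one virtual update so that the weighted noisy gradient also lowers the loss on the trusted set. The full method additionally estimates pseudo labels and adds consistency regularization, which is not reproduced here; all shapes are toy placeholders.

```python
# Meta-gradient reweighting of noisy examples using a small trusted set (sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy model and data: 10 noisy examples, 5 trusted ones, 3 classes
model = nn.Linear(8, 3)
noisy_x, noisy_y = torch.randn(10, 8), torch.randint(0, 3, (10,))
trust_x, trust_y = torch.randn(5, 8), torch.randint(0, 3, (5,))

# Per-example weights on the noisy batch, chosen by one meta step so that the
# weighted noisy loss also reduces the loss on the trusted set.
eps = torch.zeros(10, requires_grad=True)
noisy_loss = (eps * F.cross_entropy(model(noisy_x), noisy_y, reduction="none")).sum()
grads = torch.autograd.grad(noisy_loss, model.parameters(), create_graph=True)

# Virtual update of the model with the weighted noisy gradient.
lr = 0.1
virtual_logits = F.linear(trust_x,
                          model.weight - lr * grads[0],
                          model.bias - lr * grads[1])
meta_loss = F.cross_entropy(virtual_logits, trust_y)

# Examples whose gradient hurts the trusted loss get weight pushed toward zero.
weight = torch.clamp(-torch.autograd.grad(meta_loss, eps)[0], min=0)
weight = weight / (weight.sum() + 1e-12)
```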

ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network

Title ICD Coding from Clinical Text Using Multi-Filter Residual Convolutional Neural Network
Authors Fei Li, Hong Yu
Abstract Automated ICD coding, which assigns International Classification of Diseases codes to patient visits, has attracted much research attention since it can save time and labor for billing. The previous state-of-the-art model utilized one convolutional layer to build document representations for predicting ICD codes. However, the lengths and grammar of text fragments that are closely related to ICD coding vary considerably across documents. Therefore, a flat and fixed-length convolutional architecture may not be capable of learning good document representations. In this paper, we propose a Multi-Filter Residual Convolutional Neural Network (MultiResCNN) for ICD coding. The innovations of our model are two-fold: it utilizes a multi-filter convolutional layer to capture various text patterns with different lengths and a residual convolutional layer to enlarge the receptive field. We evaluated the effectiveness of our model on the widely-used MIMIC dataset. On the full code set of MIMIC-III, our model outperformed the state-of-the-art model in 4 out of 6 evaluation metrics. On the top-50 code set of MIMIC-III and the full code set of MIMIC-II, our model outperformed all the existing and state-of-the-art models in all evaluation metrics. The code is available at https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network.
Tasks
Published 2019-11-25
URL https://arxiv.org/abs/1912.00862v1
PDF https://arxiv.org/pdf/1912.00862v1.pdf
PWC https://paperswithcode.com/paper/icd-coding-from-clinical-text-using-multi
Repo https://github.com/foxlf823/Multi-Filter-Residual-Convolutional-Neural-Network
Framework pytorch
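
A sketch of the two architectural ideas named in the abstract, parallel convolutions with different filter sizes plus a residual convolutional block over word embeddings; the real MultiResCNN also applies per-label attention before the output layer, which is omitted here, and all dimensions are placeholders.

```python
# Multi-filter convolutions with residual blocks for multi-label coding (sketch).
import torch
import torch.nn as nn

class MultiFilterResidual(nn.Module):
    def __init__(self, emb_dim=100, n_filters=50, kernel_sizes=(3, 5, 9), n_codes=50):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k, padding=k // 2) for k in kernel_sizes)
        self.residual = nn.ModuleList(
            nn.Conv1d(n_filters, n_filters, 3, padding=1) for _ in kernel_sizes)
        self.out = nn.Linear(n_filters * len(kernel_sizes), n_codes)

    def forward(self, emb):                        # emb: (B, seq_len, emb_dim)
        x = emb.transpose(1, 2)                    # convolve over the token dimension
        feats = []
        for conv, res in zip(self.convs, self.residual):
            h = torch.relu(conv(x))
            h = torch.relu(h + res(h))             # residual connection
            feats.append(h.max(dim=2).values)      # max-pool over tokens
        return self.out(torch.cat(feats, dim=1))   # one logit per ICD code

logits = MultiFilterResidual()(torch.randn(2, 400, 100))   # multi-label logits
```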

What Do You Mean ‘Why?’: Resolving Sluices in Conversations

Title What Do You Mean ‘Why?’: Resolving Sluices in Conversations
Authors Victor Petrén Bach Hansen, Anders Søgaard
Abstract In conversation, we often ask one-word questions such as ‘Why?’ or ‘Who?’. Such questions are typically easy for humans to answer, but can be hard for computers, because their resolution requires retrieving both the right semantic frames and the right arguments from context. This paper introduces the novel ellipsis resolution task of resolving such one-word questions, referred to as sluices in linguistics. We present a crowd-sourced dataset containing annotations of sluices from over 4,000 dialogues collected from conversational QA datasets, as well as a series of strong baseline architectures.
Tasks
Published 2019-11-21
URL https://arxiv.org/abs/1911.09478v1
PDF https://arxiv.org/pdf/1911.09478v1.pdf
PWC https://paperswithcode.com/paper/what-do-you-mean-why-resolving-sluices-in
Repo https://github.com/vpetren/conv_sluice_resolution
Framework none

Learning to Ask: Question-based Sequential Bayesian Product Search

Title Learning to Ask: Question-based Sequential Bayesian Product Search
Authors Jie Zou, Evangelos Kanoulas
Abstract Product search is generally recognized as the first and foremost stage of online shopping and thus significant for users and retailers of e-commerce. Most of the traditional retrieval methods use some similarity functions to match the user’s query and the document that describes a product, either directly or in a latent vector space. However, user queries are often too general to capture the minute details of the specific product that a user is looking for. In this paper, we propose a novel interactive method to effectively locate the best matching product. The method is based on the assumption that there is a set of candidate questions for each product to be asked. In this work, we instantiate this candidate set by making the hypothesis that products can be discriminated by the entities that appear in the documents associated with them. We propose a Question-based Sequential Bayesian Product Search method, QSBPS, which directly queries users on the expected presence of entities in the relevant product documents. The method learns the product relevance as well as the reward of the potential questions to be asked to the user by being trained on the search history and purchase behavior of a specific user together with that of other users. The experimental results show that the proposed method can greatly improve the performance of product search compared to the state-of-the-art baselines.
Tasks
Published 2019-08-30
URL https://arxiv.org/abs/1908.11733v1
PDF https://arxiv.org/pdf/1908.11733v1.pdf
PWC https://paperswithcode.com/paper/learning-to-ask-question-based-sequential
Repo https://github.com/UvA-HuMIL/QSBPS
Framework none
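
A hedged sketch of the sequential Bayesian mechanics behind such a system, under simplifying assumptions: a known entity-presence matrix, noisy yes/no answers, and a greedy uncertainty-based question choice. QSBPS itself learns the question rewards from search and purchase history, which is not modeled here.

```python
# Bayesian belief update over products from yes/no entity questions (sketch).
import numpy as np

rng = np.random.default_rng(0)
n_products, n_entities = 20, 8
has_entity = rng.integers(0, 2, (n_products, n_entities))    # entity-presence matrix
belief = np.full(n_products, 1.0 / n_products)                # prior over products
noise = 0.1                                                   # assumed answer noise rate

def update(belief, entity, answer_yes):
    """Posterior over products after the user answers whether `entity` is relevant."""
    like_yes = np.where(has_entity[:, entity] == 1, 1 - noise, noise)
    like = like_yes if answer_yes else 1 - like_yes
    post = belief * like
    return post / post.sum()

def pick_question(belief):
    """Greedy choice: ask about the entity whose answer is most uncertain."""
    p_yes = belief @ np.where(has_entity == 1, 1 - noise, noise)
    return int(np.argmin(np.abs(p_yes - 0.5)))

q = pick_question(belief)
belief = update(belief, q, answer_yes=True)       # simulate a "yes" answer
top_products = np.argsort(-belief)[:3]
```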

Learning Cross-lingual Embeddings from Twitter via Distant Supervision

Title Learning Cross-lingual Embeddings from Twitter via Distant Supervision
Authors Jose Camacho-Collados, Yerai Doval, Eugenio Martínez-Cámara, Luis Espinosa-Anke, Francesco Barbieri, Steven Schockaert
Abstract Cross-lingual embeddings represent the meaning of words from different languages in the same vector space. Recent work has shown that it is possible to construct such representations by aligning independently learned monolingual embedding spaces, and that accurate alignments can be obtained even without external bilingual data. In this paper we explore a research direction that has been surprisingly neglected in the literature: leveraging noisy user-generated text to learn cross-lingual embeddings particularly tailored towards social media applications. While the noisiness and informal nature of the social media genre pose additional challenges to cross-lingual embedding methods, we find that they also provide key opportunities due to the abundance of code-switching and the existence of a shared vocabulary of emoji and named entities. Our contribution consists of a very simple post-processing step that exploits these phenomena to significantly improve the performance of state-of-the-art alignment methods.
Tasks
Published 2019-05-17
URL https://arxiv.org/abs/1905.07358v3
PDF https://arxiv.org/pdf/1905.07358v3.pdf
PWC https://paperswithcode.com/paper/learning-cross-lingual-embeddings-from
Repo https://github.com/pedrada88/crossembeddings-twitter
Framework none
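
A hedged sketch of the kind of anchor-based refinement such a post-processing step can exploit: words spelled identically in both vocabularies (emoji, named entities, code-switched terms) serve as a seed dictionary for an orthogonal Procrustes fit. This is the generic technique, not the paper’s specific post-processing step.

```python
# Refining a cross-lingual mapping with shared-string anchors via Procrustes (sketch).
import numpy as np

def procrustes_refine(src_emb, tgt_emb, src_vocab, tgt_vocab):
    """Fit an orthogonal map from source to target space over anchor words
    that appear verbatim in both vocabularies."""
    anchors = sorted(set(src_vocab) & set(tgt_vocab))
    X = np.stack([src_emb[src_vocab.index(w)] for w in anchors])
    Y = np.stack([tgt_emb[tgt_vocab.index(w)] for w in anchors])
    u, _, vt = np.linalg.svd(X.T @ Y)
    W = u @ vt                        # orthogonal map: src space -> tgt space
    return src_emb @ W

# toy vocabularies sharing two anchors (an emoji and a named entity)
src_vocab = ["hola", "🙂", "madrid", "gracias"]
tgt_vocab = ["hello", "🙂", "madrid", "thanks"]
rng = np.random.default_rng(0)
src_emb, tgt_emb = rng.normal(size=(4, 50)), rng.normal(size=(4, 50))
aligned_src = procrustes_refine(src_emb, tgt_emb, src_vocab, tgt_vocab)
```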