January 30, 2020


Paper Group ANR 270

Reinforcement learning for bandwidth estimation and congestion control in real-time communications. Ctrl-Z: Recovering from Instability in Reinforcement Learning. QuicK-means: Acceleration of K-means by learning a fast transform. Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization. Tag-based Semantic Feat …

Reinforcement learning for bandwidth estimation and congestion control in real-time communications


Title	Reinforcement learning for bandwidth estimation and congestion control in real-time communications
Authors	Joyce Fang, Martin Ellis, Bin Li, Siyao Liu, Yasaman Hosseinkashi, Michael Revow, Albert Sadovnikov, Ziyuan Liu, Peng Cheng, Sachin Ashok, David Zhao, Ross Cutler, Yan Lu, Johannes Gehrke
Abstract	Bandwidth estimation and congestion control for real-time communications (i.e., audio and video conferencing) remains a difficult problem, despite many years of research. Achieving high quality of experience (QoE) for end users requires continual updates due to changing network architectures and technologies. In this paper, we apply reinforcement learning for the first time to the problem of real-time communications (RTC), where we seek to optimize user-perceived quality. We present initial proof-of-concept results, where we learn an agent to control sending rate in an RTC system, evaluating using both network simulation and real Internet video calls. We discuss the challenges we observed, particularly in designing realistic reward functions that reflect QoE, and in bridging the gap between the training environment and real-world networks.
Tasks
Published	2019-12-04
URL	https://arxiv.org/abs/1912.02222v1
PDF	https://arxiv.org/pdf/1912.02222v1.pdf
PWC	https://paperswithcode.com/paper/reinforcement-learning-for-bandwidth
Repo
Framework

Ctrl-Z: Recovering from Instability in Reinforcement Learning


Title	Ctrl-Z: Recovering from Instability in Reinforcement Learning
Authors	Vibhavari Dasagi, Jake Bruce, Thierry Peynot, Jürgen Leitner
Abstract	When learning behavior, training data is often generated by the learner itself; this can result in unstable training dynamics, and this problem has particularly important applications in safety-sensitive real-world control tasks such as robotics. In this work, we propose a principled and model-agnostic approach to mitigate the issue of unstable learning dynamics by maintaining a history of a reinforcement learning agent over the course of training, and reverting to the parameters of a previous agent whenever performance significantly decreases. We develop techniques for evaluating this performance through statistical hypothesis testing of continued improvement, and evaluate them on a standard suite of challenging benchmark tasks involving continuous control of simulated robots. We show improvements over state-of-the-art reinforcement learning algorithms in performance and robustness to hyperparameters, outperforming DDPG in 5 out of 6 evaluation environments and showing no decrease in performance with TD3, which is known to be relatively stable. In this way, our approach takes an important step towards increasing data efficiency and stability in training for real-world robotic applications.
Tasks	Continuous Control
Published	2019-10-09
URL	https://arxiv.org/abs/1910.03732v1
PDF	https://arxiv.org/pdf/1910.03732v1.pdf
PWC	https://paperswithcode.com/paper/ctrl-z-recovering-from-instability-in
Repo
Framework
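
The revert-on-degradation idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CheckpointHistory` and `significantly_worse`, the snapshot limit, and the Welch-style t statistic with a fixed threshold are all hypothetical stand-ins for the paper's hypothesis test.

```python
import copy
import statistics
from collections import deque


def significantly_worse(recent, reference, threshold=2.0):
    """Hypothetical test: True if `recent` returns are lower than `reference`
    returns by a Welch-style t statistic larger than `threshold`."""
    if len(recent) < 2 or len(reference) < 2:
        return False
    mean_gap = statistics.mean(reference) - statistics.mean(recent)
    var_term = (statistics.variance(reference) / len(reference)
                + statistics.variance(recent) / len(recent))
    if var_term == 0:
        return mean_gap > 0
    return mean_gap / var_term ** 0.5 > threshold


class CheckpointHistory:
    """Keeps snapshots of agent parameters with their evaluation returns."""

    def __init__(self, max_snapshots=5):
        self.snapshots = deque(maxlen=max_snapshots)

    def record(self, params, returns):
        self.snapshots.append((copy.deepcopy(params), list(returns)))

    def maybe_revert(self, params, returns):
        """Return (params_to_continue_from, reverted_flag)."""
        if self.snapshots:
            best_params, best_returns = max(
                self.snapshots, key=lambda s: statistics.mean(s[1]))
            if significantly_worse(returns, best_returns):
                # Performance dropped significantly: go back to the best snapshot.
                return copy.deepcopy(best_params), True
        # No significant drop: keep the current parameters and remember them.
        self.record(params, returns)
        return params, False
```

In a training loop, `maybe_revert` would be called after each evaluation phase with the current parameters and a batch of evaluation returns; how often to evaluate is left open here.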

QuicK-means: Acceleration of K-means by learning a fast transform


Title	QuicK-means: Acceleration of K-means by learning a fast transform
Authors	Luc Giffon, Valentin Emiya, Liva Ralaivola, Hachem Kadri
Abstract	K-means – and the celebrated Lloyd algorithm – is more than the clustering method it was originally designed to be. It has indeed proven pivotal in speeding up many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, and data compression; its beneficial use has been shown to carry over to the acceleration of kernel machines (when using the Nyström method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the $K$ centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as a product of sparse factors. Using such a decomposition squashes the complexity of the matrix-vector product between the factorized $K \times D$ centroid matrix $\mathbf{U}$ and any vector from $\mathcal{O}(K D)$ to $\mathcal{O}(A \log A+B)$, with $A=\min (K, D)$ and $B=\max (K, D)$, where $D$ is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time, but also at training time, provided the factorization procedure is performed during Lloyd’s algorithm. We precisely show that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally efficient QuicK-means algorithm.
Tasks
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08713v1
PDF	https://arxiv.org/pdf/1908.08713v1.pdf
PWC	https://paperswithcode.com/paper/quick-means-acceleration-of-k-means-by
Repo
Framework
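
To make the cost argument in the abstract concrete, the sketch below applies a centroid matrix stored as a product of sparse factors to a vector, then uses the result for cluster assignment. It is only an illustration: the dictionary-based sparse format and the function names are assumptions, and the factorization procedure itself (how the sparse factors are learned) is not shown.

```python
def sparse_matvec(factor, vec, n_rows):
    """Multiply a sparse matrix, stored as {(row, col): value}, by a dense vector."""
    out = [0.0] * n_rows
    for (r, c), v in factor.items():
        out[r] += v * vec[c]
    return out


def factored_matvec(factors, vec):
    """Compute (S_1 S_2 ... S_Q) @ vec by applying the factors right to left.

    `factors` is a list of (n_rows, sparse_dict) pairs, leftmost factor first.
    The cost is proportional to the total number of nonzeros, not to K * D.
    """
    for n_rows, factor in reversed(factors):
        vec = sparse_matvec(factor, vec, n_rows)
    return vec


def assign(factors, centroid_sq_norms, x):
    """Index of the nearest centroid.

    Uses ||x - u_k||^2 = ||x||^2 - 2 u_k.x + ||u_k||^2; the ||x||^2 term is
    the same for every k and can be dropped from the comparison.
    """
    dots = factored_matvec(factors, x)
    scores = [n - 2.0 * d for n, d in zip(centroid_sq_norms, dots)]
    return min(range(len(scores)), key=scores.__getitem__)
```

The squared centroid norms are assumed to be precomputed once per Lloyd iteration, so each assignment only pays for the sparse products.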

Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization


Title	Knowledge Distillation For Recurrent Neural Network Language Modeling With Trust Regularization
Authors	Yangyang Shi, Mei-Yuh Hwang, Xin Lei, Haoyu Sheng
Abstract	Recurrent Neural Networks (RNNs) have dominated language modeling because of their superior performance over traditional N-gram based models. In many applications, a large Recurrent Neural Network language model (RNNLM) or an ensemble of several RNNLMs is used. These models have large memory footprints and require heavy computation. In this paper, we examine the effect of applying knowledge distillation in reducing the model size for RNNLMs. In addition, we propose a trust regularization method to improve the knowledge distillation training for RNNLMs. Using knowledge distillation with trust regularization, we reduce the parameter size to a third of that of the previously published best model while maintaining the state-of-the-art perplexity result on Penn Treebank data. In a speech recognition N-best rescoring task, we reduce the RNNLM model size to 18.5% of the baseline system, with no degradation in word error rate (WER) performance on the Wall Street Journal data set.
Tasks	Language Modelling, Speech Recognition
Published	2019-04-08
URL	http://arxiv.org/abs/1904.04163v1
PDF	http://arxiv.org/pdf/1904.04163v1.pdf
PWC	https://paperswithcode.com/paper/knowledge-distillation-for-recurrent-neural
Repo
Framework
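
For readers unfamiliar with knowledge distillation, a generic form of the student loss is sketched below. It shows only the standard mix of a hard-label term and a soft teacher term; the trust regularization proposed in the paper is not specified in the abstract and is deliberately left out. The function names, `alpha`, and `temperature` are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross entropy between two distributions over the same vocabulary."""
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of the hard-label loss and the soft teacher-matching loss."""
    hard = -math.log(softmax(student_logits)[true_index])
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * soft
```

In a language model, the logits would be the next-word scores at one time step and the loss would be averaged over all steps.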

Tag-based Semantic Features for Scene Image Classification


Title	Tag-based Semantic Features for Scene Image Classification
Authors	Chiranjibi Sitaula, Yong Xiang, Anish Basnet, Sunil Aryal, Xuequan Lu
Abstract	The existing image feature extraction methods are primarily based on the content and structure information of images, and rarely consider the contextual semantic information. Regarding some types of images such as scenes and objects, the annotations and descriptions of them available on the web may provide reliable contextual semantic information for feature extraction. In this paper, we introduce novel semantic features of an image based on the annotations and descriptions of its similar images available on the web. Specifically, we propose a new method which consists of two consecutive steps to extract our semantic features. For each image in the training set, we initially search the top $k$ most similar images from the internet and extract their annotations/descriptions (e.g., tags or keywords). The annotation information is employed to design a filter bank for each image category and generate filter words (codebook). Finally, each image is represented by the histogram of the occurrences of filter words in all categories. We evaluate the performance of the proposed features in scene image classification on three commonly-used scene image datasets (i.e., MIT-67, Scene15 and Event8). Our method typically produces a lower feature dimension than existing feature extraction methods. Experimental results show that the proposed features generate better classification accuracies than vision based and tag based features, and comparable results to deep learning based features.
Tasks	Image Classification
Published	2019-09-22
URL	https://arxiv.org/abs/1909.09999v1
PDF	https://arxiv.org/pdf/1909.09999v1.pdf
PWC	https://paperswithcode.com/paper/190909999
Repo
Framework
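
The final representation step, a histogram of filter-word occurrences across all categories, can be sketched as below. The way the filter bank is built here (the most frequent tags per category) is an assumption for illustration; the abstract does not say how the authors select filter words, and the image-retrieval step is omitted.

```python
from collections import Counter


def build_filter_bank(tags_by_category, top_n=3):
    """Hypothetical codebook: the `top_n` most frequent tags of each category."""
    return {category: [word for word, _ in Counter(tags).most_common(top_n)]
            for category, tags in tags_by_category.items()}


def semantic_feature(image_tags, filter_bank):
    """Histogram of how often each filter word occurs among an image's tags.

    Categories are visited in sorted order so the feature layout is stable.
    """
    counts = Counter(image_tags)
    feature = []
    for category in sorted(filter_bank):
        feature.extend(counts[word] for word in filter_bank[category])
    return feature
```

The feature length is the number of categories times `top_n`, which is consistent with the abstract's remark that the resulting dimension tends to be low.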

On the Equivalence Between Abstract Dialectical Frameworks and Logic Programs


Title	On the Equivalence Between Abstract Dialectical Frameworks and Logic Programs
Authors	João Alcântara, Samy Sá, Juan Acosta-Guadarrama
Abstract	Abstract Dialectical Frameworks (ADFs) are argumentation frameworks where each node is associated with an acceptance condition. This allows us to model different types of dependencies as supports and attacks. Previous studies provided a translation from Normal Logic Programs (NLPs) to ADFs and proved that the stable model semantics of a normal logic program is equivalent to that of the corresponding ADF. However, these studies failed to identify a semantics for ADFs equivalent to a three-valued semantics (such as partial stable models and well-founded models) for NLPs. In this work, we focus on a fragment of ADFs, called Attacking Dialectical Frameworks (ADF$^+$s), and provide a translation from NLPs to ADF$^+$s robust enough to guarantee the equivalence between the partial stable, well-founded, regular, and stable model semantics for NLPs and, respectively, the complete, grounded, preferred, and stable model semantics for ADFs. In addition, we define a new semantics for ADF$^+$s, called L-stable, and show it is equivalent to the L-stable semantics for NLPs. This paper is under consideration for acceptance in TPLP.
Tasks
Published	2019-07-22
URL	https://arxiv.org/abs/1907.09548v1
PDF	https://arxiv.org/pdf/1907.09548v1.pdf
PWC	https://paperswithcode.com/paper/on-the-equivalence-between-abstract
Repo
Framework

So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification


Title	So2Sat LCZ42: A Benchmark Dataset for Global Local Climate Zones Classification
Authors	Xiao Xiang Zhu, Jingliang Hu, Chunping Qiu, Yilei Shi, Jian Kang, Lichao Mou, Hossein Bagheri, Matthias Häberle, Yuansheng Hua, Rong Huang, Lloyd Hughes, Hao Li, Yao Sun, Guichen Zhang, Shiyao Han, Michael Schmitt, Yuanyuan Wang
Abstract	Access to labeled reference data is one of the grand challenges in supervised machine learning endeavors. This is especially true for an automated analysis of remote sensing images on a global scale, which enables us to address global challenges such as urbanization and climate change using state-of-the-art machine learning techniques. To meet these pressing needs, especially in urban research, we provide open access to a valuable benchmark dataset named “So2Sat LCZ42,” which consists of local climate zone (LCZ) labels of about half a million Sentinel-1 and Sentinel-2 image patches in 42 urban agglomerations (plus 10 additional smaller areas) across the globe. This dataset was labeled by 15 domain experts following a carefully designed labeling workflow and evaluation process over a period of six months. As is rarely done for other labeled remote sensing datasets, we conducted a rigorous quality assessment by domain experts. The dataset achieved an overall confidence of 85%. We believe this LCZ dataset is a first step towards an unbiased, globally distributed dataset for urban growth monitoring using machine learning methods, because LCZs provide a rather objective measure compared with many other semantic land use and land cover classifications. They provide measures of the morphology, compactness, and height of urban areas, which are less dependent on humans and culture. This dataset can be accessed from http://doi.org/10.14459/2018mp1483140.
Tasks
Published	2019-12-19
URL	https://arxiv.org/abs/1912.12171v1
PDF	https://arxiv.org/pdf/1912.12171v1.pdf
PWC	https://paperswithcode.com/paper/so2sat-lcz42-a-benchmark-dataset-for-global
Repo
Framework

An Objective Evaluation Metric for image fusion based on Del Operator


Title	An Objective Evaluation Metric for image fusion based on Del Operator
Authors	Ali A. Kiaei, Hassan Khotanlou, Mahdi Abbasi, Paniz Kiaei, Yasin Bhrouzi
Abstract	In this paper, a novel objective evaluation metric for image fusion is presented. Notable features of the proposed metric are that it has no parameters, its result is a probability in the range [0, 1], and it is free from illumination dependence. The metric is easy to implement and the result is computed in four steps: (1) Smoothing the images using a Gaussian filter. (2) Transforming the images to a vector field using the Del operator. (3) Computing the normal distribution function $(\mu, \sigma)$ for each corresponding pixel, and converting it to the standard normal distribution function. (4) Computing the probability of the fusion method being well-behaved as the result. To judge the quality of the proposed metric, it is compared to thirteen well-known non-reference objective evaluation metrics, where eight fusion methods are employed on seven experiments of multimodal medical images. The experimental results and statistical comparisons show that, in contrast to previous objective evaluation metrics, the proposed one performs better in terms of both agreeing with human visual perception and evaluating fusion methods that do not perform at the same level.
Tasks
Published	2019-05-19
URL	https://arxiv.org/abs/1905.07709v2
PDF	https://arxiv.org/pdf/1905.07709v2.pdf
PWC	https://paperswithcode.com/paper/an-objective-evaluation-metric-for-image
Repo
Framework

Motion-Aware Feature for Improved Video Anomaly Detection


Title	Motion-Aware Feature for Improved Video Anomaly Detection
Authors	Yi Zhu, Shawn Newsam
Abstract	Motivated by our observation that motion information is the key to good anomaly detection performance in video, we propose a temporal augmented network to learn a motion-aware feature. This feature alone can achieve competitive performance with previous state-of-the-art methods, and when combined with them, can achieve significant performance improvements. Furthermore, we incorporate temporal context into the Multiple Instance Learning (MIL) ranking model by using an attention block. The learned attention weights can help to differentiate between anomalous and normal video segments better. With the proposed motion-aware feature and the temporal MIL ranking model, we outperform previous approaches by a large margin on both anomaly detection and anomalous action recognition tasks in the UCF Crime dataset.
Tasks	Anomaly Detection, Multiple Instance Learning
Published	2019-07-24
URL	https://arxiv.org/abs/1907.10211v1
PDF	https://arxiv.org/pdf/1907.10211v1.pdf
PWC	https://paperswithcode.com/paper/motion-aware-feature-for-improved-video
Repo
Framework
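
As a rough illustration of a MIL ranking objective with attention, the sketch below pools per-segment anomaly scores with softmax attention weights and applies a hinge ranking loss between an anomalous and a normal video. This is a generic formulation, not the paper's model: the pooling rule, the margin, and the function names are assumptions, and in practice the scores and attention logits would come from a trained network.

```python
import math


def attention_pool(segment_scores, attention_logits):
    """Video-level score as a softmax-attention-weighted sum of segment scores."""
    m = max(attention_logits)
    weights = [math.exp(a - m) for a in attention_logits]
    total = sum(weights)
    return sum(w / total * s for w, s in zip(weights, segment_scores))


def mil_ranking_loss(anomalous_bag_score, normal_bag_score, margin=1.0):
    """Hinge loss that pushes anomalous videos to score above normal ones."""
    return max(0.0, margin - anomalous_bag_score + normal_bag_score)
```

Only video-level labels are needed for this loss, which is the point of the multiple instance learning setup.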

A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning


Title	A Static Analysis-based Cross-Architecture Performance Prediction Using Machine Learning
Authors	Newsha Ardalani, Urmish Thakker, Aws Albarghouthi, Karu Sankaralingam
Abstract	Porting code from CPU to GPU is costly and time-consuming; unless much time is invested in development and optimization, it is not obvious, a priori, how much speed-up is achievable or how much room is left for improvement. Knowing the potential speed-up a priori can be very useful: it can save hundreds of engineering hours and help programmers with prioritization and algorithm selection. We aim to address this problem using machine learning in a supervised setting, using solely the single-threaded source code of the program, without having to run or profile the code. We propose a static analysis-based cross-architecture performance prediction framework (Static XAPP) which relies solely on program properties collected using static analysis of the CPU source code and predicts whether the potential speed-up is above or below a given threshold. We offer preliminary results that show we can achieve 94% accuracy in binary classification, on average, across different thresholds.
Tasks
Published	2019-06-18
URL	https://arxiv.org/abs/1906.07840v1
PDF	https://arxiv.org/pdf/1906.07840v1.pdf
PWC	https://paperswithcode.com/paper/a-static-analysis-based-cross-architecture
Repo
Framework

POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion


Title	POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion
Authors	Wen Chen, Pipei Huang, Jiaming Xu, Xin Guo, Cheng Guo, Fei Sun, Chao Li, Andreas Pfadler, Huan Zhao, Binqiang Zhao
Abstract	Increasing demand for fashion recommendation raises a lot of challenges for online shopping platforms and fashion communities. In particular, there exist two requirements for fashion outfit recommendation: the Compatibility of the generated fashion outfits, and the Personalization in the recommendation process. In this paper, we demonstrate these two requirements can be satisfied via building a bridge between outfit generation and recommendation. Through large data analysis, we observe that people have similar tastes in individual items and outfits. Therefore, we propose a Personalized Outfit Generation (POG) model, which connects user preferences regarding individual items and outfits with Transformer architecture. Extensive offline and online experiments provide strong quantitative evidence that our method outperforms alternative methods regarding both compatibility and personalization metrics. Furthermore, we deploy POG on a platform named Dida in Alibaba to generate personalized outfits for the users of the online application iFashion. This work represents a first step towards an industrial-scale fashion outfit generation and recommendation solution, which goes beyond generating outfits based on explicit queries, or merely recommending from existing outfit pools. As part of this work, we release a large-scale dataset consisting of 1.01 million outfits with rich context information, and 0.28 billion user click actions from 3.57 million users. To the best of our knowledge, this dataset is the largest, publicly available, fashion related dataset, and the first to provide user behaviors relating to both outfits and fashion items.
Tasks
Published	2019-05-06
URL	https://arxiv.org/abs/1905.01866v3
PDF	https://arxiv.org/pdf/1905.01866v3.pdf
PWC	https://paperswithcode.com/paper/pog-personalized-outfit-generation-for
Repo
Framework

Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows


Title	Real-Time Steganalysis for Stream Media Based on Multi-channel Convolutional Sliding Windows
Authors	Zhongliang Yang, Hao Yang, Yuting Hu, Yongfeng Huang, Yu-Jin Zhang
Abstract	Previous VoIP steganalysis methods face great challenges in detecting speech signals at low embedding rates, and they generally struggle to perform real-time detection, making it hard for them to truly maintain cyberspace security. To solve these two challenges, in this paper we combine the sliding window detection algorithm with a Convolutional Neural Network and propose a real-time VoIP steganalysis method based on multi-channel convolutional sliding windows. In order to analyze the correlations between frames and different neighborhood frames in a VoIP signal, we define multi-channel sliding detection windows. Within each sliding window, we design two feature extraction channels, each containing multiple convolution layers with multiple convolution kernels per layer, to extract correlation features of the input signal. Then, based on these extracted features, we use a forward fully connected network for feature fusion. Finally, by analyzing the statistical distribution of these features, the discriminator determines whether the input speech signal contains covert information or not. We designed several experiments to test the proposed model’s detection ability under various conditions, including different embedding rates and different speech lengths. Experimental results showed that the proposed model outperforms all the previous methods, especially at low embedding rates, where it showed state-of-the-art performance. In addition, we also tested the detection efficiency of the proposed model, and the results showed that it can achieve almost real-time detection of VoIP speech signals.
Tasks	Window Detection
Published	2019-02-04
URL	http://arxiv.org/abs/1902.01286v1
PDF	http://arxiv.org/pdf/1902.01286v1.pdf
PWC	https://paperswithcode.com/paper/real-time-steganalysis-for-stream-media-based
Repo
Framework

Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples


Title	Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples
Authors	Angie Boggust, Brandon Carter, Arvind Satyanarayan
Abstract	Embeddings – mappings from high-dimensional discrete input to lower-dimensional continuous vector spaces – have been widely adopted in machine learning, linguistics, and computational biology as they often surface interesting and unexpected domain semantics. Through semi-structured interviews with embedding model researchers and practitioners, we find that current tools poorly support a central concern: comparing different embeddings when developing fairer, more robust models. In response, we present the Embedding Comparator, an interactive system that balances gaining an overview of the embedding spaces with making fine-grained comparisons of local neighborhoods. For a pair of models, we compute the similarity of the k-nearest neighbors of every embedded object, and visualize the results as Local Neighborhood Dominoes: small multiples that facilitate rapid comparisons. Using case studies, we illustrate the types of insights the Embedding Comparator reveals including how fine-tuning embeddings changes semantics, how language changes over time, and how training data differences affect two seemingly similar models.
Tasks
Published	2019-12-10
URL	https://arxiv.org/abs/1912.04853v1
PDF	https://arxiv.org/pdf/1912.04853v1.pdf
PWC	https://paperswithcode.com/paper/embedding-comparator-visualizing-differences
Repo
Framework
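
The per-object comparison described in the abstract, the similarity of k-nearest neighbors between two models, might look like the sketch below. The choice of cosine distance and Jaccard overlap is an assumption made for illustration (the abstract does not name the measures), and both embeddings are assumed to share the same vocabulary and contain no zero vectors.

```python
import math


def cosine_distance(u, v):
    """1 minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def k_nearest(embedding, key, k):
    """Set of the k objects closest to `key` in one embedding space."""
    others = [o for o in embedding if o != key]
    others.sort(key=lambda o: cosine_distance(embedding[key], embedding[o]))
    return set(others[:k])


def neighborhood_similarity(emb_a, emb_b, k):
    """Jaccard overlap of each object's k-nearest neighbors in the two models."""
    result = {}
    for key in emb_a.keys() & emb_b.keys():
        near_a = k_nearest(emb_a, key, k)
        near_b = k_nearest(emb_b, key, k)
        result[key] = len(near_a & near_b) / len(near_a | near_b)
    return result
```

Objects with low scores are the ones whose local neighborhoods differ most between the two models, which is the kind of signal a visualization could then surface.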

Deep Matrix Factorization with Spectral Geometric Regularization


Title	Deep Matrix Factorization with Spectral Geometric Regularization
Authors	Amit Boyarski, Sanketh Vedula, Alex Bronstein
Abstract	Deep Matrix Factorization (DMF) is an emerging approach to the problem of reconstructing a matrix from a subset of its entries. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. Despite these promising theoretical results, empirical evaluation of vanilla DMF on real benchmarks exhibits poor reconstructions which we attribute to the extremely low number of samples available. We propose an explicit spectral regularization scheme that is able to make DMF models competitive on real benchmarks, while still maintaining the implicit regularization induced by gradient descent, thus enjoying the best of both worlds.
Tasks	Matrix Completion, Recommendation Systems
Published	2019-11-17
URL	https://arxiv.org/abs/1911.07255v2
PDF	https://arxiv.org/pdf/1911.07255v2.pdf
PWC	https://paperswithcode.com/paper/deep-geometric-matrix-completion-are-we-doing-1
Repo
Framework

Gradient-only line searches: An Alternative to Probabilistic Line Searches


Title	Gradient-only line searches: An Alternative to Probabilistic Line Searches
Authors	Dominic Kafka, Daniel Wilke
Abstract	Step sizes in neural network training are largely determined using predetermined rules such as fixed learning rates and learning rate schedules, which require user input to determine their functional form and associated hyperparameters. Global optimization strategies to resolve these hyperparameters are computationally expensive. Line searches are capable of adaptively resolving learning rate schedules. However, due to discontinuities induced by mini-batch sampling, they have largely fallen out of favor. Notwithstanding, probabilistic line searches have recently demonstrated viability in resolving learning rates for stochastic loss functions. This method creates surrogates with confidence intervals, where restrictions are placed on the rate at which the search domain can grow along a search direction. This paper introduces an alternative paradigm, Gradient-Only Line Searches that are inexact (GOLS-I), as an alternative strategy to automatically resolve learning rates in stochastic cost functions over a range of 15 orders of magnitude without the use of surrogates. We show that GOLS-I is a competitive strategy to reliably resolve step sizes, adding high value in terms of performance, while being easy to implement. Considering mini-batch sampling, we open the discussion on how to split the effort to resolve quality search directions from quality step size estimates along a search direction.
Tasks
Published	2019-03-22
URL	http://arxiv.org/abs/1903.09383v1
PDF	http://arxiv.org/pdf/1903.09383v1.pdf
PWC	https://paperswithcode.com/paper/gradient-only-line-searches-an-alternative-to
Repo
Framework
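
The core idea of a gradient-only line search, locating a step size where the directional derivative changes sign from negative to positive without evaluating the loss itself, can be sketched as follows. This is a simplified, deterministic illustration and not the authors' GOLS-I algorithm: the function name, the growth and shrink factors, and the iteration cap are assumptions, and mini-batch resampling is not modeled.

```python
def gradient_only_step(grad, x, direction, step=1.0,
                       grow=2.0, shrink=0.5, max_iters=50):
    """Return a step size close to a negative-to-positive sign change of the
    directional derivative along `direction`, using gradient values only."""

    def directional_derivative(alpha):
        point = [xi + alpha * di for xi, di in zip(x, direction)]
        return sum(g * d for g, d in zip(grad(point), direction))

    if directional_derivative(step) < 0:
        # Still descending at the trial step: grow until the sign flips.
        for _ in range(max_iters):
            step *= grow
            if directional_derivative(step) >= 0:
                break
    else:
        # Overshot the sign change: shrink until we are descending again.
        for _ in range(max_iters):
            step *= shrink
            if directional_derivative(step) < 0:
                break
    return step
```

For example, with `grad = lambda p: [2.0 * p[0]]` (the gradient of x squared), start point `[-4.0]`, and direction `[8.0]`, the sign change lies at a step of 0.5, and the search returns a step within a factor of two of it from either side.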