January 26, 2020

Paper Group ANR 1547

Time-Dynamic Estimates of the Reliability of Deep Semantic Segmentation Networks. Towards Successful Social Media Advertising: Predicting the Influence of Commercial Tweets. Predicting Choice with Set-Dependent Aggregation. Extracting robust and accurate features via a robust information bottleneck. Hybrid-Attention based Decoupled Metric Learning …

Time-Dynamic Estimates of the Reliability of Deep Semantic Segmentation Networks


Title	Time-Dynamic Estimates of the Reliability of Deep Semantic Segmentation Networks
Authors	Kira Maag, Matthias Rottmann, Hanno Gottschalk
Abstract	In the semantic segmentation of street scenes, the reliability of a prediction is of the highest interest. Assessing neural networks by means of uncertainties is a common approach to preventing safety issues. Since a video stream of images is available in online applications such as automated driving, we present a time-dynamic approach to investigate uncertainties and assess the prediction quality of neural networks. To this end, we track segments over time and gather aggregated metrics per segment, e.g., mean dispersion metrics derived from the softmax output and segment sizes. By identifying segments over consecutive frames, we obtain time series of metrics from which we assess prediction quality. We do so either by classifying between intersection over union (IoU) = 0 and IoU > 0 (meta classification) or by predicting the IoU directly (meta regression). In our tests, we analyze the influence of the length of the time series on the predictive power of the metrics and study different models for meta classification and regression. We use two publicly available DeepLabv3+ networks and two street scene datasets: VIPER, a synthetic one, and KITTI, based on real data. We achieve classification accuracies of up to 81.20% and AUROC values of up to 88.68% for meta classification. For meta regression, we obtain $R^2$ values of up to 87.51%. We show that these results improve on other approaches.
Tasks	Semantic Segmentation, Time Series
Published	2019-11-12
URL	https://arxiv.org/abs/1911.05075v1
PDF	https://arxiv.org/pdf/1911.05075v1.pdf
PWC	https://paperswithcode.com/paper/time-dynamic-estimates-of-the-reliability-of
Repo
Framework
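
The meta-classification target described above can be illustrated with a minimal Python sketch. It assumes that each predicted segment is available as a boolean mask together with the ground-truth mask of the same class. It uses plain IoU, although the paper may use a modified variant. It also takes mean pixel-wise softmax entropy as one example dispersion metric. The helper names (`segment_iou`, `meta_label`, `mean_entropy`, `metric_time_series`) are hypothetical.

```python
import numpy as np

def segment_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Plain intersection over union of one predicted segment and a ground-truth mask."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return float(inter / union) if union > 0 else 0.0

def meta_label(iou: float) -> int:
    """Meta classification target: 1 if IoU > 0, 0 if the segment has no overlap (IoU = 0)."""
    return int(iou > 0.0)

def mean_entropy(softmax: np.ndarray, mask: np.ndarray) -> float:
    """Mean pixel-wise entropy of a (C, H, W) softmax output, averaged over one segment."""
    entropy = -(softmax * np.log(softmax + 1e-12)).sum(axis=0)  # (H, W)
    return float(entropy[mask].mean())

def metric_time_series(per_frame_metrics: list, length: int) -> np.ndarray:
    """Concatenate the last `length` frames of metrics of one tracked segment into a feature vector."""
    window = per_frame_metrics[-length:]
    return np.concatenate([np.asarray(m, dtype=float) for m in window])
```

Varying `length` in `metric_time_series` is one simple way to study how the length of the time series affects a downstream meta classifier or regressor.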


Towards Successful Social Media Advertising: Predicting the Influence of Commercial Tweets


Title	Towards Successful Social Media Advertising: Predicting the Influence of Commercial Tweets
Authors	Renhao Cui, Gagan Agrawal, Rajiv Ramnath
Abstract	Businesses communicate using Twitter for a variety of reasons: to raise awareness of their brands, to market new products, to respond to community comments, and to connect with their customers and potential customers in a targeted manner. For businesses to do this effectively, they need to understand which content and structural elements of a tweet make it influential, that is, widely liked, followed, and retweeted. This paper presents a systematic methodology for analyzing commercial tweets and predicting their influence on readers. Our model, which uses a combination of decoration and meta features, outperforms both the baseline model and the tweet embedding model in prediction ability. Further, to demonstrate a practical use of this work, we show how an unsuccessful tweet may be engineered (for example, reworded) to increase its potential for success.
Tasks
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12446v1
PDF	https://arxiv.org/pdf/1910.12446v1.pdf
PWC	https://paperswithcode.com/paper/towards-successful-social-media-advertising
Repo
Framework

Predicting Choice with Set-Dependent Aggregation


Title	Predicting Choice with Set-Dependent Aggregation
Authors	Nir Rosenfeld, Kojin Oshiba, Yaron Singer
Abstract	Providing users with alternatives to choose from is an essential component in many online platforms, making the accurate prediction of choice vital to their success. A renewed interest in learning choice models has led to significant progress in modeling power, but most current methods are either limited in the types of choice behavior they capture, cannot be applied to large-scale data, or both. Here we propose a learning framework for predicting choice that is accurate, versatile, theoretically grounded, and scales well. Our key modeling point is that to account for how humans choose, predictive models must capture certain set-related invariances. Building on recent results in economics, we derive a class of models that can express any behavioral choice pattern, enjoy favorable sample complexity guarantees, and can be efficiently trained end-to-end. Experiments on three large choice datasets demonstrate the utility of our approach.
Tasks
Published	2019-06-14
URL	https://arxiv.org/abs/1906.06365v2
PDF	https://arxiv.org/pdf/1906.06365v2.pdf
PWC	https://paperswithcode.com/paper/predicting-choice-with-set-dependent
Repo
Framework
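
The claim that predictive models must capture set-related invariances can be illustrated with a generic permutation-invariant (DeepSets-style) choice model. This is not the authors' architecture. The function name, the mean-pooled `tanh` set embedding, and the additive utility are all assumptions made for illustration.

```python
import numpy as np

def choice_probabilities(items: np.ndarray, w: np.ndarray, W_set: np.ndarray) -> np.ndarray:
    """Softmax choice probabilities over the items of one offered set.

    items: (n, d) item features; w: (d,) base utility weights; W_set: (d, h) set-embedding weights.
    """
    phi = np.tanh(items @ W_set)        # (n, h) per-item embeddings
    context = phi.mean(axis=0)          # (h,) permutation-invariant summary of the whole set
    scores = items @ w + phi @ context  # each item's utility depends on what else is offered
    scores = scores - scores.max()      # numerical stability before exponentiating
    expd = np.exp(scores)
    return expd / expd.sum()
```

Because the set summary is a mean, reordering the offered items simply reorders the probabilities. Adding an alternative, however, can change the odds between two existing items. A plain multinomial logit cannot express this, since it satisfies independence of irrelevant alternatives.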

Extracting robust and accurate features via a robust information bottleneck


Title	Extracting robust and accurate features via a robust information bottleneck
Authors	Ankit Pensia, Varun Jog, Po-Ling Loh
Abstract	We propose a novel strategy for extracting features in supervised learning that can be used to construct a classifier which is more robust to small perturbations in the input space. Our method builds upon the idea of the information bottleneck by introducing an additional penalty term that encourages the Fisher information of the extracted features to be small, when parametrized by the inputs. By tuning the regularization parameter, we can explicitly trade off the opposing desiderata of robustness and accuracy when constructing a classifier. We derive the optimal solution to the robust information bottleneck when the inputs and outputs are jointly Gaussian, proving that the optimally robust features are also jointly Gaussian in that setting. Furthermore, we propose a method for optimizing a variational bound on the robust information bottleneck objective in general settings using stochastic gradient descent, which may be implemented efficiently in neural networks. Our experimental results for synthetic and real data sets show that the proposed feature extraction method indeed produces classifiers with increased robustness to perturbations.
Tasks
Published	2019-10-15
URL	https://arxiv.org/abs/1910.06893v1
PDF	https://arxiv.org/pdf/1910.06893v1.pdf
PWC	https://paperswithcode.com/paper/extracting-robust-and-accurate-features-via-a
Repo
Framework
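
The robustness/accuracy trade-off can be made concrete with a toy scalar Gaussian computation. This is an illustration under stated assumptions, not the paper's derivation. The model assumes X ~ N(0, 1), a label Y = X + noise, and a feature T = aX + N(0, σ²). For this Gaussian channel, the Fisher information of T about x equals a²/σ². Scaling up `a` makes T more informative about Y, but it also makes T more sensitive to input perturbations; this is the tension that the additional Fisher-information penalty is meant to control. The variable names and noise levels are assumptions.

```python
import numpy as np

def gaussian_mi(var_a: float, var_b: float, cov_ab: float) -> float:
    """Mutual information (in nats) between two jointly Gaussian scalars."""
    rho2 = cov_ab ** 2 / (var_a * var_b)
    return float(-0.5 * np.log(1.0 - rho2))

def toy_channel_tradeoff(a: float, noise_var: float, label_noise_var: float = 0.5):
    """Toy model: X ~ N(0, 1), Y = X + N(0, label_noise_var), T = a*X + N(0, noise_var).

    Returns (I(T;Y), I(X;T), Fisher information of T about x).
    """
    var_x = 1.0
    var_t = a ** 2 * var_x + noise_var
    var_y = var_x + label_noise_var
    info_ty = gaussian_mi(var_t, var_y, a * var_x)  # cov(T, Y) = a * var(X)
    info_xt = gaussian_mi(var_t, var_x, a * var_x)  # cov(T, X) = a * var(X)
    fisher = a ** 2 / noise_var                     # p(t | x) = N(a x, noise_var)
    return info_ty, info_xt, fisher
```

For example, moving from a = 1 to a = 2 with unit channel noise raises I(T;Y) but quadruples the Fisher information. I(T;Y) stays bounded by I(X;Y) no matter how large `a` becomes.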

Hybrid-Attention based Decoupled Metric Learning for Zero-Shot Image Retrieval


Title	Hybrid-Attention based Decoupled Metric Learning for Zero-Shot Image Retrieval
Authors	Binghui Chen, Weihong Deng
Abstract	In the zero-shot image retrieval (ZSIR) task, embedding learning becomes more attractive; however, many methods follow the traditional metric learning idea and overlook the problems specific to zero-shot settings. In this paper, we first emphasize the importance of learning a visually discriminative metric and of preventing the partial/selective learning behavior of the learner in ZSIR. We then propose the Decoupled Metric Learning (DeML) framework to achieve these goals individually. Instead of coarsely optimizing a unified metric, we decouple it into multiple attention-specific parts so as to recurrently induce discrimination and explicitly enhance generalization. These goals are achieved mainly by our object-attention module, based on random walk graph propagation, and our channel-attention module, based on an adversary constraint, respectively. We demonstrate the necessity of addressing these vital problems in ZSIR on popular benchmarks, outperforming state-of-the-art methods by a significant margin. Code is available at http://www.bhchen.cn
Tasks	Image Retrieval, Metric Learning
Published	2019-07-27
URL	https://arxiv.org/abs/1907.11832v1
PDF	https://arxiv.org/pdf/1907.11832v1.pdf
PWC	https://paperswithcode.com/paper/hybrid-attention-based-decoupled-metric-1
Repo
Framework
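
The object-attention module is described only at a high level. As a generic illustration of random walk graph propagation (not the DeML module itself), the sketch below diffuses initial attention scores over an affinity graph and includes a restart term. The parameter `alpha`, the Gaussian affinity used in the usage note, and the function names are assumptions.

```python
import numpy as np

def random_walk_propagate(affinity: np.ndarray, x0: np.ndarray, alpha: float = 0.5, steps: int = 50) -> np.ndarray:
    """Diffuse initial scores x0 over a graph by repeated random-walk steps with restart.

    affinity: (n, n) non-negative similarities between nodes (e.g. positions of a feature map).
    alpha: weight on the propagated scores versus the restart to x0.
    """
    P = affinity / affinity.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = alpha * (P @ x) + (1.0 - alpha) * x0
    return x

def random_walk_closed_form(affinity: np.ndarray, x0: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fixed point of the iteration above: x* = (1 - alpha) (I - alpha P)^{-1} x0."""
    P = affinity / affinity.sum(axis=1, keepdims=True)
    n = P.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * P, x0)
```

With an affinity such as `np.exp(-(f - f.T) ** 2)` built from node features `f` of shape (n, 1), a high initial score on one node spreads mostly to nodes with similar features. Because `alpha < 1`, the iteration converges to the closed-form fixed point.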

A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints


Title	A Data Efficient and Feasible Level Set Method for Stochastic Convex Optimization with Expectation Constraints
Authors	Qihang Lin, Selvaprabu Nadarajah, Negar Soheili, Tianbao Yang
Abstract	Stochastic convex optimization problems with expectation constraints (SOECs) are encountered in statistics and machine learning, business, and engineering. In data-rich environments, the SOEC objective and constraints contain expectations defined with respect to large datasets. Therefore, efficient algorithms for solving such SOECs need to limit the fraction of data points that they use, which we refer to as algorithmic data complexity. Recent stochastic first order methods exhibit low data complexity when handling SOECs but guarantee near-feasibility and near-optimality only at convergence. These methods may thus return highly infeasible solutions when heuristically terminated, as is often the case, due to theoretical convergence criteria being highly conservative. This issue limits the use of first order methods in several applications where the SOEC constraints encode implementation requirements. We design a stochastic feasible level set method (SFLS) for SOECs that has low data complexity and emphasizes feasibility before convergence. Specifically, our level-set method solves a root-finding problem by calling a novel first order oracle that computes a stochastic upper bound on the level-set function by extending mirror descent and online validation techniques. We establish that SFLS maintains a high-probability feasible solution at each root-finding iteration and exhibits favorable iteration complexity compared to state-of-the-art deterministic feasible level set and stochastic subgradient methods. Numerical experiments on three diverse applications validate the low data complexity of SFLS relative to the former approach and highlight how SFLS finds feasible solutions with small optimality gaps significantly faster than the latter method.
Tasks
Published	2019-08-07
URL	https://arxiv.org/abs/1908.03077v2
PDF	https://arxiv.org/pdf/1908.03077v2.pdf
PWC	https://paperswithcode.com/paper/a-data-efficient-and-feasible-level-set
Repo
Framework
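
The level-set idea of turning a constrained problem into a root-finding problem can be shown on a deterministic toy problem. This is not SFLS, which replaces the exact inner minimization with a stochastic first-order oracle that returns an upper bound. Here the level-set function H(r) = min_x max(f(x) - r, g(x)) is evaluated by brute force on a grid, and the problem, grid, and bracket are assumptions. H(r) ≤ 0 exactly when some feasible x has f(x) ≤ r, so the root of H is the optimal value.

```python
import numpy as np

def level_set_value(r, xs, f, g) -> float:
    """H(r) = min over candidate points x of max(f(x) - r, g(x)), by brute force over the grid xs."""
    return float(np.min(np.maximum(f(xs) - r, g(xs))))

def bisect_optimal_value(f, g, xs, lo: float, hi: float, tol: float = 1e-4) -> float:
    """Bisection on r; assumes H(lo) > 0 and H(hi) <= 0 (H is non-increasing in r)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if level_set_value(mid, xs, f, g) <= 0.0:
            hi = mid  # some feasible grid point has objective value <= mid
        else:
            lo = mid  # no feasible grid point reaches objective value mid
    return hi

# Toy problem: minimize (x - 3)^2 subject to x - 1 <= 0; the optimum is x = 1 with value 4.
def f(x):
    return (x - 3.0) ** 2

def g(x):
    return x - 1.0

xs = np.linspace(-5.0, 5.0, 10001)
r_star = bisect_optimal_value(f, g, xs, lo=0.0, hi=50.0)
```

Returning `hi` keeps the answer on the side where a feasible point is known to exist. This loosely mirrors the emphasis on feasibility before convergence.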

Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking


Title	Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking
Authors	Alaaeldin El-Nouby, Shuangfei Zhai, Graham W. Taylor, Joshua M. Susskind
Abstract	Deep neural networks require collecting and annotating large amounts of data to train successfully. In order to alleviate the annotation bottleneck, we propose a novel self-supervised representation learning approach for spatiotemporal features extracted from videos. We introduce Skip-Clip, a method that utilizes temporal coherence in videos, by training a deep model for future clip order ranking conditioned on a context clip as a surrogate objective for video future prediction. We show that features learned using our method are generalizable and transfer strongly to downstream tasks. For action recognition on the UCF101 dataset, we obtain 51.8% improvement over random initialization and outperform models initialized using inflated ImageNet parameters. Skip-Clip also achieves results competitive with state-of-the-art self-supervision methods.
Tasks	Future prediction, Representation Learning
Published	2019-10-28
URL	https://arxiv.org/abs/1910.12770v1
PDF	https://arxiv.org/pdf/1910.12770v1.pdf
PWC	https://paperswithcode.com/paper/skip-clip-self-supervised-spatiotemporal
Repo
Framework
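
As an illustration of a clip order ranking objective, the sketch below scores each future clip against the context clip and applies a pairwise hinge loss. The dot-product score, the margin, and the assumption that nearer clips should rank higher are placeholders; the abstract does not specify Skip-Clip's exact scoring function or loss, and the function name is hypothetical.

```python
import numpy as np

def clip_order_ranking_loss(context: np.ndarray, future_clips: np.ndarray, margin: float = 1.0) -> float:
    """Pairwise hinge loss asking nearer future clips to score higher than farther ones.

    context: (d,) embedding of the context clip.
    future_clips: (k, d) embeddings of future clips, ordered from nearest to farthest in time.
    """
    scores = future_clips @ context  # hypothetical compatibility score: dot product
    loss = 0.0
    k = len(scores)
    for i in range(k):
        for j in range(i + 1, k):
            # Clip i is closer in time than clip j, so we want scores[i] >= scores[j] + margin.
            loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss
```

In a real model, `context` and `future_clips` would be produced by a shared video encoder, and minimizing this loss would shape the encoder's spatiotemporal features.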

A Parallel Projection Method for Metric Constrained Optimization


Title	A Parallel Projection Method for Metric Constrained Optimization
Authors	Cameron Ruggles, Nate Veldt, David F. Gleich
Abstract	Many clustering applications in machine learning and data mining rely on solving metric-constrained optimization problems. These problems are characterized by $O(n^3)$ constraints that enforce triangle inequalities on distance variables associated with $n$ objects in a large dataset. Despite its usefulness, metric-constrained optimization is challenging in practice due to the cubic number of constraints and the high memory requirements of standard optimization software. Recent work has shown that iterative projection methods are able to solve metric-constrained optimization problems on a much larger scale than was previously possible, thanks to their comparatively low memory requirement. However, the major limitation of projection methods is their slow convergence rate. In this paper we present a parallel projection method for metric-constrained optimization which allows us to speed up the convergence rate in practice. The key to our approach is a new parallel execution schedule that allows us to perform projections at multiple metric constraints simultaneously without any conflicts or locking of variables. We illustrate the effectiveness of this execution schedule by implementing and testing a parallel projection method for solving the metric-constrained linear programming relaxation of correlation clustering. We show numerous experimental results on problems involving up to 2.9 trillion constraints.
Tasks
Published	2019-01-29
URL	http://arxiv.org/abs/1901.10084v1
PDF	http://arxiv.org/pdf/1901.10084v1.pdf
PWC	https://paperswithcode.com/paper/a-parallel-projection-method-for-metric
Repo
Framework
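
The basic operation behind projection methods for metric constraints is a projection onto a single triangle-inequality half-space. The Python sketch below is an illustrative serial version, not the authors' parallel implementation. It enforces D[i, j] <= D[i, k] + D[k, j] by moving each of the three affected distances by one third of the violation. This is the Euclidean projection onto that half-space, because the constraint normal (1, -1, -1) has squared norm 3. A plain cyclic sweep of such projections finds a feasible point; methods that need the nearest feasible point, such as Dykstra's algorithm, additionally keep correction terms. Two triangles that share no edge touch disjoint variables, which is what makes conflict-free parallel schedules possible. The function names are hypothetical.

```python
import itertools
import numpy as np

def project_triangle(D: np.ndarray, i: int, j: int, k: int) -> None:
    """In place: project the symmetric distance matrix D onto D[i, j] <= D[i, k] + D[k, j]."""
    v = D[i, j] - D[i, k] - D[k, j]  # amount of violation
    if v > 0:
        step = v / 3.0
        D[i, j] -= step; D[j, i] = D[i, j]
        D[i, k] += step; D[k, i] = D[i, k]
        D[k, j] += step; D[j, k] = D[k, j]

def sweep(D: np.ndarray, passes: int = 1) -> np.ndarray:
    """Cyclic (serial) projections over every triangle inequality."""
    n = D.shape[0]
    for _ in range(passes):
        for i, j in itertools.combinations(range(n), 2):
            for k in range(n):
                if k != i and k != j:
                    project_triangle(D, i, j, k)
    return D

def max_violation(D: np.ndarray) -> float:
    """Largest amount by which any triangle inequality is violated (0 if D is a metric)."""
    n = D.shape[0]
    worst = 0.0
    for i, j, k in itertools.permutations(range(n), 3):
        worst = max(worst, D[i, j] - D[i, k] - D[k, j])
    return float(worst)
```

For three points with distances 3, 1, 1, a single projection moves them to 8/3, 4/3, 4/3, which already satisfies every triangle inequality.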

Multi-Stage Fault Warning for Large Electric Grids Using Anomaly Detection and Machine Learning


Title	Multi-Stage Fault Warning for Large Electric Grids Using Anomaly Detection and Machine Learning
Authors	Sanjeev Raja, Ernest Fokoué
Abstract	In the monitoring of a complex electric grid, it is of paramount importance to provide operators with early warnings of anomalies detected on the network, along with a precise classification and diagnosis of the specific fault type. In this paper, we propose a novel multi-stage early warning system prototype for electric grid fault detection, classification, subgroup discovery, and visualization. In the first stage, a computationally efficient anomaly detection method based on quartiles detects the presence of a fault in real time. In the second stage, the fault is classified into one of nine pre-defined disaster scenarios. The time series data are first mapped to highly discriminative features by applying dimensionality reduction based on temporal autocorrelation. The features are then mapped through one of three classification techniques: support vector machine, random forest, and artificial neural network. Finally, in the third stage, intra-class clustering based on dynamic time warping is used to characterize the fault with further granularity. Results on the Bonneville Power Administration electric grid data show that i) the proposed anomaly detector is both fast and accurate; ii) dimensionality reduction leads to dramatic improvement in classification accuracy and speed; iii) the random forest method offers the most accurate, consistent, and robust fault classification; and iv) time series within a given class naturally separate into five distinct clusters which correspond closely to the geographical distribution of electric grid buses.
Tasks	Anomaly Detection, Dimensionality Reduction, Fault Detection, Time Series
Published	2019-03-15
URL	http://arxiv.org/abs/1903.06700v1
PDF	http://arxiv.org/pdf/1903.06700v1.pdf
PWC	https://paperswithcode.com/paper/multi-stage-fault-warning-for-large-electric
Repo
Framework
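
Two of the building blocks named above can be sketched in a few lines. The abstract says only that the detector is "based on quartiles", so the first function assumes Tukey's interquartile-range fences as a stand-in. The second function is the textbook dynamic time warping distance, which intra-class clustering could use as a pairwise dissimilarity. The function names and the fence multiplier `k` are assumptions. A real-time detector would more plausibly compute the quartiles on a reference window than on the series being tested.

```python
import numpy as np

def quartile_anomalies(x, k: float = 1.5) -> np.ndarray:
    """Flag points outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)

def dtw_distance(a, b) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])
```

DTW treats `[0, 1, 2]` and `[0, 1, 1, 2]` as identical (distance 0), which is why it suits comparing fault signatures that unfold at slightly different speeds.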

Uncertainty on Asynchronous Time Event Prediction


Title	Uncertainty on Asynchronous Time Event Prediction
Authors	Marin Biloš, Bertrand Charpentier, Stephan Günnemann
Abstract	Asynchronous event sequences are the basis of many applications throughout different industries. In this work, we tackle the task of predicting the next event (given a history), and how this prediction changes with the passage of time. Since at some time points (e.g., predictions far into the future) we might not be able to predict anything with confidence, capturing uncertainty in the predictions is crucial. We present two new architectures, WGP-LN and FD-Dir, which model the evolution of the distribution on the probability simplex with time-dependent logistic normal and Dirichlet distributions. In both cases, the combination of RNNs with either a Gaussian process or a function decomposition allows us to express rich temporal evolution of the distribution parameters and naturally captures uncertainty. Experiments on class prediction, time prediction, and anomaly detection demonstrate the strong performance of our models on various datasets compared to other approaches.
Tasks	Anomaly Detection
Published	2019-11-13
URL	https://arxiv.org/abs/1911.05503v2
PDF	https://arxiv.org/pdf/1911.05503v2.pdf
PWC	https://paperswithcode.com/paper/uncertainty-on-asynchronous-time-event-1
Repo
Framework
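
The way a Dirichlet distribution expresses uncertainty on the probability simplex can be sketched as follows. The decaying parameterization is an assumption chosen only to reproduce the effect described in the abstract, namely that predictions far into the future become uncertain. FD-Dir instead obtains its time-dependent parameters from an RNN combined with a function decomposition. The per-class variance uses the standard Dirichlet formula mean_c (1 - mean_c) / (alpha_0 + 1). Both function names are hypothetical.

```python
import numpy as np

def dirichlet_summary(alpha):
    """Mean class probabilities, total concentration alpha_0, and per-class variance of Dirichlet(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    alpha0 = alpha.sum()
    mean = alpha / alpha0
    var = mean * (1.0 - mean) / (alpha0 + 1.0)
    return mean, alpha0, var

def decaying_alpha(t: float, evidence, decay: float, prior: float = 1.0) -> np.ndarray:
    """Hypothetical time dependence: evidence fades with elapsed time t toward a flat Dirichlet(prior)."""
    return prior + np.asarray(evidence, dtype=float) * np.exp(-decay * t)
```

Shortly after the last event, a large total concentration gives a confident prediction. As `t` grows, the concentration shrinks toward the flat prior, so the mean drifts to uniform and the per-class variance grows.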

(Pen-) Ultimate DNN Pruning


Title	(Pen-) Ultimate DNN Pruning
Authors	Marc Riera, Jose-Maria Arnau, Antonio Gonzalez
Abstract	DNN pruning reduces the memory footprint and computational work of DNN-based solutions to improve performance and energy efficiency. An effective pruning scheme should be able to systematically remove connections and/or neurons that are unnecessary or redundant, reducing the DNN size without any loss in accuracy. In this paper, we show that prior pruning schemes require an extremely time-consuming iterative process that involves retraining the DNN many times to tune the pruning hyperparameters. We propose a DNN pruning scheme based on Principal Component Analysis and the relative importance of each neuron’s connections that automatically finds the optimized DNN in one shot, without requiring hand-tuning of multiple parameters.
Tasks
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02535v1
PDF	https://arxiv.org/pdf/1906.02535v1.pdf
PWC	https://paperswithcode.com/paper/pen-ultimate-dnn-pruning
Repo
Framework
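
One common way to use Principal Component Analysis to size a layer is to count how many principal components are needed to retain a chosen fraction of the variance of its activations. The sketch below does that. The 99% energy threshold and the function name are assumptions. The paper's scheme also uses the relative importance of each neuron's connections, which is not modeled here.

```python
import numpy as np

def neurons_to_keep(acts: np.ndarray, energy: float = 0.99) -> int:
    """Number of principal components needed to explain `energy` of the variance.

    acts: (num_samples, num_neurons) activations of one layer collected on some data.
    """
    centered = acts - acts.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2                                   # proportional to per-component variance
    ratio = np.cumsum(var) / var.sum()             # cumulative explained-variance ratio
    return int(np.searchsorted(ratio, energy) + 1) # first component count reaching `energy`
```

For example, a 5-neuron layer whose activations span only 2 independent directions would be sized to 2 neurons.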

IMS-Speech: A Speech to Text Tool


Title	IMS-Speech: A Speech to Text Tool
Authors	Pavel Denisov, Ngoc Thang Vu
Abstract	We present IMS-Speech, a web-based tool for German and English speech transcription that aims to facilitate research in various disciplines requiring access to lexical information in spoken language materials. The tool is based on a modern open-source software stack, advanced speech recognition methods, and public data resources, and is freely available to academic researchers. The models used are built to be generic in order to provide transcriptions of competitive accuracy on a diverse set of tasks and conditions.
Tasks	Speech Recognition
Published	2019-08-13
URL	https://arxiv.org/abs/1908.04743v1
PDF	https://arxiv.org/pdf/1908.04743v1.pdf
PWC	https://paperswithcode.com/paper/ims-speech-a-speech-to-text-tool
Repo
Framework

GDP: Generalized Device Placement for Dataflow Graphs


Title	GDP: Generalized Device Placement for Dataflow Graphs
Authors	Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini, James Laudon
Abstract	Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method on average achieves 16% improvement over human experts and 9.2% improvement over the prior art with 15 times faster convergence. To further reduce the computation cost, we pre-train the policy network on a set of dataflow graphs and use a superposition network to fine-tune it on each individual graph, achieving state-of-the-art performance on large held-out graphs with over 50k nodes, such as an 8-layer GNMT.
Tasks
Published	2019-09-28
URL	https://arxiv.org/abs/1910.01578v1
PDF	https://arxiv.org/pdf/1910.01578v1.pdf
PWC	https://paperswithcode.com/paper/gdp-generalized-device-placement-for-dataflow
Repo
Framework

A Study of BFLOAT16 for Deep Learning Training


Title	A Study of BFLOAT16 for Deep Learning Training
Authors	Dhiraj Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, Pradeep Dubey
Abstract	This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of the IEEE 754 single-precision floating-point format (FP32), and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754-compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed precision training, and delve into details of operations such as the rounding modes for converting FP32 tensors to BFLOAT16. We have implemented a method to emulate BFLOAT16 operations in TensorFlow, Caffe2, IntelCaffe, and Neon for our experiments. Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
Tasks	Image Classification, Language Modelling, Recommendation Systems, Speech Recognition
Published	2019-05-29
URL	https://arxiv.org/abs/1905.12322v3
PDF	https://arxiv.org/pdf/1905.12322v3.pdf
PWC	https://paperswithcode.com/paper/a-study-of-bfloat16-for-deep-learning
Repo
Framework
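
The FP32-to-BFLOAT16 conversion step mentioned above can be illustrated with bit manipulation. BFLOAT16 keeps the sign bit, the full 8-bit exponent, and the top 7 mantissa bits of an FP32 value, which is why its range matches FP32. The sketch below implements round-to-nearest-even, one of the rounding modes such a conversion can use. It is an illustration, not the emulation code used in the paper. It deliberately ignores NaN handling, and the function names are hypothetical.

```python
import numpy as np

def fp32_to_bf16_bits(x) -> np.ndarray:
    """Convert float32 values to bfloat16 bit patterns using round-to-nearest-even.

    NaN inputs are not special-cased here; a production converter should handle them.
    """
    bits = np.asarray(x, dtype=np.float32).reshape(-1).view(np.uint32)
    lsb = (bits >> 16) & 1                   # lowest bit that survives the truncation
    rounding_bias = np.uint32(0x7FFF) + lsb  # 0x7FFF, or 0x8000 to break ties toward even
    return ((bits + rounding_bias) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b) -> np.ndarray:
    """Widen bfloat16 bit patterns back to float32 (exact: append 16 zero bits)."""
    return (np.asarray(b, dtype=np.uint16).astype(np.uint32) << 16).view(np.float32)
```

For example, 1 + 2^-8 lies exactly halfway between the two nearest BFLOAT16 values and rounds to 1.0 (bit pattern 0x3F80). By contrast, 1e38 remains finite after the round trip, although it would overflow in FP16.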

An Efficient Approach for Using Expectation Maximization Algorithm in Capsule Networks


Title	An Efficient Approach for Using Expectation Maximization Algorithm in Capsule Networks
Authors	Moein Hasani, Amin Nasim Saravi, Hassan Khotanlou
Abstract	Capsule Networks (CapsNets) are new architectures that have shown ground-breaking results in certain areas of Computer Vision (CV). In 2017, Hinton and his team introduced CapsNets with routing-by-agreement (Sabour et al.). In a more recent paper, “Matrix Capsules with EM Routing,” they proposed a more complete architecture based on the Expectation-Maximization (EM) algorithm. Unlike traditional convolutional neural networks (CNNs), this architecture is able to preserve the pose of the objects in the picture. Due to this characteristic, it has beaten the previous state-of-the-art results on the smallNORB dataset, which includes samples from various viewpoints. This architecture is also more robust to white-box adversarial attacks. However, CapsNets have two major drawbacks: they cannot perform as well as CNNs on complex datasets, and they need a huge amount of time for training. We try to mitigate these shortcomings by finding optimal settings of EM routing iterations for training CapsNets. Unlike past studies, we use unequal numbers of EM routing iterations for different stages of the CapsNet. For our research, we use three datasets: the Yale face dataset, the Belgium Traffic Sign dataset, and the Fashion-MNIST dataset.
Tasks
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05333v2
PDF	https://arxiv.org/pdf/1912.05333v2.pdf
PWC	https://paperswithcode.com/paper/an-efficient-approach-for-using-expectation
Repo
Framework