October 17, 2019

3005 words 15 mins read

Paper Group ANR 911

Multi-Fiber Networks for Video Recognition. Residual Networks as Geodesic Flows of Diffeomorphisms. Round-Table Group Optimization for Sequencing Problems. On Controllable Sparse Alternatives to Softmax. Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information. The Roles of Supervised Machine Learning in Systems Neuroscience …

Multi-Fiber Networks for Video Recognition

Title Multi-Fiber Networks for Video Recognition
Authors Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng
Abstract In this paper, we aim to reduce the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while preserving state-of-the-art accuracy on video recognition benchmarks. To this end, we present the novel Multi-Fiber architecture that slices a complex neural network into an ensemble of lightweight networks, or fibers, that run through the network. To facilitate information flow between fibers, we further incorporate multiplexer modules and end up with an architecture that reduces the computational cost of 3D networks by an order of magnitude while increasing recognition performance at the same time. Extensive experimental results show that our multi-fiber architecture significantly boosts the efficiency of existing convolutional networks for both image and video recognition tasks, achieving state-of-the-art performance on the UCF-101, HMDB-51 and Kinetics datasets. Our proposed model requires over 9x and 13x fewer computations than the I3D and R(2+1)D models, respectively, while providing higher accuracy.
Tasks Video Recognition
Published 2018-07-30
URL http://arxiv.org/abs/1807.11195v3
PDF http://arxiv.org/pdf/1807.11195v3.pdf
PWC https://paperswithcode.com/paper/multi-fiber-networks-for-video-recognition
Repo
Framework
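
The fiber-plus-multiplexer design described in the abstract maps naturally onto grouped convolutions. Below is a minimal, hedged sketch of that idea in PyTorch; the fiber count, layer sizes, and block layout are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiFiberUnit(nn.Module):
    """Illustrative sketch of the multi-fiber idea: the channel dimension is
    split into independent 'fibers' via a grouped 3D convolution, and a cheap
    pointwise 'multiplexer' convolution lets information flow across fibers."""

    def __init__(self, channels: int, fibers: int = 16):
        super().__init__()
        # Multiplexer: 1x1x1 convolution mixing channels across all fibers.
        self.multiplexer = nn.Conv3d(channels, channels, kernel_size=1)
        # Fibers: a grouped convolution acts as an ensemble of lightweight
        # networks, each seeing only channels // fibers input channels.
        self.fiber_conv = nn.Conv3d(channels, channels, kernel_size=3,
                                    padding=1, groups=fibers)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.multiplexer(x))
        out = self.relu(self.fiber_conv(out))
        return x + out  # residual connection

x = torch.randn(2, 64, 8, 56, 56)   # (batch, channels, frames, H, W)
print(MultiFiberUnit(64)(x).shape)
```

The grouped convolution is where the savings come from: its cost scales with channels squared divided by the number of fibers, which is the "order of magnitude" lever the abstract refers to.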

Residual Networks as Geodesic Flows of Diffeomorphisms

Title Residual Networks as Geodesic Flows of Diffeomorphisms
Authors Francois Rousseau, Ronan Fablet
Abstract This paper addresses the understanding and characterization of residual networks (ResNets), which are among the state-of-the-art deep learning architectures for a variety of supervised learning problems. We focus on the mapping component of ResNets, which maps the embedding space to a new, unknown space in which prediction or classification can be stated according to linear criteria. We show that this mapping component can be regarded as the numerical implementation of continuous flows of diffeomorphisms governed by ordinary differential equations. In particular, ResNets with shared weights are fully characterized as numerical approximations of exponential diffeomorphic operators. We stress, both theoretically and numerically, the relevance of enforcing diffeomorphic properties and the importance of numerical issues in making the continuous formulation consistent with the discretized ResNet implementation. We further discuss the resulting theoretical and computational insights into ResNet architectures.
Tasks
Published 2018-05-24
URL http://arxiv.org/abs/1805.09585v2
PDF http://arxiv.org/pdf/1805.09585v2.pdf
PWC https://paperswithcode.com/paper/residual-networks-as-geodesic-flows-of
Repo
Framework
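
The abstract's ODE reading of residual blocks is easy to make concrete: a stack of shared-weight blocks x ← x + (1/n)f(x) is n explicit Euler steps of dx/dt = f(x) on t in [0, 1]. A minimal numpy sketch, with an assumed tanh velocity field:

```python
import numpy as np

def residual_velocity(x, W):
    """A shared-weight residual mapping f(x) = tanh(W x); the form is assumed."""
    return np.tanh(W @ x)

def resnet_flow(x, W, n_blocks=50):
    """n shared-weight residual blocks read as n Euler steps of dx/dt = f(x);
    as n grows, the stack approximates the exponential of the stationary
    velocity field, i.e. the diffeomorphic flow at time 1."""
    h = 1.0 / n_blocks
    for _ in range(n_blocks):
        x = x + h * residual_velocity(x, W)
    return x

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))
x0 = rng.normal(size=4)
print(resnet_flow(x0, W, n_blocks=10))
print(resnet_flow(x0, W, n_blocks=1000))  # finer discretization, same flow
```

Comparing the two printouts illustrates the numerical-consistency point in the abstract: the discrete ResNet only matches the continuous flow when the step size is small enough.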

Round-Table Group Optimization for Sequencing Problems

Title Round-Table Group Optimization for Sequencing Problems
Authors Xiao-Feng Xie
Abstract In this paper, a round-table group optimization (RTGO) algorithm is presented. RTGO is a simple metaheuristic framework using insights from research on group creativity. In a cooperative group, the agents work in iterative sessions to search for innovative ideas in a common problem landscape. Each agent has one base idea stored in its individual memory, and one social idea fed to it by a round-table group support mechanism in each session. The idea combination and improvement processes are realized, respectively, by a recombination search (XS) strategy and a local search (LS) strategy that build on the base and social ideas. RTGO is then implemented for solving two difficult sequencing problems, the flowshop scheduling problem and the quadratic assignment problem. The domain-specific LS strategies are adopted from existing algorithms, whereas a general XS class, called socially biased combination (SBX), is realized in a modular form. The performance of RTGO is evaluated on commonly used benchmark datasets. RTGO achieves good performance on different problems using appropriate SBX operators. Furthermore, RTGO is able to outperform several existing methods, including methods using the same LS strategies.
Tasks
Published 2018-08-07
URL http://arxiv.org/abs/1808.02185v1
PDF http://arxiv.org/pdf/1808.02185v1.pdf
PWC https://paperswithcode.com/paper/round-table-group-optimization-for-sequencing
Repo
Framework
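
A minimal skeleton of the RTGO loop, under stated assumptions: the round-table mechanism is read here as each agent receiving its neighbor's base idea, and the toy sequencing cost, XS, and LS operators are illustrative stand-ins for the paper's SBX class and domain-specific strategies.

```python
import random

def rtgo(n_agents, sessions, init, xs, ls, cost):
    """Minimal skeleton of the round-table loop: each agent combines its base
    idea with a social idea (here, the next agent's base idea around the
    table), improves the result with local search, and keeps it if better."""
    base = [init() for _ in range(n_agents)]
    for _ in range(sessions):
        for i in range(n_agents):
            social = base[(i + 1) % n_agents]     # idea fed by the round table
            candidate = ls(xs(base[i], social))   # combine, then improve
            if cost(candidate) < cost(base[i]):
                base[i] = candidate
    return min(base, key=cost)

# Toy sequencing instance: order numbers to minimize adjacent disorder.
target = list(range(8))
init = lambda: random.sample(target, len(target))
cost = lambda s: sum(abs(a - b) - 1 for a, b in zip(s, s[1:]) if abs(a - b) > 1)

def xs(a, b):  # order-crossover-style recombination (illustrative)
    cut = random.randrange(1, len(a))
    head = a[:cut]
    return head + [x for x in b if x not in head]

def ls(s):     # first-improvement adjacent-swap local search
    s = list(s)
    for i in range(len(s) - 1):
        t = s[:]; t[i], t[i + 1] = t[i + 1], t[i]
        if cost(t) < cost(s):
            s = t
    return s

print(rtgo(5, 50, init, xs, ls, cost))
```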

On Controllable Sparse Alternatives to Softmax

Title On Controllable Sparse Alternatives to Softmax
Authors Anirban Laha, Saneem A. Chemmengath, Priyanka Agrawal, Mitesh M. Khapra, Karthik Sankaranarayanan, Harish G. Ramaswamy
Abstract Converting an n-dimensional vector to a probability distribution over n objects is a commonly used component in many machine learning tasks such as multiclass classification, multilabel classification, and attention mechanisms. For this, several probability mapping functions have been proposed and employed in the literature, such as softmax, sum-normalization, spherical softmax, and sparsemax, but there is very little understanding of how they relate to each other. Further, none of the above formulations offers explicit control over the degree of sparsity. To address this, we develop a unified framework that encompasses all these formulations as special cases. This framework ensures simple closed-form solutions and the existence of sub-gradients suitable for learning via backpropagation. Within this framework, we propose two novel sparse formulations, sparsegen-lin and sparsehourglass, that seek to provide control over the degree of desired sparsity. We further develop novel convex loss functions that help induce the behavior of the aforementioned formulations in the multilabel classification setting, showing improved performance. We also demonstrate empirically that the proposed formulations, when used to compute attention weights, achieve better or comparable performance on standard seq2seq tasks such as neural machine translation and abstractive summarization.
Tasks Abstractive Text Summarization, Machine Translation
Published 2018-10-29
URL http://arxiv.org/abs/1810.11975v2
PDF http://arxiv.org/pdf/1810.11975v2.pdf
PWC https://paperswithcode.com/paper/on-controllable-sparse-alternatives-to
Repo
Framework
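
Sparsemax, one of the special cases unified by the paper's framework, illustrates why sparse probability mappings matter: unlike softmax it can assign exact zeros. A sketch of its standard closed-form solution follows; the paper's sparsegen-lin then adds an explicit coefficient controlling the degree of sparsity.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: the Euclidean projection of z onto the probability simplex,
    computed in closed form by thresholding at tau."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    # largest k with 1 + k * z_(k) > cumulative sum of the top-k entries
    k_star = k[1 + k * z_sorted > cumsum][-1]
    tau = (cumsum[k_star - 1] - 1) / k_star
    return np.maximum(z - tau, 0.0)

z = np.array([2.0, 1.5, 0.1, -1.0])
print(sparsemax(z))                    # [0.75, 0.25, 0, 0]: exact zeros
print(np.exp(z) / np.exp(z).sum())     # softmax for comparison: fully dense
```

Because the projection has a closed form and well-defined sub-gradients, such mappings slot directly into backpropagation, which is the property the abstract emphasizes for the whole framework.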

Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information

Title Robust Neural Abstractive Summarization Systems and Evaluation against Adversarial Information
Authors Lisa Fan, Dong Yu, Lu Wang
Abstract Sequence-to-sequence (seq2seq) neural models have been actively investigated for abstractive summarization. Nevertheless, existing neural abstractive systems frequently generate factually incorrect summaries and are vulnerable to adversarial information, suggesting a crucial lack of semantic understanding. In this paper, we propose a novel semantic-aware neural abstractive summarization model that learns to generate high quality summaries through semantic interpretation over salient content. A novel evaluation scheme with adversarial samples is introduced to measure how well a model identifies off-topic information, where our model yields significantly better performance than the popular pointer-generator summarizer. Human evaluation also confirms that our system summaries are uniformly more informative and faithful as well as less redundant than the seq2seq model.
Tasks Abstractive Text Summarization
Published 2018-10-14
URL http://arxiv.org/abs/1810.06065v1
PDF http://arxiv.org/pdf/1810.06065v1.pdf
PWC https://paperswithcode.com/paper/robust-neural-abstractive-summarization
Repo
Framework
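
The adversarial evaluation idea can be mimicked with a toy proxy: inject off-topic sentences into the source document and measure how much of the generated summary overlaps with them. The token-recall measure below is an assumption for illustration, not the paper's metric.

```python
def adversarial_contamination(summary, adversarial_sentences):
    """Fraction of summary tokens that come from the injected off-topic
    material; a robust summarizer should score near zero."""
    summary_tokens = set(summary.lower().split())
    adv_tokens = set(" ".join(adversarial_sentences).lower().split())
    if not summary_tokens:
        return 0.0
    return len(summary_tokens & adv_tokens) / len(summary_tokens)

off_topic = ["quarterly earnings beat analyst expectations"]
print(adversarial_contamination(
    "the team won the championship after a late goal", off_topic))  # 0.0
print(adversarial_contamination(
    "earnings beat expectations", off_topic))                       # 1.0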

The Roles of Supervised Machine Learning in Systems Neuroscience

Title The Roles of Supervised Machine Learning in Systems Neuroscience
Authors Joshua I. Glaser, Ari S. Benjamin, Roozbeh Farhoodi, Konrad P. Kording
Abstract Over the last several years, the use of machine learning (ML) in neuroscience has been rapidly increasing. Here, we review ML’s contributions, both realized and potential, across several areas of systems neuroscience. We describe four primary roles of ML within neuroscience: 1) creating solutions to engineering problems, 2) identifying predictive variables, 3) setting benchmarks for simple models of the brain, and 4) serving as a model of the brain itself. The breadth and ease of its applicability suggest that machine learning should be in the toolbox of most systems neuroscientists.
Tasks
Published 2018-05-21
URL http://arxiv.org/abs/1805.08239v2
PDF http://arxiv.org/pdf/1805.08239v2.pdf
PWC https://paperswithcode.com/paper/the-roles-of-supervised-machine-learning-in
Repo
Framework

Investigating performance of neural networks and gradient boosting models approximating microscopic traffic simulations in traffic optimization tasks

Title Investigating performance of neural networks and gradient boosting models approximating microscopic traffic simulations in traffic optimization tasks
Authors Paweł Gora, Maciej Brzeski, Marcin Możejko, Arkadiusz Klemenko, Adrian Kochański
Abstract We analyze the accuracy of traffic simulation metamodels based on neural networks and gradient boosting models (LightGBM), applied to traffic optimization as fitness functions of genetic algorithms. Our metamodels approximate the outcomes of traffic simulations (the total time spent waiting at red signals), taking different traffic signal settings as input, in order to efficiently find (sub)optimal settings. Their accuracy proved to be very good on randomly selected test sets, but it turned out that accuracy may drop for settings expected (according to the genetic algorithms) to be close to local optima, which makes the traffic optimization process more difficult. In this work, we investigate 16 different metamodels and 20 settings of genetic algorithms, in order to understand the reasons for this phenomenon, its scale, how it can be mitigated, and what can potentially be done to design better real-time traffic optimization methods.
Tasks
Published 2018-12-02
URL http://arxiv.org/abs/1812.00401v3
PDF http://arxiv.org/pdf/1812.00401v3.pdf
PWC https://paperswithcode.com/paper/investigating-performance-of-neural-networks
Repo
Framework
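
The surrogate-optimization loop in the abstract can be sketched compactly: train a LightGBM metamodel on (settings, simulated waiting time) pairs, then let a genetic algorithm query the metamodel instead of the simulator. The toy "simulation" and the GA hyperparameters below are placeholders, not the paper's setup.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)

def toy_simulation(settings):          # stand-in for the expensive simulator
    return np.sum((settings - 0.3) ** 2, axis=-1)

# Metamodel: gradient boosting fitted to simulation outcomes.
X_train = rng.uniform(0, 1, size=(2000, 10))   # 10 signal settings per sample
metamodel = lgb.LGBMRegressor(n_estimators=200).fit(X_train,
                                                    toy_simulation(X_train))

# Minimal genetic algorithm using the metamodel as its fitness function.
pop = rng.uniform(0, 1, size=(50, 10))
for _ in range(30):
    fitness = metamodel.predict(pop)            # cheap surrogate evaluation
    parents = pop[np.argsort(fitness)[:25]]     # truncation selection
    children = parents + rng.normal(0, 0.05, parents.shape)   # mutation
    pop = np.vstack([parents, np.clip(children, 0, 1)])

best = pop[np.argmin(metamodel.predict(pop))]
# Accuracy check near the optimum, where the abstract says metamodels degrade:
print(metamodel.predict(best[None])[0], toy_simulation(best))
```

Comparing the metamodel's prediction with the true simulation value at the GA's best candidate is exactly the failure mode the abstract investigates: the surrogate can be accurate on random test sets yet drift near the optima the GA steers toward.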

Stronger generalization bounds for deep nets via a compression approach

Title Stronger generalization bounds for deep nets via a compression approach
Authors Sanjeev Arora, Rong Ge, Behnam Neyshabur, Yi Zhang
Abstract Deep nets generalize well despite having more parameters than the number of training samples. Recent works try to give an explanation using PAC-Bayes and margin-based analyses, but do not as yet result in sample complexity bounds better than naive parameter counting. The current paper shows generalization bounds that are orders of magnitude better in practice. These rely upon new succinct reparametrizations of the trained net: a compression that is explicit and efficient. These yield generalization bounds via a simple compression-based framework introduced here. Our results also provide some theoretical justification for the widespread empirical success in compressing deep nets. Analysis of the correctness of our compression relies upon newly identified “noise stability” properties of trained deep nets, which are also experimentally verified. The study of these properties and the resulting generalization bounds are also extended to convolutional nets, which had eluded earlier attempts at proving generalization.
Tasks
Published 2018-02-14
URL http://arxiv.org/abs/1802.05296v4
PDF http://arxiv.org/pdf/1802.05296v4.pdf
PWC https://paperswithcode.com/paper/stronger-generalization-bounds-for-deep-nets
Repo
Framework
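
One concrete instance of an explicit, efficient compression in the abstract's spirit is truncated-SVD reparametrization of a trained weight matrix: the effective parameter count, and hence a compression-based bound, scales with the retained rank. The energy threshold below is an assumed criterion, not the paper's noise-stability analysis.

```python
import numpy as np

def low_rank_compress(W, energy=0.95):
    """Reparametrize W as a rank-k factorization A @ B, keeping the smallest
    k that preserves the given fraction of spectral energy."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = int(np.searchsorted(np.cumsum(s**2) / np.sum(s**2), energy)) + 1
    return U[:, :k] * s[:k], Vt[:k, :], k

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 20)) @ rng.normal(size=(20, 512))  # near-low-rank
W += 0.01 * rng.normal(size=W.shape)                         # small noise
A, B, k = low_rank_compress(W)
print(f"rank kept: {k}")
print(f"params: {W.size} -> {A.size + B.size}")
print(f"relative error: {np.linalg.norm(W - A @ B) / np.linalg.norm(W):.4f}")
```

The printout shows the trade the framework formalizes: a drastic drop in parameter count against a small, quantifiable perturbation of the network's function.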

Residual Codean Autoencoder for Facial Attribute Analysis

Title Residual Codean Autoencoder for Facial Attribute Analysis
Authors Akshay Sethi, Maneet Singh, Richa Singh, Mayank Vatsa
Abstract Facial attributes can provide rich ancillary information which can be utilized for different applications such as targeted marketing, human-computer interaction, and law enforcement. This research focuses on facial attribute prediction using a novel deep learning formulation, termed the R-Codean autoencoder. The paper first presents a cosine-similarity-based loss function in an autoencoder, which is then incorporated into the Euclidean-distance-based autoencoder to formulate R-Codean. The proposed loss function thus aims to incorporate both the magnitude and the direction of image vectors during feature learning. Further, inspired by the utility of shortcut connections in deep models to facilitate the learning of optimal parameters without incurring the vanishing gradient problem, the proposed formulation is extended to incorporate shortcut connections in the architecture. The proposed R-Codean autoencoder is utilized in a facial attribute prediction framework which incorporates a patch-based weighting mechanism for assigning higher weights to relevant patches for each attribute. Experimental results on the publicly available CelebA and LFWA datasets demonstrate the efficacy of the proposed approach in addressing this challenging problem.
Tasks
Published 2018-03-20
URL http://arxiv.org/abs/1803.07386v1
PDF http://arxiv.org/pdf/1803.07386v1.pdf
PWC https://paperswithcode.com/paper/residual-codean-autoencoder-for-facial
Repo
Framework
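
The core loss idea, combining Euclidean distance (magnitude) with cosine similarity (direction), can be sketched in a few lines. The weighting and the exact combination rule below are assumptions; the paper defines its own R-Codean formulation.

```python
import numpy as np

def codean_loss(x, x_hat, alpha=0.5):
    """Illustrative reconstruction loss mixing a magnitude term (squared
    Euclidean distance) with a direction term (cosine dissimilarity)."""
    euclidean = np.sum((x - x_hat) ** 2)
    cosine = 1.0 - np.dot(x, x_hat) / (np.linalg.norm(x) * np.linalg.norm(x_hat))
    return alpha * euclidean + (1.0 - alpha) * cosine

x = np.array([1.0, 2.0, 3.0])
print(codean_loss(x, x))                          # 0.0: perfect reconstruction
print(codean_loss(x, 2 * x))                      # magnitude error only
print(codean_loss(x, np.array([3.0, 2.0, 1.0])))  # direction error as well
```

The second and third calls show why both terms are useful: a purely Euclidean loss cannot distinguish a rescaled reconstruction from a misdirected one.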

Categorical Aspects of Parameter Learning

Title Categorical Aspects of Parameter Learning
Authors Bart Jacobs
Abstract Parameter learning is the technique for obtaining the probabilistic parameters in conditional probability tables in Bayesian networks from tables with (observed) data, where it is assumed that the underlying graphical structure is known. There are basically two ways of doing so, referred to as maximum likelihood estimation (MLE) and Bayesian learning. This paper provides a categorical analysis of these two techniques and describes them in terms of basic properties of the multiset monad M, the distribution monad D and the Giry monad G. In essence, learning is about the relationships between multisets (used for counting) on the one hand and probability distributions on the other. These relationships are described as suitable natural transformations.
Tasks
Published 2018-10-13
URL http://arxiv.org/abs/1810.05814v1
PDF http://arxiv.org/pdf/1810.05814v1.pdf
PWC https://paperswithcode.com/paper/categorical-aspects-of-parameter-learning
Repo
Framework
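
The two learning rules have standard concrete forms: MLE normalizes the observed multiset of counts into a distribution, while Bayesian learning smooths the counts with a Dirichlet prior. A minimal sketch, where the uniform prior strength is an assumption:

```python
from collections import Counter

def mle(counts):
    """Maximum likelihood: normalize the multiset of counts."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def bayes(counts, prior=1.0):
    """Bayesian learning: posterior mean under a symmetric Dirichlet prior,
    i.e. additive smoothing of the counts."""
    total = sum(counts.values()) + prior * len(counts)
    return {k: (v + prior) / total for k, v in counts.items()}

# Observed data as a multiset (the M side); both learning rules map it to a
# probability distribution (the D side).
data = Counter({"rain": 7, "sun": 2, "snow": 1})
print(mle(data))     # {'rain': 0.7, 'sun': 0.2, 'snow': 0.1}
print(bayes(data))   # pulled toward uniform by the prior
```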

A Parallel/Distributed Algorithmic Framework for Mining All Quantitative Association Rules

Title A Parallel/Distributed Algorithmic Framework for Mining All Quantitative Association Rules
Authors Ioannis T. Christou, Emmanouil Amolochitis, Zheng-Hua Tan
Abstract We present QARMA, an efficient novel parallel algorithm for mining all Quantitative Association Rules in large multidimensional datasets, where items are required to have at least one common attribute to be specified in the rule’s single consequent item. Given a minimum support level and a set of threshold criteria for interestingness measures such as confidence and conviction, our algorithm guarantees the generation of all non-dominated Quantitative Association Rules that meet the minimum support and interestingness requirements. Such rules can be of great importance to marketing departments seeking to optimize targeted campaigns or general market segmentation. They can also be of value in medical applications, as well as in financial and predictive maintenance domains. We provide computational results showing the scalability of our algorithm, and its capability to produce all rules to be found in large-scale synthetic and real-world datasets such as MovieLens, within a few seconds or minutes of computational time on commodity hardware.
Tasks
Published 2018-04-18
URL http://arxiv.org/abs/1804.06764v1
PDF http://arxiv.org/pdf/1804.06764v1.pdf
PWC https://paperswithcode.com/paper/a-paralleldistributed-algorithmic-framework
Repo
Framework
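
A quantitative association rule constrains attributes to intervals, and its quality is judged by support together with interestingness measures such as the confidence computed below. A hedged sketch of evaluating one such rule; the columns and intervals are hypothetical, and QARMA itself searches all non-dominated rules rather than scoring a single one.

```python
import numpy as np

def rule_metrics(data, antecedent, consequent):
    """Support and confidence of a rule whose antecedent and consequent are
    interval constraints on attribute columns, e.g.
    age in [30, 50] -> income in [0, 60]."""
    def holds(rows, constraints):
        mask = np.ones(len(rows), dtype=bool)
        for col, (lo, hi) in constraints.items():
            mask &= (rows[:, col] >= lo) & (rows[:, col] <= hi)
        return mask
    a = holds(data, antecedent)
    both = a & holds(data, consequent)
    support = both.mean()
    confidence = both.sum() / a.sum() if a.sum() else 0.0
    return support, confidence

rng = np.random.default_rng(0)
data = rng.uniform(0, 100, size=(1000, 2))   # columns: 0 = age, 1 = income
print(rule_metrics(data, {0: (30, 50)}, {1: (0, 60)}))
```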

A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Title A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Authors Zhize Li, Jian Li
Abstract We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the sum of a differentiable (possibly nonconvex) component and a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017], for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields a simpler analysis. Moreover, ProxSVRG+ outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., 2016b]. Also, ProxSVRG+ uses far fewer proximal oracle calls than ProxSVRG [Reddi et al., 2016b]. Moreover, for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, we prove that ProxSVRG+ achieves a global linear convergence rate without restarts, unlike ProxSVRG. Thus, it can automatically switch to the faster linear convergence in regions where the objective function satisfies the PL condition locally. ProxSVRG+ also improves upon ProxGD and ProxSVRG/SAGA, and generalizes the results of SCSG in this case. Finally, we conduct several experiments, and the experimental results are consistent with the theoretical results.
Tasks
Published 2018-02-13
URL http://arxiv.org/abs/1802.04477v4
PDF http://arxiv.org/pdf/1802.04477v4.pdf
PWC https://paperswithcode.com/paper/a-simple-proximal-stochastic-gradient-method
Repo
Framework
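
The ProxSVRG+ pattern, outer gradient snapshots, inner variance-reduced minibatch steps, and a proximal step for the nonsmooth convex term, can be sketched on a toy L1-regularized finite sum. The step sizes, batch sizes, and test problem below are placeholders, not the paper's tuned settings.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (the convex, nonsmooth component)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_svrg_plus(grads, x0, lr=0.1, lam=0.01, epochs=20, batch=8, seed=0):
    """Sketch of the variance-reduced proximal loop: take a gradient snapshot
    at the start of each epoch, correct minibatch gradients with it, and
    apply the prox after each step."""
    rng = np.random.default_rng(seed)
    n, x = len(grads), x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = sum(g(snapshot) for g in grads) / n    # snapshot gradient
        for _ in range(n // batch):
            idx = rng.choice(n, size=batch, replace=False)
            # variance-reduced stochastic gradient estimate
            v = sum(grads[i](x) - grads[i](snapshot) for i in idx) / batch
            v += full_grad
            x = soft_threshold(x - lr * v, lr * lam)       # proximal step
    return x

# Toy nonconvex finite sum: f_i(x) = ||A_i x - b_i||^2 + 0.1 * sum(sin(x)).
rng = np.random.default_rng(1)
A, b = rng.normal(size=(32, 5, 5)), rng.normal(size=(32, 5))
grads = [
    (lambda Ai, bi: lambda x: 2 * Ai.T @ (Ai @ x - bi) + 0.1 * np.cos(x))(A[i], b[i])
    for i in range(32)
]
print(prox_svrg_plus(grads, np.zeros(5)))
```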

Fine-Grained Age Estimation in the wild with Attention LSTM Networks

Title Fine-Grained Age Estimation in the wild with Attention LSTM Networks
Authors Ke Zhang, Na Liu, Xingfang Yuan, Xinyao Guo, Ce Gao, Zhenbing Zhao, Zhanyu Ma
Abstract Age estimation from a single face image has been an essential task in the fields of human-computer interaction and computer vision, with a wide range of practical applications. The accuracy of existing methods for age estimation of face images in the wild is relatively low, because they take into account only global features while neglecting the fine-grained features of age-sensitive areas. We propose a novel method based on our attention long short-term memory (AL) network for fine-grained age estimation in the wild, inspired by fine-grained categorization and the visual attention mechanism. This method combines the residual networks (ResNets) or the residual network of residual networks (RoR) model with LSTM units to construct AL-ResNets or AL-RoR networks that extract local features of age-sensitive regions, which effectively improves age estimation accuracy. First, a ResNets or RoR model pretrained on the ImageNet dataset is selected as the basic model and fine-tuned on the IMDB-WIKI-101 dataset for age estimation. Then, we fine-tune the ResNets or RoR model on the target age datasets to extract the global features of face images. To extract the local features of age-sensitive regions, an LSTM unit is used to obtain the coordinates of the age-sensitive region automatically. Finally, age group classification is conducted directly on the Adience dataset, and age-regression experiments are performed with the Deep EXpectation algorithm (DEX) on the MORPH Album 2, FG-NET and 15/16LAP datasets. By combining the global and local features, we obtain our final prediction results. Experimental results illustrate the effectiveness and robustness of the proposed AL-ResNets and AL-RoR for age estimation in the wild, where they outperform all other convolutional neural networks.
Tasks Age Estimation
Published 2018-05-26
URL https://arxiv.org/abs/1805.10445v2
PDF https://arxiv.org/pdf/1805.10445v2.pdf
PWC https://paperswithcode.com/paper/fine-grained-age-estimation-in-the-wild-with
Repo
Framework
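
The DEX regression step mentioned in the abstract predicts age as the softmax-weighted expectation over discrete age bins. A minimal sketch, with a 0-100 bin range assumed for illustration:

```python
import numpy as np

def dex_expected_age(logits, ages=np.arange(0, 101)):
    """Deep EXpectation (DEX) regression: treat the network's output over
    discrete age bins as a distribution and predict its expected value."""
    p = np.exp(logits - logits.max())   # numerically stable softmax
    p /= p.sum()
    return float(np.dot(p, ages))

logits = np.random.default_rng(0).normal(size=101)
logits[25:35] += 3.0                    # mass concentrated around ages 25-34
print(dex_expected_age(logits))         # roughly 30
```

Taking the expectation rather than the argmax bin turns a classification head into a smooth regressor, which is why DEX is the standard choice for the age-regression experiments the abstract lists.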

On the total variation regularized estimator over a class of tree graphs

Title On the total variation regularized estimator over a class of tree graphs
Authors Francesco Ortelli, Sara van de Geer
Abstract We generalize an oracle result obtained for the Fused Lasso over the path graph to tree graphs formed by connecting path graphs. Moreover, we show that in the oracle inequality the minimum of the distances between jumps can be replaced by their harmonic mean. In doing so, we prove a lower bound on the compatibility constant for the total variation penalty. Our analysis leverages insights obtained for the path graph with one branch to understand the case of more general tree graphs. As a side result, we gain insights into the irrepresentable condition for such tree graphs.
Tasks
Published 2018-06-04
URL https://arxiv.org/abs/1806.01009v3
PDF https://arxiv.org/pdf/1806.01009v3.pdf
PWC https://paperswithcode.com/paper/on-the-total-variation-regularized-estimator
Repo
Framework
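
Both objects in the abstract are simple to write down for the path graph: the total variation (Fused Lasso) objective, and the harmonic mean of the distances between jumps that replaces their minimum in the oracle inequality. A numpy sketch, reading "distances between jumps" as constant-segment lengths, which is an interpretive assumption:

```python
import numpy as np

def tv_objective(f, y, lam):
    """Fused Lasso / total variation objective on a path graph:
    (1/2) * ||y - f||^2 + lam * sum_i |f_{i+1} - f_i|."""
    return 0.5 * np.sum((y - f) ** 2) + lam * np.sum(np.abs(np.diff(f)))

def harmonic_mean_jump_distances(f, tol=1e-9):
    """Harmonic mean of the lengths of the constant segments of a
    piecewise-constant signal; always at least their minimum."""
    jumps = np.flatnonzero(np.abs(np.diff(f)) > tol)
    dists = np.diff(np.concatenate(([0], jumps + 1, [len(f)])))
    return len(dists) / np.sum(1.0 / dists)

f = np.array([0.0] * 10 + [2.0] * 3 + [1.0] * 7)   # segments of length 10/3/7
y = f + np.random.default_rng(0).normal(0, 0.3, f.size)
print(tv_objective(f, y, lam=1.0))
print(harmonic_mean_jump_distances(f))  # ~5.21, versus a minimum of 3
```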

Non-Convex Matrix Completion Against a Semi-Random Adversary

Title Non-Convex Matrix Completion Against a Semi-Random Adversary
Authors Yu Cheng, Rong Ge
Abstract Matrix completion is a well-studied problem with many machine learning applications. In practice, the problem is often solved by non-convex optimization algorithms. However, the current theoretical analysis for non-convex algorithms relies heavily on the assumption that every entry is observed with exactly the same probability $p$, which is not realistic in practice. In this paper, we investigate a more realistic semi-random model, where the probability of observing each entry is at least $p$. Even with this mild semi-random perturbation, we can construct counter-examples where existing non-convex algorithms get stuck in bad local optima. In light of the negative results, we propose a pre-processing step that tries to re-weight the semi-random input, so that it becomes “similar” to a random input. We give a nearly-linear time algorithm for this problem, and show that after our pre-processing, all the local minima of the non-convex objective can be used to approximately recover the underlying ground-truth matrix.
Tasks Matrix Completion
Published 2018-03-28
URL http://arxiv.org/abs/1803.10846v2
PDF http://arxiv.org/pdf/1803.10846v2.pdf
PWC https://paperswithcode.com/paper/non-convex-matrix-completion-against-a-semi
Repo
Framework
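
The semi-random model and the re-weighting idea can both be illustrated on a synthetic mask: every entry is observed with probability at least p, some rows adversarially more often, and a rescaling step pushes the weighted pattern back toward uniform. The Sinkhorn-style iteration below is an illustration of the spirit of the pre-processing, not the paper's nearly-linear-time algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 0.2

# Semi-random observation model: every entry observed with probability AT
# LEAST p; here an adversary over-observes the first 20 rows.
prob = np.full((n, n), p)
prob[:20] = 0.9
mask = (rng.uniform(size=(n, n)) < prob).astype(float)

# Re-weight observed entries so row and column totals match what a uniform
# Bernoulli(p) pattern would give (Sinkhorn-style rescaling, illustrative).
target = p * n
weights = mask.copy()
for _ in range(50):
    weights *= target / np.maximum(weights.sum(axis=1, keepdims=True), 1e-9)
    weights *= target / np.maximum(weights.sum(axis=0, keepdims=True), 1e-9)

print("raw row totals:     ", mask.sum(axis=1)[[0, 50]])                # skewed
print("weighted row totals:", np.round(weights.sum(axis=1)[[0, 50]], 2))  # ~20
```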