Paper Group AWR 209
SBNet: Sparse Blocks Network for Fast Inference. Learning Controllable Fair Representations. Nonlinear Acceleration of CNNs. Adversarially Learned One-Class Classifier for Novelty Detection. Polynomial Regression As an Alternative to Neural Nets. Adaptive Scenario Discovery for Crowd Counting. Detecting Gang-Involved Escalation on Social Media Using Context …
SBNet: Sparse Blocks Network for Fast Inference
Title | SBNet: Sparse Blocks Network for Fast Inference |
Authors | Mengye Ren, Andrei Pokrovsky, Bin Yang, Raquel Urtasun |
Abstract | Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. We show that such computation masks can be used to reduce computation in the high-resolution main network. Variants of sparse activation CNNs have previously been explored on small-scale tasks and showed no degradation in terms of object classification accuracy, but often measured gains in terms of theoretical FLOPs without realizing a practical speed-up when compared to highly optimized dense convolution implementations. In this work, we leverage the sparsity structure of computation masks and propose a novel tiling-based sparse convolution algorithm. We verified the effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we report significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy. |
Tasks | 3D Object Detection, Object Classification, Object Detection, Semantic Segmentation |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02108v2 |
http://arxiv.org/pdf/1801.02108v2.pdf | |
PWC | https://paperswithcode.com/paper/sbnet-sparse-blocks-network-for-fast |
Repo | https://github.com/uber/sbnet |
Framework | tf |
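The gather-compute-scatter trick behind SBNet can be illustrated in a few lines: only blocks touched by the computation mask are gathered, a dense operation runs on those blocks, and the results are scattered back. The NumPy sketch below is a toy version with a 1x1 kernel; the real implementation handles k x k kernels with overlapping block halos and fused CUDA gather/scatter kernels, none of which is shown here.

```python
import numpy as np

def sparse_block_conv(x, w, mask, block=8):
    """Toy tiling-based sparse convolution (1x1 kernel for simplicity).

    x:    (H, W, Cin) feature map
    w:    (Cin, Cout) 1x1 convolution weights
    mask: (H, W) boolean computation mask
    Only blocks containing at least one active pixel are computed; the
    rest stay zero, which is where the wall-clock speed-up comes from.
    """
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[1]), dtype=x.dtype)
    for i in range(0, H, block):
        for j in range(0, W, block):
            if mask[i:i + block, j:j + block].any():       # gather active tiles
                tile = x[i:i + block, j:j + block]
                out[i:i + block, j:j + block] = tile @ w   # dense compute, then scatter
    return out

x = np.random.randn(32, 32, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
mask = np.zeros((32, 32), dtype=bool)
mask[4:12, 20:28] = True          # e.g. a foreground mask from a cheap network
y = sparse_block_conv(x, w, mask)
```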
Learning Controllable Fair Representations
Title | Learning Controllable Fair Representations |
Authors | Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, Stefano Ermon |
Abstract | Learning data representations that are transferable and are fair with respect to certain protected attributes is crucial to reducing unfair decisions while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective allows the user to control the fairness of the representations by specifying limits on unfairness. Exploiting duality, we introduce a method that optimizes the model parameters as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can balance the trade-off between multiple notions of fairness and achieves higher expressiveness at a lower computational cost. |
Tasks | |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04218v3 |
https://arxiv.org/pdf/1812.04218v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-controllable-fair-representations |
Repo | https://github.com/ermongroup/lag-fairness |
Framework | tf |
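The "limits on unfairness" knob maps naturally onto dual ascent on a Lagrangian: raise the multiplier when the constraint (unfairness ≤ ε) is violated and shrink it otherwise, while the primal step ascends the Lagrangian in the model parameters. A minimal numeric sketch of that loop; `expressiveness` and `unfairness` are toy stand-ins for the paper's mutual-information bounds, not the authors' estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, eps, lr, lr_lam = 0.0, 0.1, 0.05, 0.05  # multiplier, unfairness limit, step sizes

def expressiveness(theta):   # stand-in for a lower bound on I(x; z)
    return -np.sum((theta - 1.0) ** 2)

def unfairness(theta):       # stand-in for an upper bound on I(z; u)
    return 0.1 * np.sum(theta ** 2)

def lagrangian(theta, lam):
    return expressiveness(theta) - lam * (unfairness(theta) - eps)

theta, h = rng.normal(size=4), 1e-4
for step in range(500):
    # primal step: ascend the Lagrangian via a numerical gradient
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = h
        grad[i] = (lagrangian(theta + e, lam) - lagrangian(theta - e, lam)) / (2 * h)
    theta += lr * grad
    # dual step: grow the multiplier only while the constraint is violated
    lam = max(0.0, lam + lr_lam * (unfairness(theta) - eps))

print("unfairness:", unfairness(theta), "<= eps:", eps)
```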
Nonlinear Acceleration of CNNs
Title | Nonlinear Acceleration of CNNs |
Authors | Damien Scieur, Edouard Oyallon, Alexandre d’Aspremont, Francis Bach |
Abstract | The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may extend to wider settings. In this paper, we further investigate the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of existing frameworks, RNA slightly improves the optimization of CNNs after training. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00370v1 |
http://arxiv.org/pdf/1806.00370v1.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-acceleration-of-cnns |
Repo | https://github.com/windows7lover/RegularizedNonlinearAcceleration |
Framework | pytorch |
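RNA itself is compact enough to sketch: stack the last k+1 iterates, form the matrix of residuals, solve a small regularized linear system for combination weights that sum to one, and return the weighted combination of iterates. A NumPy sketch of that recipe, applied offline after plain gradient descent on a toy quadratic (step sizes and the regularizer lam are illustrative):

```python
import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    """Regularized Nonlinear Acceleration on a list of iterates xs[0..k].

    Solves min_c ||R c||^2 + lam ||c||^2 s.t. sum(c) = 1, where the columns
    of R are the residuals x_{i+1} - x_i, then returns sum_i c_i * x_i.
    """
    X = np.stack(xs, axis=1)            # (dim, k+1)
    R = X[:, 1:] - X[:, :-1]            # residual matrix, (dim, k)
    RtR = R.T @ R
    RtR /= np.linalg.norm(RtR)          # normalize for scale invariance
    k = RtR.shape[0]
    z = np.linalg.solve(RtR + lam * np.eye(k), np.ones(k))
    c = z / z.sum()                     # enforce sum(c) = 1
    return X[:, :-1] @ c

# demo: accelerate gradient descent on the quadratic f(x) = 0.5 x' A x
A = np.diag([1.0, 10.0, 100.0])
x = np.ones(3)
xs = [x.copy()]
for _ in range(8):
    x = x - 0.005 * (A @ x)             # slow plain gradient descent
    xs.append(x.copy())
print(np.linalg.norm(xs[-1]), np.linalg.norm(rna_extrapolate(xs)))
```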
Adversarially Learned One-Class Classifier for Novelty Detection
Title | Adversarially Learned One-Class Classifier for Novelty Detection |
Authors | Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, Ehsan Adeli |
Abstract | Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled, or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each of which is trained by competing with the other while collaborating to understand the underlying concept of the target class and then classify the test samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that the separability of the enhanced inliers and distorted outliers is much higher than that of the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. The results on the MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection, illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods. |
Tasks | Anomaly Detection, One-class classifier, Outlier Detection |
Published | 2018-02-25 |
URL | http://arxiv.org/abs/1802.09088v2 |
http://arxiv.org/pdf/1802.09088v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarially-learned-one-class-classifier |
Repo | https://github.com/jingkunchen/StyleGAN |
Framework | none |
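The division of labor in the abstract suggests the test-time rule: score a sample by D(R(x)), since R enhances inliers and distorts outliers, widening the discriminator's score gap. A toy sketch of that rule with illustrative stand-ins for R and D (hand-written functions below, not the paper's trained networks):

```python
import numpy as np

rng = np.random.default_rng(0)
target_mean = np.zeros(2)           # the "target class" concentrates here

def R(x):                           # stand-in reconstructor: pulls toward the class
    return 0.5 * x + 0.5 * target_mean

def D(x):                           # stand-in discriminator: likelihood-like score
    return np.exp(-0.5 * np.sum(x ** 2))

inlier  = rng.normal(0.0, 0.3, size=2)
outlier = rng.normal(4.0, 0.3, size=2)
for name, x in [("inlier", inlier), ("outlier", outlier)]:
    print(name, "score D(R(x)) =", round(float(D(R(x))), 4))
# a low D(R(x)) flags novelty; R widens the gap by distorting outliers
```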
Polynomial Regression As an Alternative to Neural Nets
Title | Polynomial Regression As an Alternative to Neural Nets |
Authors | Xi Cheng, Bohdan Khomtchouk, Norman Matloff, Pete Mohanty |
Abstract | Despite the success of neural networks (NNs), there is still a concern among many over their “black box” nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available. |
Tasks | |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.06850v3 |
http://arxiv.org/pdf/1806.06850v3.pdf | |
PWC | https://paperswithcode.com/paper/polynomial-regression-as-an-alternative-to |
Repo | https://github.com/mac-theobio/Lab_meeting |
Framework | none |
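The recipe implied by the abstract — expand the inputs into all monomials up to a chosen degree, then fit ordinary least squares — takes only a few lines of NumPy. A minimal sketch (the open-source polyreg package does considerably more; this shows only the core idea):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=2):
    """All monomials of the input columns up to `degree` (plus intercept)."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X[:, 0] * X[:, 1] - 0.5 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)

P = poly_features(X, degree=2)
beta, *_ = np.linalg.lstsq(P, y, rcond=None)   # ordinary least squares
print("train RMSE:", np.sqrt(np.mean((P @ beta - y) ** 2)))
```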
Adaptive Scenario Discovery for Crowd Counting
Title | Adaptive Scenario Discovery for Crowd Counting |
Authors | Xingjiao Wu, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He |
Abstract | Crowd counting, i.e., estimating the number of pedestrians in crowd images, is emerging as an important research problem with public-security applications. A key component of crowd counting systems is the construction of counting models that are robust to various scenarios shaped by factors such as camera perspective and physical barriers. In this paper, we present an adaptive scenario discovery framework for crowd counting. The system is structured with two parallel pathways that are trained with different sizes of the receptive field to represent different scales and crowd densities. After ensuring that these components are present in the proper geometric configuration, a third branch is designed to adaptively recalibrate the pathway-wise responses by discovering and modeling the dynamic scenarios implicitly. Our system is able to represent highly variable crowd images and achieves state-of-the-art results on two challenging benchmarks. |
Tasks | Crowd Counting |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02393v2 |
http://arxiv.org/pdf/1812.02393v2.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-scenario-discovery-for-crowd |
Repo | https://github.com/pxq0312/ASD-crowd-counting |
Framework | pytorch |
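Stripped to its core, the fusion described above is a learned convex combination of the two pathway outputs, with the mixing weight predicted per image by the discovery branch. A toy sketch of just that combination; all three functions below are illustrative stand-ins, not the paper's sub-networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def pathway_small_rf(x):   # stand-in for the small-receptive-field pathway
    return np.maximum(x - 0.2, 0.0)

def pathway_large_rf(x):   # stand-in for the large-receptive-field pathway
    return 0.5 * np.maximum(x, 0.0)

def discovery_branch(x):   # stand-in scenario weight in (0, 1)
    return 1.0 / (1.0 + np.exp(-(x.mean() - 0.5)))

x = rng.random((64, 64))               # a fake feature map
alpha = discovery_branch(x)            # implicitly discovered scenario weight
density = alpha * pathway_small_rf(x) + (1.0 - alpha) * pathway_large_rf(x)
print("estimated count:", float(density.sum()))
```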
Detecting Gang-Involved Escalation on Social Media Using Context
Title | Detecting Gang-Involved Escalation on Social Media Using Context |
Authors | Serina Chang, Ruiqi Zhong, Ethan Adams, Fei-Tzin Lee, Siddharth Varia, Desmond Patton, William Frey, Chris Kedzie, Kathleen McKeown |
Abstract | Gang-involved youth in cities such as Chicago have increasingly turned to social media to post about their experiences and intents online. In some situations, when they experience the loss of a loved one, their online expression of emotion may evolve into aggression towards rival gangs and ultimately into real-world violence. In this paper, we present a novel system for detecting Aggression and Loss in social media. Our system features the use of domain-specific resources automatically derived from a large unlabeled corpus, and contextual representations of the emotional and semantic content of the user’s recent tweets as well as their interactions with other users. Incorporating context in our Convolutional Neural Network (CNN) leads to a significant improvement. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03632v1 |
http://arxiv.org/pdf/1809.03632v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-gang-involved-escalation-on-social |
Repo | https://github.com/serinachang5/contextifier |
Framework | none |
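The "incorporating context" step can be pictured as concatenating the representation of the tweet being classified with a summary of the user's recent tweets before classification. A toy sketch of that wiring, with mean-of-embeddings encoders standing in for the paper's CNN and contextual features:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_tweet(tokens, emb):          # mean of word embeddings as a stand-in
    return emb[tokens].mean(axis=0)

emb = rng.normal(size=(1000, 16))       # fake word-embedding table
tweet = rng.integers(0, 1000, size=12)  # token ids of the tweet to classify
history = [rng.integers(0, 1000, size=12) for _ in range(5)]  # recent tweets

tweet_vec = encode_tweet(tweet, emb)
context_vec = np.mean([encode_tweet(t, emb) for t in history], axis=0)
features = np.concatenate([tweet_vec, context_vec])   # context-augmented input
W = rng.normal(size=(features.size, 3))               # e.g. Aggression / Loss / Other
print("class scores:", features @ W)
```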
Learning Deep Disentangled Embeddings with the F-Statistic Loss
Title | Learning Deep Disentangled Embeddings with the F-Statistic Loss |
Authors | Karl Ridgeway, Michael C. Mozer |
Abstract | Deep-embedding methods aim to discover representations of a domain that make explicit the domain’s class structure and thereby support few-shot learning. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm suitable for both goals. We propose and evaluate a novel loss function based on the $F$ statistic, which describes the separation of two or more distributions. By ensuring that distinct classes are well separated on a subset of embedding dimensions, we obtain embeddings that are useful for few-shot learning. By not requiring separation on all dimensions, we encourage the discovery of disentangled representations. Our embedding method matches or beats state-of-the-art, as evaluated by performance on recall@$k$ and few-shot learning tasks. Our method also obtains performance superior to a variety of alternatives on disentangling, as evaluated by two key properties of a disentangled representation: modularity and explicitness. The goal of our work is to obtain more interpretable, manipulable, and generalizable deep representations of concepts and categories. |
Tasks | Few-Shot Learning |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05312v2 |
http://arxiv.org/pdf/1802.05312v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-disentangled-embeddings-with |
Repo | https://github.com/kridgeway/f-statistic-loss-nips-2018 |
Framework | tf |
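The building block here is the classic one-way ANOVA $F$ statistic: between-class variance over within-class variance, computed per embedding dimension. The sketch below evaluates it on a well-separated and an overlapping pair of classes; the paper's loss builds on this statistic (requiring separation on only a subset of dimensions), which the sketch does not reproduce:

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.mean(np.concatenate(groups))
    between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum(np.sum((g - np.mean(g)) ** 2) for g in groups) / (n - k)
    return between / within

rng = np.random.default_rng(0)
# one embedding dimension that separates two classes well vs. one that doesn't
separated = [rng.normal(0, 1, 50), rng.normal(5, 1, 50)]
overlapping = [rng.normal(0, 1, 50), rng.normal(0.2, 1, 50)]
print("F (separated):  ", round(f_statistic(separated), 1))
print("F (overlapping):", round(f_statistic(overlapping), 1))
# rewarding high F on a subset of dimensions separates classes there while
# leaving the remaining dimensions free to encode other factors
```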
SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Title | SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation |
Authors | Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig |
Abstract | In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, achieving better or comparable performances to strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix. |
Tasks | Data Augmentation, Machine Translation |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07512v2 |
http://arxiv.org/pdf/1808.07512v2.pdf | |
PWC | https://paperswithcode.com/paper/switchout-an-efficient-data-augmentation |
Repo | https://github.com/Waino/OpenNMT-py |
Framework | pytorch |
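The augmentation itself is concrete enough to sketch directly: choose some positions in the source and the target sentence and replace them with uniformly random vocabulary words. One hedge: the paper derives a specific temperature-controlled distribution over the number of swaps, for which the binomial below is only a stand-in.

```python
import numpy as np

def switchout(tokens, vocab_size, tau=1.0, rng=np.random.default_rng(0)):
    """Simplified SwitchOut: sample how many positions to corrupt, then
    replace those positions with uniformly random vocabulary words."""
    tokens = np.array(tokens)
    n = len(tokens)
    n_swaps = rng.binomial(n, min(1.0, tau / n))     # stand-in swap distribution
    idx = rng.choice(n, size=n_swaps, replace=False)
    tokens[idx] = rng.integers(0, vocab_size, size=n_swaps)
    return tokens

src = [12, 845, 3, 77, 9, 501]      # token ids of a source sentence
tgt = [4, 220, 18, 903, 2]          # token ids of the target sentence
print(switchout(src, vocab_size=32000), switchout(tgt, vocab_size=32000))
```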
SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning
Title | SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning |
Authors | Titus Cieslewski, Konstantinos G. Derpanis, Davide Scaramuzza |
Abstract | A wide range of computer vision algorithms rely on identifying sparse interest points in images and establishing correspondences between them. However, only a subset of the initially identified interest points results in true correspondences (inliers). In this paper, we seek a detector that finds the minimum number of points that are likely to result in an application-dependent “sufficient” number of inliers k. To quantify this goal, we introduce the “k-succinctness” metric. Extracting a minimum number of interest points is attractive for many applications, because it can reduce computational load, memory, and data transmission. Alongside succinctness, we introduce an unsupervised training methodology for interest point detectors that is based on predicting the probability of a given pixel being an inlier. In comparison to previous learned detectors, our method requires the least amount of data pre-processing. Our detector and other state-of-the-art detectors are extensively evaluated with respect to succinctness on popular public datasets covering both indoor and outdoor scenes, and both wide and narrow baselines. In certain cases, our detector is able to obtain an equivalent amount of inliers with as little as 60% of the amount of points of other detectors. The code and trained networks are provided at https://github.com/uzh-rpg/sips2_open . |
Tasks | Interest Point Detection, Pose Estimation, Visual Odometry |
Published | 2018-05-03 |
URL | https://arxiv.org/abs/1805.01358v2 |
https://arxiv.org/pdf/1805.01358v2.pdf | |
PWC | https://paperswithcode.com/paper/sips-unsupervised-succinct-interest-points |
Repo | https://github.com/uzh-rpg/sips2_open |
Framework | tf |
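One plausible reading of the k-succinctness idea from the abstract: rank points by predicted inlierness and count how many must be kept before k true inliers appear. The sketch below measures exactly that on synthetic scores; the paper defines the metric precisely, so treat this purely as an illustration:

```python
import numpy as np

def points_needed_for_k_inliers(scores, is_inlier, k):
    """How many top-scored interest points must be kept before k of them
    are true inliers. Lower is better (a more succinct detector)."""
    order = np.argsort(-scores)               # rank by predicted inlierness
    inlier_count = np.cumsum(is_inlier[order])
    hits = np.nonzero(inlier_count >= k)[0]
    return int(hits[0]) + 1 if len(hits) else None

rng = np.random.default_rng(0)
scores = rng.random(100)                      # predicted inlierness probabilities
is_inlier = rng.random(100) < 0.4             # ground truth, e.g. from matching
print("points needed for k=10 inliers:", points_needed_for_k_inliers(scores, is_inlier, 10))
```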
Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
Title | Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation |
Authors | Chenze Shao, Yang Feng, Xilin Chen |
Abstract | Neural machine translation (NMT) models are usually trained with a word-level loss using the teacher forcing algorithm, which not only evaluates the translation improperly but also suffers from exposure bias. Sequence-level training under the reinforcement framework can mitigate the problems of the word-level loss, but its performance is unstable due to the high variance of the gradient estimation. On these grounds, we present a method with a differentiable sequence-level training objective based on probabilistic n-gram matching which avoids the reinforcement framework. In addition, this method performs greedy search during training, using the predicted words as context just as at inference time, which alleviates the problem of exposure bias. Experimental results on the NIST Chinese-to-English translation tasks show that our method significantly outperforms the reinforcement-based algorithms and achieves an improvement of 1.5 BLEU points on average over a strong baseline system. |
Tasks | Machine Translation |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03132v1 |
http://arxiv.org/pdf/1809.03132v1.pdf | |
PWC | https://paperswithcode.com/paper/greedy-search-with-probabilistic-n-gram |
Repo | https://github.com/ictnlp/GS4NMT |
Framework | pytorch |
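The differentiable objective rests on expected n-gram counts: given the model's per-position word distributions, the probability that a given n-gram starts at position t is a product of word probabilities, and summing over t gives an expected count that can be matched against the reference. A sketch under an independence-across-positions simplification (the paper instead works with the distributions produced along the greedy search path):

```python
import numpy as np

def expected_ngram_count(probs, ngram):
    """Expected occurrences of `ngram` under per-position word distributions.

    probs: (T, V) array, row t is the model's distribution at position t.
    Positions are treated as independent here for simplicity.
    """
    T = probs.shape[0]
    n = len(ngram)
    total = 0.0
    for t in range(T - n + 1):
        p = 1.0
        for j, w in enumerate(ngram):
            p *= probs[t + j, w]   # probability the n-gram starts at position t
        total += p
    return total

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(50), size=8)   # 8 positions over a 50-word vocab
print("expected count of bigram (3, 7):", expected_ngram_count(probs, (3, 7)))
# matching expected counts against reference counts gives a differentiable,
# sequence-level objective, sidestepping high-variance policy gradients
```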
Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph
Title | Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph |
Authors | Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu, Xiaozhong Liu |
Abstract | While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior researchers and graduate students who do not master other languages) cannot efficiently locate publications hosted in a foreign-language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG learns a representation function that maps publications from multilingual repositories to a low-dimensional joint embedding space, using the various kinds of vertices and relations of a heterogeneous graph. By leveraging both global (task-specific) and local (task-independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method optimizes the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experimental results show that the proposed method not only outperforms state-of-the-art baseline models but also improves the interpretability of the representation model for the cross-language citation recommendation task. |
Tasks | Representation Learning |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11709v1 |
http://arxiv.org/pdf/1812.11709v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-language-citation-recommendation-via |
Repo | https://github.com/GraphEmbedding/HRLHG |
Framework | none |
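The supervised hierarchical random walk can be pictured as a two-level sampling step: first draw an edge type from a learned distribution over relation types, then draw a neighbor under that type. A toy sketch on a four-node heterogeneous graph; the graph, relation types, and weights are all illustrative, not from the paper:

```python
import random

graph = {
    "paper_en_1": {"cites": ["paper_zh_1"], "written_by": ["author_a"]},
    "paper_zh_1": {"cites": ["paper_en_1"], "has_keyword": ["kw_ir"]},
    "author_a":   {"writes": ["paper_en_1"]},
    "kw_ir":      {"tags": ["paper_zh_1"]},
}
type_weights = {"cites": 0.6, "written_by": 0.2, "has_keyword": 0.2,
                "writes": 1.0, "tags": 1.0}       # stand-in for learned weights

def walk_step(node, rng=random.Random(0)):
    types = [t for t in graph[node] if graph[node][t]]
    weights = [type_weights[t] for t in types]
    t = rng.choices(types, weights=weights, k=1)[0]   # level 1: relation type
    return rng.choice(graph[node][t])                 # level 2: neighbor

node = "paper_en_1"
for _ in range(4):
    node = walk_step(node)
    print("->", node)
```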
Quadratic Decomposable Submodular Function Minimization
Title | Quadratic Decomposable Submodular Function Minimization |
Authors | Pan Li, Niao He, Olgica Milenkovic |
Abstract | We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization. The problem is closely related to decomposable submodular function minimization and arises in many learning on graphs and hypergraphs settings, such as graph-based semi-supervised learning and PageRank. We approach the problem via a new dual strategy and describe an objective that may be optimized via random coordinate descent (RCD) methods and projections onto cones. We also establish the linear convergence rate of the RCD algorithm and develop efficient projection algorithms with provable performance guarantees. Numerical experiments in semi-supervised learning on hypergraphs confirm the efficiency of the proposed algorithm and demonstrate the significant improvements in prediction accuracy with respect to state-of-the-art methods. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09842v3 |
http://arxiv.org/pdf/1806.09842v3.pdf | |
PWC | https://paperswithcode.com/paper/quadratic-decomposable-submodular-function |
Repo | https://github.com/lipan00123/QDSDM |
Framework | none |
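For orientation, here is a representative shape of the problem reconstructed from the abstract (my notation, not necessarily the paper's exact formulation): given submodular functions $F_r$ with Lovász extensions $f_r$,

$$
\min_{x \in \mathbb{R}^N} \; \lVert x - a \rVert_2^2 \; + \; \beta \sum_{r=1}^{R} \big[ f_r(x) \big]^2 ,
$$

where the squared Lovász-extension terms distinguish it from the standard decomposable problem, whose coupling term is the linear sum $\sum_r f_r(x)$, and are what connect it to quadratic objectives such as PageRank and graph-based semi-supervised learning.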
Adversarial Training Versus Weight Decay
Title | Adversarial Training Versus Weight Decay |
Authors | Angus Galloway, Thomas Tanay, Graham W. Taylor |
Abstract | Performance-critical machine learning models should be robust to input perturbations not seen during training. Adversarial training is a method for improving a model’s robustness to some perturbations by including them in the training process, but this tends to exacerbate other vulnerabilities of the model. The adversarial training framework has the effect of translating the data with respect to the cost function, while weight decay has a scaling effect. Although weight decay could be considered a crude regularization technique, it appears superior to adversarial training as it remains stable over a broader range of regimes and reduces all generalization errors. Equipped with these abstractions, we provide key baseline results and methodology for characterizing robustness. The two approaches can be combined to yield one small model that demonstrates good robustness to several white-box attacks associated with different metrics. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03308v3 |
http://arxiv.org/pdf/1804.03308v3.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-training-versus-weight-decay |
Repo | https://github.com/uoguelph-mlrg/adversarial_training_vs_weight_decay |
Framework | tf |
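The translation-versus-scaling contrast is easiest to see on a linear model with hinge loss: the FGSM adversarial example translates each input by $-\epsilon\, y\, \mathrm{sign}(w)$, while weight decay multiplies $w$ by $(1 - \eta\lambda)$ every step. A self-contained sketch combining the two (data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=128))
w = 0.1 * rng.normal(size=10)
lr, eps, wd = 0.1, 0.1, 1e-2      # step size, FGSM radius, weight-decay rate

for step in range(200):
    # FGSM for a linear hinge model: grad_x loss = -y * w on active examples,
    # so the worst-case perturbation is the translation x - eps * y * sign(w)
    X_adv = X - eps * y[:, None] * np.sign(w)
    margin = y * (X_adv @ w)
    active = margin < 1.0          # examples inside the hinge
    grad = -(y[active, None] * X_adv[active]).mean(axis=0) if active.any() else 0.0
    # weight decay enters as a pure scaling of w each step
    w = (1.0 - lr * wd) * w - lr * grad

print("worst margin on clean data:", float((y * (X @ w)).min()))
```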
Virtual Codec Supervised Re-Sampling Network for Image Compression
Title | Virtual Codec Supervised Re-Sampling Network for Image Compression |
Authors | Lijun Zhao, Huihui Bai, Anhong Wang, Yao Zhao |
Abstract | In this paper, we propose an image re-sampling compression method that learns a virtual codec network (VCN) to resolve the non-differentiability of the quantization function in image compression. Here, image re-sampling refers to both full-resolution and low-resolution re-sampling. We generalize this method to the standard-compliant image compression (SCIC) framework and the deep neural network based compression (DNNC) framework. Specifically, an input image is measured by the re-sampling network (RSN) to obtain re-sampled vectors. Then, these vectors are directly quantized in the feature space in SCIC, or the discrete cosine transform coefficients of these vectors are quantized to further improve coding efficiency in DNNC. At the encoder, the quantized vectors or coefficients are losslessly compressed by arithmetic coding. At the receiver, the decoded vectors are used by the image decoder network (IDN) to restore the input image. In order to train the RSN and IDN together in an end-to-end fashion, our VCN imitates the projection from the re-sampled vectors to the IDN-decoded image. As a result, gradients from the IDN to the RSN can be approximated by the VCN's gradients. Because quantization after image re-sampling within an auto-encoder architecture further reduces dimensionality, we can initialize our networks well from pre-trained auto-encoder networks. Through extensive experiments and analysis, we verify that the proposed method is more effective and versatile than many state-of-the-art approaches. |
Tasks | Dimensionality Reduction, Image Compression, Quantization |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08514v2 |
http://arxiv.org/pdf/1806.08514v2.pdf | |
PWC | https://paperswithcode.com/paper/virtual-codec-supervised-re-sampling-network |
Repo | https://github.com/mdcnn/mdcnn.github.io |
Framework | none |
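One way to picture the gradient bypass (my reading of the abstract, with toy one-dimensional linear nets, not the authors' code): the hard path IDN(quantize(RSN(x))) passes no useful gradient to the RSN, so the VCN is trained to mimic that path and the RSN is trained through the mimic. A hedged PyTorch sketch:

```python
import torch
import torch.nn as nn

rsn = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 8))   # re-sampler
idn = nn.Sequential(nn.Linear(8, 16))                               # decoder
vcn = nn.Sequential(nn.Linear(8, 16))                               # virtual codec

quantize = lambda v: torch.round(v * 4.0) / 4.0   # zero gradient almost everywhere
opt = torch.optim.Adam([*rsn.parameters(), *idn.parameters(), *vcn.parameters()], lr=1e-3)

x = torch.randn(32, 16)
for step in range(200):
    v = rsn(x)
    hard = idn(quantize(v.detach()))        # real codec path; trains IDN only
    soft = vcn(v)                           # differentiable proxy of that path
    loss = ((hard - x) ** 2).mean() \
         + ((soft - hard.detach()) ** 2).mean() \
         + ((soft - x) ** 2).mean()         # this term carries gradient to RSN
    opt.zero_grad(); loss.backward(); opt.step()
```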