Paper Group AWR 209
SBNet: Sparse Blocks Network for Fast Inference. Learning Controllable Fair Representations. Nonlinear Acceleration of CNNs. Adversarially Learned One-Class Classifier for Novelty Detection. Polynomial Regression As an Alternative to Neural Nets. Adaptive Scenario Discovery for Crowd Counting. Detecting Gang-Involved Escalation on Social Media Using Context …
SBNet: Sparse Blocks Network for Fast Inference
Title | SBNet: Sparse Blocks Network for Fast Inference |
Authors | Mengye Ren, Andrei Pokrovsky, Bin Yang, Raquel Urtasun |
Abstract | Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such as object detection and semantic segmentation, we are able to obtain a low-cost computation mask, either from a priori problem knowledge, or from a low-resolution segmentation network. We show that such computation masks can be used to reduce computation in the high-resolution main network. Variants of sparse activation CNNs have previously been explored on small-scale tasks and showed no degradation in terms of object classification accuracy, but often measured gains in terms of theoretical FLOPs without realizing a practical speed-up when compared to highly optimized dense convolution implementations. In this work, we leverage the sparsity structure of computation masks and propose a novel tiling-based sparse convolution algorithm. We verified the effectiveness of our sparse CNN on LiDAR-based 3D object detection, and we report significant wall-clock speed-ups compared to dense convolution without noticeable loss of accuracy. |
Tasks | 3D Object Detection, Object Classification, Object Detection, Semantic Segmentation |
Published | 2018-01-07 |
URL | http://arxiv.org/abs/1801.02108v2 |
http://arxiv.org/pdf/1801.02108v2.pdf | |
PWC | https://paperswithcode.com/paper/sbnet-sparse-blocks-network-for-fast |
Repo | https://github.com/uber/sbnet |
Framework | tf |
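The gather-compute-scatter trick behind SBNet can be illustrated in a few lines: only blocks touched by the computation mask are gathered, a dense operation runs on those blocks, and the results are scattered back. The NumPy sketch below is a toy version with a 1x1 kernel; the real implementation handles k x k kernels with overlapping block halos and fused CUDA gather/scatter kernels, none of which is shown here.

```python
import numpy as np

def sparse_block_conv(x, w, mask, block=8):
    """Toy tiling-based sparse convolution (1x1 kernel for simplicity).

    x:    (H, W, Cin) feature map
    w:    (Cin, Cout) 1x1 convolution weights
    mask: (H, W) boolean computation mask
    Only blocks containing at least one active pixel are computed; the
    rest stay zero, which is where the wall-clock speed-up comes from.
    """
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[1]), dtype=x.dtype)
    for i in range(0, H, block):
        for j in range(0, W, block):
            if mask[i:i + block, j:j + block].any():       # gather active tiles
                tile = x[i:i + block, j:j + block]
                out[i:i + block, j:j + block] = tile @ w   # dense compute, then scatter
    return out

x = np.random.randn(32, 32, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
mask = np.zeros((32, 32), dtype=bool)
mask[4:12, 20:28] = True          # e.g. a foreground mask from a cheap network
y = sparse_block_conv(x, w, mask)
```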
Learning Controllable Fair Representations
Title | Learning Controllable Fair Representations |
Authors | Jiaming Song, Pratyusha Kalluri, Aditya Grover, Shengjia Zhao, Stefano Ermon |
Abstract | Learning data representations that are transferable and are fair with respect to certain protected attributes is crucial to reducing unfair decisions while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective allows the user to control the fairness of the representations by specifying limits on unfairness. Exploiting duality, we introduce a method that optimizes the model parameters as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can balance the trade-off between multiple notions of fairness and achieves higher expressiveness at a lower computational cost. |
Tasks | |
Published | 2018-12-11 |
URL | https://arxiv.org/abs/1812.04218v3 |
https://arxiv.org/pdf/1812.04218v3.pdf | |
PWC | https://paperswithcode.com/paper/learning-controllable-fair-representations |
Repo | https://github.com/ermongroup/lag-fairness |
Framework | tf |
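The "limits on unfairness" knob maps naturally onto dual ascent on a Lagrangian: raise the multiplier when the constraint (unfairness ≤ ε) is violated and shrink it otherwise, while the primal step ascends the Lagrangian in the model parameters. A minimal numeric sketch of that loop; `expressiveness` and `unfairness` are toy stand-ins for the paper's mutual-information bounds, not the authors' estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, eps, lr, lr_lam = 0.0, 0.1, 0.05, 0.05  # multiplier, unfairness limit, step sizes

def expressiveness(theta):   # stand-in for a lower bound on I(x; z)
    return -np.sum((theta - 1.0) ** 2)

def unfairness(theta):       # stand-in for an upper bound on I(z; u)
    return 0.1 * np.sum(theta ** 2)

def lagrangian(theta, lam):
    return expressiveness(theta) - lam * (unfairness(theta) - eps)

theta, h = rng.normal(size=4), 1e-4
for step in range(500):
    # primal step: ascend the Lagrangian via a numerical gradient
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = h
        grad[i] = (lagrangian(theta + e, lam) - lagrangian(theta - e, lam)) / (2 * h)
    theta += lr * grad
    # dual step: grow the multiplier only while the constraint is violated
    lam = max(0.0, lam + lr_lam * (unfairness(theta) - eps))

print("unfairness:", unfairness(theta), "<= eps:", eps)
```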
Nonlinear Acceleration of CNNs
Title | Nonlinear Acceleration of CNNs |
Authors | Damien Scieur, Edouard Oyallon, Alexandre d’Aspremont, Francis Bach |
Abstract | The Regularized Nonlinear Acceleration (RNA) algorithm is an acceleration method capable of improving the rate of convergence of many optimization schemes such as gradient descent, SAGA or SVRG. Until now, its analysis has been limited to convex problems, but empirical observations show that RNA may extend to wider settings. In this paper, we further investigate the benefits of RNA when applied to neural networks, in particular for the task of image recognition on CIFAR10 and ImageNet. With very few modifications of existing frameworks, RNA slightly improves the optimization of CNNs after training. |
Tasks | |
Published | 2018-06-01 |
URL | http://arxiv.org/abs/1806.00370v1 |
http://arxiv.org/pdf/1806.00370v1.pdf | |
PWC | https://paperswithcode.com/paper/nonlinear-acceleration-of-cnns |
Repo | https://github.com/windows7lover/RegularizedNonlinearAcceleration |
Framework | pytorch |
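RNA itself is compact enough to sketch: stack the last k+1 iterates, form the matrix of residuals, solve a small regularized linear system for combination weights that sum to one, and return the weighted combination of iterates. A NumPy sketch of that recipe, applied offline after plain gradient descent on a toy quadratic (step sizes and the regularizer lam are illustrative):

```python
import numpy as np

def rna_extrapolate(xs, lam=1e-8):
    """Regularized Nonlinear Acceleration on a list of iterates xs[0..k].

    Solves min_c ||R c||^2 + lam ||c||^2 s.t. sum(c) = 1, where the columns
    of R are the residuals x_{i+1} - x_i, then returns sum_i c_i * x_i.
    """
    X = np.stack(xs, axis=1)            # (dim, k+1)
    R = X[:, 1:] - X[:, :-1]            # residual matrix, (dim, k)
    RtR = R.T @ R
    RtR /= np.linalg.norm(RtR)          # normalize for scale invariance
    k = RtR.shape[0]
    z = np.linalg.solve(RtR + lam * np.eye(k), np.ones(k))
    c = z / z.sum()                     # enforce sum(c) = 1
    return X[:, :-1] @ c

# demo: accelerate gradient descent on the quadratic f(x) = 0.5 x' A x
A = np.diag([1.0, 10.0, 100.0])
x = np.ones(3)
xs = [x.copy()]
for _ in range(8):
    x = x - 0.005 * (A @ x)             # slow plain gradient descent
    xs.append(x.copy())
print(np.linalg.norm(xs[-1]), np.linalg.norm(rna_extrapolate(xs)))
```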
Adversarially Learned One-Class Classifier for Novelty Detection
Title | Adversarially Learned One-Class Classifier for Novelty Detection |
Authors | Mohammad Sabokrou, Mohammad Khalooei, Mahmood Fathy, Ehsan Adeli |
Abstract | Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled, or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each of which is trained by competing with the other while collaborating to understand the underlying concept of the target class and then classify the test samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that the separability of the enhanced inliers and distorted outliers is much higher than that of the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. The results on the MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection, illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods. |
Tasks | Anomaly Detection, One-class classifier, Outlier Detection |
Published | 2018-02-25 |
URL | http://arxiv.org/abs/1802.09088v2 |
http://arxiv.org/pdf/1802.09088v2.pdf | |
PWC | https://paperswithcode.com/paper/adversarially-learned-one-class-classifier |
Repo | https://github.com/jingkunchen/StyleGAN |
Framework | none |
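The division of labor in the abstract suggests the test-time rule: score a sample by D(R(x)), since R enhances inliers and distorts outliers, widening the discriminator's score gap. A toy sketch of that rule with illustrative stand-ins for R and D (hand-written functions below, not the paper's trained networks):

```python
import numpy as np

rng = np.random.default_rng(0)
target_mean = np.zeros(2)           # the "target class" concentrates here

def R(x):                           # stand-in reconstructor: pulls toward the class
    return 0.5 * x + 0.5 * target_mean

def D(x):                           # stand-in discriminator: likelihood-like score
    return np.exp(-0.5 * np.sum(x ** 2))

inlier  = rng.normal(0.0, 0.3, size=2)
outlier = rng.normal(4.0, 0.3, size=2)
for name, x in [("inlier", inlier), ("outlier", outlier)]:
    print(name, "score D(R(x)) =", round(float(D(R(x))), 4))
# a low D(R(x)) flags novelty; R widens the gap by distorting outliers
```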
Polynomial Regression As an Alternative to Neural Nets
Title | Polynomial Regression As an Alternative to Neural Nets |
Authors | Xi Cheng, Bohdan Khomtchouk, Norman Matloff, Pete Mohanty |
Abstract | Despite the success of neural networks (NNs), there is still a concern among many over their “black box” nature. Why do they work? Here we present a simple analytic argument that NNs are in fact essentially polynomial regression models. This view will have various implications for NNs, e.g. providing an explanation for why convergence problems arise in NNs, and it gives rough guidance on avoiding overfitting. In addition, we use this phenomenon to predict and confirm a multicollinearity property of NNs not previously reported in the literature. Most importantly, given this loose correspondence, one may choose to routinely use polynomial models instead of NNs, thus avoiding some major problems of the latter, such as having to set many tuning parameters and dealing with convergence issues. We present a number of empirical results; in each case, the accuracy of the polynomial approach matches or exceeds that of NN approaches. A many-featured, open-source software package, polyreg, is available. |
Tasks | |
Published | 2018-06-13 |
URL | http://arxiv.org/abs/1806.06850v3 |
http://arxiv.org/pdf/1806.06850v3.pdf | |
PWC | https://paperswithcode.com/paper/polynomial-regression-as-an-alternative-to |
Repo | https://github.com/mac-theobio/Lab_meeting |
Framework | none |
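The recipe implied by the abstract — expand the inputs into all monomials up to a chosen degree, then fit ordinary least squares — takes only a few lines of NumPy. A minimal sketch (the open-source polyreg package does considerably more; this shows only the core idea):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree=2):
    """All monomials of the input columns up to `degree` (plus intercept)."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + X[:, 0] * X[:, 1] - 0.5 * X[:, 2] ** 2 + 0.1 * rng.normal(size=200)

P = poly_features(X, degree=2)
beta, *_ = np.linalg.lstsq(P, y, rcond=None)   # ordinary least squares
print("train RMSE:", np.sqrt(np.mean((P @ beta - y) ** 2)))
```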
Adaptive Scenario Discovery for Crowd Counting
Title | Adaptive Scenario Discovery for Crowd Counting |
Authors | Xingjiao Wu, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He |
Abstract | Crowd counting, i.e., estimating the number of pedestrians in crowd images, is emerging as an important research problem with public-security applications. A key component of crowd counting systems is the construction of counting models that are robust to various scenarios shaped by factors such as camera perspective and physical barriers. In this paper, we present an adaptive scenario discovery framework for crowd counting. The system is structured with two parallel pathways that are trained with different sizes of the receptive field to represent different scales and crowd densities. After ensuring that these components are present in the proper geometric configuration, a third branch is designed to adaptively recalibrate the pathway-wise responses by discovering and modeling the dynamic scenarios implicitly. Our system is able to represent highly variable crowd images and achieves state-of-the-art results on two challenging benchmarks. |
Tasks | Crowd Counting |
Published | 2018-12-06 |
URL | http://arxiv.org/abs/1812.02393v2 |
http://arxiv.org/pdf/1812.02393v2.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-scenario-discovery-for-crowd |
Repo | https://github.com/pxq0312/ASD-crowd-counting |
Framework | pytorch |
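Stripped to its core, the fusion described above is a learned convex combination of the two pathway outputs, with the mixing weight predicted per image by the discovery branch. A toy sketch of just that combination; all three functions below are illustrative stand-ins, not the paper's sub-networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def pathway_small_rf(x):   # stand-in for the small-receptive-field pathway
    return np.maximum(x - 0.2, 0.0)

def pathway_large_rf(x):   # stand-in for the large-receptive-field pathway
    return 0.5 * np.maximum(x, 0.0)

def discovery_branch(x):   # stand-in scenario weight in (0, 1)
    return 1.0 / (1.0 + np.exp(-(x.mean() - 0.5)))

x = rng.random((64, 64))               # a fake feature map
alpha = discovery_branch(x)            # implicitly discovered scenario weight
density = alpha * pathway_small_rf(x) + (1.0 - alpha) * pathway_large_rf(x)
print("estimated count:", float(density.sum()))
```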
Detecting Gang-Involved Escalation on Social Media Using Context
Title | Detecting Gang-Involved Escalation on Social Media Using Context |
Authors | Serina Chang, Ruiqi Zhong, Ethan Adams, Fei-Tzin Lee, Siddharth Varia, Desmond Patton, William Frey, Chris Kedzie, Kathleen McKeown |
Abstract | Gang-involved youth in cities such as Chicago have increasingly turned to social media to post about their experiences and intents online. In some situations, when they experience the loss of a loved one, their online expression of emotion may evolve into aggression towards rival gangs and ultimately into real-world violence. In this paper, we present a novel system for detecting Aggression and Loss in social media. Our system features the use of domain-specific resources automatically derived from a large unlabeled corpus, and contextual representations of the emotional and semantic content of the user’s recent tweets as well as their interactions with other users. Incorporating context in our Convolutional Neural Network (CNN) leads to a significant improvement. |
Tasks | |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03632v1 |
http://arxiv.org/pdf/1809.03632v1.pdf | |
PWC | https://paperswithcode.com/paper/detecting-gang-involved-escalation-on-social |
Repo | https://github.com/serinachang5/contextifier |
Framework | none |
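The "incorporating context" step can be pictured as concatenating the representation of the tweet being classified with a summary of the user's recent tweets before classification. A toy sketch of that wiring, with mean-of-embeddings encoders standing in for the paper's CNN and contextual features:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_tweet(tokens, emb):          # mean of word embeddings as a stand-in
    return emb[tokens].mean(axis=0)

emb = rng.normal(size=(1000, 16))       # fake word-embedding table
tweet = rng.integers(0, 1000, size=12)  # token ids of the tweet to classify
history = [rng.integers(0, 1000, size=12) for _ in range(5)]  # recent tweets

tweet_vec = encode_tweet(tweet, emb)
context_vec = np.mean([encode_tweet(t, emb) for t in history], axis=0)
features = np.concatenate([tweet_vec, context_vec])   # context-augmented input
W = rng.normal(size=(features.size, 3))               # e.g. Aggression / Loss / Other
print("class scores:", features @ W)
```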
Learning Deep Disentangled Embeddings with the F-Statistic Loss
Title | Learning Deep Disentangled Embeddings with the F-Statistic Loss |
Authors | Karl Ridgeway, Michael C. Mozer |
Abstract | Deep-embedding methods aim to discover representations of a domain that make explicit the domain’s class structure and thereby support few-shot learning. Disentangling methods aim to make explicit compositional or factorial structure. We combine these two active but independent lines of research and propose a new paradigm suitable for both goals. We propose and evaluate a novel loss function based on the $F$ statistic, which describes the separation of two or more distributions. By ensuring that distinct classes are well separated on a subset of embedding dimensions, we obtain embeddings that are useful for few-shot learning. By not requiring separation on all dimensions, we encourage the discovery of disentangled representations. Our embedding method matches or beats state-of-the-art, as evaluated by performance on recall@$k$ and few-shot learning tasks. Our method also obtains performance superior to a variety of alternatives on disentangling, as evaluated by two key properties of a disentangled representation: modularity and explicitness. The goal of our work is to obtain more interpretable, manipulable, and generalizable deep representations of concepts and categories. |
Tasks | Few-Shot Learning |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05312v2 |
http://arxiv.org/pdf/1802.05312v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-disentangled-embeddings-with |
Repo | https://github.com/kridgeway/f-statistic-loss-nips-2018 |
Framework | tf |
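The building block here is the classic one-way ANOVA $F$ statistic: between-class variance over within-class variance, computed per embedding dimension. The sketch below evaluates it on a well-separated and an overlapping pair of classes; the paper's loss builds on this statistic (requiring separation on only a subset of dimensions), which the sketch does not reproduce:

```python
import numpy as np

def f_statistic(groups):
    """One-way ANOVA F statistic: between-group over within-group variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = np.mean(np.concatenate(groups))
    between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups) / (k - 1)
    within = sum(np.sum((g - np.mean(g)) ** 2) for g in groups) / (n - k)
    return between / within

rng = np.random.default_rng(0)
# one embedding dimension that separates two classes well vs. one that doesn't
separated = [rng.normal(0, 1, 50), rng.normal(5, 1, 50)]
overlapping = [rng.normal(0, 1, 50), rng.normal(0.2, 1, 50)]
print("F (separated):  ", round(f_statistic(separated), 1))
print("F (overlapping):", round(f_statistic(overlapping), 1))
# rewarding high F on a subset of dimensions separates classes there while
# leaving the remaining dimensions free to encode other factors
```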
SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Title | SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation |
Authors | Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig |
Abstract | In this work, we examine methods for data augmentation for text-based tasks such as neural machine translation (NMT). We formulate the design of a data augmentation policy with desirable properties as an optimization problem, and derive a generic analytic solution. This solution not only subsumes some existing augmentation schemes, but also leads to an extremely simple data augmentation strategy for NMT: randomly replacing words in both the source sentence and the target sentence with other random words from their corresponding vocabularies. We name this method SwitchOut. Experiments on three translation datasets of different scales show that SwitchOut yields consistent improvements of about 0.5 BLEU, achieving better or comparable performances to strong alternatives such as word dropout (Sennrich et al., 2016a). Code to implement this method is included in the appendix. |
Tasks | Data Augmentation, Machine Translation |
Published | 2018-08-22 |
URL | http://arxiv.org/abs/1808.07512v2 |
http://arxiv.org/pdf/1808.07512v2.pdf | |
PWC | https://paperswithcode.com/paper/switchout-an-efficient-data-augmentation |
Repo | https://github.com/Waino/OpenNMT-py |
Framework | pytorch |
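The augmentation itself is concrete enough to sketch directly: choose some positions in the source and the target sentence and replace them with uniformly random vocabulary words. One hedge: the paper derives a specific temperature-controlled distribution over the number of swaps, for which the binomial below is only a stand-in.

```python
import numpy as np

def switchout(tokens, vocab_size, tau=1.0, rng=np.random.default_rng(0)):
    """Simplified SwitchOut: sample how many positions to corrupt, then
    replace those positions with uniformly random vocabulary words."""
    tokens = np.array(tokens)
    n = len(tokens)
    n_swaps = rng.binomial(n, min(1.0, tau / n))     # stand-in swap distribution
    idx = rng.choice(n, size=n_swaps, replace=False)
    tokens[idx] = rng.integers(0, vocab_size, size=n_swaps)
    return tokens

src = [12, 845, 3, 77, 9, 501]      # token ids of a source sentence
tgt = [4, 220, 18, 903, 2]          # token ids of the target sentence
print(switchout(src, vocab_size=32000), switchout(tgt, vocab_size=32000))
```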
SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning
Title | SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning |
Authors | Titus Cieslewski, Konstantinos G. Derpanis, Davide Scaramuzza |
Abstract | A wide range of computer vision algorithms rely on identifying sparse interest points in images and establishing correspondences between them. However, only a subset of the initially identified interest points results in true correspondences (inliers). In this paper, we seek a detector that finds the minimum number of points that are likely to result in an application-dependent “sufficient” number of inliers k. To quantify this goal, we introduce the “k-succinctness” metric. Extracting a minimum number of interest points is attractive for many applications, because it can reduce computational load, memory, and data transmission. Alongside succinctness, we introduce an unsupervised training methodology for interest point detectors that is based on predicting the probability of a given pixel being an inlier. In comparison to previous learned detectors, our method requires the least amount of data pre-processing. Our detector and other state-of-the-art detectors are extensively evaluated with respect to succinctness on popular public datasets covering both indoor and outdoor scenes, and both wide and narrow baselines. In certain cases, our detector is able to obtain an equivalent amount of inliers with as little as 60% of the amount of points of other detectors. The code and trained networks are provided at https://github.com/uzh-rpg/sips2_open . |
Tasks | Interest Point Detection, Pose Estimation, Visual Odometry |
Published | 2018-05-03 |
URL | https://arxiv.org/abs/1805.01358v2 |
https://arxiv.org/pdf/1805.01358v2.pdf | |
PWC | https://paperswithcode.com/paper/sips-unsupervised-succinct-interest-points |
Repo | https://github.com/uzh-rpg/sips2_open |
Framework | tf |
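One plausible reading of the k-succinctness idea from the abstract: rank points by predicted inlierness and count how many must be kept before k true inliers appear. The sketch below measures exactly that on synthetic scores; the paper defines the metric precisely, so treat this purely as an illustration:

```python
import numpy as np

def points_needed_for_k_inliers(scores, is_inlier, k):
    """How many top-scored interest points must be kept before k of them
    are true inliers. Lower is better (a more succinct detector)."""
    order = np.argsort(-scores)               # rank by predicted inlierness
    inlier_count = np.cumsum(is_inlier[order])
    hits = np.nonzero(inlier_count >= k)[0]
    return int(hits[0]) + 1 if len(hits) else None

rng = np.random.default_rng(0)
scores = rng.random(100)                      # predicted inlierness probabilities
is_inlier = rng.random(100) < 0.4             # ground truth, e.g. from matching
print("points needed for k=10 inliers:", points_needed_for_k_inliers(scores, is_inlier, 10))
```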
Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
Title | Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation |
Authors | Chenze Shao, Yang Feng, Xilin Chen |
Abstract | Neural machine translation (NMT) models are usually trained with a word-level loss using the teacher forcing algorithm, which not only evaluates the translation improperly but also suffers from exposure bias. Sequence-level training under the reinforcement framework can mitigate the problems of the word-level loss, but its performance is unstable due to the high variance of the gradient estimation. On these grounds, we present a method with a differentiable sequence-level training objective based on probabilistic n-gram matching which avoids the reinforcement framework. In addition, this method performs greedy search during training, using the predicted words as context just as at inference time, which alleviates the problem of exposure bias. Experimental results on the NIST Chinese-to-English translation tasks show that our method significantly outperforms the reinforcement-based algorithms and achieves an improvement of 1.5 BLEU points on average over a strong baseline system. |
Tasks | Machine Translation |
Published | 2018-09-10 |
URL | http://arxiv.org/abs/1809.03132v1 |
http://arxiv.org/pdf/1809.03132v1.pdf | |
PWC | https://paperswithcode.com/paper/greedy-search-with-probabilistic-n-gram |
Repo | https://github.com/ictnlp/GS4NMT |
Framework | pytorch |
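The differentiable objective rests on expected n-gram counts: given the model's per-position word distributions, the probability that a given n-gram starts at position t is a product of word probabilities, and summing over t gives an expected count that can be matched against the reference. A sketch under an independence-across-positions simplification (the paper instead works with the distributions produced along the greedy search path):

```python
import numpy as np

def expected_ngram_count(probs, ngram):
    """Expected occurrences of `ngram` under per-position word distributions.

    probs: (T, V) array, row t is the model's distribution at position t.
    Positions are treated as independent here for simplicity.
    """
    T = probs.shape[0]
    n = len(ngram)
    total = 0.0
    for t in range(T - n + 1):
        p = 1.0
        for j, w in enumerate(ngram):
            p *= probs[t + j, w]   # probability the n-gram starts at position t
        total += p
    return total

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(50), size=8)   # 8 positions over a 50-word vocab
print("expected count of bigram (3, 7):", expected_ngram_count(probs, (3, 7)))
# matching expected counts against reference counts gives a differentiable,
# sequence-level objective, sidestepping high-variance policy gradients
```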
Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph
Title | Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph |
Authors | Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu, Xiaozhong Liu |
Abstract | While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior researchers and graduate students who do not master other languages) cannot efficiently locate publications hosted in a foreign-language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG learns a representation function that maps publications from multilingual repositories to a low-dimensional joint embedding space, using the various kinds of vertices and relations of a heterogeneous graph. By leveraging both global (task-specific) and local (task-independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method optimizes the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experimental results show that the proposed method not only outperforms state-of-the-art baseline models but also improves the interpretability of the representation model for the cross-language citation recommendation task. |
Tasks | Representation Learning |
Published | 2018-12-31 |
URL | http://arxiv.org/abs/1812.11709v1 |
http://arxiv.org/pdf/1812.11709v1.pdf | |
PWC | https://paperswithcode.com/paper/cross-language-citation-recommendation-via |
Repo | https://github.com/GraphEmbedding/HRLHG |
Framework | none |
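The supervised hierarchical random walk can be pictured as a two-level sampling step: first draw an edge type from a learned distribution over relation types, then draw a neighbor under that type. A toy sketch on a four-node heterogeneous graph; the graph, relation types, and weights are all illustrative, not from the paper:

```python
import random

graph = {
    "paper_en_1": {"cites": ["paper_zh_1"], "written_by": ["author_a"]},
    "paper_zh_1": {"cites": ["paper_en_1"], "has_keyword": ["kw_ir"]},
    "author_a":   {"writes": ["paper_en_1"]},
    "kw_ir":      {"tags": ["paper_zh_1"]},
}
type_weights = {"cites": 0.6, "written_by": 0.2, "has_keyword": 0.2,
                "writes": 1.0, "tags": 1.0}       # stand-in for learned weights

def walk_step(node, rng=random.Random(0)):
    types = [t for t in graph[node] if graph[node][t]]
    weights = [type_weights[t] for t in types]
    t = rng.choices(types, weights=weights, k=1)[0]   # level 1: relation type
    return rng.choice(graph[node][t])                 # level 2: neighbor

node = "paper_en_1"
for _ in range(4):
    node = walk_step(node)
    print("->", node)
```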
Quadratic Decomposable Submodular Function Minimization
Title | Quadratic Decomposable Submodular Function Minimization |
Authors | Pan Li, Niao He, Olgica Milenkovic |
Abstract | We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization. The problem is closely related to decomposable submodular function minimization and arises in many learning on graphs and hypergraphs settings, such as graph-based semi-supervised learning and PageRank. We approach the problem via a new dual strategy and describe an objective that may be optimized via random coordinate descent (RCD) methods and projections onto cones. We also establish the linear convergence rate of the RCD algorithm and develop efficient projection algorithms with provable performance guarantees. Numerical experiments in semi-supervised learning on hypergraphs confirm the efficiency of the proposed algorithm and demonstrate the significant improvements in prediction accuracy with respect to state-of-the-art methods. |
Tasks | |
Published | 2018-06-26 |
URL | http://arxiv.org/abs/1806.09842v3 |
http://arxiv.org/pdf/1806.09842v3.pdf | |
PWC | https://paperswithcode.com/paper/quadratic-decomposable-submodular-function |
Repo | https://github.com/lipan00123/QDSDM |
Framework | none |
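For orientation, here is a representative shape of the problem reconstructed from the abstract (my notation, not necessarily the paper's exact formulation): given submodular functions $F_r$ with Lovász extensions $f_r$,

$$
\min_{x \in \mathbb{R}^N} \; \lVert x - a \rVert_2^2 \; + \; \beta \sum_{r=1}^{R} \big[ f_r(x) \big]^2 ,
$$

where the squared Lovász-extension terms distinguish it from the standard decomposable problem, whose coupling term is the linear sum $\sum_r f_r(x)$, and are what connect it to quadratic objectives such as PageRank and graph-based semi-supervised learning.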
Adversarial Training Versus Weight Decay
Title | Adversarial Training Versus Weight Decay |
Authors | Angus Galloway, Thomas Tanay, Graham W. Taylor |
Abstract | Performance-critical machine learning models should be robust to input perturbations not seen during training. Adversarial training is a method for improving a model’s robustness to some perturbations by including them in the training process, but this tends to exacerbate other vulnerabilities of the model. The adversarial training framework has the effect of translating the data with respect to the cost function, while weight decay has a scaling effect. Although weight decay could be considered a crude regularization technique, it appears superior to adversarial training as it remains stable over a broader range of regimes and reduces all generalization errors. Equipped with these abstractions, we provide key baseline results and methodology for characterizing robustness. The two approaches can be combined to yield one small model that demonstrates good robustness to several white-box attacks associated with different metrics. |
Tasks | |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03308v3 |
http://arxiv.org/pdf/1804.03308v3.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-training-versus-weight-decay |
Repo | https://github.com/uoguelph-mlrg/adversarial_training_vs_weight_decay |
Framework | tf |
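The translation-versus-scaling contrast is easiest to see on a linear model with hinge loss: the FGSM adversarial example translates each input by $-\epsilon\, y\, \mathrm{sign}(w)$, while weight decay multiplies $w$ by $(1 - \eta\lambda)$ every step. A self-contained sketch combining the two (data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=128))
w = 0.1 * rng.normal(size=10)
lr, eps, wd = 0.1, 0.1, 1e-2      # step size, FGSM radius, weight-decay rate

for step in range(200):
    # FGSM for a linear hinge model: grad_x loss = -y * w on active examples,
    # so the worst-case perturbation is the translation x - eps * y * sign(w)
    X_adv = X - eps * y[:, None] * np.sign(w)
    margin = y * (X_adv @ w)
    active = margin < 1.0          # examples inside the hinge
    grad = -(y[active, None] * X_adv[active]).mean(axis=0) if active.any() else 0.0
    # weight decay enters as a pure scaling of w each step
    w = (1.0 - lr * wd) * w - lr * grad

print("worst margin on clean data:", float((y * (X @ w)).min()))
```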
Virtual Codec Supervised Re-Sampling Network for Image Compression
Title | Virtual Codec Supervised Re-Sampling Network for Image Compression |
Authors | Lijun Zhao, Huihui Bai, Anhong Wang, Yao Zhao |
Abstract | In this paper, we propose an image re-sampling compression method that learns a virtual codec network (VCN) to resolve the non-differentiability of the quantization function in image compression. Here, image re-sampling refers to both full-resolution and low-resolution re-sampling. We generalize this method to the standard-compliant image compression (SCIC) framework and the deep neural network based compression (DNNC) framework. Specifically, an input image is measured by the re-sampling network (RSN) to obtain re-sampled vectors. Then, these vectors are directly quantized in the feature space in SCIC, or the discrete cosine transform coefficients of these vectors are quantized to further improve coding efficiency in DNNC. At the encoder, the quantized vectors or coefficients are losslessly compressed by arithmetic coding. At the receiver, the decoded vectors are used by the image decoder network (IDN) to restore the input image. In order to train the RSN and IDN together in an end-to-end fashion, our VCN imitates the projection from the re-sampled vectors to the IDN-decoded image. As a result, gradients from the IDN to the RSN can be approximated by the VCN's gradients. Because quantization after image re-sampling within an auto-encoder architecture further reduces dimensionality, we can initialize our networks well from pre-trained auto-encoder networks. Through extensive experiments and analysis, we verify that the proposed method is more effective and versatile than many state-of-the-art approaches. |
Tasks | Dimensionality Reduction, Image Compression, Quantization |
Published | 2018-06-22 |
URL | http://arxiv.org/abs/1806.08514v2 |
http://arxiv.org/pdf/1806.08514v2.pdf | |
PWC | https://paperswithcode.com/paper/virtual-codec-supervised-re-sampling-network |
Repo | https://github.com/mdcnn/mdcnn.github.io |
Framework | none |
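One way to picture the gradient bypass (my reading of the abstract, with toy one-dimensional linear nets, not the authors' code): the hard path IDN(quantize(RSN(x))) passes no useful gradient to the RSN, so the VCN is trained to mimic that path and the RSN is trained through the mimic. A hedged PyTorch sketch:

```python
import torch
import torch.nn as nn

rsn = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 8))   # re-sampler
idn = nn.Sequential(nn.Linear(8, 16))                               # decoder
vcn = nn.Sequential(nn.Linear(8, 16))                               # virtual codec

quantize = lambda v: torch.round(v * 4.0) / 4.0   # zero gradient almost everywhere
opt = torch.optim.Adam([*rsn.parameters(), *idn.parameters(), *vcn.parameters()], lr=1e-3)

x = torch.randn(32, 16)
for step in range(200):
    v = rsn(x)
    hard = idn(quantize(v.detach()))        # real codec path; trains IDN only
    soft = vcn(v)                           # differentiable proxy of that path
    loss = ((hard - x) ** 2).mean() \
         + ((soft - hard.detach()) ** 2).mean() \
         + ((soft - x) ** 2).mean()         # this term carries gradient to RSN
    opt.zero_grad(); loss.backward(); opt.step()
```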