Paper Group ANR 461
Adversarial Metric Learning. CrowdHuman: A Benchmark for Detecting Human in a Crowd. Comparison of Discrete Choice Models and Artificial Neural Networks in Presence of Missing Variables. Batch Normalization Sampling. Avoiding a Tragedy of the Commons in the Peer Review Process. Enhanced Signal Recovery via Sparsity Inducing Image Priors. Learning t …
Adversarial Metric Learning
Title | Adversarial Metric Learning |
Authors | Shuo Chen, Chen Gong, Jian Yang, Xiang Li, Yang Wei, Jun Li |
Abstract | Over the past decades, intensive efforts have been devoted to designing various loss functions and metric forms for the metric learning problem. These improvements have shown promising results when the test data is similar to the training data. However, the trained models often fail to produce reliable distances on ambiguous test pairs due to the distribution bias between the training set and the test set. To address this problem, Adversarial Metric Learning (AML) is proposed in this paper, which automatically generates adversarial pairs to remedy the distribution bias and facilitate robust metric learning. Specifically, AML consists of two adversarial stages, i.e., confusion and distinguishment. In the confusion stage, ambiguous but critical adversarial data pairs are adaptively generated to mislead the learned metric. In the distinguishment stage, a metric is exhaustively learned to try its best to distinguish both the adversarial pairs and the original training pairs. Thanks to the challenges posed by the confusion stage in this competing process, the AML model is able to grasp plentiful difficult knowledge that is not contained in the original training pairs, so the discriminability of AML can be significantly improved. The entire model is formulated into an optimization framework, whose global convergence is theoretically proved. Experimental results on toy data and practical datasets clearly demonstrate the superiority of AML over representative state-of-the-art metric learning methodologies. |
Tasks | Metric Learning |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03170v1 |
http://arxiv.org/pdf/1802.03170v1.pdf | |
PWC | https://paperswithcode.com/paper/adversarial-metric-learning |
Repo | |
Framework | |
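A minimal NumPy sketch of the two-stage idea summarized in the abstract: a confusion step perturbs training pairs so the current Mahalanobis metric misjudges them, and a distinguishment step updates the metric on both the original and the adversarial pairs. The perturbation rule, step sizes, and the simple outer-product objective are simplifications chosen here for illustration, not the paper's exact formulation.

```python
import numpy as np

def adversarial_metric_learning_sketch(pairs, labels, dim,
                                        steps=50, eps=0.1, lr=0.01):
    """Toy alternation between a 'confusion' stage (perturb pairs so the
    current metric misjudges them) and a 'distinguishment' stage (update the
    metric on both original and adversarial pairs).

    pairs: list of (x, y) NumPy vectors; labels: +1 for similar, -1 for
    dissimilar.  Illustrative simplification only.
    """
    M = np.eye(dim)                                  # Mahalanobis metric matrix
    for _ in range(steps):
        # Confusion stage: move x so the current metric is misled
        # (similar pairs pushed apart, dissimilar pairs pulled together).
        adv_pairs = []
        for (x, y), s in zip(pairs, labels):
            grad_x = 2.0 * M @ (x - y)               # gradient of squared distance wrt x
            adv_pairs.append((x + s * eps * grad_x, y))
        # Distinguishment stage: shrink distances of similar pairs and grow
        # those of dissimilar pairs, over original plus adversarial pairs.
        G = np.zeros_like(M)
        for (x, y), s in zip(pairs + adv_pairs, list(labels) * 2):
            d = x - y
            G += s * np.outer(d, d)
        M -= lr * G
        # Project back onto the positive semidefinite cone.
        w, V = np.linalg.eigh(M)
        M = (V * np.clip(w, 0.0, None)) @ V.T
    return M
```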
CrowdHuman: A Benchmark for Detecting Human in a Crowd
Title | CrowdHuman: A Benchmark for Detecting Human in a Crowd |
Authors | Shuai Shao, Zijian Zhao, Boxun Li, Tete Xiao, Gang Yu, Xiangyu Zhang, Jian Sun |
Abstract | Human detection has witnessed impressive progress in recent years. However, the occlusion issue of detecting humans in highly crowded environments is far from solved. To make matters worse, crowd scenarios are still under-represented in current human detection benchmarks. In this paper, we introduce a new dataset, called CrowdHuman, to better evaluate detectors in crowd scenarios. The CrowdHuman dataset is large, richly annotated and contains high diversity. There are a total of 470K human instances from the train and validation subsets, and ~22.6 persons per image, with various kinds of occlusions in the dataset. Each human instance is annotated with a head bounding-box, a human visible-region bounding-box and a human full-body bounding-box. Baseline performance of state-of-the-art detection frameworks on CrowdHuman is presented. The cross-dataset generalization results of the CrowdHuman dataset demonstrate state-of-the-art performance on previous datasets including Caltech-USA, CityPersons, and Brainwash without bells and whistles. We hope our dataset will serve as a solid baseline and help promote future research in human detection tasks. |
Tasks | Human Detection, Pedestrian Detection |
Published | 2018-04-30 |
URL | http://arxiv.org/abs/1805.00123v1 |
http://arxiv.org/pdf/1805.00123v1.pdf | |
PWC | https://paperswithcode.com/paper/crowdhuman-a-benchmark-for-detecting-human-in |
Repo | |
Framework | |
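The abstract states that every person carries three boxes: head, visible region, and full body. A small, self-contained sketch of how one such instance could be represented; the field names and the (x, y, width, height) box convention are assumptions for illustration, not the official CrowdHuman annotation schema.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height); assumed convention

@dataclass
class CrowdHumanInstance:
    """One annotated person, mirroring the three box types described in the abstract."""
    head_box: Box
    visible_box: Box
    full_body_box: Box

def visible_ratio(inst: CrowdHumanInstance) -> float:
    """Rough occlusion proxy: visible-region area over full-body area."""
    _, _, vw, vh = inst.visible_box
    _, _, fw, fh = inst.full_body_box
    return (vw * vh) / max(fw * fh, 1e-9)

inst = CrowdHumanInstance(head_box=(10, 5, 20, 25),
                          visible_box=(0, 0, 50, 80),
                          full_body_box=(0, 0, 50, 120))
print(round(visible_ratio(inst), 2))   # 0.67 -> heavily occluded lower body
```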
Comparison of Discrete Choice Models and Artificial Neural Networks in Presence of Missing Variables
Title | Comparison of Discrete Choice Models and Artificial Neural Networks in Presence of Missing Variables |
Authors | Johan Barthélemy, Morgane Dumont, Timoteo Carletti |
Abstract | Classification, the process of assigning a label (or class) to an observation given its features, is a common task in many applications. Nonetheless, in most real-life applications the labels cannot be fully explained by the observed features; indeed, many factors can remain hidden from the modellers. The unexplained variation is then treated as random noise, which is handled differently depending on the method retained by the practitioner. This work focuses on two simple and widely used supervised classification algorithms, discrete choice models and artificial neural networks, in the context of binary classification. Through various numerical experiments involving continuous or discrete explanatory features, we present a comparison of the retained methods’ performance in the presence of missing variables. The impact of the distribution of the two classes in the training data is also investigated. The outcomes of these experiments highlight that artificial neural networks outperform the discrete choice models, except when the distribution of the classes in the training data is highly unbalanced. Finally, this work provides some guidelines for choosing the right classifier with respect to the training data. |
Tasks | |
Published | 2018-11-06 |
URL | http://arxiv.org/abs/1811.02284v1 |
http://arxiv.org/pdf/1811.02284v1.pdf | |
PWC | https://paperswithcode.com/paper/comparison-of-discrete-choice-models-and |
Repo | |
Framework | |
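An illustrative setup for the kind of comparison the abstract describes, not the paper's experiments: hide one explanatory variable, then compare a binary logit (the simplest discrete choice model) against a small neural network on the same data. Dataset parameters and network size are arbitrary choices here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic binary task; dropping the last column mimics a "missing variable".
X, y = make_classification(n_samples=5000, n_features=6, n_informative=6,
                           n_redundant=0, weights=[0.5, 0.5], random_state=0)
X_missing = X[:, :-1]
X_tr, X_te, y_tr, y_te = train_test_split(X_missing, y, test_size=0.3,
                                          random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)       # binary logit
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)              # small ANN

print("logit accuracy:", accuracy_score(y_te, logit.predict(X_te)))
print("MLP accuracy:  ", accuracy_score(y_te, mlp.predict(X_te)))
```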
Batch Normalization Sampling
Title | Batch Normalization Sampling |
Authors | Zhaodong Chen, Lei Deng, Guoqi Li, Jiawei Sun, Xing Hu, Xin Ma, Yuan Xie |
Abstract | Deep Neural Networks (DNNs) have thrived in recent years, with Batch Normalization (BN) playing an indispensable role. However, it has been observed that BN is costly due to its reduction operations. In this paper, we propose alleviating this problem by sampling only a small fraction of data for normalization at each iteration. Specifically, we model it as a statistical sampling problem and show that by sampling less correlated data, we can largely reduce the amount of data required for statistics estimation in BN, which directly simplifies the reduction operations. Based on this conclusion, we propose two sampling strategies, “Batch Sampling” (randomly select several samples from each batch) and “Feature Sampling” (randomly select a small patch from each feature map of all samples), that take both computational efficiency and sample correlation into consideration. Furthermore, we introduce an extremely simple variant of BN, termed Virtual Dataset Normalization (VDN), that can normalize the activations well with a few synthetic random samples. All the proposed methods are evaluated on various datasets and networks, where an overall training speedup of up to 20% on GPU is achieved in practice without the support of any specialized libraries, and the loss in accuracy and convergence rate is negligible. Finally, we extend our work to the “micro-batch normalization” problem and achieve performance comparable to existing approaches with tiny batch sizes. |
Tasks | |
Published | 2018-10-25 |
URL | http://arxiv.org/abs/1810.10962v2 |
http://arxiv.org/pdf/1810.10962v2.pdf | |
PWC | https://paperswithcode.com/paper/batch-normalization-sampling |
Repo | |
Framework | |
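A minimal PyTorch sketch of the "Feature Sampling" idea from the abstract: estimate per-channel batch statistics from a small random patch of every feature map instead of the full map. The module name is my own; running statistics and affine parameters are omitted, so this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class FeatureSamplingBN2d(nn.Module):
    """Normalize with statistics computed on a random spatial patch (sketch)."""
    def __init__(self, patch: int = 4, eps: float = 1e-5):
        super().__init__()
        self.patch = patch
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        p = min(self.patch, h, w)
        top = torch.randint(0, h - p + 1, (1,)).item()
        left = torch.randint(0, w - p + 1, (1,)).item()
        patch = x[:, :, top:top + p, left:left + p]
        mean = patch.mean(dim=(0, 2, 3), keepdim=True)                  # per-channel mean
        var = patch.var(dim=(0, 2, 3), unbiased=False, keepdim=True)    # per-channel variance
        return (x - mean) / torch.sqrt(var + self.eps)
```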
Avoiding a Tragedy of the Commons in the Peer Review Process
Title | Avoiding a Tragedy of the Commons in the Peer Review Process |
Authors | D Sculley, Jasper Snoek, Alex Wiltschko |
Abstract | Peer review is the foundation of scientific publication, and the task of reviewing has long been seen as a cornerstone of professional service. However, the massive growth in the field of machine learning has put this community benefit under stress, threatening both the sustainability of an effective review process and the overall progress of the field. In this position paper, we argue that a tragedy of the commons outcome may be avoided by emphasizing the professional aspects of this service. In particular, we propose a rubric to hold reviewers to an objective standard for review quality. In turn, we also propose that reviewers be given appropriate incentive. As one possible such incentive, we explore the idea of financial compensation on a per-review basis. We suggest reasonable funding models and thoughts on long term effects. |
Tasks | |
Published | 2018-12-18 |
URL | http://arxiv.org/abs/1901.06246v1 |
http://arxiv.org/pdf/1901.06246v1.pdf | |
PWC | https://paperswithcode.com/paper/avoiding-a-tragedy-of-the-commons-in-the-peer |
Repo | |
Framework | |
Enhanced Signal Recovery via Sparsity Inducing Image Priors
Title | Enhanced Signal Recovery via Sparsity Inducing Image Priors |
Authors | Hojjat Seyed Mousavi |
Abstract | Parsimony in signal representation is a topic of active research. Sparse signal processing and representation is the outcome of this line of research; it has many applications in information processing and has shown significant improvement in real-world applications such as recovery, classification, clustering, super resolution, etc. This broad influence of sparse signal processing on real-world problems creates a significant need for novel sparse signal representation algorithms that yield more robust systems. In such algorithms, a few open challenges remain in (a) efficiently imposing sparsity on signals in a way that captures the structure of the underlying signal, and (b) designing tractable algorithms that can recover signals under the aforementioned sparse models. |
Tasks | Super-Resolution |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.04828v1 |
http://arxiv.org/pdf/1805.04828v1.pdf | |
PWC | https://paperswithcode.com/paper/enhanced-signal-recovery-via-sparsity |
Repo | |
Framework | |
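As a point of reference for the recovery problem mentioned in the abstract, here is a standard ISTA sketch for l1-regularized recovery, min_x 0.5*||Ax - y||^2 + lam*||x||_1. This is a generic sparsity-inducing baseline, not the thesis's specific prior or algorithm; the demo dimensions and lam value are arbitrary.

```python
import numpy as np

def ista(A, y, lam=0.1, iters=200):
    """Iterative shrinkage-thresholding for l1-regularized least squares."""
    L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - y)                        # gradient of the data-fit term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100)
x_true[:5] = rng.normal(size=5)                      # 5-sparse ground truth
x_hat = ista(A, A @ x_true, lam=0.05)
```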
Learning to Teach with Dynamic Loss Functions
Title | Learning to Teach with Dynamic Loss Functions |
Authors | Lijun Wu, Fei Tian, Yingce Xia, Yang Fan, Tao Qin, Jianhuang Lai, Tie-Yan Liu |
Abstract | Teaching is critical to human society: it is through teaching that prospective students are educated and human civilization is inherited and advanced. A good teacher not only provides his/her students with qualified teaching materials (e.g., textbooks), but also sets up appropriate learning objectives (e.g., course projects and exams) considering the different situations of each student. When it comes to artificial intelligence, treating machine learning models as students, the loss functions that are optimized act as the counterpart of the learning objectives set by the teacher. In this work, we explore the possibility of imitating human teaching behaviors by dynamically and automatically outputting appropriate loss functions to train machine learning models. Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework the loss function of a machine learning model (the student) is defined by another machine learning model (the teacher). The ultimate goal of the teacher model is to cultivate the student to achieve better performance, as measured on a development dataset. Towards that end, similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions that will be used and optimized by its student model at different training stages. We develop an efficient learning method for the teacher model that makes gradient-based optimization possible, avoiding ineffective solutions such as policy optimization. We name our method “learning to teach with dynamic loss functions” (L2T-DLF for short). Extensive experiments on real-world tasks including image classification and neural machine translation demonstrate that our method significantly improves the quality of various student models. |
Tasks | Image Classification, Machine Translation |
Published | 2018-10-29 |
URL | http://arxiv.org/abs/1810.12081v1 |
http://arxiv.org/pdf/1810.12081v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-teach-with-dynamic-loss-functions |
Repo | |
Framework | |
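A hypothetical, heavily simplified sketch of the teacher/student interaction described above: a tiny teacher network maps a training-state feature (here just training progress) to per-class weights of a cross-entropy loss, which the student then optimizes. The gradient-based teacher update that is central to L2T-DLF is omitted, and all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim = 10, 32
student = nn.Linear(feat_dim, num_classes)
teacher = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, num_classes))
opt = torch.optim.SGD(student.parameters(), lr=0.1)

def student_step(x, y, progress):
    """One student update under the teacher's dynamically produced loss."""
    state = torch.tensor([[progress]], dtype=torch.float32)    # training-stage feature
    class_weights = F.softplus(teacher(state)).squeeze(0)      # positive per-class weights
    loss = F.cross_entropy(student(x), y, weight=class_weights.detach())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x = torch.randn(8, feat_dim)
y = torch.randint(0, num_classes, (8,))
print(student_step(x, y, progress=0.25))
```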
Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark
Title | Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark |
Authors | Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, Matei Zaharia |
Abstract | Researchers have proposed hardware, software, and algorithmic optimizations to improve the computational performance of deep learning. While some of these optimizations perform the same operations faster (e.g., increasing GPU clock speed), many others modify the semantics of the training procedure (e.g., reduced precision) and can impact the final model’s accuracy on unseen data. Due to a lack of standard evaluation criteria that consider these trade-offs, it is difficult to compare these optimizations directly. To address this problem, we recently introduced DAWNBench, a benchmark competition focused on end-to-end training time to achieve near-state-of-the-art accuracy on an unseen dataset—a combined metric called time-to-accuracy (TTA). In this work, we analyze the entries from DAWNBench, which received optimized submissions from multiple industrial groups, to investigate the behavior of TTA as a metric as well as trends in the best-performing entries. We show that TTA has a low coefficient of variation and that models optimized for TTA generalize nearly as well as those trained using standard methods. Additionally, even though DAWNBench entries were able to train ImageNet models in under 3 minutes, we find they still underutilize hardware capabilities such as Tensor Cores. Furthermore, we find that distributed entries can spend more than half of their time on communication. We show similar findings with entries to the MLPerf v0.5 benchmark. |
Tasks | |
Published | 2018-06-04 |
URL | https://arxiv.org/abs/1806.01427v2 |
https://arxiv.org/pdf/1806.01427v2.pdf | |
PWC | https://paperswithcode.com/paper/analysis-of-dawnbench-a-time-to-accuracy |
Repo | |
Framework | |
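The TTA metric and the coefficient-of-variation analysis mentioned in the abstract are simple enough to write down directly. A small sketch, assuming evaluation logs of (time, accuracy) pairs; the 0.93 default target is only a placeholder, not necessarily the benchmark's official threshold for a given task.

```python
import numpy as np

def time_to_accuracy(times_s, accuracies, target=0.93):
    """Wall-clock time of the first evaluation whose accuracy reaches the target."""
    for t, acc in zip(times_s, accuracies):
        if acc >= target:
            return t
    return float("inf")                              # target never reached

def coefficient_of_variation(tta_runs):
    """CV = std / mean over repeated runs; the paper reports this is low for TTA."""
    runs = np.asarray(tta_runs, dtype=float)
    return runs.std() / runs.mean()

times = [60, 120, 180, 240]
accs = [0.80, 0.90, 0.94, 0.95]
print(time_to_accuracy(times, accs))                 # -> 180
print(coefficient_of_variation([180, 175, 190]))
```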
Medical Knowledge Embedding Based on Recursive Neural Network for Multi-Disease Diagnosis
Title | Medical Knowledge Embedding Based on Recursive Neural Network for Multi-Disease Diagnosis |
Authors | Jingchi Jiang, Huanzheng Wang, Jing Xie, Xitong Guo, Yi Guan, Qiubin Yu |
Abstract | The representation of knowledge based on first-order logic captures the richness of natural language and supports multiple probabilistic inference models. Although symbolic representation enables quantitative reasoning with statistical probability, it is difficult to utilize with machine learning models, as they perform numerical operations. In contrast, knowledge embedding (i.e., high-dimensional and continuous vectors) is a feasible approach to complex reasoning that can not only retain the semantic information of knowledge but also establish quantifiable relationships among knowledge items. In this paper, we propose the recursive neural knowledge network (RNKN), which combines medical knowledge based on first-order logic with a recursive neural network for multi-disease diagnosis. After RNKN is efficiently trained from manually annotated Chinese Electronic Medical Records (CEMRs), diagnosis-oriented knowledge embeddings and weight matrices are learned. Experimental results verify that the diagnostic accuracy of RNKN is superior to that of several classical machine learning models and the Markov logic network (MLN). The results also demonstrate that the more explicit the evidence extracted from CEMRs, the better the performance achieved. RNKN gradually exhibits interpretable knowledge embeddings as the number of training epochs increases. |
Tasks | |
Published | 2018-09-22 |
URL | http://arxiv.org/abs/1809.08422v1 |
http://arxiv.org/pdf/1809.08422v1.pdf | |
PWC | https://paperswithcode.com/paper/medical-knowledge-embedding-based-on |
Repo | |
Framework | |
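A generic recursive-composition sketch to make the "recursive neural network over logical knowledge" idea concrete: each internal node of a clause tree combines its children's embeddings with a shared weight matrix, so a whole clause ends up as one vector. This is standard recursive-NN machinery, not RNKN's specific architecture, and the dimensions and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16
W = rng.normal(scale=0.1, size=(DIM, 2 * DIM))       # shared composition weights
b = np.zeros(DIM)

def compose(node):
    """node: either a leaf embedding (np.ndarray, e.g. a medical concept) or a
    (left, right) tuple representing a logical connective over two subtrees."""
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    h = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ h + b)

# Example: embed the clause (symptom AND sign); a diagnosis classifier could
# then be trained on such clause vectors.
symptom = rng.normal(size=DIM)
sign = rng.normal(size=DIM)
clause_vec = compose((symptom, sign))
```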
On the importance of single directions for generalization
Title | On the importance of single directions for generalization |
Authors | Ari S. Morcos, David G. T. Barrett, Neil C. Rabinowitz, Matthew Botvinick |
Abstract | Despite their ability to memorize large datasets, deep neural networks often achieve good generalization performance. However, the differences between the learned solutions of networks which generalize and those which do not remain unclear. Additionally, the tuning properties of single directions (defined as the activation of a single unit or some linear combination of units in response to some input) have been highlighted, but their importance has not been evaluated. Here, we connect these lines of inquiry to demonstrate that a network’s reliance on single directions is a good predictor of its generalization performance, across networks trained on datasets with different fractions of corrupted labels, across ensembles of networks trained on datasets with unmodified labels, across different hyperparameters, and over the course of training. While dropout only regularizes this quantity up to a point, batch normalization implicitly discourages single direction reliance, in part by decreasing the class selectivity of individual units. Finally, we find that class selectivity is a poor predictor of task importance, suggesting not only that networks which generalize well minimize their dependence on individual units by reducing their selectivity, but also that individually selective units may not be necessary for strong network performance. |
Tasks | |
Published | 2018-03-19 |
URL | http://arxiv.org/abs/1803.06959v4 |
http://arxiv.org/pdf/1803.06959v4.pdf | |
PWC | https://paperswithcode.com/paper/on-the-importance-of-single-directions-for |
Repo | |
Framework | |
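The two measurements driving the paper's analysis, single-direction ablation and class selectivity, are easy to sketch. The selectivity index below follows the spirit of the abstract (how strongly one unit prefers its most-driving class over the rest); exact normalization details may differ from the paper's, and the helper names are my own.

```python
import numpy as np

def class_selectivity(activations, labels):
    """(mu_max - mu_rest) / (mu_max + mu_rest) for one unit, where mu_max is the
    mean activation on the unit's most-driving class and mu_rest the mean over
    the other classes."""
    classes = np.unique(labels)
    means = np.array([activations[labels == c].mean() for c in classes])
    mu_max = means.max()
    mu_rest = np.delete(means, means.argmax()).mean()
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)

def ablate_unit(hidden, unit):
    """Single-direction ablation: clamp one unit's activation to zero before it
    feeds the next layer, then re-evaluate accuracy to measure reliance."""
    hidden = hidden.copy()
    hidden[:, unit] = 0.0
    return hidden
```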
Multiclass Universum SVM
Title | Multiclass Universum SVM |
Authors | Sauptik Dhar, Vladimir Cherkassky, Mohak Shah |
Abstract | We introduce Universum learning for multiclass problems and propose a novel formulation for the multiclass universum SVM (MU-SVM). We also propose an analytic span bound for model selection with almost 2-4x faster computation times than standard resampling techniques. We empirically demonstrate the efficacy of the proposed MU-SVM formulation on several real-world datasets, achieving > 20% improvement in test accuracies compared to multi-class SVM. |
Tasks | Model Selection |
Published | 2018-08-23 |
URL | http://arxiv.org/abs/1808.08111v1 |
http://arxiv.org/pdf/1808.08111v1.pdf | |
PWC | https://paperswithcode.com/paper/multiclass-universum-svm |
Repo | |
Framework | |
On Consensus-Optimality Trade-offs in Collaborative Deep Learning
Title | On Consensus-Optimality Trade-offs in Collaborative Deep Learning |
Authors | Zhanhong Jiang, Aditya Balu, Chinmay Hegde, Soumik Sarkar |
Abstract | In distributed machine learning, where agents collaboratively learn from diverse private data sets, there is a fundamental tension between consensus and optimality. In this paper, we build on recent algorithmic progress in distributed deep learning to explore various consensus-optimality trade-offs over a fixed communication topology. First, we propose the incremental consensus-based distributed SGD (i-CDSGD) algorithm, which involves multiple consensus steps (where each agent communicates information with its neighbors) within each SGD iteration. Second, we propose the generalized consensus-based distributed SGD (g-CDSGD) algorithm that enables us to navigate the full spectrum from complete consensus (all agents agree) to complete disagreement (each agent converges to individual model parameters). We analytically establish convergence of the proposed algorithms for strongly convex and nonconvex objective functions; we also analyze the momentum variants of the algorithms for the strongly convex case. We support our algorithms via numerical experiments, and demonstrate significant improvements over existing methods for collaborative deep learning. |
Tasks | |
Published | 2018-05-30 |
URL | http://arxiv.org/abs/1805.12120v1 |
http://arxiv.org/pdf/1805.12120v1.pdf | |
PWC | https://paperswithcode.com/paper/on-consensus-optimality-trade-offs-in |
Repo | |
Framework | |
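A compact sketch of the i-CDSGD update structure described in the abstract: each agent runs several consensus (gossip) rounds with its neighbors before applying its local gradient. This is an illustrative simplification under the assumption of a doubly stochastic mixing matrix W, not the paper's exact update or its g-CDSGD generalization.

```python
import numpy as np

def i_cdsgd_sketch(params, grads, W, lr=0.1, consensus_steps=3):
    """One incremental consensus-based distributed SGD step (sketch).

    params, grads: arrays of shape (N, d), one row per agent.
    W: (N, N) doubly stochastic mixing matrix encoding the topology.
    """
    x = np.asarray(params, dtype=float)
    for _ in range(consensus_steps):
        x = W @ x                                    # one consensus round with neighbors
    return x - lr * np.asarray(grads)                # local SGD step per agent

W = np.full((4, 4), 0.25)                            # fully connected, uniform averaging
params = np.random.randn(4, 3)
grads = np.random.randn(4, 3)
new_params = i_cdsgd_sketch(params, grads, W)
```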
Adaptive Learning Method of Recurrent Temporal Deep Belief Network to Analyze Time Series Data
Title | Adaptive Learning Method of Recurrent Temporal Deep Belief Network to Analyze Time Series Data |
Authors | Takumi Ichimura, Shin Kamada |
Abstract | Deep Learning uses a hierarchical network architecture to represent the complicated features of input patterns. Such an architecture is well known to offer higher learning capability than some conventional models, provided the best set of parameters in the optimal network structure is found. We have been developing an adaptive learning method that can discover the optimal network structure of a Deep Belief Network (DBN). The learning method can construct a network structure with the optimal number of hidden neurons in each Restricted Boltzmann Machine and with the optimal number of layers in the DBN during the learning phase. The network structure can be self-organized according to the given input patterns of a big data set. In this paper, we embed the adaptive learning method into the recurrent temporal RBM and the self-generated layer into the DBN. To verify the effectiveness of our proposed method, we present experimental results showing higher classification capability than conventional methods. |
Tasks | Time Series |
Published | 2018-07-11 |
URL | http://arxiv.org/abs/1807.03953v1 |
http://arxiv.org/pdf/1807.03953v1.pdf | |
PWC | https://paperswithcode.com/paper/adaptive-learning-method-of-recurrent |
Repo | |
Framework | |
FIRE-DES++: Enhanced Online Pruning of Base Classifiers for Dynamic Ensemble Selection
Title | FIRE-DES++: Enhanced Online Pruning of Base Classifiers for Dynamic Ensemble Selection |
Authors | Rafael M. O. Cruz, Dayvid V. R. Oliveira, George D. C. Cavalcanti, Robert Sabourin |
Abstract | Despite being very effective in several classification tasks, Dynamic Ensemble Selection (DES) techniques can select classifiers that classify all samples in the region of competence as being from the same class. The Frienemy Indecision REgion DES (FIRE-DES) tackles this problem by pre-selecting classifiers that correctly classify at least one pair of samples from different classes in the region of competence of the test sample. However, FIRE-DES applies the pre-selection for the classification of a test sample if and only if its region of competence is composed of samples from different classes (indecision region), even though this criterion is not reliable for determining whether a test sample is located close to the borders of classes (true indecision region) when the region of competence is obtained using the classical nearest neighbors approach. Because of that, FIRE-DES mistakes noisy regions for true indecision regions, leading to the pre-selection of incompetent classifiers, and mistakes true indecision regions for safe regions, leaving samples in such regions without any pre-selection. To tackle these issues, we propose FIRE-DES++, an enhanced FIRE-DES that removes noise and reduces class overlap in the validation set, and that defines the region of competence using an equal number of samples of each class, avoiding a region of competence composed of samples of a single class. Experiments are conducted using FIRE-DES++ with 8 different dynamic selection techniques on 64 classification datasets. Experimental results show that FIRE-DES++ increases the classification performance of all DES techniques considered in this work, outperforming FIRE-DES with 7 out of the 8 DES techniques, and outperforming state-of-the-art DES frameworks. |
Tasks | |
Published | 2018-10-01 |
URL | http://arxiv.org/abs/1810.00520v2 |
http://arxiv.org/pdf/1810.00520v2.pdf | |
PWC | https://paperswithcode.com/paper/fire-des-enhanced-online-pruning-of-base |
Repo | |
Framework | |
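Two of the mechanisms named in the abstract (a class-balanced region of competence and the "frienemy" pre-selection) can be sketched directly. The helper names, distance choice, and fallback behavior are my own assumptions; the full method also removes noise and reduces class overlap in the validation set beforehand.

```python
import numpy as np

def balanced_region_of_competence(x_query, X_val, y_val, k_per_class=3):
    """Take the k nearest validation samples of *each* class (plain Euclidean
    distance), so the region never contains a single class only."""
    idx = []
    for c in np.unique(y_val):
        cls_idx = np.where(y_val == c)[0]
        d = np.linalg.norm(X_val[cls_idx] - x_query, axis=1)
        idx.extend(cls_idx[np.argsort(d)[:k_per_class]])
    return np.array(idx)

def pre_select(classifiers, X_roc, y_roc):
    """Keep classifiers that correctly classify at least one pair of
    region-of-competence samples from different classes."""
    kept = []
    for clf in classifiers:
        correct = clf.predict(X_roc) == y_roc
        if len(set(y_roc[correct])) >= 2:            # correct on two distinct classes
            kept.append(clf)
    return kept or list(classifiers)                 # fall back to all if none qualify
```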
Joint Embedding of Meta-Path and Meta-Graph for Heterogeneous Information Networks
Title | Joint Embedding of Meta-Path and Meta-Graph for Heterogeneous Information Networks |
Authors | Lichao Sun, Lifang He, Zhipeng Huang, Bokai Cao, Congying Xia, Xiaokai Wei, Philip S. Yu |
Abstract | The meta-graph is currently the most powerful tool for similarity search on heterogeneous information networks, where a meta-graph is a composition of meta-paths that captures complex structural information. However, current relevance computation based on meta-graphs only considers the complex structural information but ignores the information in the embedded meta-paths. To address this problem, we propose MEta-GrAph-based network embedding models, called MEGA and MEGA++, respectively. The MEGA model uses normalized relevance or similarity measures that are derived from a meta-graph and its embedded meta-paths between nodes simultaneously, and then leverages a tensor decomposition method to perform node embedding. MEGA++ further uses a coupled tensor-matrix decomposition method to obtain a joint embedding for nodes, which simultaneously considers the hidden relations of all meta information of a meta-graph. Extensive experiments on two real datasets demonstrate that MEGA and MEGA++ are more effective than state-of-the-art approaches. |
Tasks | Network Embedding |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04110v1 |
http://arxiv.org/pdf/1809.04110v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-embedding-of-meta-path-and-meta-graph |
Repo | |
Framework | |
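A hypothetical simplification of the embedding step described above: stack the node-similarity matrices induced by a meta-graph and its embedded meta-paths into a 3-way array, then factor its mode-1 unfolding to obtain node embeddings. The paper uses proper (coupled) tensor decompositions; the truncated SVD here is only a stand-in, and the function name and dimensions are illustrative.

```python
import numpy as np

def mega_like_embedding(similarity_mats, dim=16):
    """Embed nodes from several (n_nodes, n_nodes) similarity views (sketch)."""
    T = np.stack(similarity_mats, axis=-1)           # (n_nodes, n_nodes, n_views)
    n = T.shape[0]
    unfolded = T.reshape(n, -1)                      # mode-1 unfolding
    U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :dim] * s[:dim]                      # one embedding row per node

# Example with two random symmetric "meta-path similarity" views over 50 nodes.
rng = np.random.default_rng(0)
views = [(lambda a: a @ a.T)(rng.random((50, 8))) for _ in range(2)]
emb = mega_like_embedding(views, dim=8)
```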