Paper Group AWR 181
Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval. Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks. Conditional Adversarial Domain Adaptation. Stochastic Variational Video Prediction. Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models. Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger. Scene Graph Generation from Objects, Phrases and Region Captions. Learning Deep and Compact Models for Gesture Recognition. ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17). Automated Conjecturing VII: The Graph Brain Project & Big Mathematics. ChainerMN: Scalable Distributed Deep Learning Framework. FacePoseNet: Making a Case for Landmark-Free Face Alignment. CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis. Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction.
Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval
Title | Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval |
Authors | Jian Xu, Chunheng Wang, Chengzuo Qi, Cunzhao Shi, Baihua Xiao |
Abstract | Existing manifold learning methods are not appropriate for the image retrieval task, because most of them cannot process the query image and they incur substantial additional computational cost, especially for large-scale databases. We therefore propose the iterative manifold embedding (IME) layer, whose weights are learned off-line by an unsupervised strategy, to explore the intrinsic manifolds of incomplete data. On a large-scale database containing 27,000 images, the IME layer is more than 120 times faster than other manifold learning methods at embedding the original representations at query time. In the off-line learning stage, we iteratively embed the original descriptors of database images, which lie on a manifold in a high-dimensional space, into manifold-based representations to generate the IME representations. From the original descriptors and the IME representations of the database images, we estimate the weights of the IME layer by ridge regression. In the on-line retrieval stage, we employ the IME layer to map the original representation of the query image with negligible time cost (2 milliseconds). We experiment on five public standard datasets for image retrieval. The proposed IME layer significantly outperforms related dimensionality reduction and manifold learning methods. Without post-processing, our IME layer surpasses state-of-the-art image retrieval methods that use post-processing on most datasets, at lower computational cost. |
Tasks | Dimensionality Reduction, Image Retrieval |
Published | 2017-07-14 |
URL | http://arxiv.org/abs/1707.09862v2 |
http://arxiv.org/pdf/1707.09862v2.pdf | |
PWC | https://paperswithcode.com/paper/iterative-manifold-embedding-layer-learned-by |
Repo | https://github.com/XJhaoren/IME_layer |
Framework | none |
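A rough sketch of the two stages described in the abstract above: the off-line ridge regression from original descriptors to precomputed IME representations, and the on-line query mapping, which reduces to a single matrix product. Shapes, variable names, and the regularization value are illustrative assumptions, not the authors' code.

```python
import numpy as np

def fit_ime_layer(X, Y, lam=10.0):
    """Off-line stage: estimate IME layer weights by ridge regression.

    X : (n, d) original descriptors of database images
    Y : (n, k) precomputed manifold-based (IME) representations
    Returns W : (d, k) such that X @ W approximates Y.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def embed_query(q, W):
    """On-line stage: embed a query descriptor with one matrix product."""
    return q @ W

# Toy usage with random data standing in for image descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))
Y = rng.normal(size=(1000, 256))
W = fit_ime_layer(X, Y)
q_emb = embed_query(rng.normal(size=512), W)
```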
Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks
Title | Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks |
Authors | Marco Schreyer, Timur Sattarov, Damian Borth, Andreas Dengel, Bernd Reimer |
Abstract | Learning to detect fraud in large-scale accounting data is one of the long-standing challenges in financial statement audits and fraud investigations. Nowadays, the majority of applied techniques rely on handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios, and fraudsters gradually find ways to circumvent them. To overcome this disadvantage, and inspired by the recent success of deep learning, we propose the application of deep autoencoder neural networks to detect anomalous journal entries. We demonstrate that the trained network’s reconstruction error for a journal entry, regularized by the entry’s individual attribute probabilities, can be interpreted as a highly adaptive anomaly assessment. Experiments on two real-world datasets of journal entries show the effectiveness of the approach, yielding high F1-scores of 32.93 (dataset A) and 16.95 (dataset B) and fewer false-positive alerts compared to state-of-the-art baseline methods. Initial feedback received from chartered accountants and fraud examiners underpinned the quality of the approach in capturing highly relevant accounting anomalies. |
Tasks | |
Published | 2017-09-15 |
URL | http://arxiv.org/abs/1709.05254v2 |
http://arxiv.org/pdf/1709.05254v2.pdf | |
PWC | https://paperswithcode.com/paper/detection-of-anomalies-in-large-scale |
Repo | https://github.com/koenvandevelde/fd-autoencoder |
Framework | tf |
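To make the anomaly-scoring idea concrete, here is a minimal sketch: an autoencoder over one-hot encoded journal-entry attributes, with the per-entry reconstruction error used as the anomaly score. The paper additionally regularizes the score by individual attribute probabilities, which is omitted here; the listed repository uses TensorFlow, while this illustrative sketch uses PyTorch, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class JournalAutoencoder(nn.Module):
    """Minimal autoencoder over one-hot encoded journal-entry attributes."""
    def __init__(self, n_features, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, bottleneck), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 64), nn.ReLU(),
            nn.Linear(64, n_features), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model, x):
    """Per-entry reconstruction error; high scores flag candidate anomalies."""
    with torch.no_grad():
        return ((x - model(x)) ** 2).mean(dim=1)

# Toy usage: 200 entries, 40 one-hot attribute columns (untrained model).
x = torch.randint(0, 2, (200, 40)).float()
scores = anomaly_scores(JournalAutoencoder(40), x)
flagged = torch.topk(scores, k=5).indices  # most anomalous entries
```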
Conditional Adversarial Domain Adaptation
Title | Conditional Adversarial Domain Adaptation |
Authors | Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan |
Abstract | Adversarial learning has been embedded into deep networks to learn disentangled and transferable representations for domain adaptation. Existing adversarial domain adaptation methods may not effectively align the different domains of the multimodal distributions native to classification problems. In this paper, we present conditional adversarial domain adaptation, a principled framework that conditions the adversarial adaptation models on discriminative information conveyed in the classifier predictions. Conditional domain adversarial networks (CDANs) are designed with two novel conditioning strategies: multilinear conditioning, which captures the cross-covariance between feature representations and classifier predictions to improve discriminability, and entropy conditioning, which controls the uncertainty of classifier predictions to guarantee transferability. With theoretical guarantees and a few lines of code, the approach has exceeded state-of-the-art results on five datasets. |
Tasks | Domain Adaptation |
Published | 2017-05-26 |
URL | http://arxiv.org/abs/1705.10667v4 |
http://arxiv.org/pdf/1705.10667v4.pdf | |
PWC | https://paperswithcode.com/paper/conditional-adversarial-domain-adaptation |
Repo | https://github.com/thuml/CDAN |
Framework | pytorch |
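The two conditioning strategies in the abstract can be sketched in a few lines: multilinear conditioning feeds the flattened outer product of features and classifier predictions to the domain discriminator, and entropy conditioning up-weights examples the classifier is certain about. A sketch under assumed dimensions; the discriminator architecture here is a placeholder.

```python
import torch
import torch.nn as nn

def multilinear_map(features, predictions):
    """Flattened outer product f ⊗ g: the joint conditioning input for D."""
    b = features.size(0)
    return torch.bmm(predictions.unsqueeze(2), features.unsqueeze(1)).view(b, -1)

def entropy_weights(predictions, eps=1e-8):
    """Entropy conditioning: low-entropy (certain) examples get larger weight."""
    h = -(predictions * (predictions + eps).log()).sum(dim=1)
    return 1.0 + torch.exp(-h)

# Toy usage: 256-d features, 10-way predictions, a stand-in discriminator.
f = torch.randn(32, 256)
g = torch.softmax(torch.randn(32, 10), dim=1)
disc = nn.Sequential(nn.Linear(256 * 10, 128), nn.ReLU(), nn.Linear(128, 1))
d_out = disc(multilinear_map(f, g))
w = entropy_weights(g)  # per-example weights for the adversarial loss
```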
Stochastic Variational Video Prediction
Title | Stochastic Variational Video Prediction |
Authors | Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine |
Abstract | Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images require the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication. |
Tasks | Video Prediction |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.11252v2 |
http://arxiv.org/pdf/1710.11252v2.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-variational-video-prediction |
Repo | https://github.com/StanfordVL/roboturk_real_dataset |
Framework | tf |
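The core mechanism, predicting a different future per latent sample, comes down to the reparameterization trick plus a KL-regularized reconstruction objective. A minimal sketch, assuming an encoder that outputs (mu, logvar) and a frame decoder conditioned on z; the weighting beta and tensor shapes are illustrative, not SV2P's exact training schedule.

```python
import torch
import torch.nn.functional as F

def sample_latent(mu, logvar):
    """Reparameterization trick: each sampled z yields one possible future."""
    return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

def sv2p_style_loss(pred_frames, true_frames, mu, logvar, beta=1e-3):
    """ELBO-style objective: frame reconstruction plus weighted KL to the prior."""
    recon = F.mse_loss(pred_frames, true_frames)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return recon + beta * kl

# Toy usage: 8-d latent; 3x64x64 frames standing in for decoder output.
mu, logvar = torch.zeros(4, 8), torch.zeros(4, 8)
z = sample_latent(mu, logvar)       # resampling z gives a different prediction
pred = torch.rand(4, 3, 64, 64)     # placeholder for decoder(z, context frames)
loss = sv2p_style_loss(pred, torch.rand(4, 3, 64, 64), mu, logvar)
```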
Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models
Title | Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models |
Authors | Jing Zhang, Ioannis Ch. Paschalidis |
Abstract | Under Markovian assumptions, we leverage a Central Limit Theorem (CLT) for the empirical measure in the test statistic of the composite hypothesis Hoeffding test so as to establish weak convergence results for the test statistic and, thereby, derive a new estimator for the threshold needed by the test. We first show the advantages of our estimator over an existing estimator by conducting extensive numerical experiments. We find that our estimator controls false alarms better while maintaining satisfactory detection probabilities. We then apply the Hoeffding test with our threshold estimator to detecting anomalies in two distinct application domains: one in communication networks and the other in transportation networks. The former application seeks to enhance cyber security and the latter aims at building smarter transportation systems in cities. |
Tasks | Anomaly Detection |
Published | 2017-02-27 |
URL | http://arxiv.org/abs/1702.08435v3 |
http://arxiv.org/pdf/1702.08435v3.pdf | |
PWC | https://paperswithcode.com/paper/statistical-anomaly-detection-via-composite |
Repo | https://github.com/jingzbu/ROCHM |
Framework | none |
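The Hoeffding test statistic is the scaled KL divergence between the empirical measure of an observed window and the nominal distribution, compared against a threshold. The sketch below uses an i.i.d. simplification: the paper works with empirical measures of Markov transition pairs, and its contribution is estimating the threshold from CLT-based weak convergence rather than the fixed placeholder assumed here.

```python
import numpy as np

def empirical_pmf(seq, n_symbols):
    counts = np.bincount(seq, minlength=n_symbols)
    return counts / counts.sum()

def hoeffding_statistic(seq, ref_pmf):
    """n * KL(empirical || reference); flag an anomaly if above the threshold."""
    p_hat = empirical_pmf(seq, len(ref_pmf))
    mask = p_hat > 0
    return len(seq) * np.sum(p_hat[mask] * np.log(p_hat[mask] / ref_pmf[mask]))

# Toy usage: nominal distribution vs. one observed window.
rng = np.random.default_rng(1)
ref = np.array([0.5, 0.3, 0.2])
window = rng.choice(3, size=500, p=[0.4, 0.3, 0.3])
threshold = 7.0  # placeholder; the paper derives this from the CLT
is_anomalous = hoeffding_statistic(window, ref) > threshold
```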
Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger
Title | Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger |
Authors | Vahid Behzadan, Arslan Munir |
Abstract | Recent developments have established the vulnerability of deep Reinforcement Learning (RL) to policy manipulation attacks via adversarial perturbations. In this paper, we investigate the robustness and resilience of deep RL to training-time and test-time attacks. Through experimental results, we demonstrate that under noncontiguous training-time attacks, Deep Q-Network (DQN) agents can recover and adapt to the adversarial conditions by reactively adjusting the policy. Our results also show that policies learned under adversarial perturbations are more robust to test-time attacks. Furthermore, we compare the performance of $\epsilon$-greedy and parameter-space noise exploration methods in terms of robustness and resilience against adversarial perturbations. |
Tasks | |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.09344v1 |
http://arxiv.org/pdf/1712.09344v1.pdf | |
PWC | https://paperswithcode.com/paper/whatever-does-not-kill-deep-reinforcement |
Repo | https://github.com/behzadanksu/rlattack-dev |
Framework | tf |
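For concreteness, here is a generic FGSM-style state perturbation of the kind studied in this line of work: the adversary nudges the observation to reduce the value the Q-network assigns to its greedy action. This is a sketch of one common attack, not the exact procedure from the paper's repository; the network and epsilon are stand-ins.

```python
import torch
import torch.nn as nn

def fgsm_state_perturbation(q_net, state, epsilon=0.01):
    """Perturb the observation to suppress the greedy action's Q-value."""
    state = state.clone().requires_grad_(True)
    greedy_value = q_net(state).max(dim=1).values.sum()
    greedy_value.backward()
    # Step against the gradient to *reduce* the greedy action's value.
    return (state - epsilon * state.grad.sign()).detach()

# Toy usage: a stand-in Q-network over 4-d states with 2 actions.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
s = torch.randn(1, 4)
s_adv = fgsm_state_perturbation(q_net, s)
```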
Scene Graph Generation from Objects, Phrases and Region Captions
Title | Scene Graph Generation from Objects, Phrases and Region Captions |
Authors | Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang |
Abstract | Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationships predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed Multi-level Scene Description Network (MSDN), to solve the three vision tasks jointly in an end-to-end manner. Objects, phrases, and caption regions are first aligned with a dynamic graph based on their spatial and semantic connections. Then a feature refining structure is used to pass messages across the three levels of semantic tasks through the graph. We benchmark the learned model on three tasks and show that joint learning across the three tasks with our proposed method brings mutual improvements over previous models. In particular, on the scene graph generation task, our proposed method outperforms the state-of-the-art method by a margin of more than 3%. |
Tasks | Graph Generation, Object Detection, Scene Graph Generation, Scene Understanding |
Published | 2017-07-31 |
URL | http://arxiv.org/abs/1707.09700v2 |
http://arxiv.org/pdf/1707.09700v2.pdf | |
PWC | https://paperswithcode.com/paper/scene-graph-generation-from-objects-phrases |
Repo | https://github.com/yikang-li/MSDN |
Framework | pytorch |
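A minimal sketch of the message-passing idea: once objects and phrases are aligned by a dynamic graph, each level receives aggregated, gated messages from the other. The alignment matrix, gating choice, and dimensions are assumptions for illustration; MSDN's actual refining structure spans three semantic levels and is more elaborate.

```python
import torch
import torch.nn as nn

class LevelMessagePassing(nn.Module):
    """One refinement step between the object and phrase levels."""
    def __init__(self, dim):
        super().__init__()
        self.to_obj = nn.Linear(dim, dim)
        self.to_phr = nn.Linear(dim, dim)

    def forward(self, obj_feats, phr_feats, align):
        # align[i, j] = 1 if object i participates in phrase j (dynamic graph).
        n_obj = align / align.sum(dim=1, keepdim=True).clamp(min=1)
        n_phr = (align / align.sum(dim=0, keepdim=True).clamp(min=1)).t()
        obj_msg = torch.tanh(self.to_obj(n_obj @ phr_feats))   # phrases -> objects
        phr_msg = torch.tanh(self.to_phr(n_phr @ obj_feats))   # objects -> phrases
        return obj_feats + obj_msg, phr_feats + phr_msg

# Toy usage: 5 objects, 3 phrases, 128-d features.
mp = LevelMessagePassing(128)
align = torch.bernoulli(torch.full((5, 3), 0.4))
obj2, phr2 = mp(torch.randn(5, 128), torch.randn(3, 128), align)
```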
Learning Deep and Compact Models for Gesture Recognition
Title | Learning Deep and Compact Models for Gesture Recognition |
Authors | Koustav Mullick, Anoop M. Namboodiri |
Abstract | We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. Towards this, we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capture the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset, with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, less than one hundredth of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices. |
Tasks | Gesture Recognition, Model Compression |
Published | 2017-12-29 |
URL | http://arxiv.org/abs/1712.10136v1 |
http://arxiv.org/pdf/1712.10136v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-deep-and-compact-models-for-gesture |
Repo | https://github.com/chriswegmann/drone_steering |
Framework | none |
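The knowledge-distillation step mentioned in the abstract typically blends a temperature-softened teacher-matching term with the usual hard-label loss. A minimal sketch with assumed temperature and mixing weight; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target KD: KL to the teacher's softened outputs plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: 20 gesture classes; logits stand in for model outputs.
s = torch.randn(16, 20)   # compact student (e.g., the small 3DCNN-LSTM)
t = torch.randn(16, 20)   # large teacher
y = torch.randint(0, 20, (16,))
loss = distillation_loss(s, t, y)
```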
ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Title | ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17) |
Authors | Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai |
Abstract | Chinese is the most widely used language in the world. Algorithms that read Chinese text in natural images facilitate applications of various kinds. Despite the large potential value, datasets and competitions in the past have primarily focused on English, which bears very different characteristics from Chinese. This report introduces RCTW, a new competition that focuses on Chinese text reading. The competition features a large-scale dataset with 12,263 annotated images. Two tasks, namely text localization and end-to-end recognition, are set up. The competition took place from January 20 to May 31, 2017, and received 23 valid submissions from 19 teams. This report includes the dataset description, task definitions, evaluation protocols, and results summaries and analysis. Through this competition, we call for more future research on the Chinese text reading problem. The official website for the competition is http://rctw.vlrlab.net |
Tasks | |
Published | 2017-08-31 |
URL | http://arxiv.org/abs/1708.09585v3 |
http://arxiv.org/pdf/1708.09585v3.pdf | |
PWC | https://paperswithcode.com/paper/icdar2017-competition-on-reading-chinese-text |
Repo | https://github.com/OzHsu23/chineseocr |
Framework | tf |
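Text localization in such competitions is typically scored by IoU-based matching of detections to ground truth. The sketch below uses axis-aligned boxes for brevity; RCTW annotates quadrilaterals, so this is an illustrative simplification, not the official evaluation protocol.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def count_true_positives(preds, gts, thresh=0.5):
    """Greedy one-to-one matching; feeds precision/recall/F-score."""
    used, tp = set(), 0
    for p in preds:
        for j, g in enumerate(gts):
            if j not in used and box_iou(p, g) >= thresh:
                used.add(j)
                tp += 1
                break
    return tp

tp = count_true_positives([(0, 0, 10, 10)], [(1, 1, 9, 11)])  # -> 1
```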
Automated Conjecturing VII: The Graph Brain Project & Big Mathematics
Title | Automated Conjecturing VII: The Graph Brain Project & Big Mathematics |
Authors | N. Bushaw, C. E. Larson, N. Van Cleemput |
Abstract | The Graph Brain Project is an experiment in how the use of automated mathematical discovery software, databases, large-scale collaboration, and systematic investigation provides a model for how mathematical research might proceed in the future. Our project began with the development of a program that can be used to generate invariant-relation and property-relation conjectures in many areas of mathematics. This program can produce conjectures which are not implied by existing (published) theorems. Here we propose a new approach to push forward existing mathematical research goals: using automated mathematical discovery software. We suggest how to initiate and harness large-scale collaborative mathematics. We envision mathematical research labs similar to those that exist in other sciences, new avenues for funding, new opportunities for training students, and a more efficient and effective use of published mathematical research. And our experiment in graph theory can be imitated in many other areas of mathematical research. Big Mathematics is the idea of large, systematic, collaborative research on problems of existing mathematical interest. What is possible when we put our skills, tools, and results together systematically? |
Tasks | |
Published | 2017-12-28 |
URL | http://arxiv.org/abs/1801.01814v1 |
http://arxiv.org/pdf/1801.01814v1.pdf | |
PWC | https://paperswithcode.com/paper/automated-conjecturing-vii-the-graph-brain |
Repo | https://github.com/math1um/objects-invariants-properties |
Framework | none |
ChainerMN: Scalable Distributed Deep Learning Framework
Title | ChainerMN: Scalable Distributed Deep Learning Framework |
Authors | Takuya Akiba, Keisuke Fukuda, Shuji Suzuki |
Abstract | One of the keys to deep learning’s breakthroughs in various fields has been the use of high computing power, centered around GPUs. Enabling the use of even greater computing capability through distributed processing is essential not only to make deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model on the ImageNet dataset up to 128 GPUs with a parallel efficiency of 90%. |
Tasks | |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1710.11351v1 |
http://arxiv.org/pdf/1710.11351v1.pdf | |
PWC | https://paperswithcode.com/paper/chainermn-scalable-distributed-deep-learning |
Repo | https://github.com/chainer/chainermn |
Framework | none |
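The heart of data-parallel distributed training as implemented by frameworks like ChainerMN is an allreduce that averages gradients across workers every step. A bare-bones sketch of that idea in plain mpi4py; ChainerMN itself uses optimized MPI/NCCL collectives and integrates with Chainer's optimizer, none of which is shown here.

```python
import numpy as np
from mpi4py import MPI

def allreduce_gradients(grads, comm):
    """Average gradient arrays across all workers (data-parallel SGD step)."""
    averaged = []
    for g in grads:
        out = np.empty_like(g)
        comm.Allreduce(g, out, op=MPI.SUM)
        averaged.append(out / comm.Get_size())
    return averaged

# Each rank computes gradients on its own data shard, then synchronizes.
comm = MPI.COMM_WORLD
local_grads = [np.random.rand(10, 10), np.random.rand(10)]
synced = allreduce_gradients(local_grads, comm)
# Run with: mpiexec -n 4 python this_script.py
```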
FacePoseNet: Making a Case for Landmark-Free Face Alignment
Title | FacePoseNet: Making a Case for Landmark-Free Face Alignment |
Authors | Fengju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni |
Abstract | We show how a simple convolutional neural network (CNN) can be trained to accurately and robustly regress 6 degrees of freedom (6DoF) 3D head pose, directly from image intensities. We further explain how this FacePoseNet (FPN) can be used to align faces in 2D and 3D as an alternative to explicit facial landmark detection for these tasks. We claim that in many cases the standard means of measuring landmark detector accuracy can be misleading when comparing different face alignments. Instead, we compare our FPN with existing methods by evaluating how they affect face recognition accuracy on the IJB-A and IJB-B benchmarks: using the same recognition pipeline, but varying the face alignment method. Our results show that (a) better landmark detection accuracy measured on the 300W benchmark does not necessarily imply better face recognition accuracy, (b) our FPN provides superior 2D and 3D face alignment on both benchmarks, and (c) FPN aligns faces at a small fraction of the computational cost of comparably accurate landmark detectors. For many purposes, FPN is thus a far faster and far more accurate face alignment method than using facial landmark detectors. |
Tasks | Face Alignment, Face Identification, Face Recognition, Face Verification, Facial Landmark Detection |
Published | 2017-08-24 |
URL | http://arxiv.org/abs/1708.07517v2 |
http://arxiv.org/pdf/1708.07517v2.pdf | |
PWC | https://paperswithcode.com/paper/faceposenet-making-a-case-for-landmark-free |
Repo | https://github.com/fengju514/Expression-Net |
Framework | tf |
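The core regression described in the abstract, image in and six pose parameters out, can be sketched with a tiny CNN trunk and a linear head. The architecture below is a deliberately shallow stand-in, not FPN's actual backbone; the 6DoF output (rotation plus translation) then parameterizes the 2D/3D alignment transform.

```python
import torch
import torch.nn as nn

class PoseNetSketch(nn.Module):
    """Toy CNN regressing 6DoF head pose (3 rotation, 3 translation) from pixels."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(32, 6)  # (rx, ry, rz, tx, ty, tz)

    def forward(self, img):
        return self.head(self.trunk(img))

# Toy usage: the predicted pose drives alignment instead of detected landmarks.
pose = PoseNetSketch()(torch.rand(1, 3, 224, 224))  # shape (1, 6)
```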
CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis
Title | CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis |
Authors | Adam Summerville, Joseph Osborn, Michael Mateas |
Abstract | We propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions which trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1) as a cost function for data segmentation and model selection to penalize over-fitting and (2) to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex videogame character, CHARDA successfully discovers a reasonable over-approximation of the character’s true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata. |
Tasks | Model Selection |
Published | 2017-07-11 |
URL | http://arxiv.org/abs/1707.03336v1 |
http://arxiv.org/pdf/1707.03336v1.pdf | |
PWC | https://paperswithcode.com/paper/charda-causal-hybrid-automata-recovery-via |
Repo | https://github.com/JoeOsborn/mechlearn |
Framework | none |
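CHARDA's use of an information-theoretic cost for segmentation and model selection can be illustrated with a BIC-scored single change point: each candidate split fits one mode template per side, and the split minimizing total BIC wins. A one-split sketch with a linear template; the actual system searches over many modes, richer template classes, and also scores guard causes.

```python
import numpy as np

def bic_linear(x, y):
    """Fit one linear mode template by least squares and return its BIC."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, res, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = res[0] if res.size else float(np.sum((y - A @ coef) ** 2))
    n, k = len(x), A.shape[1]
    return n * np.log(rss / n + 1e-12) + k * np.log(n)

def best_split(x, y, min_len=10):
    """Choose the change point minimizing total BIC over both segments."""
    cands = range(min_len, len(x) - min_len)
    costs = [bic_linear(x[:i], y[:i]) + bic_linear(x[i:], y[i:]) for i in cands]
    return list(cands)[int(np.argmin(costs))]

# Toy trace with a mode change at t = 50 (e.g., a character switching dynamics).
t = np.arange(100, dtype=float)
y = np.where(t < 50, 2.0 * t, 100.0 - 0.5 * (t - 50))
split = best_split(t, y)  # lands near 50
```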
Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon
Title | Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon |
Authors | Xin Dong, Shangyu Chen, Sinno Jialin Pan |
Abstract | How to develop slim and accurate deep neural networks has become crucial for real-world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most existing methods either fail to significantly compress a well-trained deep network or require a heavy retraining process for the pruned network to re-boost its prediction performance. In this paper, we propose a new layer-wise pruning method for deep neural networks. In our proposed method, the parameters of each individual layer are pruned independently based on the second-order derivatives of a layer-wise error function with respect to the corresponding parameters. We prove that the final drop in prediction performance after pruning is bounded by a linear combination of the reconstruction errors introduced at each layer. Therefore, there is a guarantee that one only needs to perform a light retraining process on the pruned network to restore its original prediction performance. We conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our pruning method compared with several state-of-the-art baseline methods. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07565v2 |
http://arxiv.org/pdf/1705.07565v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-prune-deep-neural-networks-via |
Repo | https://github.com/csyhhu/L-OBS |
Framework | pytorch |
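The layer-wise surgery follows the classic OBS recipe: rank weights by the saliency $w_q^2 / (2[H^{-1}]_{qq})$ computed from the Hessian of the layer-wise error, zero the least salient weight, and compensate the remaining weights in closed form. A single-weight numpy sketch under an assumed least-squares layer error; the paper's method handles full layers and bounds the accumulated error.

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """Zero the lowest-saliency weight and apply the closed-form compensation."""
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))
    w_new = w - (w[q] / H_inv[q, q]) * H_inv[:, q]  # surgery on the rest
    w_new[q] = 0.0                                  # exact zero after update
    return w_new, q

# Toy layer: Hessian of a least-squares layer error from inputs X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
H = X.T @ X / len(X) + 1e-3 * np.eye(8)  # damped so the inverse exists
w = rng.normal(size=8)
w_pruned, idx = obs_prune_one(w, np.linalg.inv(H))
```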
Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
Title | Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction |
Authors | Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang |
Abstract | CTR prediction in real-world business is a difficult machine learning problem involving large-scale nonlinear sparse data. In this paper, we introduce an industrial-strength solution with a model named the Large Scale Piece-wise Linear Model (LS-PLM). We formulate the learning problem with $L_1$ and $L_{2,1}$ regularizers, leading to a non-convex and non-smooth optimization problem. We then propose a novel algorithm, based on directional derivatives and the quasi-Newton method, to solve it efficiently. In addition, we design a distributed system which can run on hundreds of machines in parallel and provides industrial scalability. The LS-PLM model can capture nonlinear patterns from massive sparse data, saving us from heavy feature engineering work. Since 2012, LS-PLM has been the main CTR prediction model in Alibaba’s online display advertising system, serving hundreds of millions of users every day. |
Tasks | Click-Through Rate Prediction, Feature Engineering |
Published | 2017-04-18 |
URL | http://arxiv.org/abs/1704.05194v1 |
http://arxiv.org/pdf/1704.05194v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-piece-wise-linear-models-from-large |
Repo | https://github.com/shenweichen/DeepCTR-PyTorch |
Framework | pytorch |
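The LS-PLM prediction itself is compact enough to sketch: a softmax over $m$ "dividing" vectors routes each input among regions, and per-region sigmoid predictions are mixed accordingly, $p(y{=}1 \mid x) = \sum_i \mathrm{softmax}(Ux)_i\, \sigma(w_i^\top x)$. Parameter shapes and values below are illustrative; learning them under the $L_1$/$L_{2,1}$ regularizers is the hard part the paper addresses.

```python
import numpy as np

def ls_plm_predict(x, U, W):
    """Piece-wise linear CTR model: gated mixture of per-region sigmoids.

    U, W : (m, d) dividing and fitting parameters for m pieces.
    """
    gate_logits = U @ x
    gates = np.exp(gate_logits - gate_logits.max())  # stable softmax
    gates /= gates.sum()
    preds = 1.0 / (1.0 + np.exp(-(W @ x)))
    return float(gates @ preds)

# Toy usage: 12 regions over 30 features.
rng = np.random.default_rng(0)
U, W = rng.normal(size=(12, 30)), rng.normal(size=(12, 30))
ctr = ls_plm_predict(rng.normal(size=30), U, W)  # probability in [0, 1]
```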