July 29, 2019


Paper Group AWR 181

Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval. Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks. Conditional Adversarial Domain Adaptation. Stochastic Variational Video Prediction. Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models. Wh …

Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval


Title	Iterative Manifold Embedding Layer Learned by Incomplete Data for Large-scale Image Retrieval
Authors	Jian Xu, Chunheng Wang, Chengzuo Qi, Cunzhao Shi, Baihua Xiao
Abstract	Existing manifold learning methods are not appropriate for the image retrieval task, because most of them are unable to process a query image and they incur considerable additional computational cost, especially for large-scale databases. Therefore, we propose the iterative manifold embedding (IME) layer, whose weights are learned off-line by an unsupervised strategy, to explore the intrinsic manifolds from incomplete data. On a large-scale database that contains 27000 images, the IME layer is more than 120 times faster than other manifold learning methods at embedding the original representations at query time. In the off-line learning stage, we iteratively embed the original descriptors of database images, which lie on a manifold in a high-dimensional space, into manifold-based representations to generate the IME representations. From the original descriptors and the IME representations of database images, we estimate the weights of the IME layer by ridge regression. In the on-line retrieval stage, we employ the IME layer to map the original representation of the query image at negligible time cost (2 milliseconds). We experiment on five public standard datasets for image retrieval. The proposed IME layer significantly outperforms related dimension reduction methods and manifold learning methods. Without post-processing, our IME layer improves on state-of-the-art image retrieval methods that use post-processing on most datasets, and needs less computational cost.
Tasks	Dimensionality Reduction, Image Retrieval
Published	2017-07-14
URL	http://arxiv.org/abs/1707.09862v2
PDF	http://arxiv.org/pdf/1707.09862v2.pdf
PWC	https://paperswithcode.com/paper/iterative-manifold-embedding-layer-learned-by
Repo	https://github.com/XJhaoren/IME_layer
Framework	none
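
The abstract states that the IME layer's weights are estimated by ridge regression from the original descriptors to the IME representations. A minimal sketch of that regression step follows, assuming the standard closed form W = (X^T X + lam*I)^-1 X^T Y. The function names, the `lam` parameter, and the random stand-in data are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def fit_ridge_layer(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def apply_layer(W, x):
    """Map original descriptors to the learned embedding with one matrix product."""
    return x @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))       # stand-in for original database descriptors
W_true = rng.normal(size=(16, 8))    # hypothetical ground-truth linear map
Y = X @ W_true                       # stand-in for off-line IME representations

W = fit_ridge_layer(X, Y, lam=1e-6)  # off-line: estimate the layer weights
query = rng.normal(size=(1, 16))     # on-line: a single query descriptor
embedded = apply_layer(W, query)
```

Because the on-line step is a single matrix product, its cost does not depend on the database size, which is consistent with the small query-time cost the abstract reports.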

Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks


Title	Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks
Authors	Marco Schreyer, Timur Sattarov, Damian Borth, Andreas Dengel, Bernd Reimer
Abstract	Learning to detect fraud in large-scale accounting data is one of the long-standing challenges in financial statement audits or fraud investigations. Nowadays, the majority of applied techniques refer to handcrafted rules derived from known fraud scenarios. While fairly successful, these rules exhibit the drawback that they often fail to generalize beyond known fraud scenarios and fraudsters gradually find ways to circumvent them. To overcome this disadvantage and inspired by the recent success of deep learning, we propose the application of deep autoencoder neural networks to detect anomalous journal entries. We demonstrate that the trained network’s reconstruction error obtainable for a journal entry and regularized by the entry’s individual attribute probabilities can be interpreted as a highly adaptive anomaly assessment. Experiments on two real-world datasets of journal entries show the effectiveness of the approach, resulting in high f1-scores of 32.93 (dataset A) and 16.95 (dataset B) and fewer false positive alerts compared to state-of-the-art baseline methods. Initial feedback received from chartered accountants and fraud examiners underpinned the quality of the approach in capturing highly relevant accounting anomalies.
Tasks
Published	2017-09-15
URL	http://arxiv.org/abs/1709.05254v2
PDF	http://arxiv.org/pdf/1709.05254v2.pdf
PWC	https://paperswithcode.com/paper/detection-of-anomalies-in-large-scale
Repo	https://github.com/koenvandevelde/fd-autoencoder
Framework	tf
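
The abstract treats an autoencoder's reconstruction error as an anomaly score. The sketch below illustrates only that idea, and it swaps in a linear autoencoder fitted by SVD (equivalent to PCA) for the deep network. The attribute-probability regularization from the abstract is not modeled, and all names and data are hypothetical.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear encoder/decoder via SVD of the centered data."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def reconstruction_error(X, mean, components):
    """Squared reconstruction error per row, used here as the anomaly score."""
    Z = (X - mean) @ components.T      # encode
    X_hat = Z @ components + mean      # decode
    return np.sum((X - X_hat) ** 2, axis=1)

rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
# "Normal" entries lie close to a one-dimensional subspace of a 3-D space.
normal = t * np.array([[1.0, 1.0, 0.0]]) + 0.01 * rng.normal(size=(100, 3))
mean, comps = fit_linear_autoencoder(normal, k=1)

anomaly = np.array([[0.0, 0.0, 5.0]])  # entry far from the learned subspace
scores = reconstruction_error(np.vstack([normal, anomaly]), mean, comps)
```

Entries that the model reconstructs poorly receive high scores; in this toy setup the last row stands out by several orders of magnitude.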

Conditional Adversarial Domain Adaptation


Title	Conditional Adversarial Domain Adaptation
Authors	Mingsheng Long, Zhangjie Cao, Jianmin Wang, Michael I. Jordan
Abstract	Adversarial learning has been embedded into deep networks to learn disentangled and transferable representations for domain adaptation. Existing adversarial domain adaptation methods may not effectively align different domains of multimodal distributions native in classification problems. In this paper, we present conditional adversarial domain adaptation, a principled framework that conditions the adversarial adaptation models on discriminative information conveyed in the classifier predictions. Conditional domain adversarial networks (CDANs) are designed with two novel conditioning strategies: multilinear conditioning that captures the cross-covariance between feature representations and classifier predictions to improve the discriminability, and entropy conditioning that controls the uncertainty of classifier predictions to guarantee the transferability. With theoretical guarantees and a few lines of code, the approach has exceeded state-of-the-art results on five datasets.
Tasks	Domain Adaptation
Published	2017-05-26
URL	http://arxiv.org/abs/1705.10667v4
PDF	http://arxiv.org/pdf/1705.10667v4.pdf
PWC	https://paperswithcode.com/paper/conditional-adversarial-domain-adaptation
Repo	https://github.com/thuml/CDAN
Framework	pytorch
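
The two conditioning strategies named in the abstract can be illustrated with a few lines of NumPy. This is a sketch under assumptions: multilinear conditioning is shown as a per-example outer product of features and predictions, and entropy conditioning as a weight of 1 + exp(-H) on each example. Neither formula is spelled out in the abstract, and the function names and toy inputs are hypothetical.

```python
import numpy as np

def multilinear_map(features, predictions):
    """Outer product of features (n, d) and predictions (n, c), flattened to (n, d*c)."""
    outer = np.einsum("nd,nc->ndc", features, predictions)
    return outer.reshape(features.shape[0], -1)

def entropy_weights(predictions, eps=1e-12):
    """Assumed weighting 1 + exp(-H): confident predictions get larger weights."""
    entropy = -np.sum(predictions * np.log(predictions + eps), axis=1)
    return 1.0 + np.exp(-entropy)

features = np.array([[1.0, 2.0], [0.5, -1.0]])
predictions = np.array([[0.9, 0.1], [0.5, 0.5]])  # rows are class probabilities

joint = multilinear_map(features, predictions)    # input to a domain discriminator
weights = entropy_weights(predictions)            # per-example loss weights
```

The flattened outer product is what a domain discriminator would receive in place of the raw features, so the discriminator sees how features co-vary with the predicted classes.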

Stochastic Variational Video Prediction


Title	Stochastic Variational Video Prediction
Authors	Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy H. Campbell, Sergey Levine
Abstract	Predicting the future in real-world settings, particularly from raw sensory observations such as images, is exceptionally challenging. Real-world events can be stochastic and unpredictable, and the high dimensionality and complexity of natural images requires the predictive model to build an intricate understanding of the natural world. Many existing methods tackle this problem by making simplifying assumptions about the environment. One common assumption is that the outcome is deterministic and there is only one plausible future. This can lead to low-quality predictions in real-world settings with stochastic dynamics. In this paper, we develop a stochastic variational video prediction (SV2P) method that predicts a different possible future for each sample of its latent variables. To the best of our knowledge, our model is the first to provide effective stochastic multi-frame prediction for real-world video. We demonstrate the capability of the proposed method in predicting detailed future frames of videos on multiple real-world datasets, both action-free and action-conditioned. We find that our proposed method produces substantially improved video predictions when compared to the same model without stochasticity, and to other stochastic video prediction methods. Our SV2P implementation will be open sourced upon publication.
Tasks	Video Prediction
Published	2017-10-30
URL	http://arxiv.org/abs/1710.11252v2
PDF	http://arxiv.org/pdf/1710.11252v2.pdf
PWC	https://paperswithcode.com/paper/stochastic-variational-video-prediction
Repo	https://github.com/StanfordVL/roboturk_real_dataset
Framework	tf

Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models


Title	Statistical Anomaly Detection via Composite Hypothesis Testing for Markov Models
Authors	Jing Zhang, Ioannis Ch. Paschalidis
Abstract	Under Markovian assumptions, we leverage a Central Limit Theorem (CLT) for the empirical measure in the test statistic of the composite hypothesis Hoeffding test so as to establish weak convergence results for the test statistic, and, thereby, derive a new estimator for the threshold needed by the test. We first show the advantages of our estimator over an existing estimator by conducting extensive numerical experiments. We find that our estimator controls better for false alarms while maintaining satisfactory detection probabilities. We then apply the Hoeffding test with our threshold estimator to detecting anomalies in two distinct applications domains: one in communication networks and the other in transportation networks. The former application seeks to enhance cyber security and the latter aims at building smarter transportation systems in cities.
Tasks	Anomaly Detection
Published	2017-02-27
URL	http://arxiv.org/abs/1702.08435v3
PDF	http://arxiv.org/pdf/1702.08435v3.pdf
PWC	https://paperswithcode.com/paper/statistical-anomaly-detection-via-composite
Repo	https://github.com/jingzbu/ROCHM
Framework	none
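
The Hoeffding test compares the empirical measure of observed data against a reference measure using relative entropy and raises an alarm when the divergence exceeds a threshold. The sketch below is a simplified i.i.d. version with a fixed, hypothetical threshold. It does not implement the Markov-chain setting or the CLT-based threshold estimator that the paper proposes.

```python
import numpy as np

def empirical_pmf(samples, n_states):
    """Empirical distribution of integer-coded observations."""
    counts = np.bincount(samples, minlength=n_states)
    return counts / counts.sum()

def kl_divergence(p, q):
    """Relative entropy D(p || q); terms with p = 0 contribute nothing."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def hoeffding_test(samples, reference_pmf, threshold):
    """Return True (anomaly) when D(empirical || reference) exceeds the threshold."""
    p_hat = empirical_pmf(samples, len(reference_pmf))
    return kl_divergence(p_hat, reference_pmf) > threshold

reference = np.array([0.5, 0.3, 0.2])                # assumed anomaly-free model
normal = np.array([0, 0, 0, 0, 0, 1, 1, 1, 2, 2])    # matches the reference
skewed = np.array([2] * 9 + [0])                     # mostly the rarest state

normal_flag = hoeffding_test(normal, reference, threshold=0.1)
skewed_flag = hoeffding_test(skewed, reference, threshold=0.1)
```

The choice of threshold controls the false alarm rate, which is why the abstract focuses on estimating it well.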

Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger


Title	Whatever Does Not Kill Deep Reinforcement Learning, Makes It Stronger
Authors	Vahid Behzadan, Arslan Munir
Abstract	Recent developments have established the vulnerability of deep Reinforcement Learning (RL) to policy manipulation attacks via adversarial perturbations. In this paper, we investigate the robustness and resilience of deep RL to training-time and test-time attacks. Through experimental results, we demonstrate that under noncontiguous training-time attacks, Deep Q-Network (DQN) agents can recover and adapt to the adversarial conditions by reactively adjusting the policy. Our results also show that policies learned under adversarial perturbations are more robust to test-time attacks. Furthermore, we compare the performance of $\epsilon$-greedy and parameter-space noise exploration methods in terms of robustness and resilience against adversarial perturbations.
Tasks
Published	2017-12-23
URL	http://arxiv.org/abs/1712.09344v1
PDF	http://arxiv.org/pdf/1712.09344v1.pdf
PWC	https://paperswithcode.com/paper/whatever-does-not-kill-deep-reinforcement
Repo	https://github.com/behzadanksu/rlattack-dev
Framework	tf
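
One of the two exploration methods compared in the abstract is $\epsilon$-greedy. A minimal sketch of the action-selection rule is given below; the function name and the toy Q-values are hypothetical, and parameter-space noise is not shown.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.5, 0.9])  # toy Q-values for three actions
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
explore_action = epsilon_greedy(q_values, epsilon=1.0, rng=rng)
```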

Scene Graph Generation from Objects, Phrases and Region Captions


Title	Scene Graph Generation from Objects, Phrases and Region Captions
Authors	Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang
Abstract	Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed as Multi-level Scene Description Network (denoted as MSDN), to solve the three vision tasks jointly in an end-to-end manner. Objects, phrases, and caption regions are first aligned with a dynamic graph based on their spatial and semantic connections. Then a feature refining structure is used to pass messages across the three levels of semantic tasks through the graph. We benchmark the learned model on three tasks, and show the joint learning across three tasks with our proposed method can bring mutual improvements over previous models. Particularly, on the scene graph generation task, our proposed method outperforms the state-of-art method with more than 3% margin.
Tasks	Graph Generation, Object Detection, Scene Graph Generation, Scene Understanding
Published	2017-07-31
URL	http://arxiv.org/abs/1707.09700v2
PDF	http://arxiv.org/pdf/1707.09700v2.pdf
PWC	https://paperswithcode.com/paper/scene-graph-generation-from-objects-phrases
Repo	https://github.com/yikang-li/MSDN
Framework	pytorch

Learning Deep and Compact Models for Gesture Recognition


Title	Learning Deep and Compact Models for Gesture Recognition
Authors	Koustav Mullick, Anoop M. Namboodiri
Abstract	We look at the problem of developing a compact and accurate model for gesture recognition from videos in a deep-learning framework. Towards this we propose a joint 3DCNN-LSTM model that is end-to-end trainable and is shown to be better suited to capture the dynamic information in actions. The solution achieves close to state-of-the-art accuracy on the ChaLearn dataset, with only half the model size. We also explore ways to derive a much more compact representation in a knowledge distillation framework followed by model compression. The final model is less than 1 MB in size, which is less than one hundredth of our initial model, with a drop of 7% in accuracy, and is suitable for real-time gesture recognition on mobile devices.
Tasks	Gesture Recognition, Model Compression
Published	2017-12-29
URL	http://arxiv.org/abs/1712.10136v1
PDF	http://arxiv.org/pdf/1712.10136v1.pdf
PWC	https://paperswithcode.com/paper/learning-deep-and-compact-models-for-gesture
Repo	https://github.com/chriswegmann/drone_steering
Framework	none
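
The abstract mentions a knowledge distillation framework without giving the loss. The sketch below shows the commonly used soft-target formulation, in which a student is trained to match a teacher's temperature-softened class distribution. Treat the loss form, the temperature value, and all names as assumptions rather than the authors' exact setup.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature))
    return float(-np.mean(np.sum(p_teacher * log_p_student, axis=-1)))

teacher = np.array([[2.0, 1.0, 0.0]])
matched = distillation_loss(teacher, teacher)                      # student agrees
mismatched = distillation_loss(np.array([[0.0, 1.0, 2.0]]), teacher)  # student disagrees
```

The loss is smallest when the student reproduces the teacher's distribution, which is the signal a compact student model would be trained on.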

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)


Title	ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)
Authors	Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai
Abstract	Chinese is the most widely used language in the world. Algorithms that read Chinese text in natural images facilitate applications of various kinds. Despite the large potential value, datasets and competitions in the past primarily focus on English, which bears very different characteristics than Chinese. This report introduces RCTW, a new competition that focuses on Chinese text reading. The competition features a large-scale dataset with 12,263 annotated images. Two tasks, namely text localization and end-to-end recognition, are set up. The competition took place from January 20 to May 31, 2017. 23 valid submissions were received from 19 teams. This report includes dataset description, task definitions, evaluation protocols, and results summaries and analysis. Through this competition, we call for more future research on the Chinese text reading problem. The official website for the competition is http://rctw.vlrlab.net
Tasks
Published	2017-08-31
URL	http://arxiv.org/abs/1708.09585v3
PDF	http://arxiv.org/pdf/1708.09585v3.pdf
PWC	https://paperswithcode.com/paper/icdar2017-competition-on-reading-chinese-text
Repo	https://github.com/OzHsu23/chineseocr
Framework	tf

Automated Conjecturing VII: The Graph Brain Project & Big Mathematics


Title	Automated Conjecturing VII: The Graph Brain Project & Big Mathematics
Authors	N. Bushaw, C. E. Larson, N. Van Cleemput
Abstract	The Graph Brain Project is an experiment in how the use of automated mathematical discovery software, databases, large collaboration, and systematic investigation provide a model for how mathematical research might proceed in the future. Our Project began with the development of a program that can be used to generate invariant-relation and property-relation conjectures in many areas of mathematics. This program can produce conjectures which are not implied by existing (published) theorems. Here we propose a new approach to push forward existing mathematical research goals—using automated mathematical discovery software. We suggest how to initiate and harness large-scale collaborative mathematics. We envision mathematical research labs similar to what exist in other sciences, new avenues for funding, new opportunities for training students, and a more efficient and effective use of published mathematical research. And our experiment in graph theory can be imitated in many other areas of mathematical research. Big Mathematics is the idea of large, systematic, collaborative research on problems of existing mathematical interest. What is possible when we put our skills, tools, and results together systematically?
Tasks
Published	2017-12-28
URL	http://arxiv.org/abs/1801.01814v1
PDF	http://arxiv.org/pdf/1801.01814v1.pdf
PWC	https://paperswithcode.com/paper/automated-conjecturing-vii-the-graph-brain
Repo	https://github.com/math1um/objects-invariants-properties
Framework	none

ChainerMN: Scalable Distributed Deep Learning Framework


Title	ChainerMN: Scalable Distributed Deep Learning Framework
Authors	Takuya Akiba, Keisuke Fukuda, Shuji Suzuki
Abstract	One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.
Tasks
Published	2017-10-31
URL	http://arxiv.org/abs/1710.11351v1
PDF	http://arxiv.org/pdf/1710.11351v1.pdf
PWC	https://paperswithcode.com/paper/chainermn-scalable-distributed-deep-learning
Repo	https://github.com/chainer/chainermn
Framework	none
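
The scaling claim in the abstract can be turned into a small piece of arithmetic. The sketch assumes the usual definition, parallel efficiency = speedup / number of workers, with speedup measured against a single GPU; the paper may use a different baseline, so the resulting number is illustrative only.

```python
def parallel_efficiency(speedup, n_workers):
    """Efficiency under the assumed definition: achieved speedup per worker."""
    return speedup / n_workers

def implied_speedup(efficiency, n_workers):
    """Speedup implied by a given efficiency at a given worker count."""
    return efficiency * n_workers

# 90% efficiency on 128 GPUs, the figures quoted in the abstract.
speedup = implied_speedup(0.90, 128)
```

Under that definition, 90% efficiency on 128 GPUs corresponds to a speedup of roughly 115 over the baseline.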

FacePoseNet: Making a Case for Landmark-Free Face Alignment


Title	FacePoseNet: Making a Case for Landmark-Free Face Alignment
Authors	Fengju Chang, Anh Tuan Tran, Tal Hassner, Iacopo Masi, Ram Nevatia, Gerard Medioni
Abstract	We show how a simple convolutional neural network (CNN) can be trained to accurately and robustly regress 6 degrees of freedom (6DoF) 3D head pose, directly from image intensities. We further explain how this FacePoseNet (FPN) can be used to align faces in 2D and 3D as an alternative to explicit facial landmark detection for these tasks. We claim that in many cases the standard means of measuring landmark detector accuracy can be misleading when comparing different face alignments. Instead, we compare our FPN with existing methods by evaluating how they affect face recognition accuracy on the IJB-A and IJB-B benchmarks: using the same recognition pipeline, but varying the face alignment method. Our results show that (a) better landmark detection accuracy measured on the 300W benchmark does not necessarily imply better face recognition accuracy. (b) Our FPN provides superior 2D and 3D face alignment on both benchmarks. Finally, (c), FPN aligns faces at a small fraction of the computational cost of comparably accurate landmark detectors. For many purposes, FPN is thus a far faster and far more accurate face alignment method than using facial landmark detectors.
Tasks	Face Alignment, Face Identification, Face Recognition, Face Verification, Facial Landmark Detection
Published	2017-08-24
URL	http://arxiv.org/abs/1708.07517v2
PDF	http://arxiv.org/pdf/1708.07517v2.pdf
PWC	https://paperswithcode.com/paper/faceposenet-making-a-case-for-landmark-free
Repo	https://github.com/fengju514/Expression-Net
Framework	tf

CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis


Title	CHARDA: Causal Hybrid Automata Recovery via Dynamic Analysis
Authors	Adam Summerville, Joseph Osborn, Michael Mateas
Abstract	We propose and evaluate a new technique for learning hybrid automata automatically by observing the runtime behavior of a dynamical system. Working from a sequence of continuous state values and predicates about the environment, CHARDA recovers the distinct dynamic modes, learns a model for each mode from a given set of templates, and postulates causal guard conditions which trigger transitions between modes. Our main contribution is the use of information-theoretic measures (1) as a cost function for data segmentation and model selection to penalize over-fitting and (2) to determine the likely causes of each transition. CHARDA is easily extended with different classes of model templates, fitting methods, or predicates. In our experiments on a complex videogame character, CHARDA successfully discovers a reasonable over-approximation of the character’s true behaviors. Our results also compare favorably against recent work in automatically learning probabilistic timed automata in an aircraft domain: CHARDA exactly learns the modes of these simpler automata.
Tasks	Model Selection
Published	2017-07-11
URL	http://arxiv.org/abs/1707.03336v1
PDF	http://arxiv.org/pdf/1707.03336v1.pdf
PWC	https://paperswithcode.com/paper/charda-causal-hybrid-automata-recovery-via
Repo	https://github.com/JoeOsborn/mechlearn
Framework	none

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon


Title	Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon
Authors	Xin Dong, Shangyu Chen, Sinno Jialin Pan
Abstract	How to develop slim and accurate deep neural networks has become crucial for real-world applications, especially for those employed in embedded systems. Though previous work along this research line has shown some promising results, most existing methods either fail to significantly compress a well-trained deep network or require a heavy retraining process for the pruned deep network to re-boost its prediction performance. In this paper, we propose a new layer-wise pruning method for deep neural networks. In our proposed method, parameters of each individual layer are pruned independently based on second order derivatives of a layer-wise error function with respect to the corresponding parameters. We prove that the final prediction performance drop after pruning is bounded by a linear combination of the reconstructed errors caused at each layer. Therefore, there is a guarantee that one only needs to perform a light retraining process on the pruned network to resume its original prediction performance. We conduct extensive experiments on benchmark datasets to demonstrate the effectiveness of our pruning method compared with several state-of-the-art baseline methods.
Tasks
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07565v2
PDF	http://arxiv.org/pdf/1705.07565v2.pdf
PWC	https://paperswithcode.com/paper/learning-to-prune-deep-neural-networks-via
Repo	https://github.com/csyhhu/L-OBS
Framework	pytorch
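
The second-order pruning idea in the abstract builds on the classic Optimal Brain Surgeon rule: remove the weight with the smallest saliency w_q^2 / (2 [H^-1]_qq) and adjust the remaining weights by -(w_q / [H^-1]_qq) H^-1 e_q. The sketch below applies that rule to a toy quadratic error with a hypothetical 2x2 Hessian. It does not reproduce the paper's layer-wise error function or its Hessian computation.

```python
import numpy as np

def obs_prune_one(w, H_inv):
    """Prune the lowest-saliency weight and compensate the others (classic OBS)."""
    saliency = w ** 2 / (2.0 * np.diag(H_inv))
    q = int(np.argmin(saliency))
    delta = -(w[q] / H_inv[q, q]) * H_inv[:, q]
    w_new = w + delta
    w_new[q] = 0.0  # the update zeroes w[q]; this clears floating-point residue
    return w_new, q, float(saliency[q])

H = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical Hessian of the error
w = np.array([1.0, 0.1])                # hypothetical trained weights
w_new, pruned_index, predicted_increase = obs_prune_one(w, np.linalg.inv(H))

# Actual increase of the quadratic error 0.5 * d^T H d caused by the update.
d = w_new - w
actual_increase = 0.5 * d @ H @ d
```

For a purely quadratic error the saliency equals the actual error increase, which is the quantity the paper's bound aggregates across layers.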

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction


Title	Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction
Authors	Kun Gai, Xiaoqiang Zhu, Han Li, Kai Liu, Zhe Wang
Abstract	CTR prediction in real-world business is a difficult machine learning problem with large scale nonlinear sparse data. In this paper, we introduce an industrial strength solution with a model named Large Scale Piece-wise Linear Model (LS-PLM). We formulate the learning problem with $L_1$ and $L_{2,1}$ regularizers, leading to a non-convex and non-smooth optimization problem. Then, we propose a novel algorithm to solve it efficiently, based on directional derivatives and quasi-Newton method. In addition, we design a distributed system which can run on hundreds of machines in parallel and provides us with the industrial scalability. The LS-PLM model can capture nonlinear patterns from massive sparse data, saving us from heavy feature engineering jobs. Since 2012, LS-PLM has become the main CTR prediction model in Alibaba’s online display advertising system, serving hundreds of millions of users every day.
Tasks	Click-Through Rate Prediction, Feature Engineering
Published	2017-04-18
URL	http://arxiv.org/abs/1704.05194v1
PDF	http://arxiv.org/pdf/1704.05194v1.pdf
PWC	https://paperswithcode.com/paper/learning-piece-wise-linear-models-from-large
Repo	https://github.com/shenweichen/DeepCTR-PyTorch
Framework	pytorch
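
A piece-wise linear CTR model with $L_1$ and $L_{2,1}$ regularizers can be sketched as a softmax gate over regions combined with one logistic predictor per region. The abstract does not spell out the model form, so the mixture below is an assumption, and the parameter names `U` (gating weights) and `W` (per-region weights) are hypothetical. The $L_{2,1}$ norm sums the Euclidean norms of the rows of a parameter matrix, which encourages whole features to be dropped.

```python
import numpy as np

def ls_plm_predict(x, U, W):
    """Assumed form: sum over regions of softmax gate times sigmoid predictor."""
    gate_logits = x @ U                                   # (n, m) region scores
    gate_logits = gate_logits - gate_logits.max(axis=1, keepdims=True)
    gates = np.exp(gate_logits)
    gates = gates / gates.sum(axis=1, keepdims=True)      # softmax over regions
    experts = 1.0 / (1.0 + np.exp(-(x @ W)))              # (n, m) per-region CTR
    return np.sum(gates * experts, axis=1)

def l1_norm(theta):
    """Sum of absolute values of all parameters."""
    return float(np.sum(np.abs(theta)))

def l21_norm(theta):
    """Sum over rows (features) of the row-wise Euclidean norm."""
    return float(np.sum(np.sqrt(np.sum(theta ** 2, axis=1))))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))   # 5 examples, 4 features
U = rng.normal(size=(4, 3))   # gating weights for 3 regions
W = rng.normal(size=(4, 3))   # logistic weights for 3 regions
ctr = ls_plm_predict(x, U, W)

theta = np.array([[3.0, 4.0], [0.0, 0.0]])
```

With a single region the gate is constant and the model reduces to ordinary logistic regression, which is a quick way to sanity-check an implementation.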