October 17, 2019

Paper Group ANR 939

Conditional Information Gain Networks. How deep should be the depth of convolutional neural networks: a backyard dog case study. An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems. Crowd Counting using Deep Recurrent Spatial-Aware Network. Pose-Based Two-Stream Relational Networks for Action Recognition …

Conditional Information Gain Networks


Title	Conditional Information Gain Networks
Authors	Ufuk Can Biçici, Cem Keskin, Lale Akarun
Abstract	Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these heavily parametrized deep models in limited-resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow feed-forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and the decision mechanisms inserted in the architecture. These decision mechanisms are trained using cost functions based on differentiable Information Gain, inspired by the training procedures of decision trees. These information gain based decision mechanisms are differentiable and can be trained end-to-end using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on the MNIST and the recently introduced Fashion MNIST datasets and show that our information gain based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines.
Tasks
Published	2018-07-25
URL	http://arxiv.org/abs/1807.09534v1
PDF	http://arxiv.org/pdf/1807.09534v1.pdf
PWC	https://paperswithcode.com/paper/conditional-information-gain-networks
Repo
Framework
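
The abstract above describes decision mechanisms trained with an information gain objective borrowed from decision trees. As a rough illustration of the quantity involved (not the authors' implementation), the sketch below computes information gain as H(class) - H(class | branch) from a joint distribution over routing branches and class labels. The function names and the assumption that the joint distribution is already estimated from a batch are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_gain(joint):
    """Information gain of a routing decision about the class label.

    joint[b][c] is an assumed estimate of p(branch=b, class=c);
    all entries are expected to sum to 1.
    """
    p_branch = [sum(row) for row in joint]
    p_class = [sum(col) for col in zip(*joint)]
    h_class_given_branch = 0.0
    for pb, row in zip(p_branch, joint):
        if pb > 0.0:
            # Weight the entropy of p(class | branch=b) by p(branch=b).
            h_class_given_branch += pb * entropy([p / pb for p in row])
    return entropy(p_class) - h_class_given_branch


# A split that separates two classes perfectly versus one that says nothing.
perfect_split = [[0.5, 0.0], [0.0, 0.5]]
useless_split = [[0.25, 0.25], [0.25, 0.25]]
print(information_gain(perfect_split))
print(information_gain(useless_split))
```

A perfect two-way split of two balanced classes attains the maximum gain of ln 2, while a split that is independent of the label yields zero. In a trainable network the joint distribution would presumably be built from soft (differentiable) routing probabilities rather than hard counts.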

How deep should be the depth of convolutional neural networks: a backyard dog case study


Title	How deep should be the depth of convolutional neural networks: a backyard dog case study
Authors	A. N. Gorban, E. M. Mirkes, I. Y. Tyukin
Abstract	The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just a few layers, whilst retaining the network’s functionality on a given task. The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural network models strive to achieve, may not always be needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a use case, the ‘backyard dog’ problem. The ‘backyard dog’, implemented by a lean network, should correctly identify members of a limited group of individuals, a ‘family’, and should distinguish between them. At the same time, the network must produce an alarm in response to an image of an individual who is not a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model as its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on the Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above use case, the ‘backyard dog’ problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration. We developed a simple non-iterative method for shallowing down pre-trained deep networks. The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on the Advanced Supervised Principal Component Analysis. The method enables generation of families of smaller-size shallower specialized networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network.
Tasks	Face Recognition
Published	2018-05-03
URL	https://arxiv.org/abs/1805.01516v3
PDF	https://arxiv.org/pdf/1805.01516v3.pdf
PWC	https://paperswithcode.com/paper/how-deep-should-be-the-depth-of-convolutional
Repo
Framework

An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems


Title	An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems
Authors	Hongyao Tang, Li Wang, Zan Wang, Tim Baarslag, Jianye Hao
Abstract	Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world multiagent scenarios are currently missing in existing works. First, the network topologies can be dynamic, where agents may change their connections through rewiring during the course of interactions. Second, the game matrix between each pair of agents may not be static and is usually not known a priori. Both the network dynamics and game uncertainty increase the coordination difficulty among agents. In this paper, we consider a multiagent dynamic social learning environment in which each agent can choose to rewire potential partners and interact with randomly chosen neighbors in each round. We propose an optimal rewiring strategy for agents to select the most beneficial peers to interact with for the purpose of maximizing the accumulated payoff in repeated interactions. We empirically demonstrate the effectiveness and robustness of our approach by comparing it with benchmark strategies. The performance of three representative learning strategies under our social learning framework with our optimal rewiring is investigated as well.
Tasks
Published	2018-05-13
URL	http://arxiv.org/abs/1805.08588v1
PDF	http://arxiv.org/pdf/1805.08588v1.pdf
PWC	https://paperswithcode.com/paper/an-optimal-rewiring-strategy-for
Repo
Framework

Crowd Counting using Deep Recurrent Spatial-Aware Network


Title	Crowd Counting using Deep Recurrent Spatial-Aware Network
Authors	Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin
Abstract	Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera’s perspective that causes huge appearance variations in people’s scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the largely varied scales while ignoring the rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module iteratively conducting two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to the suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset WorldExpo’10 and 22.8% on the most challenging dataset UCF_CC_50.
Tasks	Crowd Counting
Published	2018-07-02
URL	http://arxiv.org/abs/1807.00601v1
PDF	http://arxiv.org/pdf/1807.00601v1.pdf
PWC	https://paperswithcode.com/paper/crowd-counting-using-deep-recurrent-spatial
Repo
Framework

Pose-Based Two-Stream Relational Networks for Action Recognition in Videos


Title	Pose-Based Two-Stream Relational Networks for Action Recognition in Videos
Authors	Wei Wang, Jinjin Zhang, Chenyang Si, Liang Wang
Abstract	Recently, pose-based action recognition has gained more and more attention due to its better performance compared with traditional appearance-based methods. However, two problems remain to be solved. First, existing pose-based methods generally recognize human actions with captured 3D human poses, which are very difficult to obtain in real scenarios. Second, few pose-based methods model the action-related objects in recognizing human-object interaction actions in which objects play an important role. To solve the problems above, we propose a pose-based two-stream relational network (PSRN) for action recognition. In PSRN, one stream models the temporal dynamics of the targeted 2D human pose sequences which are directly extracted from raw videos, and the other stream models the action-related objects from a randomly sampled video frame. Most importantly, instead of fusing the two streams in the class score layer as before, we propose a pose-object relational network to model the relationship between human poses and action-related objects. We evaluate the proposed PSRN on two challenging benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that our PSRN obtains state-of-the-art performance on Sub-JHMDB (80.2%) and PennAction (98.1%). Our work opens a new door to action recognition by combining 2D human pose extracted from raw video and image appearance.
Tasks	Action Recognition In Videos, Human-Object Interaction Detection, Temporal Action Localization
Published	2018-05-22
URL	http://arxiv.org/abs/1805.08484v1
PDF	http://arxiv.org/pdf/1805.08484v1.pdf
PWC	https://paperswithcode.com/paper/pose-based-two-stream-relational-networks-for
Repo
Framework

YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud


Title	YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud
Authors	Waleed Ali, Sherif Abdelkarim, Mohamed Zahran, Mahmoud Zidan, Ahmad El Sallab
Abstract	Object detection and classification in 3D is a key task in Automated Driving (AD). LiDAR sensors are employed to provide the 3D point cloud reconstruction of the surrounding environment, while the task of 3D object bounding box detection in real time remains a strong algorithmic challenge. In this paper, we build on the success of the one-shot regression meta-architecture in the 2D perspective image space and extend it to generate oriented 3D object bounding boxes from LiDAR point clouds. Our main contribution is in extending the loss function of YOLO v2 to include the yaw angle, the 3D box center in Cartesian coordinates and the height of the box as a direct regression problem. This formulation enables real-time performance, which is essential for automated driving. Our results show promising figures on the KITTI benchmark, achieving real-time performance (40 fps) on a Titan X GPU.
Tasks	Object Detection
Published	2018-08-07
URL	http://arxiv.org/abs/1808.02350v1
PDF	http://arxiv.org/pdf/1808.02350v1.pdf
PWC	https://paperswithcode.com/paper/yolo3d-end-to-end-real-time-3d-oriented
Repo
Framework

Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition


Title	Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition
Authors	Yunbo Wang, Zhiyu Yao, Hongyu Zhu, Mingsheng Long, Jianmin Wang, Philip S Yu
Abstract	We discuss the robustness and generalization ability in the realm of action recognition, showing that the mainstream neural networks are not robust to disordered frames and diverse video environments. There are two possible reasons: First, existing models lack an appropriate method to overcome the inevitable decision discrepancy between multiple streams with different input modalities. Second, by doing cross-dataset experiments, we find that the optical flow features are hard to transfer, which affects the generalization ability of the two-stream neural networks. For robust action recognition, we present the Reversed Two-Stream Networks (Rev2Net), which has three properties: (1) It could learn more transferable, robust video features by reversing the multi-modality inputs as training supervisions. It outperforms all other compared models in challenging frames shuffle experiments and cross-dataset experiments. (2) It is highlighted by an adaptive, collaborative multi-task learning approach that is applied between decoders to penalize their disagreement in the deep feature space. We name it the decoding discrepancy penalty (DDP). (3) As the decoder streams will be removed at test time, Rev2Net makes recognition decisions purely based on raw video frames. Rev2Net achieves the best results in the cross-dataset settings and competitive results on classic action recognition tasks: 94.6% for UCF-101, 71.1% for HMDB-51 and 73.3% for Kinetics. It performs even better than most methods that take extra inputs beyond raw RGB frames.
Tasks	Multi-Task Learning, Optical Flow Estimation, Temporal Action Localization
Published	2018-11-20
URL	http://arxiv.org/abs/1811.08362v1
PDF	http://arxiv.org/pdf/1811.08362v1.pdf
PWC	https://paperswithcode.com/paper/reversing-two-stream-networks-with-decoding
Repo
Framework

Interpretation of Natural Language Rules in Conversational Machine Reading


Title	Interpretation of Natural Language Rules in Conversational Machine Reading
Authors	Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel
Abstract	Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader’s background knowledge. One example is the task of interpreting regulations to answer “Can I…?” or “Do I have to…?” questions such as “I am working in Canada. Do I have to carry on paying UK National Insurance?” after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as “How long have you been working abroad?” when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed.
Tasks	Question Answering, Reading Comprehension
Published	2018-08-28
URL	http://arxiv.org/abs/1809.01494v1
PDF	http://arxiv.org/pdf/1809.01494v1.pdf
PWC	https://paperswithcode.com/paper/interpretation-of-natural-language-rules-in
Repo
Framework

Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition


Title	Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
Authors	Yu Pan, Jing Xu, Maolin Wang, Jinmian Ye, Fei Wang, Kun Bai, Zenglin Xu
Abstract	Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as the memory units, which are helpful in storing information in sequential contexts. However, when dealing with high dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and huge computational cost. This makes the training of RNNs unscalable and difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, by utilizing the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and also provides a fundamental building block for RNNs in handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art competitors.
Tasks	Temporal Action Localization
Published	2018-11-19
URL	http://arxiv.org/abs/1811.07503v1
PDF	http://arxiv.org/pdf/1811.07503v1.pdf
PWC	https://paperswithcode.com/paper/compressing-recurrent-neural-networks-with
Repo
Framework
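
The abstract above attributes the memory cost of RNNs on high-dimensional inputs to the dense input-to-hidden transformation and proposes a tensor ring format instead. The sketch below is a generic illustration of the tensor ring format, not the TR-LSTM implementation: it compares parameter counts and evaluates a single tensor entry as the trace of a product of core slices. The helper names, the uniform rank, and the example shapes are all assumptions chosen for illustration.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters of a dense input-to-hidden matrix with factored dimensions."""
    return prod(in_modes) * prod(out_modes)


def tensor_ring_params(modes, rank):
    """Parameters of a tensor ring with one (rank x n x rank) core per mode."""
    return sum(rank * n * rank for n in modes)


def matmul(a, b):
    """Multiply two small matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def tr_element(cores, index):
    """Evaluate one tensor entry as trace(G_1[i_1] @ ... @ G_d[i_d]).

    cores[k][i] is the rank x rank matrix slice of core k at mode index i.
    """
    rank = len(cores[0][0])
    acc = [[1.0 if r == c else 0.0 for c in range(rank)] for r in range(rank)]
    for core, i in zip(cores, index):
        acc = matmul(acc, core[i])
    return sum(acc[j][j] for j in range(rank))


# Illustrative shapes only (not taken from the paper).
print(dense_params((4, 4), (4, 4)))         # 256
print(tensor_ring_params((4, 4, 4, 4), 2))  # 64
```

With these toy shapes the ring stores 64 values instead of 256. The saving grows quickly with the mode sizes, since the ring cost is linear in each mode size while the dense cost is their product.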

Defeasible Reasoning in SROEL: from Rational Entailment to Rational Closure


Title	Defeasible Reasoning in SROEL: from Rational Entailment to Rational Closure
Authors	Laura Giordano, Daniele Theseider Dupré
Abstract	In this work we study a rational extension $SROEL^R T$ of the low complexity description logic SROEL, which underlies the OWL EL ontology language. The extension involves a typicality operator T, whose semantics is based on Lehmann and Magidor’s ranked models and allows for the definition of defeasible inclusions. We consider both rational entailment and minimal entailment. We show that deciding instance checking under minimal entailment is in general $\Pi^P_2$-hard, while, under rational entailment, instance checking can be computed in polynomial time. We develop a Datalog calculus for instance checking under rational entailment and exploit it, with stratified negation, for computing the rational closure of simple KBs in polynomial time.
Tasks
Published	2018-03-23
URL	http://arxiv.org/abs/1803.08885v1
PDF	http://arxiv.org/pdf/1803.08885v1.pdf
PWC	https://paperswithcode.com/paper/defeasible-reasoning-in-sroel-from-rational
Repo
Framework

Measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti quantum computer


Title	Measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti quantum computer
Authors	J. Olivares-Sánchez, J. Casanova, E. Solano, L. Lamata
Abstract	We present an experimental realization of a measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti cloud quantum computer. The experiment in this few-qubit superconducting chip faithfully reproduces the theoretical proposal, setting the first steps towards a semiautonomous quantum agent. This experiment paves the way towards quantum reinforcement learning with superconducting circuits.
Tasks
Published	2018-11-19
URL	http://arxiv.org/abs/1811.07594v1
PDF	http://arxiv.org/pdf/1811.07594v1.pdf
PWC	https://paperswithcode.com/paper/measurement-based-adaptation-protocol-with
Repo
Framework

Leveraging Random Label Memorization for Unsupervised Pre-Training


Title	Leveraging Random Label Memorization for Unsupervised Pre-Training
Authors	Vinaychandran Pondenkandath, Michele Alberti, Sammer Puran, Rolf Ingold, Marcus Liwicki
Abstract	We present a novel approach to leverage large unlabeled datasets by pre-training state-of-the-art deep neural networks on randomly-labeled datasets. Specifically, we train the neural networks to memorize arbitrary labels for all the samples in a dataset and use these pre-trained networks as a starting point for regular supervised learning. Our assumption is that the “memorization infrastructure” learned by the network during the random-label training proves to be beneficial for the conventional supervised learning as well. We test the effectiveness of our pre-training on several video action recognition datasets (HMDB51, UCF101, Kinetics) by comparing the results of the same network with and without the random label pre-training. Our approach yields an improvement - ranging from 1.5% on UCF-101 to 5% on Kinetics - in classification accuracy, which calls for further research in this direction.
Tasks	Temporal Action Localization
Published	2018-11-05
URL	http://arxiv.org/abs/1811.01640v1
PDF	http://arxiv.org/pdf/1811.01640v1.pdf
PWC	https://paperswithcode.com/paper/leveraging-random-label-memorization-for
Repo
Framework

Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification


Title	Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification
Authors	Xinhan Di, Pengqian Yu, Meng Tian
Abstract	It has been demonstrated that deep neural networks are vulnerable to noisy examples, particularly adversarial samples, during the inference process. The gap between robust deep learning systems in real-world applications and vulnerable neural networks is still large. Current adversarial training strategies improve the robustness against adversarial samples. However, these methods reduce accuracy when the input examples are clean, which hinders their practicability. In this paper, we investigate an approach that protects the neural network classification from the adversarial samples and improves its accuracy when the input examples are clean. We demonstrate the versatility and effectiveness of our proposed approach on a variety of different networks and datasets.
Tasks
Published	2018-07-01
URL	http://arxiv.org/abs/1807.00340v1
PDF	http://arxiv.org/pdf/1807.00340v1.pdf
PWC	https://paperswithcode.com/paper/towards-adversarial-training-with-moderate
Repo
Framework

Escaping Plato’s Cave: 3D Shape From Adversarial Rendering


Title	Escaping Plato’s Cave: 3D Shape From Adversarial Rendering
Authors	Philipp Henzler, Niloy Mitra, Tobias Ritschel
Abstract	We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images, are indistinguishable from ground truth images (for a discriminator) under various camera poses. Discriminating 2D images instead of 3D shapes allows tapping into unstructured 2D photo collections instead of relying on curated (e.g., aligned, annotated, etc.) 3D data sets. To establish constraints between 2D image observation and their 3D interpretation, we suggest a family of rendering layers that are effectively differentiable. This family includes visual hull, absorption-only (akin to x-ray), and emission-absorption. We can successfully reconstruct 3D shapes from unstructured 2D images and extensively evaluate PlatonicGAN on a range of synthetic and real data sets achieving consistent improvements over baseline methods. We further show that PlatonicGAN can be combined with 3D supervision to improve on and in some cases even surpass the quality of 3D-supervised methods.
Tasks
Published	2018-11-28
URL	https://arxiv.org/abs/1811.11606v3
PDF	https://arxiv.org/pdf/1811.11606v3.pdf
PWC	https://paperswithcode.com/paper/escaping-platos-cave-using-adversarial
Repo
Framework

Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising


Title	Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising
Authors	Yuan Yuan, Xiaojing Dong, Chen Dong, Yiwen Sun, Zhenyu Yan, Abhishek Pani
Abstract	Predicting keyword performance, such as number of impressions, click-through rate (CTR), conversion rate (CVR), revenue per click (RPC), and cost per click (CPC), is critical for sponsored search in the online advertising industry. An interesting phenomenon is that, despite the size of the overall data, the data are very sparse at the individual unit level. To overcome the sparsity and leverage hierarchical information across the data structure, we propose a Dynamic Hierarchical Empirical Bayesian (DHEB) model that dynamically determines the hierarchy through a data-driven process and provides shrinkage-based estimations. Our method is also equipped with an efficient empirical approach to derive inferences through the hierarchy. We evaluate the proposed method on both simulated and real-world datasets and compare it to several competitive models. The results favor the proposed method among all comparisons in terms of both accuracy and efficiency. In the end, we design a two-phase system to serve predictions in real time.
Tasks
Published	2018-09-06
URL	http://arxiv.org/abs/1809.02213v1
PDF	http://arxiv.org/pdf/1809.02213v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-hierarchical-empirical-bayes-a
Repo
Framework
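
The abstract above mentions shrinkage-based estimation for sparse keyword-level data. The sketch below shows one textbook form of shrinkage, a beta-binomial posterior mean that pulls a sparse keyword's click-through rate toward the rate of its parent group. It is only meant to convey the idea and is not the DHEB model; the function name, the `prior_strength` parameter, and the example numbers are hypothetical.

```python
def shrunk_rate(clicks, impressions, prior_rate, prior_strength):
    """Beta-binomial posterior mean of a rate.

    prior_rate is the parent-level rate (for example, a campaign CTR) and
    prior_strength is the number of pseudo-impressions the prior is worth.
    Both are assumptions of this sketch.
    """
    alpha = prior_rate * prior_strength  # pseudo-clicks contributed by the prior
    return (clicks + alpha) / (impressions + prior_strength)


# A keyword with no data falls back to the parent rate; a keyword with
# plenty of data is dominated by its own observations.
print(shrunk_rate(0, 0, prior_rate=0.02, prior_strength=100))
print(shrunk_rate(1, 2, prior_rate=0.02, prior_strength=100))
print(shrunk_rate(500, 10000, prior_rate=0.02, prior_strength=100))
```

The raw rate for the second keyword would be 50% from only two impressions, whereas the shrunk estimate stays close to the parent rate. A hierarchical model would additionally have to decide which parent each unit shrinks toward, which this sketch takes as given.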