Paper Group ANR 939
Conditional Information Gain Networks
Title | Conditional Information Gain Networks |
Authors | Ufuk Can Biçici, Cem Keskin, Lale Akarun |
Abstract | Deep neural network models owe their representational power to the high number of learnable parameters. It is often infeasible to run these largely parametrized deep models in limited resource environments, like mobile phones. Network models employing conditional computing are able to reduce computational requirements while achieving high representational power, with their ability to model hierarchies. We propose Conditional Information Gain Networks, which allow the feed forward deep neural networks to execute conditionally, skipping parts of the model based on the sample and the decision mechanisms inserted in the architecture. These decision mechanisms are trained using cost functions based on differentiable Information Gain, inspired by the training procedures of decision trees. These information gain based decision mechanisms are differentiable and can be trained end-to-end using a unified framework with a general cost function, covering both classification and decision losses. We test the effectiveness of the proposed method on MNIST and recently introduced Fashion MNIST datasets and show that our information gain based conditional execution approach can achieve better or comparable classification results using significantly fewer parameters, compared to standard convolutional neural network baselines. |
Tasks | |
Published | 2018-07-25 |
URL | http://arxiv.org/abs/1807.09534v1 |
http://arxiv.org/pdf/1807.09534v1.pdf | |
PWC | https://paperswithcode.com/paper/conditional-information-gain-networks |
Repo | |
Framework | |
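The information-gain criterion mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' cost function: it only computes the decision-tree style information gain (the mutual information between class labels and soft branch assignments), and the names `entropy`, `information_gain`, `routing_probs` and `labels` are assumptions introduced for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_gain(routing_probs, labels, num_classes):
    """Information gain between class labels and soft branch assignments.

    routing_probs[i][k] is the (hypothetical) probability that sample i is
    routed to branch k; labels[i] is the integer class of sample i.
    """
    n = len(labels)
    num_branches = len(routing_probs[0])
    # Empirical joint distribution p(class, branch) over the batch.
    joint = [[0.0] * num_branches for _ in range(num_classes)]
    for probs, y in zip(routing_probs, labels):
        for k, p in enumerate(probs):
            joint[y][k] += p / n
    p_class = [sum(row) for row in joint]
    p_branch = [sum(joint[c][k] for c in range(num_classes))
                for k in range(num_branches)]
    flat = [p for row in joint for p in row]
    # IG = H(class) + H(branch) - H(class, branch)
    return entropy(p_class) + entropy(p_branch) - entropy(flat)


labels = [0, 0, 1, 1]
perfect = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4
ig_perfect = information_gain(perfect, labels, num_classes=2)
ig_uninformative = information_gain(uninformative, labels, num_classes=2)
```

A router that separates the two classes perfectly reaches the maximum gain for this toy batch, while a router that splits every sample evenly gains nothing. A trainable version would maximise this quantity with respect to the parameters that produce `routing_probs`.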
How deep should be the depth of convolutional neural networks: a backyard dog case study
Title | How deep should be the depth of convolutional neural networks: a backyard dog case study |
Authors | A. N. Gorban, E. M. Mirkes, I. Y. Tyukin |
Abstract | The work concerns the problem of reducing a pre-trained deep neuronal network to a smaller network, with just a few layers, whilst retaining the network’s functionality on a given task. The proposed approach is motivated by the observation that the aim to deliver the highest accuracy possible in the broadest range of operational conditions, which many deep neural network models strive to achieve, may not always be needed, desired, or even achievable due to the lack of data or technical constraints. In relation to the face recognition problem, we formulated an example of such a use case, the ‘backyard dog’ problem. The ‘backyard dog’, implemented by a lean network, should correctly identify members from a limited group of individuals, a ‘family’, and should distinguish between them. At the same time, the network must produce an alarm in response to an image of an individual who is not a member of the family. To produce such a network, we propose a shallowing algorithm. The algorithm takes an existing deep learning model as its input and outputs a shallowed version of it. The algorithm is non-iterative and is based on Advanced Supervised Principal Component Analysis. Performance of the algorithm is assessed in exhaustive numerical experiments. In the above use case, the ‘backyard dog’ problem, the method is capable of drastically reducing the depth of deep learning neural networks, albeit at the cost of mild performance deterioration. We developed a simple non-iterative method for shallowing down pre-trained deep networks. The method is generic in the sense that it applies to a broad class of feed-forward networks, and is based on Advanced Supervised Principal Component Analysis. The method enables generation of families of smaller-size shallower specialized networks tuned for specific operational conditions and tasks from a single larger and more universal legacy network. |
Tasks | Face Recognition |
Published | 2018-05-03 |
URL | https://arxiv.org/abs/1805.01516v3 |
https://arxiv.org/pdf/1805.01516v3.pdf | |
PWC | https://paperswithcode.com/paper/how-deep-should-be-the-depth-of-convolutional |
Repo | |
Framework | |
An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems
Title | An Optimal Rewiring Strategy for Reinforcement Social Learning in Cooperative Multiagent Systems |
Authors | Hongyao Tang, Li Wang, Zan Wang, Tim Baarslag, Jianye Hao |
Abstract | Multiagent coordination in cooperative multiagent systems (MASs) has been widely studied in both the fixed-agent repeated interaction setting and the static social learning framework. However, two aspects of dynamics in real-world multiagent scenarios are missing from existing work. First, the network topologies can be dynamic, where agents may change their connections through rewiring during the course of interactions. Second, the game matrix between each pair of agents may not be static and is usually not known a priori. Both network dynamics and game uncertainty increase the coordination difficulty among agents. In this paper, we consider a multiagent dynamic social learning environment in which each agent can choose to rewire potential partners and interact with randomly chosen neighbors in each round. We propose an optimal rewiring strategy for agents to select the most beneficial peers to interact with for the purpose of maximizing the accumulated payoff in repeated interactions. We empirically demonstrate the effectiveness and robustness of our approach by comparing it with benchmark strategies. The performance of three representative learning strategies under our social learning framework with our optimal rewiring is investigated as well. |
Tasks | |
Published | 2018-05-13 |
URL | http://arxiv.org/abs/1805.08588v1 |
http://arxiv.org/pdf/1805.08588v1.pdf | |
PWC | https://paperswithcode.com/paper/an-optimal-rewiring-strategy-for |
Repo | |
Framework | |
Crowd Counting using Deep Recurrent Spatial-Aware Network
Title | Crowd Counting using Deep Recurrent Spatial-Aware Network |
Authors | Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin |
Abstract | Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera’s perspective that causes huge appearance variations in people’s scales and rotations. Conventional methods address such challenges by resorting to fixed multi-scale architectures that are often unable to cover the largely varied scales while ignoring the rotation variations. In this paper, we propose a unified neural network framework, named Deep Recurrent Spatial-Aware Network, which adaptively addresses the two issues in a learnable spatial transform module with a region-wise refinement process. Specifically, our framework incorporates a Recurrent Spatial-Aware Refinement (RSAR) module iteratively conducting two components: i) a Spatial Transformer Network that dynamically locates an attentional region from the crowd density map and transforms it to the suitable scale and rotation for optimal crowd estimation; ii) a Local Refinement Network that refines the density map of the attended region with residual learning. Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, compared with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset WorldExpo’10 and 22.8% on the most challenging dataset UCF_CC_50. |
Tasks | Crowd Counting |
Published | 2018-07-02 |
URL | http://arxiv.org/abs/1807.00601v1 |
http://arxiv.org/pdf/1807.00601v1.pdf | |
PWC | https://paperswithcode.com/paper/crowd-counting-using-deep-recurrent-spatial |
Repo | |
Framework | |
Pose-Based Two-Stream Relational Networks for Action Recognition in Videos
Title | Pose-Based Two-Stream Relational Networks for Action Recognition in Videos |
Authors | Wei Wang, Jinjin Zhang, Chenyang Si, Liang Wang |
Abstract | Recently, pose-based action recognition has gained more and more attention due to its better performance compared with traditional appearance-based methods. However, two problems remain to be solved. First, existing pose-based methods generally recognize human actions with captured 3D human poses, which are very difficult to obtain in real scenarios. Second, few pose-based methods model the action-related objects in recognizing human-object interaction actions, in which objects play an important role. To solve the problems above, we propose a pose-based two-stream relational network (PSRN) for action recognition. In PSRN, one stream models the temporal dynamics of the targeted 2D human pose sequences which are directly extracted from raw videos, and the other stream models the action-related objects from a randomly sampled video frame. Most importantly, instead of fusing the two streams in the class score layer as before, we propose a pose-object relational network to model the relationship between human poses and action-related objects. We evaluate the proposed PSRN on two challenging benchmarks, i.e., Sub-JHMDB and PennAction. Experimental results show that our PSRN obtains state-of-the-art performance on Sub-JHMDB (80.2%) and PennAction (98.1%). Our work opens a new door to action recognition by combining 2D human pose extracted from raw video and image appearance. |
Tasks | Action Recognition In Videos, Human-Object Interaction Detection, Temporal Action Localization |
Published | 2018-05-22 |
URL | http://arxiv.org/abs/1805.08484v1 |
http://arxiv.org/pdf/1805.08484v1.pdf | |
PWC | https://paperswithcode.com/paper/pose-based-two-stream-relational-networks-for |
Repo | |
Framework | |
YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud
Title | YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud |
Authors | Waleed Ali, Sherif Abdelkarim, Mohamed Zahran, Mahmoud Zidan, Ahmad El Sallab |
Abstract | Object detection and classification in 3D is a key task in Automated Driving (AD). LiDAR sensors are employed to provide the 3D point cloud reconstruction of the surrounding environment, while the task of 3D object bounding box detection in real time remains a strong algorithmic challenge. In this paper, we build on the success of the one-shot regression meta-architecture in the 2D perspective image space and extend it to generate oriented 3D object bounding boxes from LiDAR point clouds. Our main contribution is in extending the loss function of YOLO v2 to include the yaw angle, the 3D box center in Cartesian coordinates and the height of the box as a direct regression problem. This formulation enables real-time performance, which is essential for automated driving. Our results show promising figures on the KITTI benchmark, achieving real-time performance (40 fps) on a Titan X GPU. |
Tasks | Object Detection |
Published | 2018-08-07 |
URL | http://arxiv.org/abs/1808.02350v1 |
http://arxiv.org/pdf/1808.02350v1.pdf | |
PWC | https://paperswithcode.com/paper/yolo3d-end-to-end-real-time-3d-oriented |
Repo | |
Framework | |
Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition
Title | Reversing Two-Stream Networks with Decoding Discrepancy Penalty for Robust Action Recognition |
Authors | Yunbo Wang, Zhiyu Yao, Hongyu Zhu, Mingsheng Long, Jianmin Wang, Philip S Yu |
Abstract | We discuss the robustness and generalization ability in the realm of action recognition, showing that the mainstream neural networks are not robust to disordered frames and diverse video environments. There are two possible reasons: First, existing models lack an appropriate method to overcome the inevitable decision discrepancy between multiple streams with different input modalities. Second, by doing cross-dataset experiments, we find that the optical flow features are hard to transfer, which affects the generalization ability of the two-stream neural networks. For robust action recognition, we present the Reversed Two-Stream Networks (Rev2Net), which has three properties: (1) It can learn more transferable, robust video features by reversing the multi-modality inputs as training supervisions. It outperforms all other compared models in challenging frame-shuffle experiments and cross-dataset experiments. (2) It is highlighted by an adaptive, collaborative multi-task learning approach that is applied between decoders to penalize their disagreement in the deep feature space. We name it the decoding discrepancy penalty (DDP). (3) As the decoder streams will be removed at test time, Rev2Net makes recognition decisions purely based on raw video frames. Rev2Net achieves the best results in the cross-dataset settings and competitive results on classic action recognition tasks: 94.6% for UCF-101, 71.1% for HMDB-51 and 73.3% for Kinetics. It performs even better than most methods that take extra inputs beyond raw RGB frames. |
Tasks | Multi-Task Learning, Optical Flow Estimation, Temporal Action Localization |
Published | 2018-11-20 |
URL | http://arxiv.org/abs/1811.08362v1 |
http://arxiv.org/pdf/1811.08362v1.pdf | |
PWC | https://paperswithcode.com/paper/reversing-two-stream-networks-with-decoding |
Repo | |
Framework | |
Interpretation of Natural Language Rules in Conversational Machine Reading
Title | Interpretation of Natural Language Rules in Conversational Machine Reading |
Authors | Marzieh Saeidi, Max Bartolo, Patrick Lewis, Sameer Singh, Tim Rocktäschel, Mike Sheldon, Guillaume Bouchard, Sebastian Riedel |
Abstract | Most work in machine reading focuses on question answering problems where the answer is directly expressed in the text to read. However, many real-world question answering problems require the reading of text not because it contains the literal answer, but because it contains a recipe to derive an answer together with the reader’s background knowledge. One example is the task of interpreting regulations to answer “Can I…?” or “Do I have to…?” questions such as “I am working in Canada. Do I have to carry on paying UK National Insurance?” after reading a UK government website about this topic. This task requires both the interpretation of rules and the application of background knowledge. It is further complicated due to the fact that, in practice, most questions are underspecified, and a human assistant will regularly have to ask clarification questions such as “How long have you been working abroad?” when the answer cannot be directly derived from the question and text. In this paper, we formalise this task and develop a crowd-sourcing strategy to collect 32k task instances based on real-world rules and crowd-generated questions and scenarios. We analyse the challenges of this task and assess its difficulty by evaluating the performance of rule-based and machine-learning baselines. We observe promising results when no background knowledge is necessary, and substantial room for improvement whenever background knowledge is needed. |
Tasks | Question Answering, Reading Comprehension |
Published | 2018-08-28 |
URL | http://arxiv.org/abs/1809.01494v1 |
http://arxiv.org/pdf/1809.01494v1.pdf | |
PWC | https://paperswithcode.com/paper/interpretation-of-natural-language-rules-in |
Repo | |
Framework | |
Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition
Title | Compressing Recurrent Neural Networks with Tensor Ring for Action Recognition |
Authors | Yu Pan, Jing Xu, Maolin Wang, Jinmian Ye, Fei Wang, Kun Bai, Zenglin Xu |
Abstract | Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as the memory units, which are helpful in storing information in sequential contexts. However, when dealing with high dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and huge computational cost. This makes the training of RNNs unscalable and difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, by utilizing the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and also provides a fundamental building block for RNNs in handling large input data. Experiments on real-world action recognition datasets have demonstrated the promising performance of the proposed TR-LSTM compared with the tensor train LSTM and other state-of-the-art competitors. |
Tasks | Temporal Action Localization |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07503v1 |
http://arxiv.org/pdf/1811.07503v1.pdf | |
PWC | https://paperswithcode.com/paper/compressing-recurrent-neural-networks-with |
Repo | |
Framework | |
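To make the memory argument in the abstract above concrete, the following sketch compares the parameter count of a dense input-to-hidden matrix with that of a tensor ring representation, and evaluates one entry of a tensor stored in tensor ring format. It is a generic illustration of the tensor ring format, not the TR-LSTM implementation; the mode sizes, the rank, and the function names are hypothetical.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters of a dense matrix mapping prod(in_modes) -> prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tensor_ring_params(modes, rank):
    """Parameters of a tensor ring with one (rank x n x rank) core per mode."""
    return sum(rank * n * rank for n in modes)


def tr_element(cores, index):
    """Entry of a tensor in tensor ring format.

    cores[k][i] is the rank x rank matrix (list of lists) selected by index i
    of mode k; the entry is the trace of the product of the selected matrices.
    """
    r = len(cores[0][0])
    acc = [[1.0 if a == b else 0.0 for b in range(r)] for a in range(r)]
    for core, i in zip(cores, index):
        m = core[i]
        acc = [[sum(acc[a][c] * m[c][b] for c in range(r)) for b in range(r)]
               for a in range(r)]
    return sum(acc[a][a] for a in range(r))


# Hypothetical factorisation of a 57600-dimensional input and 256 hidden units.
in_modes, out_modes = (8, 20, 20, 18), (4, 4, 4, 4)
n_dense = dense_params(in_modes, out_modes)
n_ring = tensor_ring_params(in_modes + out_modes, rank=5)

# Rank-1 toy example with two modes of size 2.
toy_cores = [[[[2.0]], [[3.0]]], [[[5.0]], [[7.0]]]]
value = tr_element(toy_cores, (1, 0))
```

With these assumed sizes the dense matrix needs 14,745,600 parameters and the ring needs 2,050, which is the kind of reduction the abstract refers to. Whether such a compressed layer remains accurate depends on the data and the chosen rank.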
Defeasible Reasoning in SROEL: from Rational Entailment to Rational Closure
Title | Defeasible Reasoning in SROEL: from Rational Entailment to Rational Closure |
Authors | Laura Giordano, Daniele Theseider Dupré |
Abstract | In this work we study a rational extension $SROEL^R T$ of the low complexity description logic SROEL, which underlies the OWL EL ontology language. The extension involves a typicality operator T, whose semantics is based on Lehmann and Magidor’s ranked models and allows for the definition of defeasible inclusions. We consider both rational entailment and minimal entailment. We show that deciding instance checking under minimal entailment is in general $\Pi^P_2$-hard, while, under rational entailment, instance checking can be computed in polynomial time. We develop a Datalog calculus for instance checking under rational entailment and exploit it, with stratified negation, for computing the rational closure of simple KBs in polynomial time. |
Tasks | |
Published | 2018-03-23 |
URL | http://arxiv.org/abs/1803.08885v1 |
http://arxiv.org/pdf/1803.08885v1.pdf | |
PWC | https://paperswithcode.com/paper/defeasible-reasoning-in-sroel-from-rational |
Repo | |
Framework | |
Measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti quantum computer
Title | Measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti quantum computer |
Authors | J. Olivares-Sánchez, J. Casanova, E. Solano, L. Lamata |
Abstract | We present an experimental realization of a measurement-based adaptation protocol with quantum reinforcement learning in a Rigetti cloud quantum computer. The experiment in this few-qubit superconducting chip faithfully reproduces the theoretical proposal, setting the first steps towards a semiautonomous quantum agent. This experiment paves the way towards quantum reinforcement learning with superconducting circuits. |
Tasks | |
Published | 2018-11-19 |
URL | http://arxiv.org/abs/1811.07594v1 |
http://arxiv.org/pdf/1811.07594v1.pdf | |
PWC | https://paperswithcode.com/paper/measurement-based-adaptation-protocol-with |
Repo | |
Framework | |
Leveraging Random Label Memorization for Unsupervised Pre-Training
Title | Leveraging Random Label Memorization for Unsupervised Pre-Training |
Authors | Vinaychandran Pondenkandath, Michele Alberti, Sammer Puran, Rolf Ingold, Marcus Liwicki |
Abstract | We present a novel approach to leverage large unlabeled datasets by pre-training state-of-the-art deep neural networks on randomly-labeled datasets. Specifically, we train the neural networks to memorize arbitrary labels for all the samples in a dataset and use these pre-trained networks as a starting point for regular supervised learning. Our assumption is that the “memorization infrastructure” learned by the network during the random-label training proves to be beneficial for the conventional supervised learning as well. We test the effectiveness of our pre-training on several video action recognition datasets (HMDB51, UCF101, Kinetics) by comparing the results of the same network with and without the random label pre-training. Our approach yields an improvement - ranging from 1.5% on UCF-101 to 5% on Kinetics - in classification accuracy, which calls for further research in this direction. |
Tasks | Temporal Action Localization |
Published | 2018-11-05 |
URL | http://arxiv.org/abs/1811.01640v1 |
http://arxiv.org/pdf/1811.01640v1.pdf | |
PWC | https://paperswithcode.com/paper/leveraging-random-label-memorization-for |
Repo | |
Framework | |
Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification
Title | Towards Adversarial Training with Moderate Performance Improvement for Neural Network Classification |
Authors | Xinhan Di, Pengqian Yu, Meng Tian |
Abstract | It has been demonstrated that deep neural networks are prone to noisy examples, particularly adversarial samples, during the inference process. The gap between robust deep learning systems in real-world applications and vulnerable neural networks is still large. Current adversarial training strategies improve the robustness against adversarial samples. However, these methods lead to accuracy reduction when the input examples are clean, which hinders their practicability. In this paper, we investigate an approach that protects neural network classification from adversarial samples and improves its accuracy when the input examples are clean. We demonstrate the versatility and effectiveness of our proposed approach on a variety of different networks and datasets. |
Tasks | |
Published | 2018-07-01 |
URL | http://arxiv.org/abs/1807.00340v1 |
http://arxiv.org/pdf/1807.00340v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-adversarial-training-with-moderate |
Repo | |
Framework | |
Escaping Plato’s Cave: 3D Shape From Adversarial Rendering
Title | Escaping Plato’s Cave: 3D Shape From Adversarial Rendering |
Authors | Philipp Henzler, Niloy Mitra, Tobias Ritschel |
Abstract | We introduce PlatonicGAN to discover the 3D structure of an object class from an unstructured collection of 2D images, i.e., where no relation between photos is known, except that they are showing instances of the same category. The key idea is to train a deep neural network to generate 3D shapes which, when rendered to images, are indistinguishable from ground truth images (for a discriminator) under various camera poses. Discriminating 2D images instead of 3D shapes allows tapping into unstructured 2D photo collections instead of relying on curated (e.g., aligned, annotated, etc.) 3D data sets. To establish constraints between 2D image observation and their 3D interpretation, we suggest a family of rendering layers that are effectively differentiable. This family includes visual hull, absorption-only (akin to x-ray), and emission-absorption. We can successfully reconstruct 3D shapes from unstructured 2D images and extensively evaluate PlatonicGAN on a range of synthetic and real data sets achieving consistent improvements over baseline methods. We further show that PlatonicGAN can be combined with 3D supervision to improve on and in some cases even surpass the quality of 3D-supervised methods. |
Tasks | |
Published | 2018-11-28 |
URL | https://arxiv.org/abs/1811.11606v3 |
https://arxiv.org/pdf/1811.11606v3.pdf | |
PWC | https://paperswithcode.com/paper/escaping-platos-cave-using-adversarial |
Repo | |
Framework | |
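The three rendering layers named in the abstract above (visual hull, absorption-only, emission-absorption) can be sketched for a single camera ray. The formulas below are common textbook forms of these image formation models and are assumptions, not the paper's exact definitions; the function names and the per-sample `alphas` and `emissions` lists are hypothetical.

```python
def visual_hull(alphas):
    """Silhouette value: the largest occupancy met along the ray."""
    return max(alphas)


def absorption_only(alphas):
    """X-ray style transmittance: the fraction of light that passes through."""
    transmittance = 1.0
    for a in alphas:
        transmittance *= (1.0 - a)
    return transmittance


def emission_absorption(alphas, emissions):
    """Front-to-back compositing of emitted light attenuated by absorption."""
    transmittance = 1.0
    radiance = 0.0
    for a, e in zip(alphas, emissions):
        radiance += transmittance * a * e
        transmittance *= (1.0 - a)
    return radiance


# One ray crossing three voxels, ordered from the camera outwards.
ray_alphas = [0.0, 0.5, 0.5]
hull = visual_hull(ray_alphas)
xray = absorption_only(ray_alphas)
shaded = emission_absorption(ray_alphas, [1.0, 1.0, 1.0])
```

All three functions use only multiplications, additions and a maximum, so they are differentiable almost everywhere with respect to the voxel values. That is the property a generator needs in order to receive gradients from a 2D discriminator.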
Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising
Title | Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising |
Authors | Yuan Yuan, Xiaojing Dong, Chen Dong, Yiwen Sun, Zhenyu Yan, Abhishek Pani |
Abstract | Predicting keyword performance, such as number of impressions, click-through rate (CTR), conversion rate (CVR), revenue per click (RPC), and cost per click (CPC), is critical for sponsored search in the online advertising industry. An interesting phenomenon is that, despite the size of the overall data, the data are very sparse at the individual unit level. To overcome the sparsity and leverage hierarchical information across the data structure, we propose a Dynamic Hierarchical Empirical Bayesian (DHEB) model that dynamically determines the hierarchy through a data-driven process and provides shrinkage-based estimations. Our method is also equipped with an efficient empirical approach to derive inferences through the hierarchy. We evaluate the proposed method on both simulated and real-world datasets and compare it with several competitive models. The results favor the proposed method among all comparisons in terms of both accuracy and efficiency. In the end, we design a two-phase system to serve prediction in real time. |
Tasks | |
Published | 2018-09-06 |
URL | http://arxiv.org/abs/1809.02213v1 |
http://arxiv.org/pdf/1809.02213v1.pdf | |
PWC | https://paperswithcode.com/paper/dynamic-hierarchical-empirical-bayes-a |
Repo | |
Framework | |
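The shrinkage-based estimation mentioned in the abstract above can be illustrated with a plain beta-binomial estimate, in which a sparse keyword's click-through rate is pulled towards the rate of its parent level in the hierarchy. This is a generic empirical Bayes sketch, not the DHEB model; `shrunk_rate`, `parent_rate` and `prior_strength` are hypothetical names, and the prior strength is fixed by hand here rather than estimated from data.

```python
def shrunk_rate(clicks, impressions, parent_rate, prior_strength):
    """Posterior mean of a rate under a Beta prior centred on parent_rate.

    prior_strength acts as a pseudo-count of impressions: the larger it is,
    the more strongly a sparse unit is pulled towards its parent.
    """
    alpha = parent_rate * prior_strength
    beta = (1.0 - parent_rate) * prior_strength
    return (clicks + alpha) / (impressions + alpha + beta)


parent_ctr = 0.02  # hypothetical CTR pooled over the parent ad group

# A brand-new keyword with no data falls back to the parent rate.
new_keyword = shrunk_rate(0, 0, parent_ctr, prior_strength=100)

# A keyword with some data lands between its raw rate (0.05) and the parent.
seen_keyword = shrunk_rate(50, 1000, parent_ctr, prior_strength=100)
```

As the number of impressions grows, the estimate approaches the keyword's own observed rate, so the prior only dominates where the data are sparse.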