Paper Group ANR 769
EV-SegNet: Semantic Segmentation for Event-based Cameras
Title | EV-SegNet: Semantic Segmentation for Event-based Cameras |
Authors | Iñigo Alonso, Ana C. Murillo |
Abstract | Event cameras, or Dynamic Vision Sensors (DVS), are very promising sensors which have shown several advantages over frame-based cameras. However, most recent work on real applications of these cameras is focused on 3D reconstruction and 6-DOF camera tracking. Deep learning based approaches, which are leading the state-of-the-art in visual recognition tasks, could potentially take advantage of the benefits of DVS, but some adaptations are still needed in order to work effectively on these cameras. This work introduces a first baseline for semantic segmentation with this kind of data. We build a semantic segmentation CNN based on state-of-the-art techniques which takes event information as the only input. Besides, we propose a novel representation for DVS data that outperforms previously used event representations for related tasks. Since there is no existing labeled dataset for this task, we propose how to automatically generate approximated semantic segmentation labels for some sequences of the DDD17 dataset, which we publish together with the model, and demonstrate they are valid to train a model for DVS data only. We compare our results on semantic segmentation from DVS data with results using corresponding grayscale images, demonstrating how they are complementary and worth combining. |
Tasks | 3D Reconstruction, Semantic Segmentation |
Published | 2018-11-29 |
URL | http://arxiv.org/abs/1811.12039v1 |
http://arxiv.org/pdf/1811.12039v1.pdf | |
PWC | https://paperswithcode.com/paper/ev-segnet-semantic-segmentation-for-event |
Repo | |
Framework | |
Training Recurrent Neural Networks via Dynamical Trajectory-Based Optimization
Title | Training Recurrent Neural Networks via Dynamical Trajectory-Based Optimization |
Authors | Hamid Khodabandehlou, M. Sami Fadali |
Abstract | This paper introduces a new method to train recurrent neural networks using dynamical trajectory-based optimization. The optimization method utilizes a projected gradient system (PGS) and a quotient gradient system (QGS) to determine the feasible regions of an optimization problem and search the feasible regions for local minima. By exploring the feasible regions, local minima are identified and the local minimum with the lowest cost is chosen as the global minimum of the optimization problem. Lyapunov theory is used to prove the stability of the local minima and their stability in the presence of measurement errors. Numerical examples show that the new approach provides better results than genetic algorithm and error backpropagation (EBP) trained networks. |
Tasks | |
Published | 2018-05-10 |
URL | http://arxiv.org/abs/1805.04152v1 |
http://arxiv.org/pdf/1805.04152v1.pdf | |
PWC | https://paperswithcode.com/paper/training-recurrent-neural-networks-via |
Repo | |
Framework | |
Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning
Title | Multi-modal Feedback for Affordance-driven Interactive Reinforcement Learning |
Authors | Francisco Cruz, German I. Parisi, Stefan Wermter |
Abstract | Interactive reinforcement learning (IRL) extends traditional reinforcement learning (RL) by allowing an agent to interact with parent-like trainers during a task. In this paper, we present an IRL approach using dynamic audio-visual input in terms of vocal commands and hand gestures as feedback. Our architecture integrates multi-modal information to provide robust commands from multiple sensory cues along with a confidence value indicating the trustworthiness of the feedback. The integration process also considers the case in which the two modalities convey incongruent information. Additionally, we modulate the influence of sensory-driven feedback in the IRL task using goal-oriented knowledge in terms of contextual affordances. We implement a neural network architecture to predict the effect of performed actions with different objects to avoid failed-states, i.e., states from which it is not possible to accomplish the task. In our experimental setup, we explore the interplay of multi-modal feedback and task-specific affordances in a robot cleaning scenario. We compare the learning performance of the agent under four different conditions: traditional RL, multi-modal IRL, and each of these two setups with the use of contextual affordances. Our experiments show that the best performance is obtained by using audio-visual feedback with affordance-modulated IRL. The obtained results demonstrate the importance of multi-modal sensory processing integrated with goal-oriented knowledge in IRL tasks. |
Tasks | |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.09991v1 |
http://arxiv.org/pdf/1807.09991v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-modal-feedback-for-affordance-driven |
Repo | |
Framework | |
Abdominal Aortic Aneurysm Segmentation with a Small Number of Training Subjects
Title | Abdominal Aortic Aneurysm Segmentation with a Small Number of Training Subjects |
Authors | Jian-Qing Zheng, Xiao-Yun Zhou, Qing-Biao Li, Celia Riga, Guang-Zhong Yang |
Abstract | Pre-operative Abdominal Aortic Aneurysm (AAA) 3D shape is critical for customized stent-graft design in Fenestrated Endovascular Aortic Repair (FEVAR). Traditional segmentation approaches implement expert-designed feature extractors while recent deep neural networks extract features automatically with multiple non-linear modules. Usually, a large training dataset is essential for applying deep learning on AAA segmentation. In this paper, the AAA was segmented using U-net with a small number (two) of training subjects. Firstly, Computed Tomography Angiography (CTA) slices were augmented with gray value variation and translation to avoid the overfitting caused by the small number of training subjects. Then, U-net was trained to segment the AAA. Dice Similarity Coefficients (DSCs) over 0.8 were achieved on the testing subjects. The PLZ, DLZ and aortic branches are all reconstructed reasonably, which will facilitate stent graft customization and help shape instantiation for intra-operative surgery navigation in FEVAR. |
Tasks | |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.02943v1 |
http://arxiv.org/pdf/1804.02943v1.pdf | |
PWC | https://paperswithcode.com/paper/abdominal-aortic-aneurysm-segmentation-with-a |
Repo | |
Framework | |
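The Dice Similarity Coefficient (DSC) reported in the abstract above measures the overlap between a predicted mask A and a reference mask B as DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch in plain Python, assuming binary masks stored as nested lists; the function name `dice_coefficient`, the empty-mask convention, and the toy masks are illustrative assumptions and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks (nested lists of 0/1)."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")
    # |A ∩ B|: pixels set to 1 in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |A| + |B|: foreground pixel counts of each mask.
    total = sum(flat_pred) + sum(flat_target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 3x3 masks, purely illustrative (not data from the paper).
pred = [[1, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
target = [[1, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
dsc = dice_coefficient(pred, target)  # 2 * 3 / (3 + 4)
```

A DSC of 1 means the two masks coincide exactly, and 0 means they share no foreground pixel.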
A Smart Security System with Face Recognition
Title | A Smart Security System with Face Recognition |
Authors | Trung Nguyen, Barth Lakshmanan, Weihua Sheng |
Abstract | Web-based technology has improved drastically in the past decade. As a result, security technology has become a major help in protecting our daily life. In this paper, we propose a robust security system based on face recognition (SoF). In particular, we develop this system to give authenticated users access to a home. The classifier is trained using a new adaptive learning method. The training data are initially collected from social networks. The accuracy of the classifier is incrementally improved as the user starts using the system. A novel method has been introduced to improve the classifier model through human interaction and social media. By using a deep learning framework, TensorFlow, it will be easy to reuse the framework and adapt it to many devices and applications. |
Tasks | Face Recognition |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.09127v1 |
http://arxiv.org/pdf/1812.09127v1.pdf | |
PWC | https://paperswithcode.com/paper/a-smart-security-system-with-face-recognition |
Repo | |
Framework | |
Near-lossless $\ell_\infty$-constrained Image Decompression via Deep Neural Network
Title | Near-lossless $\ell_\infty$-constrained Image Decompression via Deep Neural Network |
Authors | Xi Zhang, Xiaolin Wu |
Abstract | Recently a number of CNN-based techniques were proposed to remove image compression artifacts. As in other restoration applications, these techniques all learn a mapping from decompressed patches to the original counterparts under the ubiquitous $\ell_2$ metric. However, this approach is incapable of restoring distinctive image details which may be statistical outliers but have high semantic importance (e.g., tiny lesions in medical images). To overcome this weakness, we propose to incorporate an $\ell_\infty$ fidelity criterion in the design of the neural network so that no small, distinctive structures of the original image can be dropped or distorted. Experimental results demonstrate that the proposed method outperforms the state-of-the-art methods in $\ell_\infty$ error metric and perceptual quality, while being competitive in $\ell_2$ error metric as well. It can restore subtle image details that are otherwise destroyed or missed by other algorithms. Our research suggests a new machine learning paradigm of ultra high fidelity image compression that is ideally suited for applications in medicine, space, and sciences. |
Tasks | Image Compression, Image Restoration |
Published | 2018-01-18 |
URL | https://arxiv.org/abs/1801.07987v5 |
https://arxiv.org/pdf/1801.07987v5.pdf | |
PWC | https://paperswithcode.com/paper/near-lossless-l-infinity-constrained-multi |
Repo | |
Framework | |
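The contrast between the $\ell_2$ and $\ell_\infty$ error metrics in the abstract above can be illustrated with a small sketch. It assumes a hypothetical one-dimensional "image" with a single small bright detail; the pixel values and the function names `l2_error` and `linf_error` are made up for illustration and do not come from the paper. The point is only that a restoration can score better in $\ell_2$ while distorting a small detail more, which an $\ell_\infty$ bound would rule out.

```python
import math


def l2_error(original, restored):
    """Euclidean (l2) norm of the pixel-wise difference."""
    return math.sqrt(sum((o - r) ** 2 for o, r in zip(original, restored)))


def linf_error(original, restored):
    """Largest absolute pixel-wise difference (l-infinity norm)."""
    return max(abs(o - r) for o, r in zip(original, restored))


# Hypothetical 1-D "image": flat background with one small bright detail.
original = [50, 50, 50, 50, 200, 50, 50, 50]
# Restoration A matches the background exactly but dims the detail by 30.
restored_a = [50, 50, 50, 50, 170, 50, 50, 50]
# Restoration B is off by 12 at every pixel, including the detail.
restored_b = [62, 38, 62, 38, 188, 62, 38, 62]

l2_a, l2_b = l2_error(original, restored_a), l2_error(original, restored_b)
linf_a, linf_b = linf_error(original, restored_a), linf_error(original, restored_b)
```

Here restoration A has the smaller $\ell_2$ error, yet its worst-case pixel error is larger than that of restoration B, so only B would satisfy an $\ell_\infty$ bound of, say, 12.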
SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation
Title | SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation |
Authors | Xiaolin Zhang, Yunchao Wei, Yi Yang, Thomas Huang |
Abstract | One-shot semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this paper, we propose a simple yet effective Similarity Guidance network to tackle the One-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with the reference to one densely labeled support image. To obtain the robust representative feature of the support image, we firstly propose a masked average pooling strategy for producing the guidance features using only the pixels belonging to the support image. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adapted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework which can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves the mIoU score of 46.3%, which is a new state-of-the-art. |
Tasks | Semantic Segmentation |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09091v3 |
http://arxiv.org/pdf/1810.09091v3.pdf | |
PWC | https://paperswithcode.com/paper/sg-one-similarity-guidance-network-for-one |
Repo | |
Framework | |
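The two operations named in the SG-One abstract above, masked average pooling and cosine similarity, can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions, not the authors' implementation: feature maps are nested lists of shape H x W x C, the mask is assumed to mark the annotated object pixels of the support image, and all function names and toy values are hypothetical.

```python
import math


def masked_average_pooling(features, mask):
    """Average the feature vectors of the pixels where mask == 1.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine of the angle between vectors a and b (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def similarity_map(query_features, guidance):
    """Cosine similarity between the guidance vector and every query pixel."""
    return [[cosine_similarity(vec, guidance) for vec in row]
            for row in query_features]


# Toy 2x2 feature maps with 2 channels, purely illustrative.
support = [[[1.0, 0.0], [3.0, 0.0]],
           [[0.0, 5.0], [0.0, 1.0]]]
support_mask = [[1, 1],
                [0, 0]]
query = [[[2.0, 0.0], [0.0, 2.0]],
         [[1.0, 1.0], [4.0, 0.0]]]

guidance = masked_average_pooling(support, support_mask)
sim = similarity_map(query, guidance)
```

Query pixels whose features point in the same direction as the guidance vector receive a similarity close to 1, which is the signal the abstract describes as guiding the segmentation.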
Preserved Structure Across Vector Space Representations
Title | Preserved Structure Across Vector Space Representations |
Authors | Andrei Amatuni, Estelle He, Elika Bergelson |
Abstract | Certain concepts, words, and images are intuitively more similar than others (dog vs. cat, dog vs. spoon), though quantifying such similarity is notoriously difficult. Indeed, this kind of computation is likely a critical part of learning the category boundaries for words within a given language. Here, we use a set of 27 items (e.g. ‘dog’) that are highly common in infants’ input, and use both image- and word-based algorithms to independently compute similarity among them. We find three key results. First, the pairwise item similarities derived within image-space and word-space are correlated, suggesting preserved structure among these extremely different representational formats. Second, the closest ‘neighbors’ for each item, within each space, showed significant overlap (e.g. both found ‘egg’ as a neighbor of ‘apple’). Third, items with the most overlapping neighbors are later-learned by infants and toddlers. We conclude that this approach, which does not rely on human ratings of similarity, may nevertheless reflect stable within-class structure across these two spaces. We speculate that such invariance might aid lexical acquisition, by serving as an informative marker of category boundaries. |
Tasks | |
Published | 2018-02-02 |
URL | http://arxiv.org/abs/1802.00840v2 |
http://arxiv.org/pdf/1802.00840v2.pdf | |
PWC | https://paperswithcode.com/paper/preserved-structure-across-vector-space |
Repo | |
Framework | |
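The first result in the abstract above, that pairwise item similarities in image-space and word-space are correlated, can be sketched as a comparison of two similarity lists. The sketch below assumes cosine similarity within each space and a Pearson correlation across spaces; the abstract does not specify either choice, and the three-item vocabulary and all vector values are made up for illustration.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pairwise_similarities(embeddings):
    """Cosine similarity for every unordered item pair, in a fixed (sorted) order."""
    items = sorted(embeddings)
    return [cosine(embeddings[a], embeddings[b]) for a, b in combinations(items, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up embeddings; the two spaces may have different dimensionality.
word_space = {"dog": [1.0, 0.1], "cat": [0.9, 0.2], "spoon": [0.1, 1.0]}
image_space = {"dog": [0.8, 0.3, 0.1], "cat": [0.7, 0.4, 0.1], "spoon": [0.1, 0.2, 0.9]}

word_sims = pairwise_similarities(word_space)
image_sims = pairwise_similarities(image_space)
r = pearson(word_sims, image_sims)
```

Because only the pairwise similarities are compared, the two spaces never need to share a coordinate system or a dimensionality, which is what makes this kind of cross-representation comparison possible.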
ViewpointS: towards a Collective Brain
Title | ViewpointS: towards a Collective Brain |
Authors | Philippe Lemoisson, Stefano A. Cerri |
Abstract | Tracing knowledge acquisition and linking learning events to interaction between peers is a major challenge of our times. We have conceived, designed and evaluated a new paradigm for constructing and using collective knowledge by Web interactions that we called ViewpointS. By exploiting the similarity with Edelman’s Theory of Neuronal Group Selection (TNGS), we conjecture that it may be metaphorically considered a Collective Brain, especially effective in the case of trans-disciplinary representations. Far from being without doubts, in the paper we present the reasons (and the limits) of our proposal that aims to become a useful integrating tool for future quantitative explorations of individual as well as collective learning at different degrees of granularity. We are therefore challenging each of the current approaches: the logical one in the semantic Web, the statistical one in mining and deep learning, the social one in recommender systems based on authority and trust; not in each of their own preferred field of operation, rather in their integration weaknesses far from the holistic and dynamic behavior of the human brain. |
Tasks | Recommendation Systems |
Published | 2018-09-03 |
URL | http://arxiv.org/abs/1809.00564v1 |
http://arxiv.org/pdf/1809.00564v1.pdf | |
PWC | https://paperswithcode.com/paper/viewpoints-towards-a-collective-brain |
Repo | |
Framework | |
Data Poisoning Attack against Unsupervised Node Embedding Methods
Title | Data Poisoning Attack against Unsupervised Node Embedding Methods |
Authors | Mingjie Sun, Jian Tang, Huichen Li, Bo Li, Chaowei Xiao, Yao Chen, Dawn Song |
Abstract | Unsupervised node embedding methods (e.g., DeepWalk, LINE, and node2vec) have attracted growing interest given their simplicity and effectiveness. However, although these methods have proved effective in a variety of applications, none of the existing work has analyzed their robustness. This could be very risky if these methods are attacked by an adversarial party. In this paper, we take the task of link prediction as an example, which is one of the most fundamental problems for graph analysis, and introduce a data poisoning attack on node embedding methods. We give a complete characterization of the attacker’s utilities and present efficient solutions to adversarial attacks for two popular node embedding methods: DeepWalk and LINE. We evaluate our proposed attack model on multiple real-world graphs. Experimental results show that our proposed model can significantly affect the results of link prediction by slightly changing the graph structures (e.g., adding or removing a few edges). We also show that our proposed model is very general and can be transferable across different embedding methods. Finally, we conduct a case study on a coauthor network to better understand our attack method. |
Tasks | Data Poisoning, Link Prediction |
Published | 2018-10-30 |
URL | http://arxiv.org/abs/1810.12881v2 |
http://arxiv.org/pdf/1810.12881v2.pdf | |
PWC | https://paperswithcode.com/paper/data-poisoning-attack-against-unsupervised |
Repo | |
Framework | |
Episodic Memory Deep Q-Networks
Title | Episodic Memory Deep Q-Networks |
Authors | Zichuan Lin, Tianqi Zhao, Guangwen Yang, Lintao Zhang |
Abstract | Reinforcement learning (RL) algorithms have made huge progress in recent years by leveraging the power of deep neural networks (DNN). Despite this success, deep RL algorithms are known to be sample inefficient, often requiring many rounds of interaction with the environment to obtain satisfactory performance. Recently, episodic memory based RL has attracted attention due to its ability to latch onto good actions quickly. In this paper, we present a simple yet effective biologically inspired RL algorithm called Episodic Memory Deep Q-Networks (EMDQN), which leverages episodic memory to supervise an agent during training. Experiments show that our proposed method can lead to better sample efficiency and is more likely to find good policies. It only requires 1/5 of the interactions of DQN to achieve many state-of-the-art performances on Atari games, significantly outperforming regular DQN and other episodic memory based RL algorithms. |
Tasks | Atari Games |
Published | 2018-05-19 |
URL | http://arxiv.org/abs/1805.07603v1 |
http://arxiv.org/pdf/1805.07603v1.pdf | |
PWC | https://paperswithcode.com/paper/episodic-memory-deep-q-networks |
Repo | |
Framework | |
Learning Student Networks via Feature Embedding
Title | Learning Student Networks via Feature Embedding |
Authors | Hanting Chen, Yunhe Wang, Chang Xu, Chao Xu, Dacheng Tao |
Abstract | Deep convolutional neural networks have been widely used in numerous applications, but their demanding storage and computational resource requirements prevent their application on mobile devices. Knowledge distillation aims to optimize a portable student network by taking the knowledge from a well-trained heavy teacher network. Traditional teacher-student based methods rely on additional fully-connected layers to bridge intermediate layers of teacher and student networks, which brings in a large number of auxiliary parameters. In contrast, this paper aims to propagate information from teacher to student without introducing new variables which need to be optimized. We regard the teacher-student paradigm from a new perspective of feature embedding. By introducing the locality preserving loss, the student network is encouraged to generate low-dimensional features which inherit intrinsic properties of their corresponding high-dimensional features from the teacher network. The resulting portable network can thus naturally maintain performance comparable to that of the teacher network. Theoretical analysis is provided to justify the lower computation complexity of the proposed method. Experiments on benchmark datasets and well-trained networks suggest that the proposed algorithm is superior to state-of-the-art teacher-student learning methods in terms of computational and storage complexity. |
Tasks | |
Published | 2018-12-17 |
URL | http://arxiv.org/abs/1812.06597v1 |
http://arxiv.org/pdf/1812.06597v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-student-networks-via-feature |
Repo | |
Framework | |
A General Pipeline for 3D Detection of Vehicles
Title | A General Pipeline for 3D Detection of Vehicles |
Authors | Xinxin Du, Marcelo H. Ang Jr., Sertac Karaman, Daniela Rus |
Abstract | Autonomous driving requires 3D perception of vehicles and other objects in the environment. Most current methods support only 2D vehicle detection. This paper proposes a flexible pipeline to adopt any 2D detection network and fuse it with a 3D point cloud to generate 3D information with minimum changes to the 2D detection networks. To identify the 3D box, an effective model fitting algorithm is developed based on generalised car models and score maps. A two-stage convolutional neural network (CNN) is proposed to refine the detected 3D box. This pipeline is tested on the KITTI dataset using two different 2D detection networks. The 3D detection results based on these two networks are similar, demonstrating the flexibility of the proposed pipeline. The results rank second among the 3D detection algorithms, indicating its competence in 3D detection. |
Tasks | 3D Object Detection, Autonomous Driving |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1803.00387v1 |
http://arxiv.org/pdf/1803.00387v1.pdf | |
PWC | https://paperswithcode.com/paper/a-general-pipeline-for-3d-detection-of |
Repo | |
Framework | |
Learning a Robust Society of Tracking Parts using Co-occurrence Constraints
Title | Learning a Robust Society of Tracking Parts using Co-occurrence Constraints |
Authors | Elena Burceanu, Marius Leordeanu |
Abstract | Object tracking is an essential problem in computer vision that has been researched for several decades. One of the main challenges in tracking is to adapt to object appearance changes over time and avoid drifting to background clutter. We address this challenge by proposing a deep neural network composed of different parts, which functions as a society of tracking parts. They work in conjunction according to a certain policy and learn from each other in a robust manner, using co-occurrence constraints that ensure robust inference and learning. From a structural point of view, our network is composed of two main pathways. One pathway is more conservative. It carefully monitors a large set of simple tracker parts learned as linear filters over deep feature activation maps. It assigns the parts different roles. It promotes the reliable ones and removes the inconsistent ones. We learn these filters simultaneously in an efficient way, with a single closed-form formulation, for which we propose novel theoretical properties. The second pathway is more progressive. It is learned completely online and thus it is able to better model object appearance changes. In order to adapt in a robust manner, it is learned only on highly confident frames, which are decided using co-occurrences with the first pathway. Thus, our system has the full benefit of two main approaches in tracking. The larger set of simpler filter parts offers robustness, while the full deep network learned online provides adaptability to change. As shown in the experimental section, our approach achieves state-of-the-art performance on the challenging VOT17 benchmark, outperforming the published methods both on the general EAO metric and in the number of fails, by a significant margin. |
Tasks | Object Tracking |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01771v2 |
http://arxiv.org/pdf/1804.01771v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-a-robust-society-of-tracking-parts |
Repo | |
Framework | |
A Human Mixed Strategy Approach to Deep Reinforcement Learning
Title | A Human Mixed Strategy Approach to Deep Reinforcement Learning |
Authors | Ngoc Duy Nguyen, Saeid Nahavandi, Thanh Nguyen |
Abstract | In 2015, Google’s DeepMind announced an advancement in creating an autonomous agent based on deep reinforcement learning (DRL) that could beat a professional player in a series of 49 Atari games. However, the current manifestation of DRL is still immature, and has significant drawbacks. One of DRL’s imperfections is its lack of “exploration” during the training process, especially when working with high-dimensional problems. In this paper, we propose a mixed strategy approach that mimics human behavior when interacting with the environment, and creates a “thinking” agent that allows for more efficient exploration in the DRL training process. The simulation results based on the Breakout game show that our scheme achieves a higher probability of obtaining a maximum score than does the baseline DRL algorithm, i.e., the asynchronous advantage actor-critic method. The proposed scheme therefore can be applied effectively to solving a complicated task in a real-world application. |
Tasks | Atari Games, Efficient Exploration |
Published | 2018-04-05 |
URL | http://arxiv.org/abs/1804.01874v1 |
http://arxiv.org/pdf/1804.01874v1.pdf | |
PWC | https://paperswithcode.com/paper/a-human-mixed-strategy-approach-to-deep |
Repo | |
Framework | |