January 31, 2020

Paper Group ANR 106

Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints


Title	Online Continuous DR-Submodular Maximization with Long-Term Budget Constraints
Authors	Omid Sadeghi, Maryam Fazel
Abstract	In this paper, we study a class of online optimization problems with long-term budget constraints where the objective functions are not necessarily concave (nor convex) but they instead satisfy the Diminishing Returns (DR) property. Specifically, a sequence of monotone DR-submodular objective functions $\{f_t(x)\}_{t=1}^T$ and monotone linear budget functions $\{\langle p_t,x \rangle\}_{t=1}^T$ arrive over time and assuming a total targeted budget $B_T$, the goal is to choose points $x_t$ at each time $t\in\{1,\dots,T\}$, without knowing $f_t$ and $p_t$ on that step, to achieve sub-linear regret bound while the total budget violation $\sum_{t=1}^T \langle p_t,x_t \rangle -B_T$ is sub-linear as well. Prior work has shown that achieving sub-linear regret is impossible if the budget functions are chosen adversarially. Therefore, we modify the notion of regret by comparing the agent against a $(1-\frac{1}{e})$-approximation to the best fixed decision in hindsight which satisfies the budget constraint proportionally over any window of length $W$. We propose the Online Saddle Point Hybrid Gradient (OSPHG) algorithm to solve this class of online problems. For $W=T$, we recover the aforementioned impossibility result. However, when $W=o(T)$, we show that it is possible to obtain sub-linear bounds for both the $(1-\frac{1}{e})$-regret and the total budget violation.
Tasks
Published	2019-06-30
URL	https://arxiv.org/abs/1907.00316v1
PDF	https://arxiv.org/pdf/1907.00316v1.pdf
PWC	https://paperswithcode.com/paper/online-continuous-dr-submodular-maximization
Repo
Framework
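
As a small illustration of the quantity the abstract above calls the total budget violation, $\sum_{t=1}^T \langle p_t,x_t \rangle -B_T$, the following sketch evaluates it for a short, made-up sequence. The function name `budget_violation` and the numbers are assumptions chosen for illustration; this is not the OSPHG algorithm and does not come from the paper.

```python
from typing import Sequence


def budget_violation(
    prices: Sequence[Sequence[float]],
    decisions: Sequence[Sequence[float]],
    total_budget: float,
) -> float:
    """Return sum_t <p_t, x_t> - B_T for given price vectors and decisions."""
    spent = sum(
        sum(p_i * x_i for p_i, x_i in zip(p_t, x_t))  # inner product <p_t, x_t>
        for p_t, x_t in zip(prices, decisions)
    )
    return spent - total_budget


# Hypothetical data: T = 3 rounds, decisions in two dimensions.
prices = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]
decisions = [[0.5, 0.25], [1.0, 1.0], [0.25, 0.5]]

# A positive value means the targeted budget B_T was exceeded.
print(budget_violation(prices, decisions, total_budget=2.5))
```

A positive result means the decisions overspent the targeted budget; the abstract asks for this quantity to grow sub-linearly in $T$.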

Generating Adversarial Perturbation with Root Mean Square Gradient


Title	Generating Adversarial Perturbation with Root Mean Square Gradient
Authors	Yatie Xiao, Chi-Man Pun, Jizhe Zhou
Abstract	We focus our attention on the problem of generating adversarial perturbations based on the gradient in image classification domain
Tasks	Image Classification
Published	2019-01-13
URL	https://arxiv.org/abs/1901.03706v5
PDF	https://arxiv.org/pdf/1901.03706v5.pdf
PWC	https://paperswithcode.com/paper/generating-adversarial-perturbation-with-root
Repo
Framework

Streamlined Dense Video Captioning


Title	Streamlined Dense Video Captioning
Authors	Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han
Abstract	Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events. Most existing approaches handle this problem by first detecting event proposals from a video and then captioning a subset of the proposals. As a result, the generated sentences are prone to be redundant or inconsistent since they fail to consider temporal dependency between events. To tackle this challenge, we propose a novel dense video captioning framework, which models temporal dependency across events in a video explicitly and leverages visual and linguistic context from prior events for coherent storytelling. This objective is achieved by 1) integrating an event sequence generation network to select a sequence of event proposals adaptively, and 2) feeding the sequence of event proposals to our sequential video captioning network, which is trained by reinforcement learning with two-level rewards at both event and episode levels for better context modeling. The proposed technique achieves outstanding performance on the ActivityNet Captions dataset in most metrics.
Tasks	Dense Video Captioning, Video Captioning
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03870v1
PDF	http://arxiv.org/pdf/1904.03870v1.pdf
PWC	https://paperswithcode.com/paper/streamlined-dense-video-captioning
Repo
Framework

Towards Reliable, Automated General Movement Assessment for Perinatal Stroke Screening in Infants Using Wearable Accelerometers


Title	Towards Reliable, Automated General Movement Assessment for Perinatal Stroke Screening in Infants Using Wearable Accelerometers
Authors	Yan Gao, Yang Long, Yu Guan, Anna Basu, Jessica Baggaley, Thomas Ploetz
Abstract	Perinatal stroke (PS) is a serious condition that, if undetected and thus untreated, often leads to life-long disability, in particular Cerebral Palsy (CP). In clinical settings, Prechtl’s General Movement Assessment (GMA) can be used to classify infant movements using a Gestalt approach, identifying infants at high risk of developing PS. Training and maintenance of assessment skills are essential and expensive for the correct use of GMA, yet many practitioners lack these skills, preventing larger-scale screening and leading to significant risks of missing opportunities for early detection and intervention for affected infants. We present an automated approach to GMA, based on body-worn accelerometers and a novel sensor data analysis method, Discriminative Pattern Discovery (DPD), that is designed to cope with scenarios where only coarse annotations of data are available for model training. We demonstrate the effectiveness of our approach in a study with 34 newborns (21 typically developing infants and 13 PS infants with abnormal movements). Our method is able to correctly recognise the trials with abnormal movements with at least the accuracy that is required by newly trained human annotators (75%), which is encouraging towards our ultimate goal of an automated PS screening system that can be used population-wide.
Tasks
Published	2019-02-21
URL	http://arxiv.org/abs/1902.08068v1
PDF	http://arxiv.org/pdf/1902.08068v1.pdf
PWC	https://paperswithcode.com/paper/towards-reliable-automated-general-movement
Repo
Framework

Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop


Title	Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop
Authors	Afra Alishahi, Grzegorz Chrupała, Tal Linzen
Abstract	The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner-workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category.
Tasks
Published	2019-04-05
URL	http://arxiv.org/abs/1904.04063v1
PDF	http://arxiv.org/pdf/1904.04063v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-and-interpreting-neural-networks
Repo
Framework

NetTailor: Tuning the Architecture, Not Just the Weights


Title	NetTailor: Tuning the Architecture, Not Just the Weights
Authors	Pedro Morgado, Nuno Vasconcelos
Abstract	Real-world applications of object recognition often require the solution of multiple tasks in a single platform. Under the standard paradigm of network fine-tuning, an entirely new CNN is learned per task, and the final network size is independent of task complexity. This is wasteful, since simple tasks require smaller networks than more complex tasks, and limits the number of tasks that can be solved simultaneously. To address these problems, we propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. Besides minimizing classification error, the new network is trained to mimic the internal activations of a strong unconstrained CNN, and minimize its complexity by the combination of 1) a soft-attention mechanism over blocks and 2) complexity regularization constraints. In this way, NetTailor can adapt the network architecture, not just its weights, to the target task. Experiments show that networks adapted to simple tasks, such as character or traffic sign recognition, become significantly smaller than those adapted to hard tasks, such as fine-grained recognition. More importantly, due to the modular nature of the procedure, this reduction in network complexity is achieved without compromise of either parameter sharing across tasks, or classification accuracy.
Tasks	Object Recognition, Traffic Sign Recognition, Transfer Learning
Published	2019-06-29
URL	https://arxiv.org/abs/1907.00274v1
PDF	https://arxiv.org/pdf/1907.00274v1.pdf
PWC	https://paperswithcode.com/paper/nettailor-tuning-the-architecture-not-just-1
Repo
Framework

Integrating Knowledge and Reasoning in Image Understanding


Title	Integrating Knowledge and Reasoning in Image Understanding
Authors	Somak Aditya, Yezhou Yang, Chitta Baral
Abstract	Deep learning based data-driven approaches have been successfully applied in various image understanding applications ranging from object recognition and semantic segmentation to visual question answering. However, the lack of knowledge integration as well as higher-level reasoning capabilities in these methods still poses a hindrance. In this work, we present a brief survey of a few representative reasoning mechanisms, knowledge integration methods and their corresponding image understanding applications developed by various groups of researchers, approaching the problem from a variety of angles. Furthermore, we discuss key efforts on integrating external knowledge with neural networks. Taking cues from these efforts, we conclude by discussing potential pathways to improve reasoning capabilities.
Tasks	Object Recognition, Question Answering, Semantic Segmentation, Visual Question Answering
Published	2019-06-24
URL	https://arxiv.org/abs/1906.09954v1
PDF	https://arxiv.org/pdf/1906.09954v1.pdf
PWC	https://paperswithcode.com/paper/integrating-knowledge-and-reasoning-in-image
Repo
Framework

A shallow residual neural network to predict the visual cortex response


Title	A shallow residual neural network to predict the visual cortex response
Authors	Anne-Ruth José Meijer, Arnoud Visser
Abstract	Understanding how the visual cortex of the human brain really works is still an open problem for science today. A better understanding of natural intelligence could also benefit object-recognition algorithms based on convolutional neural networks. In this paper we demonstrate the asset of using a shallow residual neural network for this task. The benefit of this approach is that earlier stages of the network can be accurately trained, which allows us to add more layers at the earlier stage. With this additional layer the prediction of the visual brain activity improves from $10.4\%$ (block 1) to $15.53\%$ (last fully connected layer). By training the network for more than 10 epochs this improvement can become even larger.
Tasks	Object Recognition
Published	2019-06-27
URL	https://arxiv.org/abs/1906.11578v1
PDF	https://arxiv.org/pdf/1906.11578v1.pdf
PWC	https://paperswithcode.com/paper/a-shallow-residual-neural-network-to-predict
Repo
Framework

AlignFlow: Cycle Consistent Learning from Multiple Domains via Normalizing Flows


Title	AlignFlow: Cycle Consistent Learning from Multiple Domains via Normalizing Flows
Authors	Aditya Grover, Christopher Chute, Rui Shu, Zhangjie Cao, Stefano Ermon
Abstract	Given datasets from multiple domains, a key challenge is to efficiently exploit these data sources for modeling a target domain. Variants of this problem have been studied in many contexts, such as cross-domain translation and domain adaptation. We propose AlignFlow, a generative modeling framework that models each domain via a normalizing flow. The use of normalizing flows allows for a) flexibility in specifying learning objectives via adversarial training, maximum likelihood estimation, or a hybrid of the two methods; and b) learning and exact inference of a shared representation in the latent space of the generative model. We derive a uniform set of conditions under which AlignFlow is marginally-consistent for the different learning objectives. Furthermore, we show that AlignFlow guarantees exact cycle consistency in mapping datapoints from a source domain to target and back to the source domain. Empirically, AlignFlow outperforms relevant baselines on image-to-image translation and unsupervised domain adaptation and can be used to simultaneously interpolate across the various domains using the learned representation.
Tasks	Density Estimation, Domain Adaptation, Image-to-Image Translation, Unsupervised Domain Adaptation
Published	2019-05-30
URL	https://arxiv.org/abs/1905.12892v2
PDF	https://arxiv.org/pdf/1905.12892v2.pdf
PWC	https://paperswithcode.com/paper/alignflow-cycle-consistent-learning-from
Repo
Framework
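
The exact cycle consistency mentioned in the AlignFlow abstract follows from each domain being tied to a shared latent space by an invertible map. The toy sketch below illustrates that idea only; it is not the authors' implementation. It assumes one-dimensional affine maps as stand-ins for normalizing flows, and the names `AffineFlow` and `translate` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AffineFlow:
    """Toy invertible map between a data domain and a shared latent space."""

    scale: float  # must be non-zero so the map stays invertible
    shift: float

    def to_latent(self, x: float) -> float:
        # Inverse direction: data point -> latent code.
        return (x - self.shift) / self.scale

    def from_latent(self, z: float) -> float:
        # Forward direction: latent code -> data point.
        return self.scale * z + self.shift


def translate(x: float, source: AffineFlow, target: AffineFlow) -> float:
    """Map x from the source domain to the target domain via the latent space."""
    return target.from_latent(source.to_latent(x))


# Hypothetical parameters for two domains A and B.
flow_a = AffineFlow(scale=2.0, shift=1.0)
flow_b = AffineFlow(scale=0.5, shift=-3.0)

x_a = 4.0
x_b = translate(x_a, flow_a, flow_b)      # A -> latent -> B
x_back = translate(x_b, flow_b, flow_a)   # B -> latent -> A

# The round trip recovers the input up to floating-point error.
print(x_b, x_back, math.isclose(x_back, x_a))
```

Because both directions reuse the same invertible maps, the round trip A to B to A composes to the identity by construction, rather than relying on a learned cycle-consistency penalty.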

Reducing the dilution: An analysis of the information sensitiveness of capsule network with a practical improvement method


Title	Reducing the dilution: An analysis of the information sensitiveness of capsule network with a practical improvement method
Authors	Zonglin Yang, Xinggang Wang
Abstract	Capsule networks have shown various advantages over convolutional neural networks (CNNs). They keep more precise spatial information than CNNs, use equivariance instead of invariance during inference, and have high potential to become a new effective tool for visual tasks. However, current capsule networks have incompatible performance with CNNs when facing datasets with background and complex target objects, and they lack a universal and efficient regularization method. We identify a main reason for the incompatible performance as the conflict between the information sensitiveness of capsule networks and the unreasonably high activation value distribution of capsules in the primary capsule layer. Correspondingly, we propose a practical improvement method that restrains the activation value of capsules in the primary capsule layer to suppress non-informative capsules and highlight discriminative capsules. In the experiments, the method achieves better performance on various mainstream datasets. In addition, the proposed improvement method can be seen as a suitable, simple and efficient regularization method that can be generally used in capsule networks.
Tasks
Published	2019-03-25
URL	https://arxiv.org/abs/1903.10588v3
PDF	https://arxiv.org/pdf/1903.10588v3.pdf
PWC	https://paperswithcode.com/paper/reducing-the-dilution-analysis-of-the
Repo
Framework

Long-Duration Fully Autonomous Operation of Rotorcraft Unmanned Aerial Systems for Remote-Sensing Data Acquisition


Title	Long-Duration Fully Autonomous Operation of Rotorcraft Unmanned Aerial Systems for Remote-Sensing Data Acquisition
Authors	Danylo Malyuta, Christian Brommer, Daniel Hentzen, Thomas Stastny, Roland Siegwart, Roland Brockers
Abstract	Recent applications of unmanned aerial systems (UAS) to precision agriculture have shown increased ease and efficiency in data collection at precise remote locations. However, further enhancement of the field requires operation over long periods of time, e.g. days or weeks. This has so far been impractical due to the limited flight times of such platforms and the requirement of humans in the loop for operation. To overcome these limitations, we propose a fully autonomous rotorcraft UAS that is capable of performing repeated flights for long-term observation missions without any human intervention. We address two key technologies that are critical for such a system: full platform autonomy to enable mission execution independently from human operators and the ability of vision-based precision landing on a recharging station for automated energy replenishment. High-level autonomous decision making is implemented as a hierarchy of master and slave state machines. Vision-based precision landing is enabled by estimating the landing pad’s pose using a bundle of AprilTag fiducials configured for detection from a wide range of altitudes. We provide an extensive evaluation of the landing pad pose estimation accuracy as a function of the bundle’s geometry. The functionality of the complete system is demonstrated through two indoor experiments with a duration of 11 and 10.6 hours, and one outdoor experiment with a duration of 4 hours. The UAS executed 16, 48 and 22 flights respectively during these experiments. In the outdoor experiment, the ratio between flying to collect data and charging was 1 to 10, which is similar to past work in this domain. All flights were fully autonomous with no human in the loop. To the best of our knowledge, this is the first research publication about the long-term outdoor operation of a quadrotor system with no human interaction.
Tasks	Decision Making, Pose Estimation
Published	2019-08-18
URL	https://arxiv.org/abs/1908.06381v1
PDF	https://arxiv.org/pdf/1908.06381v1.pdf
PWC	https://paperswithcode.com/paper/long-duration-fully-autonomous-operation-of
Repo
Framework

Generalizing Energy-based Generative ConvNets from Particle Evolution Perspective


Title	Generalizing Energy-based Generative ConvNets from Particle Evolution Perspective
Authors	Yang Wu, Xu Cai, Pengxu Wei, Guanbin Li, Liang Lin
Abstract	Compared with Generative Adversarial Networks (GAN), Energy-Based generative Models (EBMs) possess two appealing properties: i) they can be directly optimized without requiring an auxiliary network during the learning and synthesizing; ii) they can better approximate the underlying distribution of the observed data by explicitly learning potential functions. This paper studies a branch of EBMs, i.e., energy-based Generative ConvNets (GCNs), which minimize their energy function defined by a bottom-up ConvNet. From the perspective of particle physics, we solve the problem of unstable energy dissipation that might damage the quality of the synthesized samples during the maximum likelihood learning. Specifically, we first establish a connection between the classical FRAME model [1] and a dynamic physics process and generalize the GCN in discrete flow with a certain metric measure from the particle perspective. To address the KL-vanishing issue, we then reformulate GCN from the KL discrete flow with KL divergence measure to a Jordan-Kinderlehrer-Otto (JKO) discrete flow with Wasserstein distance metric and derive a Wasserstein GCN (wGCN). Based on these theoretical studies on GCN, we finally derive a Generalized GCN (GGCN) to further improve the model generalization and learning capability. GGCN introduces a hidden space mapping strategy by employing a normal distribution for the reference distribution to address the learning bias issue. Due to MCMC sampling in GCNs, it still suffers from a serious time-consuming issue when sampling steps increase; thus a trainable non-linear upsampling function and an amortized learning are proposed to improve the learning efficiency. Our proposed GGCN is trained in a symmetrical learning manner. Our method surpasses the existing models in both model stability and the quality of generated samples on several widely-used face and natural image datasets.
Tasks
Published	2019-10-31
URL	https://arxiv.org/abs/1910.14216v5
PDF	https://arxiv.org/pdf/1910.14216v5.pdf
PWC	https://paperswithcode.com/paper/generalizing-energy-based-generative-convnets
Repo
Framework

Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator


Title	Analyzing the Variance of Policy Gradient Estimators for the Linear-Quadratic Regulator
Authors	James A. Preiss, Sébastien M. R. Arnold, Chen-Yu Wei, Marius Kloft
Abstract	We study the variance of the REINFORCE policy gradient estimator in environments with continuous state and action spaces, linear dynamics, quadratic cost, and Gaussian noise. These simple environments allow us to derive bounds on the estimator variance in terms of the environment and noise parameters. We compare the predictions of our bounds to the empirical variance in simulation experiments.
Tasks
Published	2019-10-02
URL	https://arxiv.org/abs/1910.01249v1
PDF	https://arxiv.org/pdf/1910.01249v1.pdf
PWC	https://paperswithcode.com/paper/analyzing-the-variance-of-policy-gradient
Repo
Framework
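
To make the setting in the abstract above concrete, the sketch below measures the empirical variance of a REINFORCE (score-function) gradient estimate on a scalar linear-quadratic problem with a Gaussian policy. It is a minimal illustration under assumed parameters, not the paper's experimental setup or its bounds; the function names and all default values (`a`, `b`, `q`, `r`, `policy_std`, `noise_std`, `horizon`) are hypothetical.

```python
import random
import statistics


def reinforce_gradient(k, rng, a=0.9, b=1.0, q=1.0, r=0.1,
                       policy_std=0.5, noise_std=0.1, horizon=10):
    """One REINFORCE sample of d(expected cost)/dk for the policy u = -k*x + noise."""
    x = 1.0             # assumed fixed initial state
    total_cost = 0.0
    score = 0.0         # running sum of d/dk log pi(u_t | x_t)
    for _ in range(horizon):
        eps = rng.gauss(0.0, 1.0)
        u = -k * x + policy_std * eps
        # For u ~ N(-k*x, policy_std^2): d/dk log pi = -(u + k*x) * x / policy_std^2,
        # which equals -eps * x / policy_std.
        score += -eps * x / policy_std
        total_cost += q * x * x + r * u * u               # quadratic stage cost
        x = a * x + b * u + rng.gauss(0.0, noise_std)     # linear dynamics, Gaussian noise
    return total_cost * score


def empirical_variance(k, num_samples=2000, seed=0):
    """Return (sample mean, sample variance) of the gradient estimator."""
    rng = random.Random(seed)
    samples = [reinforce_gradient(k, rng) for _ in range(num_samples)]
    return statistics.mean(samples), statistics.variance(samples)


mean_grad, var_grad = empirical_variance(k=0.5)
print(f"mean gradient estimate: {mean_grad:.3f}, empirical variance: {var_grad:.3f}")
```

Changing `horizon`, `policy_std`, or `noise_std` and rerunning gives a quick feel for how the estimator's spread depends on the environment and noise parameters, which is the dependence the paper bounds analytically.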

Challenges in Designing Datasets and Validation for Autonomous Driving


Title	Challenges in Designing Datasets and Validation for Autonomous Driving
Authors	Michal Uricar, David Hurych, Pavel Krizek, Senthil Yogamani
Abstract	Autonomous driving has received a lot of attention in the last decade and will remain a hot topic at least until the first successful certification of a car with Level 5 autonomy. There are many public datasets in the academic community. However, they are far from what a robust industrial production system needs. There is a large gap between the academic and industrial settings, and getting from a research prototype built on public datasets to a deployable solution is a long and challenging path. In this paper, we focus on bad practices that often occur in autonomous driving from an industrial deployment perspective. Data design deserves at least the same amount of attention as model design. Very little attention is paid to these issues in the scientific community, and we hope this paper encourages better formalization of dataset design. More specifically, we focus on the dataset design and validation scheme for autonomous driving, where we would like to highlight the common problems, wrong assumptions, and steps towards avoiding them, as well as some open problems.
Tasks	Autonomous Driving
Published	2019-01-26
URL	http://arxiv.org/abs/1901.09270v1
PDF	http://arxiv.org/pdf/1901.09270v1.pdf
PWC	https://paperswithcode.com/paper/challenges-in-designing-datasets-and
Repo
Framework

WiCV 2019: The Sixth Women In Computer Vision Workshop


Title	WiCV 2019: The Sixth Women In Computer Vision Workshop
Authors	Irene Amerini, Elena Balashova, Sayna Ebrahimi, Kathryn Leonard, Arsha Nagrani, Amaia Salvador
Abstract	In this paper we present the Women in Computer Vision Workshop - WiCV 2019, organized in conjunction with CVPR 2019. This event is meant to increase the visibility and inclusion of women researchers in the computer vision field. Computer vision and machine learning have made incredible progress over the past years, but the number of female researchers is still low both in academia and in industry. WiCV is organized especially for the following reasons: to raise visibility of female researchers, to increase collaborations between them, and to provide mentorship to female junior researchers in the field. In this paper, we present a report of trends over the past years, along with a summary of statistics regarding presenters, attendees, and sponsorship for the current workshop.
Tasks
Published	2019-09-23
URL	https://arxiv.org/abs/1909.10225v1
PDF	https://arxiv.org/pdf/1909.10225v1.pdf
PWC	https://paperswithcode.com/paper/190910225
Repo
Framework