April 2, 2020

Paper Group ANR 176

Syn2Real: Forgery Classification via Unsupervised Domain Adaptation


Title	Syn2Real: Forgery Classification via Unsupervised Domain Adaptation
Authors	Akash Kumar, Arnav Bhavasar
Abstract	In recent years, image manipulation has become increasingly accessible and yields more natural-looking images, owing to modern tools in image processing and computer vision. As a result, identifying forged images has become very challenging. Among the different types of forgery, cases of copy-move forgery are increasing manifold because this kind of tampering is difficult to detect. Publicly available datasets are insufficient to tackle such problems. In this paper, we propose to create a synthetic forged dataset using deep semantic image inpainting and a copy-move forgery algorithm. However, models trained on these datasets show a significant drop in performance when tested on more realistic data. To alleviate this problem, we use unsupervised domain adaptation networks to detect copy-move forgery in new domains by mapping the feature space from our synthetically generated dataset. Furthermore, we improved the F1 score on the CASIA and CoMoFoD datasets to 80.3% and 78.8%, respectively. Our approach can be helpful in cases where class labels for the data are unavailable.
Tasks	Domain Adaptation, Image Inpainting, Unsupervised Domain Adaptation
Published	2020-02-03
URL	https://arxiv.org/abs/2002.00807v1
PDF	https://arxiv.org/pdf/2002.00807v1.pdf
PWC	https://paperswithcode.com/paper/syn2real-forgery-classification-via
Repo
Framework

Large Hole Image Inpainting With Compress-Decompression Network


Title	Large Hole Image Inpainting With Compress-Decompression Network
Authors	Zhenghang Wu, Yidong Cui
Abstract	Image inpainting technology can patch images with missing pixels. Existing methods use convolutional neural networks to repair corrupted images. These networks focus on the valid pixels around the missing pixels, use an encoder-decoder structure to extract valuable information, and use that information to fill the vacancy. However, if the missing part is too large for the surroundings to provide useful information, the result will exhibit blur, color mixing, and object confusion. In order to patch images with large holes, we study the existing approaches and propose a new network, the compression-decompression network. The compression network is responsible for inpainting and generating a down-sampled image. The decompression network is responsible for extending the down-sampled image to the original resolution. We construct the compression network with a residual network and propose a similar-texture selection algorithm to extend the image, which works better than using a super-resolution network. We evaluate our model on the Places2 and CelebA datasets and use the similarity ratio as the metric. The results show that our model performs better when the inpainting task has many conflicts.
Tasks	Image Inpainting, Super-Resolution
Published	2020-02-01
URL	https://arxiv.org/abs/2002.00199v1
PDF	https://arxiv.org/pdf/2002.00199v1.pdf
PWC	https://paperswithcode.com/paper/large-hole-image-inpainting-with-compress
Repo
Framework

SF-Net: Single-Frame Supervision for Temporal Action Localization


Title	SF-Net: Single-Frame Supervision for Temporal Action Localization
Authors	Fan Ma, Linchao Zhu, Yi Yang, Shengxin Zha, Gourab Kundu, Matt Feiszli, Zheng Shou
Abstract	In this paper, we study an intermediate form of supervision, i.e., single-frame supervision, for temporal action localization (TAL). To obtain the single-frame supervision, the annotators are asked to identify only a single frame within the temporal window of an action. This can significantly reduce the labor cost of obtaining full supervision which requires annotating the action boundary. Compared to the weak supervision that only annotates the video-level label, the single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead. To make full use of such single-frame supervision, we propose a unified system called SF-Net. First, we propose to predict an actionness score for each video frame. Along with a typical category score, the actionness score can provide comprehensive information about the occurrence of a potential action and aid the temporal boundary refinement during inference. Second, we mine pseudo action and background frames based on the single-frame annotations. We identify pseudo action frames by adaptively expanding each annotated single frame to its nearby, contextual frames and we mine pseudo background frames from all the unannotated frames across multiple videos. Together with the ground-truth labeled frames, these pseudo-labeled frames are further used for training the classifier. In extensive experiments on THUMOS14, GTEA, and BEOID, SF-Net significantly improves upon state-of-the-art weakly-supervised methods in terms of both segment localization and single-frame localization. Notably, SF-Net achieves comparable results to its fully-supervised counterpart which requires much more resource intensive annotations.
Tasks	Action Localization, Temporal Action Localization
Published	2020-03-15
URL	https://arxiv.org/abs/2003.06845v3
PDF	https://arxiv.org/pdf/2003.06845v3.pdf
PWC	https://paperswithcode.com/paper/sf-net-single-frame-supervision-for-temporal
Repo
Framework

Weakly-Supervised Multi-Person Action Recognition in 360$^{\circ}$ Videos


Title	Weakly-Supervised Multi-Person Action Recognition in 360$^{\circ}$ Videos
Authors	Junnan Li, Jianquan Liu, Yongkang Wong, Shoji Nishimura, Mohan Kankanhalli
Abstract	The recent development of commodity 360$^{\circ}$ cameras has enabled a single video to capture an entire scene, which offers promising potential in surveillance scenarios. However, research in omnidirectional video analysis has lagged behind the hardware advances. In this work, we address the important problem of action recognition in top-view 360$^{\circ}$ videos. Due to the wide field-of-view, 360$^{\circ}$ videos usually capture multiple people performing actions at the same time. Furthermore, the appearance of people is deformed. The proposed framework first transforms omnidirectional videos into panoramic videos, then extracts spatial-temporal features using region-based 3D CNNs for action recognition. We propose a weakly-supervised method based on multi-instance multi-label learning, which trains the model to recognize and localize multiple actions in a video using only video-level action labels as supervision. We perform experiments to quantitatively validate the efficacy of the proposed method and qualitatively demonstrate action localization results. To enable research in this direction, we introduce 360Action, the first omnidirectional video dataset for multi-person action recognition.
Tasks	Action Localization, Multi-Label Learning
Published	2020-02-09
URL	https://arxiv.org/abs/2002.03266v1
PDF	https://arxiv.org/pdf/2002.03266v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-multi-person-action
Repo
Framework

Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks


Title	Action Graphs: Weakly-supervised Action Localization with Graph Convolution Networks
Authors	Maheen Rashid, Hedvig Kjellström, Yong Jae Lee
Abstract	We present a method for weakly-supervised action localization based on graph convolutions. In order to find and classify video time segments that correspond to relevant action classes, a system must be able to both identify discriminative time segments in each video, and identify the full extent of each action. Achieving this with weak video level labels requires the system to use similarity and dissimilarity between moments across videos in the training data to understand both how an action appears, as well as the sub-actions that comprise the action's full extent. However, current methods do not make explicit use of similarity between video moments to inform the localization and classification predictions. We present a novel method that uses graph convolutions to explicitly model similarity between video moments. Our method utilizes similarity graphs that encode appearance and motion, and pushes the state of the art on THUMOS '14, ActivityNet 1.2, and Charades for weakly supervised action localization.
Tasks	Action Localization, Weakly Supervised Action Localization
Published	2020-02-04
URL	https://arxiv.org/abs/2002.01449v1
PDF	https://arxiv.org/pdf/2002.01449v1.pdf
PWC	https://paperswithcode.com/paper/action-graphs-weakly-supervised-action
Repo
Framework
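
The core idea in the Action Graphs abstract above, explicitly modelling similarity between video moments with graph convolutions, can be illustrated with a minimal sketch. It assumes each video segment is already summarised by a feature vector, builds a cosine-similarity graph over the segments, and applies one row-normalised graph convolution step. The function names, the clipping of negative similarities to zero, and the single ReLU layer are illustrative assumptions, not the authors' architecture.

```python
import math

def cosine_similarity_graph(features):
    """Adjacency matrix of non-negative cosine similarities between segments."""
    # Fall back to a norm of 1.0 for all-zero vectors to avoid dividing by zero.
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in features]
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(features[i], features[j]))
            # Assumption: negative similarities are clipped to zero.
            adj[i][j] = max(0.0, dot / (norms[i] * norms[j]))
    return adj

def row_normalise(adj):
    """Scale every row so its entries sum to one (rows of zeros stay zero)."""
    normalised = []
    for row in adj:
        total = sum(row) or 1.0
        normalised.append([v / total for v in row])
    return normalised

def graph_conv(features, weight):
    """One graph convolution step: ReLU(A_hat @ features @ weight)."""
    a_hat = row_normalise(cosine_similarity_graph(features))
    n, d_in, d_out = len(features), len(weight), len(weight[0])
    # Each segment aggregates the features of the segments it is similar to.
    aggregated = [
        [sum(a_hat[i][k] * features[k][c] for k in range(n)) for c in range(d_in)]
        for i in range(n)
    ]
    # Linear projection followed by ReLU.
    return [
        [max(0.0, sum(aggregated[i][c] * weight[c][o] for c in range(d_in)))
         for o in range(d_out)]
        for i in range(n)
    ]
```

With segment features `[[1, 0], [1, 0], [0, 1]]` and an identity weight matrix, the first two segments only exchange information with each other and the third stays isolated, so the output equals the input.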

Multi-label Relation Modeling in Facial Action Units Detection


Title	Multi-label Relation Modeling in Facial Action Units Detection
Authors	Xianpeng Ji, Yu Ding, Lincheng Li, Yu Chen, Changjie Fan
Abstract	This paper describes an approach to facial action unit detection. The action units (AUs) involved are AU1 (Inner Brow Raiser), AU2 (Outer Brow Raiser), AU4 (Brow Lowerer), AU6 (Cheek Raiser), AU12 (Lip Corner Puller), AU15 (Lip Corner Depressor), AU20 (Lip Stretcher), and AU25 (Lips Part). Our work relies on the dataset released by the FG-2020 Competition: Affective Behavior Analysis In-the-Wild (ABAW). The proposed method consists of data preprocessing, feature extraction, and AU classification. The data preprocessing includes the detection of face texture and landmarks. The static texture features and dynamic landmark features are extracted through neural networks and then fused into a latent feature representation. Finally, the fused feature is taken as the initial hidden state of a recurrent neural network with a trainable lookup AU table. The output of the RNN is the AU classification result. Detection performance is evaluated with 0.5$\times$accuracy + 0.5$\times$F1. Our method achieves 0.56 on the validation data specified by the organizing committee.
Tasks	Multi-Label Classification, Multi-Label Text Classification, Text Classification
Published	2020-02-04
URL	https://arxiv.org/abs/2002.01105v2
PDF	https://arxiv.org/pdf/2002.01105v2.pdf
PWC	https://paperswithcode.com/paper/to-sequencemulti-label-relation-modeling-in
Repo
Framework
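
The evaluation score quoted in the abstract above, 0.5$\times$accuracy + 0.5$\times$F1, can be sketched as follows. This is a minimal illustration that assumes binary labels for a single action unit; how the competition averages accuracy and F1 across the eight AUs is not stated here, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred):
    """Binary F1 with label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0

def combined_score(y_true, y_pred):
    """Equal-weight mix of accuracy and F1, as given in the abstract."""
    return 0.5 * accuracy(y_true, y_pred) + 0.5 * f1_score(y_true, y_pred)
```

Mixing the two terms keeps a detector from scoring well on a rarely active AU simply by always predicting "inactive": accuracy would be high, but the F1 term would be zero.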

Deep Variational Luenberger-type Observer for Stochastic Video Prediction


Title	Deep Variational Luenberger-type Observer for Stochastic Video Prediction
Authors	Dong Wang, Feng Zhou, Zheng Yan, Guang Yao, Zongxuan Liu, Wennan Ma, Cewu Lu
Abstract	Considering the inherent stochasticity and uncertainty, predicting future video frames is exceptionally challenging. In this work, we study the problem of video prediction by combining the interpretability of stochastic state space models and the representation learning of deep neural networks. Our model builds upon a variational encoder which transforms the input video into a latent feature space and a Luenberger-type observer which captures the dynamic evolution of the latent features. This enables the decomposition of videos into static features and dynamics in an unsupervised manner. By deriving the stability theory of the nonlinear Luenberger-type observer, the hidden states in the feature space become insensitive with respect to the initial values, which improves the robustness of the overall model. Furthermore, the variational lower bound on the data log-likelihood can be derived to obtain the tractable posterior prediction distribution based on the variational principle. Finally, experiments on datasets such as Bouncing Balls and Pendulum demonstrate that the proposed model outperforms concurrent works.
Tasks	Representation Learning, Video Prediction
Published	2020-02-12
URL	https://arxiv.org/abs/2003.00835v1
PDF	https://arxiv.org/pdf/2003.00835v1.pdf
PWC	https://paperswithcode.com/paper/deep-variational-luenberger-type-observer-for
Repo
Framework

LIBTwinSVM: A Library for Twin Support Vector Machines


Title	LIBTwinSVM: A Library for Twin Support Vector Machines
Authors	Amir M. Mir, Mahdi Rahbar, Jalal A. Nasiri
Abstract	This paper presents LIBTwinSVM, a free, efficient, and open source library for Twin Support Vector Machines (TSVMs). Our library provides a set of useful functionalities such as fast TSVM estimators, model selection, visualization, a graphical user interface (GUI) application, and a Python application programming interface (API). The benchmark results indicate the effectiveness of the LIBTwinSVM library for large-scale classification problems. The source code of the LIBTwinSVM library, installation guide, documentation, and usage examples are available at https://github.com/mir-am/LIBTwinSVM.
Tasks	Model Selection
Published	2020-01-27
URL	https://arxiv.org/abs/2001.10073v1
PDF	https://arxiv.org/pdf/2001.10073v1.pdf
PWC	https://paperswithcode.com/paper/libtwinsvm-a-library-for-twin-support-vector
Repo
Framework

Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems


Title	Dynamic Task Weighting Methods for Multi-task Networks in Autonomous Driving Systems
Authors	Isabelle Leang, Ganesh Sistu, Fabian Burger, Andrei Bursuc, Senthil Yogamani
Abstract	Deep multi-task networks are of particular interest for autonomous driving systems. They can potentially strike an excellent trade-off between predictive performance, hardware constraints and efficient use of information from multiple types of annotations and modalities. However, training such models is non-trivial and requires balancing the learning of all tasks as their respective losses display different scales, ranges and dynamics across training. Multiple task weighting methods that adjust the losses in an adaptive way have been proposed recently on different datasets and combinations of tasks, making it difficult to compare them. In this work, we review and systematically evaluate nine task weighting strategies on common grounds on three automotive datasets (KITTI, Cityscapes and WoodScape). We then propose a novel method combining evolutionary meta-learning and task-based selective backpropagation, for finding the task weights and training the network reliably. Our method outperforms state-of-the-art methods by 3% on a two-task application.
Tasks	Autonomous Driving, Meta-Learning
Published	2020-01-07
URL	https://arxiv.org/abs/2001.02223v1
PDF	https://arxiv.org/pdf/2001.02223v1.pdf
PWC	https://paperswithcode.com/paper/dynamic-task-weighting-methods-for-multi-task
Repo
Framework

Tune smarter not harder: A principled approach to tuning learning rates for shallow nets


Title	Tune smarter not harder: A principled approach to tuning learning rates for shallow nets
Authors	Thulasi Tholeti, Sheetal Kalyani
Abstract	Effective hyper-parameter tuning is essential to guarantee the performance that neural networks have come to be known for. In this work, a principled approach to choosing the learning rate is proposed for shallow feedforward neural networks. We associate the learning rate with the gradient Lipschitz constant of the objective to be minimized while training. An upper bound on this constant is derived, and a search algorithm that always results in non-divergent traces is proposed to exploit the derived bound. It is shown through simulations that the proposed search method significantly outperforms existing tuning methods such as Tree Parzen Estimators (TPE). The proposed method is applied to two existing applications, namely channel estimation in a wireless communication system and prediction of currency exchange rates, and it is shown to pick better learning rates than the existing methods using the same or less compute power.
Tasks
Published	2020-03-22
URL	https://arxiv.org/abs/2003.09844v1
PDF	https://arxiv.org/pdf/2003.09844v1.pdf
PWC	https://paperswithcode.com/paper/tune-smarter-not-harder-a-principled-approach
Repo
Framework
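
The link between the learning rate and the gradient Lipschitz constant mentioned in the abstract above can be illustrated on a much simpler objective than the paper's shallow networks. The sketch below assumes a linear least-squares loss f(w) = ||Xw - y||^2 / (2n), whose gradient Lipschitz constant is the largest eigenvalue of X^T X / n. That eigenvalue is bounded above by the sum of squared entries of X divided by n, and a step size equal to the inverse of this bound yields a loss trace that never increases. The function names and the choice of bound are illustrative assumptions, not the bound or search algorithm derived in the paper.

```python
def predict(row, w):
    return sum(wi * xi for wi, xi in zip(w, row))

def loss(X, y, w):
    """Least-squares loss ||Xw - y||^2 / (2n)."""
    n = len(X)
    return sum((predict(row, w) - yi) ** 2 for row, yi in zip(X, y)) / (2 * n)

def gradient(X, y, w):
    """Gradient of the loss: X^T (Xw - y) / n."""
    n = len(X)
    g = [0.0] * len(w)
    for row, yi in zip(X, y):
        residual = predict(row, w) - yi
        for j, xj in enumerate(row):
            g[j] += residual * xj / n
    return g

def lipschitz_upper_bound(X):
    """Upper bound on lambda_max(X^T X / n): trace(X^T X) / n."""
    return sum(v * v for row in X for v in row) / len(X)

def gradient_descent(X, y, steps=50):
    """Run gradient descent with step size 1 / (Lipschitz upper bound)."""
    lr = 1.0 / lipschitz_upper_bound(X)
    w = [0.0] * len(X[0])
    history = [loss(X, y, w)]
    for _ in range(steps):
        g = gradient(X, y, w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        history.append(loss(X, y, w))
    return w, history
```

Because the bound is never smaller than the true constant, the resulting step size is conservative: it trades some speed for a guarantee that the iterates do not diverge.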

Comprehensive Taxonomies of Nature- and Bio-inspired Optimization: Inspiration versus Algorithmic Behavior, Critical Analysis and Recommendations


Title	Comprehensive Taxonomies of Nature- and Bio-inspired Optimization: Inspiration versus Algorithmic Behavior, Critical Analysis and Recommendations
Authors	Daniel Molina, Javier Poyatos, Javier Del Ser, Salvador García, Amir Hussain, Francisco Herrera
Abstract	In recent years, a great variety of nature- and bio-inspired algorithms has been reported in the literature. This algorithmic family simulates different biological processes observed in Nature in order to efficiently address complex optimization problems. The number of bio-inspired optimization approaches in the literature has grown considerably in recent years, reaching unprecedented levels that cloud the future prospects of this field of research. This paper addresses this problem by proposing two comprehensive, principle-based taxonomies that allow researchers to organize existing and future algorithmic developments into well-defined categories, considering two different criteria: the source of inspiration and the behavior of each algorithm. Using these taxonomies we review more than three hundred publications dealing with nature-inspired and bio-inspired algorithms, and proposals falling within each of these categories are examined, leading to a critical summary of design trends and similarities between them, and the identification of the most similar classical algorithm for each reviewed paper. From our analysis we conclude that a poor relationship is often found between the natural inspiration of an algorithm and its behavior. Furthermore, similarities in terms of behavior between different algorithms are greater than what is claimed in their public disclosure: specifically, we show that more than one-third of the reviewed bio-inspired solvers are versions of classical algorithms. Grounded on the conclusions of our critical analysis, we give several recommendations and points of improvement for better methodological practices in this active and growing research field.
Tasks
Published	2020-02-19
URL	https://arxiv.org/abs/2002.08136v2
PDF	https://arxiv.org/pdf/2002.08136v2.pdf
PWC	https://paperswithcode.com/paper/taxonomy-of-bio-inspired-algorithms
Repo
Framework

Input representation in recurrent neural networks dynamics


Title	Input representation in recurrent neural networks dynamics
Authors	Pietro Verzelli, Cesare Alippi, Lorenzo Livi, Peter Tino
Abstract	Reservoir computing is a popular approach to designing recurrent neural networks, due to its training simplicity and its approximation performance. The recurrent part of these networks is not trained (e.g. via gradient descent), making them appealing for analytical studies and raising the interest of a vast community of researchers spanning from dynamical systems to neuroscience. It emerges that, even in the simple linear case, the working principle of these networks is not fully understood and the applied research is usually driven by heuristics. A novel analysis of the dynamics of such networks is proposed, which allows one to express the state evolution using the controllability matrix. Such a matrix encodes salient characteristics of the network dynamics: in particular, its rank can be used as an input-independent measure of the memory of the network. Using the proposed approach, it is possible to compare different architectures and explain why a cyclic topology achieves favourable results.
Tasks
Published	2020-03-24
URL	https://arxiv.org/abs/2003.10585v1
PDF	https://arxiv.org/pdf/2003.10585v1.pdf
PWC	https://paperswithcode.com/paper/input-representation-in-recurrent-neural
Repo
Framework
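
The rank-based memory measure described in the abstract above can be illustrated for a linear reservoir x(t+1) = W x(t) + w u(t). The sketch below assumes the standard control-theory definition of the controllability matrix, [w, Ww, ..., W^(n-1) w], and computes its rank by Gaussian elimination. The helper names, the scaling factor `rho`, and the comparison reservoir are illustrative assumptions and may differ from the exact construction used in the paper.

```python
def mat_vec(W, v):
    return [sum(a * b for a, b in zip(row, v)) for row in W]

def controllability_vectors(W, w):
    """Return w, Ww, ..., W^(n-1) w (the columns of the controllability matrix)."""
    vectors = [list(w)]
    for _ in range(len(w) - 1):
        vectors.append(mat_vec(W, vectors[-1]))
    return vectors

def rank(M, tol=1e-10):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    r = 0
    for c in range(n_cols):
        pivot = max(range(r, n_rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, n_rows):
            factor = A[i][c] / A[r][c]
            for j in range(c, n_cols):
                A[i][j] -= factor * A[r][j]
        r += 1
        if r == n_rows:
            break
    return r

def cyclic_reservoir(n, rho=0.9):
    """Ring topology: neuron i receives input only from neuron i - 1."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = rho
    return W

def scaled_identity(n, rho=0.9):
    """Comparison case: every neuron is connected only to itself."""
    return [[rho if i == j else 0.0 for j in range(n)] for i in range(n)]
```

With the input fed into the first neuron only, the ring reservoir shifts the signal one neuron further at every step, so the vectors are linearly independent and the rank equals the reservoir size. In the self-connected case every vector is a multiple of the first one, so the rank collapses to 1.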

Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle


Title	Deep Interactive Reinforcement Learning for Path Following of Autonomous Underwater Vehicle
Authors	Qilei Zhang, Jinying Lin, Qixin Sha, Bo He, Guangliang Li
Abstract	Autonomous underwater vehicles (AUVs) play an increasingly important role in ocean exploration. Existing AUVs are usually not fully autonomous and are generally limited to pre-planned or pre-programmed tasks. Reinforcement learning (RL) and deep reinforcement learning have been introduced into AUV design and research to improve autonomy. However, these methods are still difficult to apply directly to actual AUV systems because of sparse rewards and low learning efficiency. In this paper, we propose a deep interactive reinforcement learning method for AUV path following that combines the advantages of deep reinforcement learning and interactive RL. In addition, since the human trainer cannot provide human rewards to an AUV while it is running in the ocean, and the AUV needs to adapt to a changing environment, we further propose a deep reinforcement learning method that learns from both human rewards and environmental rewards at the same time. We test our methods on two AUV path-following tasks, straight-line and sinusoidal-curve following, in simulation on the Gazebo platform. Our experimental results show that with our proposed deep interactive RL method, the AUV can converge faster than a DQN learner that learns only from environmental reward. Moreover, an AUV learning with our deep RL method from both human and environmental rewards can achieve similar or even better performance than with the deep interactive RL method, and can adapt to the actual environment by further learning from environmental rewards.
Tasks
Published	2020-01-10
URL	https://arxiv.org/abs/2001.03359v1
PDF	https://arxiv.org/pdf/2001.03359v1.pdf
PWC	https://paperswithcode.com/paper/deep-interactive-reinforcement-learning-for
Repo
Framework

The Statistical Complexity of Early Stopped Mirror Descent


Title	The Statistical Complexity of Early Stopped Mirror Descent
Authors	Tomas Vaškevičius, Varun Kanade, Patrick Rebeschini
Abstract	Recently there has been a surge of interest in understanding implicit regularization properties of iterative gradient-based optimization algorithms. In this paper, we study the statistical guarantees on the excess risk achieved by early stopped unconstrained mirror descent algorithms applied to the unregularized empirical risk with squared loss for linear models and kernel methods. We identify a link between offset Rademacher complexities and potential-based analysis of mirror descent that allows disentangling statistics from optimization in the analysis of such algorithms. Our main result characterizes the statistical performance of the path traced by the iterates of mirror descent in terms of offset complexities of certain function classes depending only on the choice of the mirror map, initialization point, step-size, and number of iterations. We apply our theory to recover, in a rather clean and elegant manner, some of the recent results in the implicit regularization literature, while also showing how to improve upon them in some settings.
Tasks
Published	2020-02-01
URL	https://arxiv.org/abs/2002.00189v1
PDF	https://arxiv.org/pdf/2002.00189v1.pdf
PWC	https://paperswithcode.com/paper/the-statistical-complexity-of-early-stopped
Repo
Framework

Momentum-Net for Low-Dose CT Image Reconstruction


Title	Momentum-Net for Low-Dose CT Image Reconstruction
Authors	Siqi Ye, Yong Long, Il Yong Chun
Abstract	This paper applies the recent fast iterative neural network framework, Momentum-Net, using appropriate models to low-dose X-ray computed tomography (LDCT) image reconstruction. At each layer of the proposed Momentum-Net, the model-based image reconstruction module solves the majorized penalized weighted least-square problem, and the image refining module uses a four-layer convolutional autoencoder. Experimental results with the NIH AAPM-Mayo Clinic Low Dose CT Grand Challenge dataset show that the proposed Momentum-Net architecture significantly improves image reconstruction accuracy, compared to a state-of-the-art noniterative image denoising deep neural network (NN), WavResNet (in LDCT). We also investigated the spectral normalization technique that applies to image refining NN learning to satisfy the nonexpansive NN property; however, experimental results show that this does not improve the image reconstruction performance of Momentum-Net.
Tasks	Denoising, Image Denoising, Image Reconstruction
Published	2020-02-27
URL	https://arxiv.org/abs/2002.12018v3
PDF	https://arxiv.org/pdf/2002.12018v3.pdf
PWC	https://paperswithcode.com/paper/momentum-net-for-low-dose-ct-image
Repo
Framework