January 25, 2020


Paper Group ANR 1691

Facial Synthesis from Visual Attributes via Sketch using Multi-Scale Generators. Transfer Learning for Prosthetics Using Imitation Learning. CryptoNN: Training Neural Networks over Encrypted Data. Deep Metric Learning with Alternating Projections onto Feasible Sets. Nonstochastic Multiarmed Bandits with Unrestricted Delays. Recommendation as a Comm …

Facial Synthesis from Visual Attributes via Sketch using Multi-Scale Generators


Title	Facial Synthesis from Visual Attributes via Sketch using Multi-Scale Generators
Authors	Xing Di, Vishal M. Patel
Abstract	Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we generate the face image based on the synthesized sketch. The proposed framework is based on a combination of two different Generative Adversarial Networks (GANs): (1) a sketch generator network which synthesizes a realistic sketch from the input attributes, and (2) a face generator network which synthesizes facial images from the synthesized sketch images with the help of facial attributes. Extensive experiments and comparisons with recent methods are performed to verify the effectiveness of the proposed attribute-based two-stage face synthesis method.
Tasks	Face Generation
Published	2019-12-17
URL	https://arxiv.org/abs/1912.10479v1
PDF	https://arxiv.org/pdf/1912.10479v1.pdf
PWC	https://paperswithcode.com/paper/facial-synthesis-from-visual-attributes-via
Repo
Framework

Transfer Learning for Prosthetics Using Imitation Learning


Title	Transfer Learning for Prosthetics Using Imitation Learning
Authors	Montaser Mohammedalamen, Waleed D. Khamies, Benjamin Rosman
Abstract	In this paper, we apply reinforcement learning (RL) techniques to train a realistic biomechanical model to work with different people and in different walking environments. We benchmark three RL algorithms, Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), in the OpenSim environment. We also apply imitation learning to a prosthetics domain to reduce the training time needed to design customized prosthetics. We use the DDPG algorithm to train an original expert agent. We then propose a modification to the Dataset Aggregation (DAgger) algorithm to reuse the expert knowledge and train a new target agent to replicate that behaviour in fewer than 5 iterations, compared to the 100 iterations taken by the expert agent, which reduces training time by 95%. Our modifications to the DAgger algorithm improve the balance between exploiting the expert policy and exploring the environment. We show empirically that these improve the convergence time of the target agent, particularly when there is some degree of variation between the expert and the naive agent.
Tasks	Imitation Learning, Transfer Learning
Published	2019-01-15
URL	http://arxiv.org/abs/1901.04772v1
PDF	http://arxiv.org/pdf/1901.04772v1.pdf
PWC	https://paperswithcode.com/paper/transfer-learning-for-prosthetics-using
Repo
Framework

CryptoNN: Training Neural Networks over Encrypted Data


Title	CryptoNN: Training Neural Networks over Encrypted Data
Authors	Runhua Xu, James B. D. Joshi, Chao Li
Abstract	Emerging neural networks based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanisms. However, neither of these cryptographic approaches provides an efficient solution towards constructing a privacy-preserving machine learning model, as well as supporting both the training and inference phases. To tackle the above issue, we propose a CryptoNN framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirement of the proposed CryptoNN framework. We present performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy that is similar to those of the baseline neural network models on the MNIST dataset.
Tasks
Published	2019-04-15
URL	http://arxiv.org/abs/1904.07303v2
PDF	http://arxiv.org/pdf/1904.07303v2.pdf
PWC	https://paperswithcode.com/paper/cryptonn-training-neural-networks-over
Repo
Framework

Deep Metric Learning with Alternating Projections onto Feasible Sets


Title	Deep Metric Learning with Alternating Projections onto Feasible Sets
Authors	Oğul Can, Yeti Ziya Gürbüz, A. Aydın Alatan
Abstract	During the training of networks for distance metric learning, minimizers of the typical loss functions can be considered as “feasible points” satisfying a set of constraints imposed by the training data. To this end, we reformulate the distance metric learning problem as finding a feasible point of a constraint set where the embedding vectors of the training data satisfy desired intra-class and inter-class proximity. The feasible set induced by the constraint set is expressed as the intersection of the relaxed feasible sets which enforce the proximity constraints only for particular samples (a sample from each class) of the training data. Then, the feasible point problem is approximately solved by performing alternating projections onto those feasible sets. Such an approach introduces a regularization term and results in minimizing a typical loss function with a systematic batch set construction where these batches are constrained to contain the same sample from each class for a certain number of iterations. Moreover, these particular samples can be considered as the class representatives, allowing efficient utilization of hard class mining during batch construction. The proposed technique is applied with the well-accepted losses and evaluated on the Stanford Online Products, CAR196 and CUB200-2011 datasets for image retrieval and clustering. Outperforming the state of the art, the proposed approach consistently improves the performance of the integrated loss functions with no additional computational cost and boosts the performance further by hard negative class mining.
Tasks	Image Retrieval, Metric Learning
Published	2019-07-17
URL	https://arxiv.org/abs/1907.07585v2
PDF	https://arxiv.org/pdf/1907.07585v2.pdf
PWC	https://paperswithcode.com/paper/deep-metric-learning-with-alternating
Repo
Framework

Nonstochastic Multiarmed Bandits with Unrestricted Delays


Title	Nonstochastic Multiarmed Bandits with Unrestricted Delays
Authors	Tobias Sommer Thune, Nicolò Cesa-Bianchi, Yevgeny Seldin
Abstract	We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical nor bounded. We first prove that “delayed” Exp3 achieves the $O(\sqrt{(KT + D)\ln K} )$ regret bound conjectured by Cesa-Bianchi et al. [2019] in the case of variable, but bounded delays. Here, $K$ is the number of actions and $D$ is the total delay over $T$ rounds. We then introduce a new algorithm that lifts the requirement of bounded delays by using a wrapper that skips rounds with excessively large delays. The new algorithm maintains the same regret bound, but similar to its predecessor requires prior knowledge of $D$ and $T$. For this algorithm we then construct a novel doubling scheme that forgoes the prior knowledge requirement under the assumption that the delays are available at action time (rather than at loss observation time). This assumption is satisfied in a broad range of applications, including interaction with servers and service providers. The resulting oracle regret bound is of order $\min_\beta (S_\beta+\beta \ln K + (KT + D_\beta)/\beta)$, where $S_\beta$ is the number of observations with delay exceeding $\beta$, and $D_\beta$ is the total delay of observations with delay below $\beta$. The bound relaxes to $O (\sqrt{(KT + D)\ln K} )$, but we also provide examples where $D_\beta \ll D$ and the oracle bound has a polynomially better dependence on the problem parameters.
Tasks
Published	2019-06-03
URL	https://arxiv.org/abs/1906.00670v2
PDF	https://arxiv.org/pdf/1906.00670v2.pdf
PWC	https://paperswithcode.com/paper/190600670
Repo
Framework
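
To make the oracle bound in the abstract above concrete, the following sketch evaluates $S_\beta + \beta \ln K + (KT + D_\beta)/\beta$ for a given list of delays, with constants dropped. The function names, the finite grid of candidate thresholds, and the convention that a delay exactly equal to $\beta$ counts toward $D_\beta$ are assumptions made for illustration; none of them is taken from the paper.

```python
import math


def oracle_bound(delays, num_actions, beta):
    """Evaluate S_beta + beta*ln(K) + (K*T + D_beta)/beta for one threshold beta."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    horizon = len(delays)  # T: one delay per round
    # S_beta: number of observations whose delay exceeds beta
    skipped = sum(1 for d in delays if d > beta)
    # D_beta: total delay of the remaining observations (assumption: ties are kept)
    kept_delay = sum(d for d in delays if d <= beta)
    return (skipped
            + beta * math.log(num_actions)
            + (num_actions * horizon + kept_delay) / beta)


def best_threshold(delays, num_actions, candidates):
    """Return (bound value, beta) minimising the bound over a finite candidate grid."""
    return min((oracle_bound(delays, num_actions, b), b) for b in candidates)


def relaxed_bound(delays, num_actions):
    """The sqrt((K*T + D) * ln K) form, with D the total delay."""
    horizon = len(delays)
    total_delay = sum(delays)
    return math.sqrt((num_actions * horizon + total_delay) * math.log(num_actions))


if __name__ == "__main__":
    # Hypothetical delay sequence: mostly short delays plus a few very long ones,
    # the kind of case where D_beta can be much smaller than D.
    delays = [1] * 990 + [5000] * 10
    value, beta = best_threshold(delays, num_actions=2, candidates=range(1, 200))
    print("grid-search oracle bound:", value, "at beta =", beta)
    print("relaxed bound:", relaxed_bound(delays, num_actions=2))
```

Because the search runs over a finite grid, the returned value is only an upper estimate of the true minimum over $\beta$.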

Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue


Title	Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue
Authors	Dongyeop Kang, Anusha Balakrishnan, Pararth Shah, Paul Crook, Y-Lan Boureau, Jason Weston
Abstract	Traditional recommendation systems produce static rather than interactive recommendations invariant to a user’s specific requests, clarifications, or current mood, and can suffer from the cold-start problem if their tastes are unknown. These issues can be alleviated by treating recommendation as an interactive dialogue task instead, where an expert recommender can sequentially ask about someone’s preferences, react to their requests, and recommend more appropriate items. In this work, we collect a goal-driven recommendation dialogue dataset (GoRecDial), which consists of 9,125 dialogue games and 81,260 conversation turns between pairs of human workers recommending movies to each other. The task is specifically designed as a cooperative game between two players working towards a quantifiable common goal. We leverage the dataset to develop an end-to-end dialogue system that can simultaneously converse and recommend. Models are first trained to imitate the behavior of human players without considering the task goal itself (supervised training). We then finetune our models on simulated bot-bot conversations between two paired pre-trained models (bot-play), in order to achieve the dialogue goal. Our experiments show that models finetuned with bot-play learn improved dialogue strategies, reach the dialogue goal more often when paired with a human, and are rated as more consistent by humans compared to models trained without bot-play. The dataset and code are publicly available through the ParlAI framework.
Tasks	Recommendation Systems
Published	2019-09-09
URL	https://arxiv.org/abs/1909.03922v1
PDF	https://arxiv.org/pdf/1909.03922v1.pdf
PWC	https://paperswithcode.com/paper/recommendation-as-a-communication-game-self
Repo
Framework

Quantifying Algorithmic Biases over Time


Title	Quantifying Algorithmic Biases over Time
Authors	Vivek K. Singh, Ishaan Singh
Abstract	Algorithms now permeate multiple aspects of human lives and multiple recent results have reported that these algorithms may have biases pertaining to gender, race, and other demographic characteristics. The metrics used to quantify such biases have still focused on a static notion of algorithms. However, algorithms evolve over time. For instance, Tay (a conversational bot launched by Microsoft) was arguably not biased at its launch but quickly became biased, sexist, and racist over time. We suggest a set of intuitive metrics to study the variations in biases over time and present the results for a case study for genders represented in images resulting from a Twitter image search for #Nurse and #Doctor over a period of 21 days. Results indicate that biases vary significantly over time and the direction of bias could appear to be different on different days. Hence, one-shot measurements may not suffice for understanding algorithmic bias, thus motivating further work on studying biases in algorithms over time.
Tasks	Image Retrieval
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01671v1
PDF	https://arxiv.org/pdf/1907.01671v1.pdf
PWC	https://paperswithcode.com/paper/quantifying-algorithmic-biases-over-time
Repo
Framework

A Dual Camera System for High Spatiotemporal Resolution Video Acquisition


Title	A Dual Camera System for High Spatiotemporal Resolution Video Acquisition
Authors	Ming Cheng, Zhan Ma, M. Salman Asif, Yiling Xu, Haojie Liu, Wenbo Bao, Jun Sun
Abstract	This paper presents a dual camera system for high spatiotemporal resolution (HSTR) video acquisition, where one camera shoots a video with high spatial resolution and low frame rate (HSR-LFR) and another one captures a low spatial resolution and high frame rate (LSR-HFR) video. Our main goal is to combine videos from LSR-HFR and HSR-LFR cameras to create an HSTR video. We propose an end-to-end learning framework, AWnet, mainly consisting of a FlowNet and a FusionNet that learn an adaptive weighting function in pixel domain to combine inputs in a frame recurrent fashion. To improve the reconstruction quality for cameras used in reality, we also introduce noise regularization under the same framework. Our method has demonstrated noticeable performance gains in terms of both objective PSNR measurement in simulation with different publicly available video and light-field datasets and subjective evaluation with real data captured by dual iPhone 7 and Grasshopper3 cameras. Ablation studies are further conducted to investigate and explore various aspects (such as reference structure, camera parallax, exposure time, etc) of our system to fully understand its capability for potential applications.
Tasks
Published	2019-09-28
URL	https://arxiv.org/abs/1909.13051v2
PDF	https://arxiv.org/pdf/1909.13051v2.pdf
PWC	https://paperswithcode.com/paper/a-dual-camera-system-for-high-spatiotemporal
Repo
Framework

Ground Plane based Absolute Scale Estimation for Monocular Visual Odometry


Title	Ground Plane based Absolute Scale Estimation for Monocular Visual Odometry
Authors	Dingfu Zhou, Yuchao Dai, Hongdong Li
Abstract	Recovering the absolute metric scale from a monocular camera is a challenging but highly desirable problem for monocular camera-based systems. Various approaches have been proposed for scale estimation using different kinds of cues, such as camera height and object size. In this paper, we first summarize the different kinds of scale estimation approaches. Then, by analyzing the advantages and disadvantages of these approaches, we propose a robust divide-and-conquer absolute scale estimation method based on the ground plane and camera height. Using the estimated scale, we propose an effective scale correction strategy to reduce the scale drift during the Monocular Visual Odometry (VO) estimation process. Finally, the effectiveness and robustness of the proposed method are verified on both public and self-collected image sequences.
Tasks	Monocular Visual Odometry, Visual Odometry
Published	2019-03-03
URL	http://arxiv.org/abs/1903.00912v1
PDF	http://arxiv.org/pdf/1903.00912v1.pdf
PWC	https://paperswithcode.com/paper/ground-plane-based-absolute-scale-estimation
Repo
Framework
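
The camera-height cue mentioned in the abstract above can be illustrated with a small sketch: if the ground plane is reconstructed only up to scale, the ratio between the known camera height and the reconstructed camera-to-plane distance gives the metric scale factor. This is the generic textbook relation, not the paper's divide-and-conquer method; the function names and the assumption that ground points and the plane normal are already available in the camera frame are hypothetical.

```python
def estimated_camera_height(ground_points, normal):
    """Mean distance from the camera centre (the origin) to the ground plane,
    measured in the up-to-scale units of the reconstruction."""
    if not ground_points:
        raise ValueError("at least one ground point is required")
    length = sum(c * c for c in normal) ** 0.5
    if length == 0:
        raise ValueError("plane normal must be non-zero")
    unit = [c / length for c in normal]
    # Distance of each point along the plane normal
    distances = [abs(sum(u * p for u, p in zip(unit, point)))
                 for point in ground_points]
    return sum(distances) / len(distances)


def absolute_scale(real_height, ground_points, normal):
    """Scale factor mapping up-to-scale units to metres (or the unit of real_height)."""
    return real_height / estimated_camera_height(ground_points, normal)


def rescale_translation(translation, scale):
    """Apply the scale factor to an up-to-scale translation vector."""
    return [scale * t for t in translation]


if __name__ == "__main__":
    # Hypothetical values: camera mounted 1.7 m above the road,
    # ground points reconstructed 0.5 units below the camera.
    points = [(0.0, 0.5, 2.0), (1.0, 0.5, 3.0), (-1.0, 0.5, 4.0)]
    scale = absolute_scale(1.7, points, normal=(0.0, 1.0, 0.0))
    print(rescale_translation([0.0, 0.0, 1.0], scale))
```

Re-estimating this factor periodically and rescaling the translations is one simple way to limit scale drift, under the assumption that the camera height stays fixed.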

Visual Coin-Tracking: Tracking of Planar Double-Sided Objects


Title	Visual Coin-Tracking: Tracking of Planar Double-Sided Objects
Authors	Jonáš Šerých, Jiří Matas
Abstract	We introduce a new video analysis problem – tracking of rigid planar objects in sequences where both their sides are visible. Such coin-like objects often rotate fast with respect to an arbitrary axis, producing unique challenges such as fast incident light and aspect ratio change and rotational motion blur. Although such objects are common, neither tracking sequences containing coin-like objects nor suitable algorithms have been published. As a second contribution, we present a novel coin-tracking benchmark containing 17 video sequences annotated with object segmentation masks. Experiments show that the sequences differ significantly from the ones encountered in standard tracking datasets. We propose a baseline coin-tracking method based on convolutional neural network segmentation and explicit pose modeling. Its performance confirms that coin-tracking is an open and challenging problem.
Tasks	Semantic Segmentation
Published	2019-08-07
URL	https://arxiv.org/abs/1908.02664v1
PDF	https://arxiv.org/pdf/1908.02664v1.pdf
PWC	https://paperswithcode.com/paper/visual-coin-tracking-tracking-of-planar
Repo
Framework

Hadamard Codebook Based Deep Hashing


Title	Hadamard Codebook Based Deep Hashing
Authors	Shen Chen, Liujuan Cao, Mingbao Lin, Yan Wang, Xiaoshuai Sun, Chenglin Wu, Jingfei Qiu, Rongrong Ji
Abstract	As an approximate nearest neighbor search technique, hashing has been widely applied in large-scale image retrieval due to its excellent efficiency. Most supervised deep hashing methods have loss designs similar to those of embedding learning, while quantizing the continuous high-dimensional feature into a compact binary space. We argue that existing deep hashing schemes are deficient in two respects that seriously affect performance, i.e., bit independence and bit balance. The former means that hash codes of different classes should be independent of each other, while the latter means that each bit should have a balanced distribution of +1s and -1s. In this paper, we propose a novel supervised deep hashing method, termed Hadamard Codebook based Deep Hashing (HCDH), which solves the above two problems in a unified formulation. Specifically, we utilize an off-the-shelf algorithm to generate a binary Hadamard codebook that satisfies the requirements of bit independence and bit balance, which subsequently serves as the desired output of the hash function learning. We also introduce a projection matrix to resolve the inconsistency between the order of the Hadamard matrix and the number of classes. Besides, the proposed HCDH further exploits the supervised labels by constructing a classifier on top of the outputs of the hash functions. Extensive experiments demonstrate that HCDH can yield discriminative and balanced binary codes and substantially outperforms many state-of-the-art methods on three widely used benchmarks.
Tasks	Image Retrieval
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09182v1
PDF	https://arxiv.org/pdf/1910.09182v1.pdf
PWC	https://paperswithcode.com/paper/hadamard-codebook-based-deep-hashing
Repo
Framework
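
The two properties the abstract above asks of a Hadamard codebook can be checked on a small example. The sketch below builds a Hadamard matrix with the Sylvester construction and verifies that distinct rows are orthogonal and that every row except the all-ones row has as many +1s as -1s. The abstract only says an "off-the-shelf algorithm" is used, so the choice of the Sylvester construction and the function names here are assumptions.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of the given order (a power of two) as lists of +1/-1."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    matrix = [[1]]
    while len(matrix) < order:
        # H_{2n} = [[H, H], [H, -H]]
        top = [row + row for row in matrix]
        bottom = [row + [-x for x in row] for row in matrix]
        matrix = top + bottom
    return matrix


def dot(u, v):
    """Inner product of two equal-length code words."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    codebook = sylvester_hadamard(8)
    # Distinct code words are orthogonal
    print(all(dot(codebook[i], codebook[j]) == 0
              for i in range(8) for j in range(8) if i != j))
    # Every row after the first contains four +1s and four -1s
    print(all(sum(row) == 0 for row in codebook[1:]))
```

The order of such a matrix is a power of two, which generally does not match the number of classes; this is the mismatch the abstract says the projection matrix addresses.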

Trajectory Prediction by Coupling Scene-LSTM with Human Movement LSTM


Title	Trajectory Prediction by Coupling Scene-LSTM with Human Movement LSTM
Authors	Manh Huynh, Gita Alaghband
Abstract	We develop a novel human trajectory prediction system that incorporates the scene information (Scene-LSTM) as well as individual pedestrian movement (Pedestrian-LSTM) trained simultaneously within static crowded scenes. We superimpose a two-level grid structure (grid cells and subgrids) on the scene to encode spatial granularity plus common human movements. The Scene-LSTM captures the commonly traveled paths that can be used to significantly influence the accuracy of human trajectory prediction in local areas (i.e. grid cells). We further design scene data filters, consisting of a hard filter and a soft filter, to select the relevant scene information in a local region when necessary and combine it with Pedestrian-LSTM for forecasting a pedestrian’s future locations. The experimental results on several publicly available datasets demonstrate that our method outperforms related works and can produce more accurate predicted trajectories in different scene contexts.
Tasks	Trajectory Prediction
Published	2019-08-23
URL	https://arxiv.org/abs/1908.08908v1
PDF	https://arxiv.org/pdf/1908.08908v1.pdf
PWC	https://paperswithcode.com/paper/trajectory-prediction-by-coupling-scene-lstm
Repo
Framework
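
As a rough illustration of the two-level grid described in the abstract above, the following sketch maps a pedestrian position to a grid cell and to a subgrid inside that cell. The scene dimensions, the number of cells and subgrids per side, the row-major indexing and the function name are all hypothetical choices; the paper's actual encoding may differ.

```python
def locate(x, y, width, height, cells_per_side, subgrids_per_side):
    """Return (cell_id, subgrid_id) for a position inside a width x height scene.

    Cells and subgrids are indexed row by row, starting from 0.
    """
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError("position lies outside the scene")
    cell_w = width / cells_per_side
    cell_h = height / cells_per_side
    # First level: which grid cell contains the point (clamped against rounding)
    col = min(int(x // cell_w), cells_per_side - 1)
    row = min(int(y // cell_h), cells_per_side - 1)
    # Second level: which subgrid inside that cell contains the point
    sub_w = cell_w / subgrids_per_side
    sub_h = cell_h / subgrids_per_side
    sub_col = min(int((x - col * cell_w) // sub_w), subgrids_per_side - 1)
    sub_row = min(int((y - row * cell_h) // sub_h), subgrids_per_side - 1)
    cell_id = row * cells_per_side + col
    subgrid_id = sub_row * subgrids_per_side + sub_col
    return cell_id, subgrid_id


if __name__ == "__main__":
    # Hypothetical 640 x 480 scene with a 4 x 4 grid and 2 x 2 subgrids per cell
    print(locate(170.0, 10.0, 640, 480, 4, 2))
```

In a setup like this, the cell index would select which scene-level information applies locally, while the subgrid index gives a finer position within that cell.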

Metalearned Neural Memory


Title	Metalearned Neural Memory
Authors	Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler
Abstract	We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning. We conceptualize this memory as a rapidly adaptable function that we parameterize as a deep neural network. Reading from the neural memory function amounts to pushing an input (the key vector) through the function to produce an output (the value vector). Writing to memory means changing the function; specifically, updating the parameters of the neural network to encode desired information. We leverage training and algorithmic techniques from metalearning to update the neural memory function in one shot. The proposed memory-augmented model achieves strong performance on a variety of learning problems, from supervised question answering to reinforcement learning.
Tasks	Question Answering
Published	2019-07-23
URL	https://arxiv.org/abs/1907.09720v2
PDF	https://arxiv.org/pdf/1907.09720v2.pdf
PWC	https://paperswithcode.com/paper/metalearned-neural-memory
Repo
Framework
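
The read and write operations described in the abstract above can be sketched with a deliberately simple stand-in: a single linear map as the memory function, read by applying it to a key, and written by one normalised gradient step that makes the map return the desired value for that key. This is the classical delta rule on a linear memory, chosen here only for illustration; the paper parameterizes the memory as a deep network and uses a metalearned update, which this sketch does not reproduce. The class and method names are hypothetical.

```python
class LinearMemory:
    """A memory whose contents are the weights of a linear map from keys to values."""

    def __init__(self, key_dim, value_dim):
        self.weights = [[0.0] * key_dim for _ in range(value_dim)]

    def read(self, key):
        """Reading: push the key through the function to obtain a value."""
        return [sum(w * k for w, k in zip(row, key)) for row in self.weights]

    def write(self, key, value):
        """Writing: change the function so that read(key) returns value.

        One gradient step on the squared read error, with the step size
        normalised by the squared norm of the key so a single step suffices.
        """
        energy = sum(k * k for k in key)
        if energy == 0:
            raise ValueError("key must be non-zero")
        error = [v - r for v, r in zip(value, self.read(key))]
        for row, e in zip(self.weights, error):
            for j, k in enumerate(key):
                row[j] += e * k / energy


if __name__ == "__main__":
    memory = LinearMemory(key_dim=3, value_dim=2)
    memory.write([1.0, 2.0, 0.0], [0.5, -1.0])
    print(memory.read([1.0, 2.0, 0.0]))
```

With this update, writes under mutually orthogonal keys do not interfere with each other, while overlapping keys partially overwrite earlier content.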

Optimization and Abstraction: A Synergistic Approach for Analyzing Neural Network Robustness


Title	Optimization and Abstraction: A Synergistic Approach for Analyzing Neural Network Robustness
Authors	Greg Anderson, Shankara Pailoor, Isil Dillig, Swarat Chaudhuri
Abstract	In recent years, the notion of local robustness (or robustness for short) has emerged as a desirable property of deep neural networks. Intuitively, robustness means that small perturbations to an input do not cause the network to perform misclassifications. In this paper, we present a novel algorithm for verifying robustness properties of neural networks. Our method synergistically combines gradient-based optimization methods for counterexample search with abstraction-based proof search to obtain a sound and ($\delta$-)complete decision procedure. Our method also employs a data-driven approach to learn a verification policy that guides abstract interpretation during proof search. We have implemented the proposed approach in a tool called Charon and experimentally evaluated it on hundreds of benchmarks. Our experiments show that the proposed approach significantly outperforms three state-of-the-art tools, namely AI$^2$, Reluplex, and Reluval.
Tasks
Published	2019-04-22
URL	http://arxiv.org/abs/1904.09959v2
PDF	http://arxiv.org/pdf/1904.09959v2.pdf
PWC	https://paperswithcode.com/paper/optimization-abstraction-a-synergistic
Repo
Framework

UniDual: A Unified Model for Image and Video Understanding


Title	UniDual: A Unified Model for Image and Video Understanding
Authors	Yufei Wang, Du Tran, Lorenzo Torresani
Abstract	Although a video is effectively a sequence of images, visual perception systems typically model images and videos separately, thus failing to exploit the correlation and the synergy provided by these two media. While a few prior research efforts have explored the benefits of leveraging still-image datasets for video analysis, or vice-versa, most of these attempts have been limited to pretraining a model on one type of visual modality and then adapting it via finetuning on the other modality. In contrast, in this paper we introduce a framework that enables joint training of a unified model on mixed collections of image and video examples spanning different tasks. The key ingredient in our architecture design is a new network block, which we name UniDual. It consists of a shared 2D spatial convolution followed by two parallel point-wise convolutional layers, one devoted to images and the other one used for videos. For video input, the point-wise filtering implements a temporal convolution. For image input, it performs a pixel-wise nonlinear transformation. Repeated stacking of such blocks gives rise to a network where images and videos undergo partially distinct execution pathways, unified by spatial convolutions (capturing commonalities in visual appearance) but separated by point-wise operations (modeling patterns specific to each modality). Extensive experiments on Kinetics and ImageNet demonstrate that our UniDual model jointly trained on these datasets yields substantial accuracy gains for both tasks, compared to 1) training separate models, 2) traditional multi-task learning and 3) the conventional framework of pretraining-followed-by-finetuning. On Kinetics, the UniDual architecture applied to a state-of-the-art video backbone model (R(2+1)D-152) yields an additional video@1 accuracy gain of 1.5%.
Tasks	Multi-Task Learning, Video Understanding
Published	2019-06-10
URL	https://arxiv.org/abs/1906.03857v2
PDF	https://arxiv.org/pdf/1906.03857v2.pdf
PWC	https://paperswithcode.com/paper/unidual-a-unified-model-for-image-and-video
Repo
Framework