July 30, 2019

3041 words · 15 mins read

Paper Group AWR 55



Skill2vec: Machine Learning Approach for Determining the Relevant Skills from Job Description

Title Skill2vec: Machine Learning Approach for Determining the Relevant Skills from Job Description
Authors Le Van-Duyet, Vo Minh Quan, Dang Quang An
Abstract Unsupervised word embeddings have seen tremendous success in numerous Natural Language Processing (NLP) tasks in recent years. The main contribution of this paper is to develop a technique called Skill2vec, which applies machine learning techniques in recruitment to enhance the search strategy for finding candidates possessing the appropriate skills. Skill2vec is a neural network architecture inspired by Word2vec, developed by Mikolov et al. in 2013. It transforms skills into a new vector space that supports vector arithmetic and captures relationships between skills. We conducted a manual evaluation with a recruitment company’s domain experts to demonstrate the effectiveness of our approach.
Tasks Word Embeddings
Published 2017-07-31
URL https://arxiv.org/abs/1707.09751v3
PDF https://arxiv.org/pdf/1707.09751v3.pdf
PWC https://paperswithcode.com/paper/skill2vec-machine-learning-approach-for
Repo https://github.com/duyetdev/skill2vec-dataset
Framework none
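
A minimal sketch of the Skill2vec idea, using gensim’s Word2vec implementation as a stand-in for the paper’s own model: each job posting’s skill list is treated as a “sentence”, so skills that co-occur across postings end up close together in the embedding space. The skill lists below are illustrative, not from the paper’s dataset.

```python
# Hypothetical mini-corpus: one skill list per job posting.
from gensim.models import Word2Vec

postings = [
    ["python", "machine-learning", "sql", "spark"],
    ["java", "spring", "sql", "microservices"],
    ["python", "deep-learning", "tensorflow", "machine-learning"],
]

# sg=1 selects the skip-gram variant of Word2vec.
model = Word2Vec(postings, vector_size=32, window=5,
                 min_count=1, sg=1, epochs=200)

# Skills now live in a shared vector space, so similarity queries work.
print(model.wv.most_similar("machine-learning", topn=3))
```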

Neural 3D Mesh Renderer

Title Neural 3D Mesh Renderer
Authors Hiroharu Kato, Yoshitaka Ushiku, Tatsuya Harada
Abstract For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer.
Tasks 3D Object Reconstruction, Style Transfer
Published 2017-11-20
URL http://arxiv.org/abs/1711.07566v1
PDF http://arxiv.org/pdf/1711.07566v1.pdf
PWC https://paperswithcode.com/paper/neural-3d-mesh-renderer
Repo https://github.com/laughtervv/tf_neural_renderer
Framework tf
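
A toy 1-D illustration of the central trick, not the paper’s actual formulation: replace the hard inside/outside test of rasterization with a smooth surrogate so that gradients flow from pixel values back to geometry. The sigmoid sharpness here is an arbitrary choice.

```python
import torch

def hard_coverage(px, edge_x):
    # Non-differentiable rasterization: a pixel is lit iff it lies left of the edge.
    return (px < edge_x).float()

def soft_coverage(px, edge_x, sharpness=10.0):
    # Smooth surrogate: a sigmoid of the signed distance to the edge,
    # so d(coverage)/d(edge_x) is nonzero and back-propagation works.
    return torch.sigmoid(sharpness * (edge_x - px))

edge_x = torch.tensor(0.3, requires_grad=True)
pixels = torch.linspace(0.0, 1.0, 5)

print(hard_coverage(pixels, edge_x.detach()))  # hard test: no usable gradient

# Supervise with a target silhouette that lights all pixels.
loss = ((soft_coverage(pixels, edge_x) - 1.0) ** 2).mean()
loss.backward()
print(edge_x.grad)  # nonzero: the edge can be moved to better cover the pixels
```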

Visual Interaction Networks

Title Visual Interaction Networks
Authors Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran
Abstract From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains and require direct measurements of the underlying states. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions and dynamics, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. Our results demonstrate that the perceptual module and the object-based dynamics predictor module can induce factored latent representations that support accurate dynamical predictions. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments.
Tasks Decision Making
Published 2017-06-05
URL http://arxiv.org/abs/1706.01433v1
PDF http://arxiv.org/pdf/1706.01433v1.pdf
PWC https://paperswithcode.com/paper/visual-interaction-networks
Repo https://github.com/gitlimlab/Relation-Network-Tensorflow
Framework tf
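
A schematic of the dynamics-predictor half, assuming 2-D positions and velocities as the object state; the real model pairs this with a convolutional front-end that infers these states from pixels. The sizes and the single-step residual update are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class InteractionStep(nn.Module):
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.relation = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.update = nn.Sequential(nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
                                    nn.Linear(hidden, state_dim))

    def forward(self, states):                        # states: (n_objects, state_dim)
        n = states.shape[0]
        senders = states.repeat_interleave(n, 0)      # row i*n+j holds object i
        receivers = states.repeat(n, 1)               # row i*n+j holds object j
        effects = self.relation(torch.cat([senders, receivers], dim=1))
        incoming = effects.view(n, n, -1).sum(dim=0)  # sum effects arriving at each j
        return states + self.update(torch.cat([states, incoming], dim=1))

states = torch.randn(3, 4)                            # 3 objects: (x, y, vx, vy)
print(InteractionStep()(states).shape)                # torch.Size([3, 4])
```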

DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning

Title DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning
Authors Tian Zhao, Xiaobing Huang, Yu Cao
Abstract In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the-art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while successful in their applicable domains, are programming libraries with a fixed user interface, internal representation, and execution environment. This makes it difficult to implement portable and customized DL applications. In this paper, we present DeepDSL, a domain-specific language (DSL) embedded in Scala, which compiles deep networks written in DeepDSL to Java source code. DeepDSL provides (1) intuitive constructs to support compact encoding of deep networks; (2) symbolic gradient derivation of the networks; (3) static analysis for memory consumption and error detection; and (4) DSL-level optimization to improve memory and runtime efficiency. DeepDSL programs are compiled into compact, efficient, customizable, and portable Java source code, which operates the CUDA and cuDNN interfaces running on Nvidia GPUs via a Java Native Interface (JNI) library. We evaluated DeepDSL with a number of popular DL networks. Our experiments show that the compiled programs have very competitive runtime performance and memory efficiency compared to the existing libraries.
Tasks
Published 2017-01-09
URL http://arxiv.org/abs/1701.02284v1
PDF http://arxiv.org/pdf/1701.02284v1.pdf
PWC https://paperswithcode.com/paper/deepdsl-a-compilation-based-domain-specific
Repo https://github.com/deepdsl/deepdsl
Framework tf
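
DeepDSL itself is embedded in Scala, so rather than guess at its syntax, here is a toy Python expression tree showing one ingredient the abstract names, symbolic gradient derivation: the gradient is built as a new expression before anything executes, which is what lets a compiler analyze and optimize it ahead of time.

```python
class Var:
    def __init__(self, name): self.name = name
    def grad(self, wrt): return Const(1.0) if self is wrt else Const(0.0)
    def __repr__(self): return self.name

class Const:
    def __init__(self, v): self.v = v
    def grad(self, wrt): return Const(0.0)
    def __repr__(self): return str(self.v)

class Add:
    def __init__(self, a, b): self.a, self.b = a, b
    def grad(self, wrt): return Add(self.a.grad(wrt), self.b.grad(wrt))
    def __repr__(self): return f"({self.a} + {self.b})"

class Mul:
    def __init__(self, a, b): self.a, self.b = a, b
    def grad(self, wrt):  # product rule, applied symbolically at "compile time"
        return Add(Mul(self.a.grad(wrt), self.b), Mul(self.a, self.b.grad(wrt)))
    def __repr__(self): return f"({self.a} * {self.b})"

x, w = Var("x"), Var("w")
loss = Mul(w, Mul(w, x))   # w * w * x
print(loss.grad(w))        # an (unsimplified) expression tree, available before running
```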

Valid Inference Corrected for Outlier Removal

Title Valid Inference Corrected for Outlier Removal
Authors Shuxiao Chen, Jacob Bien
Abstract Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic, and in this paper we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real data sets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R.
Tasks Outlier Detection
Published 2017-11-29
URL https://arxiv.org/abs/1711.10635v3
PDF https://arxiv.org/pdf/1711.10635v3.pdf
PWC https://paperswithcode.com/paper/valid-inference-corrected-for-outlier-removal
Repo https://github.com/shuxiaoc/outference
Framework none
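
A minimal rendering, in Python with synthetic data, of the “detect-and-forget” pipeline the paper analyzes: fit, drop large-residual points, refit, and report classical p-values as if no selection had happened. The paper’s point is that this last step is invalid without a selective-inference correction, which the outference package supplies in R and which is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
y[:3] += 8.0                              # plant a few gross outliers

slope, intercept = np.polyfit(x, y, 1)    # step 1: initial fit on all data
resid = y - (slope * x + intercept)
keep = np.abs(resid) < 2.0 * resid.std()  # step 2: "detect" and drop outliers

# Step 3 ("forget"): classical inference on the trimmed data, ignoring
# that the data themselves chose which points survived.
res = stats.linregress(x[keep], y[keep])
print(res.slope, res.pvalue)
```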

Word Embeddings via Tensor Factorization

Title Word Embeddings via Tensor Factorization
Authors Eric Bailey, Shuchin Aeron
Abstract Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence based matrix into low rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order word co-occurrence based arrays, or tensors. We present four word embeddings using tensor factorization and analyze their advantages and disadvantages. One of our main contributions is a novel joint symmetric tensor factorization technique related to the idea of coupled tensor factorization. We show that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and motivate the intuition behind why this works in a way that existing methods cannot. We also modify an existing word embedding evaluation metric known as Outlier Detection [Camacho-Collados and Navigli, 2016] to evaluate the quality of the order-N relations that a word embedding captures, and show that tensor-based methods outperform existing matrix-based methods at this task. Experimentally, we show that all of our word embeddings either outperform or are competitive with state-of-the-art baselines commonly used today on a variety of recent datasets. Suggested applications of tensor factorization-based word embeddings are given, and all source code and pre-trained vectors are publicly available online.
Tasks Outlier Detection, Word Embeddings
Published 2017-04-10
URL http://arxiv.org/abs/1704.02686v2
PDF http://arxiv.org/pdf/1704.02686v2.pdf
PWC https://paperswithcode.com/paper/word-embeddings-via-tensor-factorization
Repo https://github.com/dnguyen1196/word-embedding-cp
Framework tf
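
A bare-bones CP decomposition of a third-order co-occurrence tensor by plain gradient descent, where row i of each factor matrix plays the role of word i’s embedding. The tensor, rank, and step size below are arbitrary stand-ins; the paper works with real co-occurrence counts and a more refined symmetric joint factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
V, R = 10, 5                               # vocabulary size, embedding rank
T = rng.random((V, V, V))                  # stand-in triple co-occurrence tensor

A = rng.normal(scale=0.5, size=(V, R))     # one factor matrix per mode;
B = rng.normal(scale=0.5, size=(V, R))     # row i serves as word i's embedding
C = rng.normal(scale=0.5, size=(V, R))

lr = 0.001
for step in range(500):
    E = np.einsum('ir,jr,kr->ijk', A, B, C) - T    # reconstruction error
    A -= lr * np.einsum('ijk,jr,kr->ir', E, B, C)  # gradients of 0.5*||E||^2
    B -= lr * np.einsum('ijk,ir,kr->jr', E, A, C)
    C -= lr * np.einsum('ijk,ir,jr->kr', E, A, B)
    if step % 100 == 0:
        print(step, 0.5 * (E ** 2).sum())          # loss should decrease
```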

Efficient variational Bayesian neural network ensembles for outlier detection

Title Efficient variational Bayesian neural network ensembles for outlier detection
Authors Nick Pawlowski, Miguel Jaques, Ben Glocker
Abstract In this work we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show our outlier detection results are comparable to those obtained using other efficient ensembling methods.
Tasks Outlier Detection
Published 2017-03-20
URL http://arxiv.org/abs/1703.06749v2
PDF http://arxiv.org/pdf/1703.06749v2.pdf
PWC https://paperswithcode.com/paper/efficient-variational-bayesian-neural-network
Repo https://github.com/pawni/sgld_online_approximation
Framework tf
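
A loose proxy for the method, assuming dropout as the source of posterior samples (one of the cheap variational schemes in this line of work): score each input by the disagreement of an ensemble of stochastic forward passes. The untrained network merely keeps the sketch self-contained; in practice one would score after training.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Dropout(0.5),
                    nn.Linear(64, 10))
net.train()                      # keep dropout active so the passes disagree

x = torch.randn(5, 2)            # 5 inputs to score
probs = torch.stack([net(x).softmax(dim=1) for _ in range(32)])
score = probs.var(dim=0).sum(dim=1)   # high ensemble variance = likely outlier
print(score)
```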

Query-adaptive Video Summarization via Quality-aware Relevance Estimation

Title Query-adaptive Video Summarization via Quality-aware Relevance Estimation
Authors Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool
Abstract Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant summarization as a video frame subset selection problem, which lets us optimise for summaries which are simultaneously diverse, representative of the entire video, and relevant to a text query. We quantify relevance by measuring the distance between frames and queries in a common textual-visual semantic embedding space induced by a neural network. In addition, we extend the model to capture query-independent properties, such as frame quality. We compare our method against previous state of the art on textual-visual embeddings for thumbnail selection and show that our model outperforms them on relevance prediction. Furthermore, we introduce a new dataset, annotated with diversity and query-specific relevance labels. On this dataset, we train and test our complete model for video summarization and show that it outperforms standard baselines such as Maximal Marginal Relevance.
Tasks Video Summarization
Published 2017-05-01
URL http://arxiv.org/abs/1705.00581v2
PDF http://arxiv.org/pdf/1705.00581v2.pdf
PWC https://paperswithcode.com/paper/query-adaptive-video-summarization-via
Repo https://github.com/arunbalajeev/query-video-summary
Framework none
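
The relevance component in schematic form: frames and the query are embedded into a shared space and scored by cosine similarity, with the top-scoring frames kept. The random vectors stand in for the paper’s learned visual and textual encoders, and the plain top-k selection ignores the diversity and representativeness terms the full model optimizes.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_emb = rng.normal(size=(100, 128))   # 100 frames, embedded
query_emb = rng.normal(size=128)          # one text query, embedded

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

relevance = normalize(frame_emb) @ normalize(query_emb)   # cosine similarity
summary = np.argsort(relevance)[-5:]      # keep the 5 most query-relevant frames
print(summary)
```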

PAC-Bayes and Domain Adaptation

Title PAC-Bayes and Domain Adaptation
Authors Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant
Abstract We provide two main contributions in PAC-Bayesian theory for domain adaptation, where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (introduced in Germain et al., 2016) that brings a new perspective on domain adaptation by deriving an upper bound on the target risk where the divergence between the distributions, expressed as a ratio, controls the trade-off between a source error measure and the target voters’ disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data.
Tasks Domain Adaptation
Published 2017-07-17
URL https://arxiv.org/abs/1707.05712v3
PDF https://arxiv.org/pdf/1707.05712v3.pdf
PWC https://paperswithcode.com/paper/pac-bayes-and-domain-adaptation
Repo https://github.com/GRAAL-Research/domain_adaptation_of_linear_classifiers
Framework none
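
For orientation, the generic PAC-Bayes theorem that results like these specialize, in its classical McAllester-style form; the paper’s domain-adaptation bounds additionally involve the source risk, a divergence between source and target, and the voters’ disagreement.

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size m,
% simultaneously for every posterior Q over voters:
\[
  R_{\mathcal{D}}(Q) \;\le\; \widehat{R}_{S}(Q)
  + \sqrt{\frac{\operatorname{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},
\]
% where P is a prior fixed before seeing S, R_D(Q) is the true risk of the
% Q-weighted vote, and \widehat{R}_S(Q) its empirical counterpart.
```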

Deep Learning with Low Precision by Half-wave Gaussian Quantization

Title Deep Learning with Low Precision by Half-wave Gaussian Quantization
Authors Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos
Abstract The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have an efficient implementation, by exploiting the statistics of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piecewise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations.
Tasks Quantization
Published 2017-02-03
URL http://arxiv.org/abs/1702.00953v1
PDF http://arxiv.org/pdf/1702.00953v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-low-precision-by-half-wave
Repo https://github.com/zhaoweicai/hwgq
Framework none
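
The forward/backward mismatch described above, in sketch form: a hard 2-bit quantizer on the positive half-axis in the forward pass, paired with the gradient of a clipped ReLU in the backward pass. The uniform levels here are placeholders; the paper derives them from the Gaussian statistics of batch-normalized activations.

```python
import torch

class HWGQSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Forward: uniform 2-bit quantizer on the positive half-axis
        # (the paper instead fits the levels to activation statistics).
        return (x.clamp(0.0, 1.5) / 0.5).round() * 0.5

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Backward: gradient of a clipped ReLU, a piecewise-linear surrogate
        # that sidesteps the zero-gradient forward quantizer.
        return grad_out * ((x > 0) & (x < 1.5)).float()

x = torch.randn(4, requires_grad=True)
HWGQSketch.apply(x).sum().backward()
print(x.grad)   # 1 inside the clipping range, 0 outside
```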

Semantic Image Synthesis via Adversarial Learning

Title Semantic Image Synthesis via Adversarial Learning
Authors Hao Dong, Simiao Yu, Chao Wu, Yike Guo
Abstract In this paper, we propose a way of synthesizing realistic images directly from natural language descriptions, which has many useful applications, e.g. intelligent image manipulation. We attempt to accomplish such synthesis: given a source image and a target text description, our model synthesizes images to meet two requirements: 1) being realistic while matching the target text description; 2) maintaining other image features that are irrelevant to the text description. The model should be able to disentangle the semantic information from the two modalities (image and text), and generate new images from the combined semantics. To achieve this, we propose an end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill the aforementioned two requirements. We have evaluated our model by conducting experiments on the Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated that our model is capable of synthesizing realistic images that match the given descriptions, while still maintaining other features of the original images.
Tasks Image Generation
Published 2017-07-21
URL http://arxiv.org/abs/1707.06873v1
PDF http://arxiv.org/pdf/1707.06873v1.pdf
PWC https://paperswithcode.com/paper/semantic-image-synthesis-via-adversarial
Repo https://github.com/vtddggg/BilinearGAN_for_LBIE
Framework pytorch
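
One plausible shape for the text-conditioned discriminator objective, following the matching-aware idea common to this line of work (the paper’s exact losses and architectures differ): the discriminator must reject both fake images and real images paired with the wrong text. TinyD and the random tensors below are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyD(nn.Module):
    """Placeholder discriminator over (image-feature, text-embedding) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(16 + 8, 1)
    def forward(self, img, txt):
        return torch.sigmoid(self.net(torch.cat([img, txt], dim=1)))

D = TinyD()
img_real, img_fake = torch.randn(4, 16), torch.randn(4, 16)
txt, txt_wrong = torch.randn(4, 8), torch.randn(4, 8)

ones, zeros = torch.ones(4, 1), torch.zeros(4, 1)
loss_d = (F.binary_cross_entropy(D(img_real, txt), ones)           # real + right text
          + F.binary_cross_entropy(D(img_real, txt_wrong), zeros)  # real + wrong text
          + F.binary_cross_entropy(D(img_fake, txt), zeros))       # fake + right text
print(loss_d)
```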

MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments

Title MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
Authors Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun
Abstract We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning. The experiments show that current deep reinforcement learning approaches fail in large realistic environments. The experiments also indicate that multimodality is beneficial in learning to navigate cluttered scenes. MINOS is released open-source to the research community at http://minosworld.org . A video that shows MINOS can be found at https://youtu.be/c0mL9K64q84
Tasks
Published 2017-12-11
URL http://arxiv.org/abs/1712.03931v1
PDF http://arxiv.org/pdf/1712.03931v1.pdf
PWC https://paperswithcode.com/paper/minos-multimodal-indoor-simulator-for
Repo https://github.com/minosworld/minos
Framework none

Multimodal Visual Concept Learning with Weakly Supervised Techniques

Title Multimodal Visual Concept Learning with Weakly Supervised Techniques
Authors Giorgos Bouritsas, Petros Koutras, Athanasia Zlatintsi, Petros Maragos
Abstract Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description’s semantics with Probabilistic Labels, both formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, which consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines.
Tasks Multiple Instance Learning, Semantic Similarity, Semantic Textual Similarity, Temporal Action Localization
Published 2017-12-03
URL http://arxiv.org/abs/1712.00796v3
PDF http://arxiv.org/pdf/1712.00796v3.pdf
PWC https://paperswithcode.com/paper/multimodal-visual-concept-learning-with
Repo https://github.com/gbouritsas/cvpr18_multimodal_weakly_supervised_learning
Framework none
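
The underlying multiple-instance setup in its plainest form, for orientation: only bag-level labels are available, and a bag is positive if any of its instances is, so training pools instance scores with a max. The paper’s FSMIL and PLMIL variants refine how those weak labels are formed (fuzzy sets, probabilistic labels); none of that is reproduced here.

```python
import torch
import torch.nn as nn

scorer = nn.Linear(10, 1)                       # instance-level scorer
bags = [torch.randn(n, 10) for n in (3, 5, 4)]  # variable-size bags of instances
labels = torch.tensor([1.0, 0.0, 1.0])          # weak, bag-level labels only

# A bag's score is its best instance score: positive bags need only one
# positive instance, matching the standard MIL assumption.
bag_scores = torch.stack([scorer(b).max() for b in bags])
loss = nn.functional.binary_cross_entropy_with_logits(bag_scores, labels)
loss.backward()
print(loss)
```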

GP-GAN: Towards Realistic High-Resolution Image Blending

Title GP-GAN: Towards Realistic High-Resolution Image Blending
Authors Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang
Abstract It is common but challenging to address high-resolution image blending in automatic photo editing applications. In this paper, we focus on solving the problem of high-resolution image blending, where the composite images are provided. We propose a framework called Gaussian-Poisson Generative Adversarial Network (GP-GAN) to leverage the strengths of the classical gradient-based approach and Generative Adversarial Networks. To the best of our knowledge, this is the first work to explore the capability of GANs in the high-resolution image blending task. Concretely, we propose the Gaussian-Poisson Equation to formulate the high-resolution image blending problem as a joint optimization constrained by the gradient and color information. Inspired by prior work, we obtain gradient information by applying gradient filters. To generate the color information, we propose a Blending GAN to learn the mapping between the composite images and the well-blended ones. Compared to alternative methods, our approach can deliver high-resolution, realistic images with less bleeding and fewer unpleasant artifacts. Experiments confirm that our approach achieves state-of-the-art performance on the Transient Attributes dataset. A user study on Amazon Mechanical Turk finds that the majority of workers are in favor of the proposed method.
Tasks Conditional Image Generation, Image Generation
Published 2017-03-21
URL https://arxiv.org/abs/1703.07195v3
PDF https://arxiv.org/pdf/1703.07195v3.pdf
PWC https://paperswithcode.com/paper/gp-gan-towards-realistic-high-resolution
Repo https://github.com/msinghal34/Image-Blending-using-GP-GANs
Framework pytorch
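
A 1-D cartoon of the Gaussian-Poisson objective: keep the solution close to a low-frequency color guide while matching a target gradient field, solved as linear least squares. In GP-GAN the color guide comes from the Blending GAN, the gradients come from the composite image, and the solve happens over 2-D images; the signals and weight below are arbitrary.

```python
import numpy as np

n = 64
color = np.linspace(0.0, 1.0, n)                  # stand-in low-res color guide
grads = np.zeros(n - 1); grads[n // 2] = 0.5      # stand-in composite gradients

D = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]  # finite-difference operator
lam = 10.0                                        # gradient-fidelity weight

# Minimize ||x - color||^2 + lam * ||D x - grads||^2 via the normal equations.
A = np.eye(n) + lam * D.T @ D
b = color + lam * D.T @ grads
x = np.linalg.solve(A, b)
print(x[:5])   # the blended signal trades off both fidelity terms
```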

Improved Adversarial Systems for 3D Object Generation and Reconstruction

Title Improved Adversarial Systems for 3D Object Generation and Reconstruction
Authors Edward Smith, David Meger
Abstract This paper describes a new approach for training generative adversarial networks (GAN) to understand the detailed 3D shape of objects. While GANs have been used in this domain previously, they are notoriously hard to train, especially for the complex joint data distribution over 3D objects of many categories and orientations. Our method extends previous work by employing the Wasserstein distance normalized with gradient penalization as a training objective. This enables improved generation from the joint object shape distribution. Our system can also reconstruct 3D shape from 2D images and perform shape completion from occluded 2.5D range scans. We achieve notable quantitative improvements in comparison to existing baselines.
Tasks
Published 2017-07-29
URL http://arxiv.org/abs/1707.09557v3
PDF http://arxiv.org/pdf/1707.09557v3.pdf
PWC https://paperswithcode.com/paper/improved-adversarial-systems-for-3d-object
Repo https://github.com/kingcheng2000/3D-IWGAN
Framework none
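
The training objective the abstract names, Wasserstein loss with gradient penalty (WGAN-GP), in minimal form on toy vectors rather than 3D shape grids; the critic architecture and penalty weight below are the usual defaults, not the paper’s exact settings.

```python
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))
real, fake = torch.randn(8, 32), torch.randn(8, 32)

# Gradient penalty: push the critic's gradient norm toward 1 on random
# interpolates between real and fake samples.
eps = torch.rand(8, 1)
interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
grad = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
penalty = ((grad.norm(2, dim=1) - 1) ** 2).mean()

# Wasserstein critic loss plus the penalty (weight 10 is the common default).
loss_critic = critic(fake).mean() - critic(real).mean() + 10.0 * penalty
loss_critic.backward()
print(loss_critic)
```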