Paper Group AWR 55
Skill2vec: Machine Learning Approach for Determining the Relevant Skills from Job Description. Neural 3D Mesh Renderer. Visual Interaction Networks. DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning. Valid Inference Corrected for Outlier Removal. Word Embeddings via Tensor Factorization. Efficient variational Bayesian neural network ensembles for outlier detection. Query-adaptive Video Summarization via Quality-aware Relevance Estimation. PAC-Bayes and Domain Adaptation. Deep Learning with Low Precision by Half-wave Gaussian Quantization. Semantic Image Synthesis via Adversarial Learning. MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments. Multimodal Visual Concept Learning with Weakly Supervised Techniques. GP-GAN: Towards Realistic High-Resolution Image Blending. Improved Adversarial Systems for 3D Object Generation and Reconstruction.
Skill2vec: Machine Learning Approach for Determining the Relevant Skills from Job Description
Title | Skill2vec: Machine Learning Approach for Determining the Relevant Skills from Job Description |
Authors | Le Van-Duyet, Vo Minh Quan, Dang Quang An |
Abstract | Unsupervised word embeddings have seen tremendous success in numerous Natural Language Processing (NLP) tasks in recent years. The main contribution of this paper is to develop a technique called Skill2vec, which applies machine learning techniques in recruitment to enhance the search strategy for finding candidates who possess the appropriate skills. Skill2vec is a neural network architecture inspired by Word2vec, developed by Mikolov et al. in 2013. It transforms skills into a new vector space that supports vector arithmetic and captures relationships between skills. We conducted a manual evaluation with a recruitment company’s domain experts to demonstrate the effectiveness of our approach. |
Tasks | Word Embeddings |
Published | 2017-07-31 |
URL | https://arxiv.org/abs/1707.09751v3 |
https://arxiv.org/pdf/1707.09751v3.pdf | |
PWC | https://paperswithcode.com/paper/skill2vec-machine-learning-approach-for |
Repo | https://github.com/duyetdev/skill2vec-dataset |
Framework | none |
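As a rough illustration of the Word2vec-style training that Skill2vec adapts, the sketch below trains a skip-gram model over skill lists with gensim, treating each job description's skill list as a sentence. The skill lists and hyperparameters are made up for illustration and are not the paper's dataset or setup.

```python
# Hedged sketch: skip-gram embeddings over skill lists, in the spirit of Skill2vec.
# The skill sequences below are invented; the paper's data and hyperparameters differ.
from gensim.models import Word2Vec

# Each "sentence" is the list of skills extracted from one job description.
job_skill_lists = [
    ["python", "machine learning", "sql"],
    ["java", "spring", "sql"],
    ["python", "deep learning", "tensorflow"],
]

model = Word2Vec(
    sentences=job_skill_lists,
    vector_size=100,   # embedding dimension
    window=5,          # context window over co-listed skills
    min_count=1,
    sg=1,              # skip-gram, as in Word2vec
)

# Skills that co-occur across job descriptions end up close in the vector space.
print(model.wv.most_similar("python", topn=3))
```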
Neural 3D Mesh Renderer
Title | Neural 3D Mesh Renderer |
Authors | Hiroharu Kato, Yoshitaka Ushiku, Tatsuya Harada |
Abstract | For modeling the 3D world behind 2D images, which 3D representation is most appropriate? A polygon mesh is a promising candidate for its compactness and geometric properties. However, it is not straightforward to model a polygon mesh from 2D images using neural networks because the conversion from a mesh to an image, or rendering, involves a discrete operation called rasterization, which prevents back-propagation. Therefore, in this work, we propose an approximate gradient for rasterization that enables the integration of rendering into neural networks. Using this renderer, we perform single-image 3D mesh reconstruction with silhouette image supervision and our system outperforms the existing voxel-based approach. Additionally, we perform gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and 3D DeepDream, with 2D supervision for the first time. These applications demonstrate the potential of the integration of a mesh renderer into neural networks and the effectiveness of our proposed renderer. |
Tasks | 3D Object Reconstruction, Style Transfer |
Published | 2017-11-20 |
URL | http://arxiv.org/abs/1711.07566v1 |
http://arxiv.org/pdf/1711.07566v1.pdf | |
PWC | https://paperswithcode.com/paper/neural-3d-mesh-renderer |
Repo | https://github.com/laughtervv/tf_neural_renderer |
Framework | tf |
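The core idea, differentiating through rasterization, can be illustrated with a toy soft rasterizer: a smoothed inside/outside test makes a rendered silhouette differentiable with respect to vertex positions, so an image-space loss can drive mesh edits. This is a simplified stand-in, not the paper's approximate gradient; the sigmoid smoothing and the single 2D triangle are assumptions of the sketch.

```python
# Hedged sketch: a soft (differentiable) silhouette rasterizer for one 2D triangle.
# NOT the paper's approximate gradient; it only shows that a smoothed rasterizer lets
# image-space losses flow back to vertex positions.
import torch

def soft_silhouette(verts, res=64, tau=0.01):
    """verts: (3, 2) triangle vertices in [0, 1]^2 (counter-clockwise)."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, res), torch.linspace(0, 1, res), indexing="ij"
    )
    p = torch.stack([xs, ys], dim=-1)                      # (res, res, 2) pixel centers
    a, b, c = verts[0], verts[1], verts[2]

    def edge(u, v):  # signed edge function (2D cross product), positive inside
        return (v[0] - u[0]) * (p[..., 1] - u[1]) - (v[1] - u[1]) * (p[..., 0] - u[0])

    d = torch.minimum(torch.minimum(edge(a, b), edge(b, c)), edge(c, a))
    return torch.sigmoid(d / tau)                          # soft inside/outside mask

verts = torch.tensor([[0.2, 0.2], [0.8, 0.3], [0.5, 0.9]], requires_grad=True)
target = (soft_silhouette(verts.detach() + 0.05) > 0.5).float()  # toy target silhouette
loss = ((soft_silhouette(verts) - target) ** 2).mean()
loss.backward()
print(verts.grad)   # non-zero gradients: rendering is now differentiable w.r.t. vertices
```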
Visual Interaction Networks
Title | Visual Interaction Networks |
Authors | Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran |
Abstract | From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains and require direct measurements of the underlying states. We introduce the Visual Interaction Network, a general-purpose model for learning the dynamics of a physical system from raw visual observations. Our model consists of a perceptual front-end based on convolutional neural networks and a dynamics predictor based on interaction networks. Through joint training, the perceptual front-end learns to parse a dynamic visual scene into a set of factored latent object representations. The dynamics predictor learns to roll these states forward in time by computing their interactions and dynamics, producing a predicted physical trajectory of arbitrary length. We found that from just six input video frames the Visual Interaction Network can generate accurate future trajectories of hundreds of time steps on a wide range of physical systems. Our model can also be applied to scenes with invisible objects, inferring their future states from their effects on the visible objects, and can implicitly infer the unknown mass of objects. Our results demonstrate that the perceptual module and the object-based dynamics predictor module can induce factored latent representations that support accurate dynamical predictions. This work opens new opportunities for model-based decision-making and planning from raw sensory observations in complex physical environments. |
Tasks | Decision Making |
Published | 2017-06-05 |
URL | http://arxiv.org/abs/1706.01433v1 |
http://arxiv.org/pdf/1706.01433v1.pdf | |
PWC | https://paperswithcode.com/paper/visual-interaction-networks |
Repo | https://github.com/gitlimlab/Relation-Network-Tensorflow |
Framework | tf |
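A minimal sketch of the interaction-network-style dynamics core is given below: a relation MLP over object pairs whose summed effects feed a per-object dynamics MLP, rolled forward for several steps. The perceptual CNN front-end is omitted, and the dimensions, architectures, and inclusion of self-pairs are simplifications rather than the paper's model.

```python
# Hedged sketch of an interaction-network dynamics core: pairwise relation MLP + per-object
# dynamics MLP, rolled forward in time. Sizes are placeholders, not the paper's model.
import torch
import torch.nn as nn

class InteractionCore(nn.Module):
    def __init__(self, state_dim=4, hidden=64):
        super().__init__()
        self.relation = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        self.dynamics = nn.Sequential(nn.Linear(state_dim + hidden, hidden), nn.ReLU(),
                                      nn.Linear(hidden, state_dim))

    def forward(self, states):            # states: (batch, n_objects, state_dim)
        b, n, d = states.shape
        senders = states.unsqueeze(2).expand(b, n, n, d)
        receivers = states.unsqueeze(1).expand(b, n, n, d)
        pair = torch.cat([receivers, senders], dim=-1)     # self-pairs kept for simplicity
        effects = self.relation(pair).sum(dim=2)           # aggregate effects per object
        delta = self.dynamics(torch.cat([states, effects], dim=-1))
        return states + delta                              # next-step state prediction

core = InteractionCore()
states = torch.randn(8, 5, 4)              # 8 scenes, 5 objects, (x, y, vx, vy) states
rollout = [states]
for _ in range(10):                        # roll the latent states forward 10 steps
    rollout.append(core(rollout[-1]))
```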
DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning
Title | DeepDSL: A Compilation-based Domain-Specific Language for Deep Learning |
Authors | Tian Zhao, Xiaobing Huang, Yu Cao |
Abstract | In recent years, Deep Learning (DL) has found great success in domains such as multimedia understanding. However, the complex nature of multimedia data makes it difficult to develop DL-based software. The state-of-the-art tools, such as Caffe, TensorFlow, Torch7, and CNTK, while successful in their applicable domains, are programming libraries with fixed user interfaces, internal representations, and execution environments. This makes it difficult to implement portable and customized DL applications. In this paper, we present DeepDSL, a domain-specific language (DSL) embedded in Scala that compiles deep networks written in DeepDSL to Java source code. DeepDSL provides (1) intuitive constructs to support compact encoding of deep networks; (2) symbolic gradient derivation of the networks; (3) static analysis for memory consumption and error detection; and (4) DSL-level optimization to improve memory and runtime efficiency. DeepDSL programs are compiled into compact, efficient, customizable, and portable Java source code, which operates the CUDA and cuDNN interfaces on Nvidia GPUs via a Java Native Interface (JNI) library. We evaluated DeepDSL with a number of popular DL networks. Our experiments show that the compiled programs have very competitive runtime performance and memory efficiency compared to existing libraries. |
Tasks | |
Published | 2017-01-09 |
URL | http://arxiv.org/abs/1701.02284v1 |
http://arxiv.org/pdf/1701.02284v1.pdf | |
PWC | https://paperswithcode.com/paper/deepdsl-a-compilation-based-domain-specific |
Repo | https://github.com/deepdsl/deepdsl |
Framework | tf |
Valid Inference Corrected for Outlier Removal
Title | Valid Inference Corrected for Outlier Removal |
Authors | Shuxiao Chen, Jacob Bien |
Abstract | Ordinary least squares (OLS) estimation of a linear regression model is well known to be highly sensitive to outliers. It is common practice to (1) identify and remove outliers by looking at the data and (2) fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. This standard “detect-and-forget” approach has been shown to be problematic; in this paper we highlight the fact that it can lead to invalid inference and show how recently developed tools in selective inference can be used to properly account for outlier detection and removal. Our inferential procedures apply to a general class of outlier removal procedures that includes several of the most commonly used approaches. We conduct simulations to corroborate the theoretical results, and we apply our method to three real data sets to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R. |
Tasks | Outlier Detection |
Published | 2017-11-29 |
URL | https://arxiv.org/abs/1711.10635v3 |
https://arxiv.org/pdf/1711.10635v3.pdf | |
PWC | https://paperswithcode.com/paper/valid-inference-corrected-for-outlier-removal |
Repo | https://github.com/shuxiaoc/outference |
Framework | none |
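The toy simulation below illustrates the “detect-and-forget” pipeline the abstract criticizes: flag large-residual points, drop them, refit OLS, and report the usual confidence interval. It does not implement the paper's corrected (selective) inference, which is provided by the outference R package; the data, outlier rule, and thresholds are illustrative.

```python
# Hedged sketch: the "detect-and-forget" pipeline the abstract warns about, on toy data.
# Outliers are flagged by residual magnitude, dropped, and OLS is refit on the rest; the
# paper's point is that the naive CI from step (2) is no longer valid. This does NOT
# implement the corrected selective inference.
import numpy as np

rng = np.random.default_rng(0)
n, beta = 200, 2.0
x = rng.normal(size=n)
y = beta * x + rng.normal(size=n)

# Step 1: fit, flag large-residual points as "outliers", and forget them.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
keep = np.abs(resid) < 2.0 * resid.std()          # a common ad-hoc outlier rule

# Step 2: refit OLS on the kept points and form the usual confidence interval.
Xk, yk = X[keep], y[keep]
coef_k, *_ = np.linalg.lstsq(Xk, yk, rcond=None)
sigma2 = np.sum((yk - Xk @ coef_k) ** 2) / (keep.sum() - 2)
se = np.sqrt(sigma2 * np.linalg.inv(Xk.T @ Xk)[1, 1])
print("naive CI for slope:", coef_k[1] - 1.96 * se, coef_k[1] + 1.96 * se)
# Because the kept set was chosen using y, this interval can undercover the true slope.
```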
Word Embeddings via Tensor Factorization
Title | Word Embeddings via Tensor Factorization |
Authors | Eric Bailey, Shuchin Aeron |
Abstract | Most popular word embedding techniques involve implicit or explicit factorization of a word co-occurrence based matrix into low rank factors. In this paper, we aim to generalize this trend by using numerical methods to factor higher-order word co-occurrence based arrays, or tensors. We present four word embeddings using tensor factorization and analyze their advantages and disadvantages. One of our main contributions is a novel joint symmetric tensor factorization technique related to the idea of coupled tensor factorization. We show that embeddings based on tensor factorization can be used to discern the various meanings of polysemous words without being explicitly trained to do so, and motivate the intuition behind why this works in a way that is not possible with existing methods. We also modify an existing word embedding evaluation metric known as Outlier Detection [Camacho-Collados and Navigli, 2016] to evaluate the quality of the order-N relations that a word embedding captures, and show that tensor-based methods outperform existing matrix-based methods at this task. Experimentally, we show that all of our word embeddings either outperform or are competitive with state-of-the-art baselines commonly used today on a variety of recent datasets. Suggested applications of tensor factorization-based word embeddings are given, and all source code and pre-trained vectors are publicly available online. |
Tasks | Outlier Detection, Word Embeddings |
Published | 2017-04-10 |
URL | http://arxiv.org/abs/1704.02686v2 |
http://arxiv.org/pdf/1704.02686v2.pdf | |
PWC | https://paperswithcode.com/paper/word-embeddings-via-tensor-factorization |
Repo | https://github.com/dnguyen1196/word-embedding-cp |
Framework | tf |
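A minimal sketch of the underlying idea, symmetric CP factorization of a third-order co-occurrence tensor whose factor rows serve as word vectors, is shown below. The toy corpus, raw counts (instead of the paper's pointwise-mutual-information-style weighting), rank, and plain gradient descent are all simplifications.

```python
# Hedged sketch: symmetric CP factorization of an order-3 word co-occurrence tensor,
# the core idea behind tensor-based embeddings. Corpus, counts, and optimizer are toys.
import numpy as np

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
vocab = sorted({w for s in sentences for w in s})
idx = {w: i for i, w in enumerate(vocab)}
V, R = len(vocab), 4

# Order-3 co-occurrence tensor: count word triples appearing in the same sentence.
T = np.zeros((V, V, V))
for s in sentences:
    for a in s:
        for b in s:
            for c in s:
                T[idx[a], idx[b], idx[c]] += 1

# Fit T ~ sum_r U[:, r] (x) U[:, r] (x) U[:, r]; rows of U become the word embeddings.
rng = np.random.default_rng(0)
U = 0.1 * rng.normal(size=(V, R))
lr = 1e-3
for _ in range(2000):
    approx = np.einsum("ir,jr,kr->ijk", U, U, U)
    err = approx - T
    grad = 3 * np.einsum("ijk,jr,kr->ir", err, U, U)   # gradient of the squared error
    U -= lr * grad

print({w: np.round(U[idx[w]], 2) for w in vocab})
```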
Efficient variational Bayesian neural network ensembles for outlier detection
Title | Efficient variational Bayesian neural network ensembles for outlier detection |
Authors | Nick Pawlowski, Miguel Jaques, Ben Glocker |
Abstract | In this work we perform outlier detection using ensembles of neural networks obtained by variational approximation of the posterior in a Bayesian neural network setting. The variational parameters are obtained by sampling from the true posterior by gradient descent. We show our outlier detection results are comparable to those obtained using other efficient ensembling methods. |
Tasks | Outlier Detection |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06749v2 |
http://arxiv.org/pdf/1703.06749v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-variational-bayesian-neural-network |
Repo | https://github.com/pawni/sgld_online_approximation |
Framework | tf |
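The sketch below shows the scoring side of the approach: draw an ensemble of networks from a factorized Gaussian variational posterior over the weights and use the predictive entropy of the averaged predictions as an outlier score. The variational training of the posterior parameters is omitted, and the architecture and number of samples are illustrative assumptions.

```python
# Hedged sketch: outlier scoring with an ensemble of networks drawn from a factorized
# Gaussian variational posterior over the weights. Training of mu/log_sigma by variational
# inference is omitted; untrained parameters are used purely for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(d_out, d_in))
        self.log_sigma = nn.Parameter(torch.full((d_out, d_in), -3.0))

    def forward(self, x):                      # one reparameterized weight sample per call
        w = self.mu + self.log_sigma.exp() * torch.randn_like(self.mu)
        return F.linear(x, w)

net = nn.Sequential(BayesLinear(2, 32), nn.ReLU(), BayesLinear(32, 3))

def outlier_score(x, n_samples=20):
    """Predictive entropy of the ensemble average: high entropy = likely outlier."""
    probs = torch.stack([F.softmax(net(x), dim=-1) for _ in range(n_samples)]).mean(0)
    return -(probs * probs.clamp_min(1e-8).log()).sum(-1)

x = torch.randn(5, 2)                          # toy inputs
print(outlier_score(x))
```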
Query-adaptive Video Summarization via Quality-aware Relevance Estimation
Title | Query-adaptive Video Summarization via Quality-aware Relevance Estimation |
Authors | Arun Balajee Vasudevan, Michael Gygli, Anna Volokitin, Luc Van Gool |
Abstract | Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant summarization as a video frame subset selection problem, which lets us optimise for summaries which are simultaneously diverse, representative of the entire video, and relevant to a text query. We quantify relevance by measuring the distance between frames and queries in a common textual-visual semantic embedding space induced by a neural network. In addition, we extend the model to capture query-independent properties, such as frame quality. We compare our method against previous state of the art on textual-visual embeddings for thumbnail selection and show that our model outperforms them on relevance prediction. Furthermore, we introduce a new dataset, annotated with diversity and query-specific relevance labels. On this dataset, we train and test our complete model for video summarization and show that it outperforms standard baselines such as Maximal Marginal Relevance. |
Tasks | Video Summarization |
Published | 2017-05-01 |
URL | http://arxiv.org/abs/1705.00581v2 |
http://arxiv.org/pdf/1705.00581v2.pdf | |
PWC | https://paperswithcode.com/paper/query-adaptive-video-summarization-via |
Repo | https://github.com/arunbalajeev/query-video-summary |
Framework | none |
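The Maximal Marginal Relevance baseline mentioned in the abstract, combined with cosine relevance in an assumed shared textual-visual embedding space, can be sketched as below. The embedding network, quality-aware terms, and the paper's full objective are not modeled; the embeddings are random placeholders.

```python
# Hedged sketch: frame selection with Maximal Marginal Relevance, using cosine similarity
# between frame and query vectors assumed to live in a shared textual-visual space.
import numpy as np

def cosine(a, b):
    return a @ b.T / (np.linalg.norm(a, axis=-1, keepdims=True) * np.linalg.norm(b, axis=-1) + 1e-8)

def mmr_select(frame_emb, query_emb, k=3, lam=0.7):
    relevance = cosine(frame_emb, query_emb[None, :])[:, 0]   # query relevance per frame
    selected = []
    while len(selected) < k:
        best, best_score = None, -np.inf
        for i in range(len(frame_emb)):
            if i in selected:
                continue
            redundancy = max(cosine(frame_emb[i:i + 1], frame_emb[selected])[0]) if selected else 0.0
            score = lam * relevance[i] - (1 - lam) * redundancy   # relevant but diverse
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 64))            # 20 toy frame embeddings
query = rng.normal(size=64)                   # toy query embedding
print(mmr_select(frames, query))
```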
PAC-Bayes and Domain Adaptation
Title | PAC-Bayes and Domain Adaptation |
Authors | Pascal Germain, Amaury Habrard, François Laviolette, Emilie Morvant |
Abstract | We provide two main contributions in PAC-Bayesian theory for domain adaptation, where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the approach we previously proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new, tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (introduced in Germain et al., 2016) that brings a new perspective on domain adaptation by deriving an upper bound on the target risk where the distributions’ divergence, expressed as a ratio, controls the trade-off between a source error measure and the target voters’ disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data. |
Tasks | Domain Adaptation |
Published | 2017-07-17 |
URL | https://arxiv.org/abs/1707.05712v3 |
https://arxiv.org/pdf/1707.05712v3.pdf | |
PWC | https://paperswithcode.com/paper/pac-bayes-and-domain-adaptation |
Repo | https://github.com/GRAAL-Research/domain_adaptation_of_linear_classifiers |
Framework | none |
Deep Learning with Low Precision by Half-wave Gaussian Quantization
Title | Deep Learning with Low Precision by Half-wave Gaussian Quantization |
Authors | Zhaowei Cai, Xiaodong He, Jian Sun, Nuno Vasconcelos |
Abstract | The problem of quantizing the activations of a deep neural network is considered. An examination of the popular binary quantization approach shows that this consists of approximating a classical non-linearity, the hyperbolic tangent, by two functions: a piecewise constant sign function, which is used in feedforward network computations, and a piecewise linear hard tanh function, used in the backpropagation step during network learning. The problem of approximating the ReLU non-linearity, widely used in the recent deep learning literature, is then considered. A half-wave Gaussian quantizer (HWGQ) is proposed for forward approximation and shown to have an efficient implementation, by exploiting the statistics of network activations and batch normalization operations commonly used in the literature. To overcome the problem of gradient mismatch, due to the use of different forward and backward approximations, several piece-wise backward approximators are then investigated. The implementation of the resulting quantized network, denoted as HWGQ-Net, is shown to achieve much closer performance to full precision networks, such as AlexNet, ResNet, GoogLeNet and VGG-Net, than previously available low-precision networks, with 1-bit binary weights and 2-bit quantized activations. |
Tasks | Quantization |
Published | 2017-02-03 |
URL | http://arxiv.org/abs/1702.00953v1 |
http://arxiv.org/pdf/1702.00953v1.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-low-precision-by-half-wave |
Repo | https://github.com/zhaoweicai/hwgq |
Framework | none |
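A hedged sketch of a half-wave Gaussian quantizer follows: the forward pass snaps positive activations to a small set of levels and the backward pass uses a clipped-ReLU surrogate to limit gradient mismatch. The level values are illustrative constants, not the Lloyd-optimized levels the paper derives from the Gaussian statistics of batch-normalized activations.

```python
# Hedged sketch of a half-wave quantizer with a clipped-ReLU backward surrogate.
# The quantization levels below are illustrative, not the paper's optimized values.
import torch

class HWGQ(torch.autograd.Function):
    levels = torch.tensor([0.0, 0.538, 1.076, 1.614])   # illustrative 2-bit levels

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        pos = x.clamp(min=0).unsqueeze(-1)               # half-wave: negatives map to 0
        idx = (pos - HWGQ.levels).abs().argmin(dim=-1)   # snap to the nearest level
        return HWGQ.levels[idx]

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        clip = HWGQ.levels[-1]
        pass_through = (x > 0) & (x < clip)              # clipped-ReLU surrogate gradient
        return grad_out * pass_through.float()

x = torch.randn(4, 8, requires_grad=True)
y = HWGQ.apply(x)
y.sum().backward()
print(y.unique(), x.grad.abs().max())
```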
Semantic Image Synthesis via Adversarial Learning
Title | Semantic Image Synthesis via Adversarial Learning |
Authors | Hao Dong, Simiao Yu, Chao Wu, Yike Guo |
Abstract | In this paper, we propose a way of synthesizing realistic images directly from a natural language description, which has many useful applications, e.g. intelligent image manipulation. We attempt to accomplish such synthesis: given a source image and a target text description, our model synthesizes images to meet two requirements: 1) being realistic while matching the target text description; 2) maintaining other image features that are irrelevant to the text description. The model should be able to disentangle the semantic information from the two modalities (image and text), and generate new images from the combined semantics. To achieve this, we propose an end-to-end neural architecture that leverages adversarial learning to automatically learn implicit loss functions, which are optimized to fulfill the aforementioned two requirements. We have evaluated our model by conducting experiments on the Caltech-200 bird dataset and the Oxford-102 flower dataset, and have demonstrated that our model is capable of synthesizing realistic images that match the given descriptions, while still maintaining other features of the original images. |
Tasks | Image Generation |
Published | 2017-07-21 |
URL | http://arxiv.org/abs/1707.06873v1 |
http://arxiv.org/pdf/1707.06873v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-image-synthesis-via-adversarial |
Repo | https://github.com/vtddggg/BilinearGAN_for_LBIE |
Framework | pytorch |
MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments
Title | MINOS: Multimodal Indoor Simulator for Navigation in Complex Environments |
Authors | Manolis Savva, Angel X. Chang, Alexey Dosovitskiy, Thomas Funkhouser, Vladlen Koltun |
Abstract | We present MINOS, a simulator designed to support the development of multisensory models for goal-directed navigation in complex indoor environments. The simulator leverages large datasets of complex 3D environments and supports flexible configuration of multimodal sensor suites. We use MINOS to benchmark deep-learning-based navigation methods, to analyze the influence of environmental complexity on navigation performance, and to carry out a controlled study of multimodality in sensorimotor learning. The experiments show that current deep reinforcement learning approaches fail in large realistic environments. The experiments also indicate that multimodality is beneficial in learning to navigate cluttered scenes. MINOS is released open-source to the research community at http://minosworld.org . A video that shows MINOS can be found at https://youtu.be/c0mL9K64q84 |
Tasks | |
Published | 2017-12-11 |
URL | http://arxiv.org/abs/1712.03931v1 |
http://arxiv.org/pdf/1712.03931v1.pdf | |
PWC | https://paperswithcode.com/paper/minos-multimodal-indoor-simulator-for |
Repo | https://github.com/minosworld/minos |
Framework | none |
Multimodal Visual Concept Learning with Weakly Supervised Techniques
Title | Multimodal Visual Concept Learning with Weakly Supervised Techniques |
Authors | Giorgos Bouritsas, Petros Koutras, Athanasia Zlatintsi, Petros Maragos |
Abstract | Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description’s semantics with Probabilistic Labels, both formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, that consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines. |
Tasks | Multiple Instance Learning, Semantic Similarity, Semantic Textual Similarity, Temporal Action Localization |
Published | 2017-12-03 |
URL | http://arxiv.org/abs/1712.00796v3 |
http://arxiv.org/pdf/1712.00796v3.pdf | |
PWC | https://paperswithcode.com/paper/multimodal-visual-concept-learning-with |
Repo | https://github.com/gbouritsas/cvpr18_multimodal_weakly_supervised_learning |
Framework | none |
GP-GAN: Towards Realistic High-Resolution Image Blending
Title | GP-GAN: Towards Realistic High-Resolution Image Blending |
Authors | Huikai Wu, Shuai Zheng, Junge Zhang, Kaiqi Huang |
Abstract | It is common but challenging to address high-resolution image blending in automatic photo editing applications. In this paper, we focus on solving the problem of high-resolution image blending, where the composite images are provided. We propose a framework called Gaussian-Poisson Generative Adversarial Network (GP-GAN) to leverage the strengths of the classical gradient-based approach and Generative Adversarial Networks. To the best of our knowledge, this is the first work that explores the capability of GANs in the high-resolution image blending task. Concretely, we propose the Gaussian-Poisson Equation to formulate the high-resolution image blending problem, which is a joint optimization constrained by the gradient and color information. Inspired by prior works, we obtain gradient information via applying gradient filters. To generate the color information, we propose a Blending GAN to learn the mapping between the composite images and the well-blended ones. Compared to the alternative methods, our approach can deliver high-resolution, realistic images with less bleeding and fewer unpleasant artifacts. Experiments confirm that our approach achieves state-of-the-art performance on the Transient Attributes dataset. A user study on Amazon Mechanical Turk finds that the majority of workers are in favor of the proposed method. |
Tasks | Conditional Image Generation, Image Generation |
Published | 2017-03-21 |
URL | https://arxiv.org/abs/1703.07195v3 |
https://arxiv.org/pdf/1703.07195v3.pdf | |
PWC | https://paperswithcode.com/paper/gp-gan-towards-realistic-high-resolution |
Repo | https://github.com/msinghal34/Image-Blending-using-GP-GANs |
Framework | pytorch |
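The Gaussian-Poisson idea can be sketched as a screened Poisson problem: match the gradients of a mask-combined source/destination field while keeping low-frequency color close to a well-blended guide (which in the paper comes from the Blending GAN; here it is just a blurred copy-paste composite). The Jacobi solver, box blur, toy images, and weighting are assumptions of this sketch, not the paper's closed-form pyramid solver.

```python
# Hedged sketch of Gaussian-Poisson style blending: gradients from src inside the mask and
# dst outside, low-frequency color from a blurred guide, solved with a few Jacobi sweeps.
import numpy as np

def blur(img, k=9):
    # crude box blur: stand-in for the low-resolution color guide
    out = img.copy()
    for axis in (0, 1):
        out = np.stack([np.roll(out, s, axis=axis) for s in range(-(k // 2), k // 2 + 1)]).mean(0)
    return out

def gp_blend(dst, src, mask, lam=0.05, iters=500):
    def grads(img):
        return np.roll(img, -1, 1) - img, np.roll(img, -1, 0) - img

    composite = mask * src + (1 - mask) * dst              # copy-paste composite
    guide = blur(composite)                                # stand-in for the Blending GAN output
    sgx, sgy = grads(src)
    dgx, dgy = grads(dst)
    gx = mask * sgx + (1 - mask) * dgx                     # desired gradients: src inside, dst outside
    gy = mask * sgy + (1 - mask) * dgy
    div = gx - np.roll(gx, 1, 1) + gy - np.roll(gy, 1, 0)  # divergence of the desired gradient field
    h = composite.copy()
    for _ in range(iters):                                 # Jacobi sweeps for (lam*I - Lap) h = lam*guide - div
        neigh = np.roll(h, 1, 0) + np.roll(h, -1, 0) + np.roll(h, 1, 1) + np.roll(h, -1, 1)
        h = (lam * guide - div + neigh) / (lam + 4.0)
    return np.clip(h, 0, 1)

rng = np.random.default_rng(0)
dst = rng.uniform(size=(64, 64))                           # toy grayscale destination image
src = rng.uniform(size=(64, 64))                           # toy grayscale source image
mask = np.zeros((64, 64)); mask[16:48, 16:48] = 1.0        # paste region
print(gp_blend(dst, src, mask).shape)
```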
Improved Adversarial Systems for 3D Object Generation and Reconstruction
Title | Improved Adversarial Systems for 3D Object Generation and Reconstruction |
Authors | Edward Smith, David Meger |
Abstract | This paper describes a new approach for training generative adversarial networks (GANs) to understand the detailed 3D shape of objects. While GANs have been used in this domain previously, they are notoriously hard to train, especially for the complex joint data distribution over 3D objects of many categories and orientations. Our method extends previous work by employing the Wasserstein distance normalized with gradient penalization as a training objective. This enables improved generation from the joint object shape distribution. Our system can also reconstruct 3D shape from 2D images and perform shape completion from occluded 2.5D range scans. We achieve notable quantitative improvements in comparison to existing baselines. |
Tasks | |
Published | 2017-07-29 |
URL | http://arxiv.org/abs/1707.09557v3 |
http://arxiv.org/pdf/1707.09557v3.pdf | |
PWC | https://paperswithcode.com/paper/improved-adversarial-systems-for-3d-object |
Repo | https://github.com/kingcheng2000/3D-IWGAN |
Framework | none |
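The training objective the abstract refers to, a Wasserstein critic with gradient penalty, is sketched below on toy voxel grids. The critic architecture, penalty weight of 10, and random "data" are placeholders rather than the paper's 3D-IWGAN setup.

```python
# Hedged sketch: Wasserstein critic loss with gradient penalty on toy 32^3 voxel grids.
# Network and data are placeholders, not the paper's architecture.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Flatten(), nn.Linear(32 ** 3, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

def gradient_penalty(critic, real, fake):
    eps = torch.rand(real.size(0), 1, 1, 1, 1)              # per-sample interpolation weight
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads = torch.autograd.grad(scores.sum(), x_hat, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

real = (torch.rand(8, 1, 32, 32, 32) > 0.5).float()         # toy "real" voxel shapes
fake = torch.rand(8, 1, 32, 32, 32)                         # stand-in generator output
critic_loss = critic(fake).mean() - critic(real).mean() + 10.0 * gradient_penalty(critic, real, fake)
critic_loss.backward()
```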