July 29, 2019

3030 words 15 mins read

Paper Group AWR 169

Neural Models for Sequence Chunking. WESPE: Weakly Supervised Photo Enhancer for Digital Cameras. StreetStyle: Exploring world-wide clothing styles from millions of photos. Geometric Dimensionality Reduction for Subsequent Classification. jsCoq: Towards Hybrid Theorem Proving Interfaces. Deep Shape Matching. A Unified Approach of Multi-scale Deep a …

Neural Models for Sequence Chunking

Title Neural Models for Sequence Chunking
Authors Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou
Abstract Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most current deep neural network (DNN) based methods consider these tasks as a sequence labeling problem, in which a word, rather than a chunk, is treated as the basic unit for labeling. The chunks are then inferred from the standard IOB (Inside-Outside-Beginning) labels. In this paper, we take an alternative approach, investigating the use of DNNs for sequence chunking directly, and propose three neural models in which each chunk is treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models achieve state-of-the-art performance on both the text chunking and slot filling tasks.
Tasks Chunking, Slot Filling
Published 2017-01-15
URL http://arxiv.org/abs/1701.04027v1
PDF http://arxiv.org/pdf/1701.04027v1.pdf
PWC https://paperswithcode.com/paper/neural-models-for-sequence-chunking
Repo https://github.com/threelittlemonkeys/pointer-network-pytorch
Framework pytorch
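
The baseline these models improve on treats chunking as word-level IOB tagging. As a minimal sketch (not the paper's code), here is how word-level IOB labels are decoded back into labeled chunks — the post-processing step the chunk-level models avoid:

```python
def iob_to_chunks(tags):
    """Decode word-level IOB tags (e.g. B-NP, I-NP, O) into
    (label, start, end) chunks, end exclusive. A minimal sketch of the
    word-level labeling scheme that the paper's chunk-level models replace."""
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags):
        # Close the open chunk on B-, O, or an I- with a mismatched label.
        if tag.startswith("B-") or tag == "O" or (
                tag.startswith("I-") and label != tag[2:]):
            if start is not None:
                chunks.append((label, start, i))
                start, label = None, None
        # Open a new chunk on B- (or a stray I- with nothing open).
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    if start is not None:
        chunks.append((label, start, len(tags)))
    return chunks

print(iob_to_chunks(["B-NP", "I-NP", "B-VP", "O", "B-NP"]))
# [('NP', 0, 2), ('VP', 2, 3), ('NP', 4, 5)]
```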

WESPE: Weakly Supervised Photo Enhancer for Digital Cameras

Title WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Authors Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool
Abstract Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can generally be crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved within a couple of hours. In this work, we emphasize extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data, and use this network to obtain reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets, as well as pictures from several generations of smartphones, demonstrate that WESPE produces qualitative results comparable to or better than those of state-of-the-art strongly supervised methods.
Tasks
Published 2017-09-04
URL http://arxiv.org/abs/1709.01118v2
PDF http://arxiv.org/pdf/1709.01118v2.pdf
PWC https://paperswithcode.com/paper/wespe-weakly-supervised-photo-enhancer-for
Repo https://github.com/kirkutirev/photo_enhancer
Framework pytorch
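
A hedged sketch of the weak-supervision idea: the generator only needs unpaired batches from the source camera and from a pool of high-quality images, with an adversarial loss on the target domain plus a content loss tying the output to its input. The architectures, the content-loss weight, and the single discriminator below are illustrative simplifications (the paper uses separate color and texture discriminators), not the paper's exact design:

```python
import torch
import torch.nn as nn

# Unpaired source photos x and high-quality targets y: the generator is
# judged by a discriminator on the target domain (no aligned pairs), while
# a content term keeps the enhanced image close to its own input.
G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                  nn.Flatten(), nn.LazyLinear(1))
bce = nn.BCEWithLogitsLoss()

x = torch.rand(4, 3, 64, 64)   # source-camera batch (unpaired)
y = torch.rand(4, 3, 64, 64)   # crawled high-quality batch (unpaired)

fake = G(x)
# Generator: fool D on the target domain + preserve the input content.
g_loss = bce(D(fake), torch.ones(4, 1)) + 0.1 * (fake - x).abs().mean()
# Discriminator: real high-quality images vs. enhanced ones.
d_loss = bce(D(y), torch.ones(4, 1)) + bce(D(fake.detach()), torch.zeros(4, 1))
```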

StreetStyle: Exploring world-wide clothing styles from millions of photos

Title StreetStyle: Exploring world-wide clothing styles from millions of photos
Authors Kevin Matzen, Kavita Bala, Noah Snavely
Abstract Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends.
Tasks
Published 2017-06-06
URL http://arxiv.org/abs/1706.01869v1
PDF http://arxiv.org/pdf/1706.01869v1.pdf
PWC https://paperswithcode.com/paper/streetstyle-exploring-world-wide-clothing
Repo https://github.com/vihardesu/clothing-choice
Framework none
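
To make the "style clusters" step concrete, here is a hedged sketch using k-means over per-person embeddings. The random vectors and city labels are synthetic stand-ins for the attribute-CNN features the paper actually clusters:

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster per-person appearance embeddings, then look at how cluster
# frequency varies by city - a toy version of the paper's per-city
# fashion analysis. Embeddings and cities here are synthetic.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10_000, 128))           # one row per detected person
cities = rng.choice(["paris", "tokyo", "nyc"], size=10_000)

clusters = KMeans(n_clusters=20, n_init=10, random_state=0).fit_predict(emb)

# Per-city distribution over style clusters (each row sums to 1).
for city in ["paris", "tokyo", "nyc"]:
    hist = np.bincount(clusters[cities == city], minlength=20)
    print(city, np.round(hist / hist.sum(), 3))
```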

Geometric Dimensionality Reduction for Subsequent Classification

Title Geometric Dimensionality Reduction for Subsequent Classification
Authors Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Randal Burns, Mauro Maggioni
Abstract Classifying samples into categories becomes intractable when a single sample can have millions to billions of features, such as in genetics or imaging data. Principal Components Analysis (PCA) is widely used to identify a low-dimensional representation of such features for further analysis. However, PCA, as well as most manifold learning techniques, operates on the means and variances of the data, ignoring class labels, such as whether or not a subject has cancer, thereby discarding information that could substantially improve downstream classification performance. We describe an approach, Linear Optimal Low-rank projection (LOL), which extends PCA by operating on the means and variances of each class of data, rather than pooling all classes together. We prove, and substantiate with synthetic and real data experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost. The simplicity of LOL enables its flexibility, leading to the development of several variants that improve its accuracy, robustness, and computational efficiency. Using a novel dataset of magnetic resonance imaging scans consisting of 500 million features and 400 gigabytes of data, we demonstrate that LOL achieves better accuracy than other methods for any dimensionality, while only requiring a few minutes on a standard desktop computer.
Tasks Dimensionality Reduction
Published 2017-09-05
URL http://arxiv.org/abs/1709.01233v6
PDF http://arxiv.org/pdf/1709.01233v6.pdf
PWC https://paperswithcode.com/paper/geometric-dimensionality-reduction-for
Repo https://github.com/neurodata/LOL
Framework none
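
The core construction is simple enough to sketch in a few lines of numpy. This is a hedged, two-class reading of LOL (the direction between class means plus top principal directions of the class-centered data), not the reference implementation from the linked repo:

```python
import numpy as np

def lol_project(X, y, d):
    """Minimal two-class sketch of Linear Optimal Low-rank projection:
    keep the means-difference direction, then top principal directions of
    the class-centered data. Simplified from the paper for illustration."""
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    delta = (mu1 - mu0)[:, None]                  # means difference
    Xc = X.copy()
    Xc[y == 0] -= mu0
    Xc[y == 1] -= mu1                             # center each class
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    A = np.hstack([delta, Vt[:d - 1].T])          # p x d projection
    Q, _ = np.linalg.qr(A)                        # orthonormalize
    return X @ Q

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 2, size=200)
X[y == 1] += 1.0                                  # shift class 1
print(lol_project(X, y, d=5).shape)               # (200, 5)
```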

jsCoq: Towards Hybrid Theorem Proving Interfaces

Title jsCoq: Towards Hybrid Theorem Proving Interfaces
Authors Emilio Jesús Gallego Arias, Benoît Pin, Pierre Jouvelot
Abstract We describe jsCoq, a new platform and user environment for the Coq interactive proof assistant. The jsCoq system targets the HTML5-ECMAScript 2015 specification, and it typically runs inside a standards-compliant browser, without the need for external servers or services. Targeting educational use, jsCoq allows the user to start interacting with proof scripts right away, thanks to its self-contained nature. Indeed, a full Coq environment is packed along with the proof scripts, easing distribution and installation. Starting to use jsCoq is as easy as clicking on a link. The current release ships more than 10 popular Coq libraries, and supports popular books such as Software Foundations and Certified Programming with Dependent Types. The new target platform has opened up new interaction and display possibilities. It has also fostered the development of some new Coq-related technology. In particular, we have implemented a new serialization-based protocol for interaction with the proof assistant, as well as a new package format for library distribution.
Tasks Automated Theorem Proving
Published 2017-01-25
URL http://arxiv.org/abs/1701.07125v1
PDF http://arxiv.org/pdf/1701.07125v1.pdf
PWC https://paperswithcode.com/paper/jscoq-towards-hybrid-theorem-proving
Repo https://github.com/ejgallego/jscoq
Framework none

Deep Shape Matching

Title Deep Shape Matching
Authors Filip Radenović, Giorgos Tolias, Ondřej Chum
Abstract We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results in multiple benchmarks.
Tasks Domain Generalization, Image Retrieval, Metric Learning, Sketch-Based Image Retrieval
Published 2017-09-11
URL http://arxiv.org/abs/1709.03409v2
PDF http://arxiv.org/pdf/1709.03409v2.pdf
PWC https://paperswithcode.com/paper/deep-shape-matching
Repo https://github.com/filipradenovic/cnnimageretrieval
Framework pytorch
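
A hedged sketch of the two-stage pipeline: images are first reduced to edge maps by a classical operator, then a CNN embeds the edge maps and matching is cosine similarity in embedding space. The Sobel filter and the tiny embedding net are illustrative stand-ins for the detectors and network used in the paper:

```python
import torch
import torch.nn.functional as F

def edge_map(img):                       # img: (B, 1, H, W) grayscale
    """Sobel gradient magnitude as a stand-in edge extractor."""
    kx = torch.tensor([[[[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]]])
    gx = F.conv2d(img, kx, padding=1)
    gy = F.conv2d(img, kx.transpose(2, 3), padding=1)
    return (gx ** 2 + gy ** 2).sqrt()

# Placeholder embedding network over edge maps.
embed = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, 3, stride=2), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())

a, b = torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)
fa, fb = embed(edge_map(a)), embed(edge_map(b))
score = F.cosine_similarity(fa, fb)      # shape-matching score in [-1, 1]
print(score.item())
```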

A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation

Title A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation
Authors Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon
Abstract In this paper, we introduce robust and synergetic hand-crafted features and a simple but efficient deep feature from a convolutional neural network (CNN) architecture for defocus estimation. This paper systematically analyzes the effectiveness of different features, and shows how each feature can compensate for the weaknesses of other features when they are concatenated. For a full defocus map estimation, we extract image patches on strong edges sparsely, after which we use them for deep and hand-crafted feature extraction. In order to reduce the degree of patch-scale dependency, we also propose a multi-scale patch extraction strategy. A sparse defocus map is generated using a neural network classifier followed by a probability-joint bilateral filter. The final defocus map is obtained from the sparse defocus map with guidance from an edge-preserving filtered input image. Experimental results show that our algorithm is superior to state-of-the-art algorithms in terms of defocus estimation. Our work can be used for applications such as segmentation, blur magnification, all-in-focus image generation, and 3-D estimation.
Tasks Defocus Estimation, Image Generation
Published 2017-04-28
URL http://arxiv.org/abs/1704.08992v1
PDF http://arxiv.org/pdf/1704.08992v1.pdf
PWC https://paperswithcode.com/paper/a-unified-approach-of-multi-scale-deep-and
Repo https://github.com/zzangjinsun/DHDE_CVPR17
Framework none
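
The multi-scale patch extraction step can be sketched directly; gradient-magnitude thresholding below stands in for the paper's strong-edge detection, and the patch sizes are illustrative:

```python
import numpy as np

def multiscale_patches(img, sizes=(15, 23, 31), thresh=0.3, stride=8):
    """Pick sparse strong-edge locations, then cut patches of several
    sizes around each one, so features are less tied to one patch scale."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                         # edge-strength proxy
    half = max(sizes) // 2
    patches = []
    for r in range(half, img.shape[0] - half, stride):
        for c in range(half, img.shape[1] - half, stride):
            if mag[r, c] > thresh:                 # strong edges only
                patches.append([img[r - s // 2: r + s // 2 + 1,
                                    c - s // 2: c + s // 2 + 1]
                                for s in sizes])
    return patches

img = np.random.rand(128, 128)
print(len(multiscale_patches(img)), "edge locations, 3 scales each")
```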

Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks

Title Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks
Authors Zackory Erickson, Sonia Chernova, Charles C. Kemp
Abstract Material recognition enables robots to incorporate knowledge of material properties into their interactions with everyday objects. For example, material recognition opens up opportunities for clearer communication with a robot, such as “bring me the metal coffee mug”, and recognizing plastic versus metal is crucial when using a microwave or oven. However, collecting labeled training data with a robot is often more difficult than collecting unlabeled data. We present a semi-supervised learning approach for material recognition that uses generative adversarial networks (GANs) with haptic features such as force, temperature, and vibration. Our approach achieves state-of-the-art results and enables a robot to estimate the material class of household objects with ~90% accuracy when 92% of the training data are unlabeled. We explore how well this approach can recognize the material of new objects and discuss the challenges facing generalization. To motivate learning from unlabeled training data, we also compare results against several common supervised learning classifiers. In addition, we have released the dataset used for this work, which consists of time-series haptic measurements from a robot that conducted thousands of interactions with 72 household objects.
Tasks Material Recognition, Time Series
Published 2017-07-10
URL http://arxiv.org/abs/1707.02796v2
PDF http://arxiv.org/pdf/1707.02796v2.pdf
PWC https://paperswithcode.com/paper/semi-supervised-haptic-material-recognition
Repo https://github.com/healthcare-robotics/mr-gan
Framework none
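
A hedged sketch of the semi-supervised GAN objective, following the common K-real-classes-plus-one-fake-class formulation: labeled samples train the K material-class outputs, unlabeled samples only need to look real, and generated samples must look fake. The feature size and networks are placeholders, not the released model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Random vectors stand in for force/temperature/vibration haptic features.
K, feat = 6, 32                                   # e.g. 6 material classes
D = nn.Sequential(nn.Linear(feat, 64), nn.ReLU(), nn.Linear(64, K + 1))
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, feat))

x_lab, y_lab = torch.randn(8, feat), torch.randint(0, K, (8,))
x_unl = torch.randn(8, feat)                      # the cheap, unlabeled part
x_gen = G(torch.randn(8, 16)).detach()

loss_lab = F.cross_entropy(D(x_lab), y_lab)       # supervised class loss
log_p = F.log_softmax(D(x_unl), dim=1)
loss_unl = -torch.logsumexp(log_p[:, :K], 1).mean()        # "real, any class"
loss_gen = F.cross_entropy(D(x_gen), torch.full((8,), K))  # the fake class
d_loss = loss_lab + loss_unl + loss_gen
```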

Faster ICA under orthogonal constraint

Title Faster ICA under orthogonal constraint
Authors Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
Abstract Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of non-Gaussian independent sources. The problem can be seen as a likelihood maximization problem. We introduce Picard-O, a preconditioned L-BFGS strategy over the set of orthogonal matrices, which can quickly separate both super- and sub-Gaussian signals. It returns the same set of sources as the widely used FastICA algorithm. Through numerical experiments, we show that our method is faster and more robust than FastICA on real data.
Tasks
Published 2017-11-29
URL http://arxiv.org/abs/1711.10873v1
PDF http://arxiv.org/pdf/1711.10873v1.pdf
PWC https://paperswithcode.com/paper/faster-ica-under-orthogonal-constraint
Repo https://github.com/pierreablin/picard
Framework none
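
The authors ship this as the picard package, so usage is a few lines. The argument names below follow the package documentation at the time of writing; treat them as assumptions if the API has since moved:

```python
import numpy as np
from picard import picard  # pip install python-picard (the linked repo)

# Mix two non-Gaussian sources, then unmix with Picard-O (the orthogonal
# variant). X is (n_features, n_samples), as the package expects.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 5000))          # super-Gaussian sources
A = rng.normal(size=(2, 2))              # mixing matrix
X = A @ S                                # observed mixtures

K, W, Y = picard(X, ortho=True, random_state=0)
print(Y.shape)                           # recovered sources, (2, 5000)
```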

Graph Attention Networks

Title Graph Attention Networks
Authors Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
Abstract We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).
Tasks Document Classification, Graph Embedding, Graph Regression, Link Prediction, Node Classification, Skeleton Based Action Recognition
Published 2017-10-30
URL http://arxiv.org/abs/1710.10903v3
PDF http://arxiv.org/pdf/1710.10903v3.pdf
PWC https://paperswithcode.com/paper/graph-attention-networks
Repo https://github.com/YunseobShin/wiki_GAT
Framework tf
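
The attention mechanism reduces to a few lines. Below is a single-head, dense-adjacency sketch of one GAT layer; a faithful version adds multi-head attention, dropout, and sparse operations:

```python
import torch
import torch.nn.functional as F

def gat_layer(H, adj, W, a):
    """One GAT head: score each edge with a shared attention vector over
    the two endpoint features, mask non-edges, softmax per neighborhood,
    then aggregate. Dense adjacency for clarity."""
    Z = H @ W                                     # (N, F') projected feats
    # e_ij = LeakyReLU(a^T [z_i || z_j]), decomposed into two dot products.
    e = F.leaky_relu(
        (Z @ a[:Z.size(1)]).unsqueeze(1) + (Z @ a[Z.size(1):]).unsqueeze(0),
        negative_slope=0.2)
    e = e.masked_fill(adj == 0, float("-inf"))    # attend to neighbors only
    alpha = torch.softmax(e, dim=1)               # per-neighborhood weights
    return alpha @ Z

N, Fin, Fout = 5, 8, 4
H = torch.randn(N, Fin)
adj = torch.eye(N) + torch.bernoulli(torch.full((N, N), 0.3))
adj = (adj > 0).float()                           # self-loops + random edges
out = gat_layer(H, adj, torch.randn(Fin, Fout), torch.randn(2 * Fout))
print(out.shape)                                  # torch.Size([5, 4])
```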

Monotonic Chunkwise Attention

Title Monotonic Chunkwise Attention
Authors Chung-Cheng Chiu, Colin Raffel
Abstract Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to real-time sequence transduction. To address these issues, we propose Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed. We show that models utilizing MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time. When applied to online speech recognition, we obtain state-of-the-art results and match the performance of a model using an offline soft attention mechanism. In document summarization experiments where we do not expect monotonic alignments, we show significantly improved performance compared to a baseline monotonic attention-based model.
Tasks Document Summarization, Speech Recognition
Published 2017-12-14
URL http://arxiv.org/abs/1712.05382v2
PDF http://arxiv.org/pdf/1712.05382v2.pdf
PWC https://paperswithcode.com/paper/monotonic-chunkwise-attention
Repo https://github.com/craffel/mocha
Framework tf
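
A hedged sketch of the test-time decoding rule: the attention head scans monotonically over encoder states, stops at the first position whose selection probability passes 0.5, and soft attention is computed only over a small chunk ending there. The dot-product scorer is a stand-in for the learned energy networks, and training uses the paper's differentiable expectation rather than this hard scan:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def mocha_step(enc, query, start, chunk=4):
    """One decoding step: scan forward from `start`, stop at the first
    position whose monotonic "select" probability passes 0.5, then
    soft-attend over the `chunk` positions ending there."""
    t = start
    while t < len(enc) - 1 and sigmoid(enc[t] @ query) < 0.5:
        t += 1                                    # hard, monotonic movement
    lo = max(0, t - chunk + 1)
    s = enc[lo:t + 1] @ query
    w = np.exp(s - s.max()); w /= w.sum()         # soft attention in chunk
    return (w[:, None] * enc[lo:t + 1]).sum(0), t

rng = np.random.default_rng(0)
enc = rng.normal(size=(50, 16))                   # encoder states
ctx, pos = mocha_step(enc, rng.normal(size=16), start=0)
ctx, pos = mocha_step(enc, rng.normal(size=16), start=pos)  # never moves back
print(pos)
```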

ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation

Title ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
Authors Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, Jun Gao
Abstract A user can be represented by what he or she does along the history. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data itself due to the limits of human intuition. Recent works usually use RNN-based methods to produce an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this preserves only very limited information, or aggregated memories of a person. When a downstream application needs to use the modeled user features, it may lose the integrity of the specific, highly correlated behaviors of the user and introduce noise derived from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Our model handles heterogeneous user behaviors by projecting all behavior types into multiple latent semantic spaces, where the behaviors can influence one another via self-attention. Downstream applications can then consume the user behavior vectors via vanilla attention. Experiments show that ATRank achieves better performance and a faster training process. We further extend ATRank so that one unified model predicts different types of user behaviors at the same time, showing performance comparable to highly optimized individual models.
Tasks
Published 2017-11-17
URL http://arxiv.org/abs/1711.06632v2
PDF http://arxiv.org/pdf/1711.06632v2.pdf
PWC https://paperswithcode.com/paper/atrank-an-attention-based-user-behavior
Repo https://github.com/johnlevi/recsys
Framework none
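
A hedged sketch of the flow described above: heterogeneous behaviors are projected into a shared latent space, mixed by self-attention, and read out by a downstream task via vanilla attention. The projections and dimensions are illustrative placeholders:

```python
import torch

# Two behavior types (e.g. clicks and buys) with different raw features.
d = 32
proj = {"click": torch.nn.Linear(16, d), "buy": torch.nn.Linear(8, d)}

clicks, buys = torch.randn(10, 16), torch.randn(3, 8)
B = torch.cat([proj["click"](clicks), proj["buy"](buys)])   # (13, d)

attn = torch.softmax(B @ B.t() / d ** 0.5, dim=1)           # self-attention
B = attn @ B                              # behaviors influence one another

q = torch.randn(d)                        # downstream task's query vector
w = torch.softmax(B @ q / d ** 0.5, dim=0)                  # vanilla attention
user_vec = w @ B                          # (d,) user vector for ranking
print(user_vec.shape)
```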

Porcupine Neural Networks: (Almost) All Local Optima are Global

Title Porcupine Neural Networks: (Almost) All Local Optima are Global
Authors Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse
Abstract Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex, which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights so that the optimization landscape has good theoretical properties while, at the same time, remaining a good approximation of the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs), whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimization are global, and we characterize the regions where bad local optima may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN.
Tasks
Published 2017-10-05
URL http://arxiv.org/abs/1710.02196v1
PDF http://arxiv.org/pdf/1710.02196v1.pdf
PWC https://paperswithcode.com/paper/porcupine-neural-networks-almost-all-local
Repo https://github.com/jessemzhang/porcupine_neural_networks
Framework tf
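
One way to read the PNN constraint is that each hidden unit's incoming weight vector must lie on a fixed line through the origin. A hedged sketch of enforcing this by projection after every gradient step; the directions, the unit output weights, and the regression task are all illustrative, and the paper analyzes the resulting landscape rather than prescribing this training loop:

```python
import torch

torch.manual_seed(0)
k, d = 4, 10
U = torch.nn.functional.normalize(torch.randn(k, d), dim=1)  # fixed lines
W = U.clone() * torch.randn(k, 1)                            # start on lines
x, y = torch.randn(64, d), torch.randn(64)

for _ in range(100):
    W.requires_grad_(True)
    # Two-layer ReLU net with unit output weights, for simplicity.
    loss = ((torch.relu(x @ W.t()).sum(1) - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        W = W - 0.01 * W.grad
        W = (W * U).sum(1, keepdim=True) * U     # project each row to its line
print(loss.item())
```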

Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce

Title Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce
Authors Devashish Shankar, Sujay Narumanchi, H A Ananya, Pramod Kompalli, Krishnendu Chaudhury
Abstract In this paper, we present a unified end-to-end approach to build a large scale Visual Search and Recommendation system for e-commerce. Previous works have targeted these problems in isolation. We believe a more effective and elegant solution could be obtained by tackling them together. We propose a unified Deep Convolutional Neural Network architecture, called VisNet, to learn embeddings to capture the notion of visual similarity, across several semantic granularities. We demonstrate the superiority of our approach for the task of image retrieval, by comparing against the state-of-the-art on the Exact Street2Shop dataset. We then share the design decisions and trade-offs made while deploying the model to power Visual Recommendations across a catalog of 50M products, supporting 2K queries a second at Flipkart, India’s largest e-commerce company. The deployment of our solution has yielded a significant business impact, as measured by the conversion-rate.
Tasks Image Retrieval
Published 2017-03-07
URL http://arxiv.org/abs/1703.02344v1
PDF http://arxiv.org/pdf/1703.02344v1.pdf
PWC https://paperswithcode.com/paper/deep-learning-based-large-scale-visual
Repo https://github.com/bombdiggity/paper-bag
Framework tf
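
The serving side of such a system is easy to sketch: embed the catalog once, normalize, and answer queries by cosine nearest neighbors. Random vectors below stand in for VisNet embeddings, and at the stated scale the brute-force scan would be replaced by an approximate-NN index:

```python
import numpy as np

rng = np.random.default_rng(0)
catalog = rng.normal(size=(100_000, 128)).astype(np.float32)
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)    # unit vectors

def search(query_emb, k=5):
    """Return the ids of the k catalog items most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    scores = catalog @ q                       # cosine similarity
    top = np.argpartition(-scores, k)[:k]      # k best, unordered
    return top[np.argsort(-scores[top])]       # ordered best-first

print(search(rng.normal(size=128)))
```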

Multi-Content GAN for Few-Shot Font Style Transfer

Title Multi-Content GAN for Few-Shot Font Style Transfer
Authors Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell
Abstract In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface. To generate a set of multi-content images following a consistent style from very few examples, we propose an end-to-end stacked conditional GAN model considering content along channels and style along network layers. Our proposed network transfers the style of given glyphs to the contents of unseen ones, capturing highly stylized fonts found in the real world, such as those on movie posters or infographics. We seek to transfer both the typographic stylization (e.g., serifs and ears) and the textual stylization (e.g., color gradients and effects). We base our experiments on a collected dataset of 10,000 fonts with different styles, and demonstrate effective generalization from a very small number of observed glyphs.
Tasks Font Style Transfer, Style Transfer
Published 2017-12-01
URL http://arxiv.org/abs/1712.00516v1
PDF http://arxiv.org/pdf/1712.00516v1.pdf
PWC https://paperswithcode.com/paper/multi-content-gan-for-few-shot-font-style
Repo https://github.com/Pengxiao-Wang/Typeface-and-Font-Style-Transfer
Framework none
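
A hedged sketch of the "content along channels" idea: the whole alphabet is stacked as 26 channels, observed glyphs are filled in and missing ones zeroed, and a conditional generator emits all 26 stylized glyphs at once, so unseen letters can borrow style from the observed ones. The tiny network is a placeholder for the paper's stacked conditional GAN:

```python
import torch
import torch.nn as nn

# Placeholder generator: 26 glyph channels in, 26 stylized glyphs out.
G = nn.Sequential(nn.Conv2d(26, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 26, 3, padding=1), nn.Tanh())

glyphs = torch.zeros(1, 26, 64, 64)          # A-Z stacked as channels
observed = [0, 4, 11]                        # e.g. we only saw A, E, L
glyphs[:, observed] = torch.rand(1, len(observed), 64, 64) * 2 - 1

full_font = G(glyphs)                        # all 26 glyphs, one style
print(full_font.shape)                       # torch.Size([1, 26, 64, 64])
```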