Paper Group AWR 169
Neural Models for Sequence Chunking. WESPE: Weakly Supervised Photo Enhancer for Digital Cameras. StreetStyle: Exploring world-wide clothing styles from millions of photos. Geometric Dimensionality Reduction for Subsequent Classification. jsCoq: Towards Hybrid Theorem Proving Interfaces. Deep Shape Matching. A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation. Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks. Faster ICA under orthogonal constraint. Graph Attention Networks. Monotonic Chunkwise Attention. ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation. Porcupine Neural Networks: (Almost) All Local Optima are Global. Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce. Multi-Content GAN for Few-Shot Font Style Transfer.
Neural Models for Sequence Chunking
Title | Neural Models for Sequence Chunking |
Authors | Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou |
Abstract | Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most of the current deep neural network (DNN) based methods consider these tasks as a sequence labeling problem, in which a word, rather than a chunk, is treated as the basic unit for labeling. These chunks are then inferred from the standard IOB (Inside-Outside-Beginning) labels. In this paper, we propose an alternative approach by investigating the use of DNNs for sequence chunking, and propose three neural models so that each chunk can be treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models can achieve state-of-the-art performance on both the text chunking and slot filling tasks. |
Tasks | Chunking, Slot Filling |
Published | 2017-01-15 |
URL | http://arxiv.org/abs/1701.04027v1 |
PDF | http://arxiv.org/pdf/1701.04027v1.pdf |
PWC | https://paperswithcode.com/paper/neural-models-for-sequence-chunking |
Repo | https://github.com/threelittlemonkeys/pointer-network-pytorch |
Framework | pytorch |
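The abstract contrasts chunk-level labeling with the standard word-level IOB scheme. As a point of reference, the sketch below (illustrative, not from the paper) shows how IOB tags are decoded back into labeled chunks - the indirection the proposed models avoid by treating each chunk as a single unit.

```python
# Illustrative sketch (not from the paper): decoding word-level IOB tags into labeled
# chunks, the post-processing step that chunk-level models make unnecessary.
def iob_to_chunks(words, tags):
    chunks, current, label = [], [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            if current:
                chunks.append((label, current))
            current, label = [word], tag[2:]
        elif tag.startswith("I-") and tag[2:] == label:
            current.append(word)
        else:                      # "O", or an I- tag whose label does not match
            if current:
                chunks.append((label, current))
            current, label = [], None
            if tag.startswith("I-"):
                current, label = [word], tag[2:]
    if current:
        chunks.append((label, current))
    return chunks

print(iob_to_chunks(["He", "reckons", "the", "current", "account"],
                    ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]))
# [('NP', ['He']), ('VP', ['reckons']), ('NP', ['the', 'current', 'account'])]
```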
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Title | WESPE: Weakly Supervised Photo Enhancer for Digital Cameras |
Authors | Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool |
Abstract | Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can generally be crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we place particular emphasis on an extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to get reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces results comparable to or better than those of state-of-the-art strongly supervised methods. |
Tasks | |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01118v2 |
PDF | http://arxiv.org/pdf/1709.01118v2.pdf |
PWC | https://paperswithcode.com/paper/wespe-weakly-supervised-photo-enhancer-for |
Repo | https://github.com/kirkutirev/photo_enhancer |
Framework | pytorch |
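A rough PyTorch sketch of the weak-supervision idea described above: the enhancer only ever sees unpaired data, so it is trained against a discriminator fed with arbitrary high-quality photos, plus a content term that keeps the output close to the input. The tiny networks, the L1 content term and the weighting are placeholders; WESPE itself uses separate color and texture discriminators, a backward generator for content consistency and a total-variation term.

```python
# Sketch only: adversarial loss against unpaired high-quality photos + content preservation.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(32, 1, 3, stride=2, padding=1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

source = torch.rand(4, 3, 64, 64)          # photos from the limited camera
dslr = torch.rand(4, 3, 64, 64)            # unpaired high-quality photos

# Discriminator step: real = unpaired DSLR photos, fake = enhanced source photos.
fake = G(source)
real_logits, fake_logits = D(dslr), D(fake.detach())
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator while keeping the output close to the input.
enhanced = G(source)
fake_logits = D(enhanced)
g_loss = bce(fake_logits, torch.ones_like(fake_logits)) + \
         10.0 * nn.functional.l1_loss(enhanced, source)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```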
StreetStyle: Exploring world-wide clothing styles from millions of photos
Title | StreetStyle: Exploring world-wide clothing styles from millions of photos |
Authors | Kevin Matzen, Kavita Bala, Noah Snavely |
Abstract | Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends. |
Tasks | |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01869v1 |
PDF | http://arxiv.org/pdf/1706.01869v1.pdf |
PWC | https://paperswithcode.com/paper/streetstyle-exploring-world-wide-clothing |
Repo | https://github.com/vihardesu/clothing-choice |
Framework | none |
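A hedged sketch of the generic recipe behind the style clusters mentioned in the abstract: embed person crops with a CNN, then cluster the embeddings. The pretrained ResNet-18 backbone and the number of clusters are stand-ins; the paper derives its features from the attribute classifiers it trains.

```python
# Sketch: embed crops with a generic pretrained CNN, then cluster into "style" groups.
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # use the 512-d pooled features as embeddings
backbone.eval()

crops = torch.rand(64, 3, 224, 224)        # stand-in for normalized person crops
with torch.no_grad():
    feats = backbone(crops).numpy()        # (64, 512)

clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(feats)
print(clusters[:10])                       # cluster id per photo; each id ~ one visual style
```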
Geometric Dimensionality Reduction for Subsequent Classification
Title | Geometric Dimensionality Reduction for Subsequent Classification |
Authors | Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Randal Burns, Mauro Maggioni |
Abstract | Classifying samples into categories becomes intractable when a single sample can have millions to billions of features, such as in genetics or imaging data. Principal Components Analysis (PCA) is widely used to identify a low-dimensional representation of such features for further analysis. However, PCA, as well as most manifold learning techniques, operates on the means and variances of the data, ignoring class labels, such as whether or not a subject has cancer, thereby discarding information that could substantially improve downstream classification performance. We describe an approach, Linear Optimal Low-rank projection (LOL), which extends PCA by operating on the means and variances of each class of data, rather than pooling all classes together. We prove, and substantiate with synthetic and real data experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost. The simplicity of LOL enables its flexibility, leading to the development of several variants that improve its accuracy, robustness, and computational efficiency. Using a novel dataset of magnetic resonance imaging scans consisting of 500 million features and 400 gigabytes of data, we demonstrate that LOL achieves better accuracy than other methods for any dimensionality, while only requiring a few minutes on a standard desktop computer. |
Tasks | Dimensionality Reduction |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01233v6 |
PDF | http://arxiv.org/pdf/1709.01233v6.pdf |
PWC | https://paperswithcode.com/paper/geometric-dimensionality-reduction-for |
Repo | https://github.com/neurodata/LOL |
Framework | none |
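A hedged numpy sketch of the idea in the abstract - build the projection from class means as well as within-class variation, rather than from pooled statistics as PCA does. This is not the reference implementation in the linked neurodata/LOL repo; the sketch simply stacks the normalized class-mean difference with the top principal directions of the class-centered data.

```python
# Sketch of an LOL-like projection for two classes (illustrative, not the released code).
import numpy as np

def lol_like_projection(X, y, d):
    """X: (n, p) data, y: binary labels, d: target dimension (>= 2)."""
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    delta = mu1 - mu0
    delta /= np.linalg.norm(delta)          # class-mean difference direction
    Xc = X.copy()
    Xc[y == 0] -= mu0                       # center each class separately
    Xc[y == 1] -= mu1
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # within-class principal directions
    return np.vstack([delta, Vt[: d - 1]])  # (d, p) projection matrix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(0.5, 1, (100, 50))])
y = np.repeat([0, 1], 100)
A = lol_like_projection(X, y, d=5)
Z = X @ A.T                                 # (200, 5) representation for a downstream classifier
print(Z.shape)
```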
jsCoq: Towards Hybrid Theorem Proving Interfaces
Title | jsCoq: Towards Hybrid Theorem Proving Interfaces |
Authors | Emilio Jesús Gallego Arias, Benoît Pin, Pierre Jouvelot |
Abstract | We describe jsCoq, a new platform and user environment for the Coq interactive proof assistant. The jsCoq system targets the HTML5-ECMAScript 2015 specification, and it is typically run inside a standards-compliant browser, without the need for external servers or services. Targeting educational use, jsCoq allows the user to start interacting with proof scripts right away, thanks to its self-contained nature. Indeed, a full Coq environment is packed along with the proof scripts, easing distribution and installation. Starting to use jsCoq is as easy as clicking on a link. The current release ships more than 10 popular Coq libraries, and supports popular books such as Software Foundations or Certified Programming with Dependent Types. The new target platform has opened up new interaction and display possibilities. It has also fostered the development of some new Coq-related technology. In particular, we have implemented a new serialization-based protocol for interaction with the proof assistant, as well as a new package format for library distribution. |
Tasks | Automated Theorem Proving |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07125v1 |
PDF | http://arxiv.org/pdf/1701.07125v1.pdf |
PWC | https://paperswithcode.com/paper/jscoq-towards-hybrid-theorem-proving |
Repo | https://github.com/ejgallego/jscoq |
Framework | none |
Deep Shape Matching
Title | Deep Shape Matching |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results in multiple benchmarks. |
Tasks | Domain Generalization, Image Retrieval, Metric Learning, Sketch-Based Image Retrieval |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03409v2 |
PDF | http://arxiv.org/pdf/1709.03409v2.pdf |
PWC | https://paperswithcode.com/paper/deep-shape-matching |
Repo | https://github.com/filipradenovic/cnnimageretrieval |
Framework | pytorch |
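A hedged PyTorch sketch of the two-stage recipe in the abstract: turn images into edge maps with an off-the-shelf operator, then train an embedding network on those edge maps with metric learning. The Sobel edge map, tiny network and contrastive loss with an arbitrary margin are illustrative stand-ins for the components used in the paper.

```python
# Sketch: edge maps + contrastive metric learning between matching/non-matching pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edges(gray):                      # gray: (B, 1, H, W) in [0, 1]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx, gy = F.conv2d(gray, kx, padding=1), F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2)

embed = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())

def contrastive_loss(za, zb, same, margin=0.7):
    d = F.pairwise_distance(F.normalize(za), F.normalize(zb))
    return (same * d ** 2 + (1 - same) * F.relu(margin - d) ** 2).mean()

a, b = torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64)
same = torch.randint(0, 2, (8,)).float()    # 1 = matching pair, 0 = non-matching
loss = contrastive_loss(embed(sobel_edges(a)), embed(sobel_edges(b)), same)
loss.backward()
print(float(loss))
```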
A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation
Title | A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation |
Authors | Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon |
Abstract | In this paper, we introduce robust and synergetic hand-crafted features and a simple but efficient deep feature from a convolutional neural network (CNN) architecture for defocus estimation. This paper systematically analyzes the effectiveness of different features, and shows how each feature can compensate for the weaknesses of other features when they are concatenated. For a full defocus map estimation, we extract image patches on strong edges sparsely, after which we use them for deep and hand-crafted feature extraction. In order to reduce the degree of patch-scale dependency, we also propose a multi-scale patch extraction strategy. A sparse defocus map is generated using a neural network classifier followed by a probability-joint bilateral filter. The final defocus map is obtained from the sparse defocus map with guidance from an edge-preserving filtered input image. Experimental results show that our algorithm is superior to state-of-the-art algorithms in terms of defocus estimation. Our work can be used for applications such as segmentation, blur magnification, all-in-focus image generation, and 3-D estimation. |
Tasks | Defocus Estimation, Image Generation |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08992v1 |
PDF | http://arxiv.org/pdf/1704.08992v1.pdf |
PWC | https://paperswithcode.com/paper/a-unified-approach-of-multi-scale-deep-and |
Repo | https://github.com/zzangjinsun/DHDE_CVPR17 |
Framework | none |
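A hedged numpy sketch of the patch-extraction step described in the abstract: locate strong edges and cut patches of several sizes around them, which is what reduces patch-scale dependency. The gradient-magnitude edge measure, threshold and scale set are placeholders.

```python
# Sketch: sparse multi-scale patch extraction at strong-edge locations.
import numpy as np

def multiscale_patches(gray, scales=(15, 23, 31), thresh=0.25, max_points=200):
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    r = max(scales) // 2
    ys, xs = np.where(mag > thresh * mag.max())            # strong-edge candidates
    keep = (ys > r) & (ys < gray.shape[0] - r) & (xs > r) & (xs < gray.shape[1] - r)
    ys, xs = ys[keep][:max_points], xs[keep][:max_points]  # sparse subset, borders excluded
    patches = []
    for y, x in zip(ys, xs):
        patches.append([gray[y - s // 2: y + s // 2 + 1,
                             x - s // 2: x + s // 2 + 1] for s in scales])
    return ys, xs, patches

img = np.random.rand(128, 128)
ys, xs, patches = multiscale_patches(img)
print(len(patches), [p.shape for p in patches[0]])          # e.g. 200 [(15, 15), (23, 23), (31, 31)]
```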
Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks
Title | Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks |
Authors | Zackory Erickson, Sonia Chernova, Charles C. Kemp |
Abstract | Material recognition enables robots to incorporate knowledge of material properties into their interactions with everyday objects. For example, material recognition opens up opportunities for clearer communication with a robot, such as “bring me the metal coffee mug”, and recognizing plastic versus metal is crucial when using a microwave or oven. However, collecting labeled training data with a robot is often more difficult than collecting unlabeled data. We present a semi-supervised learning approach for material recognition that uses generative adversarial networks (GANs) with haptic features such as force, temperature, and vibration. Our approach achieves state-of-the-art results and enables a robot to estimate the material class of household objects with ~90% accuracy when 92% of the training data are unlabeled. We explore how well this approach can recognize the material of new objects and we discuss challenges facing generalization. To motivate learning from unlabeled training data, we also compare our results against several common supervised learning classifiers. In addition, we have released the dataset used for this work, which consists of time-series haptic measurements from a robot that conducted thousands of interactions with 72 household objects. |
Tasks | Material Recognition, Time Series |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02796v2 |
PDF | http://arxiv.org/pdf/1707.02796v2.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-haptic-material-recognition |
Repo | https://github.com/healthcare-robotics/mr-gan |
Framework | none |
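A hedged PyTorch sketch of the usual semi-supervised GAN objective on feature vectors - K real material classes plus one extra class for generated samples - which is the general formulation this line of work builds on. The dimensions and networks are placeholders, not the released mr-gan model.

```python
# Sketch: (K+1)-class semi-supervised GAN losses on haptic-style feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, feat_dim, z_dim = 6, 32, 16               # material classes, feature size, noise size
D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, K + 1))
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))

x_lab = torch.randn(8, feat_dim)             # labeled haptic feature vectors
y_lab = torch.randint(0, K, (8,))
x_unl = torch.randn(8, feat_dim)             # unlabeled haptic feature vectors
x_gen = G(torch.randn(8, z_dim))             # generated (fake) feature vectors
fake_class = torch.full((8,), K, dtype=torch.long)

loss_lab = F.cross_entropy(D(x_lab), y_lab)                     # labeled: true material class
logits_unl = D(x_unl)
loss_unl = -(torch.logsumexp(logits_unl[:, :K], dim=1)          # unlabeled: "some real class"
             - torch.logsumexp(logits_unl, dim=1)).mean()
loss_gen = F.cross_entropy(D(x_gen.detach()), fake_class)       # generated: the extra fake class
d_loss = loss_lab + loss_unl + loss_gen

logits_g = D(x_gen)                                             # generator: look like a real class
g_loss = -(torch.logsumexp(logits_g[:, :K], dim=1) - torch.logsumexp(logits_g, dim=1)).mean()
print(float(d_loss), float(g_loss))
```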
Faster ICA under orthogonal constraint
Title | Faster ICA under orthogonal constraint |
Authors | Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort |
Abstract | Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of non-Gaussian independent sources. The problem can be seen as a likelihood maximization problem. We introduce Picard-O, a preconditioned L-BFGS strategy over the set of orthogonal matrices, which can quickly separate both super- and sub-Gaussian signals. It returns the same set of sources as the widely used FastICA algorithm. Through numerical experiments, we show that our method is faster and more robust than FastICA on real data. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10873v1 |
PDF | http://arxiv.org/pdf/1711.10873v1.pdf |
PWC | https://paperswithcode.com/paper/faster-ica-under-orthogonal-constraint |
Repo | https://github.com/pierreablin/picard |
Framework | none |
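A usage sketch against the released implementation linked above, assuming its top-level picard() entry point with ortho=True for the Picard-O variant and data arranged as channels × samples; check the repo for the exact signature. Synthetic super- and sub-Gaussian sources are mixed and then separated.

```python
# Sketch (assumed API of github.com/pierreablin/picard): separate mixed sources with Picard-O.
import numpy as np
from picard import picard   # pip install python-picard

rng = np.random.default_rng(0)
n, T = 4, 10000
S = np.vstack([rng.laplace(size=(2, T)),             # super-Gaussian sources
               rng.uniform(-1, 1, size=(2, T))])     # sub-Gaussian sources
A = rng.normal(size=(n, n))                          # unknown mixing matrix
X = A @ S                                            # observed mixtures (channels x samples)

K, W, Y = picard(X, ortho=True, random_state=0)      # Y ~ recovered independent sources
print(Y.shape)                                       # (4, 10000)
```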
Graph Attention Networks
Title | Graph Attention Networks |
Authors | Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio |
Abstract | We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training). |
Tasks | Document Classification, Graph Embedding, Graph Regression, Link Prediction, Node Classification, Skeleton Based Action Recognition |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10903v3 |
PDF | http://arxiv.org/pdf/1710.10903v3.pdf |
PWC | https://paperswithcode.com/paper/graph-attention-networks |
Repo | https://github.com/YunseobShin/wiki_GAT |
Framework | tf |
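A minimal single-head graph attention layer in PyTorch, following the masked additive self-attention the abstract describes (each node attends only over its neighborhood). Sizes and the toy three-node graph are illustrative.

```python
# Sketch: one single-head GAT layer with neighborhood masking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # additive attention vector

    def forward(self, h, adj):                 # h: (N, in_dim), adj: (N, N) 0/1 adjacency
        z = self.W(h)                          # (N, out_dim)
        N = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(N, N, -1),
                           z.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # (N, N) raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))         # mask out non-neighbors
        alpha = torch.softmax(e, dim=-1)                   # attention over each neighborhood
        return F.elu(alpha @ z)                            # aggregate neighbor features

adj = torch.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]])      # 3-node graph with self-loops
h = torch.rand(3, 5)
print(GATLayer(5, 8)(h, adj).shape)            # torch.Size([3, 8])
```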
Monotonic Chunkwise Attention
Title | Monotonic Chunkwise Attention |
Authors | Chung-Cheng Chiu, Colin Raffel |
Abstract | Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to real-time sequence transduction. To address these issues, we propose Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed. We show that models utilizing MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time. When applied to online speech recognition, we obtain state-of-the-art results and match the performance of a model using an offline soft attention mechanism. In document summarization experiments where we do not expect monotonic alignments, we show significantly improved performance compared to a baseline monotonic attention-based model. |
Tasks | Document Summarization, Speech Recognition |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05382v2 |
PDF | http://arxiv.org/pdf/1712.05382v2.pdf |
PWC | https://paperswithcode.com/paper/monotonic-chunkwise-attention |
Repo | https://github.com/craffel/mocha |
Framework | tf |
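A hedged numpy sketch of the test-time behavior described in the abstract: a hard monotonic head advances through the encoder states, and once it stops, soft attention is computed only over a small chunk ending at that position. The random energies stand in for learned ones, and training uses a differentiable expectation rather than this hard decoding.

```python
# Sketch: one decoding step of monotonic chunkwise attention at inference time.
import numpy as np

rng = np.random.default_rng(0)
T, d, w = 12, 4, 3                      # encoder length, state size, chunk size
enc = rng.normal(size=(T, d))

def mocha_step(enc, start, p_choose, chunk_energy, w):
    # Hard monotonic pass: move right from `start` until the selection probability
    # exceeds 0.5 (a common test-time discretization), then attend over the chunk.
    t = start
    while t < len(enc) - 1 and p_choose[t] < 0.5:
        t += 1
    lo = max(0, t - w + 1)
    weights = np.exp(chunk_energy[lo:t + 1] - chunk_energy[lo:t + 1].max())
    weights /= weights.sum()
    context = weights @ enc[lo:t + 1]   # context vector for this output step
    return context, t

p_choose = rng.uniform(size=T)          # stand-in monotonic selection probabilities
chunk_energy = rng.normal(size=T)       # stand-in chunk attention energies
ctx, t = mocha_step(enc, start=0, p_choose=p_choose, chunk_energy=chunk_energy, w=w)
print(t, ctx.shape)                     # stop position and (4,) context vector
```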
ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
Title | ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation |
Authors | Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, Jun Gao |
Abstract | A user can be represented by what he/she does over time. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data due to limited human insight. Recent works usually use RNN-based methods to produce an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this preserves only very limited information - an aggregated memory of a person. When a downstream application needs to use the modeled user features, it may lose the specific, highly correlated behaviors of the user and pick up noise from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are handled by projecting all types of behaviors into multiple latent semantic spaces, where they influence one another via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank achieves better performance and a faster training process. We further extend ATRank so that one unified model predicts different types of user behaviors at the same time, achieving performance comparable to highly optimized individual models. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06632v2 |
PDF | http://arxiv.org/pdf/1711.06632v2.pdf |
PWC | https://paperswithcode.com/paper/atrank-an-attention-based-user-behavior |
Repo | https://github.com/johnlevi/recsys |
Framework | none |
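A hedged numpy sketch of the core operation described above: project heterogeneous behavior embeddings into a shared latent space and let them influence one another via self-attention. ATRank itself uses multiple latent semantic spaces plus time and behavior-type encodings; the shapes and random projections here are illustrative.

```python
# Sketch: project heterogeneous behaviors into one space, then apply self-attention.
import numpy as np

rng = np.random.default_rng(0)
clicks = rng.normal(size=(5, 16))         # 5 click behaviors, 16-d embeddings
purchases = rng.normal(size=(3, 24))      # 3 purchase behaviors, 24-d embeddings
d = 32                                    # shared latent space

P_click, P_buy = rng.normal(size=(16, d)), rng.normal(size=(24, d))
H = np.vstack([clicks @ P_click, purchases @ P_buy])    # (8, d) unified behavior sequence

def self_attention(H):
    scores = H @ H.T / np.sqrt(H.shape[1])               # scaled dot-product scores
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ H                           # each behavior re-expressed via all the others

user_vectors = self_attention(H)
print(user_vectors.shape)                  # (8, 32): behavior vectors for downstream attention
```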
Porcupine Neural Networks: (Almost) All Local Optima are Global
Title | Porcupine Neural Networks: (Almost) All Local Optima are Global |
Authors | Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse |
Abstract | Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights to make its optimization landscape have good theoretical properties while at the same time, be a good approximation for the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global while we have a characterization of regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN. |
Tasks | |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.02196v1 |
PDF | http://arxiv.org/pdf/1710.02196v1.pdf |
PWC | https://paperswithcode.com/paper/porcupine-neural-networks-almost-all-local |
Repo | https://github.com/jessemzhang/porcupine_neural_networks |
Framework | tf |
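A hedged numpy sketch of the constraint that defines a PNN according to the abstract: each hidden unit's weight vector must lie on one of a finite set of lines. Here the constraint is imposed by projecting each weight vector onto its nearest allowed line; the paper's contribution is the analysis of such constrained landscapes, not this particular projection heuristic.

```python
# Sketch: project each hidden unit's weight vector onto the closest allowed line.
import numpy as np

rng = np.random.default_rng(0)
lines = rng.normal(size=(6, 10))
lines /= np.linalg.norm(lines, axis=1, keepdims=True)    # 6 allowed directions in R^10

def project_to_lines(W, lines):
    """Project each row of W onto the closest line through the origin."""
    coeffs = W @ lines.T                                  # (units, lines) signed projections
    best = np.argmax(np.abs(coeffs), axis=1)              # nearest line per unit
    return coeffs[np.arange(len(W)), best][:, None] * lines[best]

W = rng.normal(size=(4, 10))                              # 4 hidden units, unconstrained
W_pnn = project_to_lines(W, lines)
cos = np.abs((W_pnn / np.linalg.norm(W_pnn, axis=1, keepdims=True)) @ lines.T).max(axis=1)
print(np.allclose(cos, 1.0))                              # True: every unit now lies on a line
```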
Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce
Title | Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce |
Authors | Devashish Shankar, Sujay Narumanchi, H A Ananya, Pramod Kompalli, Krishnendu Chaudhury |
Abstract | In this paper, we present a unified end-to-end approach to build a large scale Visual Search and Recommendation system for e-commerce. Previous works have targeted these problems in isolation. We believe a more effective and elegant solution could be obtained by tackling them together. We propose a unified Deep Convolutional Neural Network architecture, called VisNet, to learn embeddings to capture the notion of visual similarity, across several semantic granularities. We demonstrate the superiority of our approach for the task of image retrieval, by comparing against the state-of-the-art on the Exact Street2Shop dataset. We then share the design decisions and trade-offs made while deploying the model to power Visual Recommendations across a catalog of 50M products, supporting 2K queries a second at Flipkart, India’s largest e-commerce company. The deployment of our solution has yielded a significant business impact, as measured by the conversion-rate. |
Tasks | Image Retrieval |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02344v1 |
PDF | http://arxiv.org/pdf/1703.02344v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-based-large-scale-visual |
Repo | https://github.com/bombdiggity/paper-bag |
Framework | tf |
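A hedged sketch of the retrieval step behind a visual search system of the kind described above: embed catalog images once, then answer a query by nearest-neighbor search over normalized embeddings. The generic pretrained ResNet-18 is a stand-in for VisNet and its similarity-specific training.

```python
# Sketch: embedding-based visual search over a small stand-in catalog.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # 512-d pooled features as embeddings
backbone.eval()

catalog = torch.rand(32, 3, 224, 224)        # stand-in catalog images
query = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    cat_emb = F.normalize(backbone(catalog), dim=1)   # (32, 512)
    q_emb = F.normalize(backbone(query), dim=1)       # (1, 512)

scores = (q_emb @ cat_emb.T).squeeze(0)      # cosine similarity to every catalog item
topk = scores.topk(5).indices                # indices of the 5 most visually similar items
print(topk.tolist())
```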
Multi-Content GAN for Few-Shot Font Style Transfer
Title | Multi-Content GAN for Few-Shot Font Style Transfer |
Authors | Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell |
Abstract | In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface. To generate a set of multi-content images following a consistent style from very few examples, we propose an end-to-end stacked conditional GAN model considering content along channels and style along network layers. Our proposed network transfers the style of given glyphs to the contents of unseen ones, capturing highly stylized fonts found in the real world such as those on movie posters or infographics. We seek to transfer both the typographic stylization (e.g., serifs and ears) as well as the textual stylization (e.g., color gradients and effects). We base our experiments on our collected dataset including 10,000 fonts with different styles and demonstrate effective generalization from a very small number of observed glyphs. |
Tasks | Font Style Transfer, Style Transfer |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00516v1 |
PDF | http://arxiv.org/pdf/1712.00516v1.pdf |
PWC | https://paperswithcode.com/paper/multi-content-gan-for-few-shot-font-style |
Repo | https://github.com/Pengxiao-Wang/Typeface-and-Font-Style-Transfer |
Framework | none |
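A hedged PyTorch sketch of the input/output convention suggested by "content along channels" in the abstract: the glyphs of a font are stacked along the channel dimension, the few observed glyphs are filled in, and a convolutional generator predicts the full glyph set. The tiny network and sizes are illustrative, not the stacked MC-GAN architecture.

```python
# Sketch: 26 glyphs stacked along channels; a conv generator completes the missing ones.
import torch
import torch.nn as nn

glyphs, size = 26, 64
observed_ids = [0, 4, 19]                               # e.g. only 'A', 'E', 'T' observed

x = torch.zeros(1, glyphs, size, size)                  # unobserved glyph channels stay 0
x[:, observed_ids] = torch.rand(1, len(observed_ids), size, size)

G = nn.Sequential(nn.Conv2d(glyphs, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, glyphs, 3, padding=1), nn.Sigmoid())

full_font = G(x)                                        # (1, 26, 64, 64): all glyphs predicted
print(full_font.shape)
```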