Paper Group AWR 169
Neural Models for Sequence Chunking. WESPE: Weakly Supervised Photo Enhancer for Digital Cameras. StreetStyle: Exploring world-wide clothing styles from millions of photos. Geometric Dimensionality Reduction for Subsequent Classification. jsCoq: Towards Hybrid Theorem Proving Interfaces. Deep Shape Matching. A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation. Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks. Faster ICA under orthogonal constraint. Graph Attention Networks. Monotonic Chunkwise Attention. ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation. Porcupine Neural Networks: (Almost) All Local Optima are Global. Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce. Multi-Content GAN for Few-Shot Font Style Transfer.
Neural Models for Sequence Chunking
Title | Neural Models for Sequence Chunking |
Authors | Feifei Zhai, Saloni Potdar, Bing Xiang, Bowen Zhou |
Abstract | Many natural language understanding (NLU) tasks, such as shallow parsing (i.e., text chunking) and semantic slot filling, require the assignment of representative labels to the meaningful chunks in a sentence. Most of the current deep neural network (DNN) based methods consider these tasks as a sequence labeling problem, in which a word, rather than a chunk, is treated as the basic unit for labeling. These chunks are then inferred from the standard IOB (Inside-Outside-Beginning) labels. In this paper, we propose an alternative approach by investigating the use of DNNs for sequence chunking, and propose three neural models so that each chunk can be treated as a complete unit for labeling. Experimental results show that the proposed neural sequence chunking models can achieve state-of-the-art performance on both the text chunking and slot filling tasks. |
Tasks | Chunking, Slot Filling |
Published | 2017-01-15 |
URL | http://arxiv.org/abs/1701.04027v1 |
PDF | http://arxiv.org/pdf/1701.04027v1.pdf |
PWC | https://paperswithcode.com/paper/neural-models-for-sequence-chunking |
Repo | https://github.com/threelittlemonkeys/pointer-network-pytorch |
Framework | pytorch |
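The abstract contrasts chunk-level labeling with the standard word-level IOB scheme. As a point of reference, the sketch below (illustrative, not from the paper) shows how IOB tags are decoded back into labeled chunks - the indirection the proposed models avoid by treating each chunk as a single unit.

```python
# Illustrative sketch (not from the paper): decoding word-level IOB tags into labeled
# chunks, the post-processing step that chunk-level models make unnecessary.
def iob_to_chunks(words, tags):
    chunks, current, label = [], [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            if current:
                chunks.append((label, current))
            current, label = [word], tag[2:]
        elif tag.startswith("I-") and tag[2:] == label:
            current.append(word)
        else:                      # "O", or an I- tag whose label does not match
            if current:
                chunks.append((label, current))
            current, label = [], None
            if tag.startswith("I-"):
                current, label = [word], tag[2:]
    if current:
        chunks.append((label, current))
    return chunks

print(iob_to_chunks(["He", "reckons", "the", "current", "account"],
                    ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"]))
# [('NP', ['He']), ('VP', ['reckons']), ('NP', ['the', 'current', 'account'])]
```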
WESPE: Weakly Supervised Photo Enhancer for Digital Cameras
Title | WESPE: Weakly Supervised Photo Enhancer for Digital Cameras |
Authors | Andrey Ignatov, Nikolay Kobyshev, Radu Timofte, Kenneth Vanhoey, Luc Van Gool |
Abstract | Low-end and compact mobile cameras demonstrate limited photo quality mainly due to space, hardware and budget constraints. In this work, we propose a deep learning solution that automatically translates photos taken by cameras with limited capabilities into DSLR-quality photos. We tackle this problem by introducing a weakly supervised photo enhancer (WESPE) - a novel image-to-image Generative Adversarial Network-based architecture. The proposed model is trained under weak supervision: unlike previous works, there is no need for strong supervision in the form of a large annotated dataset of aligned original/enhanced photo pairs. The sole requirement is two distinct datasets: one from the source camera, and one composed of arbitrary high-quality images that can generally be crawled from the Internet - the visual content they exhibit may be unrelated. Hence, our solution is repeatable for any camera: collecting the data and training can be achieved in a couple of hours. In this work, we place particular emphasis on an extensive evaluation of the obtained results. Besides standard objective metrics and a subjective user study, we train a virtual rater in the form of a separate CNN that mimics human raters on Flickr data and use this network to get reference scores for both original and enhanced photos. Our experiments on the DPED, KITTI and Cityscapes datasets as well as pictures from several generations of smartphones demonstrate that WESPE produces results comparable to or better than those of state-of-the-art strongly supervised methods. |
Tasks | |
Published | 2017-09-04 |
URL | http://arxiv.org/abs/1709.01118v2 |
PDF | http://arxiv.org/pdf/1709.01118v2.pdf |
PWC | https://paperswithcode.com/paper/wespe-weakly-supervised-photo-enhancer-for |
Repo | https://github.com/kirkutirev/photo_enhancer |
Framework | pytorch |
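A rough PyTorch sketch of the weak-supervision idea described above: the enhancer only ever sees unpaired data, so it is trained against a discriminator fed with arbitrary high-quality photos, plus a content term that keeps the output close to the input. The tiny networks, the L1 content term and the weighting are placeholders; WESPE itself uses separate color and texture discriminators, a backward generator for content consistency and a total-variation term.

```python
# Sketch only: adversarial loss against unpaired high-quality photos + content preservation.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
D = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(32, 1, 3, stride=2, padding=1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

source = torch.rand(4, 3, 64, 64)          # photos from the limited camera
dslr = torch.rand(4, 3, 64, 64)            # unpaired high-quality photos

# Discriminator step: real = unpaired DSLR photos, fake = enhanced source photos.
fake = G(source)
real_logits, fake_logits = D(dslr), D(fake.detach())
d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
         bce(fake_logits, torch.zeros_like(fake_logits))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator while keeping the output close to the input.
enhanced = G(source)
fake_logits = D(enhanced)
g_loss = bce(fake_logits, torch.ones_like(fake_logits)) + \
         10.0 * nn.functional.l1_loss(enhanced, source)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
print(float(d_loss), float(g_loss))
```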
StreetStyle: Exploring world-wide clothing styles from millions of photos
Title | StreetStyle: Exploring world-wide clothing styles from millions of photos |
Authors | Kevin Matzen, Kavita Bala, Noah Snavely |
Abstract | Each day billions of photographs are uploaded to photo-sharing services and social media platforms. These images are packed with information about how people live around the world. In this paper we exploit this rich trove of data to understand fashion and style trends worldwide. We present a framework for visual discovery at scale, analyzing clothing and fashion across millions of images of people around the world and spanning several years. We introduce a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning. We also present a method for discovering visually consistent style clusters that capture useful visual correlations in this massive dataset. Using these tools, we analyze millions of photos to derive visual insight, producing a first-of-its-kind analysis of global and per-city fashion choices and spatio-temporal trends. |
Tasks | |
Published | 2017-06-06 |
URL | http://arxiv.org/abs/1706.01869v1 |
PDF | http://arxiv.org/pdf/1706.01869v1.pdf |
PWC | https://paperswithcode.com/paper/streetstyle-exploring-world-wide-clothing |
Repo | https://github.com/vihardesu/clothing-choice |
Framework | none |
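A hedged sketch of the generic recipe behind the style clusters mentioned in the abstract: embed person crops with a CNN, then cluster the embeddings. The pretrained ResNet-18 backbone and the number of clusters are stand-ins; the paper derives its features from the attribute classifiers it trains.

```python
# Sketch: embed crops with a generic pretrained CNN, then cluster into "style" groups.
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # use the 512-d pooled features as embeddings
backbone.eval()

crops = torch.rand(64, 3, 224, 224)        # stand-in for normalized person crops
with torch.no_grad():
    feats = backbone(crops).numpy()        # (64, 512)

clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(feats)
print(clusters[:10])                       # cluster id per photo; each id ~ one visual style
```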
Geometric Dimensionality Reduction for Subsequent Classification
Title | Geometric Dimensionality Reduction for Subsequent Classification |
Authors | Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Randal Burns, Mauro Maggioni |
Abstract | Classifying samples into categories becomes intractable when a single sample can have millions to billions of features, such as in genetics or imaging data. Principal Components Analysis (PCA) is widely used to identify a low-dimensional representation of such features for further analysis. However, PCA, as well as most manifold learning techniques, operates on the means and variances of the data, ignoring class labels, such as whether or not a subject has cancer, thereby discarding information that could substantially improve downstream classification performance. We describe an approach, Linear Optimal Low-rank projection (LOL), which extends PCA by operating on the means and variances of each class of data, rather than pooling all classes together. We prove, and substantiate with synthetic and real data experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost. The simplicity of LOL enables its flexibility, leading to the development of several variants that improve its accuracy, robustness, and computational efficiency. Using a novel dataset of magnetic resonance imaging scans consisting of 500 million features and 400 gigabytes of data, we demonstrate that LOL achieves better accuracy than other methods for any dimensionality, while only requiring a few minutes on a standard desktop computer. |
Tasks | Dimensionality Reduction |
Published | 2017-09-05 |
URL | http://arxiv.org/abs/1709.01233v6 |
PDF | http://arxiv.org/pdf/1709.01233v6.pdf |
PWC | https://paperswithcode.com/paper/geometric-dimensionality-reduction-for |
Repo | https://github.com/neurodata/LOL |
Framework | none |
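A hedged numpy sketch of the idea in the abstract - build the projection from class means as well as within-class variation, rather than from pooled statistics as PCA does. This is not the reference implementation in the linked neurodata/LOL repo; the sketch simply stacks the normalized class-mean difference with the top principal directions of the class-centered data.

```python
# Sketch of an LOL-like projection for two classes (illustrative, not the released code).
import numpy as np

def lol_like_projection(X, y, d):
    """X: (n, p) data, y: binary labels, d: target dimension (>= 2)."""
    mu0, mu1 = X[y == 0].mean(0), X[y == 1].mean(0)
    delta = mu1 - mu0
    delta /= np.linalg.norm(delta)          # class-mean difference direction
    Xc = X.copy()
    Xc[y == 0] -= mu0                       # center each class separately
    Xc[y == 1] -= mu1
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)   # within-class principal directions
    return np.vstack([delta, Vt[: d - 1]])  # (d, p) projection matrix

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 50)), rng.normal(0.5, 1, (100, 50))])
y = np.repeat([0, 1], 100)
A = lol_like_projection(X, y, d=5)
Z = X @ A.T                                 # (200, 5) representation for a downstream classifier
print(Z.shape)
```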
jsCoq: Towards Hybrid Theorem Proving Interfaces
Title | jsCoq: Towards Hybrid Theorem Proving Interfaces |
Authors | Emilio Jesús Gallego Arias, Benoît Pin, Pierre Jouvelot |
Abstract | We describe jsCoq, a new platform and user environment for the Coq interactive proof assistant. The jsCoq system targets the HTML5-ECMAScript 2015 specification, and it is typically run inside a standards-compliant browser, without the need for external servers or services. Targeting educational use, jsCoq allows the user to start interacting with proof scripts right away, thanks to its self-contained nature. Indeed, a full Coq environment is packed along with the proof scripts, easing distribution and installation. Starting to use jsCoq is as easy as clicking on a link. The current release ships more than 10 popular Coq libraries, and supports popular books such as Software Foundations or Certified Programming with Dependent Types. The new target platform has opened up new interaction and display possibilities. It has also fostered the development of some new Coq-related technology. In particular, we have implemented a new serialization-based protocol for interaction with the proof assistant, as well as a new package format for library distribution. |
Tasks | Automated Theorem Proving |
Published | 2017-01-25 |
URL | http://arxiv.org/abs/1701.07125v1 |
PDF | http://arxiv.org/pdf/1701.07125v1.pdf |
PWC | https://paperswithcode.com/paper/jscoq-towards-hybrid-theorem-proving |
Repo | https://github.com/ejgallego/jscoq |
Framework | none |
Deep Shape Matching
Title | Deep Shape Matching |
Authors | Filip Radenović, Giorgos Tolias, Ondřej Chum |
Abstract | We cast shape matching as metric learning with convolutional networks. We break the end-to-end process of image representation into two parts. Firstly, well established efficient methods are chosen to turn the images into edge maps. Secondly, the network is trained with edge maps of landmark images, which are automatically obtained by a structure-from-motion pipeline. The learned representation is evaluated on a range of different tasks, providing improvements on challenging cases of domain generalization, generic sketch-based image retrieval or its fine-grained counterpart. In contrast to other methods that learn a different model per task, object category, or domain, we use the same network throughout all our experiments, achieving state-of-the-art results in multiple benchmarks. |
Tasks | Domain Generalization, Image Retrieval, Metric Learning, Sketch-Based Image Retrieval |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03409v2 |
PDF | http://arxiv.org/pdf/1709.03409v2.pdf |
PWC | https://paperswithcode.com/paper/deep-shape-matching |
Repo | https://github.com/filipradenovic/cnnimageretrieval |
Framework | pytorch |
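A hedged PyTorch sketch of the two-stage recipe in the abstract: turn images into edge maps with an off-the-shelf operator, then train an embedding network on those edge maps with metric learning. The Sobel edge map, tiny network and contrastive loss with an arbitrary margin are illustrative stand-ins for the components used in the paper.

```python
# Sketch: edge maps + contrastive metric learning between matching/non-matching pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edges(gray):                      # gray: (B, 1, H, W) in [0, 1]
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx, gy = F.conv2d(gray, kx, padding=1), F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2)

embed = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())

def contrastive_loss(za, zb, same, margin=0.7):
    d = F.pairwise_distance(F.normalize(za), F.normalize(zb))
    return (same * d ** 2 + (1 - same) * F.relu(margin - d) ** 2).mean()

a, b = torch.rand(8, 1, 64, 64), torch.rand(8, 1, 64, 64)
same = torch.randint(0, 2, (8,)).float()    # 1 = matching pair, 0 = non-matching
loss = contrastive_loss(embed(sobel_edges(a)), embed(sobel_edges(b)), same)
loss.backward()
print(float(loss))
```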
A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation
Title | A Unified Approach of Multi-scale Deep and Hand-crafted Features for Defocus Estimation |
Authors | Jinsun Park, Yu-Wing Tai, Donghyeon Cho, In So Kweon |
Abstract | In this paper, we introduce robust and synergetic hand-crafted features and a simple but efficient deep feature from a convolutional neural network (CNN) architecture for defocus estimation. This paper systematically analyzes the effectiveness of different features, and shows how each feature can compensate for the weaknesses of other features when they are concatenated. For a full defocus map estimation, we extract image patches on strong edges sparsely, after which we use them for deep and hand-crafted feature extraction. In order to reduce the degree of patch-scale dependency, we also propose a multi-scale patch extraction strategy. A sparse defocus map is generated using a neural network classifier followed by a probability-joint bilateral filter. The final defocus map is obtained from the sparse defocus map with guidance from an edge-preserving filtered input image. Experimental results show that our algorithm is superior to state-of-the-art algorithms in terms of defocus estimation. Our work can be used for applications such as segmentation, blur magnification, all-in-focus image generation, and 3-D estimation. |
Tasks | Defocus Estimation, Image Generation |
Published | 2017-04-28 |
URL | http://arxiv.org/abs/1704.08992v1 |
PDF | http://arxiv.org/pdf/1704.08992v1.pdf |
PWC | https://paperswithcode.com/paper/a-unified-approach-of-multi-scale-deep-and |
Repo | https://github.com/zzangjinsun/DHDE_CVPR17 |
Framework | none |
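A hedged numpy sketch of the patch-extraction step described in the abstract: locate strong edges and cut patches of several sizes around them, which is what reduces patch-scale dependency. The gradient-magnitude edge measure, threshold and scale set are placeholders.

```python
# Sketch: sparse multi-scale patch extraction at strong-edge locations.
import numpy as np

def multiscale_patches(gray, scales=(15, 23, 31), thresh=0.25, max_points=200):
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    r = max(scales) // 2
    ys, xs = np.where(mag > thresh * mag.max())            # strong-edge candidates
    keep = (ys > r) & (ys < gray.shape[0] - r) & (xs > r) & (xs < gray.shape[1] - r)
    ys, xs = ys[keep][:max_points], xs[keep][:max_points]  # sparse subset, borders excluded
    patches = []
    for y, x in zip(ys, xs):
        patches.append([gray[y - s // 2: y + s // 2 + 1,
                             x - s // 2: x + s // 2 + 1] for s in scales])
    return ys, xs, patches

img = np.random.rand(128, 128)
ys, xs, patches = multiscale_patches(img)
print(len(patches), [p.shape for p in patches[0]])          # e.g. 200 [(15, 15), (23, 23), (31, 31)]
```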
Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks
Title | Semi-Supervised Haptic Material Recognition for Robots using Generative Adversarial Networks |
Authors | Zackory Erickson, Sonia Chernova, Charles C. Kemp |
Abstract | Material recognition enables robots to incorporate knowledge of material properties into their interactions with everyday objects. For example, material recognition opens up opportunities for clearer communication with a robot, such as “bring me the metal coffee mug”, and recognizing plastic versus metal is crucial when using a microwave or oven. However, collecting labeled training data with a robot is often more difficult than collecting unlabeled data. We present a semi-supervised learning approach for material recognition that uses generative adversarial networks (GANs) with haptic features such as force, temperature, and vibration. Our approach achieves state-of-the-art results and enables a robot to estimate the material class of household objects with ~90% accuracy when 92% of the training data are unlabeled. We explore how well this approach can recognize the material of new objects and we discuss challenges facing generalization. To motivate learning from unlabeled training data, we also compare our results against several common supervised learning classifiers. In addition, we have released the dataset used for this work, which consists of time-series haptic measurements from a robot that conducted thousands of interactions with 72 household objects. |
Tasks | Material Recognition, Time Series |
Published | 2017-07-10 |
URL | http://arxiv.org/abs/1707.02796v2 |
PDF | http://arxiv.org/pdf/1707.02796v2.pdf |
PWC | https://paperswithcode.com/paper/semi-supervised-haptic-material-recognition |
Repo | https://github.com/healthcare-robotics/mr-gan |
Framework | none |
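A hedged PyTorch sketch of the usual semi-supervised GAN objective on feature vectors - K real material classes plus one extra class for generated samples - which is the general formulation this line of work builds on. The dimensions and networks are placeholders, not the released mr-gan model.

```python
# Sketch: (K+1)-class semi-supervised GAN losses on haptic-style feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, feat_dim, z_dim = 6, 32, 16               # material classes, feature size, noise size
D = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, K + 1))
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))

x_lab = torch.randn(8, feat_dim)             # labeled haptic feature vectors
y_lab = torch.randint(0, K, (8,))
x_unl = torch.randn(8, feat_dim)             # unlabeled haptic feature vectors
x_gen = G(torch.randn(8, z_dim))             # generated (fake) feature vectors
fake_class = torch.full((8,), K, dtype=torch.long)

loss_lab = F.cross_entropy(D(x_lab), y_lab)                     # labeled: true material class
logits_unl = D(x_unl)
loss_unl = -(torch.logsumexp(logits_unl[:, :K], dim=1)          # unlabeled: "some real class"
             - torch.logsumexp(logits_unl, dim=1)).mean()
loss_gen = F.cross_entropy(D(x_gen.detach()), fake_class)       # generated: the extra fake class
d_loss = loss_lab + loss_unl + loss_gen

logits_g = D(x_gen)                                             # generator: look like a real class
g_loss = -(torch.logsumexp(logits_g[:, :K], dim=1) - torch.logsumexp(logits_g, dim=1)).mean()
print(float(d_loss), float(g_loss))
```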
Faster ICA under orthogonal constraint
Title | Faster ICA under orthogonal constraint |
Authors | Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort |
Abstract | Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of non-Gaussian independent sources. The problem can be seen as a likelihood maximization problem. We introduce Picard-O, a preconditioned L-BFGS strategy over the set of orthogonal matrices, which can quickly separate both super- and sub-Gaussian signals. It returns the same set of sources as the widely used FastICA algorithm. Through numerical experiments, we show that our method is faster and more robust than FastICA on real data. |
Tasks | |
Published | 2017-11-29 |
URL | http://arxiv.org/abs/1711.10873v1 |
PDF | http://arxiv.org/pdf/1711.10873v1.pdf |
PWC | https://paperswithcode.com/paper/faster-ica-under-orthogonal-constraint |
Repo | https://github.com/pierreablin/picard |
Framework | none |
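A usage sketch against the released implementation linked above, assuming its top-level picard() entry point with ortho=True for the Picard-O variant and data arranged as channels × samples; check the repo for the exact signature. Synthetic super- and sub-Gaussian sources are mixed and then separated.

```python
# Sketch (assumed API of github.com/pierreablin/picard): separate mixed sources with Picard-O.
import numpy as np
from picard import picard   # pip install python-picard

rng = np.random.default_rng(0)
n, T = 4, 10000
S = np.vstack([rng.laplace(size=(2, T)),             # super-Gaussian sources
               rng.uniform(-1, 1, size=(2, T))])     # sub-Gaussian sources
A = rng.normal(size=(n, n))                          # unknown mixing matrix
X = A @ S                                            # observed mixtures (channels x samples)

K, W, Y = picard(X, ortho=True, random_state=0)      # Y ~ recovered independent sources
print(Y.shape)                                       # (4, 10000)
```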
Graph Attention Networks
Title | Graph Attention Networks |
Authors | Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio |
Abstract | We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training). |
Tasks | Document Classification, Graph Embedding, Graph Regression, Link Prediction, Node Classification, Skeleton Based Action Recognition |
Published | 2017-10-30 |
URL | http://arxiv.org/abs/1710.10903v3 |
PDF | http://arxiv.org/pdf/1710.10903v3.pdf |
PWC | https://paperswithcode.com/paper/graph-attention-networks |
Repo | https://github.com/YunseobShin/wiki_GAT |
Framework | tf |
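A minimal single-head graph attention layer in PyTorch, following the masked additive self-attention the abstract describes (each node attends only over its neighborhood). Sizes and the toy three-node graph are illustrative.

```python
# Sketch: one single-head GAT layer with neighborhood masking.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # additive attention vector

    def forward(self, h, adj):                 # h: (N, in_dim), adj: (N, N) 0/1 adjacency
        z = self.W(h)                          # (N, out_dim)
        N = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(N, N, -1),
                           z.unsqueeze(0).expand(N, N, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)   # (N, N) raw attention scores
        e = e.masked_fill(adj == 0, float("-inf"))         # mask out non-neighbors
        alpha = torch.softmax(e, dim=-1)                   # attention over each neighborhood
        return F.elu(alpha @ z)                            # aggregate neighbor features

adj = torch.tensor([[1, 1, 0], [1, 1, 1], [0, 1, 1]])      # 3-node graph with self-loops
h = torch.rand(3, 5)
print(GATLayer(5, 8)(h, adj).shape)            # torch.Size([3, 8])
```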
Monotonic Chunkwise Attention
Title | Monotonic Chunkwise Attention |
Authors | Chung-Cheng Chiu, Colin Raffel |
Abstract | Sequence-to-sequence models with soft attention have been successfully applied to a wide variety of problems, but their decoding process incurs a quadratic time and space cost and is inapplicable to real-time sequence transduction. To address these issues, we propose Monotonic Chunkwise Attention (MoChA), which adaptively splits the input sequence into small chunks over which soft attention is computed. We show that models utilizing MoChA can be trained efficiently with standard backpropagation while allowing online and linear-time decoding at test time. When applied to online speech recognition, we obtain state-of-the-art results and match the performance of a model using an offline soft attention mechanism. In document summarization experiments where we do not expect monotonic alignments, we show significantly improved performance compared to a baseline monotonic attention-based model. |
Tasks | Document Summarization, Speech Recognition |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05382v2 |
PDF | http://arxiv.org/pdf/1712.05382v2.pdf |
PWC | https://paperswithcode.com/paper/monotonic-chunkwise-attention |
Repo | https://github.com/craffel/mocha |
Framework | tf |
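A hedged numpy sketch of the test-time behavior described in the abstract: a hard monotonic head advances through the encoder states, and once it stops, soft attention is computed only over a small chunk ending at that position. The random energies stand in for learned ones, and training uses a differentiable expectation rather than this hard decoding.

```python
# Sketch: one decoding step of monotonic chunkwise attention at inference time.
import numpy as np

rng = np.random.default_rng(0)
T, d, w = 12, 4, 3                      # encoder length, state size, chunk size
enc = rng.normal(size=(T, d))

def mocha_step(enc, start, p_choose, chunk_energy, w):
    # Hard monotonic pass: move right from `start` until the selection probability
    # exceeds 0.5 (a common test-time discretization), then attend over the chunk.
    t = start
    while t < len(enc) - 1 and p_choose[t] < 0.5:
        t += 1
    lo = max(0, t - w + 1)
    weights = np.exp(chunk_energy[lo:t + 1] - chunk_energy[lo:t + 1].max())
    weights /= weights.sum()
    context = weights @ enc[lo:t + 1]   # context vector for this output step
    return context, t

p_choose = rng.uniform(size=T)          # stand-in monotonic selection probabilities
chunk_energy = rng.normal(size=T)       # stand-in chunk attention energies
ctx, t = mocha_step(enc, start=0, p_choose=p_choose, chunk_energy=chunk_energy, w=w)
print(t, ctx.shape)                     # stop position and (4,) context vector
```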
ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
Title | ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation |
Authors | Chang Zhou, Jinze Bai, Junshuai Song, Xiaofei Liu, Zhengchao Zhao, Xiusi Chen, Jun Gao |
Abstract | A user can be represented by what he/she does over time. A common way to deal with the user modeling problem is to manually extract all kinds of aggregated features over the heterogeneous behaviors, which may fail to fully represent the data due to limited human insight. Recent works usually use RNN-based methods to produce an overall embedding of a behavior sequence, which can then be exploited by downstream applications. However, this preserves only very limited information - an aggregated memory of a person. When a downstream application needs to use the modeled user features, it may lose the specific, highly correlated behaviors of the user and pick up noise from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Heterogeneous user behaviors are handled by projecting all types of behaviors into multiple latent semantic spaces, where they influence one another via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank achieves better performance and a faster training process. We further extend ATRank so that one unified model predicts different types of user behaviors at the same time, achieving performance comparable to highly optimized individual models. |
Tasks | |
Published | 2017-11-17 |
URL | http://arxiv.org/abs/1711.06632v2 |
PDF | http://arxiv.org/pdf/1711.06632v2.pdf |
PWC | https://paperswithcode.com/paper/atrank-an-attention-based-user-behavior |
Repo | https://github.com/johnlevi/recsys |
Framework | none |
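A hedged numpy sketch of the core operation described above: project heterogeneous behavior embeddings into a shared latent space and let them influence one another via self-attention. ATRank itself uses multiple latent semantic spaces plus time and behavior-type encodings; the shapes and random projections here are illustrative.

```python
# Sketch: project heterogeneous behaviors into one space, then apply self-attention.
import numpy as np

rng = np.random.default_rng(0)
clicks = rng.normal(size=(5, 16))         # 5 click behaviors, 16-d embeddings
purchases = rng.normal(size=(3, 24))      # 3 purchase behaviors, 24-d embeddings
d = 32                                    # shared latent space

P_click, P_buy = rng.normal(size=(16, d)), rng.normal(size=(24, d))
H = np.vstack([clicks @ P_click, purchases @ P_buy])    # (8, d) unified behavior sequence

def self_attention(H):
    scores = H @ H.T / np.sqrt(H.shape[1])               # scaled dot-product scores
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ H                           # each behavior re-expressed via all the others

user_vectors = self_attention(H)
print(user_vectors.shape)                  # (8, 32): behavior vectors for downstream attention
```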
Porcupine Neural Networks: (Almost) All Local Optima are Global
Title | Porcupine Neural Networks: (Almost) All Local Optima are Global |
Authors | Soheil Feizi, Hamid Javadi, Jesse Zhang, David Tse |
Abstract | Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights to make its optimization landscape have good theoretical properties while at the same time, be a good approximation for the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global while we have a characterization of regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN. |
Tasks | |
Published | 2017-10-05 |
URL | http://arxiv.org/abs/1710.02196v1 |
PDF | http://arxiv.org/pdf/1710.02196v1.pdf |
PWC | https://paperswithcode.com/paper/porcupine-neural-networks-almost-all-local |
Repo | https://github.com/jessemzhang/porcupine_neural_networks |
Framework | tf |
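A hedged numpy sketch of the constraint that defines a PNN according to the abstract: each hidden unit's weight vector must lie on one of a finite set of lines. Here the constraint is imposed by projecting each weight vector onto its nearest allowed line; the paper's contribution is the analysis of such constrained landscapes, not this particular projection heuristic.

```python
# Sketch: project each hidden unit's weight vector onto the closest allowed line.
import numpy as np

rng = np.random.default_rng(0)
lines = rng.normal(size=(6, 10))
lines /= np.linalg.norm(lines, axis=1, keepdims=True)    # 6 allowed directions in R^10

def project_to_lines(W, lines):
    """Project each row of W onto the closest line through the origin."""
    coeffs = W @ lines.T                                  # (units, lines) signed projections
    best = np.argmax(np.abs(coeffs), axis=1)              # nearest line per unit
    return coeffs[np.arange(len(W)), best][:, None] * lines[best]

W = rng.normal(size=(4, 10))                              # 4 hidden units, unconstrained
W_pnn = project_to_lines(W, lines)
cos = np.abs((W_pnn / np.linalg.norm(W_pnn, axis=1, keepdims=True)) @ lines.T).max(axis=1)
print(np.allclose(cos, 1.0))                              # True: every unit now lies on a line
```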
Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce
Title | Deep Learning based Large Scale Visual Recommendation and Search for E-Commerce |
Authors | Devashish Shankar, Sujay Narumanchi, H A Ananya, Pramod Kompalli, Krishnendu Chaudhury |
Abstract | In this paper, we present a unified end-to-end approach to build a large scale Visual Search and Recommendation system for e-commerce. Previous works have targeted these problems in isolation. We believe a more effective and elegant solution could be obtained by tackling them together. We propose a unified Deep Convolutional Neural Network architecture, called VisNet, to learn embeddings to capture the notion of visual similarity, across several semantic granularities. We demonstrate the superiority of our approach for the task of image retrieval, by comparing against the state-of-the-art on the Exact Street2Shop dataset. We then share the design decisions and trade-offs made while deploying the model to power Visual Recommendations across a catalog of 50M products, supporting 2K queries a second at Flipkart, India’s largest e-commerce company. The deployment of our solution has yielded a significant business impact, as measured by the conversion-rate. |
Tasks | Image Retrieval |
Published | 2017-03-07 |
URL | http://arxiv.org/abs/1703.02344v1 |
PDF | http://arxiv.org/pdf/1703.02344v1.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-based-large-scale-visual |
Repo | https://github.com/bombdiggity/paper-bag |
Framework | tf |
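A hedged sketch of the retrieval step behind a visual search system of the kind described above: embed catalog images once, then answer a query by nearest-neighbor search over normalized embeddings. The generic pretrained ResNet-18 is a stand-in for VisNet and its similarity-specific training.

```python
# Sketch: embedding-based visual search over a small stand-in catalog.
import torch
import torch.nn.functional as F
import torchvision.models as models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()            # 512-d pooled features as embeddings
backbone.eval()

catalog = torch.rand(32, 3, 224, 224)        # stand-in catalog images
query = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    cat_emb = F.normalize(backbone(catalog), dim=1)   # (32, 512)
    q_emb = F.normalize(backbone(query), dim=1)       # (1, 512)

scores = (q_emb @ cat_emb.T).squeeze(0)      # cosine similarity to every catalog item
topk = scores.topk(5).indices                # indices of the 5 most visually similar items
print(topk.tolist())
```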
Multi-Content GAN for Few-Shot Font Style Transfer
Title | Multi-Content GAN for Few-Shot Font Style Transfer |
Authors | Samaneh Azadi, Matthew Fisher, Vladimir Kim, Zhaowen Wang, Eli Shechtman, Trevor Darrell |
Abstract | In this work, we focus on the challenge of taking partial observations of highly-stylized text and generalizing the observations to generate unobserved glyphs in the ornamented typeface. To generate a set of multi-content images following a consistent style from very few examples, we propose an end-to-end stacked conditional GAN model considering content along channels and style along network layers. Our proposed network transfers the style of given glyphs to the contents of unseen ones, capturing highly stylized fonts found in the real world such as those on movie posters or infographics. We seek to transfer both the typographic stylization (e.g., serifs and ears) as well as the textual stylization (e.g., color gradients and effects). We base our experiments on our collected dataset including 10,000 fonts with different styles and demonstrate effective generalization from a very small number of observed glyphs. |
Tasks | Font Style Transfer, Style Transfer |
Published | 2017-12-01 |
URL | http://arxiv.org/abs/1712.00516v1 |
PDF | http://arxiv.org/pdf/1712.00516v1.pdf |
PWC | https://paperswithcode.com/paper/multi-content-gan-for-few-shot-font-style |
Repo | https://github.com/Pengxiao-Wang/Typeface-and-Font-Style-Transfer |
Framework | none |
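A hedged PyTorch sketch of the input/output convention suggested by "content along channels" in the abstract: the glyphs of a font are stacked along the channel dimension, the few observed glyphs are filled in, and a convolutional generator predicts the full glyph set. The tiny network and sizes are illustrative, not the stacked MC-GAN architecture.

```python
# Sketch: 26 glyphs stacked along channels; a conv generator completes the missing ones.
import torch
import torch.nn as nn

glyphs, size = 26, 64
observed_ids = [0, 4, 19]                               # e.g. only 'A', 'E', 'T' observed

x = torch.zeros(1, glyphs, size, size)                  # unobserved glyph channels stay 0
x[:, observed_ids] = torch.rand(1, len(observed_ids), size, size)

G = nn.Sequential(nn.Conv2d(glyphs, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(64, glyphs, 3, padding=1), nn.Sigmoid())

full_font = G(x)                                        # (1, 26, 64, 64): all glyphs predicted
print(full_font.shape)
```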