October 20, 2019

3031 words 15 mins read

Paper Group AWR 352



Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

Title Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
Authors Jian Zhang, Avner May, Tri Dao, Christopher Ré
Abstract We investigate how to train kernel approximation methods that generalize well under a memory budget. Building on recent theoretical work, we define a measure of kernel approximation error which we find to be more predictive of the empirical generalization performance of kernel approximation methods than conventional metrics. An important consequence of this definition is that a kernel approximation matrix must be high rank to attain close approximation. Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget. Theoretically, we show quantization has a negligible effect on generalization performance in important settings. Empirically, we demonstrate across four benchmark datasets that LP-RFFs can match the performance of full-precision RFFs and the Nyström method, with 3x-10x and 50x-460x less memory, respectively.
Tasks Quantization
Published 2018-10-31
URL http://arxiv.org/abs/1811.00155v2
PDF http://arxiv.org/pdf/1811.00155v2.pdf
PWC https://paperswithcode.com/paper/low-precision-random-fourier-features-for
Repo https://github.com/HazyResearch/lp_rffs
Framework pytorch
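
A minimal NumPy sketch of the idea (not the authors' PyTorch implementation): build random Fourier features for the Gaussian kernel, then store them at low precision with a uniform quantizer. The 4-bit setting and nearest rounding here are illustrative choices; the paper analyzes stochastic rounding during training.

```python
import numpy as np

def rff_features(X, D, sigma=1.0, seed=0):
    """Random Fourier features z(x) = sqrt(2/D) * cos(W x + b), which
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 * sigma**2))."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def quantize(Z, bits=4):
    """Uniform quantization over the known feature range [-r, r];
    nearest rounding here, while the paper uses stochastic rounding in training."""
    r = np.sqrt(2.0 / Z.shape[1])        # each feature lies in [-r, r]
    levels = 2 ** bits - 1
    step = 2.0 * r / levels
    return np.clip(np.round((Z + r) / step), 0, levels) * step - r

# Kernel approximation with only `bits` per stored feature: K ~ Zq @ Zq.T
X = np.random.randn(100, 10)
Zq = quantize(rff_features(X, D=2048), bits=4)
K_approx = Zq @ Zq.T
```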

HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments

Title HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Authors Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu
Abstract The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. Recently, there has been interest in developing technologies that help incorporate the findings of the science of happiness into users’ daily lives by steering them towards behaviors that increase happiness. With the goal of building technology that can understand how people express their happy moments in text, we crowd-sourced HappyDB, a corpus of 100,000 happy moments that we make publicly available. This paper describes HappyDB and its properties, and outlines several important NLP problems that can be studied with the help of the corpus. We also apply several state-of-the-art analysis techniques to analyze HappyDB. Our results demonstrate the need for deeper NLP techniques to be developed, which makes HappyDB an exciting resource for follow-on research.
Tasks
Published 2018-01-23
URL http://arxiv.org/abs/1801.07746v2
PDF http://arxiv.org/pdf/1801.07746v2.pdf
PWC https://paperswithcode.com/paper/happydb-a-corpus-of-100000-crowdsourced-happy
Repo https://github.com/Nathanlang14/MSCS-5931-HappyDB
Framework none

Computational Optimal Transport

Title Computational Optimal Transport
Authors Gabriel Peyré, Marco Cuturi
Abstract Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a “global” cost to every such transport, using the “local” consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds light on the theoretical properties of OT that make it particularly useful for some of these applications.
Tasks
Published 2018-03-01
URL https://arxiv.org/abs/1803.00567v4
PDF https://arxiv.org/pdf/1803.00567v4.pdf
PWC https://paperswithcode.com/paper/computational-optimal-transport
Repo https://github.com/currymj/SinkhornDistance.jl
Framework none
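
The workhorse numerical method the book builds on is Sinkhorn's algorithm for entropy-regularized OT. A self-contained sketch (the regularization strength and iteration count are illustrative):

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=200):
    """Entropy-regularized OT between histograms a (n,) and b (m,)
    with cost matrix C (n, m); returns the transport plan and its cost."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                 # scale to match column marginal b
        u = a / (K @ v)                   # scale to match row marginal a
    P = u[:, None] * K * v[None, :]       # approximate optimal plan
    return P, float(np.sum(P * C))
```

Smaller eps gets closer to exact OT but needs more iterations and can underflow; log-domain implementations address this.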

Learning to See in the Dark

Title Learning to See in the Dark
Authors Chen Chen, Qifeng Chen, Jia Xu, Vladlen Koltun
Abstract Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at https://youtu.be/qWKUFK7MWvg
Tasks Deblurring, Denoising
Published 2018-05-04
URL http://arxiv.org/abs/1805.01934v1
PDF http://arxiv.org/pdf/1805.01934v1.pdf
PWC https://paperswithcode.com/paper/learning-to-see-in-the-dark
Repo https://github.com/cydonia999/Learning_to_See_in_the_Dark_PyTorch
Framework pytorch
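
One concrete piece of the pipeline is how raw sensor data enters the network: the Bayer mosaic is packed into four half-resolution channels and scaled by the desired amplification ratio. A sketch assuming an RGGB pattern; the black/white levels are placeholders for one particular sensor:

```python
import numpy as np

def pack_bayer(raw, black_level=512, white_level=16383, ratio=100):
    """Pack an RGGB Bayer mosaic (H, W) into a (H/2, W/2, 4) array and
    apply the exposure amplification ratio, in the spirit of the paper's
    preprocessing (sensor levels here are assumptions)."""
    im = np.maximum(raw.astype(np.float32) - black_level, 0)
    im /= (white_level - black_level)          # normalize to [0, 1]
    packed = np.stack([im[0::2, 0::2],         # R
                       im[0::2, 1::2],         # G
                       im[1::2, 0::2],         # G
                       im[1::2, 1::2]],        # B
                      axis=-1)
    return np.clip(packed * ratio, 0.0, 1.0)
```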

On Adversarial Examples for Character-Level Neural Machine Translation

Title On Adversarial Examples for Character-Level Neural Machine Translation
Authors Javid Ebrahimi, Daniel Lowd, Dejing Dou
Abstract Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations to rank adversarial changes. We propose two novel types of attacks which aim to remove or change a word in a translation, rather than simply break the NMT. We demonstrate that white-box adversarial examples are significantly stronger than their black-box counterparts in different attack scenarios, revealing more serious vulnerabilities than previously known. In addition, after performing adversarial training, which takes only 3 times longer than regular training, we can improve the model’s robustness significantly.
Tasks Machine Translation
Published 2018-06-23
URL http://arxiv.org/abs/1806.09030v1
PDF http://arxiv.org/pdf/1806.09030v1.pdf
PWC https://paperswithcode.com/paper/on-adversarial-examples-for-character-level
Repo https://github.com/alankarj/robust_nlp
Framework none
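
The white-box adversary's core trick is ranking candidate character flips by a first-order estimate of the loss change, using the gradient with respect to the one-hot input. A hedged NumPy sketch of that scoring step (the gradient itself would come from the NMT model):

```python
import numpy as np

def rank_char_flips(onehot, grad):
    """First-order estimate of the loss increase from flipping each
    character position to each vocabulary symbol.
    onehot, grad: (seq_len, vocab) arrays; grad is d(loss)/d(onehot)."""
    current = np.sum(grad * onehot, axis=1, keepdims=True)  # grad at current char
    scores = grad - current                 # loss change ~ grad . (e_new - e_old)
    scores[onehot.astype(bool)] = -np.inf   # never "flip" to the same char
    order = np.argsort(scores, axis=None)[::-1]
    pos, sym = np.unravel_index(order, scores.shape)
    return list(zip(pos, sym))              # most damaging flips first
```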

A Simple Baseline Algorithm for Graph Classification

Title A Simple Baseline Algorithm for Graph Classification
Authors Nathan de Lara, Edouard Pineau
Abstract Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling or graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm based on the spectral decomposition of the graph Laplacian to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms.
Tasks Graph Classification, Graph Embedding
Published 2018-10-22
URL http://arxiv.org/abs/1810.09155v2
PDF http://arxiv.org/pdf/1810.09155v2.pdf
PWC https://paperswithcode.com/paper/a-simple-baseline-algorithm-for-graph
Repo https://github.com/benedekrozemberczki/karateclub
Framework none
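
A sketch of the baseline in NumPy, under the assumption that each graph is embedded as the k smallest eigenvalues of its normalized Laplacian, zero-padded for graphs with fewer than k nodes:

```python
import numpy as np

def spectral_features(adj, k=10):
    """Embed a graph (dense adjacency matrix) as the k smallest
    eigenvalues of its normalized Laplacian; k is a free parameter."""
    deg = adj.sum(axis=1).astype(float)
    with np.errstate(divide="ignore"):
        d = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(adj)) - d[:, None] * adj * d[None, :]
    eig = np.linalg.eigvalsh(L)[:k]          # ascending eigenvalues
    return np.pad(eig, (0, max(0, k - len(eig))))
```

Stacking these vectors for all graphs and fitting any off-the-shelf classifier (e.g. a random forest) yields the reference score.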

Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Title Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Authors Andrew Owens, Alexei A. Efros
Abstract The thud of a bouncing ball, the onset of speech as lips open – when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals. In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation. We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned. We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio source separation, e.g. removing the off-screen translator’s voice from a foreign official’s speech. Code, models, and video results are available on our webpage: http://andrewowens.com/multisensory
Tasks Temporal Action Localization
Published 2018-04-10
URL http://arxiv.org/abs/1804.03641v2
PDF http://arxiv.org/pdf/1804.03641v2.pdf
PWC https://paperswithcode.com/paper/audio-visual-scene-analysis-with-self
Repo https://github.com/andrewowens/multisensory
Framework tf
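
The pretext task is simple to set up: positive pairs keep the original audio, negatives shift it in time. A sketch of the pair construction (the shift range is an assumption; the paper offsets audio by a few seconds):

```python
import numpy as np

def alignment_examples(video, audio, max_shift, seed=0):
    """One aligned (label 1) and one misaligned (label 0) training pair
    for the audio-visual alignment pretext task."""
    rng = np.random.default_rng(seed)
    shift = int(rng.integers(max_shift // 2, max_shift + 1))
    shifted = np.roll(audio, shift, axis=0)   # temporally offset copy
    return [(video, audio, 1), (video, shifted, 0)]
```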

Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition

Title Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Authors George Sterpu, Christian Saam, Naomi Harte
Abstract Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to enhanced representations which increase the recognition accuracy in both clean and noisy conditions. We test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large vocabulary continuous speech recognition, applying three types of noise at different power ratios. We also exploit state-of-the-art Sequence-to-Sequence architectures, showing that our method can be easily integrated. Results show relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality alone, depending on the acoustic noise level. We anticipate that the fusion strategy can easily generalise to many other multimodal tasks which involve correlated modalities. Code available online on GitHub: https://github.com/georgesterpu/Sigmedia-AVSR
Tasks Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published 2018-09-05
URL http://arxiv.org/abs/1809.01728v3
PDF http://arxiv.org/pdf/1809.01728v3.pdf
PWC https://paperswithcode.com/paper/attention-based-audio-visual-fusion-for
Repo https://github.com/georgesterpu/Sigmedia-AVSR
Framework tf
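
A dot-product attention sketch of the alignment idea, not the paper's exact parameterization: each audio frame attends over the video features and the resulting context is concatenated to form the fused representation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_video(audio_h, video_h):
    """Fuse modalities by letting audio attend over video.
    audio_h: (Ta, d) audio encoder states; video_h: (Tv, d) video states."""
    scores = audio_h @ video_h.T / np.sqrt(audio_h.shape[1])
    context = softmax(scores, axis=1) @ video_h          # (Ta, d)
    return np.concatenate([audio_h, context], axis=1)    # (Ta, 2d) fused
```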

SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task

Title SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Authors Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, Dragomir Radev
Abstract Most existing studies in text-to-SQL tasks do not require generating complex SQL queries with multiple clauses or sub-queries, and generalizing to new, unseen databases. In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task. SyntaxSQLNet employs a SQL specific syntax tree-based decoder with SQL generation path history and table-aware column attention encoders. We evaluate SyntaxSQLNet on the Spider text-to-SQL task, which contains databases with multiple tables and complex SQL queries with multiple SQL clauses and nested queries. We use a database split setting where databases in the test set are unseen during training. Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 7.3% in exact matching accuracy. We also show that SyntaxSQLNet can further improve the performance by an additional 7.5% using a cross-domain augmentation method, resulting in a 14.8% improvement in total. To our knowledge, we are the first to study this complex and cross-domain text-to-SQL task.
Tasks Semantic Parsing, Text-To-Sql
Published 2018-10-11
URL http://arxiv.org/abs/1810.05237v2
PDF http://arxiv.org/pdf/1810.05237v2.pdf
PWC https://paperswithcode.com/paper/syntaxsqlnet-syntax-tree-networks-for-complex
Repo https://github.com/heyanger/sqltools
Framework none
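
One reusable ingredient is table-aware column attention in the SQLNet style: each column name attends over the question tokens, and the column-conditioned question summary scores that column. A hedged sketch, not the paper's exact encoder:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def score_columns(question_h, column_h):
    """Score each column against the question via column attention.
    question_h: (Tq, d) question token encodings;
    column_h: (num_cols, d) column name encodings."""
    att = softmax(column_h @ question_h.T, axis=1)   # (cols, Tq)
    q_per_col = att @ question_h                     # column-conditioned summary
    return np.sum(q_per_col * column_h, axis=1)      # one score per column
```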

AutoLoc: Weakly-supervised Temporal Action Localization

Title AutoLoc: Weakly-supervised Temporal Action Localization
Authors Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang
Abstract Temporal Action Localization (TAL) in untrimmed video is important for many applications. But it is very expensive to annotate the segment-level ground truth (action class and temporal boundary). This raises interest in addressing TAL with weak supervision, where only video-level annotations are available during training. However, state-of-the-art weakly-supervised TAL methods focus only on generating a good Class Activation Sequence (CAS) over time and conduct simple thresholding on the CAS to localize actions. In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance. We propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the needed segment-level supervision for training such a boundary predictor. Our method achieves dramatically improved performance: under the IoU threshold 0.5, our method improves mAP on THUMOS’14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. It is also very encouraging to see that our weakly-supervised method achieves comparable results with some fully-supervised methods.
Tasks Action Localization, Temporal Action Localization, Weakly-supervised Temporal Action Localization
Published 2018-07-22
URL http://arxiv.org/abs/1807.08333v2
PDF http://arxiv.org/pdf/1807.08333v2.pdf
PWC https://paperswithcode.com/paper/autoloc-weakly-supervised-temporal-action
Repo https://github.com/zhengshou/AutoLoc
Framework none
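
The OIC loss is easy to state: inflate the predicted segment, then contrast the average activation in the outer ring against the average inside it. A sketch (the inflation ratio is a hyperparameter; the value here is illustrative):

```python
import numpy as np

def oic_loss(cas, start, end, inflate=0.25):
    """Outer-Inner-Contrastive loss over a 1-D class activation sequence:
    mean activation in the inflated outer area minus the mean inside the
    predicted segment [start, end); minimized when the inside dominates."""
    pad = max(1, int(inflate * (end - start)))
    lo, hi = max(0, start - pad), min(len(cas), end + pad)
    inner = cas[start:end].mean()
    outer_vals = np.concatenate([cas[lo:start], cas[end:hi]])
    outer = outer_vals.mean() if len(outer_vals) else 0.0
    return outer - inner
```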

MeshCNN: A Network with an Edge

Title MeshCNN: A Network with an Edge
Authors Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, Daniel Cohen-Or
Abstract Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of our task-driven pooling on various learning tasks applied to 3D meshes.
Tasks 3D Object Classification, 3D Part Segmentation, 3D Shape Analysis, Cube Engraving Classification
Published 2018-09-16
URL http://arxiv.org/abs/1809.05910v2
PDF http://arxiv.org/pdf/1809.05910v2.pdf
PWC https://paperswithcode.com/paper/meshcnn-a-network-with-an-edge
Repo https://github.com/ranahanocka/MeshCNN
Framework pytorch
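
The edge convolution applies a filter to an edge together with symmetric combinations of its four incident-triangle neighbors, so the result is invariant to the ordering ambiguity of those neighbors. A NumPy sketch of one such layer, under the assumption that neighbor indices are precomputed:

```python
import numpy as np

def edge_conv(edge_feats, nbrs, weights):
    """MeshCNN-style edge convolution.
    edge_feats: (E, d) per-edge features; nbrs: (E, 4) indices of the four
    neighbor edges; weights: (5 * d, out) learned filter matrix."""
    a, b, c, d_ = (edge_feats[nbrs[:, i]] for i in range(4))
    # Symmetric functions of opposite neighbor pairs remove ordering ambiguity.
    stacked = np.concatenate(
        [edge_feats, np.abs(a - c), a + c, np.abs(b - d_), b + d_], axis=1)
    return stacked @ weights                 # (E, out)
```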

Automated Strabismus Detection for Telemedicine Applications

Title Automated Strabismus Detection for Telemedicine Applications
Authors Jiewei Lu, Zhun Fan, Ce Zheng, Jingan Feng, Longtao Huang, Wenji Li, Erik D. Goodman
Abstract Strabismus is one of the most influential ophthalmologic diseases in human life. Timely detection of strabismus contributes to its prognosis and treatment. Telemedicine, which has great potential to alleviate the growing demand for the diagnosis of ophthalmologic diseases, is an effective method to achieve timely strabismus detection. In this paper, a tele strabismus dataset is established by ophthalmologists. An end-to-end framework named RF-CNN is then proposed to achieve automated strabismus detection on the established tele strabismus dataset. RF-CNN first performs eye region segmentation on each individual image, and further classifies the segmented eye regions with deep neural networks. The experimental results on the established tele strabismus dataset demonstrate that the proposed RF-CNN performs well on automated strabismus detection for telemedicine applications. Code is made publicly available at: https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application.
Tasks
Published 2018-09-09
URL http://arxiv.org/abs/1809.02940v3
PDF http://arxiv.org/pdf/1809.02940v3.pdf
PWC https://paperswithcode.com/paper/automated-strabismus-detection-for
Repo https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application
Framework tf

Perturbation Robust Representations of Topological Persistence Diagrams

Title Perturbation Robust Representations of Topological Persistence Diagrams
Authors Anirudh Som, Kowshik Thopalli, Karthikeyan Natesan Ramamurthy, Vinay Venkataraman, Ankita Shukla, Pavan Turaga
Abstract Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points and hence it is not straightforward to fuse them with features used for contemporary machine learning tools like deep-nets. In this paper we present theoretically well-grounded approaches to develop novel perturbation robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. We term the proposed representation Perturbed Topological Signatures, which live on a Grassmann manifold and hence can be efficiently used in machine learning pipelines. We explore the use of the proposed descriptor on three applications: 3D shape analysis, view-invariant activity analysis, and non-linear dynamical modeling. We show favorable results in both high-level recognition performance and time-complexity when compared to other baseline methods.
Tasks 3D Shape Analysis
Published 2018-07-26
URL http://arxiv.org/abs/1807.10400v1
PDF http://arxiv.org/pdf/1807.10400v1.pdf
PWC https://paperswithcode.com/paper/perturbation-robust-representations-of
Repo https://github.com/anirudhsom/Perturbed-Topological-Signature
Framework none

NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications

Title NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Authors Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam
Abstract This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experimental results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7$\times$ speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2).
Tasks Image Classification
Published 2018-04-09
URL http://arxiv.org/abs/1804.03230v2
PDF http://arxiv.org/pdf/1804.03230v2.pdf
PWC https://paperswithcode.com/paper/netadapt-platform-aware-neural-network
Repo https://github.com/wchliao/multi-task-machine-learning-papers
Framework none
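
The outer loop is what makes NetAdapt platform-aware: every proposal is scored by empirically measured resource use, not proxy counts. A sketch where every argument is a placeholder for a platform-specific piece (the paper also decays the per-iteration reduction; a constant step is used here):

```python
def netadapt(net, layers, budget, step, measure, simplify, finetune, accuracy):
    """NetAdapt outer loop. `measure` returns empirically measured
    latency/energy on the target device; `simplify(net, layer, target)`
    returns a copy of `net` with one layer pruned to meet the resource
    target (or None if impossible); `finetune` is a short fine-tune."""
    while measure(net) > budget:
        target = measure(net) - step              # resource goal this round
        proposals = []
        for layer in layers:
            cand = simplify(net, layer, target)   # one proposal per layer
            if cand is not None:
                proposals.append((accuracy(finetune(cand)), cand))
        _, net = max(proposals, key=lambda p: p[0])  # keep the most accurate
    return finetune(net)                          # long fine-tune at the end
```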

Improving Similarity Search with High-dimensional Locality-sensitive Hashing

Title Improving Similarity Search with High-dimensional Locality-sensitive Hashing
Authors Jaiyam Sharma, Saket Navlakha
Abstract We propose a new class of data-independent locality-sensitive hashing (LSH) algorithms based on the fruit fly olfactory circuit. The fundamental difference of this approach is that, instead of assigning hashes as dense points in a low dimensional space, hashes are assigned in a high dimensional space, which enhances their separability. We show theoretically and empirically that this new family of hash functions is locality-sensitive and preserves rank similarity for inputs in any $\ell_p$ space. We then analyze different variations on this strategy and show empirically that they outperform existing LSH methods for nearest-neighbors search on six benchmark datasets. Finally, we propose a multi-probe version of our algorithm that achieves higher performance for the same query time, or conversely, that maintains performance of prior approaches while taking significantly less indexing time and memory. Overall, our approach leverages the advantages of separability provided by high-dimensional spaces, while still remaining computationally efficient.
Tasks
Published 2018-12-05
URL http://arxiv.org/abs/1812.01844v1
PDF http://arxiv.org/pdf/1812.01844v1.pdf
PWC https://paperswithcode.com/paper/improving-similarity-search-with-high
Repo https://github.com/dataplayer12/Fly-LSH
Framework tf
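
The fly-inspired hash inverts conventional LSH: a sparse binary projection expands the input to a much higher dimension, then a winner-take-all step keeps only the top-k indices as the sparse hash. A sketch with illustrative parameter values, not the paper's:

```python
import numpy as np

def fly_hash(X, dim_out=2000, nnz=6, k=64, seed=0):
    """Hash inputs X (n, d) by sparse random expansion plus winner-take-all;
    returns the top-k active indices per input as the hash."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    M = np.zeros((d, dim_out))
    for j in range(dim_out):                 # each output unit samples a few inputs
        M[rng.choice(d, size=nnz, replace=False), j] = 1.0
    Xc = X - X.mean(axis=1, keepdims=True)   # center each input
    act = Xc @ M                             # high-dimensional activations
    return np.argsort(act, axis=1)[:, -k:]   # winner-take-all: top-k indices
```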