Paper Group AWR 352
Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
Title | Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation |
Authors | Jian Zhang, Avner May, Tri Dao, Christopher Ré |
Abstract | We investigate how to train kernel approximation methods that generalize well under a memory budget. Building on recent theoretical work, we define a measure of kernel approximation error which we find to be more predictive of the empirical generalization performance of kernel approximation methods than conventional metrics. An important consequence of this definition is that a kernel approximation matrix must be high rank to attain close approximation. Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget. Theoretically, we show quantization has a negligible effect on generalization performance in important settings. Empirically, we demonstrate across four benchmark datasets that LP-RFFs can match the performance of full-precision RFFs and the Nyström method, with 3x-10x and 50x-460x less memory, respectively. |
Tasks | Quantization |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00155v2 |
http://arxiv.org/pdf/1811.00155v2.pdf | |
PWC | https://paperswithcode.com/paper/low-precision-random-fourier-features-for |
Repo | https://github.com/HazyResearch/lp_rffs |
Framework | pytorch |
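To make the LP-RFF idea concrete, here is a minimal NumPy sketch of random Fourier features with uniform low-bit quantization. It follows the general recipe from the abstract, not the authors' exact implementation; the bit width, kernel bandwidth, and feature count below are illustrative assumptions.

```python
import numpy as np

def rff_features(X, n_feats, gamma, rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_feats))
    b = rng.uniform(0, 2 * np.pi, size=n_feats)
    return np.sqrt(2.0 / n_feats) * np.cos(X @ W + b)

def quantize(Z, n_bits):
    """Uniform quantization to 2**n_bits levels; RFF values are bounded
    by c = sqrt(2 / n_feats), so a fixed grid on [-c, c] loses little."""
    c = np.sqrt(2.0 / Z.shape[1])
    levels = 2 ** n_bits - 1
    step = 2 * c / levels
    return np.round((Z + c) / step) * step - c

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, n_feats=2048, gamma=0.5, rng=rng)
Zq = quantize(Z, n_bits=4)  # 8x less memory than float32 per feature
K_true = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Zq @ Zq.T - K_true).max())  # approximation error stays small
```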
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Title | HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments |
Authors | Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu |
Abstract | The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. Recently, there has been interest in developing technologies that help incorporate the findings of the science of happiness into users’ daily lives by steering them towards behaviors that increase happiness. With the goal of building technology that can understand how people express their happy moments in text, we crowdsourced HappyDB, a corpus of 100,000 happy moments that we make publicly available. This paper describes HappyDB and its properties, and outlines several important NLP problems that can be studied with the help of the corpus. We also apply several state-of-the-art analysis techniques to analyze HappyDB. Our results demonstrate the need for deeper NLP techniques to be developed, which makes HappyDB an exciting resource for follow-on research. |
Tasks | Art Analysis |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07746v2 |
http://arxiv.org/pdf/1801.07746v2.pdf | |
PWC | https://paperswithcode.com/paper/happydb-a-corpus-of-100000-crowdsourced-happy |
Repo | https://github.com/Nathanlang14/MSCS-5931-HappyDB |
Framework | none |
Computational Optimal Transport
Title | Computational Optimal Transport |
Authors | Gabriel Peyré, Marco Cuturi |
Abstract | Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a “global” cost to every such transport, using the “local” consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds light on the theoretical properties of OT that make it particularly useful for some of these applications. |
Tasks | |
Published | 2018-03-01 |
URL | https://arxiv.org/abs/1803.00567v4 |
https://arxiv.org/pdf/1803.00567v4.pdf | |
PWC | https://paperswithcode.com/paper/computational-optimal-transport |
Repo | https://github.com/currymj/SinkhornDistance.jl |
Framework | none |
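The book devotes considerable space to entropic regularization, whose workhorse is Sinkhorn's algorithm. Below is a minimal NumPy sketch of the Sinkhorn iterations for discrete OT; the regularization strength and iteration count are illustrative choices, and production solvers add log-domain stabilization.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """Entropy-regularized OT between histograms a and b with cost C.
    Returns the transport plan P minimizing <P, C> - eps * H(P)."""
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)               # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # P = diag(u) K diag(v)

# Move mass between two 1-D histograms on a grid.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2      # squared-distance ground cost
P = sinkhorn(a, b, C)
print("transport cost:", (P * C).sum())
```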
Learning to See in the Dark
Title | Learning to See in the Dark |
Authors | Chen Chen, Qifeng Chen, Jia Xu, Vladlen Koltun |
Abstract | Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at https://youtu.be/qWKUFK7MWvg |
Tasks | Deblurring, Denoising |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.01934v1 |
http://arxiv.org/pdf/1805.01934v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-see-in-the-dark |
Repo | https://github.com/cydonia999/Learning_to_See_in_the_Dark_PyTorch |
Framework | pytorch |
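A hedged sketch of the raw-data preprocessing the paper describes: the Bayer mosaic is packed into four half-resolution channels, the black level is subtracted, and the short exposure is amplified by the desired ratio before entering the network. The specific black/white levels and RGGB layout below are sensor-dependent assumptions.

```python
import numpy as np

def pack_bayer(raw, black_level=512, white_level=16383, ratio=100):
    """Pack an H x W Bayer mosaic into an H/2 x W/2 x 4 tensor and
    scale by the exposure ratio, mirroring the paper's preprocessing."""
    raw = (raw.astype(np.float32) - black_level) / (white_level - black_level)
    raw = np.clip(raw, 0, 1)
    packed = np.stack([raw[0::2, 0::2],   # R
                       raw[0::2, 1::2],   # G
                       raw[1::2, 1::2],   # B
                       raw[1::2, 0::2]],  # G
                      axis=-1)
    return np.clip(packed * ratio, 0, 1)  # amplify the short exposure

dark = np.random.randint(480, 700, size=(8, 8), dtype=np.uint16)
print(pack_bayer(dark).shape)  # (4, 4, 4) -> input to the network
```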
On Adversarial Examples for Character-Level Neural Machine Translation
Title | On Adversarial Examples for Character-Level Neural Machine Translation |
Authors | Javid Ebrahimi, Daniel Lowd, Dejing Dou |
Abstract | Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations to rank adversarial changes. We propose two novel types of attacks which aim to remove or change a word in a translation, rather than simply breaking the NMT model. We demonstrate that white-box adversarial examples are significantly stronger than their black-box counterparts in different attack scenarios, revealing more serious vulnerabilities than previously known. In addition, after performing adversarial training, which takes only 3 times longer than regular training, we can improve the model’s robustness significantly. |
Tasks | Machine Translation |
Published | 2018-06-23 |
URL | http://arxiv.org/abs/1806.09030v1 |
http://arxiv.org/pdf/1806.09030v1.pdf | |
PWC | https://paperswithcode.com/paper/on-adversarial-examples-for-character-level |
Repo | https://github.com/alankarj/robust_nlp |
Framework | none |
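The white-box adversary ranks candidate character edits by a first-order estimate of the loss change, in the spirit of differentiable string-edit operations (à la HotFlip). A minimal sketch of that ranking step follows; the gradient here is a random stand-in for what the NMT model would supply.

```python
import numpy as np

def rank_char_flips(onehot, grad):
    """First-order ranking of character substitutions.
    onehot: (seq_len, vocab) one-hot input; grad: dLoss/d(onehot).
    The estimated loss increase of flipping position i from char a
    to char b is grad[i, b] - grad[i, a]."""
    current = onehot.argmax(axis=1)
    gain = grad - grad[np.arange(len(current)), current][:, None]
    i, b = np.unravel_index(gain.argmax(), gain.shape)
    return i, b, gain[i, b]

rng = np.random.default_rng(1)
seq_len, vocab = 10, 30
x = np.eye(vocab)[rng.integers(0, vocab, seq_len)]   # toy sentence
g = rng.normal(size=(seq_len, vocab))                # stand-in gradient
pos, new_char, est = rank_char_flips(x, g)
print(f"flip position {pos} to char {new_char}, estimated gain {est:.3f}")
```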
A Simple Baseline Algorithm for Graph Classification
Title | A Simple Baseline Algorithm for Graph Classification |
Authors | Nathan de Lara, Edouard Pineau |
Abstract | Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling, and graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm based on the spectral decomposition of the graph Laplacian to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms. |
Tasks | Graph Classification, Graph Embedding |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09155v2 |
http://arxiv.org/pdf/1810.09155v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-baseline-algorithm-for-graph |
Repo | https://github.com/benedekrozemberczki/karateclub |
Framework | none |
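A minimal sketch of the baseline's core idea: embed each graph by the smallest eigenvalues of its normalized Laplacian, padding so all graphs map to vectors of equal length. The choice k=5, the zero padding, and the toy graphs are illustrative assumptions.

```python
import numpy as np

def spectral_embedding(A, k=5):
    """Embed a graph by the k smallest eigenvalues of its normalized
    Laplacian; zero-pad graphs with fewer than k nodes."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eig = np.sort(np.linalg.eigvalsh(L))[:k]
    return np.pad(eig, (0, max(0, k - len(eig))))

# Two toy graphs: a triangle and a 4-node path.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
path = np.diag(np.ones(3), 1); path += path.T
print(spectral_embedding(tri), spectral_embedding(path))
# These fixed-length vectors can feed any off-the-shelf classifier.
```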
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Title | Audio-Visual Scene Analysis with Self-Supervised Multisensory Features |
Authors | Andrew Owens, Alexei A. Efros |
Abstract | The thud of a bouncing ball, the onset of speech as lips open – when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals. In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation. We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned. We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio source separation, e.g. removing the off-screen translator’s voice from a foreign official’s speech. Code, models, and video results are available on our webpage: http://andrewowens.com/multisensory |
Tasks | Temporal Action Localization |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03641v2 |
http://arxiv.org/pdf/1804.03641v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-visual-scene-analysis-with-self |
Repo | https://github.com/andrewowens/multisensory |
Framework | tf |
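The self-supervised pretext task is simple to state: predict whether the audio and video streams are temporally aligned. Below is a sketch of how such training pairs could be constructed; the shift-based misalignment is a stand-in for the paper's sampling scheme, and the features are random placeholders.

```python
import numpy as np

def make_alignment_pairs(video_feats, audio_feats, shift):
    """Build training pairs for the alignment task: label 1 for
    temporally aligned audio/video, 0 when the audio track is
    circularly shifted by `shift` time steps."""
    pos = (video_feats, audio_feats, 1)
    neg = (video_feats, np.roll(audio_feats, shift, axis=0), 0)
    return [pos, neg]

rng = np.random.default_rng(0)
v = rng.normal(size=(100, 64))   # per-frame visual features
a = rng.normal(size=(100, 64))   # per-frame audio features
for _, _, label in make_alignment_pairs(v, a, shift=40):
    print("aligned" if label else "misaligned")
# A fused multisensory network is then trained to predict these labels.
```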
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Title | Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition |
Authors | George Sterpu, Christian Saam, Naomi Harte |
Abstract | Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to enhanced representations which increase the recognition accuracy in both clean and noisy conditions. We test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large vocabulary continuous speech recognition, applying three types of noise at different power ratios. We also exploit state-of-the-art Sequence-to-Sequence architectures, showing that our method can be easily integrated. Results show relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality alone, depending on the acoustic noise level. We anticipate that the fusion strategy can easily generalise to many other multimodal tasks which involve correlated modalities. Code available online on GitHub: https://github.com/georgesterpu/Sigmedia-AVSR |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01728v3 |
http://arxiv.org/pdf/1809.01728v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-audio-visual-fusion-for |
Repo | https://github.com/georgesterpu/Sigmedia-AVSR |
Framework | tf |
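A minimal sketch of attention-based fusion: each acoustic frame attends over the (lower-rate) visual frames and the attended context is concatenated to the audio features, illustrating the "beyond simple feature concatenation" idea. The actual model uses learned Sequence-to-Sequence encoders; dimensions below are illustrative.

```python
import numpy as np

def cross_modal_attention(audio, video):
    """Dot-product attention: each audio frame queries the video
    frames; the attended visual context is concatenated to the
    acoustic features."""
    scores = audio @ video.T / np.sqrt(audio.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over video
    context = weights @ video                       # aligned visual context
    return np.concatenate([audio, context], axis=1)

rng = np.random.default_rng(0)
audio = rng.normal(size=(120, 64))   # 120 acoustic frames
video = rng.normal(size=(30, 64))    # 30 video frames (lower frame rate)
print(cross_modal_attention(audio, video).shape)  # (120, 128)
```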
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Title | SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task |
Authors | Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, Dragomir Radev |
Abstract | Most existing studies in text-to-SQL tasks do not require generating complex SQL queries with multiple clauses or sub-queries, and generalizing to new, unseen databases. In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task. SyntaxSQLNet employs a SQL specific syntax tree-based decoder with SQL generation path history and table-aware column attention encoders. We evaluate SyntaxSQLNet on the Spider text-to-SQL task, which contains databases with multiple tables and complex SQL queries with multiple SQL clauses and nested queries. We use a database split setting where databases in the test set are unseen during training. Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 7.3% in exact matching accuracy. We also show that SyntaxSQLNet can further improve the performance by an additional 7.5% using a cross-domain augmentation method, resulting in a 14.8% improvement in total. To our knowledge, we are the first to study this complex and cross-domain text-to-SQL task. |
Tasks | Semantic Parsing, Text-To-Sql |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05237v2 |
http://arxiv.org/pdf/1810.05237v2.pdf | |
PWC | https://paperswithcode.com/paper/syntaxsqlnet-syntax-tree-networks-for-complex |
Repo | https://github.com/heyanger/sqltools |
Framework | none |
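A toy illustration of syntax-guided decoding: SQL is generated by recursively expanding grammar modules, with each choice conditioned on the generation-path history. The tiny grammar and random scoring function below are stand-ins for the paper's learned, schema-aware modules.

```python
import numpy as np

# A toy SQL grammar; nonterminals expand into space-separated symbols.
GRAMMAR = {
    "ROOT":   ["SELECT"],
    "SELECT": ["COL", "COL WHERE"],
    "WHERE":  ["COL OP VAL"],
}

def decode(symbol, history, score_fn):
    """Expand `symbol` by picking a grammar rule; the path history is
    recorded so later decisions can condition on it."""
    if symbol not in GRAMMAR:            # terminal token
        return [symbol]
    choices = GRAMMAR[symbol]
    pick = choices[score_fn(symbol, history, choices)]
    history.append((symbol, pick))       # generation-path history
    out = []
    for child in pick.split():
        out += decode(child, history, score_fn)
    return out

rng = np.random.default_rng(3)
score_fn = lambda s, h, c: rng.integers(len(c))  # stand-in module scores
print(" ".join(decode("ROOT", [], score_fn)))    # e.g. "COL COL OP VAL"
```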
AutoLoc: Weakly-supervised Temporal Action Localization
Title | AutoLoc: Weakly-supervised Temporal Action Localization |
Authors | Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang |
Abstract | Temporal Action Localization (TAL) in untrimmed video is important for many applications. But it is very expensive to annotate the segment-level ground truth (action class and temporal boundary). This raises interest in addressing TAL with weak supervision, where only video-level annotations are available during training. However, the state-of-the-art weakly-supervised TAL methods only focus on generating good Class Activation Sequence (CAS) over time but conduct simple thresholding on CAS to localize actions. In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance. We propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the needed segment-level supervision for training such a boundary predictor. Our method achieves dramatically improved performance: under the IoU threshold 0.5, our method improves mAP on THUMOS’14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. It is also very encouraging to see that our weakly-supervised method achieves comparable results with some fully-supervised methods. |
Tasks | Action Localization, Temporal Action Localization, Weakly-supervised Temporal Action Localization |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08333v2 |
http://arxiv.org/pdf/1807.08333v2.pdf | |
PWC | https://paperswithcode.com/paper/autoloc-weakly-supervised-temporal-action |
Repo | https://github.com/zhengshou/AutoLoc |
Framework | none |
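The Outer-Inner-Contrastive loss is easy to state on a 1-D Class Activation Sequence: a good boundary has high activation inside the predicted segment and low activation in an inflated ring just outside it. A minimal sketch with an illustrative inflation ratio:

```python
import numpy as np

def oic_loss(cas, start, end, inflation=0.25):
    """Outer-Inner-Contrastive loss on a 1-D Class Activation Sequence:
    mean activation in the inflated outer ring minus mean activation
    inside [start, end). Lower is better: strong inside, weak outside."""
    length = end - start
    pad = max(1, int(inflation * length))
    lo, hi = max(0, start - pad), min(len(cas), end + pad)
    inner = cas[start:end].mean()
    outer = np.concatenate([cas[lo:start], cas[end:hi]]).mean()
    return outer - inner

cas = np.zeros(100); cas[40:60] = 1.0   # toy activation sequence
print(oic_loss(cas, 40, 60))            # tight boundary -> loss -1.0
print(oic_loss(cas, 30, 70))            # loose boundary -> loss -0.5
```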
MeshCNN: A Network with an Edge
Title | MeshCNN: A Network with an Edge |
Authors | Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, Daniel Cohen-Or |
Abstract | Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of our task-driven pooling on various learning tasks applied to 3D meshes. |
Tasks | 3D Object Classification, 3D Part Segmentation, 3D Shape Analysis, Cube Engraving Classification |
Published | 2018-09-16 |
URL | http://arxiv.org/abs/1809.05910v2 |
http://arxiv.org/pdf/1809.05910v2.pdf | |
PWC | https://paperswithcode.com/paper/meshcnn-a-network-with-an-edge |
Repo | https://github.com/ranahanocka/MeshCNN |
Framework | pytorch |
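A sketch of the edge convolution described above: each edge aggregates its own feature with those of the four edges of its two incident triangles, combined through order-invariant symmetric functions (|a-c|, a+c, |b-d|, b+d) so the result does not depend on neighbor ordering. The toy connectivity below stands in for real mesh adjacency.

```python
import numpy as np

def edge_conv(edge_feats, neighbors, kernel):
    """MeshCNN-style convolution: each edge is convolved with its four
    neighbor edges, using symmetric functions for order invariance."""
    a, b, c, d = (edge_feats[neighbors[:, i]] for i in range(4))
    e = edge_feats
    stacked = np.stack([e, np.abs(a - c), a + c,
                        np.abs(b - d), b + d], axis=1)  # (E, 5, F_in)
    return np.tensordot(stacked, kernel, axes=([1, 2], [0, 1]))

rng = np.random.default_rng(0)
E, F_in, F_out = 6, 8, 16
feats = rng.normal(size=(E, F_in))
nbrs = rng.integers(0, E, size=(E, 4))       # toy 4-neighborhood indices
kernel = rng.normal(size=(5, F_in, F_out))
print(edge_conv(feats, nbrs, kernel).shape)  # (6, 16)
```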
Automated Strabismus Detection for Telemedicine Applications
Title | Automated Strabismus Detection for Telemedicine Applications |
Authors | Jiewei Lu, Zhun Fan, Ce Zheng, Jingan Feng, Longtao Huang, Wenji Li, Erik D. Goodman |
Abstract | Strabismus is one of the most influential ophthalmologic diseases in human life. Timely detection of strabismus contributes to its prognosis and treatment. Telemedicine, which has great potential to alleviate the growing demand for the diagnosis of ophthalmologic diseases, is an effective method to achieve timely strabismus detection. In this paper, a tele-strabismus dataset is established by ophthalmologists. Then an end-to-end framework named RF-CNN is proposed to achieve automated strabismus detection on the established tele-strabismus dataset. RF-CNN first performs eye region segmentation on each individual image, and further classifies the segmented eye regions with deep neural networks. The experimental results on the established tele-strabismus dataset demonstrate that the proposed RF-CNN performs well on automated strabismus detection for telemedicine applications. Code is made publicly available at: https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application. |
Tasks | |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.02940v3 |
http://arxiv.org/pdf/1809.02940v3.pdf | |
PWC | https://paperswithcode.com/paper/automated-strabismus-detection-for |
Repo | https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application |
Framework | tf |
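A schematic of the two-stage RF-CNN pipeline described above: segment the eye region first, then classify the crop. Both stages are mocked out below as stand-ins for the paper's learned segmentation and classification networks.

```python
import numpy as np

def rf_cnn_pipeline(image, segment_eyes, classify):
    """Two-stage sketch: locate the eye region, crop it, classify it.
    segment_eyes and classify are placeholders for learned models."""
    top, bottom, left, right = segment_eyes(image)
    crop = image[top:bottom, left:right]
    return classify(crop)                  # "strabismus" / "normal"

img = np.zeros((64, 128))
segment = lambda im: (20, 44, 10, 118)     # mock eye-region detector
classify = lambda crop: "normal" if crop.mean() < 0.5 else "strabismus"
print(rf_cnn_pipeline(img, segment, classify))
```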
Perturbation Robust Representations of Topological Persistence Diagrams
Title | Perturbation Robust Representations of Topological Persistence Diagrams |
Authors | Anirudh Som, Kowshik Thopalli, Karthikeyan Natesan Ramamurthy, Vinay Venkataraman, Ankita Shukla, Pavan Turaga |
Abstract | Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points and hence it is not straightforward to fuse them with features used for contemporary machine learning tools like deep-nets. In this paper we present theoretically well-grounded approaches to develop novel perturbation robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. We term the proposed representation Perturbed Topological Signatures, which live on a Grassmann manifold and hence can be efficiently used in machine learning pipelines. We explore the use of the proposed descriptor on three applications: 3D shape analysis, view-invariant activity analysis, and non-linear dynamical modeling. We show favorable results in both high-level recognition performance and time complexity when compared to other baseline methods. |
Tasks | 3D Shape Analysis |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10400v1 |
http://arxiv.org/pdf/1807.10400v1.pdf | |
PWC | https://paperswithcode.com/paper/perturbation-robust-representations-of |
Repo | https://github.com/anirudhsom/Perturbed-Topological-Signature |
Framework | none |
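A hedged sketch of the construction: rasterize randomly perturbed copies of a persistence diagram into smoothed 2-D grids, stack them, and keep the top singular vectors. The resulting orthonormal basis spans a subspace, i.e. a point on a Grassmann manifold. Grid resolution, noise level, and subspace dimension below are illustrative, not the authors' exact scheme.

```python
import numpy as np

def diagram_to_grid(points, res=16, sigma=0.05):
    """Rasterize a persistence diagram of (birth, death) points into a
    Gaussian-smoothed 2-D grid over [0, 1]^2."""
    xs = np.linspace(0, 1, res)
    gx, gy = np.meshgrid(xs, xs)
    grid = np.zeros((res, res))
    for bx, dy in points:
        grid += np.exp(-((gx - bx) ** 2 + (gy - dy) ** 2) / (2 * sigma ** 2))
    return grid.ravel()

def perturbed_signature(points, n_perturb=20, noise=0.02, k=4, rng=None):
    """Stack grids from perturbed copies of the diagram and keep the
    top-k left singular vectors: an orthonormal basis of a subspace,
    i.e. a point on a Grassmann manifold."""
    rng = rng or np.random.default_rng(0)
    mats = [diagram_to_grid(points + rng.normal(0, noise, points.shape))
            for _ in range(n_perturb)]
    U, _, _ = np.linalg.svd(np.stack(mats, axis=1), full_matrices=False)
    return U[:, :k]

pd = np.array([[0.1, 0.4], [0.2, 0.9], [0.5, 0.7]])  # toy diagram
print(perturbed_signature(pd).shape)  # (256, 4)
```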
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Title | NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications |
Authors | Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam |
Abstract | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experimental results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2). |
Tasks | Image Classification |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03230v2 |
http://arxiv.org/pdf/1804.03230v2.pdf | |
PWC | https://paperswithcode.com/paper/netadapt-platform-aware-neural-network |
Repo | https://github.com/wchliao/multi-task-machine-learning-papers |
Framework | none |
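A skeleton of NetAdapt's outer loop under stated assumptions: each iteration tightens an intermediate latency target, proposes shrinking each layer in turn until the proposal meets the target, evaluates each proposal, and keeps the most accurate one. The latency and accuracy functions below are toy stand-ins for empirical on-device measurements and short-term fine-tuning.

```python
import numpy as np

def netadapt(layers, measure_latency, finetune_and_eval, budget, step=0.05):
    """Progressively simplify `layers` (filter counts per layer) until
    the latency budget is met, greedily keeping the proposal with the
    highest accuracy at each iteration."""
    while measure_latency(layers) > budget:
        target = measure_latency(layers) * (1 - step)   # tighten target
        best = None
        for i in range(len(layers)):
            trial = list(layers)
            while measure_latency(trial) > target and trial[i] > 1:
                trial[i] -= 1                           # prune filters
            acc = finetune_and_eval(trial)
            if best is None or acc > best[0]:
                best = (acc, trial)
        layers = best[1]
    return layers

# Toy stand-ins: latency ~ total filters, accuracy ~ log capacity.
latency = lambda ls: 0.1 * sum(ls)
accuracy = lambda ls: sum(np.log1p(ls))
print(netadapt([64, 128, 256], latency, accuracy, budget=30.0))
```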
Improving Similarity Search with High-dimensional Locality-sensitive Hashing
Title | Improving Similarity Search with High-dimensional Locality-sensitive Hashing |
Authors | Jaiyam Sharma, Saket Navlakha |
Abstract | We propose a new class of data-independent locality-sensitive hashing (LSH) algorithms based on the fruit fly olfactory circuit. The fundamental difference of this approach is that, instead of assigning hashes as dense points in a low dimensional space, hashes are assigned in a high dimensional space, which enhances their separability. We show theoretically and empirically that this new family of hash functions is locality-sensitive and preserves rank similarity for inputs in any ℓp space. We then analyze different variations on this strategy and show empirically that they outperform existing LSH methods for nearest-neighbors search on six benchmark datasets. Finally, we propose a multi-probe version of our algorithm that achieves higher performance for the same query time, or conversely, that maintains performance of prior approaches while taking significantly less indexing time and memory. Overall, our approach leverages the advantages of separability provided by high-dimensional spaces, while still remaining computationally efficient. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01844v1 |
http://arxiv.org/pdf/1812.01844v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-similarity-search-with-high |
Repo | https://github.com/dataplayer12/Fly-LSH |
Framework | tf |
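A sketch of the fly-inspired hashing the paper builds on: project centered inputs through a sparse binary matrix into a much higher-dimensional space, then keep only the top-k most active dimensions (winner-take-all) as a sparse binary hash. The expansion factor, sampling rate, and k below are illustrative assumptions.

```python
import numpy as np

def fly_lsh(X, expand=20, sample=0.1, top_k=32, rng=None):
    """Fly-inspired LSH: sparse 0/1 projection into a space `expand`x
    larger than the input, followed by winner-take-all binarization."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    P = rng.random((d, expand * d)) < sample       # sparse binary matrix
    act = (X - X.mean(axis=1, keepdims=True)) @ P  # center, then project
    hashes = np.zeros_like(act, dtype=bool)
    idx = np.argsort(act, axis=1)[:, -top_k:]      # winner-take-all
    np.put_along_axis(hashes, idx, True, axis=1)
    return hashes

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
H = fly_lsh(X, rng=rng)
print(H.shape, H.sum(axis=1))   # (4, 1000) with exactly 32 bits set
```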