Paper Group AWR 352
Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation
Title | Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation |
Authors | Jian Zhang, Avner May, Tri Dao, Christopher Ré |
Abstract | We investigate how to train kernel approximation methods that generalize well under a memory budget. Building on recent theoretical work, we define a measure of kernel approximation error which we find to be more predictive of the empirical generalization performance of kernel approximation methods than conventional metrics. An important consequence of this definition is that a kernel approximation matrix must be high rank to attain close approximation. Because storing a high-rank approximation is memory intensive, we propose using a low-precision quantization of random Fourier features (LP-RFFs) to build a high-rank approximation under a memory budget. Theoretically, we show quantization has a negligible effect on generalization performance in important settings. Empirically, we demonstrate across four benchmark datasets that LP-RFFs can match the performance of full-precision RFFs and the Nyström method, with 3x-10x and 50x-460x less memory, respectively. |
Tasks | Quantization |
Published | 2018-10-31 |
URL | http://arxiv.org/abs/1811.00155v2 |
http://arxiv.org/pdf/1811.00155v2.pdf | |
PWC | https://paperswithcode.com/paper/low-precision-random-fourier-features-for |
Repo | https://github.com/HazyResearch/lp_rffs |
Framework | pytorch |
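To make the LP-RFF idea concrete, here is a minimal NumPy sketch of random Fourier features with uniform low-bit quantization. It follows the general recipe from the abstract, not the authors' exact implementation; the bit width, kernel bandwidth, and feature count below are illustrative assumptions.

```python
import numpy as np

def rff_features(X, n_feats, gamma, rng):
    """Random Fourier features approximating the RBF kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, n_feats))
    b = rng.uniform(0, 2 * np.pi, size=n_feats)
    return np.sqrt(2.0 / n_feats) * np.cos(X @ W + b)

def quantize(Z, n_bits):
    """Uniform quantization to 2**n_bits levels; RFF values are bounded
    by c = sqrt(2 / n_feats), so a fixed grid on [-c, c] loses little."""
    c = np.sqrt(2.0 / Z.shape[1])
    levels = 2 ** n_bits - 1
    step = 2 * c / levels
    return np.round((Z + c) / step) * step - c

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = rff_features(X, n_feats=2048, gamma=0.5, rng=rng)
Zq = quantize(Z, n_bits=4)  # 8x less memory than float32 per feature
K_true = np.exp(-0.5 * ((X[:, None] - X[None]) ** 2).sum(-1))
print(np.abs(Zq @ Zq.T - K_true).max())  # approximation error stays small
```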
HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments
Title | HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments |
Authors | Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu |
Abstract | The science of happiness is an area of positive psychology concerned with understanding what behaviors make people happy in a sustainable fashion. Recently, there has been interest in developing technologies that help incorporate the findings of the science of happiness into users’ daily lives by steering them towards behaviors that increase happiness. With the goal of building technology that can understand how people express their happy moments in text, we crowdsourced HappyDB, a corpus of 100,000 happy moments that we make publicly available. This paper describes HappyDB and its properties, and outlines several important NLP problems that can be studied with the help of the corpus. We also apply several state-of-the-art analysis techniques to analyze HappyDB. Our results demonstrate the need for deeper NLP techniques to be developed, which makes HappyDB an exciting resource for follow-on research. |
Tasks | Art Analysis |
Published | 2018-01-23 |
URL | http://arxiv.org/abs/1801.07746v2 |
http://arxiv.org/pdf/1801.07746v2.pdf | |
PWC | https://paperswithcode.com/paper/happydb-a-corpus-of-100000-crowdsourced-happy |
Repo | https://github.com/Nathanlang14/MSCS-5931-HappyDB |
Framework | none |
Computational Optimal Transport
Title | Computational Optimal Transport |
Authors | Gabriel Peyré, Marco Cuturi |
Abstract | Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a “global” cost to every such transport, using the “local” consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various problems in imaging sciences (such as color or texture processing), computer vision and graphics (for shape manipulation) or machine learning (for regression, classification and density fitting). This short book reviews OT with a bias toward numerical methods and their applications in data sciences, and sheds light on the theoretical properties of OT that make it particularly useful for some of these applications. |
Tasks | |
Published | 2018-03-01 |
URL | https://arxiv.org/abs/1803.00567v4 |
https://arxiv.org/pdf/1803.00567v4.pdf | |
PWC | https://paperswithcode.com/paper/computational-optimal-transport |
Repo | https://github.com/currymj/SinkhornDistance.jl |
Framework | none |
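The book devotes considerable space to entropic regularization, whose workhorse is Sinkhorn's algorithm. Below is a minimal NumPy sketch of the Sinkhorn iterations for discrete OT; the regularization strength and iteration count are illustrative choices, and production solvers add log-domain stabilization.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iters=500):
    """Entropy-regularized OT between histograms a and b with cost C.
    Returns the transport plan P minimizing <P, C> - eps * H(P)."""
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)               # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # P = diag(u) K diag(v)

# Move mass between two 1-D histograms on a grid.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2      # squared-distance ground cost
P = sinkhorn(a, b, C)
print("transport cost:", (P * C).sum())
```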
Learning to See in the Dark
Title | Learning to See in the Dark |
Authors | Chen Chen, Qifeng Chen, Jia Xu, Vladlen Koltun |
Abstract | Imaging in low light is challenging due to low photon count and low SNR. Short-exposure images suffer from noise, while long exposure can induce blur and is often impractical. A variety of denoising, deblurring, and enhancement techniques have been proposed, but their effectiveness is limited in extreme conditions, such as video-rate imaging at night. To support the development of learning-based pipelines for low-light image processing, we introduce a dataset of raw short-exposure low-light images, with corresponding long-exposure reference images. Using the presented dataset, we develop a pipeline for processing low-light images, based on end-to-end training of a fully-convolutional network. The network operates directly on raw sensor data and replaces much of the traditional image processing pipeline, which tends to perform poorly on such data. We report promising results on the new dataset, analyze factors that affect performance, and highlight opportunities for future work. The results are shown in the supplementary video at https://youtu.be/qWKUFK7MWvg |
Tasks | Deblurring, Denoising |
Published | 2018-05-04 |
URL | http://arxiv.org/abs/1805.01934v1 |
http://arxiv.org/pdf/1805.01934v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-to-see-in-the-dark |
Repo | https://github.com/cydonia999/Learning_to_See_in_the_Dark_PyTorch |
Framework | pytorch |
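A hedged sketch of the raw-data preprocessing the paper describes: the Bayer mosaic is packed into four half-resolution channels, the black level is subtracted, and the short exposure is amplified by the desired ratio before entering the network. The specific black/white levels and RGGB layout below are sensor-dependent assumptions.

```python
import numpy as np

def pack_bayer(raw, black_level=512, white_level=16383, ratio=100):
    """Pack an H x W Bayer mosaic into an H/2 x W/2 x 4 tensor and
    scale by the exposure ratio, mirroring the paper's preprocessing."""
    raw = (raw.astype(np.float32) - black_level) / (white_level - black_level)
    raw = np.clip(raw, 0, 1)
    packed = np.stack([raw[0::2, 0::2],   # R
                       raw[0::2, 1::2],   # G
                       raw[1::2, 1::2],   # B
                       raw[1::2, 0::2]],  # G
                      axis=-1)
    return np.clip(packed * ratio, 0, 1)  # amplify the short exposure

dark = np.random.randint(480, 700, size=(8, 8), dtype=np.uint16)
print(pack_bayer(dark).shape)  # (4, 4, 4) -> input to the network
```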
On Adversarial Examples for Character-Level Neural Machine Translation
Title | On Adversarial Examples for Character-Level Neural Machine Translation |
Authors | Javid Ebrahimi, Daniel Lowd, Dejing Dou |
Abstract | Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations to rank adversarial changes. We propose two novel types of attacks which aim to remove or change a word in a translation, rather than simply breaking the NMT model. We demonstrate that white-box adversarial examples are significantly stronger than their black-box counterparts in different attack scenarios, revealing more serious vulnerabilities than previously known. In addition, after performing adversarial training, which takes only 3 times longer than regular training, we can improve the model’s robustness significantly. |
Tasks | Machine Translation |
Published | 2018-06-23 |
URL | http://arxiv.org/abs/1806.09030v1 |
http://arxiv.org/pdf/1806.09030v1.pdf | |
PWC | https://paperswithcode.com/paper/on-adversarial-examples-for-character-level |
Repo | https://github.com/alankarj/robust_nlp |
Framework | none |
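The white-box adversary ranks candidate character edits by a first-order estimate of the loss change, in the spirit of differentiable string-edit operations (à la HotFlip). A minimal sketch of that ranking step follows; the gradient here is a random stand-in for what the NMT model would supply.

```python
import numpy as np

def rank_char_flips(onehot, grad):
    """First-order ranking of character substitutions.
    onehot: (seq_len, vocab) one-hot input; grad: dLoss/d(onehot).
    The estimated loss increase of flipping position i from char a
    to char b is grad[i, b] - grad[i, a]."""
    current = onehot.argmax(axis=1)
    gain = grad - grad[np.arange(len(current)), current][:, None]
    i, b = np.unravel_index(gain.argmax(), gain.shape)
    return i, b, gain[i, b]

rng = np.random.default_rng(1)
seq_len, vocab = 10, 30
x = np.eye(vocab)[rng.integers(0, vocab, seq_len)]   # toy sentence
g = rng.normal(size=(seq_len, vocab))                # stand-in gradient
pos, new_char, est = rank_char_flips(x, g)
print(f"flip position {pos} to char {new_char}, estimated gain {est:.3f}")
```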
A Simple Baseline Algorithm for Graph Classification
Title | A Simple Baseline Algorithm for Graph Classification |
Authors | Nathan de Lara, Edouard Pineau |
Abstract | Graph classification has recently received a lot of attention from various fields of machine learning, e.g., kernel methods, sequential modeling, and graph embedding. All these approaches offer promising results with different respective strengths and weaknesses. However, most of them rely on complex mathematics and require heavy computational power to achieve their best performance. We propose a simple and fast algorithm based on the spectral decomposition of the graph Laplacian to perform graph classification and get a first reference score for a dataset. We show that this method obtains competitive results compared to state-of-the-art algorithms. |
Tasks | Graph Classification, Graph Embedding |
Published | 2018-10-22 |
URL | http://arxiv.org/abs/1810.09155v2 |
http://arxiv.org/pdf/1810.09155v2.pdf | |
PWC | https://paperswithcode.com/paper/a-simple-baseline-algorithm-for-graph |
Repo | https://github.com/benedekrozemberczki/karateclub |
Framework | none |
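A minimal sketch of the baseline's core idea: embed each graph by the smallest eigenvalues of its normalized Laplacian, padding so all graphs map to vectors of equal length. The choice k=5, the zero padding, and the toy graphs are illustrative assumptions.

```python
import numpy as np

def spectral_embedding(A, k=5):
    """Embed a graph by the k smallest eigenvalues of its normalized
    Laplacian; zero-pad graphs with fewer than k nodes."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(deg), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eig = np.sort(np.linalg.eigvalsh(L))[:k]
    return np.pad(eig, (0, max(0, k - len(eig))))

# Two toy graphs: a triangle and a 4-node path.
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)
path = np.diag(np.ones(3), 1); path += path.T
print(spectral_embedding(tri), spectral_embedding(path))
# These fixed-length vectors can feed any off-the-shelf classifier.
```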
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Title | Audio-Visual Scene Analysis with Self-Supervised Multisensory Features |
Authors | Andrew Owens, Alexei A. Efros |
Abstract | The thud of a bouncing ball, the onset of speech as lips open – when visual and audio events occur together, it suggests that there might be a common, underlying event that produced both signals. In this paper, we argue that the visual and audio components of a video signal should be modeled jointly using a fused multisensory representation. We propose to learn such a representation in a self-supervised way, by training a neural network to predict whether video frames and audio are temporally aligned. We use this learned representation for three applications: (a) sound source localization, i.e. visualizing the source of sound in a video; (b) audio-visual action recognition; and (c) on/off-screen audio source separation, e.g. removing the off-screen translator’s voice from a foreign official’s speech. Code, models, and video results are available on our webpage: http://andrewowens.com/multisensory |
Tasks | Temporal Action Localization |
Published | 2018-04-10 |
URL | http://arxiv.org/abs/1804.03641v2 |
http://arxiv.org/pdf/1804.03641v2.pdf | |
PWC | https://paperswithcode.com/paper/audio-visual-scene-analysis-with-self |
Repo | https://github.com/andrewowens/multisensory |
Framework | tf |
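The self-supervised pretext task is simple to state: predict whether the audio and video streams are temporally aligned. Below is a sketch of how such training pairs could be constructed; the shift-based misalignment is a stand-in for the paper's sampling scheme, and the features are random placeholders.

```python
import numpy as np

def make_alignment_pairs(video_feats, audio_feats, shift):
    """Build training pairs for the alignment task: label 1 for
    temporally aligned audio/video, 0 when the audio track is
    circularly shifted by `shift` time steps."""
    pos = (video_feats, audio_feats, 1)
    neg = (video_feats, np.roll(audio_feats, shift, axis=0), 0)
    return [pos, neg]

rng = np.random.default_rng(0)
v = rng.normal(size=(100, 64))   # per-frame visual features
a = rng.normal(size=(100, 64))   # per-frame audio features
for _, _, label in make_alignment_pairs(v, a, shift=40):
    print("aligned" if label else "misaligned")
# A fused multisensory network is then trained to predict these labels.
```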
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition
Title | Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition |
Authors | George Sterpu, Christian Saam, Naomi Harte |
Abstract | Automatic speech recognition can potentially benefit from the lip motion patterns, complementing acoustic speech to improve the overall recognition performance, particularly in noise. In this paper we propose an audio-visual fusion strategy that goes beyond simple feature concatenation and learns to automatically align the two modalities, leading to enhanced representations which increase the recognition accuracy in both clean and noisy conditions. We test our strategy on the TCD-TIMIT and LRS2 datasets, designed for large vocabulary continuous speech recognition, applying three types of noise at different power ratios. We also exploit state-of-the-art Sequence-to-Sequence architectures, showing that our method can be easily integrated. Results show relative improvements from 7% up to 30% on TCD-TIMIT over the acoustic modality alone, depending on the acoustic noise level. We anticipate that the fusion strategy can easily generalise to many other multimodal tasks which involve correlated modalities. Code available online on GitHub: https://github.com/georgesterpu/Sigmedia-AVSR |
Tasks | Large Vocabulary Continuous Speech Recognition, Speech Recognition |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01728v3 |
http://arxiv.org/pdf/1809.01728v3.pdf | |
PWC | https://paperswithcode.com/paper/attention-based-audio-visual-fusion-for |
Repo | https://github.com/georgesterpu/Sigmedia-AVSR |
Framework | tf |
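A minimal sketch of attention-based fusion: each acoustic frame attends over the (lower-rate) visual frames and the attended context is concatenated to the audio features, illustrating the "beyond simple feature concatenation" idea. The actual model uses learned Sequence-to-Sequence encoders; dimensions below are illustrative.

```python
import numpy as np

def cross_modal_attention(audio, video):
    """Dot-product attention: each audio frame queries the video
    frames; the attended visual context is concatenated to the
    acoustic features."""
    scores = audio @ video.T / np.sqrt(audio.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over video
    context = weights @ video                       # aligned visual context
    return np.concatenate([audio, context], axis=1)

rng = np.random.default_rng(0)
audio = rng.normal(size=(120, 64))   # 120 acoustic frames
video = rng.normal(size=(30, 64))    # 30 video frames (lower frame rate)
print(cross_modal_attention(audio, video).shape)  # (120, 128)
```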
SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task
Title | SyntaxSQLNet: Syntax Tree Networks for Complex and Cross-Domain Text-to-SQL Task |
Authors | Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, Dragomir Radev |
Abstract | Most existing studies in text-to-SQL tasks do not require generating complex SQL queries with multiple clauses or sub-queries, and generalizing to new, unseen databases. In this paper we propose SyntaxSQLNet, a syntax tree network to address the complex and cross-domain text-to-SQL generation task. SyntaxSQLNet employs a SQL specific syntax tree-based decoder with SQL generation path history and table-aware column attention encoders. We evaluate SyntaxSQLNet on the Spider text-to-SQL task, which contains databases with multiple tables and complex SQL queries with multiple SQL clauses and nested queries. We use a database split setting where databases in the test set are unseen during training. Experimental results show that SyntaxSQLNet can handle a significantly greater number of complex SQL examples than prior work, outperforming the previous state-of-the-art model by 7.3% in exact matching accuracy. We also show that SyntaxSQLNet can further improve the performance by an additional 7.5% using a cross-domain augmentation method, resulting in a 14.8% improvement in total. To our knowledge, we are the first to study this complex and cross-domain text-to-SQL task. |
Tasks | Semantic Parsing, Text-To-Sql |
Published | 2018-10-11 |
URL | http://arxiv.org/abs/1810.05237v2 |
http://arxiv.org/pdf/1810.05237v2.pdf | |
PWC | https://paperswithcode.com/paper/syntaxsqlnet-syntax-tree-networks-for-complex |
Repo | https://github.com/heyanger/sqltools |
Framework | none |
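A toy illustration of syntax-guided decoding: SQL is generated by recursively expanding grammar modules, with each choice conditioned on the generation-path history. The tiny grammar and random scoring function below are stand-ins for the paper's learned, schema-aware modules.

```python
import numpy as np

# A toy SQL grammar; nonterminals expand into space-separated symbols.
GRAMMAR = {
    "ROOT":   ["SELECT"],
    "SELECT": ["COL", "COL WHERE"],
    "WHERE":  ["COL OP VAL"],
}

def decode(symbol, history, score_fn):
    """Expand `symbol` by picking a grammar rule; the path history is
    recorded so later decisions can condition on it."""
    if symbol not in GRAMMAR:            # terminal token
        return [symbol]
    choices = GRAMMAR[symbol]
    pick = choices[score_fn(symbol, history, choices)]
    history.append((symbol, pick))       # generation-path history
    out = []
    for child in pick.split():
        out += decode(child, history, score_fn)
    return out

rng = np.random.default_rng(3)
score_fn = lambda s, h, c: rng.integers(len(c))  # stand-in module scores
print(" ".join(decode("ROOT", [], score_fn)))    # e.g. "COL COL OP VAL"
```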
AutoLoc: Weakly-supervised Temporal Action Localization
Title | AutoLoc: Weakly-supervised Temporal Action Localization |
Authors | Zheng Shou, Hang Gao, Lei Zhang, Kazuyuki Miyazawa, Shih-Fu Chang |
Abstract | Temporal Action Localization (TAL) in untrimmed video is important for many applications. But it is very expensive to annotate the segment-level ground truth (action class and temporal boundary). This raises interest in addressing TAL with weak supervision, where only video-level annotations are available during training. However, the state-of-the-art weakly-supervised TAL methods only focus on generating good Class Activation Sequence (CAS) over time but conduct simple thresholding on CAS to localize actions. In this paper, we first develop a novel weakly-supervised TAL framework called AutoLoc to directly predict the temporal boundary of each action instance. We propose a novel Outer-Inner-Contrastive (OIC) loss to automatically discover the needed segment-level supervision for training such a boundary predictor. Our method achieves dramatically improved performance: under the IoU threshold 0.5, our method improves mAP on THUMOS’14 from 13.7% to 21.2% and mAP on ActivityNet from 7.4% to 27.3%. It is also very encouraging to see that our weakly-supervised method achieves comparable results with some fully-supervised methods. |
Tasks | Action Localization, Temporal Action Localization, Weakly-supervised Temporal Action Localization |
Published | 2018-07-22 |
URL | http://arxiv.org/abs/1807.08333v2 |
http://arxiv.org/pdf/1807.08333v2.pdf | |
PWC | https://paperswithcode.com/paper/autoloc-weakly-supervised-temporal-action |
Repo | https://github.com/zhengshou/AutoLoc |
Framework | none |
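The Outer-Inner-Contrastive loss is easy to state on a 1-D Class Activation Sequence: a good boundary has high activation inside the predicted segment and low activation in an inflated ring just outside it. A minimal sketch with an illustrative inflation ratio:

```python
import numpy as np

def oic_loss(cas, start, end, inflation=0.25):
    """Outer-Inner-Contrastive loss on a 1-D Class Activation Sequence:
    mean activation in the inflated outer ring minus mean activation
    inside [start, end). Lower is better: strong inside, weak outside."""
    length = end - start
    pad = max(1, int(inflation * length))
    lo, hi = max(0, start - pad), min(len(cas), end + pad)
    inner = cas[start:end].mean()
    outer = np.concatenate([cas[lo:start], cas[end:hi]]).mean()
    return outer - inner

cas = np.zeros(100); cas[40:60] = 1.0   # toy activation sequence
print(oic_loss(cas, 40, 60))            # tight boundary -> loss -1.0
print(oic_loss(cas, 30, 70))            # loose boundary -> loss -0.5
```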
MeshCNN: A Network with an Edge
Title | MeshCNN: A Network with an Edge |
Authors | Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, Daniel Cohen-Or |
Abstract | Polygonal meshes provide an efficient representation for 3D shapes. They explicitly capture both shape surface and topology, and leverage non-uniformity to represent large flat regions as well as sharp, intricate features. This non-uniformity and irregularity, however, inhibits mesh analysis efforts using neural networks that combine convolution and pooling operations. In this paper, we utilize the unique properties of the mesh for a direct analysis of 3D shapes using MeshCNN, a convolutional neural network designed specifically for triangular meshes. Analogous to classic CNNs, MeshCNN combines specialized convolution and pooling layers that operate on the mesh edges, by leveraging their intrinsic geodesic connections. Convolutions are applied on edges and the four edges of their incident triangles, and pooling is applied via an edge collapse operation that retains surface topology, thereby generating new mesh connectivity for the subsequent convolutions. MeshCNN learns which edges to collapse, thus forming a task-driven process where the network exposes and expands the important features while discarding the redundant ones. We demonstrate the effectiveness of our task-driven pooling on various learning tasks applied to 3D meshes. |
Tasks | 3D Object Classification, 3D Part Segmentation, 3D Shape Analysis, Cube Engraving Classification |
Published | 2018-09-16 |
URL | http://arxiv.org/abs/1809.05910v2 |
http://arxiv.org/pdf/1809.05910v2.pdf | |
PWC | https://paperswithcode.com/paper/meshcnn-a-network-with-an-edge |
Repo | https://github.com/ranahanocka/MeshCNN |
Framework | pytorch |
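A sketch of the edge convolution described above: each edge aggregates its own feature with those of the four edges of its two incident triangles, combined through order-invariant symmetric functions (|a-c|, a+c, |b-d|, b+d) so the result does not depend on neighbor ordering. The toy connectivity below stands in for real mesh adjacency.

```python
import numpy as np

def edge_conv(edge_feats, neighbors, kernel):
    """MeshCNN-style convolution: each edge is convolved with its four
    neighbor edges, using symmetric functions for order invariance."""
    a, b, c, d = (edge_feats[neighbors[:, i]] for i in range(4))
    e = edge_feats
    stacked = np.stack([e, np.abs(a - c), a + c,
                        np.abs(b - d), b + d], axis=1)  # (E, 5, F_in)
    return np.tensordot(stacked, kernel, axes=([1, 2], [0, 1]))

rng = np.random.default_rng(0)
E, F_in, F_out = 6, 8, 16
feats = rng.normal(size=(E, F_in))
nbrs = rng.integers(0, E, size=(E, 4))       # toy 4-neighborhood indices
kernel = rng.normal(size=(5, F_in, F_out))
print(edge_conv(feats, nbrs, kernel).shape)  # (6, 16)
```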
Automated Strabismus Detection for Telemedicine Applications
Title | Automated Strabismus Detection for Telemedicine Applications |
Authors | Jiewei Lu, Zhun Fan, Ce Zheng, Jingan Feng, Longtao Huang, Wenji Li, Erik D. Goodman |
Abstract | Strabismus is one of the most influential ophthalmologic diseases in human life. Timely detection of strabismus contributes to its prognosis and treatment. Telemedicine, which has great potential to alleviate the growing demand for the diagnosis of ophthalmologic diseases, is an effective method to achieve timely strabismus detection. In this paper, a tele-strabismus dataset is established by ophthalmologists. Then an end-to-end framework named RF-CNN is proposed to achieve automated strabismus detection on the established tele-strabismus dataset. RF-CNN first performs eye region segmentation on each individual image, and further classifies the segmented eye regions with deep neural networks. The experimental results on the established tele-strabismus dataset demonstrate that the proposed RF-CNN performs well on automated strabismus detection for telemedicine applications. Code is made publicly available at: https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application. |
Tasks | |
Published | 2018-09-09 |
URL | http://arxiv.org/abs/1809.02940v3 |
http://arxiv.org/pdf/1809.02940v3.pdf | |
PWC | https://paperswithcode.com/paper/automated-strabismus-detection-for |
Repo | https://github.com/jieWeiLu/Strabismus-Detection-for-Telemedicine-Application |
Framework | tf |
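A schematic of the two-stage RF-CNN pipeline described above: segment the eye region first, then classify the crop. Both stages are mocked out below as stand-ins for the paper's learned segmentation and classification networks.

```python
import numpy as np

def rf_cnn_pipeline(image, segment_eyes, classify):
    """Two-stage sketch: locate the eye region, crop it, classify it.
    segment_eyes and classify are placeholders for learned models."""
    top, bottom, left, right = segment_eyes(image)
    crop = image[top:bottom, left:right]
    return classify(crop)                  # "strabismus" / "normal"

img = np.zeros((64, 128))
segment = lambda im: (20, 44, 10, 118)     # mock eye-region detector
classify = lambda crop: "normal" if crop.mean() < 0.5 else "strabismus"
print(rf_cnn_pipeline(img, segment, classify))
```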
Perturbation Robust Representations of Topological Persistence Diagrams
Title | Perturbation Robust Representations of Topological Persistence Diagrams |
Authors | Anirudh Som, Kowshik Thopalli, Karthikeyan Natesan Ramamurthy, Vinay Venkataraman, Ankita Shukla, Pavan Turaga |
Abstract | Topological methods for data analysis present opportunities for enforcing certain invariances of broad interest in computer vision, including view-point in activity analysis, articulation in shape analysis, and measurement invariance in non-linear dynamical modeling. The increasing success of these methods is attributed to the complementary information that topology provides, as well as availability of tools for computing topological summaries such as persistence diagrams. However, persistence diagrams are multi-sets of points and hence it is not straightforward to fuse them with features used for contemporary machine learning tools like deep-nets. In this paper we present theoretically well-grounded approaches to develop novel perturbation robust topological representations, with the long-term view of making them amenable to fusion with contemporary learning architectures. We term the proposed representation Perturbed Topological Signatures, which live on a Grassmann manifold and hence can be efficiently used in machine learning pipelines. We explore the use of the proposed descriptor on three applications: 3D shape analysis, view-invariant activity analysis, and non-linear dynamical modeling. We show favorable results in both high-level recognition performance and time complexity when compared to other baseline methods. |
Tasks | 3D Shape Analysis |
Published | 2018-07-26 |
URL | http://arxiv.org/abs/1807.10400v1 |
http://arxiv.org/pdf/1807.10400v1.pdf | |
PWC | https://paperswithcode.com/paper/perturbation-robust-representations-of |
Repo | https://github.com/anirudhsom/Perturbed-Topological-Signature |
Framework | none |
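A hedged sketch of the construction: rasterize randomly perturbed copies of a persistence diagram into smoothed 2-D grids, stack them, and keep the top singular vectors. The resulting orthonormal basis spans a subspace, i.e. a point on a Grassmann manifold. Grid resolution, noise level, and subspace dimension below are illustrative, not the authors' exact scheme.

```python
import numpy as np

def diagram_to_grid(points, res=16, sigma=0.05):
    """Rasterize a persistence diagram of (birth, death) points into a
    Gaussian-smoothed 2-D grid over [0, 1]^2."""
    xs = np.linspace(0, 1, res)
    gx, gy = np.meshgrid(xs, xs)
    grid = np.zeros((res, res))
    for bx, dy in points:
        grid += np.exp(-((gx - bx) ** 2 + (gy - dy) ** 2) / (2 * sigma ** 2))
    return grid.ravel()

def perturbed_signature(points, n_perturb=20, noise=0.02, k=4, rng=None):
    """Stack grids from perturbed copies of the diagram and keep the
    top-k left singular vectors: an orthonormal basis of a subspace,
    i.e. a point on a Grassmann manifold."""
    rng = rng or np.random.default_rng(0)
    mats = [diagram_to_grid(points + rng.normal(0, noise, points.shape))
            for _ in range(n_perturb)]
    U, _, _ = np.linalg.svd(np.stack(mats, axis=1), full_matrices=False)
    return U[:, :k]

pd = np.array([[0.1, 0.4], [0.2, 0.9], [0.5, 0.7]])  # toy diagram
print(perturbed_signature(pd).shape)  # (256, 4)
```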
NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications
Title | NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications |
Authors | Tien-Ju Yang, Andrew Howard, Bo Chen, Xiao Zhang, Alec Go, Mark Sandler, Vivienne Sze, Hartwig Adam |
Abstract | This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify networks based on the number of MACs or weights, optimizing those indirect metrics may not necessarily reduce the direct metrics, such as latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics are evaluated using empirical measurements, so that detailed knowledge of the platform and toolchain is not required. NetAdapt automatically and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experimental results show that NetAdapt achieves better accuracy versus latency trade-offs on both mobile CPU and mobile GPU, compared with the state-of-the-art automated network simplification algorithms. For image classification on the ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured inference latency with equal or higher accuracy on MobileNets (V1&V2). |
Tasks | Image Classification |
Published | 2018-04-09 |
URL | http://arxiv.org/abs/1804.03230v2 |
http://arxiv.org/pdf/1804.03230v2.pdf | |
PWC | https://paperswithcode.com/paper/netadapt-platform-aware-neural-network |
Repo | https://github.com/wchliao/multi-task-machine-learning-papers |
Framework | none |
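A skeleton of NetAdapt's outer loop under stated assumptions: each iteration tightens an intermediate latency target, proposes shrinking each layer in turn until the proposal meets the target, evaluates each proposal, and keeps the most accurate one. The latency and accuracy functions below are toy stand-ins for empirical on-device measurements and short-term fine-tuning.

```python
import numpy as np

def netadapt(layers, measure_latency, finetune_and_eval, budget, step=0.05):
    """Progressively simplify `layers` (filter counts per layer) until
    the latency budget is met, greedily keeping the proposal with the
    highest accuracy at each iteration."""
    while measure_latency(layers) > budget:
        target = measure_latency(layers) * (1 - step)   # tighten target
        best = None
        for i in range(len(layers)):
            trial = list(layers)
            while measure_latency(trial) > target and trial[i] > 1:
                trial[i] -= 1                           # prune filters
            acc = finetune_and_eval(trial)
            if best is None or acc > best[0]:
                best = (acc, trial)
        layers = best[1]
    return layers

# Toy stand-ins: latency ~ total filters, accuracy ~ log capacity.
latency = lambda ls: 0.1 * sum(ls)
accuracy = lambda ls: sum(np.log1p(ls))
print(netadapt([64, 128, 256], latency, accuracy, budget=30.0))
```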
Improving Similarity Search with High-dimensional Locality-sensitive Hashing
Title | Improving Similarity Search with High-dimensional Locality-sensitive Hashing |
Authors | Jaiyam Sharma, Saket Navlakha |
Abstract | We propose a new class of data-independent locality-sensitive hashing (LSH) algorithms based on the fruit fly olfactory circuit. The fundamental difference of this approach is that, instead of assigning hashes as dense points in a low dimensional space, hashes are assigned in a high dimensional space, which enhances their separability. We show theoretically and empirically that this new family of hash functions is locality-sensitive and preserves rank similarity for inputs in any ℓp space. We then analyze different variations on this strategy and show empirically that they outperform existing LSH methods for nearest-neighbors search on six benchmark datasets. Finally, we propose a multi-probe version of our algorithm that achieves higher performance for the same query time, or conversely, that maintains performance of prior approaches while taking significantly less indexing time and memory. Overall, our approach leverages the advantages of separability provided by high-dimensional spaces, while still remaining computationally efficient. |
Tasks | |
Published | 2018-12-05 |
URL | http://arxiv.org/abs/1812.01844v1 |
http://arxiv.org/pdf/1812.01844v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-similarity-search-with-high |
Repo | https://github.com/dataplayer12/Fly-LSH |
Framework | tf |
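A sketch of the fly-inspired hashing the paper builds on: project centered inputs through a sparse binary matrix into a much higher-dimensional space, then keep only the top-k most active dimensions (winner-take-all) as a sparse binary hash. The expansion factor, sampling rate, and k below are illustrative assumptions.

```python
import numpy as np

def fly_lsh(X, expand=20, sample=0.1, top_k=32, rng=None):
    """Fly-inspired LSH: sparse 0/1 projection into a space `expand`x
    larger than the input, followed by winner-take-all binarization."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    P = rng.random((d, expand * d)) < sample       # sparse binary matrix
    act = (X - X.mean(axis=1, keepdims=True)) @ P  # center, then project
    hashes = np.zeros_like(act, dtype=bool)
    idx = np.argsort(act, axis=1)[:, -top_k:]      # winner-take-all
    np.put_along_axis(hashes, idx, True, axis=1)
    return hashes

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50))
H = fly_lsh(X, rng=rng)
print(H.shape, H.sum(axis=1))   # (4, 1000) with exactly 32 bits set
```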