January 25, 2020

Paper Group ANR 1652

Yelp Food Identification via Image Feature Extraction and Classification. An Investigation of Biases in Web Search Engine Query Suggestions. Differentially Private Survival Function Estimation. Random Directional Attack for Fooling Deep Neural Networks. Attentive Action and Context Factorization. Interactive-predictive neural multimodal systems. Le …

Yelp Food Identification via Image Feature Extraction and Classification


Title	Yelp Food Identification via Image Feature Extraction and Classification
Authors	Fanbo Sun, Zhixiang Gu, Bo Feng
Abstract	Yelp has been one of the most popular local service search engines in the US since 2004. It is powered by crowd-sourced text reviews and photo reviews. Restaurant customers and business owners upload photos to Yelp to review or advertise food, drinks, or interior and exterior decorations. Relying on human editors to label food photos is inefficient, an issue that should be addressed by innovative machine learning approaches. In this paper, we present a simple but effective approach which can identify up to ten kinds of food via raw photos from the challenge dataset. We use 1) image pre-processing techniques, including filtering and image augmentation, 2) feature extraction via convolutional neural networks (CNN), and 3) three classification algorithms. Then, we illustrate the classification accuracy by tuning parameters for augmentations, CNN, and classification. Our experimental results show that this simple but effective approach can identify up to 10 food types from images.
Tasks	Image Augmentation
Published	2019-02-11
URL	http://arxiv.org/abs/1902.05413v1
PDF	http://arxiv.org/pdf/1902.05413v1.pdf
PWC	https://paperswithcode.com/paper/yelp-food-identification-via-image-feature
Repo
Framework

An Investigation of Biases in Web Search Engine Query Suggestions


Title	An Investigation of Biases in Web Search Engine Query Suggestions
Authors	Malte Bonart, Anastasiia Samokhina, Gernot Heisenberg, Philipp Schaer
Abstract	Survey-based studies suggest that search engines are trusted more than social media or even traditional news, although cases of false information or defamation are known. In this study, we analyze query suggestion features of three search engines to see if these features introduce some bias into the query and search process that might compromise this trust. We test our approach on person-related search suggestions by querying the names of politicians from the German Bundestag before the German federal election of 2017. This study introduces a framework to systematically examine and automatically analyze the variety in query suggestions for person names offered by major search engines. To test our framework, we collected data from the Google, Bing, and DuckDuckGo query suggestion APIs over a period of four months for 629 different names of German politicians. The suggestions were clustered and statistically analyzed with regard to different biases, like gender, party, or age, and with regard to the stability of the suggestions over time.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.00651v1
PDF	https://arxiv.org/pdf/1912.00651v1.pdf
PWC	https://paperswithcode.com/paper/an-investigation-of-biases-in-web-search
Repo
Framework

Differentially Private Survival Function Estimation


Title	Differentially Private Survival Function Estimation
Authors	Lovedeep Gondara, Ke Wang
Abstract	Survival function estimation is used in many disciplines, but it is most common in medical analytics in the form of the Kaplan-Meier estimator. Sensitive data (patient records) is used in the estimation without any explicit control on the information leakage, which is a significant privacy concern. We propose a first differentially private estimator of the survival function and show that it can be easily extended to provide differentially private confidence intervals and test statistics without spending any extra privacy budget. We further provide extensions for differentially private estimation of the competing risk cumulative incidence function, Nelson-Aalen’s estimator for the hazard function, etc. Using eleven real-life clinical datasets, we provide empirical evidence that our proposed method provides good utility while simultaneously providing strong privacy guarantees.
Tasks
Published	2019-10-04
URL	https://arxiv.org/abs/1910.05108v2
PDF	https://arxiv.org/pdf/1910.05108v2.pdf
PWC	https://paperswithcode.com/paper/differentially-private-survival-function
Repo
Framework
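
The abstract above builds on the Kaplan-Meier estimator. As a point of reference, here is a minimal sketch of the standard, non-private estimator in plain Python. It is not the differentially private method the paper proposes, and the function name and toy data are illustrative assumptions.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: time to event or censoring for each subject.
    observed:  True if the event occurred, False if the subject was censored.
    This is the plain (non-private) estimator; no noise is added.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)   # every subject leaves the risk set at its time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Toy data (hypothetical): five subjects, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [True, True, False, True, False])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects never trigger a drop in the curve, but they do shrink the risk set for later event times.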

Random Directional Attack for Fooling Deep Neural Networks


Title	Random Directional Attack for Fooling Deep Neural Networks
Authors	Wenjian Luo, Chenwang Wu, Nan Zhou, Li Ni
Abstract	Deep neural networks (DNNs) have been widely used in many fields such as image processing and speech recognition; however, they are vulnerable to adversarial examples, and this is a security issue worthy of attention. Because the training process of DNNs converges the loss by updating the weights along the gradient descent direction, many gradient-based methods attempt to destroy the DNN model by adding perturbations in the gradient direction. Unfortunately, as the model is nonlinear in most cases, the addition of perturbations in the gradient direction does not necessarily increase loss. Thus, we propose a random directed attack (RDA) for generating adversarial examples in this paper. Rather than limiting the gradient direction to generate an attack, RDA searches the attack direction based on hill climbing and uses multiple strategies to avoid local optima that cause attack failure. Compared with state-of-the-art gradient-based methods, the attack performance of RDA is very competitive. Moreover, RDA can attack without any internal knowledge of the model, and its performance under black-box attack is similar to that of the white-box attack in most cases, which is difficult to achieve using existing gradient-based attack methods.
Tasks	Speech Recognition
Published	2019-08-06
URL	https://arxiv.org/abs/1908.02658v1
PDF	https://arxiv.org/pdf/1908.02658v1.pdf
PWC	https://paperswithcode.com/paper/random-directional-attack-for-fooling-deep
Repo
Framework
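
The abstract above describes searching for an attack direction by hill climbing instead of following the gradient. The sketch below illustrates that general idea only: it is a generic random-direction hill climb against a black-box loss, not the authors' RDA, and it omits their strategies for escaping local optima. The toy loss, step size, and perturbation budget are hypothetical.

```python
import random

def random_direction_attack(loss, x, epsilon=0.5, step=0.1, iters=200, seed=0):
    """Hill-climb on a black-box `loss` using random sign directions.

    The perturbed point is kept inside an L-infinity ball of radius
    `epsilon` around the original input `x`. Only loss values are queried,
    so no gradients or model internals are needed.
    """
    rng = random.Random(seed)
    best = list(x)
    best_loss = loss(best)
    for _ in range(iters):
        # Propose a move of +/- step along every coordinate, then clip.
        candidate = [
            min(max(b + step * rng.choice((-1.0, 1.0)), x0 - epsilon), x0 + epsilon)
            for b, x0 in zip(best, x)
        ]
        cand_loss = loss(candidate)
        if cand_loss > best_loss:   # accept only moves that increase the loss
            best, best_loss = candidate, cand_loss
    return best, best_loss

def toy_loss(v):
    # Hypothetical stand-in for a model's loss on a fixed true label.
    return (v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2

x = [0.0, 0.0]
adv, adv_loss = random_direction_attack(toy_loss, x)
print(adv, adv_loss, toy_loss(x))
```

Because a candidate is accepted only when the loss rises, the returned loss is never lower than the starting loss, and clipping keeps the perturbation within the budget.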

Attentive Action and Context Factorization


Title	Attentive Action and Context Factorization
Authors	Yang Wang, Vinh Tran, Gedas Bertasius, Lorenzo Torresani, Minh Hoai
Abstract	We propose a method for human action recognition, one that can localize the spatiotemporal regions that ‘define’ the actions. This is a challenging task due to the subtlety of human actions in video and the co-occurrence of contextual elements. To address this challenge, we utilize conjugate samples of human actions, which are video clips that are contextually similar to human action samples but do not contain the action. We introduce a novel attentional mechanism that can spatially and temporally separate human actions from the co-occurring contextual factors. The separation of the action and context factors is weakly supervised, eliminating the need for laboriously detailed annotation of these two factors in training samples. Our method can be used to build human action classifiers with higher accuracy and better interpretability. Experiments on several human action recognition datasets demonstrate the quantitative and qualitative benefits of our approach.
Tasks	Temporal Action Localization
Published	2019-04-10
URL	http://arxiv.org/abs/1904.05410v1
PDF	http://arxiv.org/pdf/1904.05410v1.pdf
PWC	https://paperswithcode.com/paper/attentive-action-and-context-factorization
Repo
Framework

Interactive-predictive neural multimodal systems


Title	Interactive-predictive neural multimodal systems
Authors	Álvaro Peris, Francisco Casacuberta
Abstract	Despite the advances achieved by neural models in sequence to sequence learning, exploited in a variety of tasks, they still make errors. In many use cases, these are corrected by a human expert in a posterior revision process. The interactive-predictive framework aims to minimize the human effort spent on this process by considering partial corrections for iteratively refining the hypothesis. In this work, we generalize the interactive-predictive approach, typically applied in the machine translation field, to tackle other multimodal problems, namely image and video captioning. We study the application of this framework to multimodal neural sequence to sequence models. We show that, following this framework, we approximately halve the effort spent for correcting the outputs generated by the automatic systems. Moreover, we deploy our systems in a publicly accessible demonstration that allows a better understanding of the behavior of the interactive-predictive framework.
Tasks	Machine Translation, Video Captioning
Published	2019-05-30
URL	https://arxiv.org/abs/1905.12980v1
PDF	https://arxiv.org/pdf/1905.12980v1.pdf
PWC	https://paperswithcode.com/paper/interactive-predictive-neural-multimodal
Repo
Framework

Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model


Title	Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
Authors	Erik Nijkamp, Mitch Hill, Song-Chun Zhu, Ying Nian Wu
Abstract	This paper studies a curious phenomenon in learning energy-based model (EBM) using MCMC. In each learning iteration, we generate synthesized examples by running a non-convergent, non-mixing, and non-persistent short-run MCMC toward the current model, always starting from the same initial distribution such as uniform noise distribution, and always running a fixed number of MCMC steps. After generating synthesized examples, we then update the model parameters according to the maximum likelihood learning gradient, as if the synthesized examples are fair samples from the current model. We treat this non-convergent short-run MCMC as a learned generator model or a flow model. We provide arguments for treating the learned non-convergent short-run MCMC as a valid model. We show that the learned short-run MCMC is capable of generating realistic images. More interestingly, unlike traditional EBM or MCMC, the learned short-run MCMC is capable of reconstructing observed images and interpolating between images, like generator or flow models. The code can be found in the Appendix.
Tasks
Published	2019-04-22
URL	https://arxiv.org/abs/1904.09770v4
PDF	https://arxiv.org/pdf/1904.09770v4.pdf
PWC	https://paperswithcode.com/paper/on-learning-non-convergent-short-run-mcmc
Repo
Framework
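
The sampling loop described in the abstract above (always start from the same noise distribution, always run a fixed number of MCMC steps) can be sketched with Langevin dynamics on a toy one-dimensional energy. This is an illustration of the sampler only, under assumed settings: the quadratic energy, step size, and chain count are hypothetical, and the model-parameter update is not shown. On this simple energy the chains get close to the target, which is not the non-convergent regime the paper studies.

```python
import random

def short_run_langevin(grad_energy, n_chains=2000, n_steps=100, step=0.3, seed=0):
    """Run independent short Langevin chains and return their final states.

    Each chain is non-persistent: it restarts from uniform noise and runs
    exactly `n_steps` updates of
        x <- x - (step^2 / 2) * dU/dx + step * N(0, 1).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_chains):
        x = rng.uniform(-1.0, 1.0)      # same initial distribution every time
        for _ in range(n_steps):        # fixed number of MCMC steps
            x = x - 0.5 * step ** 2 * grad_energy(x) + step * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def toy_grad(x):
    # Gradient of the hypothetical energy U(x) = (x - 3)^2 / 2.
    return x - 3.0

samples = short_run_langevin(toy_grad)
mean = sum(samples) / len(samples)
print(round(mean, 2))
```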

Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed


Title	Exploring attention mechanism for acoustic-based classification of speech utterances into system-directed and non-system-directed
Authors	Atta Norouzian, Bogdan Mazoure, Dermot Connolly, Daniel Willett
Abstract	Voice controlled virtual assistants (VAs) are now available in smartphones, cars, and standalone devices in homes. In most cases, the user needs to first “wake-up” the VA by saying a particular word/phrase every time he or she wants the VA to do something. Eliminating the need for saying the wake-up word for every interaction could improve the user experience. This would require the VA to have the capability to detect the speech that is being directed at it and respond accordingly. In other words, the challenge is to distinguish between system-directed and non-system-directed speech utterances. In this paper, we present a number of neural network architectures for tackling this classification problem based on using only acoustic features. These architectures are based on using convolutional, recurrent and feed-forward layers. In addition, we investigate the use of an attention mechanism applied to the output of the convolutional and the recurrent layers. It is shown that incorporating the proposed attention mechanism into the models always leads to significant improvement in classification accuracy. The best model achieved equal error rates of 16.25 and 15.62 percent on two distinct realistic datasets.
Tasks
Published	2019-02-01
URL	http://arxiv.org/abs/1902.00570v1
PDF	http://arxiv.org/pdf/1902.00570v1.pdf
PWC	https://paperswithcode.com/paper/exploring-attention-mechanism-for-acoustic
Repo
Framework
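
The abstract above reports results as equal error rates (EER), the operating point at which the false-accept rate equals the false-reject rate. A minimal sketch of one common way to approximate it by sweeping thresholds follows. The scores and labels are made-up toy values, not data from the paper, and the function name is an assumption.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    labels: 1 for system-directed (positive), 0 for non-system-directed.
    A higher score means the classifier is more confident in the positive class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        far = sum(s >= thr for s in neg) / len(neg)   # false-accept rate
        frr = sum(s < thr for s in pos) / len(pos)    # false-reject rate
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer

# Toy scores (hypothetical).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer = equal_error_rate(scores, labels)
print(eer)
```

With few samples the two rates may never match exactly, which is why the sketch returns their midpoint at the closest threshold.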

Enhancing Semantic Word Representations by Embedding Deeper Word Relationships


Title	Enhancing Semantic Word Representations by Embedding Deeper Word Relationships
Authors	Anupiya Nugaliyadde, Kok Wai Wong, Ferdous Sohel, Hong Xie
Abstract	Word representations are created using analogy context-based statistics and lexical relations on words. Word representations are inputs for the learning models in Natural Language Understanding (NLU) tasks. However, to understand language, knowing only the context is not sufficient. Reading between the lines is a key component of NLU. Embedding deeper word relationships which are not represented in the context enhances the word representation. This paper presents a word embedding which combines an analogy, context-based statistics using Word2Vec, and deeper word relationships using Conceptnet, to create an expanded word representation. In order to fine-tune the word representation, Self-Organizing Map is used to optimize it. The proposed word representation is compared with semantic word representations using Simlex 999. Furthermore, the use of 3D visual representations has been shown to be capable of representing the similarity and association between words. The proposed word representation shows a Spearman correlation score of 0.886, provides the best results when compared to the current state-of-the-art methods, and exceeds the human performance of 0.78.
Tasks
Published	2019-01-22
URL	http://arxiv.org/abs/1901.07176v1
PDF	http://arxiv.org/pdf/1901.07176v1.pdf
PWC	https://paperswithcode.com/paper/enhancing-semantic-word-representations-by
Repo
Framework
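
The evaluation in the abstract above relies on Spearman correlation between model similarity scores and human ratings. A minimal sketch of that metric is shown here: the Pearson correlation of the ranks, with tied values given their average rank. The example scores are invented for illustration and are not taken from Simlex 999.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1           # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical human ratings and model similarities for four word pairs.
human = [9.0, 7.5, 3.0, 1.0]
model = [0.8, 0.9, 0.3, 0.1]
rho = spearman(human, model)
print(round(rho, 3))
```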

A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks


Title	A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks
Authors	Álvaro Peris, Francisco Casacuberta
Abstract	We present a demonstration of a neural interactive-predictive system for tackling multimodal sequence to sequence tasks. The system generates text predictions for different sequence to sequence tasks: machine translation, image and video captioning. These predictions are revised by a human agent, who introduces corrections in the form of characters. The system reacts to each correction, providing alternative hypotheses, complying with the feedback provided by the user. The final objective is to reduce the human effort required during this correction process. This system is implemented following a client-server architecture. For accessing the system, we developed a website, which communicates with the neural model, hosted in a local server. From this website, the different tasks can be tackled following the interactive-predictive framework. We open-source all the code developed for building this system. The demonstration is hosted at http://casmacat.prhlt.upv.es/interactive-seq2seq.
Tasks	Machine Translation, Video Captioning
Published	2019-05-20
URL	https://arxiv.org/abs/1905.08181v2
PDF	https://arxiv.org/pdf/1905.08181v2.pdf
PWC	https://paperswithcode.com/paper/a-neural-interactive-predictive-system-for
Repo
Framework

Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks


Title	Performance-Efficiency Trade-off of Low-Precision Numerical Formats in Deep Neural Networks
Authors	Zachariah Carmichael, Hamed F. Langroudi, Char Khazanov, Jeffrey Lillie, John L. Gustafson, Dhireesha Kudithipudi
Abstract	Deep neural networks (DNNs) have been demonstrated as effective prognostic models across various domains, e.g. natural language processing, computer vision, and genomics. However, modern-day DNNs demand high compute and memory storage for executing any reasonably complex task. To optimize the inference time and alleviate the power consumption of these networks, DNN accelerators with low-precision representations of data and DNN parameters are being actively studied. An interesting research question is how low-precision networks can be ported to edge devices with performance similar to that of high-precision networks. In this work, we employ the fixed-point, floating point, and posit numerical formats at $\leq$8-bit precision within a DNN accelerator, Deep Positron, with exact multiply-and-accumulate (EMAC) units for inference. A unified analysis quantifies the trade-offs between overall network efficiency and performance across five classification tasks. Our results indicate that posits are a natural fit for DNN inference, outperforming at $\leq$8-bit precision, and can be realized with competitive resource requirements relative to those of floating point.
Tasks
Published	2019-03-25
URL	http://arxiv.org/abs/1903.10584v1
PDF	http://arxiv.org/pdf/1903.10584v1.pdf
PWC	https://paperswithcode.com/paper/performance-efficiency-trade-off-of-low
Repo
Framework
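
To make the low-precision setting in the abstract above concrete, here is a minimal sketch of rounding a real value to a signed 8-bit fixed-point number. It covers only the fixed-point format, not floating point or posits, and it is not the Deep Positron implementation. The split into 5 fractional bits is an arbitrary assumption chosen for the example.

```python
def quantize_fixed_point(x, total_bits=8, frac_bits=5):
    """Round `x` to the nearest signed fixed-point value and saturate.

    The value is stored as an integer q in [-2^(total_bits-1), 2^(total_bits-1)-1]
    and represents q / 2^frac_bits. Python's round() uses round-half-to-even.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))   # saturate instead of wrapping
    return q / scale

print(quantize_fixed_point(0.3))     # nearest multiple of 1/32
print(quantize_fixed_point(5.0))     # saturates at the largest representable value
print(quantize_fixed_point(-10.0))   # saturates at the smallest representable value
```

Moving bits between the integer and fractional parts trades dynamic range against resolution, which is one reason the choice of format matters at 8 bits and below.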

Investigating Channel Pruning through Structural Redundancy Reduction – A Statistical Study


Title	Investigating Channel Pruning through Structural Redundancy Reduction – A Statistical Study
Authors	Chengcheng Li, Zi Wang, Dali Wang, Xiangyang Wang, Hairong Qi
Abstract	Most existing channel pruning methods formulate the pruning task from a perspective of inefficiency reduction which iteratively rank and remove the least important filters, or find the set of filters that minimizes some reconstruction errors after pruning. In this work, we investigate the channel pruning from a new perspective with statistical modeling. We hypothesize that the number of filters at a certain layer reflects the level of ‘redundancy’ in that layer and thus formulate the pruning problem from the aspect of redundancy reduction. Based on both theoretic analysis and empirical studies, we make an important discovery: randomly pruning filters from layers of high redundancy outperforms pruning the least important filters across all layers based on the state-of-the-art ranking criterion. These results advance our understanding of pruning and further testify to the recent findings that the structure of the pruned model plays a key role in the network efficiency as compared to inherited weights.
Tasks
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06498v3
PDF	https://arxiv.org/pdf/1905.06498v3.pdf
PWC	https://paperswithcode.com/paper/investigating-channel-pruning-through
Repo
Framework

Deep Learning Approach for Receipt Recognition


Title	Deep Learning Approach for Receipt Recognition
Authors	Anh Duc Le, Dung Van Pham, Tuan Anh Nguyen
Abstract	Inspired by the recent successes of deep learning on Computer Vision and Natural Language Processing, we present a deep learning approach for recognizing scanned receipts. The recognition system has two main modules: text detection based on Connectionist Text Proposal Network and text recognition based on Attention-based Encoder-Decoder. We also propose pre-processing to extract the receipt area and OCR verification to ignore handwriting. The experiments on the dataset of the Robust Reading Challenge on Scanned Receipts OCR and Information Extraction 2019 demonstrate that the accuracies were improved by integrating the pre-processing and the OCR verification. Our recognition system achieved an F1 score of 71.9% on the detection and recognition task.
Tasks	Optical Character Recognition
Published	2019-05-30
URL	https://arxiv.org/abs/1905.12817v1
PDF	https://arxiv.org/pdf/1905.12817v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-approach-for-receipt
Repo
Framework

Cross-Domain Collaborative Filtering via Translation-based Learning


Title	Cross-Domain Collaborative Filtering via Translation-based Learning
Authors	Dimitrios Rafailidis
Abstract	With the proliferation of social media platforms and e-commerce sites, several cross-domain collaborative filtering strategies have been recently introduced to transfer the knowledge of user preferences across domains. The main challenge of cross-domain recommendation is to weigh and learn users’ different behaviors in multiple domains. In this paper, we propose a Cross-Domain collaborative filtering model following a Translation-based strategy, namely CDT. In our model, we learn the embedding space with translation vectors and capture high-order feature interactions in users’ multiple preferences across domains. In doing so, we efficiently compute the transitivity between feature latent embeddings, that is if feature pairs have high interaction weights in the latent space, then feature embeddings with no observed interactions across the domains will be closely related as well. We formulate our objective function as a ranking problem in factorization machines and learn the model’s parameters via gradient descent. In addition, to better capture the non-linearity in user preferences across domains we extend the proposed CDT model by using a deep learning strategy, namely DeepCDT. Our experiments on six publicly available cross-domain tasks demonstrate the effectiveness of the proposed models, outperforming other state-of-the-art cross-domain strategies.
Tasks
Published	2019-08-11
URL	https://arxiv.org/abs/1908.06169v1
PDF	https://arxiv.org/pdf/1908.06169v1.pdf
PWC	https://paperswithcode.com/paper/cross-domain-collaborative-filtering-via
Repo
Framework

Generating Token-Level Explanations for Natural Language Inference


Title	Generating Token-Level Explanations for Natural Language Inference
Authors	James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Arpit Mittal
Abstract	The task of Natural Language Inference (NLI) is widely modeled as supervised sentence pair classification. While there has been a lot of work recently on generating explanations of the predictions of classifiers on a single piece of text, there have been no attempts to generate explanations of classifiers operating on pairs of sentences. In this paper, we show that it is possible to generate token-level explanations for NLI without the need for training data explicitly annotated for this purpose. We use a simple LSTM architecture and evaluate both LIME and Anchor explanations for this task. We compare these to a Multiple Instance Learning (MIL) method that uses thresholded attention to make token-level predictions. The approach we present in this paper is a novel extension of zero-shot single-sentence tagging to sentence pairs for NLI. We conduct our experiments on the well-studied SNLI dataset that was recently augmented with manual annotation of the tokens that explain the entailment relation. We find that our white-box MIL-based method, while orders of magnitude faster, does not reach the same accuracy as the black-box methods.
Tasks	Multiple Instance Learning, Natural Language Inference
Published	2019-04-24
URL	http://arxiv.org/abs/1904.10717v1
PDF	http://arxiv.org/pdf/1904.10717v1.pdf
PWC	https://paperswithcode.com/paper/generating-token-level-explanations-for
Repo
Framework
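
The abstract above mentions using thresholded attention to make token-level predictions. One simple reading of that idea is sketched below: normalize per-token scores with a softmax and keep the tokens whose weight clears a threshold. This is an assumption-laden illustration, not the authors' model. The scores, the softmax normalization, and the threshold value are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def explain_tokens(tokens, scores, threshold=0.1):
    """Return the tokens whose attention weight is at least `threshold`."""
    weights = softmax(scores)
    return [tok for tok, w in zip(tokens, weights) if w >= threshold]

# Hypothetical attention scores for a three-token hypothesis.
tokens = ["a", "man", "sleeps"]
scores = [0.0, 1.0, 3.0]
selected = explain_tokens(tokens, scores)
print(selected)
```

Selection of this kind needs only the attention weights from a single forward pass, which fits the abstract's remark that the white-box method is much faster than the black-box explainers.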