October 18, 2019

Paper Group ANR 469

Variational Information Bottleneck on Vector Quantized Autoencoders. Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction. Interpretable Spatio-temporal Attention for Video Action Recognition. An Overview of Machine Teaching. Improving Moderation of Online Discussions via Interpretable Neural Models. …

Variational Information Bottleneck on Vector Quantized Autoencoders

Title Variational Information Bottleneck on Vector Quantized Autoencoders
Authors Hanwei Wu, Markus Flierl
Abstract In this paper, we provide an information-theoretic interpretation of the Vector Quantized-Variational Autoencoder (VQ-VAE). We show that the loss function of the original VQ-VAE can be derived from the variational deterministic information bottleneck (VDIB) principle. On the other hand, the VQ-VAE trained by the Expectation Maximization (EM) algorithm can be viewed as an approximation to the variational information bottleneck (VIB) principle.
Tasks
Published 2018-08-02
URL http://arxiv.org/abs/1808.01048v1
PDF http://arxiv.org/pdf/1808.01048v1.pdf
PWC https://paperswithcode.com/paper/variational-information-bottleneck-on-vector
Repo
Framework
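
To make the VQ-VAE loss mentioned in the abstract above more concrete, here is a minimal sketch of the vector quantization step and the forward values of the two quantization-related loss terms. It is not taken from the paper: the function names are hypothetical, the default `beta=0.25` is an assumed commitment weight, and the stop-gradient operators that distinguish the two terms during training are omitted because only forward values are computed.

```python
def nearest_code(z_e, codebook):
    """Return (index, vector) of the codebook entry closest to z_e in squared L2 distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sq_dist(z_e, codebook[k]))
    return idx, codebook[idx]


def vq_loss_terms(z_e, codebook, beta=0.25):
    """Forward values of the codebook and commitment terms for one encoder output.

    In training, the codebook term only updates the codebook entry and the
    commitment term only updates the encoder (via stop-gradients); in the
    forward pass both reduce to the same squared distance, scaled by beta
    for the commitment term. beta=0.25 is an assumption for illustration.
    """
    idx, e = nearest_code(z_e, codebook)
    sq = sum((zi - ei) ** 2 for zi, ei in zip(z_e, e))
    return {"index": idx, "codebook": sq, "commitment": beta * sq}


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    print(vq_loss_terms([0.9, 0.8], codebook))
```

The reconstruction term of the full loss depends on the decoder and is left out of this sketch.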

Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

Title Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction
Authors Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung
Abstract We propose a tri-modal architecture to predict Big Five personality trait scores from video clips with different channels for audio, text, and video data. For each channel, stacked Convolutional Neural Networks are employed. The channels are fused both at the decision level and by concatenating their respective fully connected layers. It is shown that a multimodal fusion approach outperforms each single modality channel, with an improvement of 9.4% over the best individual modality (video). Full backpropagation is also shown to be better than a linear combination of modalities, meaning complex interactions between modalities can be leveraged to build better models. Furthermore, we can see the prediction relevance of each modality for each trait. The described model can be used to increase the emotional intelligence of virtual agents.
Tasks
Published 2018-05-02
URL http://arxiv.org/abs/1805.00705v2
PDF http://arxiv.org/pdf/1805.00705v2.pdf
PWC https://paperswithcode.com/paper/investigating-audio-visual-and-text-fusion
Repo
Framework

Interpretable Spatio-temporal Attention for Video Action Recognition

Title Interpretable Spatio-temporal Attention for Video Action Recognition
Authors Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederick Tung, Leonid Sigal
Abstract Inspired by the observation that humans are able to process videos efficiently by only paying attention where and when it is needed, we propose an interpretable and easy plug-in spatial-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a convolutional LSTM based attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers to ensure that our attention mechanism attends to coherent regions in space and time. Our model not only improves video action recognition accuracy, but also localizes discriminative regions both spatially and temporally, despite being trained in a weakly-supervised manner with only classification labels (no bounding box labels or time frame temporal labels). We evaluate our approach on several public video action recognition datasets with ablation studies. Furthermore, we quantitatively and qualitatively evaluate our model’s ability to localize discriminative regions spatially and critical frames temporally. Experimental results demonstrate the efficacy of our approach, showing superior or comparable accuracy with the state-of-the-art methods while increasing model interpretability.
Tasks Temporal Action Localization
Published 2018-10-01
URL https://arxiv.org/abs/1810.04511v2
PDF https://arxiv.org/pdf/1810.04511v2.pdf
PWC https://paperswithcode.com/paper/interpretable-spatio-temporal-attention-for
Repo
Framework
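
The temporal attention idea in the abstract above (weighting frames by relevance) can be illustrated with a generic softmax attention over per-frame scores. This is only a sketch under assumptions: the paper uses a convolutional LSTM to produce its attention, whereas here the scores are simply given as input, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_frames(frame_features, frame_scores):
    """Pool per-frame feature vectors with softmax attention weights.

    frame_features: list of equal-length feature vectors, one per frame.
    frame_scores:   one unnormalized relevance score per frame (assumed to
                    come from some scoring network, not modeled here).
    Returns (weights, pooled_feature).
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = attend_over_frames(feats, [10.0, 0.0, 0.0])
    print(weights, pooled)
```

The weights themselves are what makes such a mechanism inspectable: the frame with the largest weight is the one the model treats as most relevant.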

An Overview of Machine Teaching

Title An Overview of Machine Teaching
Authors Xiaojin Zhu, Adish Singla, Sandra Zilles, Anna N. Rafferty
Abstract In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field.
Tasks
Published 2018-01-18
URL http://arxiv.org/abs/1801.05927v1
PDF http://arxiv.org/pdf/1801.05927v1.pdf
PWC https://paperswithcode.com/paper/an-overview-of-machine-teaching
Repo
Framework

Improving Moderation of Online Discussions via Interpretable Neural Models

Title Improving Moderation of Online Discussions via Interpretable Neural Models
Authors Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková
Abstract The growing number of comments makes online discussions difficult for human moderators alone to moderate. Antisocial behavior is a common occurrence that often discourages other users from participating in discussion. We propose a neural network based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to see. Second, we highlight inappropriate parts within these comments to make the moderation faster. We evaluated our method on data from a major Slovak news discussion platform.
Tasks
Published 2018-09-18
URL http://arxiv.org/abs/1809.06906v1
PDF http://arxiv.org/pdf/1809.06906v1.pdf
PWC https://paperswithcode.com/paper/improving-moderation-of-online-discussions
Repo
Framework

Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming

Title Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming
Authors Roman Kalkreuth
Abstract Cartesian Genetic Programming is often used with a point mutation as the sole genetic operator. In this paper, we propose two phenotypic mutation techniques and take a step towards advanced phenotypic mutations in Cartesian Genetic Programming. The functionality of the proposed mutations is inspired by biological evolution which mutates DNA sequences by inserting and deleting nucleotides. Experiments with symbolic regression and Boolean function problems show a better search performance when the proposed mutations are in use. The results of our experiments indicate that the use of phenotypic mutations could be beneficial for the use of Cartesian Genetic Programming.
Tasks
Published 2018-03-16
URL http://arxiv.org/abs/1803.06127v1
PDF http://arxiv.org/pdf/1803.06127v1.pdf
PWC https://paperswithcode.com/paper/towards-advanced-phenotypic-mutations-in
Repo
Framework

CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble of Deep and Shallow Learning to predict the Quality of Product Titles

Title CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge An Ensemble of Deep and Shallow Learning to predict the Quality of Product Titles
Authors Karamjit Singh, Vishal Sunder
Abstract We present an approach where two different models (Deep and Shallow) are trained separately on the data and a weighted average of the outputs is taken as the final result. For the Deep approach, we use different combinations of models like Convolutional Neural Networks, pretrained word2vec embeddings and LSTMs to get representations which are then used to train a Deep Neural Network. For Clarity prediction, we also use an Attentive Pooling approach for the pooling operation so as to be aware of the Title-Category pair. For the shallow approach, we use the boosting technique LightGBM on features generated using title and categories. We find that an ensemble of these approaches does a better job than using them alone, suggesting that the results of the deep and shallow approaches are highly complementary.
Tasks
Published 2018-04-01
URL http://arxiv.org/abs/1804.01000v1
PDF http://arxiv.org/pdf/1804.01000v1.pdf
PWC https://paperswithcode.com/paper/cikm-analyticup-2017-lazada-product-title
Repo
Framework
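
The weighted-average ensembling described in the abstract above can be sketched in a few lines. The function name and the default weight are hypothetical; the abstract does not state which weights were used, so `deep_weight` would need to be tuned on validation data.

```python
def weighted_ensemble(deep_probs, shallow_probs, deep_weight=0.5):
    """Blend two lists of predicted probabilities with a convex combination.

    deep_weight is the weight given to the deep model; the shallow model
    receives 1 - deep_weight. The default of 0.5 is an arbitrary assumption.
    """
    if not 0.0 <= deep_weight <= 1.0:
        raise ValueError("deep_weight must lie in [0, 1]")
    if len(deep_probs) != len(shallow_probs):
        raise ValueError("both models must score the same number of items")
    return [
        deep_weight * d + (1.0 - deep_weight) * s
        for d, s in zip(deep_probs, shallow_probs)
    ]


if __name__ == "__main__":
    deep = [0.8, 0.2, 0.6]      # e.g. outputs of the neural model
    shallow = [0.6, 0.4, 0.9]   # e.g. outputs of a gradient-boosted model
    print(weighted_ensemble(deep, shallow, deep_weight=0.7))
```

Because the combination is convex, the blended score always lies between the two individual scores, which keeps the outputs valid probabilities.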

Webcam-based Eye Gaze Tracking under Natural Head Movement

Title Webcam-based Eye Gaze Tracking under Natural Head Movement
Authors Kalin Stefanov
Abstract This manuscript investigates and proposes a visual gaze tracker that tackles the problem using only an ordinary web camera and no prior knowledge in any sense (scene set-up, camera intrinsic and/or extrinsic parameters). The tracker we propose is based on the observation that our desire to grant the freedom of natural head movement to the user requires 3D modeling of the scene set-up. Although using a single low-resolution web camera limits us to two dimensions (no depth can be recovered), we propose ways to cope with this drawback and model the scene in front of the user. We tackle this three-dimensional problem by realizing that it can be viewed as a series of two-dimensional special cases. Then, we propose a procedure that treats each movement of the user’s head as a special two-dimensional case, hence reducing the complexity of the problem back to two dimensions. Furthermore, the proposed tracker is calibration free and discards this tedious part of all previously mentioned trackers. Experimental results show that the proposed tracker achieves good results, given the restrictions on it. We can report that the tracker commits a mean error of (56.95, 70.82) pixels in x and y direction, respectively, when the user’s head is as static as possible (no chin-rests are used). Furthermore, we can report that the proposed tracker commits a mean error of (87.18, 103.86) pixels in x and y direction, respectively, under natural head movement.
Tasks Calibration
Published 2018-03-29
URL http://arxiv.org/abs/1803.11088v1
PDF http://arxiv.org/pdf/1803.11088v1.pdf
PWC https://paperswithcode.com/paper/webcam-based-eye-gaze-tracking-under-natural
Repo
Framework

Scale equivariance in CNNs with vector fields

Title Scale equivariance in CNNs with vector fields
Authors Diego Marcos, Benjamin Kellenberger, Sylvain Lobry, Devis Tuia
Abstract We study the effect of injecting local scale equivariance into Convolutional Neural Networks. This is done by applying each convolutional filter at multiple scales. The output is a vector field encoding for the maximally activating scale and the scale itself, which is further processed by the following convolutional layers. This allows all the intermediate representations to be locally scale equivariant. We show that this improves the performance of the model by over 20% in the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits. Furthermore, we find it also useful for scale invariant tasks, such as the actual classification of randomly scaled digits. This highlights the usefulness of allowing for a compact representation that can also learn relationships between different local scales by keeping internal scale equivariance.
Tasks
Published 2018-07-31
URL http://arxiv.org/abs/1807.11783v1
PDF http://arxiv.org/pdf/1807.11783v1.pdf
PWC https://paperswithcode.com/paper/scale-equivariance-in-cnns-with-vector-fields
Repo
Framework
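
As a rough illustration of the multi-scale idea in the abstract above, the sketch below takes the responses of one filter applied at several scales, keeps the strongest response at each position, and encodes it together with its scale as a 2D vector. The encoding is an assumption made for this example (magnitude from the response, angle from the log-scale mapped linearly onto [0, pi/2]) and may differ from the paper's; the function name is hypothetical.

```python
import math


def scale_vector_field(responses_per_scale, scales):
    """Collapse multi-scale filter responses into one 2D vector per position.

    responses_per_scale[s][i] is the response at position i of the filter
    applied at scales[s]. For each position we keep the maximal response and
    the scale that produced it, encoded as (magnitude, angle) in Cartesian
    form. The log-scale-to-angle mapping is an illustrative assumption.
    """
    log_min, log_max = math.log(min(scales)), math.log(max(scales))
    n_positions = len(responses_per_scale[0])
    field = []
    for i in range(n_positions):
        best = max(range(len(scales)), key=lambda s: responses_per_scale[s][i])
        magnitude = max(responses_per_scale[best][i], 0.0)  # ReLU-like clamp
        if log_max == log_min:
            t = 0.0
        else:
            t = (math.log(scales[best]) - log_min) / (log_max - log_min)
        angle = t * math.pi / 2.0
        field.append((magnitude * math.cos(angle), magnitude * math.sin(angle)))
    return field


if __name__ == "__main__":
    responses = [[1.0, 0.2], [0.5, 2.0]]  # two scales, two positions
    print(scale_vector_field(responses, scales=[1.0, 2.0]))
```

Storing one vector per position instead of one value per scale is what keeps the representation compact while still carrying the scale information forward to later layers.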

Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning

Title Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
Authors Weifeng Ge, Sibei Yang, Yizhou Yu
Abstract Supervised object detection and semantic segmentation require object or even pixel level annotations. When there exist image level labels only, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012.
Tasks Image Classification, Metric Learning, Multi-Label Classification, Object Detection, Object Localization, Object Recognition, Semantic Segmentation, Weakly Supervised Object Detection, Weakly-Supervised Semantic Segmentation
Published 2018-02-26
URL http://arxiv.org/abs/1802.09129v1
PDF http://arxiv.org/pdf/1802.09129v1.pdf
PWC https://paperswithcode.com/paper/multi-evidence-filtering-and-fusion-for-multi
Repo
Framework

Fingerprint liveness detection using local quality features

Title Fingerprint liveness detection using local quality features
Authors Ram Prakash Sharma, Somnath Dey
Abstract Fingerprint-based recognition has been widely deployed in various applications. However, current recognition systems are vulnerable to spoofing attacks which make use of an artificial replica of a fingerprint to deceive the sensors. In such scenarios, fingerprint liveness detection ensures the actual presence of a real legitimate fingerprint in contrast to a fake self-manufactured synthetic sample. In this paper, we propose a static software-based approach using quality features to detect the liveness in a fingerprint. We have extracted features from a single fingerprint image to overcome the issues faced in dynamic software-based approaches which require longer computational time and user cooperation. The proposed system extracts 8 sensor-independent quality features on a local level containing minute details of the ridge-valley structure of real and fake fingerprints. These local quality features constitute a 13-dimensional feature vector. The system is tested on a publicly available dataset of the LivDet 2009 competition. The experimental results exhibit supremacy of the proposed method over current state-of-the-art approaches, providing the least average classification error of 5.3% for LivDet 2009. Additionally, effectiveness of the best performing features over LivDet 2009 is evaluated on the latest LivDet 2015 dataset, which contains fingerprints fabricated using unknown spoof materials. An average classification error rate of 4.22% is achieved in comparison with 4.49% obtained by the LivDet 2015 winner. Further, the proposed system utilizes a single fingerprint image, which results in faster implications and makes it more user-friendly.
Tasks
Published 2018-06-08
URL http://arxiv.org/abs/1806.02974v1
PDF http://arxiv.org/pdf/1806.02974v1.pdf
PWC https://paperswithcode.com/paper/fingerprint-liveness-detection-using-local
Repo
Framework
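
The abstract above reports results as an average classification error. A minimal sketch of that metric follows, assuming it is the mean of the error rate on live samples and the error rate on fake samples, as is common in liveness-detection evaluations; the abstract itself does not spell out the formula, and the function name and label strings are hypothetical.

```python
def average_classification_error(labels, predictions):
    """Mean of the live-sample error rate and the fake-sample error rate.

    labels and predictions are sequences of the strings "live" or "fake".
    Averaging the two per-class rates (rather than pooling all samples) keeps
    the metric from being dominated by whichever class is larger.
    """
    live_total = sum(1 for y in labels if y == "live")
    fake_total = sum(1 for y in labels if y == "fake")
    if live_total == 0 or fake_total == 0:
        raise ValueError("need at least one live and one fake sample")

    live_errors = sum(
        1 for y, p in zip(labels, predictions) if y == "live" and p != "live"
    )
    fake_errors = sum(
        1 for y, p in zip(labels, predictions) if y == "fake" and p != "fake"
    )
    ferr_live = live_errors / live_total
    ferr_fake = fake_errors / fake_total
    return (ferr_live + ferr_fake) / 2.0


if __name__ == "__main__":
    y_true = ["live", "live", "live", "live", "fake", "fake"]
    y_pred = ["live", "fake", "live", "live", "fake", "live"]
    print(average_classification_error(y_true, y_pred))
```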

Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts

Title Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts
Authors Chih-Wen Goo, Yun-Nung Chen
Abstract Neural abstractive summarization has been increasingly studied, where the prior work mainly focused on summarizing single-speaker documents (news, scientific publications, etc.). In dialogues, there are different interactions between speakers, which are usually defined as dialogue acts. The interactive signals may provide informative cues for better summarizing dialogues. This paper proposes to explicitly leverage dialogue acts in a neural summarization model, where a sentence-gated mechanism is designed for modeling the relationship between dialogue acts and the summary. The experiments show that our proposed model significantly improves the abstractive summarization performance compared to the state-of-the-art baselines on the AMI meeting corpus, demonstrating the usefulness of the interactive signal provided by dialogue acts.
Tasks Abstractive Text Summarization
Published 2018-09-15
URL http://arxiv.org/abs/1809.05715v2
PDF http://arxiv.org/pdf/1809.05715v2.pdf
PWC https://paperswithcode.com/paper/abstractive-dialogue-summarization-with
Repo
Framework

Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses

Title Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
Authors Mohammad Hashemi, Greg Cusack, Eric Keller
Abstract It has been shown that adversaries can craft example inputs to neural networks which are similar to legitimate inputs but have been created to purposely cause the neural network to misclassify the input. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes introducing a separate detector to attempt to detect adversarial examples. This approach also makes use of gradient obfuscation techniques, for example, to prevent the adversary from trying to fool the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients. For those defenses that have tried to make models more robust, with our technique, an adversary can craft adversarial examples with no knowledge of the defense. For defenses that attempt to detect the adversarial examples, with our technique, an adversary only needs very limited information about the defense to craft adversarial examples. We demonstrate our technique by applying it against two defenses which make models more robust and two defenses which detect adversarial examples.
Tasks
Published 2018-10-23
URL http://arxiv.org/abs/1810.10031v1
PDF http://arxiv.org/pdf/1810.10031v1.pdf
PWC https://paperswithcode.com/paper/stochastic-substitute-training-a-gray-box
Repo
Framework
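
The abstract above notes that adversarial examples are often crafted from gradients of a loss with respect to the input. The sketch below shows that baseline idea with a fast-gradient-sign step on a tiny logistic-regression model. It is a white-box illustration only, not the paper's stochastic substitute training; the model, weights, and function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def fgsm(w, b, x, y, epsilon):
    """One fast-gradient-sign step: move each input by epsilon in the
    direction that increases the loss for the true label y."""
    grad = input_gradient(w, b, x, y)
    return [xi + epsilon * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0          # toy model parameters (assumed)
    x, y = [1.0, 0.5], 1             # input correctly classified as class 1
    x_adv = fgsm(w, b, x, y, epsilon=1.0)
    print(predict(w, b, x), predict(w, b, x_adv))
```

Defenses that obfuscate gradients aim to make `input_gradient` uninformative to the attacker, which is the setting the paper's gray-box approach targets.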

Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA

Title Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Authors Seongsik Park, Jaehee Jang, Seijoon Kim, Sungroh Yoon
Abstract Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding.
Tasks Question Answering
Published 2018-05-21
URL http://arxiv.org/abs/1805.07978v2
PDF http://arxiv.org/pdf/1805.07978v2.pdf
PWC https://paperswithcode.com/paper/energy-efficient-inference-accelerator-for
Repo
Framework

A Review on the Application of Natural Computing in Environmental Informatics

Title A Review on the Application of Natural Computing in Environmental Informatics
Authors Andreas Kamilaris
Abstract Natural computing offers new opportunities to understand, model and analyze the complexity of the physical and human-created environment. This paper examines the application of natural computing in environmental informatics, by investigating related work in this research field. Various nature-inspired techniques are presented, which have been employed to solve different relevant problems. Advantages and disadvantages of these techniques are discussed, together with analysis of how natural computing is generally used in environmental research.
Tasks
Published 2018-08-01
URL http://arxiv.org/abs/1808.00260v1
PDF http://arxiv.org/pdf/1808.00260v1.pdf
PWC https://paperswithcode.com/paper/a-review-on-the-application-of-natural
Repo
Framework