Paper Group ANR 469
Variational Information Bottleneck on Vector Quantized Autoencoders. Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction. Interpretable Spatio-temporal Attention for Video Action Recognition. An Overview of Machine Teaching. Improving Moderation of Online Discussions via Interpretable Neural Models. …
Variational Information Bottleneck on Vector Quantized Autoencoders
Title | Variational Information Bottleneck on Vector Quantized Autoencoders |
Authors | Hanwei Wu, Markus Flierl |
Abstract | In this paper, we provide an information-theoretic interpretation of the Vector Quantized-Variational Autoencoder (VQ-VAE). We show that the loss function of the original VQ-VAE can be derived from the variational deterministic information bottleneck (VDIB) principle. On the other hand, the VQ-VAE trained by the Expectation Maximization (EM) algorithm can be viewed as an approximation to the variational information bottleneck (VIB) principle. |
Tasks | |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.01048v1 |
http://arxiv.org/pdf/1808.01048v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-information-bottleneck-on-vector |
Repo | |
Framework | |
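The objective this paper reinterprets is the standard VQ-VAE loss. Below is a minimal PyTorch sketch of that three-term objective; the tensor names and the commitment weight `beta` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """Standard VQ-VAE objective: reconstruction + codebook + commitment terms."""
    recon = F.mse_loss(x_recon, x)            # distortion (reconstruction) term
    codebook = F.mse_loss(z_q, z_e.detach())  # pulls selected codes toward encoder outputs
    commit = F.mse_loss(z_e, z_q.detach())    # keeps encoder outputs near their codes
    return recon + codebook + beta * commit
```

In the forward pass, gradients flow through the non-differentiable quantizer via the usual straight-through trick, e.g. `z_q = z_e + (z_q - z_e).detach()`.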
Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction
Title | Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction |
Authors | Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung |
Abstract | We propose a tri-modal architecture to predict Big Five personality trait scores from video clips, with different channels for audio, text, and video data. For each channel, stacked Convolutional Neural Networks are employed. The channels are fused both at the decision level and by concatenating their respective fully connected layers. It is shown that a multimodal fusion approach outperforms each single-modality channel, with an improvement of 9.4% over the best individual modality (video). Full backpropagation is also shown to be better than a linear combination of modalities, meaning complex interactions between modalities can be leveraged to build better models. Furthermore, the model reveals the predictive relevance of each modality for each trait. The described model can be used to increase the emotional intelligence of virtual agents. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00705v2 |
http://arxiv.org/pdf/1805.00705v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-audio-visual-and-text-fusion |
Repo | |
Framework | |
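As a rough illustration of the fusion described in the entry above, here is a hedged PyTorch sketch in which each modality's stacked-CNN channel is stood in for by a single linear layer; all dimensions are made-up placeholders.

```python
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    """Concatenates per-modality FC layers and predicts Big Five trait scores."""
    def __init__(self, audio_dim=64, text_dim=128, video_dim=256, n_traits=5):
        super().__init__()
        # Stand-ins for the per-channel stacked CNNs that would produce these features.
        self.audio_fc = nn.Linear(audio_dim, 32)
        self.text_fc = nn.Linear(text_dim, 32)
        self.video_fc = nn.Linear(video_dim, 32)
        self.head = nn.Linear(3 * 32, n_traits)  # one score per trait

    def forward(self, audio, text, video):
        fused = torch.cat([torch.relu(self.audio_fc(audio)),
                           torch.relu(self.text_fc(text)),
                           torch.relu(self.video_fc(video))], dim=1)
        return torch.sigmoid(self.head(fused))   # trait scores in [0, 1]
```

Training this jointly end-to-end is the "full backpropagation" the abstract contrasts with a fixed linear combination of per-modality decisions.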
Interpretable Spatio-temporal Attention for Video Action Recognition
Title | Interpretable Spatio-temporal Attention for Video Action Recognition |
Authors | Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederick Tung, Leonid Sigal |
Abstract | Inspired by the observation that humans are able to process videos efficiently by paying attention only where and when it is needed, we propose an interpretable, easily pluggable spatio-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a convolutional-LSTM-based attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers to ensure that our attention mechanism attends to coherent regions in space and time. Our model not only improves video action recognition accuracy, but also localizes discriminative regions both spatially and temporally, despite being trained in a weakly-supervised manner with only classification labels (no bounding-box labels or temporal frame labels). We evaluate our approach on several public video action recognition datasets with ablation studies. Furthermore, we quantitatively and qualitatively evaluate our model's ability to localize discriminative regions spatially and critical frames temporally. Experimental results demonstrate the efficacy of our approach, showing superior or comparable accuracy to state-of-the-art methods while increasing model interpretability. |
Tasks | Temporal Action Localization |
Published | 2018-10-01 |
URL | https://arxiv.org/abs/1810.04511v2 |
https://arxiv.org/pdf/1810.04511v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-spatio-temporal-attention-for |
Repo | |
Framework | |
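A minimal sketch of the spatial half of such a mechanism, assuming a softmax saliency mask and a total-variation-style coherence penalty; the paper's ConvLSTM temporal attention is omitted and its exact regularizers may differ.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Softmax saliency mask over a feature map, plus a coherence penalty."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # 1x1 conv -> saliency logits

    def forward(self, feats):                               # feats: (B, C, H, W)
        b, _, h, w = feats.shape
        mask = torch.softmax(self.score(feats).view(b, -1), dim=1).view(b, 1, h, w)
        # Total-variation-style term encouraging spatially coherent attention.
        tv = (mask[..., 1:, :] - mask[..., :-1, :]).abs().mean() \
           + (mask[..., :, 1:] - mask[..., :, :-1]).abs().mean()
        return feats * mask, mask, tv
```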
An Overview of Machine Teaching
Title | An Overview of Machine Teaching |
Authors | Xiaojin Zhu, Adish Singla, Sandra Zilles, Anna N. Rafferty |
Abstract | In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain a deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field. |
Tasks | |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.05927v1 |
http://arxiv.org/pdf/1801.05927v1.pdf | |
PWC | https://paperswithcode.com/paper/an-overview-of-machine-teaching |
Repo | |
Framework | |
Improving Moderation of Online Discussions via Interpretable Neural Models
Title | Improving Moderation of Online Discussions via Interpretable Neural Models |
Authors | Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková |
Abstract | The growing volume of comments makes online discussions difficult for human moderators to handle alone. Antisocial behavior is a common occurrence that often discourages other users from participating in the discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to review. Second, we highlight the inappropriate parts within these comments to make moderation faster. We evaluated our method on data from a major Slovak news discussion platform. |
Tasks | |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06906v1 |
http://arxiv.org/pdf/1809.06906v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-moderation-of-online-discussions |
Repo | |
Framework | |
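One way to realize the two steps in a single model is attention-pooled classification, where the attention weights double as highlights. The sketch below is a hedged illustration, not the authors' architecture; all sizes are placeholders.

```python
import torch
import torch.nn as nn

class CommentFlagger(nn.Module):
    """Step 1: score a comment as (in)appropriate; step 2: token weights as highlights."""
    def __init__(self, vocab_size, emb_dim=64, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.clf = nn.Linear(2 * hid, 1)

    def forward(self, tokens):                               # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))                    # (B, T, 2*hid)
        w = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # per-token weights
        ctx = (w.unsqueeze(-1) * h).sum(dim=1)               # attention-pooled comment
        prob = torch.sigmoid(self.clf(ctx)).squeeze(-1)      # P(inappropriate)
        return prob, w                                       # w marks the parts to highlight
```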
Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming
Title | Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming |
Authors | Roman Kalkreuth |
Abstract | Cartesian Genetic Programming is often used with a point mutation as the sole genetic operator. In this paper, we propose two phenotypic mutation techniques and take a step towards advanced phenotypic mutations in Cartesian Genetic Programming. The functionality of the proposed mutations is inspired by biological evolution, which mutates DNA sequences by inserting and deleting nucleotides. Experiments on symbolic regression and Boolean function problems show better search performance when the proposed mutations are in use. The results of our experiments indicate that phenotypic mutations could be beneficial for Cartesian Genetic Programming. |
Tasks | |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06127v1 |
http://arxiv.org/pdf/1803.06127v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-advanced-phenotypic-mutations-in |
Repo | |
Framework | |
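To make the DNA analogy concrete, here is a hedged sketch of insertion and deletion mutations over a simplified feed-forward node list; real CGP genotypes (levels-back constraints, arities, output genes) are more involved than this.

```python
import random

def insertion_mutation(genome, function_set, n_inputs=2, max_len=50):
    """Insert a random node, cf. nucleotide insertion. A node is (fn, in1, in2);
    indices 0..n_inputs-1 address program inputs and n_inputs+p addresses the
    node at position p, so sampling below n_inputs+pos keeps the graph feed-forward."""
    if len(genome) >= max_len:
        return genome
    pos = random.randrange(len(genome) + 1)
    limit = n_inputs + pos
    node = (random.choice(function_set), random.randrange(limit), random.randrange(limit))
    return genome[:pos] + [node] + genome[pos:]

def deletion_mutation(genome, n_inputs=2, min_len=1):
    """Delete a random node, cf. nucleotide deletion, rewiring broken links."""
    if len(genome) <= min_len:
        return genome
    pos = random.randrange(len(genome))
    removed = n_inputs + pos
    repaired = []
    for p, (fn, i1, i2) in enumerate(genome[:pos] + genome[pos + 1:]):
        # Shift indices past the removed node; re-sample links that pointed to it.
        i1 = i1 - 1 if i1 > removed else (random.randrange(n_inputs + p) if i1 == removed else i1)
        i2 = i2 - 1 if i2 > removed else (random.randrange(n_inputs + p) if i2 == removed else i2)
        repaired.append((fn, i1, i2))
    return repaired
```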
CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge: An Ensemble of Deep and Shallow Learning to Predict the Quality of Product Titles
Title | CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge: An Ensemble of Deep and Shallow Learning to Predict the Quality of Product Titles |
Authors | Karamjit Singh, Vishal Sunder |
Abstract | We present an approach where two different models (Deep and Shallow) are trained separately on the data and a weighted average of the outputs is taken as the final result. For the Deep approach, we use different combinations of models such as Convolutional Neural Networks, pretrained word2vec embeddings, and LSTMs to obtain representations, which are then used to train a Deep Neural Network. For Clarity prediction, we also use an Attentive Pooling approach for the pooling operation so as to be aware of the Title-Category pair. For the shallow approach, we use the gradient boosting technique LightGBM on features generated from titles and categories. We find that an ensemble of these approaches does a better job than either alone, suggesting that the results of the deep and shallow approaches are highly complementary. |
Tasks | |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.01000v1 |
http://arxiv.org/pdf/1804.01000v1.pdf | |
PWC | https://paperswithcode.com/paper/cikm-analyticup-2017-lazada-product-title |
Repo | |
Framework | |
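The final combination step is a plain weighted average of the two models' predicted probabilities, along these lines; the weight is illustrative and would be tuned on validation data.

```python
import numpy as np

def ensemble_predict(deep_probs, shallow_probs, w_deep=0.6):
    """Weighted average of the deep (CNN/LSTM) and shallow (LightGBM) outputs."""
    return w_deep * np.asarray(deep_probs) + (1.0 - w_deep) * np.asarray(shallow_probs)
```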
Webcam-based Eye Gaze Tracking under Natural Head Movement
Title | Webcam-based Eye Gaze Tracking under Natural Head Movement |
Authors | Kalin Stefanov |
Abstract | This manuscript investigates and proposes a visual gaze tracker that tackles the problem using only an ordinary web camera and no prior knowledge in any sense (scene set-up, camera intrinsic and/or extrinsic parameters). The tracker we propose is based on the observation that our desire to grant the freedom of natural head movement to the user requires 3D modeling of the scene set-up. Although using a single low-resolution web camera limits the available dimensions (no depth can be recovered), we propose ways to cope with this drawback and model the scene in front of the user. We tackle this three-dimensional problem by realizing that it can be viewed as a series of two-dimensional special cases. Then, we propose a procedure that treats each movement of the user's head as a special two-dimensional case, hence reducing the complexity of the problem back to two dimensions. Furthermore, the proposed tracker is calibration-free, discarding this tedious step required by the previously mentioned trackers. Experimental results show that the proposed tracker achieves good results, given the restrictions on it. We can report that the tracker commits a mean error of (56.95, 70.82) pixels in the x and y directions, respectively, when the user's head is kept as static as possible (no chin rests are used). Furthermore, we can report that the proposed tracker commits a mean error of (87.18, 103.86) pixels in the x and y directions, respectively, under natural head movement. |
Tasks | Calibration |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11088v1 |
http://arxiv.org/pdf/1803.11088v1.pdf | |
PWC | https://paperswithcode.com/paper/webcam-based-eye-gaze-tracking-under-natural |
Repo | |
Framework | |
Scale equivariance in CNNs with vector fields
Title | Scale equivariance in CNNs with vector fields |
Authors | Diego Marcos, Benjamin Kellenberger, Sylvain Lobry, Devis Tuia |
Abstract | We study the effect of injecting local scale equivariance into Convolutional Neural Networks. This is done by applying each convolutional filter at multiple scales. The output is a vector field encoding the maximally activating response and the scale that produced it, which is further processed by the following convolutional layers. This allows all the intermediate representations to be locally scale equivariant. We show that this improves the performance of the model by over 20% in the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits. Furthermore, we find it also useful for scale invariant tasks, such as the actual classification of randomly scaled digits. This highlights the usefulness of allowing for a compact representation that can also learn relationships between different local scales by keeping internal scale equivariance. |
Tasks | |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11783v1 |
http://arxiv.org/pdf/1807.11783v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-equivariance-in-cnns-with-vector-fields |
Repo | |
Framework | |
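A hedged PyTorch sketch of the idea in the entry above: apply one filter bank over an assumed set of scales and keep, per location, both the maximal response and the scale that produced it. The scale set and the rescale-convolve-resample scheme are placeholders, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleVectorConv(nn.Module):
    """Applies one conv filter bank at several scales; outputs the max
    activation per location together with the scale that produced it."""
    def __init__(self, in_ch, out_ch, scales=(1.0, 1.26, 1.59, 2.0)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.scales = scales

    def forward(self, x):
        responses = []
        for s in self.scales:
            # Rescale the input, convolve, then resample back to the input grid.
            xs = F.interpolate(x, scale_factor=1.0 / s, mode='bilinear', align_corners=False)
            r = self.conv(xs)
            responses.append(F.interpolate(r, size=x.shape[-2:], mode='bilinear',
                                           align_corners=False))
        stack = torch.stack(responses, dim=0)      # (S, B, C, H, W)
        max_resp, idx = stack.max(dim=0)           # strongest response per location
        scale_map = torch.tensor(self.scales, device=x.device)[idx]
        return max_resp, scale_map                 # the "vector field": magnitude + scale
```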
Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
Title | Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning |
Authors | Weifeng Ge, Sibei Yang, Yizhou Yu |
Abstract | Supervised object detection and semantic segmentation require object-level or even pixel-level annotations. When only image-level labels are available, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than that of their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012. |
Tasks | Image Classification, Metric Learning, Multi-Label Classification, Object Detection, Object Localization, Object Recognition, Semantic Segmentation, Weakly Supervised Object Detection, Weakly-Supervised Semantic Segmentation |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09129v1 |
http://arxiv.org/pdf/1802.09129v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-evidence-filtering-and-fusion-for-multi |
Repo | |
Framework | |
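The instance-filtering step above can be pictured as density-based clustering in a metric-learned embedding space. A hedged sketch with scikit-learn's DBSCAN (parameters illustrative; the paper's filtering algorithm is more elaborate):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def filter_instances(embeddings, eps=0.5, min_samples=5):
    """Keep object instances that fall in dense clusters of the embedding
    space; isolated (likely spurious) detections get DBSCAN's noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)
    kept = np.where(labels != -1)[0]    # indices of instances to keep
    return kept, labels
```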
Fingerprint liveness detection using local quality features
Title | Fingerprint liveness detection using local quality features |
Authors | Ram Prakash Sharma, Somnath Dey |
Abstract | Fingerprint-based recognition has been widely deployed in various applications. However, current recognition systems are vulnerable to spoofing attacks which make use of an artificial replica of a fingerprint to deceive the sensors. In such scenarios, fingerprint liveness detection ensures the actual presence of a real legitimate fingerprint in contrast to a fake self-manufactured synthetic sample. In this paper, we propose a static software-based approach using quality features to detect the liveness of a fingerprint. We extract features from a single fingerprint image to overcome the issues faced by dynamic software-based approaches, which require longer computational time and user cooperation. The proposed system extracts 8 sensor-independent quality features at a local level, capturing minute details of the ridge-valley structure of real and fake fingerprints. These local quality features constitute a 13-dimensional feature vector. The system is tested on the publicly available dataset of the LivDet 2009 competition. The experimental results show that the proposed method outperforms current state-of-the-art approaches, providing the lowest average classification error of 5.3% on LivDet 2009. Additionally, the effectiveness of the best-performing features on LivDet 2009 is evaluated on the latest LivDet 2015 dataset, which contains fingerprints fabricated using unknown spoof materials. An average classification error rate of 4.22% is achieved, compared with the 4.49% obtained by the LivDet 2015 winner. Further, the proposed system uses a single fingerprint image, which results in faster processing and makes it more user-friendly. |
Tasks | |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.02974v1 |
http://arxiv.org/pdf/1806.02974v1.pdf | |
PWC | https://paperswithcode.com/paper/fingerprint-liveness-detection-using-local |
Repo | |
Framework | |
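A hedged sketch of the static, single-image pipeline shape: block-wise quality statistics pooled into a fixed-length vector, then a standard classifier. The statistics below are crude stand-ins, not the paper's 8 ridge-valley quality features.

```python
import numpy as np

def local_quality_features(img, block=16):
    """Block-wise quality statistics from one grayscale fingerprint image,
    pooled into a fixed-length vector (stand-ins for the paper's features)."""
    h, w = img.shape
    stats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            patch = img[i:i + block, j:j + block].astype(float)
            stats.append((patch.mean(), patch.var()))   # crude local "quality"
    stats = np.asarray(stats)
    return np.concatenate([stats.mean(axis=0), stats.var(axis=0)])
```

A standard classifier (e.g. an SVM) would then be fit on these per-image vectors with live/fake labels.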
Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts
Title | Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts |
Authors | Chih-Wen Goo, Yun-Nung Chen |
Abstract | Neural abstractive summarization has been increasingly studied, where prior work has mainly focused on summarizing single-speaker documents (news, scientific publications, etc). In dialogues, there are different interactions between speakers, which are usually defined as dialogue acts. These interactive signals may provide informative cues for better summarizing dialogues. This paper proposes to explicitly leverage dialogue acts in a neural summarization model, where a sentence-gated mechanism is designed for modeling the relationship between dialogue acts and the summary. The experiments show that our proposed model significantly improves the abstractive summarization performance compared to the state-of-the-art baselines on the AMI meeting corpus, demonstrating the usefulness of the interactive signal provided by dialogue acts. |
Tasks | Abstractive Text Summarization |
Published | 2018-09-15 |
URL | http://arxiv.org/abs/1809.05715v2 |
http://arxiv.org/pdf/1809.05715v2.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-dialogue-summarization-with |
Repo | |
Framework | |
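One plausible reading of the sentence-gated mechanism is a sigmoid gate that lets dialogue-act information modulate sentence representations before they reach the summary decoder; the sketch below is an assumption-laden illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SentenceGate(nn.Module):
    """Sigmoid gate letting dialogue-act context modulate a sentence vector."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, sent_vec, act_vec):                    # both: (B, dim)
        g = torch.sigmoid(self.gate(torch.cat([sent_vec, act_vec], dim=-1)))
        return g * sent_vec                                  # gated sentence features
```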
Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
Title | Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses |
Authors | Mohammad Hashemi, Greg Cusack, Eric Keller |
Abstract | It has been shown that adversaries can craft example inputs to neural networks which are similar to legitimate inputs but have been created to purposely cause the neural network to misclassify the input. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes introducing a separate detector to attempt to detect adversarial examples. This approach also makes use of gradient obfuscation techniques, for example, to prevent the adversary from trying to fool the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients. For those defenses that have tried to make models more robust, with our technique, an adversary can craft adversarial examples with no knowledge of the defense. For defenses that attempt to detect the adversarial examples, with our technique, an adversary only needs very limited information about the defense to craft adversarial examples. We demonstrate our technique by applying it against two defenses which make models more robust and two defenses which detect adversarial examples. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.10031v1 |
http://arxiv.org/pdf/1810.10031v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-substitute-training-a-gray-box |
Repo | |
Framework | |
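The gray-box skeleton, stripped of the stochastic ingredients: fit a smooth substitute to the defended model's outputs (only queries are needed, never the defense's obfuscated gradients), then craft adversarial examples on the substitute and transfer them. The names and the FGSM attack choice below are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def substitute_step(substitute, defended_model, x, optimizer):
    """Fit the substitute to the defended model's soft outputs (query access only)."""
    with torch.no_grad():
        target = F.softmax(defended_model(x), dim=1)   # possibly gradient-obfuscated
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(substitute(x), dim=1), target,
                    reduction='batchmean')
    loss.backward()
    optimizer.step()

def fgsm_transfer(substitute, x, y, eps=0.03):
    """Craft adversarial examples on the smooth substitute, then transfer them."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(substitute(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```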
Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Title | Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA |
Authors | Seongsik Park, Jaehee Jang, Seijoon Kim, Sungroh Yoon |
Abstract | Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding. |
Tasks | Question Answering |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07978v2 |
http://arxiv.org/pdf/1805.07978v2.pdf | |
PWC | https://paperswithcode.com/paper/energy-efficient-inference-accelerator-for |
Repo | |
Framework | |
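One simplified reading of inference thresholding is an early-exit maximum inner-product search over the memory slots; the fixed threshold below stands in for the data-derived criterion described in the paper.

```python
import numpy as np

def thresholded_mips(query, keys, threshold=0.9):
    """Early-exit maximum inner-product search: stop scanning memory slots
    once a score clears the threshold, saving the remaining dot products."""
    best_i, best_s = -1, -np.inf
    for i, k in enumerate(keys):
        s = float(np.dot(query, k))
        if s > best_s:
            best_i, best_s = i, s
        if best_s >= threshold:    # early exit
            break
    return best_i, best_s
```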
A Review on the Application of Natural Computing in Environmental Informatics
Title | A Review on the Application of Natural Computing in Environmental Informatics |
Authors | Andreas Kamilaris |
Abstract | Natural computing offers new opportunities to understand, model and analyze the complexity of the physical and human-created environment. This paper examines the application of natural computing in environmental informatics, by investigating related work in this research field. Various nature-inspired techniques are presented, which have been employed to solve different relevant problems. Advantages and disadvantages of these techniques are discussed, together with analysis of how natural computing is generally used in environmental research. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00260v1 |
http://arxiv.org/pdf/1808.00260v1.pdf | |
PWC | https://paperswithcode.com/paper/a-review-on-the-application-of-natural |
Repo | |
Framework | |