Paper Group ANR 469
Variational Information Bottleneck on Vector Quantized Autoencoders. Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction. Interpretable Spatio-temporal Attention for Video Action Recognition. An Overview of Machine Teaching. Improving Moderation of Online Discussions via Interpretable Neural Models. …
Variational Information Bottleneck on Vector Quantized Autoencoders
Title | Variational Information Bottleneck on Vector Quantized Autoencoders |
Authors | Hanwei Wu, Markus Flierl |
Abstract | In this paper, we provide an information-theoretic interpretation of the Vector Quantized-Variational Autoencoder (VQ-VAE). We show that the loss function of the original VQ-VAE can be derived from the variational deterministic information bottleneck (VDIB) principle. On the other hand, the VQ-VAE trained by the Expectation Maximization (EM) algorithm can be viewed as an approximation to the variational information bottleneck (VIB) principle. |
Tasks | |
Published | 2018-08-02 |
URL | http://arxiv.org/abs/1808.01048v1 |
http://arxiv.org/pdf/1808.01048v1.pdf | |
PWC | https://paperswithcode.com/paper/variational-information-bottleneck-on-vector |
Repo | |
Framework | |
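The objective this paper reinterprets is the standard VQ-VAE loss. Below is a minimal PyTorch sketch of that three-term objective; the tensor names and the commitment weight `beta` are illustrative, not taken from the paper.

```python
import torch.nn.functional as F

def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """Standard VQ-VAE objective: reconstruction + codebook + commitment terms."""
    recon = F.mse_loss(x_recon, x)            # distortion (reconstruction) term
    codebook = F.mse_loss(z_q, z_e.detach())  # pulls selected codes toward encoder outputs
    commit = F.mse_loss(z_e, z_q.detach())    # keeps encoder outputs near their codes
    return recon + codebook + beta * commit
```

In the forward pass, gradients flow through the non-differentiable quantizer via the usual straight-through trick, e.g. `z_q = z_e + (z_q - z_e).detach()`.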
Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction
Title | Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction |
Authors | Onno Kampman, Elham J. Barezi, Dario Bertero, Pascale Fung |
Abstract | We propose a tri-modal architecture to predict Big Five personality trait scores from video clips, with different channels for audio, text, and video data. For each channel, stacked Convolutional Neural Networks are employed. The channels are fused both at the decision level and by concatenating their respective fully connected layers. It is shown that a multimodal fusion approach outperforms each single-modality channel, with an improvement of 9.4% over the best individual modality (video). Full backpropagation is also shown to be better than a linear combination of modalities, meaning complex interactions between modalities can be leveraged to build better models. Furthermore, the model reveals the predictive relevance of each modality for each trait. The described model can be used to increase the emotional intelligence of virtual agents. |
Tasks | |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.00705v2 |
http://arxiv.org/pdf/1805.00705v2.pdf | |
PWC | https://paperswithcode.com/paper/investigating-audio-visual-and-text-fusion |
Repo | |
Framework | |
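As a rough illustration of the fusion described in the entry above, here is a hedged PyTorch sketch in which each modality's stacked-CNN channel is stood in for by a single linear layer; all dimensions are made-up placeholders.

```python
import torch
import torch.nn as nn

class TriModalFusion(nn.Module):
    """Concatenates per-modality FC layers and predicts Big Five trait scores."""
    def __init__(self, audio_dim=64, text_dim=128, video_dim=256, n_traits=5):
        super().__init__()
        # Stand-ins for the per-channel stacked CNNs that would produce these features.
        self.audio_fc = nn.Linear(audio_dim, 32)
        self.text_fc = nn.Linear(text_dim, 32)
        self.video_fc = nn.Linear(video_dim, 32)
        self.head = nn.Linear(3 * 32, n_traits)  # one score per trait

    def forward(self, audio, text, video):
        fused = torch.cat([torch.relu(self.audio_fc(audio)),
                           torch.relu(self.text_fc(text)),
                           torch.relu(self.video_fc(video))], dim=1)
        return torch.sigmoid(self.head(fused))   # trait scores in [0, 1]
```

Training this jointly end-to-end is the "full backpropagation" the abstract contrasts with a fixed linear combination of per-modality decisions.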
Interpretable Spatio-temporal Attention for Video Action Recognition
Title | Interpretable Spatio-temporal Attention for Video Action Recognition |
Authors | Lili Meng, Bo Zhao, Bo Chang, Gao Huang, Wei Sun, Frederick Tung, Leonid Sigal |
Abstract | Inspired by the observation that humans are able to process videos efficiently by paying attention only where and when it is needed, we propose an interpretable, easily pluggable spatio-temporal attention mechanism for video action recognition. For spatial attention, we learn a saliency mask to allow the model to focus on the most salient parts of the feature maps. For temporal attention, we employ a convolutional-LSTM-based attention mechanism to identify the most relevant frames from an input video. Further, we propose a set of regularizers to ensure that our attention mechanism attends to coherent regions in space and time. Our model not only improves video action recognition accuracy, but also localizes discriminative regions both spatially and temporally, despite being trained in a weakly-supervised manner with only classification labels (no bounding-box labels or temporal frame labels). We evaluate our approach on several public video action recognition datasets with ablation studies. Furthermore, we quantitatively and qualitatively evaluate our model's ability to localize discriminative regions spatially and critical frames temporally. Experimental results demonstrate the efficacy of our approach, showing superior or comparable accuracy to state-of-the-art methods while increasing model interpretability. |
Tasks | Temporal Action Localization |
Published | 2018-10-01 |
URL | https://arxiv.org/abs/1810.04511v2 |
https://arxiv.org/pdf/1810.04511v2.pdf | |
PWC | https://paperswithcode.com/paper/interpretable-spatio-temporal-attention-for |
Repo | |
Framework | |
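A minimal sketch of the spatial half of such a mechanism, assuming a softmax saliency mask and a total-variation-style coherence penalty; the paper's ConvLSTM temporal attention is omitted and its exact regularizers may differ.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Softmax saliency mask over a feature map, plus a coherence penalty."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # 1x1 conv -> saliency logits

    def forward(self, feats):                               # feats: (B, C, H, W)
        b, _, h, w = feats.shape
        mask = torch.softmax(self.score(feats).view(b, -1), dim=1).view(b, 1, h, w)
        # Total-variation-style term encouraging spatially coherent attention.
        tv = (mask[..., 1:, :] - mask[..., :-1, :]).abs().mean() \
           + (mask[..., :, 1:] - mask[..., :, :-1]).abs().mean()
        return feats * mask, mask, tv
```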
An Overview of Machine Teaching
Title | An Overview of Machine Teaching |
Authors | Xiaojin Zhu, Adish Singla, Sandra Zilles, Anna N. Rafferty |
Abstract | In this paper we try to organize machine teaching as a coherent set of ideas. Each idea is presented as varying along a dimension. The collection of dimensions then forms the problem space of machine teaching, such that existing teaching problems can be characterized in this space. We hope this organization allows us to gain a deeper understanding of individual teaching problems, discover connections among them, and identify gaps in the field. |
Tasks | |
Published | 2018-01-18 |
URL | http://arxiv.org/abs/1801.05927v1 |
http://arxiv.org/pdf/1801.05927v1.pdf | |
PWC | https://paperswithcode.com/paper/an-overview-of-machine-teaching |
Repo | |
Framework | |
Improving Moderation of Online Discussions via Interpretable Neural Models
Title | Improving Moderation of Online Discussions via Interpretable Neural Models |
Authors | Andrej Švec, Matúš Pikuliak, Marián Šimko, Mária Bieliková |
Abstract | The growing volume of comments makes online discussions difficult for human moderators to handle alone. Antisocial behavior is a common occurrence that often discourages other users from participating in the discussion. We propose a neural-network-based method that partially automates the moderation process. It consists of two steps. First, we detect inappropriate comments for moderators to review. Second, we highlight the inappropriate parts within these comments to make moderation faster. We evaluated our method on data from a major Slovak news discussion platform. |
Tasks | |
Published | 2018-09-18 |
URL | http://arxiv.org/abs/1809.06906v1 |
http://arxiv.org/pdf/1809.06906v1.pdf | |
PWC | https://paperswithcode.com/paper/improving-moderation-of-online-discussions |
Repo | |
Framework | |
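One way to realize the two steps in a single model is attention-pooled classification, where the attention weights double as highlights. The sketch below is a hedged illustration, not the authors' architecture; all sizes are placeholders.

```python
import torch
import torch.nn as nn

class CommentFlagger(nn.Module):
    """Step 1: score a comment as (in)appropriate; step 2: token weights as highlights."""
    def __init__(self, vocab_size, emb_dim=64, hid=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hid, 1)
        self.clf = nn.Linear(2 * hid, 1)

    def forward(self, tokens):                               # tokens: (B, T)
        h, _ = self.rnn(self.emb(tokens))                    # (B, T, 2*hid)
        w = torch.softmax(self.attn(h).squeeze(-1), dim=1)   # per-token weights
        ctx = (w.unsqueeze(-1) * h).sum(dim=1)               # attention-pooled comment
        prob = torch.sigmoid(self.clf(ctx)).squeeze(-1)      # P(inappropriate)
        return prob, w                                       # w marks the parts to highlight
```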
Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming
Title | Towards Advanced Phenotypic Mutations in Cartesian Genetic Programming |
Authors | Roman Kalkreuth |
Abstract | Cartesian Genetic Programming is often used with a point mutation as the sole genetic operator. In this paper, we propose two phenotypic mutation techniques and take a step towards advanced phenotypic mutations in Cartesian Genetic Programming. The functionality of the proposed mutations is inspired by biological evolution, which mutates DNA sequences by inserting and deleting nucleotides. Experiments on symbolic regression and Boolean function problems show better search performance when the proposed mutations are in use. The results of our experiments indicate that phenotypic mutations could be beneficial for Cartesian Genetic Programming. |
Tasks | |
Published | 2018-03-16 |
URL | http://arxiv.org/abs/1803.06127v1 |
http://arxiv.org/pdf/1803.06127v1.pdf | |
PWC | https://paperswithcode.com/paper/towards-advanced-phenotypic-mutations-in |
Repo | |
Framework | |
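To make the DNA analogy concrete, here is a hedged sketch of insertion and deletion mutations over a simplified feed-forward node list; real CGP genotypes (levels-back constraints, arities, output genes) are more involved than this.

```python
import random

def insertion_mutation(genome, function_set, n_inputs=2, max_len=50):
    """Insert a random node, cf. nucleotide insertion. A node is (fn, in1, in2);
    indices 0..n_inputs-1 address program inputs and n_inputs+p addresses the
    node at position p, so sampling below n_inputs+pos keeps the graph feed-forward."""
    if len(genome) >= max_len:
        return genome
    pos = random.randrange(len(genome) + 1)
    limit = n_inputs + pos
    node = (random.choice(function_set), random.randrange(limit), random.randrange(limit))
    return genome[:pos] + [node] + genome[pos:]

def deletion_mutation(genome, n_inputs=2, min_len=1):
    """Delete a random node, cf. nucleotide deletion, rewiring broken links."""
    if len(genome) <= min_len:
        return genome
    pos = random.randrange(len(genome))
    removed = n_inputs + pos
    repaired = []
    for p, (fn, i1, i2) in enumerate(genome[:pos] + genome[pos + 1:]):
        # Shift indices past the removed node; re-sample links that pointed to it.
        i1 = i1 - 1 if i1 > removed else (random.randrange(n_inputs + p) if i1 == removed else i1)
        i2 = i2 - 1 if i2 > removed else (random.randrange(n_inputs + p) if i2 == removed else i2)
        repaired.append((fn, i1, i2))
    return repaired
```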
CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge: An Ensemble of Deep and Shallow Learning to Predict the Quality of Product Titles
Title | CIKM AnalytiCup 2017 Lazada Product Title Quality Challenge: An Ensemble of Deep and Shallow Learning to Predict the Quality of Product Titles |
Authors | Karamjit Singh, Vishal Sunder |
Abstract | We present an approach where two different models (Deep and Shallow) are trained separately on the data and a weighted average of the outputs is taken as the final result. For the Deep approach, we use different combinations of models such as Convolutional Neural Networks, pretrained word2vec embeddings, and LSTMs to obtain representations, which are then used to train a Deep Neural Network. For Clarity prediction, we also use an Attentive Pooling approach for the pooling operation so as to be aware of the Title-Category pair. For the shallow approach, we use the gradient boosting technique LightGBM on features generated from titles and categories. We find that an ensemble of these approaches does a better job than either alone, suggesting that the results of the deep and shallow approaches are highly complementary. |
Tasks | |
Published | 2018-04-01 |
URL | http://arxiv.org/abs/1804.01000v1 |
http://arxiv.org/pdf/1804.01000v1.pdf | |
PWC | https://paperswithcode.com/paper/cikm-analyticup-2017-lazada-product-title |
Repo | |
Framework | |
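The final combination step is a plain weighted average of the two models' predicted probabilities, along these lines; the weight is illustrative and would be tuned on validation data.

```python
import numpy as np

def ensemble_predict(deep_probs, shallow_probs, w_deep=0.6):
    """Weighted average of the deep (CNN/LSTM) and shallow (LightGBM) outputs."""
    return w_deep * np.asarray(deep_probs) + (1.0 - w_deep) * np.asarray(shallow_probs)
```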
Webcam-based Eye Gaze Tracking under Natural Head Movement
Title | Webcam-based Eye Gaze Tracking under Natural Head Movement |
Authors | Kalin Stefanov |
Abstract | This manuscript investigates and proposes a visual gaze tracker that tackles the problem using only an ordinary web camera and no prior knowledge in any sense (scene set-up, camera intrinsic and/or extrinsic parameters). The tracker we propose is based on the observation that our desire to grant the freedom of natural head movement to the user requires 3D modeling of the scene set-up. Although using a single low-resolution web camera limits the available dimensions (no depth can be recovered), we propose ways to cope with this drawback and model the scene in front of the user. We tackle this three-dimensional problem by realizing that it can be viewed as a series of two-dimensional special cases. Then, we propose a procedure that treats each movement of the user's head as a special two-dimensional case, hence reducing the complexity of the problem back to two dimensions. Furthermore, the proposed tracker is calibration-free, discarding this tedious step required by the previously mentioned trackers. Experimental results show that the proposed tracker achieves good results, given the restrictions on it. We can report that the tracker commits a mean error of (56.95, 70.82) pixels in the x and y directions, respectively, when the user's head is kept as static as possible (no chin rests are used). Furthermore, we can report that the proposed tracker commits a mean error of (87.18, 103.86) pixels in the x and y directions, respectively, under natural head movement. |
Tasks | Calibration |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11088v1 |
http://arxiv.org/pdf/1803.11088v1.pdf | |
PWC | https://paperswithcode.com/paper/webcam-based-eye-gaze-tracking-under-natural |
Repo | |
Framework | |
Scale equivariance in CNNs with vector fields
Title | Scale equivariance in CNNs with vector fields |
Authors | Diego Marcos, Benjamin Kellenberger, Sylvain Lobry, Devis Tuia |
Abstract | We study the effect of injecting local scale equivariance into Convolutional Neural Networks. This is done by applying each convolutional filter at multiple scales. The output is a vector field encoding the maximally activating response and the scale that produced it, which is further processed by the following convolutional layers. This allows all the intermediate representations to be locally scale equivariant. We show that this improves the performance of the model by over 20% in the scale equivariant task of regressing the scaling factor applied to randomly scaled MNIST digits. Furthermore, we find it also useful for scale invariant tasks, such as the actual classification of randomly scaled digits. This highlights the usefulness of allowing for a compact representation that can also learn relationships between different local scales by keeping internal scale equivariance. |
Tasks | |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1807.11783v1 |
http://arxiv.org/pdf/1807.11783v1.pdf | |
PWC | https://paperswithcode.com/paper/scale-equivariance-in-cnns-with-vector-fields |
Repo | |
Framework | |
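A hedged PyTorch sketch of the idea in the entry above: apply one filter bank over an assumed set of scales and keep, per location, both the maximal response and the scale that produced it. The scale set and the rescale-convolve-resample scheme are placeholders, not the paper's exact construction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleVectorConv(nn.Module):
    """Applies one conv filter bank at several scales; outputs the max
    activation per location together with the scale that produced it."""
    def __init__(self, in_ch, out_ch, scales=(1.0, 1.26, 1.59, 2.0)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.scales = scales

    def forward(self, x):
        responses = []
        for s in self.scales:
            # Rescale the input, convolve, then resample back to the input grid.
            xs = F.interpolate(x, scale_factor=1.0 / s, mode='bilinear', align_corners=False)
            r = self.conv(xs)
            responses.append(F.interpolate(r, size=x.shape[-2:], mode='bilinear',
                                           align_corners=False))
        stack = torch.stack(responses, dim=0)      # (S, B, C, H, W)
        max_resp, idx = stack.max(dim=0)           # strongest response per location
        scale_map = torch.tensor(self.scales, device=x.device)[idx]
        return max_resp, scale_map                 # the "vector field": magnitude + scale
```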
Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning
Title | Multi-Evidence Filtering and Fusion for Multi-Label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning |
Authors | Weifeng Ge, Sibei Yang, Yizhou Yu |
Abstract | Supervised object detection and semantic segmentation require object-level or even pixel-level annotations. When only image-level labels are available, it is challenging for weakly supervised algorithms to achieve accurate predictions. The accuracy achieved by top weakly supervised algorithms is still significantly lower than that of their fully supervised counterparts. In this paper, we propose a novel weakly supervised curriculum learning pipeline for multi-label object recognition, detection and semantic segmentation. In this pipeline, we first obtain intermediate object localization and pixel labeling results for the training images, and then use such results to train task-specific deep networks in a fully supervised manner. The entire process consists of four stages, including object localization in the training images, filtering and fusing object instances, pixel labeling for the training images, and task-specific network training. To obtain clean object instances in the training images, we propose a novel algorithm for filtering, fusing and classifying object instances collected from multiple solution mechanisms. In this algorithm, we incorporate both metric learning and density-based clustering to filter detected object instances. Experiments show that our weakly supervised pipeline achieves state-of-the-art results in multi-label image classification as well as weakly supervised object detection and very competitive results in weakly supervised semantic segmentation on MS-COCO, PASCAL VOC 2007 and PASCAL VOC 2012. |
Tasks | Image Classification, Metric Learning, Multi-Label Classification, Object Detection, Object Localization, Object Recognition, Semantic Segmentation, Weakly Supervised Object Detection, Weakly-Supervised Semantic Segmentation |
Published | 2018-02-26 |
URL | http://arxiv.org/abs/1802.09129v1 |
http://arxiv.org/pdf/1802.09129v1.pdf | |
PWC | https://paperswithcode.com/paper/multi-evidence-filtering-and-fusion-for-multi |
Repo | |
Framework | |
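The instance-filtering step above can be pictured as density-based clustering in a metric-learned embedding space. A hedged sketch with scikit-learn's DBSCAN (parameters illustrative; the paper's filtering algorithm is more elaborate):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def filter_instances(embeddings, eps=0.5, min_samples=5):
    """Keep object instances that fall in dense clusters of the embedding
    space; isolated (likely spurious) detections get DBSCAN's noise label -1."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embeddings)
    kept = np.where(labels != -1)[0]    # indices of instances to keep
    return kept, labels
```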
Fingerprint liveness detection using local quality features
Title | Fingerprint liveness detection using local quality features |
Authors | Ram Prakash Sharma, Somnath Dey |
Abstract | Fingerprint-based recognition has been widely deployed in various applications. However, current recognition systems are vulnerable to spoofing attacks which make use of an artificial replica of a fingerprint to deceive the sensors. In such scenarios, fingerprint liveness detection ensures the actual presence of a real legitimate fingerprint in contrast to a fake self-manufactured synthetic sample. In this paper, we propose a static software-based approach using quality features to detect the liveness of a fingerprint. We extract features from a single fingerprint image to overcome the issues faced by dynamic software-based approaches, which require longer computational time and user cooperation. The proposed system extracts 8 sensor-independent quality features at a local level, capturing minute details of the ridge-valley structure of real and fake fingerprints. These local quality features constitute a 13-dimensional feature vector. The system is tested on the publicly available dataset of the LivDet 2009 competition. The experimental results show that the proposed method outperforms current state-of-the-art approaches, providing the lowest average classification error of 5.3% on LivDet 2009. Additionally, the effectiveness of the best-performing features on LivDet 2009 is evaluated on the latest LivDet 2015 dataset, which contains fingerprints fabricated using unknown spoof materials. An average classification error rate of 4.22% is achieved, compared with the 4.49% obtained by the LivDet 2015 winner. Further, the proposed system uses a single fingerprint image, which results in faster processing and makes it more user-friendly. |
Tasks | |
Published | 2018-06-08 |
URL | http://arxiv.org/abs/1806.02974v1 |
http://arxiv.org/pdf/1806.02974v1.pdf | |
PWC | https://paperswithcode.com/paper/fingerprint-liveness-detection-using-local |
Repo | |
Framework | |
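A hedged sketch of the static, single-image pipeline shape: block-wise quality statistics pooled into a fixed-length vector, then a standard classifier. The statistics below are crude stand-ins, not the paper's 8 ridge-valley quality features.

```python
import numpy as np

def local_quality_features(img, block=16):
    """Block-wise quality statistics from one grayscale fingerprint image,
    pooled into a fixed-length vector (stand-ins for the paper's features)."""
    h, w = img.shape
    stats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            patch = img[i:i + block, j:j + block].astype(float)
            stats.append((patch.mean(), patch.var()))   # crude local "quality"
    stats = np.asarray(stats)
    return np.concatenate([stats.mean(axis=0), stats.var(axis=0)])
```

A standard classifier (e.g. an SVM) would then be fit on these per-image vectors with live/fake labels.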
Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts
Title | Abstractive Dialogue Summarization with Sentence-Gated Modeling Optimized by Dialogue Acts |
Authors | Chih-Wen Goo, Yun-Nung Chen |
Abstract | Neural abstractive summarization has been increasingly studied, where prior work has mainly focused on summarizing single-speaker documents (news, scientific publications, etc). In dialogues, there are different interactions between speakers, which are usually defined as dialogue acts. These interactive signals may provide informative cues for better summarizing dialogues. This paper proposes to explicitly leverage dialogue acts in a neural summarization model, where a sentence-gated mechanism is designed for modeling the relationship between dialogue acts and the summary. The experiments show that our proposed model significantly improves the abstractive summarization performance compared to the state-of-the-art baselines on the AMI meeting corpus, demonstrating the usefulness of the interactive signal provided by dialogue acts. |
Tasks | Abstractive Text Summarization |
Published | 2018-09-15 |
URL | http://arxiv.org/abs/1809.05715v2 |
http://arxiv.org/pdf/1809.05715v2.pdf | |
PWC | https://paperswithcode.com/paper/abstractive-dialogue-summarization-with |
Repo | |
Framework | |
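One plausible reading of the sentence-gated mechanism is a sigmoid gate that lets dialogue-act information modulate sentence representations before they reach the summary decoder; the sketch below is an assumption-laden illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class SentenceGate(nn.Module):
    """Sigmoid gate letting dialogue-act context modulate a sentence vector."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, sent_vec, act_vec):                    # both: (B, dim)
        g = torch.sigmoid(self.gate(torch.cat([sent_vec, act_vec], dim=-1)))
        return g * sent_vec                                  # gated sentence features
```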
Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses
Title | Stochastic Substitute Training: A Gray-box Approach to Craft Adversarial Examples Against Gradient Obfuscation Defenses |
Authors | Mohammad Hashemi, Greg Cusack, Eric Keller |
Abstract | It has been shown that adversaries can craft example inputs to neural networks which are similar to legitimate inputs but have been created to purposely cause the neural network to misclassify the input. These adversarial examples are crafted, for example, by calculating gradients of a carefully defined loss function with respect to the input. As a countermeasure, some researchers have tried to design robust models by blocking or obfuscating gradients, even in white-box settings. Another line of research proposes introducing a separate detector to attempt to detect adversarial examples. This approach also makes use of gradient obfuscation techniques, for example, to prevent the adversary from trying to fool the detector. In this paper, we introduce stochastic substitute training, a gray-box approach that can craft adversarial examples for defenses which obfuscate gradients. For those defenses that have tried to make models more robust, with our technique, an adversary can craft adversarial examples with no knowledge of the defense. For defenses that attempt to detect the adversarial examples, with our technique, an adversary only needs very limited information about the defense to craft adversarial examples. We demonstrate our technique by applying it against two defenses which make models more robust and two defenses which detect adversarial examples. |
Tasks | |
Published | 2018-10-23 |
URL | http://arxiv.org/abs/1810.10031v1 |
http://arxiv.org/pdf/1810.10031v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-substitute-training-a-gray-box |
Repo | |
Framework | |
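The gray-box skeleton, stripped of the stochastic ingredients: fit a smooth substitute to the defended model's outputs (only queries are needed, never the defense's obfuscated gradients), then craft adversarial examples on the substitute and transfer them. The names and the FGSM attack choice below are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def substitute_step(substitute, defended_model, x, optimizer):
    """Fit the substitute to the defended model's soft outputs (query access only)."""
    with torch.no_grad():
        target = F.softmax(defended_model(x), dim=1)   # possibly gradient-obfuscated
    optimizer.zero_grad()
    loss = F.kl_div(F.log_softmax(substitute(x), dim=1), target,
                    reduction='batchmean')
    loss.backward()
    optimizer.step()

def fgsm_transfer(substitute, x, y, eps=0.03):
    """Craft adversarial examples on the smooth substitute, then transfer them."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(substitute(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```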
Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA
Title | Energy-Efficient Inference Accelerator for Memory-Augmented Neural Networks on an FPGA |
Authors | Seongsik Park, Jaehee Jang, Seijoon Kim, Sungroh Yoon |
Abstract | Memory-augmented neural networks (MANNs) are designed for question-answering tasks. It is difficult to run a MANN effectively on accelerators designed for other neural networks (NNs), in particular on mobile devices, because MANNs require recurrent data paths and various types of operations related to external memory access. We implement an accelerator for MANNs on a field-programmable gate array (FPGA) based on a data flow architecture. Inference times are also reduced by inference thresholding, which is a data-based maximum inner-product search specialized for natural language tasks. Measurements on the bAbI data show that the energy efficiency of the accelerator (FLOPS/kJ) was higher than that of an NVIDIA TITAN V GPU by a factor of about 125, increasing to 140 with inference thresholding. |
Tasks | Question Answering |
Published | 2018-05-21 |
URL | http://arxiv.org/abs/1805.07978v2 |
http://arxiv.org/pdf/1805.07978v2.pdf | |
PWC | https://paperswithcode.com/paper/energy-efficient-inference-accelerator-for |
Repo | |
Framework | |
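One simplified reading of inference thresholding is an early-exit maximum inner-product search over the memory slots; the fixed threshold below stands in for the data-derived criterion described in the paper.

```python
import numpy as np

def thresholded_mips(query, keys, threshold=0.9):
    """Early-exit maximum inner-product search: stop scanning memory slots
    once a score clears the threshold, saving the remaining dot products."""
    best_i, best_s = -1, -np.inf
    for i, k in enumerate(keys):
        s = float(np.dot(query, k))
        if s > best_s:
            best_i, best_s = i, s
        if best_s >= threshold:    # early exit
            break
    return best_i, best_s
```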
A Review on the Application of Natural Computing in Environmental Informatics
Title | A Review on the Application of Natural Computing in Environmental Informatics |
Authors | Andreas Kamilaris |
Abstract | Natural computing offers new opportunities to understand, model and analyze the complexity of the physical and human-created environment. This paper examines the application of natural computing in environmental informatics, by investigating related work in this research field. Various nature-inspired techniques are presented, which have been employed to solve different relevant problems. Advantages and disadvantages of these techniques are discussed, together with analysis of how natural computing is generally used in environmental research. |
Tasks | |
Published | 2018-08-01 |
URL | http://arxiv.org/abs/1808.00260v1 |
http://arxiv.org/pdf/1808.00260v1.pdf | |
PWC | https://paperswithcode.com/paper/a-review-on-the-application-of-natural |
Repo | |
Framework | |