January 24, 2020

Paper Group NANR 121

Unseen Action Recognition with Unpaired Adversarial Multimodal Learning

Title Unseen Action Recognition with Unpaired Adversarial Multimodal Learning
Authors AJ Piergiovanni, Michael S. Ryoo
Abstract In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.
Tasks Temporal Action Localization
Published 2019-05-01
URL https://openreview.net/forum?id=S14g5s09tm
PDF https://openreview.net/pdf?id=S14g5s09tm
PWC https://paperswithcode.com/paper/unseen-action-recognition-with-unpaired
Repo
Framework
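
The zero-shot classification task named in the abstract can be illustrated with a nearest-neighbor lookup in a shared embedding space. This is a minimal sketch, not the authors' method: the function names and the 3-dimensional embeddings are hypothetical, and the text and video encoders that would produce such vectors are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_classify(video_embedding, label_embeddings):
    """Return the label whose text embedding is closest to the video embedding."""
    return max(
        label_embeddings,
        key=lambda label: cosine_similarity(video_embedding, label_embeddings[label]),
    )

# Hypothetical embeddings, as if text and video encoders had mapped
# two activity names and one video clip into the same space.
label_embeddings = {
    "surfing": [0.9, 0.1, 0.0],
    "cooking": [0.0, 0.2, 0.9],
}
video_embedding = [0.8, 0.3, 0.1]

prediction = zero_shot_classify(video_embedding, label_embeddings)
print(prediction)
```

Because the labels are compared only through their embeddings, an activity with no training videos can still be predicted as long as its text description can be embedded.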

Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets.

Title Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets.
Authors Zeyad El-Zanaty
Abstract The objective of this paper is to describe a classification system built for SemEval-2019 Task 6: OffensEval. This system classifies a tweet as either offensive or not offensive (Sub-task A) and further classifies offensive tweets into categories (Sub-tasks B - C). The system consists of two phases: a brute-force grid search to find the best learners amongst a given set, and an ensemble of a subset of these best learners. The system achieved an F1-score of 0.728 in subtask A, an F1-score of 0.616 in subtask B and an F1-score of 0.509 in subtask C.
Tasks
Published 2019-06-01
URL https://www.aclweb.org/anthology/S19-2144/
PDF https://www.aclweb.org/anthology/S19-2144
PWC https://paperswithcode.com/paper/zeyad-at-semeval-2019-task-6-thats-offensive
Repo
Framework
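
The two phases described in the abstract, searching a set of learners for the best ones and then combining a subset of them, can be sketched as follows. This is an assumption-laden toy: the keyword-rule learners, the `OFF`/`NOT` labels, the validation tweets, and majority voting as the combination rule are all hypothetical stand-ins, since the abstract does not specify them.

```python
from collections import Counter

def accuracy(learner, examples):
    """Fraction of (text, label) examples the learner labels correctly."""
    return sum(learner(text) == label for text, label in examples) / len(examples)

def select_top_k(learners, validation, k):
    """Phase 1: score every candidate learner and keep the k best names."""
    ranked = sorted(
        learners,
        key=lambda name: accuracy(learners[name], validation),
        reverse=True,
    )
    return ranked[:k]

def ensemble_predict(learners, names, text):
    """Phase 2: majority vote over the selected learners (use an odd k)."""
    votes = Counter(learners[name](text) for name in names)
    return votes.most_common(1)[0][0]

# Hypothetical candidate learners: simple keyword rules.
learners = {
    "kw_idiot": lambda t: "OFF" if "idiot" in t else "NOT",
    "kw_stupid": lambda t: "OFF" if "stupid" in t else "NOT",
    "kw_any": lambda t: "OFF" if ("idiot" in t or "stupid" in t) else "NOT",
    "always_off": lambda t: "OFF",
}
validation = [
    ("you idiot", "OFF"),
    ("nice day", "NOT"),
    ("stupid idea", "OFF"),
    ("thank you", "NOT"),
]

chosen = select_top_k(learners, validation, k=3)
print(chosen, ensemble_predict(learners, chosen, "what an idiot"))
```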

Model Compression with Generative Adversarial Networks

Title Model Compression with Generative Adversarial Networks
Authors Ruishan Liu, Nicolo Fusi, Lester Mackey
Abstract More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Model compression (also known as distillation) alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher’s training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets. Building on these results, we propose a comprehensive metric—the Compression Score—to evaluate the quality of synthetic datasets based on their induced model compression performance. The Compression Score captures both data diversity and discriminability, and we illustrate its benefits over the popular Inception Score in the context of image classification.
Tasks Image Classification, Model Compression
Published 2019-05-01
URL https://openreview.net/forum?id=Byxz4n09tQ
PDF https://openreview.net/pdf?id=Byxz4n09tQ
PWC https://paperswithcode.com/paper/model-compression-with-generative-adversarial-1
Repo
Framework
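
The core idea in the abstract, labeling extra synthetic inputs with the teacher so the student sees more of the input space, can be shown on a one-dimensional toy problem. This is only a sketch under loud assumptions: the teacher, the one-threshold student, and the data are hypothetical, and a seeded uniform sampler stands in for the trained GAN generator, which is not implemented here.

```python
import random

def teacher(x):
    """Stand-in for an expensive model: its decision boundary is at 0.5."""
    return 1 if x > 0.5 else 0

def fit_student(inputs):
    """Fit a one-threshold student to the teacher's labels on the given inputs.

    Assumes the inputs contain at least one example of each teacher label.
    """
    negatives = [x for x in inputs if teacher(x) == 0]
    positives = [x for x in inputs if teacher(x) == 1]
    return (max(negatives) + min(positives)) / 2

def sample_synthetic(n, seed=0):
    """Placeholder for a GAN generator: uniform samples on [0, 1]."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Only a few "real" inputs are available for compression.
real_inputs = [0.1, 0.2, 0.95]
threshold_real = fit_student(real_inputs)  # midpoint of 0.2 and 0.95

# Augment the compression set with synthetic inputs labeled by the teacher.
threshold_augmented = fit_student(real_inputs + sample_synthetic(200))

print(threshold_real, threshold_augmented)
```

With only the three real inputs, the student's threshold sits far from the teacher's boundary; the teacher-labeled synthetic inputs fill the gap between 0.2 and 0.95 and pull the threshold toward 0.5.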

Image Aesthetic Assessment Based on Pairwise Comparison: A Unified Approach to Score Regression, Binary Classification, and Personalization

Title Image Aesthetic Assessment Based on Pairwise Comparison: A Unified Approach to Score Regression, Binary Classification, and Personalization
Authors Jun-Tae Lee, Chang-Su Kim
Abstract We propose a unified approach to three tasks: aesthetic score regression, binary aesthetic classification, and personalized aesthetics. First, we develop a comparator to estimate the ratio of aesthetic scores for two images. Then, we construct a pairwise comparison matrix for multiple reference images and an input image, and predict the aesthetic score of the input via the eigenvalue decomposition of the matrix. By varying the reference images, the proposed algorithm can be used for binary aesthetic classification and personalized aesthetics, as well as generic score regression. Experimental results demonstrate that the proposed unified algorithm achieves state-of-the-art performance in all three tasks of image aesthetics.
Tasks
Published 2019-10-01
URL http://openaccess.thecvf.com/content_ICCV_2019/html/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.html
PDF http://openaccess.thecvf.com/content_ICCV_2019/papers/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.pdf
PWC https://paperswithcode.com/paper/image-aesthetic-assessment-based-on-pairwise
Repo
Framework
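
The step of predicting a score from a pairwise comparison matrix can be sketched with the generic technique of taking the matrix's principal eigenvector. This is a minimal illustration, not the paper's exact procedure: it assumes the reference scores are known, that the comparator's ratio estimates are given as plain numbers, and that the eigenvector is rescaled so the reference entries sum to the known reference scores. All names are hypothetical.

```python
def principal_eigenvector(matrix, iterations=50):
    """Power iteration; returns the dominant eigenvector normalized to sum 1."""
    n = len(matrix)
    vector = [1.0] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        vector = [value / total for value in product]
    return vector

def predict_score(reference_scores, input_to_reference_ratios):
    """Estimate the input's score from its score ratios to each reference.

    Entry (i, j) of the comparison matrix is the ratio score_i / score_j;
    the last row and column belong to the input image.
    """
    n = len(reference_scores)
    size = n + 1
    matrix = [[1.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            matrix[i][j] = reference_scores[i] / reference_scores[j]
        matrix[n][i] = input_to_reference_ratios[i]
        matrix[i][n] = 1.0 / input_to_reference_ratios[i]
    vector = principal_eigenvector(matrix)
    # Rescale so the reference entries match the known reference scores.
    scale = sum(reference_scores) / sum(vector[:n])
    return vector[n] * scale

# Hypothetical references with scores 2, 4, 8 and an input whose true score is 5.
reference_scores = [2.0, 4.0, 8.0]
ratios = [5.0 / 2.0, 5.0 / 4.0, 5.0 / 8.0]  # what an ideal comparator would output
predicted = predict_score(reference_scores, ratios)
print(predicted)
```

When the estimated ratios are perfectly consistent, the principal eigenvector is proportional to the score vector, so the input's score is recovered exactly; with noisy ratios the eigenvector gives a consensus estimate.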

I3D-LSTM: A New Model for Human Action Recognition

Title I3D-LSTM: A New Model for Human Action Recognition
Authors Xianyuan Wang, Zhenjiang Miao, Ruyi Zhang, Shanshan Hao
Abstract Action recognition, which attempts to classify different human actions in videos, has recently become an active research topic. Current mainstream methods generally use an ImageNet-pretrained model as the feature extractor; however, pretraining a video classification model on a large still-image dataset is not the optimal choice. Moreover, very few works note that a 3D convolutional neural network (3D CNN) is better suited to extracting low-level spatial-temporal features, while a recurrent neural network (RNN) is better suited to modelling high-level temporal feature sequences. Consequently, we propose a novel model to address these two problems. First, we pretrain a 3D CNN model on the large video action recognition dataset Kinetics to improve the generality of the model. Then, long short-term memory (LSTM) is introduced to model the high-level temporal features produced by the Kinetics-pretrained 3D CNN model. Our experimental results show that the Kinetics-pretrained model generally outperforms the ImageNet-pretrained model, and our proposed network achieves leading performance on the UCF-101 dataset.
Tasks Action Recognition In Videos, Temporal Action Localization
Published 2019-08-09
URL https://doi.org/10.1088/1757-899X/569/3/032035
PDF https://iopscience.iop.org/article/10.1088/1757-899X/569/3/032035/pdf
PWC https://paperswithcode.com/paper/i3d-lstm-a-new-model-for-human-action
Repo
Framework

Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features

Title Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features
Authors Kim Cheng SHEANG
Abstract This paper describes our experiments with a Complex Word Identification system that uses a deep learning approach with word embeddings and engineered features.
Tasks Complex Word Identification, Word Embeddings
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-2013/
PDF https://www.aclweb.org/anthology/R19-2013
PWC https://paperswithcode.com/paper/multilingual-complex-word-identification
Repo
Framework

Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach

Title Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach
Authors Darshan Adiga, Shabir Ahmad Bhat, Muzaffar Bashir Shah, Viveka Vyeth
Abstract In this paper, we present a relationship extraction based methodology for table structure recognition in PDF documents. The proposed deep learning-based method takes a bottom-up approach to table recognition in PDF documents. We outline the shortcomings of conventional approaches based on heuristics and machine learning-based top-down approaches. In this work, we explain how the task of table structure recognition can be modeled as a cell relationship extraction task and the importance of the bottom-up approach in recognizing the table cells. We use Multilayer Feedforward Neural Network for table structure recognition and compare the results of three feature sets. To gauge the performance of the proposed method, we prepared a training dataset using 250 tables in PDF documents, carefully selecting the table structures that are most commonly found in the documents. Our model achieves an overall accuracy of 97.95% and an F1-Score of 92.62% on the test dataset.
Tasks
Published 2019-09-01
URL https://www.aclweb.org/anthology/R19-1001/
PDF https://www.aclweb.org/anthology/R19-1001
PWC https://paperswithcode.com/paper/table-structure-recognition-based-on-cell
Repo
Framework

UCSMNLP: Statistical Machine Translation for WAT 2019

Title UCSMNLP: Statistical Machine Translation for WAT 2019
Authors Aye Thida, Nway Nway Han, Sheinn Thawtar Oo, Khin Thet Htar
Abstract This paper presents UCSMNLP's submission to the WAT 2019 Translation Tasks, focusing on Myanmar-English translation. A phrase-based statistical machine translation (PBSMT) system is built using additional resources: a Named Entity Recognition (NER) corpus and a bilingual dictionary created with Google Translate (GT). The system also applies a listwise reranking process to improve translation quality, and tuning is done by changing the initial distortion weight. The experimental results show that PBSMT using the additional resources, with an initial distortion weight of 0.4 and the listwise reranking function, outperforms the baseline system.
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5210/
PDF https://www.aclweb.org/anthology/D19-5210
PWC https://paperswithcode.com/paper/ucsmnlp-statistical-machine-translation-for
Repo
Framework

Information Regularized Neural Networks

Title Information Regularized Neural Networks
Authors Tianchen Zhao, Dejiao Zhang, Zeyu Sun, Honglak Lee
Abstract We formulate an information-based optimization problem for supervised classification. For invertible neural networks, the control of these information terms is passed down to the latent features and parameter matrix in the last fully connected layer, given that mutual information is invariant under invertible maps. We propose an objective function and prove that it solves the optimization problem. Our framework allows us to learn latent features in a more interpretable form while improving the classification performance. We perform extensive quantitative and qualitative experiments in comparison with the existing state-of-the-art classification models.
Tasks
Published 2019-05-01
URL https://openreview.net/forum?id=BJgvg30ctX
PDF https://openreview.net/pdf?id=BJgvg30ctX
PWC https://paperswithcode.com/paper/information-regularized-neural-networks
Repo
Framework

Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019

Title Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019
Authors Aizhan Imankulova, Masahiro Kaneko, Mamoru Komachi
Abstract We introduce our system submitted to the News Commentary task (Japanese<->Russian) of the 6th Workshop on Asian Translation. The goal of this shared task is to study extremely low resource situations for distant language pairs. It is known that using parallel corpora of different language pairs as training data is effective for multilingual neural machine translation models in extremely low resource scenarios. Therefore, to improve the translation quality of the Japanese<->Russian language pair, our method leverages other in-domain Japanese-English and English-Russian parallel corpora as additional training data for our multilingual NMT model.
Tasks Machine Translation
Published 2019-11-01
URL https://www.aclweb.org/anthology/D19-5221/
PDF https://www.aclweb.org/anthology/D19-5221
PWC https://paperswithcode.com/paper/japanese-russian-tmu-neural-machine
Repo
Framework

Neural TTS Stylization with Adversarial and Collaborative Games

Title Neural TTS Stylization with Adversarial and Collaborative Games
Authors Shuang Ma, Daniel Mcduff, Yale Song
Abstract The modeling of style when synthesizing natural human speech from text has been the focus of significant attention. Some state-of-the-art approaches train an encoder-decoder network on paired text and audio samples (x_txt, x_aud) by encouraging its output to reconstruct x_aud. The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud. Unfortunately, modeling style in TTS is somewhat under-determined and training models with a reconstruction loss alone is insufficient to disentangle content and style from other factors of variation. In this work, we introduce an end-to-end TTS model that offers enhanced content-style disentanglement ability and controllability. We achieve this by combining a pairwise training procedure, an adversarial game, and a collaborative game into one training scheme. The adversarial game concentrates the true data distribution, and the collaborative game minimizes the distance between real samples and generated samples in both the original space and the latent space. As a result, the proposed model delivers a highly controllable generator and a disentangled representation. Benefiting from the separate modeling of style and content, our model can generate human fidelity speech that satisfies the desired style conditions. Our model achieves state-of-the-art results across multiple tasks, including style transfer (content and style swapping), emotion modeling, and identity transfer (fitting a new speaker’s voice).
Tasks Style Transfer
Published 2019-05-01
URL https://openreview.net/forum?id=ByzcS3AcYX
PDF https://openreview.net/pdf?id=ByzcS3AcYX
PWC https://paperswithcode.com/paper/neural-tts-stylization-with-adversarial-and
Repo
Framework

SHAMANN: Shared Memory Augmented Neural Networks

Title SHAMANN: Shared Memory Augmented Neural Networks
Authors Cosmin I. Bercea, Olivier Pauly, Andreas K. Maier, Florin C. Ghesu
Abstract Current state-of-the-art methods for semantic segmentation use deep neural networks to learn the segmentation mask from the input image signal as an image-to-image mapping. While these methods effectively exploit global image context, the learning and computational complexities are high. We propose shared memory augmented neural network actors as a dynamically scalable alternative. Based on a decomposition of the image into a sequence of local patches, we train such actors to sequentially segment each patch. To further increase the robustness and better capture shape priors, an external memory module is shared between different actors, providing an implicit mechanism for image information exchange. Finally, the patch-wise predictions are aggregated to a complete segmentation mask. We demonstrate the benefits of the new paradigm on a challenging lung segmentation problem based on chest X-Ray images, as well as on two synthetic tasks based on the MNIST dataset. On the X-Ray data, our method achieves state-of-the-art accuracy with a significantly reduced model size compared to reference methods. In addition, we reduce the number of failure cases by at least half.
Tasks Semantic Segmentation
Published 2019-01-01
URL https://openreview.net/forum?id=BJeWOi09FQ
PDF https://openreview.net/pdf?id=BJeWOi09FQ
PWC https://paperswithcode.com/paper/shamann-shared-memory-augmented-neural
Repo
Framework
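
The patch-wise pipeline in the abstract (decompose the image into local patches, segment each patch, aggregate the predictions into one mask) can be sketched as below. This is a simplified illustration under stated assumptions: a fixed intensity threshold stands in for the trained actor, the shared external memory is not modeled, and the function names and the 4x4 image are hypothetical.

```python
def split_into_patches(image, size):
    """Decompose a 2D image (list of rows) into (top, left, patch) tuples."""
    patches = []
    for top in range(0, len(image), size):
        for left in range(0, len(image[0]), size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append((top, left, patch))
    return patches

def segment_patch(patch, threshold=0.5):
    """Stand-in for a trained actor: label pixels brighter than the threshold."""
    return [[1 if value > threshold else 0 for value in row] for row in patch]

def aggregate(patch_masks, height, width):
    """Stitch the patch-wise predictions back into a full segmentation mask."""
    mask = [[0] * width for _ in range(height)]
    for top, left, patch_mask in patch_masks:
        for i, row in enumerate(patch_mask):
            for j, value in enumerate(row):
                mask[top + i][left + j] = value
    return mask

# Hypothetical 4x4 grayscale image, processed as a sequence of 2x2 patches.
image = [
    [0.1, 0.2, 0.8, 0.9],
    [0.0, 0.3, 0.7, 0.6],
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.6, 0.0, 0.4],
]
patch_masks = [
    (top, left, segment_patch(patch))
    for top, left, patch in split_into_patches(image, size=2)
]
mask = aggregate(patch_masks, height=4, width=4)
print(mask)
```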

Unsupervised Learning of Dense Shape Correspondence

Title Unsupervised Learning of Dense Shape Correspondence
Authors Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bronstein, Ron Kimmel
Abstract We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
Tasks
Published 2019-06-01
URL http://openaccess.thecvf.com/content_CVPR_2019/html/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.html
PDF http://openaccess.thecvf.com/content_CVPR_2019/papers/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.pdf
PWC https://paperswithcode.com/paper/unsupervised-learning-of-dense-shape
Repo
Framework
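
The geometric criterion in the abstract, that a good correspondence should approximately preserve distances on the surface, can be made concrete with a small distortion measure. This is a sketch under assumptions: the 3-point distance matrices are hypothetical, and a brute-force search over permutations stands in for the learned model, which the abstract does not detail.

```python
from itertools import permutations

def distortion(dist_x, dist_y, mapping):
    """Mean squared change in pairwise distance when point i of X maps to mapping[i] of Y."""
    n = len(dist_x)
    return sum(
        (dist_x[i][j] - dist_y[mapping[i]][mapping[j]]) ** 2
        for i in range(n)
        for j in range(n)
    ) / (n * n)

def best_correspondence(dist_x, dist_y):
    """Pick the distortion-minimizing bijection (brute force, tiny shapes only)."""
    n = len(dist_x)
    return min(permutations(range(n)), key=lambda m: distortion(dist_x, dist_y, m))

# Hypothetical pairwise distances on shape X, and on shape Y, which is the
# same shape with its points listed in a different order.
dist_x = [
    [0, 1, 3],
    [1, 0, 2],
    [3, 2, 0],
]
dist_y = [
    [0, 2, 1],
    [2, 0, 3],
    [1, 3, 0],
]

mapping = best_correspondence(dist_x, dist_y)
print(mapping, distortion(dist_x, dist_y, mapping))
```

No labeled correspondences are used anywhere: the distances alone determine which mapping is preferred, which is what makes the criterion usable without annotated data.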

ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic

Title ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic
Authors Arijit Ghosh Chowdhury, Aniket Didolkar, Ramit Sawhney, Rajiv Ratn Shah
Abstract The rapid spread of social media has led to some undesirable consequences, such as the rapid increase of hateful content and offensive language. Religious hate speech, in particular, often leads to unrest and sometimes escalates to violence against people on the basis of their religious affiliations. The richness of Arabic morphology and the limited available resources make this task especially challenging. The current state-of-the-art approaches to detecting hate speech in Arabic rely entirely on textual (lexical and semantic) cues. Our proposed methodology contends that leveraging community interaction can better help us profile hate speech content on social media. Our proposed ARHNet (Arabic Religious Hate Speech Net) model incorporates both Arabic word embeddings and social network graphs for the detection of religious hate speech.
Tasks Word Embeddings
Published 2019-07-01
URL https://www.aclweb.org/anthology/P19-2038/
PDF https://www.aclweb.org/anthology/P19-2038
PWC https://paperswithcode.com/paper/arhnet-leveraging-community-interaction-for
Repo
Framework

Improving American Sign Language Recognition with Synthetic Data

Title Improving American Sign Language Recognition with Synthetic Data
Authors Jungi Kim, Patricia O'Neill-Brown
Abstract
Tasks Sign Language Recognition
Published 2019-08-01
URL https://www.aclweb.org/anthology/W19-6615/
PDF https://www.aclweb.org/anthology/W19-6615
PWC https://paperswithcode.com/paper/improving-american-sign-language-recognition
Repo
Framework