January 24, 2020


Paper Group NANR 121


Unseen Action Recognition with Unpaired Adversarial Multimodal Learning


Title	Unseen Action Recognition with Unpaired Adversarial Multimodal Learning
Authors	AJ Piergiovanni, Michael S. Ryoo
Abstract	In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning.
Tasks	Temporal Action Localization
Published	2019-05-01
URL	https://openreview.net/forum?id=S14g5s09tm
PDF	https://openreview.net/pdf?id=S14g5s09tm
PWC	https://paperswithcode.com/paper/unseen-action-recognition-with-unpaired
Repo
Framework

Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets.


Title	Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets.
Authors	Zeyad El-Zanaty
Abstract	The objective of this paper is to describe a classification system built for SemEval-2019 Task 6: OffensEval. This system classifies a tweet as either offensive or not offensive (Sub-task A) and further classifies offensive tweets into categories (Sub-tasks B - C). The system consists of two phases: a brute-force grid search to find the best learners amongst a given set, and an ensemble of a subset of these best learners. The system achieved an F1-score of 0.728, ranking in subtask A, an F1-score of 0.616 in subtask B and an F1-score of 0.509 in subtask C.
Tasks
Published	2019-06-01
URL	https://www.aclweb.org/anthology/S19-2144/
PDF	https://www.aclweb.org/anthology/S19-2144
PWC	https://paperswithcode.com/paper/zeyad-at-semeval-2019-task-6-thats-offensive
Repo
Framework
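
The two-phase design described in the abstract above (an exhaustive grid search over candidate learners, followed by an ensemble of the best ones) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the keyword-counting classifier, the toy tweets, the grid, and the choice of binary F1 with majority voting are stand-ins chosen for brevity, not the learners or settings used in the paper.

```python
from itertools import product


def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_keyword_classifier(keywords, threshold):
    """Hypothetical learner: flag a text if it has >= threshold keyword hits."""
    def predict(text):
        hits = sum(1 for word in text.lower().split() if word in keywords)
        return 1 if hits >= threshold else 0
    return predict


def grid_search(grid, texts, labels):
    """Phase 1: score every configuration in the grid, best first."""
    scored = []
    for keywords, threshold in product(grid["keywords"], grid["threshold"]):
        clf = make_keyword_classifier(keywords, threshold)
        preds = [clf(t) for t in texts]
        scored.append((f1_score(labels, preds), clf))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def majority_vote(classifiers, text):
    """Phase 2: ensemble prediction by strict majority."""
    votes = sum(clf(text) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0


# Toy validation data (made up for illustration).
texts = ["you are an idiot", "have a nice day",
         "what a stupid idiot move", "lovely weather today"]
labels = [1, 0, 1, 0]
grid = {
    "keywords": [frozenset({"idiot"}), frozenset({"idiot", "stupid"}),
                 frozenset({"nice"})],
    "threshold": [1, 2],
}

ranked = grid_search(grid, texts, labels)
ensemble = [clf for _, clf in ranked[:3]]  # keep the three best learners
```

Calling `majority_vote(ensemble, "you idiot")` then returns the ensemble's label for a new text. Keeping only the top-ranked configurations is one simple way to pick the ensemble subset; the paper may select it differently.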

Model Compression with Generative Adversarial Networks


Title	Model Compression with Generative Adversarial Networks
Authors	Ruishan Liu, Nicolo Fusi, Lester Mackey
Abstract	More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Model compression (also known as distillation) alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher’s training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets. Building on these results, we propose a comprehensive metric—the Compression Score—to evaluate the quality of synthetic datasets based on their induced model compression performance. The Compression Score captures both data diversity and discriminability, and we illustrate its benefits over the popular Inception Score in the context of image classification.
Tasks	Image Classification, Model Compression
Published	2019-05-01
URL	https://openreview.net/forum?id=Byxz4n09tQ
PDF	https://openreview.net/pdf?id=Byxz4n09tQ
PWC	https://paperswithcode.com/paper/model-compression-with-generative-adversarial-1
Repo
Framework
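
As a rough illustration of the idea in the abstract above (a student trained to mimic a teacher on a compression set augmented with synthetic samples), here is a minimal Python sketch. It rests on loudly stated assumptions: the teacher is a toy vote of three threshold rules, the student is a single decision stump, and the GAN is replaced by simple Gaussian jitter around real points. None of these are the paper's models.

```python
import random


def teacher(x):
    """Hypothetical 'expensive' teacher: majority vote of three threshold rules."""
    votes = sum(1 for cut in (0.4, 0.5, 0.6) if x > cut)
    return 1 if votes >= 2 else 0


def synthesize(real_points, n_samples, noise=0.05, seed=0):
    """Stand-in for a GAN sampler: jitter randomly chosen real points."""
    rng = random.Random(seed)
    return [rng.choice(real_points) + rng.gauss(0.0, noise)
            for _ in range(n_samples)]


def fit_stump(points, labels):
    """Fit the student: the threshold t whose rule 'x > t' best matches labels."""
    best_t, best_agree = None, -1
    for t in sorted(points):
        agree = sum(1 for x, y in zip(points, labels)
                    if (1 if x > t else 0) == y)
        if agree > best_agree:
            best_t, best_agree = t, agree
    return best_t


# Real data reused from teacher training, plus synthetic augmentation.
real = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]
compression_set = real + synthesize(real, n_samples=50)

# The teacher labels the whole compression set; the student mimics it.
teacher_labels = [teacher(x) for x in compression_set]
threshold = fit_stump(compression_set, teacher_labels)


def student(x):
    """Cheap student model: a single threshold comparison."""
    return 1 if x > threshold else 0


agreement = sum(student(x) == teacher(x)
                for x in compression_set) / len(compression_set)
```

The synthetic points fill in the input space between the few real samples, which gives the student more teacher-labelled examples near the decision boundary than the real data alone would.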

Image Aesthetic Assessment Based on Pairwise Comparison A Unified Approach to Score Regression, Binary Classification, and Personalization


Title	Image Aesthetic Assessment Based on Pairwise Comparison A Unified Approach to Score Regression, Binary Classification, and Personalization
Authors	Jun-Tae Lee, Chang-Su Kim
Abstract	We propose a unified approach to three tasks: aesthetic score regression, binary aesthetic classification, and personalized aesthetics. First, we develop a comparator to estimate the ratio of aesthetic scores for two images. Then, we construct a pairwise comparison matrix for multiple reference images and an input image, and predict the aesthetic score of the input via the eigenvalue decomposition of the matrix. By varying the reference images, the proposed algorithm can be used for binary aesthetic classification and personalized aesthetics, as well as generic score regression. Experimental results demonstrate that the proposed unified algorithm achieves state-of-the-art performance in all three tasks of image aesthetics.
Tasks
Published	2019-10-01
URL	http://openaccess.thecvf.com/content_ICCV_2019/html/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.html
PDF	http://openaccess.thecvf.com/content_ICCV_2019/papers/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.pdf
PWC	https://paperswithcode.com/paper/image-aesthetic-assessment-based-on-pairwise
Repo
Framework
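
The pairwise-comparison step in the abstract above can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's implementation: the learned comparator is replaced by exact ratios computed from made-up scores, the principal eigenvector is found by plain power iteration, and the final rescaling against the known reference scores is one simple choice among several.

```python
def principal_eigenvector(matrix, iterations=100):
    """Power iteration; returns the dominant eigenvector, normalized to sum 1."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v


def estimate_score(ratio_matrix, reference_scores):
    """Estimate the input image's score.

    ratio_matrix[i][j] approximates score_i / score_j. The last row and
    column belong to the input image; the others are reference images
    whose scores are known.
    """
    v = principal_eigenvector(ratio_matrix)
    n_ref = len(reference_scores)
    # The eigenvector is only defined up to scale, so anchor it to the
    # known reference scores (an assumed, simple rescaling rule).
    scale = sum(reference_scores) / sum(v[:n_ref])
    return scale * v[-1]


def consistent_ratio_matrix(scores):
    """Toy stand-in for the comparator: exact score ratios."""
    return [[a / b for b in scores] for a in scores]


# Three reference images with known scores and one input image whose
# true score (5.0) is hidden from estimate_score.
references = [4.0, 6.0, 8.0]
matrix = consistent_ratio_matrix(references + [5.0])
predicted = estimate_score(matrix, references)
```

When the ratios are perfectly consistent the matrix has rank one and its principal eigenvector is proportional to the score vector, which is why the eigen-decomposition recovers the input's score. With noisy comparator outputs the eigenvector acts as a consensus over all pairwise estimates.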

I3D-LSTM: A New Model for Human Action Recognition


Title	I3D-LSTM: A New Model for Human Action Recognition
Authors	Xianyuan Wang, Zhenjiang Miao, Ruyi Zhang, Shanshan Hao
Abstract	Action recognition, which attempts to classify human actions in videos, has recently become a heated research topic. Current mainstream methods generally use an ImageNet-pretrained model as the feature extractor; however, pretraining a video classification model on a huge still-image dataset is not the optimal choice. Moreover, very few works note that a 3D convolutional neural network (3D CNN) is better for low-level spatial-temporal feature extraction, while a recurrent neural network (RNN) is better for modelling high-level temporal feature sequences. Consequently, we propose a novel model to address these two problems. First, we pretrain a 3D CNN model on the large video action recognition dataset Kinetics to improve the generality of the model. Then, long short-term memory (LSTM) is introduced to model the high-level temporal features produced by the Kinetics-pretrained 3D CNN model. Our experimental results show that the Kinetics-pretrained model generally outperforms the ImageNet-pretrained model, and our proposed network achieves leading performance on the UCF-101 dataset.
Tasks	Action Recognition In Videos, Temporal Action Localization
Published	2019-08-09
URL	https://doi.org/10.1088/1757-899X/569/3/032035
PDF	https://iopscience.iop.org/article/10.1088/1757-899X/569/3/032035/pdf
PWC	https://paperswithcode.com/paper/i3d-lstm-a-new-model-for-human-action
Repo
Framework

Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features


Title	Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features
Authors	Kim Cheng SHEANG
Abstract	This paper describes our experiments with a Complex Word Identification system that uses a deep learning approach with word embeddings and engineered features.
Tasks	Complex Word Identification, Word Embeddings
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-2013/
PDF	https://www.aclweb.org/anthology/R19-2013
PWC	https://paperswithcode.com/paper/multilingual-complex-word-identification
Repo
Framework

Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach


Title	Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach
Authors	Darshan Adiga, Shabir Ahmad Bhat, Muzaffar Bashir Shah, Viveka Vyeth
Abstract	In this paper, we present a relationship extraction based methodology for table structure recognition in PDF documents. The proposed deep learning-based method takes a bottom-up approach to table recognition in PDF documents. We outline the shortcomings of conventional approaches based on heuristics and machine learning-based top-down approaches. In this work, we explain how the task of table structure recognition can be modeled as a cell relationship extraction task and the importance of the bottom-up approach in recognizing the table cells. We use Multilayer Feedforward Neural Network for table structure recognition and compare the results of three feature sets. To gauge the performance of the proposed method, we prepared a training dataset using 250 tables in PDF documents, carefully selecting the table structures that are most commonly found in the documents. Our model achieves an overall accuracy of 97.95% and an F1-Score of 92.62% on the test dataset.
Tasks
Published	2019-09-01
URL	https://www.aclweb.org/anthology/R19-1001/
PDF	https://www.aclweb.org/anthology/R19-1001
PWC	https://paperswithcode.com/paper/table-structure-recognition-based-on-cell
Repo
Framework

UCSMNLP: Statistical Machine Translation for WAT 2019


Title	UCSMNLP: Statistical Machine Translation for WAT 2019
Authors	Aye Thida, Nway Nway Han, Sheinn Thawtar Oo, Khin Thet Htar
Abstract	This paper presents UCSMNLP's submission to the WAT 2019 Translation Tasks, focusing on Myanmar-English translation. A phrase-based statistical machine translation (PBSMT) system is built using additional resources: a Named Entity Recognition (NER) corpus and a bilingual dictionary created with Google Translate (GT). The system also adopts a listwise reranking process to improve translation quality, and tuning is done by changing the initial distortion weight. The experimental results show that PBSMT using the additional resources, with an initial distortion weight of 0.4 and the listwise reranking function, outperforms the baseline system.
Tasks	Machine Translation
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5210/
PDF	https://www.aclweb.org/anthology/D19-5210
PWC	https://paperswithcode.com/paper/ucsmnlp-statistical-machine-translation-for
Repo
Framework

Information Regularized Neural Networks


Title	Information Regularized Neural Networks
Authors	Tianchen Zhao, Dejiao Zhang, Zeyu Sun, Honglak Lee
Abstract	We formulate an information-based optimization problem for supervised classification. For invertible neural networks, the control of these information terms is passed down to the latent features and parameter matrix in the last fully connected layer, given that mutual information is invariant under invertible maps. We propose an objective function and prove that it solves the optimization problem. Our framework allows us to learn latent features in a more interpretable form while improving the classification performance. We perform extensive quantitative and qualitative experiments in comparison with the existing state-of-the-art classification models.
Tasks
Published	2019-05-01
URL	https://openreview.net/forum?id=BJgvg30ctX
PDF	https://openreview.net/pdf?id=BJgvg30ctX
PWC	https://paperswithcode.com/paper/information-regularized-neural-networks
Repo
Framework

Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019


Title	Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019
Authors	Aizhan Imankulova, Masahiro Kaneko, Mamoru Komachi
Abstract	We introduce our system submitted to the News Commentary task (Japanese<->Russian) of the 6th Workshop on Asian Translation. The goal of this shared task is to study extremely low resource situations for distant language pairs. It is known that using parallel corpora of different language pairs as training data is effective for multilingual neural machine translation models in extremely low resource scenarios. Therefore, to improve the translation quality of the Japanese<->Russian language pair, our method leverages other in-domain Japanese-English and English-Russian parallel corpora as additional training data for our multilingual NMT model.
Tasks	Machine Translation
Published	2019-11-01
URL	https://www.aclweb.org/anthology/D19-5221/
PDF	https://www.aclweb.org/anthology/D19-5221
PWC	https://paperswithcode.com/paper/japanese-russian-tmu-neural-machine
Repo
Framework

Neural TTS Stylization with Adversarial and Collaborative Games


Title	Neural TTS Stylization with Adversarial and Collaborative Games
Authors	Shuang Ma, Daniel Mcduff, Yale Song
Abstract	The modeling of style when synthesizing natural human speech from text has been the focus of significant attention. Some state-of-the-art approaches train an encoder-decoder network on paired text and audio samples (x_txt, x_aud) by encouraging its output to reconstruct x_aud. The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud. Unfortunately, modeling style in TTS is somewhat under-determined and training models with a reconstruction loss alone is insufficient to disentangle content and style from other factors of variation. In this work, we introduce an end-to-end TTS model that offers enhanced content-style disentanglement ability and controllability. We achieve this by combining a pairwise training procedure, an adversarial game, and a collaborative game into one training scheme. The adversarial game concentrates the true data distribution, and the collaborative game minimizes the distance between real samples and generated samples in both the original space and the latent space. As a result, the proposed model delivers a highly controllable generator, and a disentangled representation. Benefiting from the separate modeling of style and content, our model can generate human fidelity speech that satisfies the desired style conditions. Our model achieves state-of-the-art results across multiple tasks, including style transfer (content and style swapping), emotion modeling, and identity transfer (fitting a new speaker’s voice).
Tasks	Style Transfer
Published	2019-05-01
URL	https://openreview.net/forum?id=ByzcS3AcYX
PDF	https://openreview.net/pdf?id=ByzcS3AcYX
PWC	https://paperswithcode.com/paper/neural-tts-stylization-with-adversarial-and
Repo
Framework

SHAMANN: Shared Memory Augmented Neural Networks


Title	SHAMANN: Shared Memory Augmented Neural Networks
Authors	Cosmin I. Bercea, Olivier Pauly, Andreas K. Maier, Florin C. Ghesu
Abstract	Current state-of-the-art methods for semantic segmentation use deep neural networks to learn the segmentation mask from the input image signal as an image-to-image mapping. While these methods effectively exploit global image context, the learning and computational complexities are high. We propose shared memory augmented neural network actors as a dynamically scalable alternative. Based on a decomposition of the image into a sequence of local patches, we train such actors to sequentially segment each patch. To further increase the robustness and better capture shape priors, an external memory module is shared between different actors, providing an implicit mechanism for image information exchange. Finally, the patch-wise predictions are aggregated to a complete segmentation mask. We demonstrate the benefits of the new paradigm on a challenging lung segmentation problem based on chest X-Ray images, as well as on two synthetic tasks based on the MNIST dataset. On the X-Ray data, our method achieves state-of-the-art accuracy with a significantly reduced model size compared to reference methods. In addition, we reduce the number of failure cases by at least half.
Tasks	Semantic Segmentation
Published	2019-01-01
URL	https://openreview.net/forum?id=BJeWOi09FQ
PDF	https://openreview.net/pdf?id=BJeWOi09FQ
PWC	https://paperswithcode.com/paper/shamann-shared-memory-augmented-neural
Repo
Framework

Unsupervised Learning of Dense Shape Correspondence


Title	Unsupervised Learning of Dense Shape Correspondence
Authors	Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bronstein, Ron Kimmel
Abstract	We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency.
Tasks
Published	2019-06-01
URL	http://openaccess.thecvf.com/content_CVPR_2019/html/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.html
PDF	http://openaccess.thecvf.com/content_CVPR_2019/papers/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.pdf
PWC	https://paperswithcode.com/paper/unsupervised-learning-of-dense-shape
Repo
Framework
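
The geometric criterion mentioned in the abstract above (natural deformations approximately preserve pairwise distances on the surface) can be made concrete with a small Python sketch. It is a simplified illustration under assumptions: a hard point-to-point map stands in for whatever correspondence the model predicts, and toy distances along a four-point path stand in for distances measured on real 3D shapes.

```python
def distortion(dist_x, dist_y, mapping):
    """Metric distortion of a correspondence.

    dist_x[i][j] and dist_y[k][l] are pairwise distances on shapes X and Y.
    mapping[i] is the index on Y matched to point i on X. The result is the
    sum of squared differences between each distance on X and the distance
    between the matched points on Y; zero means the map is an isometry.
    """
    n = len(dist_x)
    return sum(
        (dist_x[i][j] - dist_y[mapping[i]][mapping[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


# Toy shapes: four points on a path with unit spacing (hypothetical data).
dist_x = [[abs(i - j) for j in range(4)] for i in range(4)]
dist_y = [row[:] for row in dist_x]

identity = [0, 1, 2, 3]   # the ground-truth correspondence
flipped = [3, 2, 1, 0]    # a symmetric flip, also distance-preserving
swapped = [1, 0, 2, 3]    # a wrong correspondence

scores = {
    "identity": distortion(dist_x, dist_y, identity),
    "flipped": distortion(dist_x, dist_y, flipped),
    "swapped": distortion(dist_x, dist_y, swapped),
}
```

Because the criterion needs only the two distance matrices, it can score a candidate correspondence without any annotated matches. The flipped map also scores zero, which shows that a purely metric criterion cannot by itself distinguish a shape's intrinsic symmetries.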

ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic


Title	ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic
Authors	Arijit Ghosh Chowdhury, Aniket Didolkar, Ramit Sawhney, Rajiv Ratn Shah
Abstract	The rapid spread of social media has led to some undesirable consequences like the rapid increase of hateful content and offensive language. Religious Hate Speech, in particular, often leads to unrest and sometimes escalates to violence against people on the basis of their religious affiliations. The richness of the Arabic morphology and the limited available resources make this task especially challenging. The current state-of-the-art approaches to detect hate speech in Arabic rely entirely on textual (lexical and semantic) cues. Our proposed methodology contends that leveraging Community-Interaction can better help us profile hate speech content on social media. Our proposed ARHNet (Arabic Religious Hate Speech Net) model incorporates both Arabic Word Embeddings and Social Network Graphs for the detection of religious hate speech.
Tasks	Word Embeddings
Published	2019-07-01
URL	https://www.aclweb.org/anthology/P19-2038/
PDF	https://www.aclweb.org/anthology/P19-2038
PWC	https://paperswithcode.com/paper/arhnet-leveraging-community-interaction-for
Repo
Framework

Improving American Sign Language Recognition with Synthetic Data


Title	Improving American Sign Language Recognition with Synthetic Data
Authors	Jungi Kim, Patricia O'Neill-Brown
Abstract
Tasks	Sign Language Recognition
Published	2019-08-01
URL	https://www.aclweb.org/anthology/W19-6615/
PDF	https://www.aclweb.org/anthology/W19-6615
PWC	https://paperswithcode.com/paper/improving-american-sign-language-recognition
Repo
Framework