Paper Group NANR 121
Unseen Action Recognition with Unpaired Adversarial Multimodal Learning
Title | Unseen Action Recognition with Unpaired Adversarial Multimodal Learning |
Authors | AJ Piergiovanni, Michael S. Ryoo |
Abstract | In this paper, we present a method to learn a joint multimodal representation space that allows for the recognition of unseen activities in videos. We compare the effect of placing various constraints on the embedding space using paired text and video data. Additionally, we propose a method to improve the joint embedding space using an adversarial formulation with unpaired text and video data. In addition to testing on publicly available datasets, we introduce a new, large-scale text/video dataset. We experimentally confirm that learning such a shared embedding space benefits three difficult tasks: (i) zero-shot activity classification, (ii) unsupervised activity discovery, and (iii) unseen activity captioning. |
Tasks | Temporal Action Localization |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=S14g5s09tm |
https://openreview.net/pdf?id=S14g5s09tm | |
PWC | https://paperswithcode.com/paper/unseen-action-recognition-with-unpaired |
Repo | |
Framework | |
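The abstract describes a joint text/video embedding trained on paired data, plus an adversarial formulation that aligns unpaired text and video. Below is a minimal, hypothetical sketch of that idea in PyTorch; the feature dimensions, encoder shapes, and loss choices are assumptions rather than the authors' implementation.

```python
# Hedged sketch: paired alignment in a shared embedding space, plus an adversarial
# critic that makes unpaired video and text embeddings indistinguishable.
import torch
import torch.nn as nn

EMB = 256  # assumed embedding size

video_enc = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, EMB))
text_enc = nn.Sequential(nn.Linear(300, 512), nn.ReLU(), nn.Linear(512, EMB))
critic = nn.Sequential(nn.Linear(EMB, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

def paired_loss(video_feat, text_feat):
    # pull embeddings of paired video/text samples together
    return (video_enc(video_feat) - text_enc(text_feat)).pow(2).sum(dim=1).mean()

def adversarial_losses(unpaired_video, unpaired_text):
    zv, zt = video_enc(unpaired_video), text_enc(unpaired_text)
    # the critic tries to tell the modalities apart ...
    d_loss = bce(critic(zv.detach()), torch.ones(len(zv), 1)) + \
             bce(critic(zt.detach()), torch.zeros(len(zt), 1))
    # ... while the encoders try to fool it, aligning the two distributions
    g_loss = bce(critic(zv), torch.zeros(len(zv), 1)) + \
             bce(critic(zt), torch.ones(len(zt), 1))
    return d_loss, g_loss
```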
Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets.
Title | Zeyad at SemEval-2019 Task 6: That’s Offensive! An All-Out Search For An Ensemble To Identify And Categorize Offense in Tweets. |
Authors | Zeyad El-Zanaty |
Abstract | The objective of this paper is to provide a description of a classification system built for SemEval-2019 Task 6: OffensEval. The system classifies a tweet as either offensive or not offensive (Sub-task A) and further classifies offensive tweets into categories (Sub-tasks B and C). The system consists of two phases: a brute-force grid search to find the best learners amongst a given set, and an ensemble of a subset of these best learners. The system achieved an F1-score of 0.728 in Sub-task A, an F1-score of 0.616 in Sub-task B, and an F1-score of 0.509 in Sub-task C. |
Tasks | |
Published | 2019-06-01 |
URL | https://www.aclweb.org/anthology/S19-2144/ |
https://www.aclweb.org/anthology/S19-2144 | |
PWC | https://paperswithcode.com/paper/zeyad-at-semeval-2019-task-6-thats-offensive |
Repo | |
Framework | |
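As a rough illustration of the two-phase pipeline in the abstract (grid search over candidate learners, then an ensemble of the best ones), here is a hedged scikit-learn sketch; the feature extraction, candidate set, and hyperparameter grids are assumptions, not the system's actual configuration.

```python
# Hedged sketch: phase 1 grid-searches a set of candidate learners, phase 2 combines
# the best ones in a voting ensemble (sub-task A shown; B and C would be analogous).
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

CANDIDATES = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    "svm": (LinearSVC(), {"clf__C": [0.1, 1, 10]}),
    "rf": (RandomForestClassifier(), {"clf__n_estimators": [100, 300]}),
}

def best_learners(tweets, labels, top_k=2):
    scored = []
    for name, (clf, grid) in CANDIDATES.items():
        pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
        search = GridSearchCV(pipe, grid, scoring="f1_macro", cv=5)
        search.fit(tweets, labels)
        scored.append((search.best_score_, name, search.best_estimator_))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(name, est) for _, name, est in scored[:top_k]]

def build_ensemble(tweets, labels):
    return VotingClassifier(best_learners(tweets, labels), voting="hard").fit(tweets, labels)
```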
Model Compression with Generative Adversarial Networks
Title | Model Compression with Generative Adversarial Networks |
Authors | Ruishan Liu, Nicolo Fusi, Lester Mackey |
Abstract | More accurate machine learning models often demand more computation and memory at test time, making them difficult to deploy on CPU- or memory-constrained devices. Model compression (also known as distillation) alleviates this burden by training a less expensive student model to mimic the expensive teacher model while maintaining most of the original accuracy. However, when fresh data is unavailable for the compression task, the teacher’s training data is typically reused, leading to suboptimal compression. In this work, we propose to augment the compression dataset with synthetic data from a generative adversarial network (GAN) designed to approximate the training data distribution. Our GAN-assisted model compression (GAN-MC) significantly improves student accuracy for expensive models such as deep neural networks and large random forests on both image and tabular datasets. Building on these results, we propose a comprehensive metric—the Compression Score—to evaluate the quality of synthetic datasets based on their induced model compression performance. The Compression Score captures both data diversity and discriminability, and we illustrate its benefits over the popular Inception Score in the context of image classification. |
Tasks | Image Classification, Model Compression |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=Byxz4n09tQ |
https://openreview.net/pdf?id=Byxz4n09tQ | |
PWC | https://paperswithcode.com/paper/model-compression-with-generative-adversarial-1 |
Repo | |
Framework | |
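The core recipe in the abstract (augment the distillation set with GAN samples and train the student to mimic the teacher on the combined data) might look roughly like the following sketch; the generator interface, temperature, and loss are assumptions, not GAN-MC's actual implementation.

```python
# Hedged sketch: one distillation step over real data augmented with GAN samples,
# with the student matching the teacher's softened predictions.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, generator, x_real, optimizer, latent_dim=100,
                 n_synth=64, temperature=4.0):
    with torch.no_grad():
        x_synth = generator(torch.randn(n_synth, latent_dim))   # synthetic compression data
        x = torch.cat([x_real, x_synth], dim=0)
        soft_targets = F.softmax(teacher(x) / temperature, dim=1)
    loss = F.kl_div(F.log_softmax(student(x) / temperature, dim=1),
                    soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```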
Image Aesthetic Assessment Based on Pairwise Comparison - A Unified Approach to Score Regression, Binary Classification, and Personalization
Title | Image Aesthetic Assessment Based on Pairwise Comparison - A Unified Approach to Score Regression, Binary Classification, and Personalization |
Authors | Jun-Tae Lee, Chang-Su Kim |
Abstract | We propose a unified approach to three tasks: aesthetic score regression, binary aesthetic classification, and personalized aesthetics. First, we develop a comparator to estimate the ratio of aesthetic scores for two images. Then, we construct a pairwise comparison matrix for multiple reference images and an input image, and predict the aesthetic score of the input via the eigenvalue decomposition of the matrix. By varying the reference images, the proposed algorithm can be used for binary aesthetic classification and personalized aesthetics, as well as generic score regression. Experimental results demonstrate that the proposed unified algorithm provides state-of-the-art performance in all three image aesthetics tasks. |
Tasks | |
Published | 2019-10-01 |
URL | http://openaccess.thecvf.com/content_ICCV_2019/html/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.html |
http://openaccess.thecvf.com/content_ICCV_2019/papers/Lee_Image_Aesthetic_Assessment_Based_on_Pairwise_Comparison__A_Unified_ICCV_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/image-aesthetic-assessment-based-on-pairwise |
Repo | |
Framework | |
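The scoring step in the abstract (build a pairwise comparison matrix of score ratios and recover scores via eigen-decomposition) can be illustrated with a small NumPy sketch; the learned comparator is replaced here by ground-truth ratios on toy data, so this only demonstrates the eigenvector recovery, not the paper's model.

```python
# Hedged sketch: recover aesthetic scores from a matrix of pairwise score ratios via
# its principal eigenvector (the learned comparator would supply the ratio entries).
import numpy as np

def scores_from_pairwise(ratio_matrix):
    """ratio_matrix[i, j] estimates score_i / score_j; returns normalized scores."""
    eigvals, eigvecs = np.linalg.eig(ratio_matrix)
    principal = np.abs(np.real(eigvecs[:, np.argmax(np.real(eigvals))]))
    return principal / principal.sum()

# toy example: three reference images plus one input image with consistent ratios
true_scores = np.array([2.0, 4.0, 6.0, 3.0])
R = true_scores[:, None] / true_scores[None, :]
print(scores_from_pairwise(R) * true_scores.sum())  # recovers [2, 4, 6, 3] up to scale
```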
I3D-LSTM: A New Model for Human Action Recognition
Title | I3D-LSTM: A New Model for Human Action Recognition |
Authors | Xianyuan Wang, Zhenjiang Miao, Ruyi Zhang, Shanshan Hao |
Abstract | Action recognition, which attempts to classify different human actions in videos, has recently been a heated research topic. Current mainstream methods generally use an ImageNet-pretrained model as the feature extractor; however, pretraining on a large still-image dataset is not the optimal choice for classifying videos. Moreover, very few works note that a 3D convolutional neural network (3D CNN) is better for extracting low-level spatio-temporal features, while a recurrent neural network (RNN) is better for modelling high-level temporal feature sequences. Consequently, we propose a novel model to address these two problems. First, we pretrain a 3D CNN on the large video action recognition dataset Kinetics to improve the generality of the model. Then, a long short-term memory (LSTM) network is introduced to model the high-level temporal features produced by the Kinetics-pretrained 3D CNN. Our experimental results show that the Kinetics-pretrained model generally outperforms the ImageNet-pretrained model, and the proposed network achieves leading performance on the UCF-101 dataset. |
Tasks | Action Recognition In Videos, Temporal Action Localization |
Published | 2019-08-09 |
URL | https://doi.org/10.1088/1757-899X/569/3/032035 |
https://iopscience.iop.org/article/10.1088/1757-899X/569/3/032035/pdf | |
PWC | https://paperswithcode.com/paper/i3d-lstm-a-new-model-for-human-action |
Repo | |
Framework | |
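A hedged sketch of the two-stage architecture described above, with a placeholder 3D CNN backbone standing in for the Kinetics-pretrained I3D and an LSTM over the resulting clip-level feature sequence; the dimensions and layers are assumptions.

```python
# Hedged sketch: 3D CNN features per clip, modelled over time by an LSTM classifier.
import torch
import torch.nn as nn

class CNN3DLSTM(nn.Module):
    def __init__(self, feat_dim=1024, hidden=512, num_classes=101):
        super().__init__()
        self.backbone = nn.Sequential(               # placeholder for a Kinetics-pretrained I3D
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clips):                        # clips: (batch, time, channels, depth, H, W)
        b, t = clips.shape[:2]
        feats = self.backbone(clips.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])                 # classify from the final time step
```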
Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features
Title | Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features |
Authors | Kim Cheng SHEANG |
Abstract | This paper describes our experiments with a Complex Word Identification system that uses a deep learning approach with word embeddings and engineered features. |
Tasks | Complex Word Identification, Word Embeddings |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-2013/ |
https://www.aclweb.org/anthology/R19-2013 | |
PWC | https://paperswithcode.com/paper/multilingual-complex-word-identification |
Repo | |
Framework | |
Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach
Title | Table Structure Recognition Based on Cell Relationship, a Bottom-Up Approach |
Authors | Darshan Adiga, Shabir Ahmad Bhat, Muzaffar Bashir Shah, Viveka Vyeth |
Abstract | In this paper, we present a relationship extraction based methodology for table structure recognition in PDF documents. The proposed deep learning-based method takes a bottom-up approach to table recognition in PDF documents. We outline the shortcomings of conventional approaches based on heuristics and machine learning-based top-down approaches. In this work, we explain how the task of table structure recognition can be modeled as a cell relationship extraction task and the importance of the bottom-up approach in recognizing the table cells. We use a multilayer feedforward neural network for table structure recognition and compare the results of three feature sets. To gauge the performance of the proposed method, we prepared a training dataset using 250 tables in PDF documents, carefully selecting the table structures that are most commonly found in the documents. Our model achieves an overall accuracy of 97.95% and an F1-score of 92.62% on the test dataset. |
Tasks | |
Published | 2019-09-01 |
URL | https://www.aclweb.org/anthology/R19-1001/ |
https://www.aclweb.org/anthology/R19-1001 | |
PWC | https://paperswithcode.com/paper/table-structure-recognition-based-on-cell |
Repo | |
Framework | |
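The cell-relationship formulation in the abstract could be sketched as a small feedforward classifier over features of a cell pair; the relation labels and feature dimensionality below are assumptions, not the paper's actual feature sets.

```python
# Hedged sketch: a multilayer feedforward network that predicts the relation between
# two table cells from their concatenated features (e.g. bounding boxes, text cues).
import torch
import torch.nn as nn

RELATIONS = ["same-row", "same-column", "no-relation"]   # hypothetical label set

class CellPairClassifier(nn.Module):
    def __init__(self, feat_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(RELATIONS)))

    def forward(self, cell_a, cell_b):
        return self.net(torch.cat([cell_a, cell_b], dim=-1))
```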
UCSMNLP: Statistical Machine Translation for WAT 2019
Title | UCSMNLP: Statistical Machine Translation for WAT 2019 |
Authors | Aye Thida, Nway Nway Han, Sheinn Thawtar Oo, Khin Thet Htar |
Abstract | This paper presents UCSMNLP's submission to the WAT 2019 Translation Tasks, focusing on Myanmar-English translation. A phrase-based statistical machine translation (PBSMT) system is built using additional resources: a Named Entity Recognition (NER) corpus and a bilingual dictionary created with Google Translate (GT). The system also adopts a listwise reranking process to improve translation quality, and tuning is done by changing the initial distortion weight. The experimental results show that PBSMT using these resources with an initial distortion weight of 0.4 and the listwise reranking function outperforms the baseline system. |
Tasks | Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5210/ |
https://www.aclweb.org/anthology/D19-5210 | |
PWC | https://paperswithcode.com/paper/ucsmnlp-statistical-machine-translation-for |
Repo | |
Framework | |
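The listwise reranking step mentioned in the abstract might look like the following sketch, which rescores an n-best list with a weighted combination of feature functions; the features and weights here are hypothetical, and the underlying Moses-style PBSMT system is not shown.

```python
# Hedged sketch: rerank an n-best list of candidate translations with a weighted
# linear combination of feature functions.
def rerank(nbest, feature_fns, weights):
    """nbest: list of candidate translation strings; returns them sorted best-first."""
    def score(hyp):
        return sum(w * f(hyp) for f, w in zip(feature_fns, weights))
    return sorted(nbest, key=score, reverse=True)

# hypothetical usage: prefer longer hypotheses and penalize unknown-word markers
candidates = ["the cat sat", "the cat UNK sat down"]
print(rerank(candidates, [lambda h: len(h.split()), lambda h: -h.count("UNK")], [0.1, 1.0]))
```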
Information Regularized Neural Networks
Title | Information Regularized Neural Networks |
Authors | Tianchen Zhao, Dejiao Zhang, Zeyu Sun, Honglak Lee |
Abstract | We formulate an information-based optimization problem for supervised classification. For invertible neural networks, the control of these information terms is passed down to the latent features and the parameter matrix in the last fully connected layer, given that mutual information is invariant under invertible maps. We propose an objective function and prove that it solves the optimization problem. Our framework allows us to learn latent features in a more interpretable form while improving the classification performance. We perform extensive quantitative and qualitative experiments in comparison with existing state-of-the-art classification models. |
Tasks | |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=BJgvg30ctX |
https://openreview.net/pdf?id=BJgvg30ctX | |
PWC | https://paperswithcode.com/paper/information-regularized-neural-networks |
Repo | |
Framework | |
Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019
Title | Japanese-Russian TMU Neural Machine Translation System using Multilingual Model for WAT 2019 |
Authors | Aizhan Imankulova, Masahiro Kaneko, Mamoru Komachi |
Abstract | We introduce our system submitted to the News Commentary task (Japanese<->Russian) of the 6th Workshop on Asian Translation. The goal of this shared task is to study extremely low resource situations for distant language pairs. It is known that using parallel corpora of different language pairs as training data is effective for multilingual neural machine translation models in extremely low resource scenarios. Therefore, to improve the translation quality of the Japanese<->Russian language pair, our method leverages other in-domain Japanese-English and English-Russian parallel corpora as additional training data for our multilingual NMT model. |
Tasks | Machine Translation |
Published | 2019-11-01 |
URL | https://www.aclweb.org/anthology/D19-5221/ |
https://www.aclweb.org/anthology/D19-5221 | |
PWC | https://paperswithcode.com/paper/japanese-russian-tmu-neural-machine |
Repo | |
Framework | |
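The data setup described in the abstract (mixing scarce Japanese-Russian data with in-domain Japanese-English and English-Russian corpora in one multilingual model) is often implemented by prepending a target-language token to each source sentence; the sketch below illustrates that common recipe, and the exact tagging scheme used by the authors may differ.

```python
# Hedged sketch: build a multilingual training set by tagging each source sentence
# with its target language and concatenating the auxiliary corpora.
def tag_corpus(pairs, tgt_lang):
    """pairs: iterable of (source_sentence, target_sentence) strings."""
    return [(f"<2{tgt_lang}> {src}", tgt) for src, tgt in pairs]

def build_training_data(ja_ru, ru_ja, ja_en, en_ru):
    data = []
    data += tag_corpus(ja_ru, "ru")   # scarce in-domain pairs for the target directions
    data += tag_corpus(ru_ja, "ja")
    data += tag_corpus(ja_en, "en")   # auxiliary corpus sharing Japanese
    data += tag_corpus(en_ru, "ru")   # auxiliary corpus sharing Russian
    return data
```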
Neural TTS Stylization with Adversarial and Collaborative Games
Title | Neural TTS Stylization with Adversarial and Collaborative Games |
Authors | Shuang Ma, Daniel Mcduff, Yale Song |
Abstract | The modeling of style when synthesizing natural human speech from text has been the focus of significant attention. Some state-of-the-art approaches train an encoder-decoder network on paired text and audio samples (x_txt, x_aud) by encouraging its output to reconstruct x_aud. The synthesized audio waveform is expected to contain the verbal content of x_txt and the auditory style of x_aud. Unfortunately, modeling style in TTS is somewhat under-determined, and training models with a reconstruction loss alone is insufficient to disentangle content and style from other factors of variation. In this work, we introduce an end-to-end TTS model that offers enhanced content-style disentanglement ability and controllability. We achieve this by combining a pairwise training procedure, an adversarial game, and a collaborative game into one training scheme. The adversarial game concentrates the true data distribution, and the collaborative game minimizes the distance between real samples and generated samples in both the original space and the latent space. As a result, the proposed model delivers a highly controllable generator and a disentangled representation. Benefiting from the separate modeling of style and content, our model can generate human-fidelity speech that satisfies the desired style conditions. Our model achieves state-of-the-art results across multiple tasks, including style transfer (content and style swapping), emotion modeling, and identity transfer (fitting a new speaker's voice). |
Tasks | Style Transfer |
Published | 2019-05-01 |
URL | https://openreview.net/forum?id=ByzcS3AcYX |
https://openreview.net/pdf?id=ByzcS3AcYX | |
PWC | https://paperswithcode.com/paper/neural-tts-stylization-with-adversarial-and |
Repo | |
Framework | |
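The three-part training signal in the abstract (pairwise reconstruction, an adversarial game, and a collaborative game matching real and generated samples in both sample and latent space) could be combined roughly as below; all tensors, spaces, and weights are placeholders rather than the paper's formulation.

```python
# Hedged sketch: generator-side loss combining reconstruction, an adversarial term,
# and a collaborative term that matches samples in both output and latent space.
import torch
import torch.nn.functional as F

def generator_loss(x_gen, x_aud, d_fake_logits, z_gen, z_real, w_adv=1.0, w_collab=1.0):
    recon = F.l1_loss(x_gen, x_aud)                                # pairwise reconstruction
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))             # adversarial game: fool the critic
    collab = F.mse_loss(x_gen, x_aud) + F.mse_loss(z_gen, z_real)  # collaborative game
    return recon + w_adv * adv + w_collab * collab
```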
SHAMANN: Shared Memory Augmented Neural Networks
Title | SHAMANN: Shared Memory Augmented Neural Networks |
Authors | Cosmin I. Bercea, Olivier Pauly, Andreas K. Maier, Florin C. Ghesu |
Abstract | Current state-of-the-art methods for semantic segmentation use deep neural networks to learn the segmentation mask from the input image signal as an image-to-image mapping. While these methods effectively exploit global image context, the learning and computational complexities are high. We propose shared memory augmented neural network actors as a dynamically scalable alternative. Based on a decomposition of the image into a sequence of local patches, we train such actors to sequentially segment each patch. To further increase the robustness and better capture shape priors, an external memory module is shared between different actors, providing an implicit mechanism for image information exchange. Finally, the patch-wise predictions are aggregated to a complete segmentation mask. We demonstrate the benefits of the new paradigm on a challenging lung segmentation problem based on chest X-Ray images, as well as on two synthetic tasks based on the MNIST dataset. On the X-Ray data, our method achieves state-of-the-art accuracy with a significantly reduced model size compared to reference methods. In addition, we reduce the number of failure cases by at least half. |
Tasks | Semantic Segmentation |
Published | 2019-01-01 |
URL | https://openreview.net/forum?id=BJeWOi09FQ |
https://openreview.net/pdf?id=BJeWOi09FQ | |
PWC | https://paperswithcode.com/paper/shamann-shared-memory-augmented-neural |
Repo | |
Framework | |
Unsupervised Learning of Dense Shape Correspondence
Title | Unsupervised Learning of Dense Shape Correspondence |
Authors | Oshri Halimi, Or Litany, Emanuele Rodola, Alex M. Bronstein, Ron Kimmel |
Abstract | We introduce the first completely unsupervised correspondence learning approach for deformable 3D shapes. Key to our model is the understanding that natural deformations (such as changes in pose) approximately preserve the metric structure of the surface, yielding a natural criterion to drive the learning process toward distortion-minimizing predictions. On this basis, we overcome the need for annotated data and replace it by a purely geometric criterion. The resulting learning model is class-agnostic, and is able to leverage any type of deformable geometric data for the training phase. In contrast to existing supervised approaches which specialize on the class seen at training time, we demonstrate stronger generalization as well as applicability to a variety of challenging settings. We showcase our method on a wide selection of correspondence benchmarks, where we outperform other methods in terms of accuracy, generalization, and efficiency. |
Tasks | |
Published | 2019-06-01 |
URL | http://openaccess.thecvf.com/content_CVPR_2019/html/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.html |
http://openaccess.thecvf.com/content_CVPR_2019/papers/Halimi_Unsupervised_Learning_of_Dense_Shape_Correspondence_CVPR_2019_paper.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-dense-shape |
Repo | |
Framework | |
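The geometric criterion in the abstract (a correspondence should preserve the metric structure of the surfaces) can be expressed as an unsupervised distortion loss over geodesic distance matrices; the sketch below shows one such formulation and is only in the spirit of the paper, not its exact loss.

```python
# Hedged sketch: penalize how much a soft correspondence P distorts pairwise
# geodesic distances between two shapes X and Y.
import torch

def metric_distortion_loss(P, D_x, D_y):
    """P: (n_x, n_y) soft correspondence with rows summing to 1;
    D_x, D_y: pairwise geodesic distance matrices of shapes X and Y."""
    mapped = P @ D_y @ P.T      # distances on Y pulled back through the correspondence
    return ((mapped - D_x) ** 2).mean()
```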
ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic
Title | ARHNet - Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic |
Authors | Arijit Ghosh Chowdhury, Aniket Didolkar, Ramit Sawhney, Rajiv Ratn Shah |
Abstract | The rapid spread of social media has led to some undesirable consequences, such as the rapid increase of hateful content and offensive language. Religious hate speech, in particular, often leads to unrest and sometimes escalates into violence against people on the basis of their religious affiliations. The richness of Arabic morphology and the limited available resources make this task especially challenging. The current state-of-the-art approaches to detect hate speech in Arabic rely entirely on textual (lexical and semantic) cues. Our proposed methodology contends that leveraging community interaction can better help us profile hate speech content on social media. Our proposed ARHNet (Arabic Religious Hate Speech Net) model incorporates both Arabic word embeddings and social network graphs for the detection of religious hate speech. |
Tasks | Word Embeddings |
Published | 2019-07-01 |
URL | https://www.aclweb.org/anthology/P19-2038/ |
https://www.aclweb.org/anthology/P19-2038 | |
PWC | https://paperswithcode.com/paper/arhnet-leveraging-community-interaction-for |
Repo | |
Framework | |
Improving American Sign Language Recognition with Synthetic Data
Title | Improving American Sign Language Recognition with Synthetic Data |
Authors | Jungi Kim, Patricia O'Neill-Brown |
Abstract | |
Tasks | Sign Language Recognition |
Published | 2019-08-01 |
URL | https://www.aclweb.org/anthology/W19-6615/ |
https://www.aclweb.org/anthology/W19-6615 | |
PWC | https://paperswithcode.com/paper/improving-american-sign-language-recognition |
Repo | |
Framework | |