Paper Group ANR 625
Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019. HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking. Propagated Perturbation of Adversarial Attack for well-known CNNs: Empirical Study and its Explanation. Imperial College London Submission to VATEX Video Captioning Task. Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019. SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing. VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning. Rhythm Zone Theory: Speech Rhythms are Physical after all. Human Action Sequence Classification. Optimizing vaccine distribution networks in low and middle-income countries. Toward Maximizing the Visibility of Content in Social Media Brand Pages: A Temporal Analysis. SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability. Function Follows Form: Regression from Complete Thoracic Computed Tomography Scans. Detecting Bias with Generative Counterfactual Face Attribute Augmentation. Predicting Rainfall using Machine Learning Techniques.
Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019
Title | Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019 |
Authors | Xinxin Zhu, Longteng Guo, Peng Yao, Jing Liu, Shichen Lu, Zheng Yu, Wei Liu, Hanqing Lu |
Abstract | This document describes our solution for the VATEX Captioning Challenge 2019, which requires generating descriptions for videos in both English and Chinese. We identified three crucial factors that improve performance, namely: multi-view features, hybrid reward, and diverse ensemble. Our method achieved 2nd place on the Chinese video captioning track and 3rd place on the English track. |
Tasks | Video Captioning |
Published | 2019-10-17 |
URL | https://arxiv.org/abs/1910.11102v2 |
PDF | https://arxiv.org/pdf/1910.11102v2.pdf |
PWC | https://paperswithcode.com/paper/multi-view-features-and-hybrid-reward |
Repo | |
Framework | |
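The abstract does not spell out what the "hybrid reward" combines; a common reading for self-critical captioning systems is a weighted mix of metric rewards. A minimal sketch under that assumption (the scorer interfaces and the 0.8 weight are illustrative, not the authors' recipe):

```python
# Hybrid reward for self-critical sequence training (illustrative assumption:
# a weighted mix of CIDEr and BLEU; the paper does not give its exact mix).

def hybrid_reward(sampled_caption, greedy_caption, references,
                  cider_scorer, bleu_scorer, alpha=0.8):
    """Return the self-critical advantage under a weighted metric mix."""
    def score(caption):
        return (alpha * cider_scorer(caption, references)
                + (1.0 - alpha) * bleu_scorer(caption, references))
    # Advantage: sampled reward minus the greedy (baseline) reward.
    return score(sampled_caption) - score(greedy_caption)
```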
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking
Title | HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking |
Authors | Rahul Goel, Shachi Paul, Dilek Hakkani-Tür |
Abstract | Recent works on end-to-end trainable neural network based approaches have demonstrated state-of-the-art results on dialogue state tracking. The best performing approaches estimate a probability distribution over all possible slot values. However, these approaches do not scale for large value sets commonly present in real-life applications and are not ideal for tracking slot values that were not observed in the training set. To tackle these issues, candidate-generation-based approaches have been proposed. These approaches estimate a set of values that are possible at each turn based on the conversation history and/or language understanding outputs, and hence enable state tracking over unseen values and large value sets; however, they fall short in terms of performance compared to the first group. In this work, we analyze the performance of these two alternative dialogue state tracking methods, and present a hybrid approach (HyST) which learns the appropriate method for each slot type. To demonstrate the effectiveness of HyST on a rich set of slot types, we experiment with the recently released MultiWOZ-2.0 multi-domain, task-oriented dialogue dataset. Our experiments show that HyST scales to multi-domain applications. Our best performing model results in relative improvements of 24% and 10% over the previous SOTA and our best baseline, respectively. |
Tasks | Dialogue State Tracking |
Published | 2019-07-01 |
URL | https://arxiv.org/abs/1907.00883v1 |
PDF | https://arxiv.org/pdf/1907.00883v1.pdf |
PWC | https://paperswithcode.com/paper/hyst-a-hybrid-approach-for-flexible-and |
Repo | |
Framework | |
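A minimal sketch of the HyST selection idea: per slot, keep whichever tracking strategy (fixed-vocabulary classifier vs. open-vocabulary candidate ranker) scores better on held-out data. The slot names and accuracy numbers below are illustrative assumptions:

```python
# Per-slot strategy selection in the spirit of HyST (illustrative sketch).

def choose_tracker_per_slot(dev_accuracy_fixed, dev_accuracy_candidates):
    """Map each slot to the strategy with the higher dev-set accuracy."""
    choice = {}
    for slot in dev_accuracy_fixed:
        fixed = dev_accuracy_fixed[slot]
        cand = dev_accuracy_candidates.get(slot, 0.0)
        choice[slot] = "fixed-vocabulary" if fixed >= cand else "candidate-based"
    return choice

# Example: 'hotel-name' has a huge, open value set, so candidates win there.
print(choose_tracker_per_slot(
    {"hotel-parking": 0.95, "hotel-name": 0.60},
    {"hotel-parking": 0.90, "hotel-name": 0.82},
))
```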
Propagated Perturbation of Adversarial Attack for well-known CNNs: Empirical Study and its Explanation
Title | Propagated Perturbation of Adversarial Attack for well-known CNNs: Empirical Study and its Explanation |
Authors | Jihyeun Yoon, Kyungyul Kim, Jongseong Jang |
Abstract | Deep neural network based classifiers are known to be vulnerable to input perturbations constructed by adversarial attacks to force misclassification. Most studies have focused either on crafting adversarial noise with gradient-based attack methods or on defending models against adversarial attacks. Using a denoiser model is a well-known way to reduce adversarial noise, although it has not significantly improved classification performance. In this study, we aim to analyze the propagation of adversarial attacks from an explainable AI (XAI) point of view. Specifically, we examine how adversarial perturbations evolve through CNN architectures. To analyze the propagated perturbation, we measured the normalized Euclidean distance and the cosine distance at each CNN layer between the feature maps of the perturbed image passed through a denoiser and of the non-perturbed original image. We used five well-known CNN-based classifiers and three gradient-based adversarial attacks. From the experimental results, we observed that in most cases the Euclidean distance increases explosively in the final fully connected layer, while the cosine distance fluctuates and vanishes at the last layer. This means that using a denoiser can decrease the amount of noise; however, it fails to prevent accuracy degradation. |
Tasks | Adversarial Attack |
Published | 2019-09-19 |
URL | https://arxiv.org/abs/1909.09263v2 |
PDF | https://arxiv.org/pdf/1909.09263v2.pdf |
PWC | https://paperswithcode.com/paper/propagated-perturbation-of-adversarial-attack |
Repo | |
Framework | |
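A minimal sketch of the per-layer measurement described in the abstract: capture feature maps for the clean image and for the denoised adversarial image, then compute a normalized Euclidean distance and a cosine distance at each layer. PyTorch forward hooks are one way to capture intermediate activations; the layer names are assumed inputs:

```python
import torch

def layer_distances(model, clean, denoised_adv, layer_names):
    """Per-layer distances between clean and denoised-adversarial features."""
    feats = {"clean": {}, "adv": {}}

    def make_hook(tag, name):
        def hook(module, inputs, output):
            feats[tag][name] = output.detach().flatten(1)
        return hook

    modules = dict(model.named_modules())
    for tag, x in (("clean", clean), ("adv", denoised_adv)):
        handles = [modules[n].register_forward_hook(make_hook(tag, n))
                   for n in layer_names]
        with torch.no_grad():
            model(x)
        for h in handles:
            h.remove()

    results = {}
    for name in layer_names:
        a, b = feats["clean"][name], feats["adv"][name]
        euclid = (a - b).norm(dim=1) / a.norm(dim=1)   # normalized L2
        cosine = 1 - torch.nn.functional.cosine_similarity(a, b, dim=1)
        results[name] = (euclid.mean().item(), cosine.mean().item())
    return results
```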
Imperial College London Submission to VATEX Video Captioning Task
Title | Imperial College London Submission to VATEX Video Captioning Task |
Authors | Ozan Caglayan, Zixiu Wu, Pranava Madhyastha, Josiah Wang, Lucia Specia |
Abstract | This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge, where we first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a transformer model, which generate captions from the I3D action features. We then investigate the effect of dropping the encoder and the attention mechanism and instead conditioning the GRU decoder on two different vectorial representations: (i) a max-pooled action feature vector and (ii) the output of a multi-label classifier trained to predict visual entities from the action features. Our baselines achieved scores comparable to the official baseline. Conditioning on entity predictions performed substantially better than conditioning on the max-pooled feature vector, and only marginally worse than the GRU-based sequence-to-sequence baseline. |
Tasks | Video Captioning |
Published | 2019-10-16 |
URL | https://arxiv.org/abs/1910.07482v1 |
PDF | https://arxiv.org/pdf/1910.07482v1.pdf |
PWC | https://paperswithcode.com/paper/imperial-college-london-submission-to-vatex |
Repo | |
Framework | |
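A minimal sketch of the attention-free baseline described above: a GRU decoder whose initial hidden state is a projection of the max-pooled I3D action features, with no encoder and no attention. All dimensions are illustrative:

```python
import torch
import torch.nn as nn

class PooledGRUDecoder(nn.Module):
    """Caption decoder conditioned only on a pooled video feature vector."""
    def __init__(self, vocab_size, feat_dim=1024, emb_dim=256, hid_dim=512):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hid_dim)   # video feature -> h0
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, i3d_feats, captions):
        # i3d_feats: (B, T, feat_dim) -> max-pool over time -> (B, feat_dim)
        pooled = i3d_feats.max(dim=1).values
        h0 = torch.tanh(self.init_h(pooled)).unsqueeze(0)  # (1, B, hid_dim)
        emb = self.embed(captions)                          # (B, L, emb_dim)
        output, _ = self.gru(emb, h0)
        return self.out(output)                             # (B, L, vocab)
```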
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Title | Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019 |
Authors | Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu |
Abstract | This notebook paper presents our model for the VATEX video captioning challenge. In order to capture multi-level aspects of the video, we propose to integrate both temporal and spatial attentions for video captioning. The temporal attentive module focuses on global action movements, while the spatial attentive module describes more fine-grained objects. Since these two types of attentive modules are complementary, we fuse them via a late fusion strategy. The proposed model significantly outperforms the baselines and achieves a 73.4 CIDEr score on the testing set, which ranked second on the 2019 VATEX video captioning challenge leaderboard. |
Tasks | Video Captioning |
Published | 2019-10-15 |
URL | https://arxiv.org/abs/1910.06737v1 |
PDF | https://arxiv.org/pdf/1910.06737v1.pdf |
PWC | https://paperswithcode.com/paper/integrating-temporal-and-spatial-attentions |
Repo | |
Framework | |
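A minimal sketch of the late-fusion step: at each decoding step, average the word distributions produced by the temporal-attention and spatial-attention decoders. The 50/50 weighting is an assumption, not the authors' tuned value:

```python
import torch

def late_fusion_step(logits_temporal, logits_spatial, w=0.5):
    """Fuse per-step word distributions from two captioning models."""
    p_t = torch.softmax(logits_temporal, dim=-1)
    p_s = torch.softmax(logits_spatial, dim=-1)
    fused = w * p_t + (1 - w) * p_s
    return fused.argmax(dim=-1)  # next-word choice under the fused model
```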
SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing
Title | SAWNet: A Spatially Aware Deep Neural Network for 3D Point Cloud Processing |
Authors | Chaitanya Kaul, Nick Pears, Suresh Manandhar |
Abstract | Deep neural networks have established themselves as the state-of-the-art methodology in almost all computer vision tasks to date. But their application to processing data lying on non-Euclidean domains is still a very active area of research. One such area is the analysis of point cloud data, which poses a challenge due to its lack of order. Many recent techniques have been proposed, spearheaded by the PointNet architecture. These techniques use either global or local information from the point clouds to extract a latent representation for the points, which is then used for the task at hand (classification/segmentation). In our work, we introduce a neural network layer that combines both global and local information to produce better embeddings of these points. We enhance our architecture with residual connections, to pass information between the layers, which also makes the network easier to train. We achieve state-of-the-art results on the ModelNet40 dataset with our architecture, and our results are also highly competitive with the state-of-the-art on the ShapeNet part segmentation dataset and the indoor scene segmentation dataset. We plan to open source our pre-trained models on GitHub to encourage the research community to test our networks on their data, or simply use them for benchmarking purposes. |
Tasks | Scene Segmentation |
Published | 2019-05-18 |
URL | https://arxiv.org/abs/1905.07650v1 |
PDF | https://arxiv.org/pdf/1905.07650v1.pdf |
PWC | https://paperswithcode.com/paper/sawnet-a-spatially-aware-deep-neural-network |
Repo | |
Framework | |
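A minimal sketch of a layer that combines local and global point information, in the spirit of the abstract above; this is a PointNet-style illustration, not SAWNet's exact layer:

```python
import torch
import torch.nn as nn

class GlobalLocalLayer(nn.Module):
    """Concatenate per-point features with a global max-pooled summary."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # Shared MLP over the concatenated [local, global] features.
        self.mlp = nn.Sequential(nn.Linear(in_dim * 2, out_dim), nn.ReLU())

    def forward(self, x):            # x: (B, N, in_dim) point features
        global_feat = x.max(dim=1, keepdim=True).values   # (B, 1, in_dim)
        global_feat = global_feat.expand(-1, x.size(1), -1)
        return self.mlp(torch.cat([x, global_feat], dim=-1))
```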
VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning
Title | VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning |
Authors | Ziqi Zhang, Yaya Shi, Jiutong Wei, Chunfeng Yuan, Bing Li, Weiming Hu |
Abstract | Multi-modal information is essential to describe what has happened in a video. In this work, we represent videos by various appearance, motion and audio features guided by the video topic. By following a multi-stage training strategy, our experiments show steady and significant improvement on the VATEX benchmark. This report presents an overview and comparative analysis of our system designed for both the Chinese and English tracks of the VATEX Captioning Challenge 2019. |
Tasks | Video Captioning |
Published | 2019-10-13 |
URL | https://arxiv.org/abs/1910.05752v1 |
PDF | https://arxiv.org/pdf/1910.05752v1.pdf |
PWC | https://paperswithcode.com/paper/vatex-captioning-challenge-2019-multi-modal |
Repo | |
Framework | |
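A minimal sketch of topic-guided multi-modal fusion: project the appearance, motion and audio features into a shared space and combine them with topic-dependent weights. The gating scheme is an illustrative assumption, not the authors' exact design:

```python
import torch
import torch.nn as nn

class TopicGuidedFusion(nn.Module):
    """Weight each modality by a softmax gate computed from the topic."""
    def __init__(self, dims, topic_dim, out_dim):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, out_dim) for d in dims])
        self.gate = nn.Linear(topic_dim, len(dims))  # one weight per modality

    def forward(self, feats, topic):
        weights = torch.softmax(self.gate(topic), dim=-1)       # (B, M)
        projected = [p(f) for p, f in zip(self.proj, feats)]    # M x (B, out)
        stacked = torch.stack(projected, dim=1)                 # (B, M, out)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)     # (B, out)
```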
Rhythm Zone Theory: Speech Rhythms are Physical after all
Title | Rhythm Zone Theory: Speech Rhythms are Physical after all |
Authors | Dafydd Gibbon, Xuewei Lin |
Abstract | Speech rhythms have been dealt with in three main ways: from the introspective analyses of rhythm as a correlate of syllable and foot timing in linguistics and applied linguistics, through analyses of durations of segments of utterances associated with consonantal and vocalic properties, syllables, feet and words, to models of rhythms in speech production and perception as physical oscillations. The present study avoids introspection and human-filtered annotation methods and extends the signal processing paradigm of amplitude envelope spectrum analysis by adding an analytic step of edge detection, and by postulating the co-existence of multiple speech rhythms in rhythm zones marked by identifiable edges (Rhythm Zone Theory, RZT). An exploratory investigation of the utility of RZT is conducted, suggesting that native and non-native readings of the same text are distinct sub-genres of read speech: a reading by a US native speaker and non-native readings by relatively low-performing Cantonese adult learners of English. The study concludes by noting that, with the methods used, RZT can distinguish between the speech rhythms of well-defined sub-genres of native speaker reading vs. non-native learner reading, but needs further refinement in order to be applied to the paradoxically more complex speech of low-performing language learners, whose speech rhythms are co-determined by non-fluency and disfluency factors in addition to well-known linguistic factors of grammar, vocabulary and discourse constraints. |
Tasks | Edge Detection |
Published | 2019-01-31 |
URL | http://arxiv.org/abs/1902.01267v2 |
PDF | http://arxiv.org/pdf/1902.01267v2.pdf |
PWC | https://paperswithcode.com/paper/rhythm-zone-theory-speech-rhythms-are |
Repo | |
Framework | |
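A minimal sketch of the signal pipeline described above: amplitude envelope extraction via the Hilbert transform, a low-frequency spectrum of that envelope, and simple first-difference edge detection over the resulting values. The 10 Hz cutoff and the threshold rule are illustrative choices:

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(signal, sr, max_hz=10.0):
    """Low-frequency spectrum of the amplitude envelope of a speech signal."""
    envelope = np.abs(hilbert(signal))            # amplitude envelope
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / sr)
    keep = freqs <= max_hz                        # speech rhythms live below ~10 Hz
    return freqs[keep], spectrum[keep]

def detect_edges(values, threshold=None):
    """Indices where the sequence changes sharply (candidate zone edges)."""
    grad = np.abs(np.diff(values))                # first-difference edges
    if threshold is None:
        threshold = grad.mean() + 2 * grad.std()
    return np.flatnonzero(grad > threshold)
```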
Human Action Sequence Classification
Title | Human Action Sequence Classification |
Authors | Yan Bin Ng, Basura Fernando |
Abstract | This paper classifies human action sequences from videos using a machine translation model. In contrast to classical human action classification, which outputs a set of actions, our method outputs a sequence of actions in the chronological order in which they were performed by the human. Therefore our method is evaluated using sequential performance measures such as Bilingual Evaluation Understudy (BLEU) scores. Action sequence classification has many applications, such as learning from demonstration, action segmentation, detection, localization and video captioning. Furthermore, we use our model, trained to output action sequences, to solve downstream tasks such as video captioning and action localization. We obtain state-of-the-art results for video captioning on the challenging Charades dataset, obtaining a BLEU-4 score of 34.8 and a METEOR score of 33.6, outperforming the previous state of the art of 18.8 and 19.5, respectively. Similarly, on ActivityNet captioning, we obtain excellent results in terms of ROUGE (20.24) and CIDEr (37.58) scores. For action localization, without using any explicit start/end action annotations, our method obtains a localization performance of 22.2 mAP, outperforming prior fully supervised methods. |
Tasks | Action Classification, Action Localization, action segmentation, Machine Translation, Video Captioning |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02602v1 |
PDF | https://arxiv.org/pdf/1910.02602v1.pdf |
PWC | https://paperswithcode.com/paper/human-action-sequence-classification |
Repo | |
Framework | |
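A minimal sketch of the sequential evaluation described above: a predicted action sequence is scored against references with BLEU, exactly as in machine translation. This uses NLTK's BLEU implementation; the smoothing choice and the action names are illustrative:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Action sequences as token lists, scored like translation hypotheses.
reference = [["open", "fridge", "take", "bottle", "close", "fridge"]]
predicted = ["open", "fridge", "take", "cup", "close", "fridge"]

score = sentence_bleu(reference, predicted,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```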
Optimizing vaccine distribution networks in low and middle-income countries
Title | Optimizing vaccine distribution networks in low and middle-income countries |
Authors | Yuwen Yang, Hoda Bidkhori, Jayant Rajgopal |
Abstract | Vaccination has been proven to be the most effective method to prevent infectious diseases. However, there are still millions of children in low and middle-income countries who are not covered by routine vaccines and remain at risk. The World Health Organization's Expanded Programme on Immunization (WHO-EPI) was designed to provide universal childhood vaccine access for children across the world. In this work, we address the design of the distribution network for WHO-EPI vaccines. In particular, we formulate the network design problem as a mixed integer program (MIP) and present a new algorithm for typical problems that are too large to be solved using commercial MIP software. We test the algorithm using data derived from four different countries in sub-Saharan Africa and show that the algorithm is able to obtain high-quality solutions for even the largest problems within a few minutes. |
Tasks | |
Published | 2019-07-25 |
URL | https://arxiv.org/abs/1907.13434v1 |
PDF | https://arxiv.org/pdf/1907.13434v1.pdf |
PWC | https://paperswithcode.com/paper/optimizing-vaccine-distribution-networks-in |
Repo | |
Framework | |
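A minimal sketch of a network design MIP in the same family as the paper's formulation, far smaller than the WHO-EPI model: open distribution hubs at a fixed cost and assign each clinic to exactly one open hub. All data are made up, and PuLP stands in for commercial MIP software:

```python
import pulp

hubs = {"H1": 100, "H2": 120}                     # hub -> opening cost
clinics = ["C1", "C2", "C3"]
cost = {("H1", "C1"): 4, ("H1", "C2"): 7, ("H1", "C3"): 9,
        ("H2", "C1"): 8, ("H2", "C2"): 3, ("H2", "C3"): 2}

prob = pulp.LpProblem("vaccine_network", pulp.LpMinimize)
open_h = pulp.LpVariable.dicts("open", hubs, cat="Binary")
assign = pulp.LpVariable.dicts("assign", cost, cat="Binary")

# Objective: fixed opening costs plus assignment (transport) costs.
prob += (pulp.lpSum(hubs[h] * open_h[h] for h in hubs)
         + pulp.lpSum(cost[k] * assign[k] for k in cost))
for c in clinics:                                  # every clinic served once
    prob += pulp.lpSum(assign[(h, c)] for h in hubs) == 1
for h, c in cost:                                  # only assign to open hubs
    prob += assign[(h, c)] <= open_h[h]

prob.solve()
print([h for h in hubs if open_h[h].value() == 1])
```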
Toward Maximizing the Visibility of Content in Social Media Brand Pages: A Temporal Analysis
Title | Toward Maximizing the Visibility of Content in Social Media Brand Pages: A Temporal Analysis |
Authors | Nagendra Kumar, Gopi Ande, J. Shirish Kumar, Manish Singh |
Abstract | A large amount of content is generated every day on social media. One of the main goals of content creators is to spread their information to a large audience. There are many factors that affect information spread, such as posting time, location, type of information, and number of social connections. In this paper, we look at the problem of finding the best posting time(s) to achieve high content visibility. The posting time is derived taking other factors into account, such as location and type of information. We conduct our analysis over Facebook pages. We propose six posting schedules that can be used for individual pages or groups of pages with a similar audience reaction profile. We perform our experiment on a Facebook pages dataset containing 0.3 million posts and 10 million audience reactions. Our best posting schedule can lead to seven times as many audience reactions as the average number users would get without following any optimized posting schedule. We also present some interesting audience reaction patterns obtained through daily, weekly and monthly audience reaction analysis. |
Tasks | |
Published | 2019-08-22 |
URL | https://arxiv.org/abs/1908.08622v1 |
PDF | https://arxiv.org/pdf/1908.08622v1.pdf |
PWC | https://paperswithcode.com/paper/toward-maximizing-the-visibility-of-content |
Repo | |
Framework | |
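A minimal sketch of the schedule idea described above: aggregate historical audience reactions by posting hour and take the top hours as the schedule. The data and the "top 3 hours" choice are illustrative assumptions:

```python
from collections import defaultdict

def best_posting_hours(posts, k=3):
    """posts: iterable of (posting_hour, reaction_count) pairs."""
    total = defaultdict(int)
    count = defaultdict(int)
    for hour, reactions in posts:
        total[hour] += reactions
        count[hour] += 1
    avg = {h: total[h] / count[h] for h in total}   # mean reactions per hour
    return sorted(avg, key=avg.get, reverse=True)[:k]

print(best_posting_hours([(9, 120), (9, 80), (14, 300), (20, 250), (14, 260)]))
```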
SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Title | SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability |
Authors | Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara |
Abstract | The ability to generate natural language explanations conditioned on visual perception is a crucial step towards autonomous agents which can explain themselves and communicate with humans. While research efforts in image and video captioning are giving promising results, this is often done at the expense of the computational requirements of the approaches, limiting their applicability to real contexts. In this paper, we propose a fully-attentive captioning algorithm which can provide state-of-the-art performance on language generation while restricting its computational demands. Our model is inspired by the Transformer model and employs only two Transformer layers in the encoding and decoding stages. Further, it incorporates a novel memory-aware encoding of image regions. Experiments demonstrate that our approach achieves competitive results in terms of caption quality while featuring reduced computational demands. Further, to evaluate its applicability on autonomous agents, we conduct experiments on simulated scenes taken from the perspective of domestic robots. |
Tasks | Text Generation, Video Captioning |
Published | 2019-10-07 |
URL | https://arxiv.org/abs/1910.02974v3 |
PDF | https://arxiv.org/pdf/1910.02974v3.pdf |
PWC | https://paperswithcode.com/paper/smart-training-shallow-memory-aware |
Repo | |
Framework | |
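A minimal sketch of a shallow (two-layer) captioning transformer in the spirit of SMArT; the paper's memory-aware region encoding is its own contribution and is not reproduced here, and all dimensions are illustrative:

```python
import torch.nn as nn

class ShallowCaptioner(nn.Module):
    """Two encoder and two decoder layers over image-region features."""
    def __init__(self, vocab_size, d_model=512, nhead=8):
        super().__init__()
        self.region_proj = nn.Linear(2048, d_model)  # region features -> d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=2, num_decoder_layers=2,  # "shallow": 2 layers
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, regions, captions, tgt_mask=None):
        src = self.region_proj(regions)              # (B, R, d_model)
        tgt = self.embed(captions)                   # (B, L, d_model)
        h = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.out(h)                           # (B, L, vocab)
```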
Function Follows Form: Regression from Complete Thoracic Computed Tomography Scans
Title | Function Follows Form: Regression from Complete Thoracic Computed Tomography Scans |
Authors | Max Argus, Cornelia Schaefer-Prokop, David A. Lynch, Bram van Ginneken |
Abstract | Chronic Obstructive Pulmonary Disease (COPD) is a leading cause of morbidity and mortality. While COPD diagnosis is based on lung function tests, early stages and progression of different aspects of the disease can be visible and quantitatively assessed on computed tomography (CT) scans. Many studies have been published that quantify imaging biomarkers related to COPD. In this paper we present a convolutional neural network that directly computes visual emphysema scores and predicts the outcome of lung function tests for 195 CT scans from the COPDGene study. Contrary to previous work, the proposed method does not encode any specific prior knowledge about what to quantify, but it is trained end-to-end with a set of 1424 CT scans for which the output parameters were available. The network provided state-of-the-art results for these tasks: Visual emphysema scores are comparable to those assessed by trained human observers; COPD diagnosis from estimated lung function reaches an area under the ROC curve of 0.94, outperforming prior art. The method is easily generalizable to other situations where information from whole scans needs to be summarized in single quantities. |
Tasks | Computed Tomography (CT) |
Published | 2019-09-26 |
URL | https://arxiv.org/abs/1909.12047v2 |
PDF | https://arxiv.org/pdf/1909.12047v2.pdf |
PWC | https://paperswithcode.com/paper/follows-form-regression-from-complete |
Repo | |
Framework | |
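A minimal sketch of whole-scan regression as described above: a small 3D CNN pools the entire volume and regresses lung-function values under an MSE loss. The architecture and the choice of two outputs are illustrative, not the network from the paper:

```python
import torch
import torch.nn as nn

class CTRegressor(nn.Module):
    """Regress scalar outcomes (e.g. lung-function values) from a CT volume."""
    def __init__(self, n_outputs=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))                 # summarize the whole scan
        self.head = nn.Linear(32, n_outputs)

    def forward(self, volume):                       # (B, 1, D, H, W)
        return self.head(self.features(volume).flatten(1))

model = CTRegressor()
loss = nn.MSELoss()(model(torch.randn(2, 1, 32, 64, 64)), torch.randn(2, 2))
```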
Detecting Bias with Generative Counterfactual Face Attribute Augmentation
Title | Detecting Bias with Generative Counterfactual Face Attribute Augmentation |
Authors | Emily Denton, Ben Hutchinson, Margaret Mitchell, Timnit Gebru |
Abstract | We introduce a simple framework for identifying biases of a smiling attribute classifier. Our method poses counterfactual questions of the form: how would the prediction change if this face characteristic had been different? We leverage recent advances in generative adversarial networks to build a realistic generative model of face images that affords controlled manipulation of specific image characteristics. We introduce a set of metrics that measure the effect of manipulating a specific property of an image on the output of a trained classifier. Empirically, we identify several different factors of variation that affect the predictions of a smiling classifier trained on CelebA. |
Tasks | |
Published | 2019-06-14 |
URL | https://arxiv.org/abs/1906.06439v2 |
PDF | https://arxiv.org/pdf/1906.06439v2.pdf |
PWC | https://paperswithcode.com/paper/detecting-bias-with-generative-counterfactual |
Repo | |
Framework | |
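A minimal sketch of the counterfactual probe described above: flip one latent attribute while holding the rest fixed, regenerate the face, and measure how far the classifier's smiling score moves. The generator/classifier interfaces are assumed for illustration:

```python
import torch

def counterfactual_shift(generator, classifier, z, attr_index, delta=1.0):
    """Mean change in classifier output under a counterfactual intervention."""
    z_cf = z.clone()
    z_cf[:, attr_index] += delta                 # flip/perturb one attribute
    with torch.no_grad():
        base = classifier(generator(z))          # score on the original face
        moved = classifier(generator(z_cf))      # score on the counterfactual
    return (moved - base).mean().item()          # ~0 means attribute-invariant
```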
Predicting Rainfall using Machine Learning Techniques
Title | Predicting Rainfall using Machine Learning Techniques |
Authors | Nikhil Oswal |
Abstract | Rainfall prediction is a challenging and uncertain task that has a significant impact on human society. Timely and accurate predictions can help to proactively reduce human and financial loss. This study presents a set of experiments involving the use of prevalent machine learning techniques to build models that predict whether it is going to rain tomorrow, based on weather data for that particular day in major cities of Australia. This comparative study is conducted concentrating on three aspects: modeling inputs, modeling methods, and pre-processing techniques. The results provide a comparison of various evaluation metrics of these machine learning techniques and their reliability in predicting rainfall by analyzing the weather data. |
Tasks | |
Published | 2019-10-29 |
URL | https://arxiv.org/abs/1910.13827v1 |
PDF | https://arxiv.org/pdf/1910.13827v1.pdf |
PWC | https://paperswithcode.com/paper/predicting-rainfall-using-machine-learning |
Repo | |
Framework | |
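A minimal sketch of the "rain tomorrow" setup described above with scikit-learn: imputation, scaling and logistic regression evaluated on a held-out split. Column names follow the public weatherAUS dataset, which is an assumption about the data used:

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("weatherAUS.csv")  # assumed dataset file
features = ["MinTemp", "MaxTemp", "Rainfall", "Humidity3pm", "Pressure9am"]
df = df.dropna(subset=["RainTomorrow"])
X, y = df[features], (df["RainTomorrow"] == "Yes").astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(),
                      LogisticRegression(max_iter=1000))
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```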