July 28, 2019

3077 words 15 mins read

Paper Group ANR 341


An Analysis of Visual Question Answering Algorithms

Title An Analysis of Visual Question Answering Algorithms
Authors Kushal Kafle, Christopher Kanan
Abstract In visual question answering (VQA), an algorithm must answer text-based questions about images. While multiple datasets for VQA have been created since late 2014, they all have flaws in both their content and the way algorithms are evaluated on them. As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods. In this paper, we analyze existing VQA algorithms using a new dataset. It contains over 1.6 million questions organized into 12 different categories. We also introduce questions that are meaningless for a given image to force a VQA system to reason about image content. We propose new evaluation schemes that compensate for over-represented question-types and make it easier to study the strengths and weaknesses of algorithms. We analyze the performance of both baseline and state-of-the-art VQA models, including multi-modal compact bilinear pooling (MCB), neural module networks, and recurrent answering units. Our experiments establish how attention helps certain categories more than others, determine which models work better than others, and explain how simple models (e.g. MLP) can surpass more complex models (MCB) by simply learning to answer large, easy question categories.
Tasks Question Answering, Visual Question Answering
Published 2017-03-28
URL http://arxiv.org/abs/1703.09684v2
PDF http://arxiv.org/pdf/1703.09684v2.pdf
PWC https://paperswithcode.com/paper/an-analysis-of-visual-question-answering
Repo
Framework
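The re-weighted evaluation the abstract argues for can be illustrated in a few lines: compute accuracy per question category and average over categories, so a large, easy category (e.g. yes/no questions) no longer dominates the score. The category names and record layout below are illustrative, not the paper's exact dataset format.

```python
from collections import defaultdict

def per_type_accuracies(records):
    """records: iterable of (question_type, is_correct) pairs."""
    correct, total = defaultdict(int), defaultdict(int)
    for qtype, ok in records:
        total[qtype] += 1
        correct[qtype] += int(ok)
    return {t: correct[t] / total[t] for t in total}

def overall_vs_normalized(records):
    accs = per_type_accuracies(records)
    overall = sum(ok for _, ok in records) / len(records)   # dominated by frequent types
    normalized = sum(accs.values()) / len(accs)              # arithmetic mean over types
    return overall, normalized

# toy example: a large, easy yes/no category inflates the plain accuracy
records = [("yes_no", True)] * 900 + [("counting", False)] * 50 + [("counting", True)] * 50
print(overall_vs_normalized(records))  # (0.95, 0.75)
```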

SPLBoost: An Improved Robust Boosting Algorithm Based on Self-paced Learning

Title SPLBoost: An Improved Robust Boosting Algorithm Based on Self-paced Learning
Authors Kaidong Wang, Yao Wang, Qian Zhao, Deyu Meng, Zongben Xu
Abstract It is known that Boosting can be interpreted as a gradient descent technique to minimize an underlying loss function. Specifically, the underlying loss minimized by traditional AdaBoost is the exponential loss, which has been shown to be very sensitive to random noise and outliers. Therefore, several Boosting algorithms, e.g., LogitBoost and SavageBoost, have been proposed to improve the robustness of AdaBoost by replacing the exponential loss with specially designed robust loss functions. In this work, we present a new way to robustify AdaBoost, namely by incorporating the robust learning idea of Self-paced Learning (SPL) into the Boosting framework. Specifically, we design a new robust Boosting algorithm based on the SPL regime, SPLBoost, which can be easily implemented by slightly modifying off-the-shelf Boosting packages. Extensive experiments and a theoretical characterization are carried out to illustrate the merits of the proposed SPLBoost.
Tasks
Published 2017-06-20
URL http://arxiv.org/abs/1706.06341v2
PDF http://arxiv.org/pdf/1706.06341v2.pdf
PWC https://paperswithcode.com/paper/splboost-an-improved-robust-boosting
Repo
Framework
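A rough sketch of the self-paced idea described above, grafted onto a toy AdaBoost-style loop: samples whose current exponential loss exceeds a self-paced threshold receive zero weight before the next weak learner is fit. This only illustrates the SPL regime with a hard weighting; the paper's exact SPLBoost derivation and update rules may differ.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def splboost_sketch(X, y, rounds=20, lam=2.0):
    """Toy AdaBoost-style loop with a hard self-paced weight v_i = 1[loss_i < lam].
    y in {-1, +1}. A sketch of the SPL idea, not the paper's exact algorithm."""
    n = len(y)
    F = np.zeros(n)                        # current ensemble score
    learners, alphas = [], []
    for _ in range(rounds):
        loss = np.exp(-y * F)              # exponential loss per sample
        v = (loss < lam).astype(float)     # self-paced selection: drop hard/noisy samples
        w = v * loss
        if w.sum() == 0:
            break
        w = w / w.sum()
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w @ (pred != y), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        F += alpha * pred
        learners.append(stump); alphas.append(alpha)
    return learners, alphas
```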

Alignment Distances on Systems of Bags

Title Alignment Distances on Systems of Bags
Authors Alexander Sagel, Martin Kleinsteuber
Abstract Recent research in image and video recognition indicates that many visual processes can be thought of as being generated by a time-varying generative model. A natural descriptive model for visual processes is thus a statistical distribution that varies over time. Specifically, modeling visual processes as streams of histograms generated by a kernelized linear dynamic system turns out to be efficient. We refer to such a model as a System of Bags. In this work, we investigate Systems of Bags with special emphasis on dynamic scenes and dynamic textures. Parameters of linear dynamic systems suffer from ambiguities. In order to cope with these ambiguities in the kernelized setting, we develop a kernelized version of the alignment distance. For its computation, we use a Jacobi-type method and prove its convergence to a set of critical points. We employ it as a dissimilarity measure on Systems of Bags. As such, it outperforms other known dissimilarity measures for kernelized linear dynamic systems, in particular the Martin Distance and the Maximum Singular Value Distance, in every tested classification setting. A considerable margin can be observed in settings where classification is performed with respect to an abstract mean of video sets. For this scenario, the presented approach can outperform state-of-the-art techniques, such as Dynamic Fractal Spectrum or Orthogonal Tensor Dictionary Learning.
Tasks Dictionary Learning, Video Recognition
Published 2017-06-14
URL http://arxiv.org/abs/1706.04388v1
PDF http://arxiv.org/pdf/1706.04388v1.pdf
PWC https://paperswithcode.com/paper/alignment-distances-on-systems-of-bags
Repo
Framework
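The alignment-distance idea of comparing system parameters only up to an ambiguity transform can be illustrated in a simplified, non-kernelized form using orthogonal Procrustes, which has a closed-form solution. The paper's kernelized alignment distance over full LDS parameter tuples is instead computed with a Jacobi-type method; this snippet is only a conceptual stand-in.

```python
import numpy as np

def procrustes_alignment_distance(A, B):
    """min over orthogonal Q of ||A - B Q||_F (orthogonal Procrustes, closed form).
    A simplified, non-kernelized stand-in for the alignment-distance idea:
    compare parameters only up to an orthogonal change of basis. The paper's
    kernelized alignment distance is computed with a Jacobi-type method instead."""
    U, _, Vt = np.linalg.svd(B.T @ A)
    Q = U @ Vt                      # optimal aligning orthogonal transform
    return np.linalg.norm(A - B @ Q)
```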

Indowordnets help in Indian Language Machine Translation

Title Indowordnets help in Indian Language Machine Translation
Authors Sreelekha S, Pushpak Bhattacharyya
Abstract Because Indian languages are resource-poor, developing Indian-Indian and English-Indian machine translation systems is difficult, particularly when translating various lexical phenomena. In this paper, we present a comparative study of 440 phrase-based statistical models trained for 110 language pairs across 11 Indian languages. We developed 110 baseline Statistical Machine Translation (SMT) systems, then augmented the training corpus with Indowordnet synset word entries from the lexical database and trained a further 110 models on top of the baselines. We performed a detailed performance comparison using evaluation metrics such as BLEU, METEOR and TER, and observed significant improvements in translation quality across all 440 models after using the Indowordnet. These experiments give detailed insight into (1) the use of a lexical database with synset mapping for resource-poor languages and (2) efficient use of Indowordnet synset mapping. Moreover, synset-mapped lexical entries helped the SMT system handle ambiguity during translation to a great extent.
Tasks Machine Translation
Published 2017-10-05
URL http://arxiv.org/abs/1710.02086v2
PDF http://arxiv.org/pdf/1710.02086v2.pdf
PWC https://paperswithcode.com/paper/indowordnets-help-in-indian-language-machine
Repo
Framework
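The corpus-augmentation step can be sketched as a small preprocessing script: synset-mapped word pairs from IndoWordNet are appended to the existing parallel corpus before retraining the phrase-based system. File names and the synset file format (tab-separated source/target entries) are assumptions for illustration.

```python
def augment_parallel_corpus(src_path, tgt_path, synset_pairs_path,
                            out_src="train.aug.src", out_tgt="train.aug.tgt"):
    """Copy the original parallel corpus and append one synthetic sentence pair
    per synset word entry. Paths and the tab-separated synset format are
    assumptions for illustration, not the authors' exact pipeline."""
    def read_lines(path):
        with open(path, encoding="utf-8") as f:
            return f.readlines()

    src_lines, tgt_lines = read_lines(src_path), read_lines(tgt_path)
    for line in read_lines(synset_pairs_path):
        src_word, tgt_word = line.rstrip("\n").split("\t")
        src_lines.append(src_word + "\n")
        tgt_lines.append(tgt_word + "\n")

    with open(out_src, "w", encoding="utf-8") as f:
        f.writelines(src_lines)
    with open(out_tgt, "w", encoding="utf-8") as f:
        f.writelines(tgt_lines)
```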

Image Segmentation and Classification for Sickle Cell Disease using Deformable U-Net

Title Image Segmentation and Classification for Sickle Cell Disease using Deformable U-Net
Authors Mo Zhang, Xiang Li, Mengjia Xu, Quanzheng Li
Abstract Reliable cell segmentation and classification from biomedical images is a crucial step for both scientific research and clinical practice. A major challenge for more robust segmentation and classification methods is the large variation in the size, shape and viewpoint of the cells, combined with the low image quality caused by noise and artifacts. To address this issue, in this work we propose a learning-based, simultaneous cell segmentation and classification method based on the deep U-Net structure with deformable convolution layers. The U-Net architecture has been shown to offer precise localization for image semantic segmentation. Moreover, the deformable convolution layers enable free-form deformation of the feature learning process, thus making the whole network more robust to various cell morphologies and image settings. The proposed method is tested on microscopic red blood cell images from patients with sickle cell disease. The results show that U-Net with deformable convolution achieves the highest accuracy for segmentation and classification, compared with the original U-Net structure.
Tasks Cell Segmentation, Semantic Segmentation
Published 2017-10-23
URL http://arxiv.org/abs/1710.08149v3
PDF http://arxiv.org/pdf/1710.08149v3.pdf
PWC https://paperswithcode.com/paper/image-segmentation-and-classification-for
Repo
Framework
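A minimal sketch of the kind of layer the paper swaps into the U-Net: a deformable 3x3 convolution whose sampling offsets are predicted by a plain convolution, here using torchvision's DeformConv2d. This is not the authors' exact architecture, only the building block.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformConvBlock(nn.Module):
    """One conv block with a deformable 3x3 convolution; a sketch of the layer
    the paper inserts into the U-Net, not the authors' exact architecture."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # a plain conv predicts the sampling offsets (2 values per kernel position)
        self.offset = nn.Conv2d(in_ch, 2 * 3 * 3, kernel_size=3, padding=1)
        self.deform = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.Sequential(nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.act(self.deform(x, self.offset(x)))

x = torch.randn(1, 3, 128, 128)       # e.g. an RGB microscopy patch
feat = DeformConvBlock(3, 64)(x)       # -> (1, 64, 128, 128)
```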

Learning from Noisy Labels with Distillation

Title Learning from Noisy Labels with Distillation
Authors Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li
Abstract The ability to learn from noisy labels is very useful in many visual recognition tasks, as a vast amount of data with noisy labels is relatively easy to obtain. Traditionally, label noise has been treated as statistical outliers, and approaches such as importance re-weighting and bootstrapping have been proposed to alleviate the problem. According to our observation, real-world noisy labels exhibit multi-mode characteristics like the true labels, rather than behaving like independent random outliers. In this work, we propose a unified distillation framework that uses side information, including a small clean dataset and label relations in a knowledge graph, to “hedge the risk” of learning from noisy labels. Furthermore, unlike traditional approaches evaluated on simulated label noise, we propose a suite of new benchmark datasets, in the Sports, Species and Artifacts domains, to evaluate learning from noisy labels in a practical setting. The empirical study demonstrates the effectiveness of our proposed method in all the domains.
Tasks
Published 2017-03-07
URL http://arxiv.org/abs/1703.02391v2
PDF http://arxiv.org/pdf/1703.02391v2.pdf
PWC https://paperswithcode.com/paper/learning-from-noisy-labels-with-distillation
Repo
Framework
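The risk-hedging combination at the core of the framework can be sketched as a soft pseudo-target that mixes the noisy label with the prediction of an auxiliary model trained on the small clean set. The mixing weight and models below are placeholders, and the knowledge-graph side information from the paper is omitted.

```python
import torch
import torch.nn.functional as F

def distilled_target(noisy_onehot, clean_model_logits, lam=0.5):
    """Pseudo-target s = lam * y_noisy + (1 - lam) * softmax(f_clean(x)).
    A sketch of the risk-hedging combination; lam is a hyperparameter here."""
    soft = F.softmax(clean_model_logits, dim=-1)
    return lam * noisy_onehot + (1.0 - lam) * soft

def student_loss(student_logits, noisy_onehot, clean_model_logits, lam=0.5):
    target = distilled_target(noisy_onehot, clean_model_logits, lam)
    logp = F.log_softmax(student_logits, dim=-1)
    return -(target * logp).sum(dim=-1).mean()  # cross-entropy against the soft target
```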

Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification

Title Convolutional Neural Network on Three Orthogonal Planes for Dynamic Texture Classification
Authors Vincent Andrearczyk, Paul F. Whelan
Abstract Dynamic Textures (DTs) are sequences of images of moving scenes that exhibit certain stationarity properties in time such as smoke, vegetation and fire. The analysis of DT is important for recognition, segmentation, synthesis or retrieval for a range of applications including surveillance, medical imaging and remote sensing. Deep learning methods have shown impressive results and are now the new state of the art for a wide range of computer vision tasks including image and video recognition and segmentation. In particular, Convolutional Neural Networks (CNNs) have recently proven to be well suited for texture analysis with a design similar to a filter bank approach. In this paper, we develop a new approach to DT analysis based on a CNN method applied on three orthogonal planes xy, xt and yt. We train CNNs on spatial frames and temporal slices extracted from the DT sequences and combine their outputs to obtain a competitive DT classifier. Our results on a wide range of commonly used DT classification benchmark datasets prove the robustness of our approach. Significant improvement of the state of the art is shown on the larger datasets.
Tasks Texture Classification, Video Recognition
Published 2017-03-16
URL http://arxiv.org/abs/1703.05530v1
PDF http://arxiv.org/pdf/1703.05530v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-network-on-three
Repo
Framework
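The data preparation behind the three-plane idea is easy to sketch: slice the video volume along xy (spatial frames), xt and yt (temporal slices), and feed each family of slices to its own CNN. The snippet below only shows the slicing; the CNNs themselves and the fusion of per-plane predictions (e.g. averaging) are omitted.

```python
import numpy as np

def orthogonal_plane_slices(video):
    """video: array of shape (T, H, W). Returns one example slice on each of the
    three orthogonal planes the paper trains CNNs on: xy (a spatial frame),
    xt and yt (temporal slices). A data-preparation sketch only."""
    T, H, W = video.shape
    xy = video[T // 2]            # one spatial frame, shape (H, W)
    xt = video[:, H // 2, :]      # fix y: temporal slice, shape (T, W)
    yt = video[:, :, W // 2]      # fix x: temporal slice, shape (T, H)
    return xy, xt, yt

# in the paper, each plane's slices feed a separate CNN and the per-plane
# predictions are fused into the final DT classifier
video = np.random.rand(48, 64, 64)
xy, xt, yt = orthogonal_plane_slices(video)
```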

Crowdsourcing Argumentation Structures in Chinese Hotel Reviews

Title Crowdsourcing Argumentation Structures in Chinese Hotel Reviews
Authors Mengxue Li, Shiqiang Geng, Yang Gao, Haijing Liu, Hao Wang
Abstract Argumentation mining aims at automatically extracting the premise-claim discourse structures in natural language texts. There is a great demand for argumentation corpora for customer reviews. However, due to the controversial nature of the argumentation annotation task, very few large-scale argumentation corpora for customer reviews exist. In this work, we make novel use of crowdsourcing to collect argumentation annotations in Chinese hotel reviews. As the first Chinese argumentation dataset, our corpus includes 4814 argument component annotations and 411 argument relation annotations, and its annotation quality is comparable to that of some widely used argumentation corpora in other languages.
Tasks
Published 2017-05-05
URL http://arxiv.org/abs/1705.02077v1
PDF http://arxiv.org/pdf/1705.02077v1.pdf
PWC https://paperswithcode.com/paper/crowdsourcing-argumentation-structures-in
Repo
Framework

Hidden Markov Random Field Iterative Closest Point

Title Hidden Markov Random Field Iterative Closest Point
Authors John Stechschulte, Christoffer Heckman
Abstract When registering point clouds resolved from an underlying 2-D pixel structure, such as those resulting from structured light and flash LiDAR sensors, or stereo reconstruction, it is expected that some points in one cloud do not have corresponding points in the other cloud, and that these would occur together, such as along an edge of the depth map. In this work, a hidden Markov random field model is used to capture this prior within the framework of the iterative closest point algorithm. The EM algorithm is used to estimate the distribution parameters and the hidden component memberships. Experiments are presented demonstrating that this method outperforms several other outlier rejection methods when the point clouds have low or moderate overlap.
Tasks
Published 2017-11-07
URL http://arxiv.org/abs/1711.05864v1
PDF http://arxiv.org/pdf/1711.05864v1.pdf
PWC https://paperswithcode.com/paper/hidden-markov-random-field-iterative-closest
Repo
Framework
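A stripped-down sketch of the EM step described above: per-correspondence residuals are explained by a zero-mean Gaussian (inliers) versus a uniform outlier component, and the resulting responsibilities are used as weights in the next ICP alignment. The Markov random field coupling between neighbouring pixels, which is the paper's main addition, is omitted here.

```python
import numpy as np

def em_inlier_weights(residuals, iters=20, outlier_density=1e-3):
    """EM for a two-component mixture over per-point registration residuals:
    a zero-mean Gaussian (inliers) vs. a uniform outlier component. This sketch
    omits the Markov random field coupling over the 2-D pixel grid that the
    paper adds on top of the mixture."""
    r2 = residuals ** 2
    sigma2, pi_in = np.mean(r2), 0.9
    for _ in range(iters):
        # E-step: responsibility of the inlier component for each point
        g = pi_in * np.exp(-r2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        w = g / (g + (1 - pi_in) * outlier_density)
        # M-step: update the Gaussian variance and the mixing proportion
        sigma2 = np.sum(w * r2) / np.maximum(np.sum(w), 1e-12)
        pi_in = np.mean(w)
    return w  # per-correspondence weights for the next ICP alignment step
```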

Full-Network Embedding in a Multimodal Embedding Pipeline

Title Full-Network Embedding in a Multimodal Embedding Pipeline
Authors Armand Vilalta, Dario Garcia-Gasulla, Ferran Parés, Eduard Ayguadé, Jesus Labarta, Ulises Cortés, Toyotaro Suzumura
Abstract The current state-of-the-art for image annotation and image retrieval tasks is obtained through deep neural networks, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding in this setting, replacing the original image representation in a competitive multimodal embedding generation scheme. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale representation of images, which results in richer characterizations. To measure the influence of the Full-Network embedding, we evaluate its performance on three different datasets, and compare the results with the original multimodal embedding generation scheme when using a one-layer image embedding, and with the rest of the state-of-the-art. Results for image annotation and image retrieval tasks indicate that the Full-Network embedding is consistently superior to the one-layer embedding. These results motivate the integration of the Full-Network embedding on any multimodal embedding generation scheme, something feasible thanks to the flexibility of the approach.
Tasks Image Retrieval, Network Embedding
Published 2017-07-24
URL http://arxiv.org/abs/1707.09872v2
PDF http://arxiv.org/pdf/1707.09872v2.pdf
PWC https://paperswithcode.com/paper/full-network-embedding-in-a-multimodal
Repo
Framework
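A rough sketch of a full-network-style image embedding: pool and concatenate activations from several layers, standardize each feature across the dataset, then discretize to {-1, 0, 1}. The pooling, layer choice and thresholds below are placeholders rather than the paper's exact settings.

```python
import numpy as np

def full_network_embedding_sketch(layer_activations, low=-0.25, high=0.15):
    """layer_activations: list of per-image feature arrays, one per CNN layer
    (each of shape (n_images, n_features_l), e.g. spatially average-pooled
    conv activations). Concatenate all layers, standardize each feature across
    the dataset, and discretize to {-1, 0, 1}. The thresholds here are
    placeholders, not necessarily the values used in the paper."""
    feats = np.concatenate(layer_activations, axis=1)
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    z = (feats - mu) / sigma
    return np.where(z > high, 1, np.where(z < low, -1, 0)).astype(np.int8)
```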

Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget

Title Sequential Dynamic Decision Making with Deep Neural Nets on a Test-Time Budget
Authors Henghui Zhu, Feng Nan, Ioannis Paschalidis, Venkatesh Saligrama
Abstract Deep neural network (DNN) based approaches hold significant potential for reinforcement learning (RL) and have already shown remarkable gains over state-of-the-art methods in a number of applications. The effectiveness of DNN methods can be attributed to leveraging the abundance of supervised data to learn value functions, Q-functions, and policy function approximations without the need for feature engineering. Nevertheless, the deployment of DNN-based predictors with very deep architectures can pose an issue due to computational and other resource constraints at test time in a number of applications. We propose a novel approach for reducing the average latency by learning a computationally efficient gating function that is capable of recognizing states in a sequential decision process for which the policy prescriptions of a shallow network suffice and deeper layers of the DNN have little marginal utility. The overall system is adaptive in that it dynamically switches control actions based on state estimates in order to reduce average latency without sacrificing terminal performance. We experiment with a number of alternative loss functions to train the gating functions and shallow policies and show that in a number of applications a speed-up of up to almost 5X can be obtained with little loss in performance.
Tasks Decision Making, Feature Engineering
Published 2017-05-31
URL http://arxiv.org/abs/1705.10924v1
PDF http://arxiv.org/pdf/1705.10924v1.pdf
PWC https://paperswithcode.com/paper/sequential-dynamic-decision-making-with-deep
Repo
Framework
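The adaptive switching described above can be sketched as a cheap gating network that decides, per state, whether the shallow policy's prescription suffices or the deep policy must be evaluated. Network sizes, the threshold and the decision rule are illustrative placeholders; in the paper the gating functions are trained with dedicated loss functions.

```python
import torch
import torch.nn as nn

class GatedPolicy(nn.Module):
    """Sketch of the adaptive idea: a cheap gate decides, per state, whether the
    shallow policy suffices or the full deep policy must be run. The networks
    and threshold are illustrative placeholders, not the paper's architecture."""
    def __init__(self, state_dim, n_actions, threshold=0.5):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(),
                                  nn.Linear(32, 1), nn.Sigmoid())
        self.shallow = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_actions))
        self.deep = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                  nn.Linear(256, 256), nn.ReLU(),
                                  nn.Linear(256, n_actions))
        self.threshold = threshold

    def forward(self, state):
        if self.gate(state).item() > self.threshold:   # gate trusts the cheap policy
            return self.shallow(state).argmax(dim=-1)
        return self.deep(state).argmax(dim=-1)         # fall back to the deep policy
```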

Informed Non-convex Robust Principal Component Analysis with Features

Title Informed Non-convex Robust Principal Component Analysis with Features
Authors Niannan Xue, Jiankang Deng, Yannis Panagakis, Stefanos Zafeiriou
Abstract We revisit the problem of robust principal component analysis with features acting as prior side information. To this aim, a novel, elegant, non-convex optimization approach is proposed to decompose a given observation matrix into a low-rank core and the corresponding sparse residual. Rigorous theoretical analysis of the proposed algorithm results in exact recovery guarantees with low computational complexity. Aptly designed synthetic experiments demonstrate that our method is the first to wholly harness the power of non-convexity over convexity in terms of both recoverability and speed. That is, the proposed non-convex approach is more accurate and faster compared to the best available algorithms for the problem under study. Two real-world applications, namely image classification and face denoising further exemplify the practical superiority of the proposed method.
Tasks Denoising, Image Classification
Published 2017-09-14
URL http://arxiv.org/abs/1709.04836v1
PDF http://arxiv.org/pdf/1709.04836v1.pdf
PWC https://paperswithcode.com/paper/informed-non-convex-robust-principal
Repo
Framework

Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting

Title Weakly- and Self-Supervised Learning for Content-Aware Deep Image Retargeting
Authors Donghyeon Cho, Jinsun Park, Tae-Hyun Oh, Yu-Wing Tai, In So Kweon
Abstract This paper proposes a weakly- and self-supervised deep convolutional neural network (WSSDCNN) for content-aware image retargeting. Our network takes a source image and a target aspect ratio, and then directly outputs a retargeted image. Retargeting is performed through a shift map, which is a pixel-wise mapping from the source to the target grid. Our method implicitly learns an attention map, which leads to a content-aware shift map for image retargeting. As a result, discriminative parts in an image are preserved, while background regions are adjusted seamlessly. In the training phase, pairs of an image and its image-level annotation are used to compute content and structure losses. We demonstrate the effectiveness of our proposed method for a retargeting application with insightful analyses.
Tasks
Published 2017-08-09
URL http://arxiv.org/abs/1708.02731v1
PDF http://arxiv.org/pdf/1708.02731v1.pdf
PWC https://paperswithcode.com/paper/weakly-and-self-supervised-learning-for
Repo
Framework
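The warping step implied by the shift map can be sketched with a standard grid sampler: predicted horizontal shifts perturb a regular sampling grid on the target aspect ratio, and the source image is resampled accordingly. The network that predicts the shift map and the attention mechanism are omitted; the grid construction below is illustrative.

```python
import torch
import torch.nn.functional as F

def apply_shift_map(image, shift_x, out_width):
    """image: (1, C, H, W_src); shift_x: (1, H, out_width) horizontal shifts in
    normalized [-1, 1] coordinates (in the paper, predicted by the network).
    Builds a sampling grid at the target aspect ratio and warps the source.
    A sketch of the shift-map warping step only."""
    _, _, H, _ = image.shape
    ys = torch.linspace(-1, 1, H).view(1, H, 1).expand(1, H, out_width)
    xs = torch.linspace(-1, 1, out_width).view(1, 1, out_width).expand(1, H, out_width)
    grid = torch.stack((xs + shift_x, ys), dim=-1)     # (1, H, out_width, 2)
    return F.grid_sample(image, grid, align_corners=True)

image = torch.rand(1, 3, 240, 320)
shift = torch.zeros(1, 240, 160)    # zero shift = plain horizontal rescale
retargeted = apply_shift_map(image, shift, out_width=160)   # (1, 3, 240, 160)
```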

Dependence Modeling in Ultra High Dimensions with Vine Copulas and the Graphical Lasso

Title Dependence Modeling in Ultra High Dimensions with Vine Copulas and the Graphical Lasso
Authors Dominik Müller, Claudia Czado
Abstract To model high dimensional data, Gaussian methods are widely used since they remain tractable and yield parsimonious models by imposing strong assumptions on the data. Vine copulas are more flexible by combining arbitrary marginal distributions and (conditional) bivariate copulas. Yet, this adaptability is accompanied by sharply increasing computational effort as the dimension increases. The approach proposed in this paper overcomes this burden and makes the first step into ultra high dimensional non-Gaussian dependence modeling by using a divide-and-conquer approach. First, we apply Gaussian methods to split datasets into feasibly small subsets and second, apply parsimonious and flexible vine copulas thereon. Finally, we reconcile them into one joint model. We provide numerical results demonstrating the feasibility of our approach in moderate dimensions and showcase its ability to estimate ultra high dimensional non-Gaussian dependence models in thousands of dimensions.
Tasks
Published 2017-09-15
URL http://arxiv.org/abs/1709.05119v1
PDF http://arxiv.org/pdf/1709.05119v1.pdf
PWC https://paperswithcode.com/paper/dependence-modeling-in-ultra-high-dimensions
Repo
Framework

Ternary Neural Networks with Fine-Grained Quantization

Title Ternary Neural Networks with Fine-Grained Quantization
Authors Naveen Mellempudi, Abhisek Kundu, Dheevatsa Mudigere, Dipankar Das, Bharat Kaul, Pradeep Dubey
Abstract We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained full precision models, while also constraining activations to 8 and 4-bits. Using this method, we demonstrate a minimal loss in classification accuracy on state-of-the-art topologies without additional training. We provide an improved theoretical formulation that forms the basis for a higher quality solution using FGQ. Our method involves ternarizing the original weight tensor in groups of $N$ weights. Using $N=4$, we achieve Top-1 accuracy within $3.7\%$ and $4.2\%$ of the baseline full precision result for ResNet-101 and ResNet-50 respectively, while eliminating $75\%$ of all multiplications. These results enable a full 8/4-bit inference pipeline, with best-reported accuracy using ternary weights on the ImageNet dataset, with a potential of $9\times$ improvement in performance. Also, for smaller networks like AlexNet, FGQ achieves state-of-the-art results. We further study the impact of group size on both performance and accuracy. With a group size of $N=64$, we eliminate $\approx 99\%$ of the multiplications; however, this introduces a noticeable drop in accuracy, which necessitates fine-tuning the parameters at lower precision. We address this by fine-tuning ResNet-50 with 8-bit activations and ternary weights at $N=64$, improving the Top-1 accuracy to within $4\%$ of the full precision result with $<30\%$ additional training overhead. Our final quantized model can run on a full 8-bit compute pipeline using 2-bit weights and has the potential of up to $15\times$ improvement in performance compared to baseline full-precision models.
Tasks Quantization
Published 2017-05-02
URL http://arxiv.org/abs/1705.01462v3
PDF http://arxiv.org/pdf/1705.01462v3.pdf
PWC https://paperswithcode.com/paper/ternary-neural-networks-with-fine-grained
Repo
Framework
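The group-wise ternarization can be sketched in a few lines: each group of $N$ weights gets its own scale and is mapped to {-alpha, 0, +alpha}. The threshold rule used below (0.7 times the mean magnitude) is the common ternary-weight heuristic, included only to make the per-group scheme concrete; it is not necessarily the exact FGQ formulation.

```python
import numpy as np

def ternarize_groups(weights, group_size=4):
    """Ternarize a 1-D weight vector in groups of N: each group gets its own
    scaling factor alpha and maps to {-alpha, 0, +alpha}. The threshold rule
    (0.7 * mean |w|) is the common ternary-weight heuristic, used here only to
    illustrate the per-group scheme, not necessarily FGQ's exact formulation."""
    w = weights.reshape(-1, group_size)
    thresh = 0.7 * np.mean(np.abs(w), axis=1, keepdims=True)
    mask = np.abs(w) > thresh
    # per-group scale: mean magnitude of the weights that survive the threshold
    count = np.maximum(mask.sum(axis=1, keepdims=True), 1)
    alpha = np.sum(np.abs(w) * mask, axis=1, keepdims=True) / count
    return (alpha * np.sign(w) * mask).reshape(weights.shape)

w = np.random.randn(64)
print(np.unique(np.round(ternarize_groups(w), 4)).size)  # few distinct values per group
```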