Paper Group ANR 614
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Title | Predicting Good Configurations for GitHub and Stack Overflow Topic Models |
Authors | Christoph Treude, Markus Wagner |
Abstract | Software repositories contain large amounts of textual data, ranging from source code comments and issue descriptions to questions, answers, and comments on Stack Overflow. To make sense of this textual data, topic modelling is frequently used as a text-mining tool for the discovery of hidden semantic structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used topic model that aims to explain the structure of a corpus by grouping texts. LDA requires multiple parameters to work well, and there are only rough and sometimes conflicting guidelines available on how these parameters should be set. In this paper, we contribute (i) a broad study of parameters to arrive at good local optima for GitHub and Stack Overflow text corpora, (ii) an a-posteriori characterisation of text corpora related to eight programming languages, and (iii) an analysis of corpus feature importance via per-corpus LDA configuration. We find that (1) popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in our experiments, (2) corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, and (3) we can predict good configurations for unseen corpora reliably. These findings support researchers and practitioners in efficiently determining suitable configurations for topic modelling when analysing textual data contained in software repositories. |
Tasks | Feature Importance, Topic Models |
Published | 2018-04-13 |
URL | http://arxiv.org/abs/1804.04749v3 |
http://arxiv.org/pdf/1804.04749v3.pdf | |
PWC | https://paperswithcode.com/paper/per-corpus-configuration-of-topic-modelling |
Repo | |
Framework | |
ID Preserving Generative Adversarial Network for Partial Latent Fingerprint Reconstruction
Title | ID Preserving Generative Adversarial Network for Partial Latent Fingerprint Reconstruction |
Authors | Ali Dabouei, Sobhan Soleymani, Hadi Kazemi, Seyed Mehdi Iranmanesh, Jeremy Dawson, Nasser M. Nasrabadi |
Abstract | Performing recognition tasks using latent fingerprint samples is often challenging for automated identification systems due to poor quality, distortion, and partially missing information from the input samples. We propose a direct latent fingerprint reconstruction model based on conditional generative adversarial networks (cGANs). Two modifications are applied to the cGAN to adapt it for the task of latent fingerprint reconstruction. First, the model is forced to generate three maps in addition to the ridge map, to ensure that the orientation and frequency information is considered in the generation process and to prevent the model from filling large missing areas and generating erroneous minutiae. Second, a perceptual ID preservation approach is developed to force the generator to preserve the ID information during the reconstruction process. Using a synthetically generated database of latent fingerprints, the deep network learns to predict missing information from the input latent samples. We evaluate the proposed method in combination with two different fingerprint matching algorithms on several publicly available latent fingerprint datasets. We achieved a rank-10 accuracy of 88.02% on the IIIT-Delhi latent fingerprint database for the task of latent-to-latent matching and a rank-50 accuracy of 70.89% on the IIIT-Delhi MOLF database for the task of latent-to-sensor matching. Experimental results of matching reconstructed samples in both latent-to-sensor and latent-to-latent frameworks indicate that the proposed method significantly increases the matching accuracy of the fingerprint recognition systems for the latent samples. |
Tasks | |
Published | 2018-07-31 |
URL | http://arxiv.org/abs/1808.00035v1 |
http://arxiv.org/pdf/1808.00035v1.pdf | |
PWC | https://paperswithcode.com/paper/id-preserving-generative-adversarial-network |
Repo | |
Framework | |
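The rank-10 and rank-50 figures quoted in the abstract above are rank-k identification accuracies. As a minimal sketch of how such a score is commonly computed (not the authors' evaluation code), the function below counts the fraction of probes whose true identity appears among the top k gallery candidates. The function name, the data layout, and the toy data are assumptions made for illustration.

```python
def rank_k_accuracy(ranked_candidates, true_ids, k):
    """Fraction of probes whose true identity is among the top-k candidates.

    ranked_candidates: one list of gallery IDs per probe, best match first.
    true_ids: the correct gallery ID for each probe, in the same order.
    """
    if len(ranked_candidates) != len(true_ids):
        raise ValueError("one ranked list is required per probe")
    if not true_ids:
        raise ValueError("at least one probe is required")
    hits = sum(
        1
        for ranking, truth in zip(ranked_candidates, true_ids)
        if truth in ranking[:k]
    )
    return hits / len(true_ids)


# Hypothetical toy data: three probes that all belong to identity "a".
rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
truth = ["a", "a", "a"]

for k in (1, 2, 3):
    print(k, rank_k_accuracy(rankings, truth, k))
```

Accuracy can only stay the same or grow as k increases, which is why rank-10 and rank-50 results are not directly comparable with each other.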
Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data
Title | Convolutional Attention Networks for Multimodal Emotion Recognition from Speech and Text Data |
Authors | Chan Woo Lee, Kyu Ye Song, Jihoon Jeong, Woo Yong Choi |
Abstract | Emotion recognition has become a popular topic of interest, especially in the field of human-computer interaction. Previous works involve unimodal analysis of emotion, while recent efforts focus on multimodal emotion recognition from vision and speech. In this paper, we propose a new method of learning the hidden representations between just speech and text data using convolutional attention networks. Compared to a shallow model that employs simple concatenation of feature vectors, the proposed attention model performs much better in classifying emotion from speech and text data contained in the CMU-MOSEI dataset. |
Tasks | Emotion Recognition, Multimodal Emotion Recognition |
Published | 2018-05-17 |
URL | http://arxiv.org/abs/1805.06606v2 |
http://arxiv.org/pdf/1805.06606v2.pdf | |
PWC | https://paperswithcode.com/paper/convolutional-attention-networks-for |
Repo | |
Framework | |
Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning
Title | Large Scale Constrained Linear Regression Revisited: Faster Algorithms via Preconditioning |
Authors | Di Wang, Jinhui Xu |
Abstract | In this paper, we revisit the large-scale constrained linear regression problem and propose faster methods based on some recent developments in sketching and optimization. Our algorithms combine (accelerated) mini-batch SGD with a new method called two-step preconditioning to achieve an approximate solution with a time complexity lower than that of the state-of-the-art techniques for the low precision case. Our idea can also be extended to the high precision case, which gives an alternative implementation to the Iterative Hessian Sketch (IHS) method with significantly improved time complexity. Experiments on benchmark and synthetic datasets suggest that our methods indeed outperform existing ones considerably in both the low and high precision cases. |
Tasks | |
Published | 2018-02-09 |
URL | http://arxiv.org/abs/1802.03337v1 |
http://arxiv.org/pdf/1802.03337v1.pdf | |
PWC | https://paperswithcode.com/paper/large-scale-constrained-linear-regression |
Repo | |
Framework | |
Efficient Single-Shot Multibox Detector for Construction Site Monitoring
Title | Efficient Single-Shot Multibox Detector for Construction Site Monitoring |
Authors | Viral Thakar, Himani Saini, Walid Ahmed, Mohammad M Soltani, Ahmed Aly, Jia Yuan Yu |
Abstract | Asset monitoring in construction sites is an intricate, manually intensive task that can benefit greatly from automated solutions engineered using deep neural networks. We use the Single-Shot Multibox Detector (SSD), chosen for its fine balance between speed and accuracy, to leverage ubiquitously available images and videos from the surveillance cameras on the construction sites and automate the monitoring tasks, hence enabling project managers to better track the performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of using a greedy approach like non-maximum suppression. We do so using Affinity Propagation Clustering (APC) to cluster the predicted boxes based on a similarity index computed using the spatial features as well as the location of the predicted boxes. In our attempts, we have been able to improve the mean average precision of SSD by 3.77% on a custom dataset consisting of images from construction sites and by 1.67% on the PASCAL VOC Challenge. |
Tasks | |
Published | 2018-08-17 |
URL | http://arxiv.org/abs/1808.05730v2 |
http://arxiv.org/pdf/1808.05730v2.pdf | |
PWC | https://paperswithcode.com/paper/efficient-single-shot-multibox-detector-for |
Repo | |
Framework | |
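For context, the greedy non-maximum suppression that the abstract above proposes to replace can be sketched as follows. This is the standard baseline only, not the authors' Affinity Propagation method. The box format (x1, y1, x2, y2), the 0.5 IoU threshold, and the toy boxes are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap an already kept box
    by more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))
```

The greedy rule discards every box that overlaps a higher-scoring one, regardless of how the remaining boxes relate to each other; a clustering step, as the abstract describes, instead groups boxes by similarity before choosing representatives.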
Topological Map Extraction from Overhead Images
Title | Topological Map Extraction from Overhead Images |
Authors | Zuoyue Li, Jan Dirk Wegner, Aurélien Lucchi |
Abstract | We propose a new approach, named PolyMapper, to circumvent the conventional pixel-wise segmentation of (aerial) images and predict objects in a vector representation directly. PolyMapper directly extracts the topological map of a city from overhead images as collections of building footprints and road networks. In order to unify the shape representation for different types of objects, we also propose a novel sequentialization method that reformulates a graph structure as closed polygons. Experiments are conducted on both existing and self-collected large-scale datasets of several cities. Our empirical results demonstrate that our end-to-end learnable model is capable of drawing polygons of building footprints and road networks that very closely approximate the structure of existing online map services, in a fully automated manner. Quantitative and qualitative comparison to the state-of-the-art also shows that our approach achieves good levels of performance. To the best of our knowledge, the automatic extraction of large-scale topological maps is a novel contribution in the remote sensing community that we believe will help develop models with more informed geometrical constraints. |
Tasks | Semantic Segmentation |
Published | 2018-12-04 |
URL | https://arxiv.org/abs/1812.01497v3 |
https://arxiv.org/pdf/1812.01497v3.pdf | |
PWC | https://paperswithcode.com/paper/polymapper-extracting-city-maps-using |
Repo | |
Framework | |
Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges
Title | Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges |
Authors | Gabrielle Ras, Marcel van Gerven, Pim Haselager |
Abstract | Issues regarding explainable AI involve four components: users, laws & regulations, explanations and algorithms. Together these components provide a context in which explanation methods can be evaluated regarding their adequacy. The goal of this chapter is to bridge the gap between expert users and lay users. Different kinds of users are identified and their concerns revealed, relevant statements from the General Data Protection Regulation are analyzed in the context of Deep Neural Networks (DNNs), a taxonomy for the classification of existing explanation methods is introduced, and finally, the various classes of explanation methods are analyzed to verify if user concerns are justified. Overall, it is clear that (visual) explanations can be given about various aspects of the influence of the input on the output. However, it is noted that explanation methods or interfaces for lay users are missing, and we speculate about which criteria these methods or interfaces should satisfy. Finally, it is noted that two important concerns are difficult to address with explanation methods: the concern about bias in datasets that leads to biased DNNs, as well as the suspicion about unfair outcomes. |
Tasks | |
Published | 2018-03-20 |
URL | http://arxiv.org/abs/1803.07517v2 |
http://arxiv.org/pdf/1803.07517v2.pdf | |
PWC | https://paperswithcode.com/paper/explanation-methods-in-deep-learning-users |
Repo | |
Framework | |
Faster Shift-Reduce Constituent Parsing with a Non-Binary, Bottom-Up Strategy
Title | Faster Shift-Reduce Constituent Parsing with a Non-Binary, Bottom-Up Strategy |
Authors | Daniel Fernández-González, Carlos Gómez-Rodríguez |
Abstract | An increasingly wide range of artificial intelligence applications rely on syntactic information to process and extract meaning from natural language text or speech, with constituent trees being one of the most widely used syntactic formalisms. To produce these phrase-structure representations from sentences in natural language, shift-reduce constituent parsers have become one of the most efficient approaches. Increasing their accuracy and speed is still one of the main objectives pursued by the research community so that artificial intelligence applications that make use of parsing outputs, such as machine translation or voice assistant services, can improve their performance. With this goal in mind, we propose in this article a novel non-binary shift-reduce algorithm for constituent parsing. Our parser follows a classical bottom-up strategy but, unlike others, it straightforwardly creates non-binary branchings with just one Reduce transition, instead of requiring prior binarization or a sequence of binary transitions, allowing its direct application to any language without the need of further resources such as percolation tables. As a result, it uses fewer transitions per sentence than existing transition-based constituent parsers, becoming the fastest such system and, as a consequence, speeding up downstream applications. Using static oracle training and greedy search, the accuracy of this novel approach is on par with state-of-the-art transition-based constituent parsers and outperforms all top-down and bottom-up greedy shift-reduce systems on the Wall Street Journal section from the English Penn Treebank and the Penn Chinese Treebank. Additionally, we develop a dynamic oracle for training the proposed transition-based algorithm, achieving further improvements in both benchmarks and obtaining the best accuracy to date on the Penn Chinese Treebank among greedy shift-reduce parsers. |
Tasks | Machine Translation |
Published | 2018-04-21 |
URL | https://arxiv.org/abs/1804.07961v3 |
https://arxiv.org/pdf/1804.07961v3.pdf | |
PWC | https://paperswithcode.com/paper/faster-shift-reduce-constituent-parsing-with |
Repo | |
Framework | |
Classification of Findings with Localized Lesions in Fundoscopic Images using a Regionally Guided CNN
Title | Classification of Findings with Localized Lesions in Fundoscopic Images using a Regionally Guided CNN |
Authors | Jaemin Son, Woong Bae, Sangkeun Kim, Sang Jun Park, Kyu-Hwan Jung |
Abstract | Fundoscopic images are often investigated by ophthalmologists to spot abnormal lesions and make diagnoses. Recent successes of convolutional neural networks are confined to diagnoses of a few diseases without proper localization of lesions. In this paper, we propose an efficient annotation method for localizing lesions and a CNN architecture that can classify an individual finding and localize the lesions at the same time. We also introduce a new loss function that uses the regional annotations to guide the network to learn meaningful patterns. In experiments, we demonstrate that our network performed better than the widely used network and that the guidance loss helps achieve up to 4.1% higher AUROC and superior localization capability. |
Tasks | |
Published | 2018-11-02 |
URL | http://arxiv.org/abs/1811.00871v1 |
http://arxiv.org/pdf/1811.00871v1.pdf | |
PWC | https://paperswithcode.com/paper/classification-of-findings-with-localized |
Repo | |
Framework | |
Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection
Title | Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection |
Authors | Raghav Gurbaxani, Shivank Mishra |
Abstract | Despite the recent advancements in deploying neural networks for image classification, it has been found that adversarial examples are able to fool these models, leading them to misclassify the images. Since these models are now being widely deployed, we provide insight into the threat of these adversarial examples by evaluating their characteristics and transferability to more complex models that utilize Image Classification as a subtask. We demonstrate the ineffectiveness of adversarial examples when applied to Instance Segmentation & Object Detection models. We show that this ineffectiveness arises from the inability of adversarial examples to withstand transformations such as scaling or a change in lighting conditions. Moreover, we show that there exists a small threshold below which the adversarial property is retained while applying these input transformations. Additionally, these attacks demonstrate weak cross-network transferability across neural network architectures, e.g. VGG16 and ResNet50; however, an attack may fool both networks if it is passed sequentially through them during its formation. This lack of scalability and transferability calls into question how effective adversarial images would be in the real world. |
Tasks | Image Classification, Instance Segmentation, Object Detection, Semantic Segmentation |
Published | 2018-08-04 |
URL | http://arxiv.org/abs/1808.01452v1 |
http://arxiv.org/pdf/1808.01452v1.pdf | |
PWC | https://paperswithcode.com/paper/traits-transferability-of-adversarial |
Repo | |
Framework | |
Semantic Channel and Shannon’s Channel Mutually Match for Multi-Label Classification
Title | Semantic Channel and Shannon’s Channel Mutually Match for Multi-Label Classification |
Authors | Chenguang Lu |
Abstract | A group of transition probability functions forms a Shannon’s channel, whereas a group of truth functions forms a semantic channel. Label learning is to let semantic channels match Shannon’s channels, and label selection is to let Shannon’s channels match semantic channels. The Channel Matching (CM) algorithm is provided for multi-label classification. This algorithm adheres to the maximum semantic information criterion, which is compatible with the maximum likelihood criterion and the regularized least squares criterion. If samples are very large, we can directly convert Shannon’s channels into semantic channels by the third kind of Bayes’ theorem; otherwise, we can train truth functions with parameters by sampling distributions. A label may be a Boolean function of some atomic labels. To simplify learning, we may obtain only the truth functions of some atomic labels. For a given label, instances are divided into three kinds (positive, negative, and unclear) instead of two kinds as in popular studies, so that the problem with binary relevance is avoided. For each instance, the classifier selects a compound label with the most semantic information or the richest connotation. As a predictive model, the semantic channel does not change with the prior probability distribution (source) of instances. It still works when the source is changed. The classifier changes with the source, and hence can overcome the class-imbalance problem. It is shown that growth in the old population will change the classifier for the label “Old” and has been driving the semantic evolution of “Old”. The CM iteration algorithm for unseen instance classification is introduced. |
Tasks | Multi-Label Classification |
Published | 2018-05-02 |
URL | http://arxiv.org/abs/1805.01288v1 |
http://arxiv.org/pdf/1805.01288v1.pdf | |
PWC | https://paperswithcode.com/paper/semantic-channel-and-shannons-channel |
Repo | |
Framework | |
KF-LAX: Kronecker-factored curvature estimation for control variate optimization in reinforcement learning
Title | KF-LAX: Kronecker-factored curvature estimation for control variate optimization in reinforcement learning |
Authors | Mohammad Firouzi |
Abstract | A key challenge for gradient-based optimization methods in model-free reinforcement learning is to develop an approach that is sample efficient and has low variance. In this work, we apply the Kronecker-factored curvature estimation technique (KFAC) to a recently proposed gradient estimator for control variate optimization, RELAX, to increase the sample efficiency of using this gradient estimation method in reinforcement learning. The performance of the proposed method is demonstrated on a synthetic problem and a set of three discrete-control Atari game tasks. |
Tasks | Atari Games |
Published | 2018-12-11 |
URL | http://arxiv.org/abs/1812.04181v1 |
http://arxiv.org/pdf/1812.04181v1.pdf | |
PWC | https://paperswithcode.com/paper/kf-lax-kronecker-factored-curvature |
Repo | |
Framework | |
SparseNet: A Sparse DenseNet for Image Classification
Title | SparseNet: A Sparse DenseNet for Image Classification |
Authors | Wenqi Liu, Kun Zeng |
Abstract | Deep neural networks have made remarkable progress on various computer vision tasks. Recent works have shown that the depth, width and shortcut connections of networks are all vital to their performance. In this paper, we introduce a method to sparsify DenseNet which can reduce the connections of an L-layer DenseNet from O(L^2) to O(L), and thus we can simultaneously increase the depth, width and connections of neural networks in a more parameter-efficient and computation-efficient way. Moreover, an attention module is introduced to further boost our network’s performance. We denote our network as SparseNet. We evaluate SparseNet on the CIFAR (including CIFAR10 and CIFAR100) and SVHN datasets. Experiments show that SparseNet can obtain improvements over the state-of-the-art on CIFAR10 and SVHN. Furthermore, while achieving performance comparable to DenseNet on these datasets, SparseNet is 2.6x smaller and 3.7x faster than the original DenseNet. |
Tasks | Image Classification |
Published | 2018-04-15 |
URL | http://arxiv.org/abs/1804.05340v1 |
http://arxiv.org/pdf/1804.05340v1.pdf | |
PWC | https://paperswithcode.com/paper/sparsenet-a-sparse-densenet-for-image |
Repo | |
Framework | |
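The O(L^2) versus O(L) claim in the abstract above can be illustrated by counting connections. In a densely connected block, layer i receives the block input plus the outputs of all i-1 earlier layers, giving L(L+1)/2 connections in total. The abstract does not state which connections SparseNet keeps, so the sketch below assumes a hypothetical rule in which every layer keeps at most `max_inputs` incoming connections; that rule, the function names, and the example values are illustrative assumptions only.

```python
def dense_connections(num_layers):
    """Connections in a densely connected block.

    Layer i (1-based) takes the block input and all i-1 earlier layer
    outputs, so it has i incoming connections.
    """
    return sum(range(1, num_layers + 1))


def bounded_connections(num_layers, max_inputs):
    """Connections when each layer keeps at most max_inputs inputs.

    This fixed fan-in rule is an assumption for illustration, not the
    sparsification rule used by SparseNet.
    """
    return sum(min(i, max_inputs) for i in range(1, num_layers + 1))


for depth in (10, 100):
    print(depth, dense_connections(depth), bounded_connections(depth, 4))
```

With a fixed cap on inputs per layer, the total grows by at most `max_inputs` for each added layer, which is linear in depth, whereas the dense count grows quadratically.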
Learning structure-from-motion from motion
Title | Learning structure-from-motion from motion |
Authors | Clément Pinard, Laure Chevalley, Antoine Manzanera, David Filliat |
Abstract | This work is based on a questioning of the quality metrics used by deep neural networks performing depth prediction from a single image, and then of the usability of recently published works on unsupervised learning of depth from videos. To overcome their limitations, we propose to learn, in the same unsupervised manner, a depth map inference system from monocular videos that takes a pair of images as input. This algorithm actually learns structure-from-motion from motion, and not only structure from context appearance. The scale factor issue is explicitly treated, and the absolute depth map can be estimated from camera displacement magnitude, which can be easily measured from cheap external sensors. Our solution is also much more robust with respect to domain variation and adaptation via fine tuning, because it does not rely entirely on depth from context. Two use cases are considered: unstabilized moving-camera videos and stabilized ones. This choice is motivated by the UAV (Unmanned Aerial Vehicle) use case, which generally provides reliable orientation measurement. We provide a set of experiments showing that, used in real conditions where only speed can be known, our network outperforms competitors for most depth quality measures. Results are given on the well-known KITTI dataset, which provides robust stabilization for our second use case, but also contains moving scenes which are very typical of the in-car road context. We then present results on a synthetic dataset that we believe to be more representative of typical UAV scenes. Lastly, we present two domain adaptation use cases showing superior robustness of our method compared to single-view depth algorithms, which indicates that it is better suited for highly variable visual contexts. |
Tasks | Depth Estimation, Domain Adaptation |
Published | 2018-09-12 |
URL | http://arxiv.org/abs/1809.04471v2 |
http://arxiv.org/pdf/1809.04471v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-structure-from-motion-from-motion |
Repo | |
Framework | |
COBRAS: Fast, Iterative, Active Clustering with Pairwise Constraints
Title | COBRAS: Fast, Iterative, Active Clustering with Pairwise Constraints |
Authors | Toon Van Craenendonck, Sebastijan Dumančić, Elia Van Wolputte, Hendrik Blockeel |
Abstract | Constraint-based clustering algorithms exploit background knowledge to construct clusterings that are aligned with the interests of a particular user. This background knowledge is often obtained by allowing the clustering system to pose pairwise queries to the user: should these two elements be in the same cluster or not? Active clustering methods aim to minimize the number of queries needed to obtain a good clustering by querying the most informative pairs first. Ideally, a user should be able to answer a couple of these queries, inspect the resulting clustering, and repeat these two steps until a satisfactory result is obtained. We present COBRAS, an approach to active clustering with pairwise constraints that is suited for such an interactive clustering process. A core concept in COBRAS is that of a super-instance: a local region in the data in which all instances are assumed to belong to the same cluster. COBRAS constructs such super-instances in a top-down manner to produce high-quality results early on in the clustering process, and keeps refining these super-instances as more pairwise queries are given to get more detailed clusterings later on. We experimentally demonstrate that COBRAS produces good clusterings at fast run times, making it an excellent candidate for the iterative clustering scenario outlined above. |
Tasks | |
Published | 2018-03-29 |
URL | http://arxiv.org/abs/1803.11060v1 |
http://arxiv.org/pdf/1803.11060v1.pdf | |
PWC | https://paperswithcode.com/paper/cobras-fast-iterative-active-clustering-with |
Repo | |
Framework | |