Paper Group ANR 775
Resolving Referring Expressions in Images With Labeled Elements. Fast Interactive Image Retrieval using large-scale unlabeled data. Deep-Temporal LSTM for Daily Living Action Recognition. Deception Detection by 2D-to-3D Face Reconstruction from Videos. Learning Interpretable Rules for Multi-label Classification. Leveraging human knowledge in tabula …
Resolving Referring Expressions in Images With Labeled Elements
Title | Resolving Referring Expressions in Images With Labeled Elements |
Authors | Nevan Wichers, Dilek Hakkani-Tur, Jindong Chen |
Abstract | Images may have elements containing text and a bounding box associated with them, for example, text identified via optical character recognition on a computer screen image, or a natural image with labeled objects. We present an end-to-end trainable architecture that incorporates the information from these elements and the image to segment or identify the part of the image a natural language expression is referring to. We calculate an embedding for each element and then project it onto the corresponding location (i.e., the associated bounding box) of the image feature map. We show that this architecture improves referring-expression resolution over using the image alone, as well as over other methods that incorporate the element information. We demonstrate experimental results on the COCO-based referring expression datasets and on a webpage-image referring expression dataset that we developed. |
Tasks | Optical Character Recognition |
Published | 2018-10-24 |
URL | http://arxiv.org/abs/1810.10165v2 |
PDF | http://arxiv.org/pdf/1810.10165v2.pdf |
PWC | https://paperswithcode.com/paper/resolving-referring-expressions-in-images |
Repo | |
Framework | |
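The projection step described in the abstract (an embedding per element, placed at the element's bounding box on the image feature map) can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes boxes are already given in feature-map cell coordinates (end-exclusive), that embeddings of overlapping elements are summed, and that the resulting embedding plane is concatenated channel-wise with the image features. The function name `project_elements` is hypothetical.

```python
from typing import List, Tuple

# Feature map stored as nested lists indexed [row][col][channel].
FeatureMap = List[List[List[float]]]
Box = Tuple[int, int, int, int]  # (row0, col0, row1, col1), end-exclusive


def project_elements(
    feature_map: FeatureMap,
    elements: List[Tuple[Box, List[float]]],
) -> FeatureMap:
    """Concatenate an element-embedding plane onto an image feature map.

    Cells covered by several boxes receive the sum of those embeddings
    (an assumption); cells covered by no box receive zeros.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    emb_dim = len(elements[0][1]) if elements else 0
    # All-zero embedding plane with the same spatial size as the feature map.
    plane = [[[0.0] * emb_dim for _ in range(width)] for _ in range(height)]
    for (r0, c0, r1, c1), emb in elements:
        # Clip the box to the feature-map bounds, then add the embedding.
        for r in range(max(r0, 0), min(r1, height)):
            for c in range(max(c0, 0), min(c1, width)):
                cell = plane[r][c]
                for k, value in enumerate(emb):
                    cell[k] += value
    # Channel-wise concatenation: image channels first, element channels after.
    return [
        [feature_map[r][c] + plane[r][c] for c in range(width)]
        for r in range(height)
    ]
```

For example, on a 2 × 3 map with one image channel, an element with box `(0, 1, 2, 3)` and embedding `[1.0, 2.0]` adds `[1.0, 2.0]` to every cell in columns 1 and 2, while column 0 gets `[0.0, 0.0]`. Concatenation keeps image and element information in separate channels so that later convolutions can learn how to combine them.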
Fast Interactive Image Retrieval using large-scale unlabeled data
Title | Fast Interactive Image Retrieval using large-scale unlabeled data |
Authors | Akshay Mehra, Jihun Hamm, Mikhail Belkin |
Abstract | An interactive image retrieval system learns which images in the database belong to a user’s query concept by analyzing the example images and feedback provided by the user. The challenge is to retrieve the relevant images with minimal user interaction. In this work, we pose this problem as a binary classification task: classifying every image in the database as relevant or irrelevant to the user’s query concept. Our method combines active learning with graph-based semi-supervised learning (GSSL) to tackle this problem. Active learning reduces the number of user interactions by querying the labels of the most informative points, and GSSL makes it possible to use abundant unlabeled data along with the limited labeled data provided by the user. To find the most informative point efficiently, we use an uncertainty-sampling method that queries the label of the point nearest to the decision boundary of the classifier. We estimate this decision boundary using our adaptive-threshold heuristic. To exploit huge volumes of unlabeled data, we use an efficient approximation that reduces the complexity of GSSL from $O(n^3)$ to $O(n)$, making GSSL scalable. We make the classifier robust to the diversity and noisy labels of images in large databases by incorporating information from multiple modalities, such as visual information extracted by deep learning models and semantic information extracted from WordNet. High F1 scores within a few relevance-feedback rounds in our experiments with concepts defined on the AnimalWithAttributes and ImageNet (1.2 million images) datasets indicate the effectiveness and scalability of our approach. |
Tasks | Active Learning, Image Retrieval |
Published | 2018-02-12 |
URL | http://arxiv.org/abs/1802.04204v1 |
PDF | http://arxiv.org/pdf/1802.04204v1.pdf |
PWC | https://paperswithcode.com/paper/fast-interactive-image-retrieval-using-large |
Repo | |
Framework | |
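The combination of GSSL and uncertainty sampling described above can be sketched on a toy graph. This is only an illustration under stated assumptions: it uses plain harmonic-function label propagation (Jacobi iteration, O(n²) per sweep), not the paper's O(n) approximation, and the paper's adaptive-threshold heuristic is replaced by a hypothetical stand-in (the mean score of the unlabeled images). All function names are hypothetical.

```python
from typing import Dict, List


def propagate_labels(
    weights: List[List[float]],
    labels: Dict[int, float],
    iterations: int = 200,
) -> List[float]:
    """Harmonic-function label propagation via Jacobi iteration.

    weights: symmetric n x n affinity matrix between images.
    labels: user feedback, image index -> 1.0 (relevant) or 0.0 (irrelevant).
    Labeled nodes stay clamped; each unlabeled node is repeatedly replaced by
    the weighted mean of its neighbours' scores.
    """
    n = len(weights)
    scores = [labels.get(i, 0.5) for i in range(n)]
    for _ in range(iterations):
        updated = scores[:]
        for i in range(n):
            if i in labels:
                continue
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total > 0:
                weighted = sum(weights[i][j] * scores[j] for j in range(n) if j != i)
                updated[i] = weighted / total
        scores = updated
    return scores


def hypothetical_threshold(scores: List[float], labels: Dict[int, float]) -> float:
    """Hypothetical stand-in for the adaptive threshold: mean unlabeled score."""
    unlabeled = [s for i, s in enumerate(scores) if i not in labels]
    return sum(unlabeled) / len(unlabeled)


def next_query(scores: List[float], labels: Dict[int, float]) -> int:
    """Uncertainty sampling: the unlabeled image closest to the threshold."""
    threshold = hypothetical_threshold(scores, labels)
    candidates = [i for i in range(len(scores)) if i not in labels]
    return min(candidates, key=lambda i: abs(scores[i] - threshold))
```

On a five-node chain where node 0 is marked relevant and node 4 irrelevant, the scores converge to a linear ramp `[1.0, 0.75, 0.5, 0.25, 0.0]`, and the next image shown to the user is node 2, the one the classifier is least sure about.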
Deep-Temporal LSTM for Daily Living Action Recognition
Title | Deep-Temporal LSTM for Daily Living Action Recognition |
Authors | Srijan Das, Michal Koperski, Francois Bremond, Gianpiero Francesca |
Abstract | In this paper, we propose to improve the traditional use of RNNs by employing a many-to-many model for video classification. We analyze the importance of modeling spatial layout and temporal encoding for daily living action recognition. Many RGB methods focus only on short-term temporal information obtained from optical flow. Skeleton-based methods, on the other hand, show that modeling long-term skeleton evolution improves action recognition accuracy. In this work, we propose a deep-temporal LSTM architecture that extends the standard LSTM and allows better encoding of temporal information. In addition, we propose to fuse 3D skeleton geometry with deep static appearance. We validate our approach on the publicly available CAD60, MSRDailyActivity3D, and NTU-RGB+D datasets, achieving competitive performance compared to the state of the art. |
Tasks | Optical Flow Estimation, Temporal Action Localization, Video Classification |
Published | 2018-02-01 |
URL | http://arxiv.org/abs/1802.00421v2 |
PDF | http://arxiv.org/pdf/1802.00421v2.pdf |
PWC | https://paperswithcode.com/paper/deep-temporal-lstm-for-daily-living-action |
Repo | |
Framework | |
Deception Detection by 2D-to-3D Face Reconstruction from Videos
Title | Deception Detection by 2D-to-3D Face Reconstruction from Videos |
Authors | Minh Ngô, Burak Mandira, Selim Fırat Yılmaz, Ward Heij, Sezer Karaoglu, Henri Bouma, Hamdi Dibeklioglu, Theo Gevers |
Abstract | Lies and deception are common phenomena in society, in both our private and professional lives. However, humans are notoriously bad at detecting deception. According to the literature, the average human accuracy in distinguishing lies from truthful statements is 54%, which is only slightly better than a random guess. While people rarely worry about this in everyday life, accurate deception detection methods are highly desirable in high-stakes situations such as interrogations for serious crimes and the evaluation of testimonies in court cases. To achieve reliable, covert, and non-invasive deception detection, we propose a novel method that jointly extracts reliable low- and high-level facial features, namely 3D facial geometry, skin reflectance, expression, head pose, and scene illumination, from a video sequence. These features are then modeled using a Recurrent Neural Network to learn the temporal characteristics of deceptive and honest behavior. We evaluate the proposed method on the Real-Life Trial (RLT) dataset, which contains high-stakes deceptive and honest videos recorded in courtrooms. Our results show that the proposed method (with an accuracy of 72.8%) improves on the state of the art and outperforms the use of manually coded facial attributes (67.6%) in deception detection. |
Tasks | 3D Face Reconstruction, Deception Detection, Face Reconstruction |
Published | 2018-12-26 |
URL | http://arxiv.org/abs/1812.10558v1 |
PDF | http://arxiv.org/pdf/1812.10558v1.pdf |
PWC | https://paperswithcode.com/paper/deception-detection-by-2d-to-3d-face |
Repo | |
Framework | |
Learning Interpretable Rules for Multi-label Classification
Title | Learning Interpretable Rules for Multi-label Classification |
Authors | Eneldo Loza Mencía, Johannes Fürnkranz, Eyke Hüllermeier, Michael Rapp |
Abstract | Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area. |
Tasks | Multi-Label Classification |
Published | 2018-11-30 |
URL | http://arxiv.org/abs/1812.00050v2 |
PDF | http://arxiv.org/pdf/1812.00050v2.pdf |
PWC | https://paperswithcode.com/paper/learning-interpretable-rules-for-multi-label |
Repo | |
Framework | |
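To make the idea of an interpretable multi-label rule model concrete, here is a minimal sketch of an unordered rule set whose rule heads are sets of labels. It is not the learning algorithm from the chapter: the rules are written by hand, the toy features (`goal`, `election`, `budget`) are hypothetical, and the prediction strategy (the union of the heads of all firing rules) is one simple assumption among several possible aggregation schemes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set

Instance = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    # Human-readable form of the rule, kept for interpretability.
    description: str
    # Rule body: a condition over the instance's features.
    condition: Callable[[Instance], bool]
    # Rule head: the labels predicted as relevant when the body fires.
    head: FrozenSet[str]


def predict(rules: List[Rule], x: Instance) -> Set[str]:
    """Predict the union of the heads of all rules whose body fires on x."""
    labels: Set[str] = set()
    for rule in rules:
        if rule.condition(x):
            labels |= rule.head
    return labels


def explain(rules: List[Rule], x: Instance) -> List[str]:
    """List the rules responsible for a prediction, so an expert can audit it."""
    return [rule.description for rule in rules if rule.condition(x)]


# Hypothetical hand-written rules over word counts in a news article.
RULES = [
    Rule("goal >= 1 -> {sports}",
         lambda x: x.get("goal", 0) >= 1, frozenset({"sports"})),
    Rule("election >= 1 -> {politics}",
         lambda x: x.get("election", 0) >= 1, frozenset({"politics"})),
    Rule("budget >= 1 and election >= 1 -> {politics, economy}",
         lambda x: x.get("budget", 0) >= 1 and x.get("election", 0) >= 1,
         frozenset({"politics", "economy"})),
]
```

For an article with `{"election": 2, "budget": 1}`, `predict` returns `{"politics", "economy"}` and `explain` lists the two rules that fired. A rule whose head contains several labels, like the third one, can express a dependency between labels directly.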
Leveraging human knowledge in tabular reinforcement learning: A study of human subjects
Title | Leveraging human knowledge in tabular reinforcement learning: A study of human subjects |
Authors | Ariel Rosenfeld, Moshe Cohen, Matthew E. Taylor, Sarit Kraus |
Abstract | Reinforcement Learning (RL) can be extremely effective in solving complex, real-world problems. However, injecting human knowledge into an RL agent may require extensive effort and expertise on the human designer’s part. To date, human factors are generally not considered in the development and evaluation of possible RL approaches. In this article, we set out to investigate how different methods for injecting human knowledge are applied, in practice, by human designers of varying levels of knowledge and skill. We perform the first empirical evaluation of several methods, including a newly proposed method named SASS, which is based on the notion of similarities in the agent’s state-action space. Through this human study, which involved 51 human participants, we shed new light on the human factors that play a key role in RL. We find that the classical reward shaping technique seems to be the most natural method for most designers, both expert and non-expert, to speed up RL. However, we further find that our proposed method SASS can be effectively and efficiently combined with reward shaping, providing a beneficial alternative to using only a single speedup method, with minimal overhead in human designer effort. |
Tasks | |
Published | 2018-05-15 |
URL | http://arxiv.org/abs/1805.05769v1 |
PDF | http://arxiv.org/pdf/1805.05769v1.pdf |
PWC | https://paperswithcode.com/paper/leveraging-human-knowledge-in-tabular |
Repo | |
Framework | |
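Reward shaping, the technique the study found most natural for designers, can be illustrated with tabular Q-learning. The sketch below uses the standard potential-based form, in which the term γΦ(s′) − Φ(s) is added to every reward. It is an illustration, not the study's setup or its SASS method: the one-dimensional chain task, the potential function, and all hyperparameters are assumptions chosen for a small self-contained example.

```python
import random
from typing import Callable, List


def q_learning_chain(
    n_states: int,
    episodes: int,
    potential: Callable[[int], float],
    alpha: float = 0.5,
    gamma: float = 0.95,
    epsilon: float = 0.1,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular Q-learning on a 1-D chain with potential-based reward shaping.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Reaching the last state gives reward 1 and ends the episode. The shaping
    term gamma * potential(s') - potential(s) is added to each reward, with
    the potential of the terminal state fixed at 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = max(0, s - 1) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            phi_next = 0.0 if s_next == goal else potential(s_next)
            shaped = reward + gamma * phi_next - potential(s)
            # No bootstrapping from the terminal state.
            target = shaped if s_next == goal else shaped + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q
```

Running `q_learning_chain(6, 500, lambda s: 0.0)` (no shaping) and `q_learning_chain(6, 500, lambda s: 0.5 * s / 5)` (a hypothetical "closer to the goal is better" potential) should both yield a greedy policy that always moves right. Potential-based shaping is designed to leave the optimal policy unchanged while giving the agent an earlier hint about progress, which is the kind of human knowledge the study's participants supplied.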
Deep learning for determining a near-optimal topological design without any iteration
Title | Deep learning for determining a near-optimal topological design without any iteration |
Authors | Yonggyun Yu, Taeil Hur, Jaeho Jung, In Gwun Jang |
Abstract | In this study, we propose a novel deep learning-based method to predict an optimized structure for a given boundary condition and optimization setting without using any iterative scheme. To this end, datasets of optimized structures, paired with the corresponding boundary conditions and optimization settings, are first generated with open-source topology optimization code at low (32 × 32) and high (128 × 128) resolutions. To construct the artificial neural network for the proposed method, a convolutional neural network (CNN)-based encoder-decoder network is trained on the training dataset generated at low resolution. Then, for a two-stage refinement, a conditional generative adversarial network (cGAN) is trained on the optimized structures paired at low and high resolutions and is connected to the trained CNN-based encoder-decoder network. The performance evaluation results of the integrated network demonstrate that the proposed method can determine a near-optimal structure, in terms of both pixel values and compliance, with negligible computational time. |
Tasks | |
Published | 2018-01-13 |
URL | http://arxiv.org/abs/1801.05463v3 |
PDF | http://arxiv.org/pdf/1801.05463v3.pdf |
PWC | https://paperswithcode.com/paper/deep-learning-for-determining-a-near-optimal |
Repo | |
Framework | |
MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means
Title | MONK – Outlier-Robust Mean Embedding Estimation by Median-of-Means |
Authors | Matthieu Lerasle, Zoltan Szabo, Timothee Mathieu, Guillaume Lecue |
Abstract | Mean embeddings provide an extremely flexible and powerful tool in machine learning and statistics to represent probability distributions and define a semi-metric (MMD, maximum mean discrepancy; also called N-distance or energy distance), with numerous successful applications. The representation is constructed as the expectation of the feature map defined by a kernel. Being a mean, however, its classical empirical estimator can be arbitrarily severely affected by even a single outlier in the case of unbounded features. To the best of our knowledge, even the consistency of the few existing techniques that try to alleviate this serious sensitivity bottleneck is unknown. In this paper, we show how the recently emerged principle of median-of-means can be used to design estimators for kernel mean embedding and MMD with strong resistance to outliers, and optimal sub-Gaussian deviation bounds under mild assumptions. |
Tasks | |
Published | 2018-02-13 |
URL | https://arxiv.org/abs/1802.04784v4 |
PDF | https://arxiv.org/pdf/1802.04784v4.pdf |
PWC | https://paperswithcode.com/paper/monk-outlier-robust-mean-embedding-estimation |
Repo | |
Framework | |
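The median-of-means principle that MONK builds on is easiest to see for a scalar mean. The sketch below is not MONK itself: MONK applies the principle to kernel mean embeddings and MMD, whereas this example averages plain numbers. The blocks here are contiguous slices for reproducibility (random partitions are the usual choice), and leftover samples that do not fill a whole block are dropped. The helper name `median_of_means` is hypothetical.

```python
import statistics
from typing import Sequence


def median_of_means(samples: Sequence[float], n_blocks: int) -> float:
    """Split samples into n_blocks equal blocks, average each, return the median.

    A single outlier can corrupt at most one block mean, so as long as fewer
    than half of the blocks are contaminated, the median stays close to the
    bulk of the data.
    """
    if not 1 <= n_blocks <= len(samples):
        raise ValueError("n_blocks must be between 1 and len(samples)")
    size = len(samples) // n_blocks
    block_means = []
    for b in range(n_blocks):
        block = samples[b * size:(b + 1) * size]
        block_means.append(sum(block) / len(block))
    return statistics.median(block_means)
```

With 99 samples equal to `1.0` and one outlier of `1e6`, the empirical mean is about `10000.99`, while `median_of_means(data, 5)` returns `1.0`: the outlier inflates only the last block's mean (to about `50000.95`), and the median ignores it.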
A New Channel Boosted Convolutional Neural Network using Transfer Learning
Title | A New Channel Boosted Convolutional Neural Network using Transfer Learning |
Authors | Asifullah Khan, Anabia Sohail, Amna Ali |
Abstract | We present a novel architectural enhancement of Channel Boosting in deep convolutional neural networks (CNNs). This idea of Channel Boosting exploits both the channel dimension of CNNs (learning from multiple input channels) and Transfer Learning (TL). TL is utilized at two different stages: channel generation and channel exploitation. In the proposed methodology, a deep CNN is boosted by various channels available through TL from already trained Deep Neural Networks, in addition to its own original channel. The deep architecture of the CNN then exploits the original and boosted channels downstream to learn discriminative patterns. Churn prediction in telecom is a challenging task due to the high dimensionality and imbalanced nature of the data, and it is therefore used to evaluate the performance of the proposed Channel Boosted CNN (CB CNN). In the first phase, discriminative features are extracted using a stacked autoencoder; in the second phase, these features are combined with the original features to form Channel Boosted images. Finally, the knowledge gained by a pre-trained CNN is exploited by employing TL. The results are promising and show the ability of the Channel Boosting concept to learn complex classification problems by discerning even minute differences between churners and non-churners. The proposed work validates the concept, observed in the evolution of recent CNN architectures, that innovative restructuring of a CNN architecture may increase the representational capacity of the network. |
Tasks | Transfer Learning |
Published | 2018-04-23 |
URL | http://arxiv.org/abs/1804.08528v4 |
PDF | http://arxiv.org/pdf/1804.08528v4.pdf |
PWC | https://paperswithcode.com/paper/a-new-channel-boosted-convolutional-neural |
Repo | |
Framework | |
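The step "features are combined with the original features to form Channel Boosted images" can be pictured as stacking several feature vectors as separate image channels. The sketch below is an assumption-laden illustration, not the paper's construction: the autoencoder is omitted (its codes are passed in as `auxiliary_feature_sets`), and zero-padding each vector to `side * side` values before a row-major reshape is a hypothetical layout choice. Both function names are hypothetical.

```python
from typing import List, Sequence

Channel = List[List[float]]


def to_square_channel(features: Sequence[float], side: int) -> Channel:
    """Zero-pad a feature vector to side*side values and reshape it row-major."""
    if len(features) > side * side:
        raise ValueError("feature vector does not fit into a side x side channel")
    padded = [float(v) for v in features] + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def channel_boost(
    original: Sequence[float],
    auxiliary_feature_sets: Sequence[Sequence[float]],
    side: int,
) -> List[Channel]:
    """Stack the original features and each auxiliary feature set as channels.

    The result is indexed [channel][row][col]; channel 0 holds the original
    features and the remaining channels hold, e.g., autoencoder codes.
    """
    channels = [to_square_channel(original, side)]
    channels += [to_square_channel(f, side) for f in auxiliary_feature_sets]
    return channels
```

For instance, `channel_boost([1, 2, 3, 4, 5], [[0.1, 0.2]], 3)` yields two 3 × 3 channels; the first is `[[1.0, 2.0, 3.0], [4.0, 5.0, 0.0], [0.0, 0.0, 0.0]]`. A CNN (possibly pre-trained, as in the abstract) can then consume this multi-channel "image".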
Unsupervised Learning of Face Representations
Title | Unsupervised Learning of Face Representations |
Authors | Samyak Datta, Gaurav Sharma, C. V. Jawahar |
Abstract | We present an approach for unsupervised training of CNNs in order to learn discriminative face representations. We mine supervised training data by noting that multiple faces in the same video frame must belong to different persons and that the same face tracked across multiple frames must belong to the same person. We obtain millions of face pairs from hundreds of videos without using any manual supervision. Although faces extracted from videos have a lower spatial resolution than those available in standard supervised face datasets such as LFW and CASIA-WebFace, the former represent a much more realistic setting, e.g., surveillance scenarios in which most detected faces are very small. We train our CNNs with the relatively low-resolution faces extracted from the collected video frames and achieve a higher verification accuracy on the benchmark LFW dataset than hand-crafted features such as LBPs, even surpassing the performance of state-of-the-art deep networks such as VGG-Face when they are made to work with low-resolution input images. |
Tasks | |
Published | 2018-03-03 |
URL | http://arxiv.org/abs/1803.01260v1 |
PDF | http://arxiv.org/pdf/1803.01260v1.pdf |
PWC | https://paperswithcode.com/paper/unsupervised-learning-of-face-representations |
Repo | |
Framework | |
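The two mining rules in the abstract (faces in the same frame are different people; faces on the same track are the same person) translate directly into a pair-mining routine. The sketch below assumes face detection and tracking have already been run, so that each detection carries a frame index and a track id; the `Detection` record and `mine_pairs` function are hypothetical names.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple

Pair = Tuple[str, str]


class Detection(NamedTuple):
    face_id: str  # unique id of this face crop
    frame: int    # index of the video frame it was detected in
    track: int    # id of the tracker trajectory it belongs to


def mine_pairs(detections: List[Detection]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (positive, negative) face pairs mined without manual labels.

    Positive: same track, different frames -> same person.
    Negative: same frame, different tracks -> different persons.
    Pairs matching neither rule are left unlabeled.
    """
    positives: Set[Pair] = set()
    negatives: Set[Pair] = set()
    for a, b in combinations(detections, 2):
        pair = tuple(sorted((a.face_id, b.face_id)))
        if a.track == b.track and a.frame != b.frame:
            positives.add(pair)
        elif a.frame == b.frame and a.track != b.track:
            negatives.add(pair)
    return positives, negatives
```

With two tracks visible in two frames, faces `a` and `c` (track 1) and `b` and `d` (track 2) form the positives, while `(a, b)` and `(c, d)` form the negatives. A natural extension follows from the same logic: two tracks that ever co-occur in one frame belong to different people, so every cross-track pair between them (such as `(a, d)`) could also be used as a negative.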
Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks
Title | Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks |
Authors | Ambra Demontis, Marco Melis, Maura Pintor, Matthew Jagielski, Battista Biggio, Alina Oprea, Cristina Nita-Rotaru, Fabio Roli |
Abstract | Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed at investigating the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for evasion and poisoning attacks, and a formal definition of transferability of such attacks. We highlight two main factors contributing to attack transferability: the intrinsic adversarial vulnerability of the target model, and the complexity of the surrogate model used to optimize the attack. Based on these insights, we define three metrics that capture the factors affecting an attack’s transferability. Interestingly, our results derived from theoretical analysis hold for both evasion and poisoning attacks, and are confirmed experimentally using a wide range of linear and non-linear classifiers and datasets. |
Tasks | |
Published | 2018-09-08 |
URL | https://arxiv.org/abs/1809.02861v4 |
PDF | https://arxiv.org/pdf/1809.02861v4.pdf |
PWC | https://paperswithcode.com/paper/why-do-adversarial-attacks-transfer |
Repo | |
Framework | |
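For linear classifiers, the effect of a transferred evasion attack can be computed exactly, which makes the role of the surrogate and target models visible. In the sketch below, an attacker takes one L2 step of size `eps` against the gradient of a surrogate score f(x) = w·x + b. The target score then drops by `eps * ||w_target|| * cos(w_surrogate, w_target)`: the drop grows with the size of the target's input gradient (its intrinsic vulnerability) and with how well the two gradients align. This is an illustrative calculation under a linear-model assumption, not the paper's definitions of its three metrics; all function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def score(w: Vector, b: float, x: Vector) -> float:
    """Linear decision function f(x) = w.x + b; its input gradient is w."""
    return dot(w, x) + b


def gradient_alignment(w_surrogate: Vector, w_target: Vector) -> float:
    """Cosine similarity between the input gradients of two linear models."""
    return dot(w_surrogate, w_target) / (norm(w_surrogate) * norm(w_target))


def evade(x: Vector, w_surrogate: Vector, eps: float) -> Vector:
    """One L2 step of size eps that lowers the surrogate's score."""
    n = norm(w_surrogate)
    return [xi - eps * wi / n for xi, wi in zip(x, w_surrogate)]
```

With a target `w_t = [1.0, 0.0]` and input `x = [1.0, 0.0]`, an attack crafted on the nearly aligned surrogate `[1.0, 0.1]` lowers the target score from 1.0 to about 0.50, whereas an attack crafted on the orthogonal surrogate `[0.0, 1.0]` leaves it at exactly 1.0, that is, it does not transfer at all.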
Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning
Title | Rearrangement with Nonprehensile Manipulation Using Deep Reinforcement Learning |
Authors | Weihao Yuan, Johannes A. Stork, Danica Kragic, Michael Y. Wang, Kaiyu Hang |
Abstract | Rearranging objects on a tabletop surface by means of nonprehensile manipulation is a task that requires skillful interaction with the physical world. Usually, this is achieved by precisely modeling the physical properties of the objects, the robot, and the environment for explicit planning. In contrast, because explicitly modeling the physical environment is not always feasible and involves various uncertainties, we learn a nonprehensile rearrangement strategy with deep reinforcement learning based only on visual feedback. For this, we model the task with rewards and train a deep Q-network. Our potential field-based heuristic exploration strategy reduces the number of collisions, which lead to suboptimal outcomes, and we actively balance the training set to avoid bias towards poor examples. Our training process leads to faster learning and better task performance compared to uniform exploration and standard experience replay. We provide empirical evidence from simulation that our method achieves a success rate of 85%, show that our system can cope with sudden changes in the environment, and compare its performance with human-level performance. |
Tasks | |
Published | 2018-03-15 |
URL | http://arxiv.org/abs/1803.05752v1 |
PDF | http://arxiv.org/pdf/1803.05752v1.pdf |
PWC | https://paperswithcode.com/paper/rearrangement-with-nonprehensile-manipulation |
Repo | |
Framework | |
Generating Realistic Training Images Based on Tonality-Alignment Generative Adversarial Networks for Hand Pose Estimation
Title | Generating Realistic Training Images Based on Tonality-Alignment Generative Adversarial Networks for Hand Pose Estimation |
Authors | Liangjian Chen, Shih-Yao Lin, Yusheng Xie, Hui Tang, Yufan Xue, Xiaohui Xie, Yen-Yu Lin, Wei Fan |
Abstract | Hand pose estimation from a monocular RGB image is an important but challenging task. The main factor limiting its performance is the lack of a sufficiently large training dataset with accurate hand-keypoint annotations. In this work, we circumvent this problem by proposing an effective method for generating realistic hand poses and show that state-of-the-art algorithms for hand pose estimation can be greatly improved by using the generated hand poses as training data. Specifically, we first adopt an augmented reality (AR) simulator to synthesize hand poses with accurate hand-keypoint labels. Although the synthetic hand poses come with precise joint labels, eliminating the need for manual annotation, they look unnatural and are not ideal training data. To produce more realistic hand poses, we propose to blend a synthetic hand pose with a real background, such as arms and sleeves. To this end, we develop tonality-alignment generative adversarial networks (TAGANs), which align the tonality and color distributions between synthetic hand poses and real backgrounds and can generate high-quality hand poses. We evaluate TAGAN on three benchmarks: the RHP, STB, and CMU-PS hand pose datasets. With the aid of the synthesized poses, our method performs favorably against state-of-the-art methods in both 2D and 3D hand pose estimation. |
Tasks | Hand Pose Estimation, Pose Estimation |
Published | 2018-11-25 |
URL | http://arxiv.org/abs/1811.09916v3 |
PDF | http://arxiv.org/pdf/1811.09916v3.pdf |
PWC | https://paperswithcode.com/paper/generating-realistic-training-images-based-on |
Repo | |
Framework | |
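TAGAN learns to align tonality and color distributions with an adversarial network. A much simpler, non-learned baseline for the same goal is to match the first two moments of each color channel. The sketch below substitutes per-channel mean and standard-deviation matching for the GAN. It is not TAGAN, and it only illustrates what "aligning color distributions" means. Images are assumed to be given as flat per-channel pixel lists, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence

Channel = List[float]


def match_channel_stats(source: Sequence[float], reference: Sequence[float]) -> Channel:
    """Shift and scale source so its mean and population std match reference."""
    mu_s, mu_r = statistics.fmean(source), statistics.fmean(reference)
    sd_s, sd_r = statistics.pstdev(source), statistics.pstdev(reference)
    if sd_s == 0.0:
        # A constant channel carries no contrast; map it to the reference mean.
        return [mu_r] * len(source)
    return [(v - mu_s) * sd_r / sd_s + mu_r for v in source]


def align_tonality(
    hand_channels: Sequence[Sequence[float]],
    background_channels: Sequence[Sequence[float]],
) -> List[Channel]:
    """Match each color channel of the synthetic hand to the real background."""
    return [match_channel_stats(h, b) for h, b in zip(hand_channels, background_channels)]
```

After alignment, each channel of the synthetic hand has the same mean and spread as the corresponding background channel, so the pasted hand no longer stands out by overall brightness or color cast. A learned model such as TAGAN can capture richer, spatially varying differences that two global moments cannot.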
Crowd ideation of supervised learning problems
Title | Crowd ideation of supervised learning problems |
Authors | James P. Bagrow |
Abstract | Crowdsourcing is an important avenue for collecting machine learning data, but crowdsourcing can go beyond simple data collection by employing the creativity and wisdom of crowd workers. Yet crowd participants are unlikely to be experts in statistics or predictive modeling, and it is not clear how well non-experts can contribute creatively to the process of machine learning. Here we study an end-to-end crowdsourcing algorithm where groups of non-expert workers propose supervised learning problems, rank and categorize those problems, and then provide data to train predictive models on those problems. Problem proposal includes and extends feature engineering because workers propose the entire problem, not only the input features but also the target variable. We show that workers without machine learning experience can collectively construct useful datasets and that predictive models can be learned on these datasets. In our experiments, the problems proposed by workers covered a broad range of topics, from politics and current events to problems capturing health behavior, demographics, and more. Workers also favored questions showing positively correlated relationships, which has interesting implications given that many supervised learning methods perform equally well with strong negative correlations. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of problems proposed by workers. In general, shifting the focus of machine learning tasks from designing and training individual predictive models to problem proposal allows crowdsourcers to design requirements for problems of interest and then guide workers towards contributing to the most suitable problems. |
Tasks | Feature Engineering |
Published | 2018-02-14 |
URL | http://arxiv.org/abs/1802.05101v1 |
PDF | http://arxiv.org/pdf/1802.05101v1.pdf |
PWC | https://paperswithcode.com/paper/crowd-ideation-of-supervised-learning |
Repo | |
Framework | |
Exp-Concavity of Proper Composite Losses
Title | Exp-Concavity of Proper Composite Losses |
Authors | Parameswaran Kamalaruban, Robert C. Williamson, Xinhua Zhang |
Abstract | The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied and $O(\sqrt{T})$ and $O(\log{T})$ regret bounds can be achieved for convex losses (\cite{zinkevich2003online}) and strictly convex losses with bounded first and second derivatives (\cite{hazan2007logarithmic}) respectively. In special cases like the Aggregating Algorithm (\cite{vovk1995game}) with mixable losses and the Weighted Average Algorithm (\cite{kivinen1999averaging}) with exp-concave losses, it is possible to achieve $O(1)$ regret bounds. \cite{van2012exp} has argued that mixability and exp-concavity are roughly equivalent under certain conditions. Thus by understanding the underlying relationship between these two notions we can gain the best of both algorithms (strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm). In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses (\cite{van2012mixability}), we show that it is possible to transform (re-parameterize) any $\beta$-mixable binary proper loss into a $\beta$-exp-concave composite loss with the same $\beta$. In the multi-class case, we propose an approximation approach for this transformation. |
Tasks | |
Published | 2018-05-20 |
URL | http://arxiv.org/abs/1805.07737v1 |
PDF | http://arxiv.org/pdf/1805.07737v1.pdf |
PWC | https://paperswithcode.com/paper/exp-concavity-of-proper-composite-losses |
Repo | |
Framework | |
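The $O(1)$ regret of the Weighted Average Algorithm on exp-concave losses, mentioned in the abstract, can be checked numerically. A loss is $\eta$-exp-concave if $p \mapsto e^{-\eta \ell(p, y)}$ is concave for every outcome $y$. The binary log loss is 1-exp-concave because $e^{-\ell(p,y)}$ equals the probability assigned to $y$, which is linear in $p$. Applying Jensen's inequality round by round then bounds the regret against the best of $N$ experts by $\ln N / \eta$, independent of the horizon $T$. The sketch below implements the exponentially weighted average forecaster for log loss. The expert predictions and outcome sequence are arbitrary illustrative choices, and the paper's characterization and re-parameterization of proper composite losses are not implemented.

```python
import math
from typing import Sequence, Tuple


def log_loss(p: float, y: int) -> float:
    """Binary log loss of predicting probability p for outcome 1 (1-exp-concave)."""
    return -math.log(p if y == 1 else 1.0 - p)


def weighted_average_forecaster(
    expert_preds: Sequence[Sequence[float]],
    outcomes: Sequence[int],
    eta: float = 1.0,
) -> Tuple[float, float]:
    """Exponentially weighted average of expert probabilities under log loss.

    expert_preds[t][i] is expert i's probability of outcome 1 at round t and
    must lie strictly between 0 and 1. Returns (learner_loss, best_expert_loss).
    """
    n_experts = len(expert_preds[0])
    cumulative = [0.0] * n_experts
    learner_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # Weights proportional to exp(-eta * cumulative loss); subtracting the
        # minimum loss first avoids underflow without changing the weights.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        p_hat = sum(w * p for w, p in zip(weights, preds)) / total
        learner_loss += log_loss(p_hat, y)
        for i, p in enumerate(preds):
            cumulative[i] += log_loss(p, y)
    return learner_loss, min(cumulative)
```

With four constant experts predicting 0.2, 0.5, 0.8, and 0.9 over 200 rounds, the learner's regret `learner_loss - best_expert_loss` stays between 0 and $\ln 4 \approx 1.386$. The same bound would hold over 2,000 or 20,000 rounds, which is what "$O(1)$ regret" means here.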