Paper Group ANR 15
Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos
Title | Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos |
Authors | Hossein Hosseini, Baicen Xiao, Radha Poovendran |
Abstract | Despite the rapid progress of the techniques for image classification, video annotation has remained a challenging task. Automated video annotation would be a breakthrough technology, enabling users to search within the videos. Recently, Google introduced the Cloud Video Intelligence API for video analysis. As per the website, the system can be used to “separate signal from noise, by retrieving relevant information at the video, shot or per frame” level. A demonstration website has also been launched, which allows anyone to select a video for annotation. The API then detects the video labels (objects within the video) as well as shot labels (description of the video events over time). In this paper, we examine the usability of Google’s Cloud Video Intelligence API in adversarial environments. In particular, we investigate whether an adversary can subtly manipulate a video in such a way that the API will return only the adversary-desired labels. For this, we select an image, which is different from the video content, and insert it, periodically and at a very low rate, into the video. We found that if we insert one image every two seconds, the API is deceived into annotating the video as if it only contained the inserted image. Note that the modification to the video is hardly noticeable as, for instance, for a typical frame rate of 25, we insert only one image per 50 video frames. We also found that, by inserting one image per second, all the shot labels returned by the API are related to the inserted image. We perform the experiments on the sample videos provided by the API demonstration website and show that our attack is successful with different videos and images. |
Tasks | Image Classification |
Published | 2017-03-26 |
URL | http://arxiv.org/abs/1703.09793v2 |
http://arxiv.org/pdf/1703.09793v2.pdf | |
PWC | https://paperswithcode.com/paper/deceiving-googles-cloud-video-intelligence |
Repo | |
Framework | |
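The insertion schedule described in the abstract (one image every two seconds, i.e. one image per 50 frames at 25 fps) can be illustrated with a small sketch. This is only an illustration of the arithmetic, not the authors' code: the function name `insert_image_periodically`, the use of a plain list of frames, and the choice to place the image after each full period are assumptions made here.

```python
def insert_image_periodically(frames, image, fps=25, period_seconds=2.0):
    """Return a new frame list with `image` added after every full period.

    `frames` is any sequence of frame objects; `image` is the frame to insert.
    Both are placeholders here (hypothetical), e.g. strings or arrays.
    """
    step = int(round(fps * period_seconds))  # frames per period, 50 by default
    if step < 1:
        raise ValueError("fps * period_seconds must cover at least one frame")
    result = []
    for count, frame in enumerate(frames, start=1):
        result.append(frame)
        if count % step == 0:  # a full period of original frames has passed
            result.append(image)
    return result


video = [f"frame_{i}" for i in range(100)]  # 4 seconds of video at 25 fps
attacked = insert_image_periodically(video, "ADV")
positions = [i for i, frame in enumerate(attacked) if frame == "ADV"]
print(len(attacked), positions)
```

With the defaults, a 100-frame clip gains two inserted images, which matches the "one image per 50 video frames" rate quoted in the abstract.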
On Estimation of Conditional Modes Using Multiple Quantile Regressions
Title | On Estimation of Conditional Modes Using Multiple Quantile Regressions |
Authors | Hirofumi Ohta, Satoshi Hara |
Abstract | We propose an estimation method for the conditional mode when the conditioning variable is high-dimensional. In the proposed method, we first estimate the conditional density by solving quantile regressions multiple times. We then estimate the conditional mode by finding the maximum of the estimated conditional density. The proposed method has two advantages in that it is computationally stable because it has no initial parameter dependencies, and it is statistically efficient with a fast convergence rate. Synthetic and real-world data experiments demonstrate the better performance of the proposed method compared to other existing ones. |
Tasks | |
Published | 2017-12-23 |
URL | http://arxiv.org/abs/1712.08754v1 |
http://arxiv.org/pdf/1712.08754v1.pdf | |
PWC | https://paperswithcode.com/paper/on-estimation-of-conditional-modes-using |
Repo | |
Framework | |
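The two-step idea in the abstract (estimate the conditional density from several quantile regressions, then take its maximum) can be sketched using the standard identity that the density at a quantile is the reciprocal of the slope of the quantile function. The sketch below assumes the conditional quantiles at a fixed covariate value have already been estimated; the finite-difference density and the name `mode_from_quantiles` are illustrative assumptions, not necessarily the estimator used in the paper.

```python
def mode_from_quantiles(levels, quantiles):
    """Estimate a mode from quantile estimates at increasing levels.

    levels:    increasing quantile levels, e.g. [0.1, 0.3, 0.5, 0.7, 0.9]
    quantiles: estimated conditional quantiles at those levels (fixed x)
    Returns (mode_estimate, density_estimate).
    """
    best_density, best_mode = -1.0, None
    for k in range(len(levels) - 1):
        width = quantiles[k + 1] - quantiles[k]
        if width <= 0:
            continue  # skip tied or crossing quantile estimates
        # Probability mass between two quantiles divided by interval width.
        density = (levels[k + 1] - levels[k]) / width
        if density > best_density:
            best_density = density
            best_mode = 0.5 * (quantiles[k] + quantiles[k + 1])
    if best_mode is None:
        raise ValueError("need at least two strictly increasing quantiles")
    return best_mode, best_density


levels = [0.1, 0.3, 0.5, 0.7, 0.9]
quantiles = [1.0, 2.0, 2.4, 3.0, 5.0]  # made-up values for illustration
mode, density = mode_from_quantiles(levels, quantiles)
print(mode, density)
```

The narrowest gap between adjacent quantiles (here between 2.0 and 2.4) marks the region of highest estimated density, so the mode estimate falls at its midpoint.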
Embedded Spectral Descriptors: Learning the point-wise correspondence metric via Siamese neural networks
Title | Embedded Spectral Descriptors: Learning the point-wise correspondence metric via Siamese neural networks |
Authors | Zhiyu Sun, Yusen He, Andrey Gritsenko, Amaury Lendasse, Stephen Baek |
Abstract | A robust and informative local shape descriptor plays an important role in mesh registration. In this regard, spectral descriptors that are based on the spectrum of the Laplace-Beltrami operator have been a popular subject of research for the last decade due to their advantageous properties, such as isometry invariance. Despite these properties, spectral descriptors often fail to give a correct similarity measure for non-isometric cases where the metric distortion between the models is large. Hence, they are not reliable for correspondence matching problems when the models are not isometric. In this paper, we propose a method to improve the similarity metric of spectral descriptors for correspondence matching problems. We embed a spectral shape descriptor into a different metric space where the Euclidean distance between the elements directly indicates the geometric dissimilarity. We design and train a Siamese neural network to find such an embedding, where the embedded descriptors are promoted to rearrange based on the geometric similarity. We demonstrate our approach can significantly enhance the performance of the conventional spectral descriptors by the simple augmentation achieved via the Siamese neural network in comparison to other state-of-the-art methods. |
Tasks | |
Published | 2017-10-17 |
URL | https://arxiv.org/abs/1710.06368v3 |
https://arxiv.org/pdf/1710.06368v3.pdf | |
PWC | https://paperswithcode.com/paper/deep-spectral-descriptors-learning-the-point |
Repo | |
Framework | |
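A Siamese network of the kind described is commonly trained with a contrastive loss on the Euclidean distance between the two embedded descriptors. The abstract does not state which loss the authors use, so the following is a sketch of the standard contrastive loss only; the function name and the margin value are assumptions.

```python
import math


def contrastive_loss(embedding_a, embedding_b, is_match, margin=1.0):
    """Standard contrastive loss for one pair of embedded descriptors.

    Matching pairs are pulled together (squared distance); non-matching
    pairs are pushed apart until they are at least `margin` away.
    """
    distance = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(embedding_a, embedding_b))
    )
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


point_a = [0.0, 0.0]
point_b = [3.0, 4.0]  # Euclidean distance 5 from point_a
print(contrastive_loss(point_a, point_b, is_match=True))
print(contrastive_loss(point_a, point_b, is_match=False, margin=6.0))
print(contrastive_loss(point_a, point_b, is_match=False, margin=4.0))
```

A non-matching pair already farther apart than the margin contributes no loss, which is what lets the embedding rearrange descriptors by geometric similarity rather than collapse them.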
MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks
Title | MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks |
Authors | Ji Young Lee, Franck Dernoncourt, Peter Szolovits |
Abstract | Over 50 million scholarly articles have been published: they constitute a unique repository of knowledge. In particular, one may infer from them relations between scientific concepts, such as synonyms and hyponyms. Artificial neural networks have been recently explored for relation extraction. In this work, we continue this line of work and present a system based on a convolutional neural network to extract relations. Our model ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C). |
Tasks | Relation Extraction |
Published | 2017-04-05 |
URL | http://arxiv.org/abs/1704.01523v1 |
http://arxiv.org/pdf/1704.01523v1.pdf | |
PWC | https://paperswithcode.com/paper/mit-at-semeval-2017-task-10-relation |
Repo | |
Framework | |
CNN as Guided Multi-layer RECOS Transform
Title | CNN as Guided Multi-layer RECOS Transform |
Authors | C. -C. Jay Kuo |
Abstract | There is a resurging interest in developing a neural-network-based solution to the supervised machine learning problem. The convolutional neural network (CNN) will be studied in this note. To begin with, we introduce a RECOS transform as a basic building block of CNNs. The “RECOS” is an acronym for “REctified-COrrelations on a Sphere”. It consists of two main concepts: 1) data clustering on a sphere and 2) rectification. Afterwards, we interpret a CNN as a network that implements the guided multi-layer RECOS transform with three highlights. First, we compare the traditional single-layer and modern multi-layer signal analysis approaches, point out key ingredients that enable the multi-layer approach, and provide a full explanation to the operating principle of CNNs. Second, we discuss how guidance is provided by labels through backpropagation (BP) in the training. Third, we show that a trained network can be greatly simplified in the testing stage demanding only one-bit representation for both filter weights and inputs. |
Tasks | |
Published | 2017-01-30 |
URL | http://arxiv.org/abs/1701.08481v3 |
http://arxiv.org/pdf/1701.08481v3.pdf | |
PWC | https://paperswithcode.com/paper/cnn-as-guided-multi-layer-recos-transform |
Repo | |
Framework | |
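The two ingredients named in the abstract, correlation on a sphere and rectification, can be sketched in a few lines. The sketch assumes that both the input and the filter ("anchor") vectors are scaled to unit length before correlating; the function names are illustrative and this is not taken from the paper's own code.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vector]


def recos_transform(x, anchors):
    """Rectified correlations between x and each anchor vector."""
    unit_x = normalize(x)
    outputs = []
    for anchor in anchors:
        unit_anchor = normalize(anchor)
        correlation = sum(p * q for p, q in zip(unit_x, unit_anchor))
        outputs.append(max(0.0, correlation))  # rectification (ReLU)
    return outputs


responses = recos_transform([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
print(responses)
```

The third anchor points away from the input, so its negative correlation is clipped to zero; only anchors on the same side of the sphere as the input respond.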
Recurrent neural networks based Indic word-wise script identification using character-wise training
Title | Recurrent neural networks based Indic word-wise script identification using character-wise training |
Authors | Rohun Tripathi, Aman Gill, Riccha Tripati |
Abstract | This paper presents a novel methodology of Indic handwritten script recognition using Recurrent Neural Networks and addresses the problem of script recognition in poor data scenarios, such as when only character level online data is available. It is based on the hypothesis that curves of online character data comprise sufficient information for prediction at the word level. Online character data is used to train RNNs using BLSTM architecture which are then used to make predictions of online word level data. These prediction results on the test set are on par with prediction results of models trained with online word data, while the training of the character level model is much less data intensive and takes much less time. Performance for binary-script models and then 5 Indic script models is reported, along with a comparison with HMM models. The system is extended for offline data prediction. Raw offline data lacks the temporal information available in online data and required for prediction using models trained with online data. To overcome this, stroke recovery is implemented and the strokes are utilized for predicting using the online character level models. The performance on character and word level offline data is reported. |
Tasks | |
Published | 2017-09-11 |
URL | http://arxiv.org/abs/1709.03209v2 |
http://arxiv.org/pdf/1709.03209v2.pdf | |
PWC | https://paperswithcode.com/paper/recurrent-neural-networks-based-indic-word |
Repo | |
Framework | |
Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning
Title | Structure-Preserving Image Super-resolution via Contextualized Multi-task Learning |
Authors | Yukai Shi, Keze Wang, Chongyu Chen, Li Xu, Liang Lin |
Abstract | Single image super resolution (SR), which refers to reconstructing a higher-resolution (HR) image from the observed low-resolution (LR) image, has received substantial attention due to its tremendous application potential. Despite the breakthroughs of recently proposed SR methods using convolutional neural networks (CNNs), their generated results usually fail to preserve structural (high-frequency) details. In this paper, regarding global boundary context and residual context as complementary information for enhancing structural details in image restoration, we develop a contextualized multi-task learning framework to address the SR problem. Specifically, our method first extracts convolutional features from the input LR image and applies one deconvolutional module to interpolate the LR feature maps in a content-adaptive way. Then, the resulting feature maps are fed into two branched sub-networks. During the neural network training, one sub-network outputs salient image boundaries and the HR image, and the other sub-network outputs the local residual map, i.e., the residual difference between the generated HR image and ground-truth image. On several standard benchmarks (i.e., Set5, Set14 and BSD200), our extensive evaluations demonstrate the effectiveness of our SR method in achieving both higher restoration quality and computational efficiency compared with several state-of-the-art SR approaches. The source code and some SR results can be found at: http://hcp.sysu.edu.cn/structure-preserving-image-super-resolution/ |
Tasks | Image Restoration, Image Super-Resolution, Multi-Task Learning, Super-Resolution |
Published | 2017-07-26 |
URL | http://arxiv.org/abs/1707.08340v1 |
http://arxiv.org/pdf/1707.08340v1.pdf | |
PWC | https://paperswithcode.com/paper/structure-preserving-image-super-resolution |
Repo | |
Framework | |
Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation
Title | Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation |
Authors | Kuan-Lun Tseng, Yen-Liang Lin, Winston Hsu, Chung-Yang Huang |
Abstract | Deep learning models such as convolutional neural networks have been widely used in 3D biomedical segmentation and achieve state-of-the-art performance. However, most of them often adopt a single modality or stack multiple modalities as different input channels. To better leverage the multi-modalities, we propose a deep encoder-decoder structure with cross-modality convolution layers to incorporate different modalities of MRI data. In addition, we exploit convolutional LSTM to model a sequence of 2D slices, and jointly learn the multi-modalities and convolutional LSTM in an end-to-end manner. To avoid converging to certain labels, we adopt a re-weighting scheme and two-phase training to handle the label imbalance. Experimental results on BRATS-2015 show that our method outperforms state-of-the-art biomedical segmentation approaches. |
Tasks | |
Published | 2017-04-25 |
URL | http://arxiv.org/abs/1704.07754v1 |
http://arxiv.org/pdf/1704.07754v1.pdf | |
PWC | https://paperswithcode.com/paper/joint-sequence-learning-and-cross-modality |
Repo | |
Framework | |
A K-fold Method for Baseline Estimation in Policy Gradient Algorithms
Title | A K-fold Method for Baseline Estimation in Policy Gradient Algorithms |
Authors | Nithyanand Kota, Abhishek Mishra, Sunil Srinivasa, Xi Chen, Pieter Abbeel |
Abstract | The high variance issue in unbiased policy-gradient methods such as VPG and REINFORCE is typically mitigated by adding a baseline. However, the baseline fitting itself suffers from the underfitting or the overfitting problem. In this paper, we develop a K-fold method for baseline estimation in policy gradient algorithms. The parameter K is the baseline estimation hyperparameter that can adjust the bias-variance trade-off in the baseline estimates. We demonstrate the usefulness of our approach via two state-of-the-art policy gradient algorithms on three MuJoCo locomotive control tasks. |
Tasks | Policy Gradient Methods |
Published | 2017-01-03 |
URL | http://arxiv.org/abs/1701.00867v1 |
http://arxiv.org/pdf/1701.00867v1.pdf | |
PWC | https://paperswithcode.com/paper/a-k-fold-method-for-baseline-estimation-in |
Repo | |
Framework | |
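The general idea of estimating a baseline on data other than the data it is applied to can be sketched as a cross-fitting loop. The abstract does not give the fitting procedure, so this sketch makes strong simplifying assumptions: the baseline is a constant (the mean return of the other folds), folds are assigned round-robin, and the name `kfold_advantages` is hypothetical.

```python
def kfold_advantages(returns, k):
    """Subtract from each return a baseline fitted on the other folds.

    returns: list of per-trajectory returns
    k:       number of folds (2 <= k <= len(returns))
    """
    n = len(returns)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= len(returns)")
    folds = [list(range(start, n, k)) for start in range(k)]  # round-robin split
    total = sum(returns)
    advantages = [0.0] * n
    for held_out in folds:
        held_sum = sum(returns[i] for i in held_out)
        # Constant baseline fitted only on trajectories outside this fold.
        baseline = (total - held_sum) / (n - len(held_out))
        for i in held_out:
            advantages[i] = returns[i] - baseline
    return advantages


print(kfold_advantages([1.0, 2.0, 3.0, 4.0], k=2))
```

Because each baseline never sees the returns it is subtracted from, it cannot overfit to them; the fold count controls how much data each baseline is fitted on, which is the bias-variance knob the abstract refers to.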
Accountability of AI Under the Law: The Role of Explanation
Title | Accountability of AI Under the Law: The Role of Explanation |
Authors | Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Kate Scott, Stuart Schieber, James Waldo, David Weinberger, Adrian Weller, Alexandra Wood |
Abstract | The ubiquity of systems using artificial intelligence or “AI” has brought increasing attention to how those systems should be regulated. The choice of how to regulate AI systems will require care. AI systems have the potential to synthesize large amounts of data, allowing for greater levels of personalization and precision than ever before; applications range from clinical decision support to autonomous driving and predictive policing. That said, there exist legitimate concerns about the intentional and unintentional negative consequences of AI systems. There are many ways to hold AI systems accountable. In this work, we focus on one: explanation. Questions about a legal right to explanation from AI systems were recently debated in the EU General Data Protection Regulation, and thus thinking carefully about when and how explanation from AI systems might improve accountability is timely. In this work, we review contexts in which explanation is currently required under the law, and then list the technical considerations that must be addressed if we desire AI systems that can provide the kinds of explanations that are currently required of humans. |
Tasks | Autonomous Driving |
Published | 2017-11-03 |
URL | https://arxiv.org/abs/1711.01134v3 |
https://arxiv.org/pdf/1711.01134v3.pdf | |
PWC | https://paperswithcode.com/paper/accountability-of-ai-under-the-law-the-role |
Repo | |
Framework | |
Navigating Occluded Intersections with Autonomous Vehicles using Deep Reinforcement Learning
Title | Navigating Occluded Intersections with Autonomous Vehicles using Deep Reinforcement Learning |
Authors | David Isele, Reza Rahimi, Akansel Cosgun, Kaushik Subramanian, Kikuo Fujimura |
Abstract | Providing an efficient strategy to navigate safely through unsignaled intersections is a difficult task that requires determining the intent of other drivers. We explore the effectiveness of Deep Reinforcement Learning to handle intersection problems. Using recent advances in Deep RL, we are able to learn policies that surpass the performance of a commonly-used heuristic approach in several metrics including task completion time and goal success rate and have limited ability to generalize. We then explore a system’s ability to learn active sensing behaviors to enable navigating safely in the case of occlusions. Our analysis provides insight into the intersection handling problem; the solutions learned by the network point out several shortcomings of current rule-based methods, and the failures of our current deep reinforcement learning system point to future research directions. |
Tasks | Autonomous Vehicles |
Published | 2017-05-02 |
URL | http://arxiv.org/abs/1705.01196v2 |
http://arxiv.org/pdf/1705.01196v2.pdf | |
PWC | https://paperswithcode.com/paper/navigating-occluded-intersections-with |
Repo | |
Framework | |
On Tensor Train Rank Minimization: Statistical Efficiency and Scalable Algorithm
Title | On Tensor Train Rank Minimization: Statistical Efficiency and Scalable Algorithm |
Authors | Masaaki Imaizumi, Takanori Maehara, Kohei Hayashi |
Abstract | Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, whose time complexity is as efficient as its space complexity. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higher-order tensor. |
Tasks | |
Published | 2017-08-01 |
URL | http://arxiv.org/abs/1708.00132v2 |
http://arxiv.org/pdf/1708.00132v2.pdf | |
PWC | https://paperswithcode.com/paper/on-tensor-train-rank-minimization-statistical |
Repo | |
Framework | |
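To make the TT representation concrete: a tensor in TT format is stored as a chain of third-order cores, and any single entry is recovered by multiplying one matrix slice per core. The sketch below shows only this entry evaluation (not the paper's rank-minimization algorithm); the nested-list layout `core[a][i][b]` and the function name `tt_entry` are assumptions made for illustration.

```python
def tt_entry(cores, index):
    """Evaluate one entry of a tensor stored in tensor train format.

    cores: list of cores; cores[k][a][i][b] has left rank index a,
           mode index i and right rank index b. The first core has
           left rank 1 and the last core has right rank 1.
    index: tuple with one mode index per core.
    """
    vector = [1.0]  # running row vector, length equals the current rank
    for core, i in zip(cores, index):
        right_rank = len(core[0][i])
        vector = [
            sum(vector[a] * core[a][i][b] for a in range(len(vector)))
            for b in range(right_rank)
        ]
    return vector[0]


# A 2 x 2 x 2 tensor with TT ranks (1, 2, 2, 1); values are made up.
core_1 = [[[1, 0], [0, 1]]]
core_2 = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
core_3 = [[[1], [2]], [[3], [4]]]
cores = [core_1, core_2, core_3]
print(tt_entry(cores, (1, 0, 1)), tt_entry(cores, (0, 1, 0)))
```

The cost of one evaluation grows with the ranks and the number of modes rather than with the total number of tensor entries, which is where the space efficiency mentioned in the abstract comes from for high-order tensors.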
Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition
Title | Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition |
Authors | Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong |
Abstract | Unsupervised domain adaptation of speech signal aims at adapting a well-trained source-domain acoustic model to the unlabeled data from the target domain. This can be achieved by adversarial training of deep neural network (DNN) acoustic models to learn an intermediate deep representation that is both senone-discriminative and domain-invariant. Specifically, the DNN is trained to jointly optimize the primary task of senone classification and the secondary task of domain classification with adversarial objective functions. In this work, instead of only focusing on learning a domain-invariant feature (i.e. the shared component between domains), we also characterize the difference between the source and target domain distributions by explicitly modeling the private component of each domain through a private component extractor DNN. The private component is trained to be orthogonal with the shared component and thus implicitly increases the degree of domain-invariance of the shared component. A reconstructor DNN is used to reconstruct the original speech feature from the private and shared components as a regularization. This domain separation framework is applied to the unsupervised environment adaptation task and achieves 11.08% relative WER reduction from the gradient reversal layer training, a representative adversarial training method, for automatic speech recognition on the CHiME-3 dataset. |
Tasks | Domain Adaptation, Robust Speech Recognition, Speech Recognition, Unsupervised Domain Adaptation |
Published | 2017-11-21 |
URL | http://arxiv.org/abs/1711.08010v2 |
http://arxiv.org/pdf/1711.08010v2.pdf | |
PWC | https://paperswithcode.com/paper/unsupervised-adaptation-with-domain |
Repo | |
Framework | |
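The orthogonality constraint between shared and private components is often implemented as a soft penalty: the squared Frobenius norm of the product of the transposed shared-feature matrix and the private-feature matrix over a batch. The abstract does not spell out the exact loss, so the sketch below shows that common formulation under this assumption; the function name `orthogonality_penalty` is hypothetical.

```python
def orthogonality_penalty(shared, private):
    """Squared Frobenius norm of shared^T @ private over a batch.

    shared:  list of per-example shared feature vectors (batch x d_s)
    private: list of per-example private feature vectors (batch x d_p)
    The penalty is zero when every shared dimension is uncorrelated
    (over the batch) with every private dimension.
    """
    if len(shared) != len(private):
        raise ValueError("shared and private must have the same batch size")
    dim_shared = len(shared[0])
    dim_private = len(private[0])
    penalty = 0.0
    for i in range(dim_shared):
        for j in range(dim_private):
            entry = sum(s[i] * p[j] for s, p in zip(shared, private))
            penalty += entry * entry
    return penalty


overlapping = orthogonality_penalty([[1, 0], [0, 1]], [[0, 1], [1, 0]])
orthogonal = orthogonality_penalty([[1, 0], [1, 0]], [[0, 1], [0, -1]])
print(overlapping, orthogonal)
```

Minimizing this term alongside the classification and reconstruction objectives pushes domain-specific information into the private component, leaving the shared component more domain-invariant.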
An Experimental Study of Deep Convolutional Features For Iris Recognition
Title | An Experimental Study of Deep Convolutional Features For Iris Recognition |
Authors | Shervin Minaee, Amirali Abdolrashidi, Yao Wang |
Abstract | Iris is one of the popular biometrics that is widely used for identity authentication. Different features have been used to perform iris recognition in the past. Most of them are based on hand-crafted features designed by biometrics experts. Due to the tremendous success of deep learning in computer vision problems, there has been a lot of interest in applying features learned by convolutional neural networks on general image recognition to other tasks such as segmentation, face recognition, and object detection. In this paper, we have investigated the application of deep features extracted from VGG-Net for iris recognition. The proposed scheme has been tested on two well-known iris databases, and has shown promising results with the best accuracy rate of 99.4%, which outperforms the previous best result. |
Tasks | Face Recognition, Iris Recognition, Object Detection |
Published | 2017-02-04 |
URL | http://arxiv.org/abs/1702.01334v1 |
http://arxiv.org/pdf/1702.01334v1.pdf | |
PWC | https://paperswithcode.com/paper/an-experimental-study-of-deep-convolutional |
Repo | |
Framework | |
Learning Discriminative Relational Features for Sequence Labeling
Title | Learning Discriminative Relational Features for Sequence Labeling |
Authors | Naveen Nair, Ajay Nagesh, Ganesh Ramakrishnan |
Abstract | Discovering relational structure between input features in sequence labeling models has been shown to improve their accuracy in several problem settings. However, the search space of relational features is exponential in the number of basic input features. Consequently, approaches that learn relational features tend to follow a greedy search strategy. In this paper, we study the possibility of optimally learning and applying discriminative relational features for sequence labeling. For learning features derived from inputs at a particular sequence position, we propose a Hierarchical Kernels-based approach (referred to as Hierarchical Kernel Learning for Structured Output Spaces - StructHKL). This approach optimally and efficiently explores the hierarchical structure of the feature space for problems with structured output spaces such as sequence labeling. Since the StructHKL approach has limitations in learning complex relational features derived from inputs at relative positions, we propose two solutions to learn relational features, namely (i) enumerating simple component features of complex relational features and discovering their compositions using StructHKL and (ii) leveraging relational kernels, which compute the similarity between instances implicitly, in the sequence labeling problem. We perform extensive empirical evaluation on publicly available datasets and record our observations on settings in which certain approaches are effective. |
Tasks | |
Published | 2017-05-07 |
URL | http://arxiv.org/abs/1705.02562v1 |
http://arxiv.org/pdf/1705.02562v1.pdf | |
PWC | https://paperswithcode.com/paper/learning-discriminative-relational-features |
Repo | |
Framework | |