Paper Group ANR 220
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers
Title | Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers |
Authors | Manjot Bilkhu, Siyang Wang, Tushar Dobhal |
Abstract | Video Captioning and Summarization have become very popular in recent years due to advancements in Sequence Modelling, with the resurgence of Long Short-Term Memory networks (LSTMs) and the introduction of Gated Recurrent Units (GRUs). Existing architectures extract spatio-temporal features using CNNs and utilize either GRUs or LSTMs to model dependencies with soft attention layers. These attention layers do help in attending to the most prominent features and improve upon the recurrent units; however, these models suffer from the inherent drawbacks of the recurrent units themselves. The introduction of the Transformer model has driven the Sequence Modelling field in a new direction. In this project, we implement a Transformer-based model for Video captioning, utilizing 3D CNN architectures like C3D and Two-stream I3D for video extraction. We also apply certain dimensionality reduction techniques so as to keep the overall size of the model within limits. We finally present our results on the MSVD and ActivityNet datasets for Single and Dense video captioning tasks respectively. |
Tasks | Dense Video Captioning, Dimensionality Reduction, Video Captioning, Video Summarization |
Published | 2019-06-06 |
URL | https://arxiv.org/abs/1906.02792v1 |
https://arxiv.org/pdf/1906.02792v1.pdf | |
PWC | https://paperswithcode.com/paper/attention-is-all-you-need-for-videos-self |
Repo | |
Framework | |
Diagnosis of liver disease using computer-assisted imaging techniques: A Review
Title | Diagnosis of liver disease using computer-assisted imaging techniques: A Review |
Authors | Behnam Kiani Kalejahi, Saeed Meshgini, Sabalan Daneshvar, Shiva Asadzadeh |
Abstract | The evidence says that liver disease detection using CAD is one of the most efficient techniques, but better organization of studies and performance parameters to represent the result analysis of the proposed techniques are pointedly missing in most recent studies. Few of the papers include benchmarked studies, even though benchmarking helps a reader understand under which circumstances the experimental results or outcomes are better and useful for future implementation and adoption of the work. Liver diseases and image processing algorithms, especially in medicine, are among the most important topics of the day. Unfortunately, the necessary data, as invoked in the articles, are scarce in this area, and policies need to be revised and implemented in order to gather data and do more research in this field. Detection with ultrasound is quite normal in liver diseases and depends on the physician’s experience and skills. CAD systems are very important for doctors to understand medical images and improve the accuracy of diagnosing various diseases. In the following, we describe the techniques used in the various stages of a CAD system, namely: extracting features, selecting features, and classifying them. Although there are many techniques that are used to classify medical images, creating a universally accepted approach is still a challenging issue. |
Tasks | |
Published | 2019-12-15 |
URL | https://arxiv.org/abs/1912.09572v1 |
https://arxiv.org/pdf/1912.09572v1.pdf | |
PWC | https://paperswithcode.com/paper/diagnosis-of-liver-disease-using-computer |
Repo | |
Framework | |
Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images
Title | Annotation-Free Cardiac Vessel Segmentation via Knowledge Transfer from Retinal Images |
Authors | Fei Yu, Jie Zhao, Yanjun Gong, Zhi Wang, Yuxi Li, Fan Yang, Bin Dong, Quanzheng Li, Li Zhang |
Abstract | Segmenting coronary arteries is challenging, as classic unsupervised methods fail to produce satisfactory results and modern supervised learning (deep learning) requires manual annotation, which is often time-consuming and can sometimes be infeasible. To solve this problem, we propose a knowledge transfer based shape-consistent generative adversarial network (SC-GAN), which is an annotation-free approach that uses the knowledge from a publicly available annotated fundus dataset to segment coronary arteries. The proposed network is trained in an end-to-end fashion, generating and segmenting synthetic images that maintain the background of coronary angiography and preserve the vascular structures of retinal vessels and coronary arteries. We train and evaluate the proposed model on a dataset of 1092 digital subtraction angiography images, and experiments demonstrate the supreme accuracy of the proposed method on coronary artery segmentation. |
Tasks | Transfer Learning |
Published | 2019-07-26 |
URL | https://arxiv.org/abs/1907.11483v1 |
https://arxiv.org/pdf/1907.11483v1.pdf | |
PWC | https://paperswithcode.com/paper/annotation-free-cardiac-vessel-segmentation |
Repo | |
Framework | |
Stochastic Prototype Embeddings
Title | Stochastic Prototype Embeddings |
Authors | Tyler R. Scott, Karl Ridgeway, Michael C. Mozer |
Abstract | Supervised deep-embedding methods project inputs of a domain to a representational space in which same-class instances lie near one another and different-class instances lie far apart. We propose a probabilistic method that treats embeddings as random variables. Extending a state-of-the-art deterministic method, Prototypical Networks (Snell et al., 2017), our approach supposes the existence of a class prototype around which class instances are Gaussian distributed. The prototype posterior is a product distribution over labeled instances, and query instances are classified by marginalizing relative prototype proximity over embedding uncertainty. We describe an efficient sampler for approximate inference that allows us to train the model at roughly the same space and time cost as its deterministic sibling. Incorporating uncertainty improves performance on few-shot learning and gracefully handles label noise and out-of-distribution inputs. Compared to the state-of-the-art stochastic method, Hedged Instance Embeddings (Oh et al., 2019), we achieve superior large- and open-set classification accuracy. Our method also aligns class-discriminating features with the axes of the embedding space, yielding an interpretable, disentangled representation. |
Tasks | Few-Shot Learning |
Published | 2019-09-25 |
URL | https://arxiv.org/abs/1909.11702v1 |
https://arxiv.org/pdf/1909.11702v1.pdf | |
PWC | https://paperswithcode.com/paper/stochastic-prototype-embeddings |
Repo | |
Framework | |
Metric-Based Few-Shot Learning for Video Action Recognition
Title | Metric-Based Few-Shot Learning for Video Action Recognition |
Authors | Chris Careaga, Brian Hutchinson, Nathan Hodas, Lawrence Phillips |
Abstract | In the few-shot scenario, a learner must effectively generalize to unseen classes given a small support set of labeled examples. While a relatively large amount of research has gone into few-shot learning for image classification, little work has been done on few-shot video classification. In this work, we address the task of few-shot video action recognition with a set of two-stream models. We evaluate the performance of a set of convolutional and recurrent neural network video encoder architectures used in conjunction with three popular metric-based few-shot algorithms. We train and evaluate using a few-shot split of the Kinetics 600 dataset. Our experiments confirm the importance of the two-stream setup, and find prototypical networks and pooled long short-term memory network embeddings to give the best performance as few-shot method and video encoder, respectively. For a 5-shot 5-way task, this setup obtains 84.2% accuracy on the test set and 59.4% on a special “challenge” test set, composed of highly confusable classes. |
Tasks | Few-Shot Learning, Image Classification, Temporal Action Localization, Video Classification |
Published | 2019-09-14 |
URL | https://arxiv.org/abs/1909.09602v1 |
https://arxiv.org/pdf/1909.09602v1.pdf | |
PWC | https://paperswithcode.com/paper/metric-based-few-shot-learning-for-video |
Repo | |
Framework | |
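The abstract above reports prototypical networks as the best-performing few-shot method. As a rough illustration of that classification rule only (not the authors' two-stream video pipeline), the sketch below averages support embeddings into one prototype per class and assigns a query to the nearest prototype. The embeddings, the labels "run" and "jump", and the function names are hypothetical; a real system would obtain the embeddings from a trained video encoder.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype.

    support: dict mapping a class label to a list of embedding vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[d] for v in vectors) / len(vectors) for d in range(dim)
        ]
    return prototypes


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    labels = list(prototypes)
    logits = [-squared_distance(query, prototypes[l]) for l in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {l: e / total for l, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


# Hypothetical 2-way 2-shot episode with 2-dimensional embeddings.
support = {
    "run": [[1.0, 0.0], [0.8, 0.2]],
    "jump": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)
label, probs = classify([0.9, 0.1], prototypes)
print(label, probs)
```

Because the prototype is just a mean, adding a new class at test time needs no retraining, only a few labeled support examples to average.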
Integrating Hardware Diversity with Neural Architecture Search for Efficient Convolutional Neural Networks
Title | Integrating Hardware Diversity with Neural Architecture Search for Efficient Convolutional Neural Networks |
Authors | Li Lyna Zhang, Yuqing Yang, Yuhang Jiang, Wenwu Zhu, Yunxin Liu |
Abstract | Designing accurate and efficient convolutional neural architectures for a vast amount of hardware is challenging because hardware designs are complex and diverse. This paper addresses the hardware diversity challenge in Neural Architecture Search (NAS). Unlike previous approaches that apply search algorithms on a small, human-designed search space without considering hardware diversity, we propose HURRICANE, which explores automatic hardware-aware search over a much larger search space and uses a two-stage search algorithm to efficiently generate tailored models for different types of hardware. Extensive experiments on ImageNet show that our algorithm consistently achieves a much lower inference latency with a similar or better accuracy than state-of-the-art NAS methods on three types of hardware. Remarkably, HURRICANE achieves a 76.67% top-1 accuracy on ImageNet with an inference latency of only 16.5 ms for DSP, which is a 3.47% higher accuracy and a 6.35X inference speedup than FBNet-iPhoneX, respectively. For VPU, HURRICANE achieves a 0.53% higher top-1 accuracy than Proxylessmobile with a 1.49X speedup. Even for well-studied mobile CPU, HURRICANE achieves a 1.63% higher top-1 accuracy than FBNet-iPhoneX with a comparable inference latency. HURRICANE also reduces the training time by 30.4% or even 54.7% (with less than 0.5% accuracy loss) compared to Singlepath-Oneshot. |
Tasks | Neural Architecture Search |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.11609v2 |
https://arxiv.org/pdf/1910.11609v2.pdf | |
PWC | https://paperswithcode.com/paper/hardware-aware-one-shot-neural-architecture-1 |
Repo | |
Framework | |
Anomaly Detection in Large Scale Networks with Latent Space Models
Title | Anomaly Detection in Large Scale Networks with Latent Space Models |
Authors | Wesley Lee, Tyler H. McCormick, Joshua Neil, Cole Sodja |
Abstract | We develop a real-time anomaly detection algorithm for directed activity on large, sparse networks. We model the propensity for future activity using a dynamic logistic model with interaction terms for sender- and receiver-specific latent factors in addition to sender- and receiver-specific popularity scores; deviations from this underlying model constitute potential anomalies. Latent nodal attributes are estimated via a variational Bayesian approach and may change over time, representing natural shifts in network activity. Estimation is augmented with a case-control approximation to take advantage of the sparsity of the network and reduces computational complexity from $O(N^2)$ to $O(E)$, where $N$ is the number of nodes and $E$ is the number of observed edges. We run our algorithm on network event records collected from an enterprise network of over 25,000 computers and are able to identify a red team attack with half the detection rate required of the model without latent interaction terms. |
Tasks | Anomaly Detection |
Published | 2019-11-13 |
URL | https://arxiv.org/abs/1911.05522v1 |
https://arxiv.org/pdf/1911.05522v1.pdf | |
PWC | https://paperswithcode.com/paper/anomaly-detection-in-large-scale-networks |
Repo | |
Framework | |
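The abstract above describes a logistic model whose terms are sender and receiver popularity scores plus an interaction between sender and receiver latent factors. The sketch below is one way such a model could be written, under loudly stated assumptions: the additive form, the inner-product interaction, and the negative log-likelihood anomaly score are illustrative choices that the abstract does not spell out, and all parameter values are made up. The variational Bayesian estimation and the case-control approximation are not shown.

```python
import math


def edge_probability(sender_pop, receiver_pop, sender_latent, receiver_latent):
    """Probability of activity on a directed edge (assumed functional form).

    logit = sender popularity + receiver popularity + <sender, receiver latent>
    """
    interaction = sum(u * v for u, v in zip(sender_latent, receiver_latent))
    logit = sender_pop + receiver_pop + interaction
    return 1.0 / (1.0 + math.exp(-logit))


def surprise(p, observed):
    """Negative log-likelihood of an observation; larger means more anomalous."""
    return -math.log(p if observed else 1.0 - p)


# Hypothetical parameters for one sender/receiver pair on a sparse network.
p = edge_probability(-2.0, -1.5, [0.5, -0.2], [1.0, 0.3])
print(round(p, 4), surprise(p, observed=True), surprise(p, observed=False))
```

On a sparse network most edge probabilities are small, so an observed edge between an unlikely pair yields a large score while the absence of that edge is unremarkable.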
The FAST Algorithm for Submodular Maximization
Title | The FAST Algorithm for Submodular Maximization |
Authors | Adam Breuer, Eric Balkanski, Yaron Singer |
Abstract | In this paper we describe a new algorithm called Fast Adaptive Sequencing Technique (FAST) for maximizing a monotone submodular function under a cardinality constraint $k$ whose approximation ratio is arbitrarily close to $1-1/e$, is $O(\log(n) \log^2(\log k))$ adaptive, and uses a total of $O(n \log\log(k))$ queries. Recent algorithms have comparable guarantees in terms of asymptotic worst case analysis, but their actual number of rounds and query complexity depend on very large constants and polynomials in terms of precision and confidence, making them impractical for large data sets. Our main contribution is a design that is extremely efficient both in terms of its non-asymptotic worst case query complexity and number of rounds, and in terms of its practical runtime. We show that this algorithm outperforms any algorithm for submodular maximization we are aware of, including hyper-optimized parallel versions of state-of-the-art serial algorithms, by running experiments on large data sets. These experiments show FAST is orders of magnitude faster than the state-of-the-art. |
Tasks | |
Published | 2019-07-14 |
URL | https://arxiv.org/abs/1907.06173v1 |
https://arxiv.org/pdf/1907.06173v1.pdf | |
PWC | https://paperswithcode.com/paper/the-fast-algorithm-for-submodular |
Repo | |
Framework | |
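For context on the problem the abstract above addresses, the sketch below shows the classic serial greedy algorithm for maximizing a monotone submodular function under a cardinality constraint $k$. This is the textbook baseline with a $1-1/e$ guarantee, not the FAST algorithm: greedy needs $k$ sequential rounds and on the order of $nk$ function queries, which is the cost that low-adaptivity methods aim to reduce. The coverage objective and the example sets are hypothetical.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets greedily, each time taking the largest marginal gain.

    sets: dict mapping a name to a set of covered items.
    Returns the chosen names in selection order and the covered items.
    """
    chosen, covered = [], set()
    for _ in range(min(k, len(sets))):
        best_name, best_gain = None, 0
        for name, items in sets.items():
            if name in chosen:
                continue
            gain = len(items - covered)  # marginal gain of adding this set
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining set adds anything new
            break
        chosen.append(best_name)
        covered |= sets[best_name]
    return chosen, covered


# Hypothetical instance: choose 2 of 4 sets to cover as many items as possible.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
chosen, covered = greedy_max_coverage(sets, 2)
print(chosen, sorted(covered))
```

Each round depends on the set chosen in the previous round, so the rounds cannot be run in parallel; that sequential dependence is what "adaptivity" measures.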
Differentially Private Meta-Learning
Title | Differentially Private Meta-Learning |
Authors | Jeffrey Li, Mikhail Khodak, Sebastian Caldas, Ameet Talwalkar |
Abstract | Parameter-transfer is a well-known and versatile approach for meta-learning, with applications including few-shot learning, federated learning, and reinforcement learning. However, parameter-transfer algorithms often require sharing models that have been trained on the samples from specific tasks, thus leaving the task-owners susceptible to breaches of privacy. We conduct the first formal study of privacy in this setting and formalize the notion of task-global differential privacy as a practical relaxation of more commonly studied threat models. We then propose a new differentially private algorithm for gradient-based parameter transfer that not only satisfies this privacy requirement but also retains provable transfer learning guarantees in convex settings. Empirically, we apply our analysis to the problems of federated learning with personalization and few-shot classification, showing that allowing the relaxation to task-global privacy from the more commonly studied notion of local privacy leads to dramatically increased performance in recurrent neural language modeling and image classification. |
Tasks | Few-Shot Learning, Image Classification, Language Modelling, Meta-Learning, Transfer Learning |
Published | 2019-09-12 |
URL | https://arxiv.org/abs/1909.05830v2 |
https://arxiv.org/pdf/1909.05830v2.pdf | |
PWC | https://paperswithcode.com/paper/differentially-private-meta-learning |
Repo | |
Framework | |
Training an Interactive Helper
Title | Training an Interactive Helper |
Authors | Mark Woodward, Chelsea Finn, Karol Hausman |
Abstract | Developing agents that can quickly adapt their behavior to new tasks remains a challenge. Meta-learning has been applied to this problem, but previous methods require either specifying a reward function which can be tedious or providing demonstrations which can be inefficient. In this paper, we investigate if, and how, a “helper” agent can be trained to interactively adapt their behavior to maximize the reward of another agent, whom we call the “prime” agent, without observing their reward or receiving explicit demonstrations. To this end, we propose to meta-learn a helper agent along with a prime agent, who, during training, observes the reward function and serves as a surrogate for a human prime. We introduce a distribution of multi-agent cooperative foraging tasks, in which only the prime agent knows the objects that should be collected. We demonstrate that, from the emerged physical communication, the trained helper rapidly infers and collects the correct objects. |
Tasks | Meta-Learning |
Published | 2019-06-24 |
URL | https://arxiv.org/abs/1906.10165v2 |
https://arxiv.org/pdf/1906.10165v2.pdf | |
PWC | https://paperswithcode.com/paper/training-an-interactive-helper |
Repo | |
Framework | |
PARN: Position-Aware Relation Networks for Few-Shot Learning
Title | PARN: Position-Aware Relation Networks for Few-Shot Learning |
Authors | Ziyang Wu, Yuwei Li, Lihua Guo, Kui Jia |
Abstract | Few-shot learning presents a challenge that a classifier must quickly adapt to new classes that do not appear in the training set, given only a few labeled examples of each new class. This paper proposes a position-aware relation network (PARN) to learn a more flexible and robust metric ability for few-shot learning. Relation networks (RNs), a kind of architecture for relational reasoning, can acquire a deep metric ability for images by just being designed as a simple convolutional neural network (CNN) [23]. However, due to the inherent local connectivity of CNNs, the CNN-based relation network (RN) can be sensitive to the spatial position relationship of semantic objects in two compared images. To address this problem, we introduce a deformable feature extractor (DFE) to extract more efficient features, and design a dual correlation attention mechanism (DCA) to deal with its inherent local connectivity. Our proposed approach successfully extends the potential of RN to be position-aware of semantic objects by introducing only a small number of parameters. We evaluate our approach on two major benchmark datasets, i.e., Omniglot and Mini-Imagenet, and on both datasets our approach achieves state-of-the-art performance in the setting of using a shallow feature extraction network. It’s worth noting that our 5-way 1-shot result on Omniglot even outperforms the previous 5-way 5-shot results. |
Tasks | Few-Shot Learning, Omniglot, Relational Reasoning |
Published | 2019-09-10 |
URL | https://arxiv.org/abs/1909.04332v1 |
https://arxiv.org/pdf/1909.04332v1.pdf | |
PWC | https://paperswithcode.com/paper/parn-position-aware-relation-networks-for-few |
Repo | |
Framework | |
A Statistical Learning Approach to Reactive Power Control in Distribution Systems
Title | A Statistical Learning Approach to Reactive Power Control in Distribution Systems |
Authors | Qiuling Yang, Alireza Sadeghi, Gang Wang, Georgios B. Giannakis, Jian Sun |
Abstract | Pronounced variability due to the growth of renewable energy sources, flexible loads, and distributed generation is challenging residential distribution systems. This context motivates fast, efficient, and robust reactive power control. Real-time optimal reactive power control is possible in theory by solving a non-convex optimization problem based on the exact model of distribution flow. However, the lack of high-precision instrumentation and reliable communications, as well as the heavy computational burden of non-convex optimization solvers, renders computing and implementing the optimal control challenging in practice. Taking a statistical learning viewpoint, the input-output relationship between each grid state and the corresponding optimal reactive power control is parameterized in the present work by a deep neural network, whose unknown weights are learned offline by minimizing the power loss over a number of historical and simulated training pairs. In the inference phase, one just feeds the real-time state vector into the learned neural network to obtain the ‘optimal’ reactive power control with only several matrix-vector multiplications. The merits of this novel statistical learning approach are computational efficiency as well as robustness to random input perturbations. Numerical tests on a 47-bus distribution network using real data corroborate these practical merits. |
Tasks | |
Published | 2019-10-25 |
URL | https://arxiv.org/abs/1910.13938v1 |
https://arxiv.org/pdf/1910.13938v1.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-learning-approach-to-reactive |
Repo | |
Framework | |
Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning
Title | Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning |
Authors | Nicolas Anastassacos, Stephen Hailes, Mirco Musolesi |
Abstract | Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent’s internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society. |
Tasks | Multi-agent Reinforcement Learning |
Published | 2019-02-08 |
URL | https://arxiv.org/abs/1902.03185v4 |
https://arxiv.org/pdf/1902.03185v4.pdf | |
PWC | https://paperswithcode.com/paper/understanding-the-impact-of-partner-choice-on |
Repo | |
Framework | |
A Behavioral Approach to Visual Navigation with Graph Localization Networks
Title | A Behavioral Approach to Visual Navigation with Graph Localization Networks |
Authors | Kevin Chen, Juan Pablo de Vicente, Gabriel Sepulveda, Fei Xia, Alvaro Soto, Marynel Vazquez, Silvio Savarese |
Abstract | Inspired by research in psychology, we introduce a behavioral approach for visual navigation using topological maps. Our goal is to enable a robot to navigate from one location to another, relying only on its visual input and the topological map of the environment. We propose using graph neural networks for localizing the agent in the map, and decompose the action space into primitive behaviors implemented as convolutional or recurrent neural networks. Using the Gibson simulator, we verify that our approach outperforms relevant baselines and is able to navigate in both seen and unseen environments. |
Tasks | Visual Navigation |
Published | 2019-03-01 |
URL | http://arxiv.org/abs/1903.00445v1 |
http://arxiv.org/pdf/1903.00445v1.pdf | |
PWC | https://paperswithcode.com/paper/a-behavioral-approach-to-visual-navigation |
Repo | |
Framework | |
Motion-Nets: 6D Tracking of Unknown Objects in Unseen Environments using RGB
Title | Motion-Nets: 6D Tracking of Unknown Objects in Unseen Environments using RGB |
Authors | Felix Leeb, Arunkumar Byravan, Dieter Fox |
Abstract | In this work, we bridge the gap between recent pose estimation and tracking work to develop a powerful method for robots to track objects in their surroundings. Motion-Nets use a segmentation model to segment the scene, and separate translation and rotation models to identify the relative 6D motion of an object between two consecutive frames. We train our method with generated data of floating objects, and then test on several prediction tasks, including one with a real PR2 robot, and a toy control task with a simulated PR2 robot never seen during training. Motion-Nets are able to track the pose of objects with some quantitative accuracy for about 30-60 frames including occlusions and distractors. Additionally, the single step prediction errors remain low even after 100 frames. We also investigate an iterative correction procedure to improve performance for control tasks. |
Tasks | Pose Estimation |
Published | 2019-10-30 |
URL | https://arxiv.org/abs/1910.13942v1 |
https://arxiv.org/pdf/1910.13942v1.pdf | |
PWC | https://paperswithcode.com/paper/motion-nets-6d-tracking-of-unknown-objects-in |
Repo | |
Framework | |