October 17, 2019

Paper Group ANR 901

DID: Distributed Incremental Block Coordinate Descent for Nonnegative Matrix Factorization

Title DID: Distributed Incremental Block Coordinate Descent for Nonnegative Matrix Factorization
Authors Tianxiang Gao, Chris Chu
Abstract Nonnegative matrix factorization (NMF) has attracted much attention in the last decade as a dimension reduction method in many applications. Due to the explosion in data size, samples are naturally collected and stored in a distributed fashion across local computational nodes. Thus, there is a growing need for algorithms designed for distributed-memory architectures. We propose a novel distributed algorithm, called distributed incremental block coordinate descent (DID), to solve the problem. By adapting the block coordinate descent framework, DID obtains closed-form update rules. Moreover, DID performs updates incrementally based on the most recently updated residual matrix, so only one communication step per iteration is required. The correctness, efficiency, and scalability of the proposed algorithm are verified in a series of numerical experiments.
Tasks Dimensionality Reduction
Published 2018-02-25
URL http://arxiv.org/abs/1802.08938v1
PDF http://arxiv.org/pdf/1802.08938v1.pdf
PWC https://paperswithcode.com/paper/did-distributed-incremental-block-coordinate
Repo
Framework
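
To make the closed-form updates concrete, here is a minimal single-machine sketch of HALS-style block coordinate descent for NMF with an incrementally maintained residual, mirroring the update structure the abstract describes. DID's distributed-memory partitioning and single communication step are omitted, and all names are illustrative.

```python
import numpy as np

def nmf_bcd(X, k, n_iter=100, eps=1e-10):
    """HALS-style block coordinate descent for X ≈ W @ H (all nonnegative).

    A single-machine sketch of closed-form column updates with an
    incrementally maintained residual; not the authors' distributed code.
    """
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, k))
    H = rng.random((k, n))
    R = X - W @ H                                # residual matrix
    for _ in range(n_iter):
        for j in range(k):
            # Add back block j's contribution, then re-solve it in closed form.
            R += np.outer(W[:, j], H[j, :])
            W[:, j] = np.maximum(R @ H[j, :] / (H[j, :] @ H[j, :] + eps), 0)
            H[j, :] = np.maximum(W[:, j] @ R / (W[:, j] @ W[:, j] + eps), 0)
            R -= np.outer(W[:, j], H[j, :])      # incremental residual update
    return W, H
```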

Quit When You Can: Efficient Evaluation of Ensembles with Ordering Optimization

Title Quit When You Can: Efficient Evaluation of Ensembles with Ordering Optimization
Authors Serena Wang, Maya Gupta, Seungil You
Abstract Given a classifier ensemble and a set of examples to be classified, many examples may be confidently and accurately classified after only a subset of the base models in the ensemble have been evaluated. This can reduce both mean latency and CPU usage while maintaining the high accuracy of the original ensemble. To achieve such gains, we propose jointly optimizing a fixed evaluation order of the base models and early-stopping thresholds. Our proposed objective is a combinatorial optimization problem, but we provide a greedy algorithm that achieves a 4-approximation of the optimal solution for certain cases. For those cases, this is also the best achievable polynomial-time approximation bound unless $P = NP$. Experiments on benchmark and real-world problems show that the proposed Quit When You Can (QWYC) algorithm can speed up average evaluation time by $2$x–$4$x, and is around $1.5$x faster than prior work. QWYC’s joint optimization of ordering and thresholds also performed better in experiments than various fixed orderings, including gradient boosted trees’ ordering.
Tasks Combinatorial Optimization
Published 2018-06-28
URL http://arxiv.org/abs/1806.11202v1
PDF http://arxiv.org/pdf/1806.11202v1.pdf
PWC https://paperswithcode.com/paper/quit-when-you-can-efficient-evaluation-of
Repo
Framework
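
The evaluation side of the idea is easy to sketch: given a fixed model ordering and per-stage thresholds, stop as soon as the running score is confidently positive or negative. A minimal sketch follows, assuming margin-style base models; QWYC's greedy joint optimization of the ordering and the thresholds themselves is not shown, and all names are illustrative.

```python
def qwyc_predict(x, models, lo, hi):
    """Evaluate an ordered ensemble, quitting early when the running
    score crosses a per-stage threshold.

    models: list of callables, each returning a real-valued margin.
    lo[t], hi[t]: early-exit thresholds after evaluating t+1 models.
    """
    score = 0.0
    for t, model in enumerate(models):
        score += model(x)
        if score <= lo[t]:
            return -1                     # confidently negative: quit early
        if score >= hi[t]:
            return +1                     # confidently positive: quit early
    return +1 if score >= 0 else -1       # fall back to the full ensemble
```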

False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs

Title False Positive Reduction by Actively Mining Negative Samples for Pulmonary Nodule Detection in Chest Radiographs
Authors Sejin Park, Woochan Hwang, Kyu Hwan Jung, Joon Beom Seo, Namkug Kim
Abstract Generating large quantities of quality labeled data in medical imaging is very time-consuming and expensive. The performance of supervised algorithms for various imaging tasks has improved drastically over the years; however, the availability of data to train these algorithms has become one of the main bottlenecks for implementation. To address this, we propose a semi-supervised learning method in which pseudo-negative labels from unlabeled data are used to further refine the performance of a pulmonary nodule detection network on chest radiographs. After training with the proposed network, the false positive rate was reduced from 0.4864 to 0.1266 while maintaining sensitivity at 0.89.
Tasks
Published 2018-07-26
URL http://arxiv.org/abs/1807.10756v1
PDF http://arxiv.org/pdf/1807.10756v1.pdf
PWC https://paperswithcode.com/paper/false-positive-reduction-by-actively-mining
Repo
Framework
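
The mining loop the abstract describes can be sketched as follows, under the assumption that high-confidence detections on (presumed nodule-free) unlabeled radiographs serve as pseudo-negative labels; `detector` and its `predict` API are hypothetical stand-ins, not the authors' code.

```python
def mine_pseudo_negatives(detector, unlabeled_images, score_thresh=0.95):
    """Collect confident false-positive candidates from unlabeled data.

    A sketch of active pseudo-negative mining; `detector.predict` is a
    hypothetical API returning (box, score) pairs per image.
    """
    pseudo_negatives = []
    for image in unlabeled_images:
        for box, score in detector.predict(image):
            # Regions the current model flags confidently on (assumed
            # nodule-free) unlabeled scans become hard negative examples.
            if score >= score_thresh:
                pseudo_negatives.append((image, box))
    return pseudo_negatives

# Retraining would then mix these hard negatives (label = 0) back into
# the labeled training set before fine-tuning the detector.
```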

Adversarial Attacks and Defences: A Survey

Title Adversarial Attacks and Defences: A Survey
Authors Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, Debdeep Mukhopadhyay
Abstract Deep learning has emerged as a strong and efficient framework that can be applied to a broad spectrum of complex learning problems which were difficult to solve using traditional machine learning techniques in the past. In the last few years, deep learning has advanced radically, to the point that it can surpass human-level performance on a number of tasks. As a consequence, deep learning is being extensively used in most recent day-to-day applications. However, deep learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify its input. Different types of adversaries, distinguished by their threat models, leverage these vulnerabilities to compromise deep learning systems where the incentives are high. Hence, it is extremely important to make deep learning algorithms robust against these adversaries. However, there are only a few strong countermeasures that can be used across all types of attack scenarios to design a robust deep learning system. In this paper, we attempt to provide a detailed discussion of different types of adversarial attacks with various threat models, and also elaborate on the efficiency and challenges of recent countermeasures against them.
Tasks
Published 2018-09-28
URL http://arxiv.org/abs/1810.00069v1
PDF http://arxiv.org/pdf/1810.00069v1.pdf
PWC https://paperswithcode.com/paper/adversarial-attacks-and-defences-a-survey
Repo
Framework
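
As a concrete instance of the attacks such surveys cover, here is a minimal PyTorch sketch of the fast gradient sign method (FGSM), which perturbs the input by ε in the direction of the sign of the loss gradient, x_adv = x + ε · sign(∇_x L(f(x), y)). Parameter values here are illustrative.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=0.03):
    """One-step FGSM: a canonical white-box adversarial attack."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()                              # populate x_adv.grad
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()  # step along the gradient sign
        x_adv = x_adv.clamp(0.0, 1.0)            # stay in the valid image range
    return x_adv.detach()
```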

Convolutional Neural Networks for Video Quality Assessment

Title Convolutional Neural Networks for Video Quality Assessment
Authors Michalis Giannopoulos, Grigorios Tsagkatakis, Saverio Blasi, Farzad Toutounchi, Athanasios Mouchtaris, Panagiotis Tsakalides, Marta Mrak, Ebroul Izquierdo
Abstract Video Quality Assessment (VQA) is a very challenging task due to its highly subjective nature. Moreover, many factors influence VQA. Compression of video content, while necessary for minimising transmission and storage requirements, introduces distortions which can have detrimental effects on the perceived quality. Especially when dealing with modern video coding standards, it is extremely difficult to model the effects of compression due to the unpredictability of encoding on different content types. Moreover, transmission also introduces delays and other distortion types which affect the perceived quality. Therefore, it would be highly beneficial to accurately predict the perceived quality of video to be distributed over modern content distribution platforms, so that specific actions could be undertaken to maximise the Quality of Experience (QoE) of the users. Traditional VQA techniques based on feature extraction and modelling may not be sufficiently accurate. In this paper, a novel Deep Learning (DL) framework is introduced for effectively predicting VQA of video content delivery mechanisms based on end-to-end feature learning. The proposed framework is based on Convolutional Neural Networks, taking into account compression distortion as well as transmission delays. Training and evaluation of the proposed framework are performed on a user-annotated VQA dataset specifically created for this work. The experiments show that the proposed methods achieve high quality-estimation accuracy, showcasing the potential of DL in complex VQA scenarios.
Tasks Video Quality Assessment
Published 2018-09-26
URL http://arxiv.org/abs/1809.10117v1
PDF http://arxiv.org/pdf/1809.10117v1.pdf
PWC https://paperswithcode.com/paper/convolutional-neural-networks-for-video
Repo
Framework
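
The paper's exact architecture is not reproduced here, but the following PyTorch sketch shows the general shape of an end-to-end CNN quality regressor of the kind described: convolutional feature learning followed by a scalar quality head. All layer sizes and names are illustrative assumptions.

```python
import torch.nn as nn

class QualityCNN(nn.Module):
    """Minimal CNN regressor mapping video frames to a perceived-quality
    score; an illustration of end-to-end feature learning for VQA, not
    the authors' network."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 1)      # scalar perceived-quality score

    def forward(self, frames):            # frames: (batch, 3, H, W)
        z = self.features(frames).flatten(1)
        return self.head(z)
```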

ISNA-Set: A novel English Corpus of Iran NEWS

Title ISNA-Set: A novel English Corpus of Iran NEWS
Authors Mohammad Kamel, Hadi Sadoghi-Yazdi
Abstract News agencies publish news on their websites all over the world, and creating novel corpora is necessary to bring natural language processing to new domains. Textual processing of online news is challenging in terms of the strategy for collecting data, the complex structure of news websites, and the selection or design of suitable algorithms for processing this type of data. In contrast to previous works, which focus on creating corpora of Iranian news in Persian, in this paper we introduce a new corpus of English news from a national news agency. ISNA-Set is a new dataset of English news from the Iranian Students News Agency (ISNA), one of the most famous news agencies in Iran. We statistically analyze the data and the sentiment of the news, and also perform entity extraction and part-of-speech tagging.
Tasks Part-Of-Speech Tagging
Published 2018-08-21
URL http://arxiv.org/abs/1808.07046v1
PDF http://arxiv.org/pdf/1808.07046v1.pdf
PWC https://paperswithcode.com/paper/isna-set-a-novel-english-corpus-of-iran-news
Repo
Framework
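
For illustration, the kind of part-of-speech tagging and entity extraction described above can be run with off-the-shelf NLTK tools; this is a generic sketch, not the authors' pipeline, and the example sentence is made up.

```python
import nltk

def analyze(sentence):
    """POS-tag a news sentence and chunk named entities with NLTK."""
    for resource in ("punkt", "averaged_perceptron_tagger",
                     "maxent_ne_chunker", "words"):
        nltk.download(resource, quiet=True)
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)        # part-of-speech tags
    entities = nltk.ne_chunk(tagged)     # named-entity chunks
    return tagged, entities

tagged, entities = analyze("ISNA published a report in Tehran on Monday.")
```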

Persuasive Faces: Generating Faces in Advertisements

Title Persuasive Faces: Generating Faces in Advertisements
Authors Christopher Thomas, Adriana Kovashka
Abstract In this paper, we examine the visual variability of objects across different ad categories, i.e. what causes an advertisement to be visually persuasive. We focus on modeling and generating faces which appear to come from different types of ads. For example, if faces in beauty ads tend to be women wearing lipstick, a generative model should portray this distinct visual appearance. Training generative models which capture such category-specific differences is challenging because of the highly diverse appearance of faces in ads and the relatively limited amount of available training data. To address these problems, we propose a conditional variational autoencoder which makes use of predicted semantic attributes and facial expressions as a supervisory signal when training. We show how our model can be used to produce visually distinct faces which appear to be from a fixed ad topic category. Our human studies and quantitative and qualitative experiments confirm that our method greatly outperforms a variety of baselines, including two variations of a state-of-the-art generative adversarial network, for transforming faces to be more ad-category appropriate. Finally, we show preliminary generation results for other types of objects, conditioned on an ad topic.
Tasks Face Generation
Published 2018-07-25
URL http://arxiv.org/abs/1807.09882v1
PDF http://arxiv.org/pdf/1807.09882v1.pdf
PWC https://paperswithcode.com/paper/persuasive-faces-generating-faces-in
Repo
Framework
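
A minimal sketch of a conditional VAE of the general kind described, where the encoder and decoder are conditioned on an attribute/expression vector c used as the supervisory signal; all dimensions and layers are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Conditional VAE sketch: encode/decode images conditioned on
    predicted semantic attributes c (sizes are illustrative)."""
    def __init__(self, x_dim=64*64*3, c_dim=40, z_dim=128, h_dim=512):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim + c_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(
            nn.Linear(z_dim + c_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, x_dim), nn.Sigmoid())

    def forward(self, x, c):              # x: flattened image, c: attributes
        h = self.enc(torch.cat([x, c], dim=1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(torch.cat([z, c], dim=1)), mu, logvar
```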

Detecting Tiny Moving Vehicles in Satellite Videos

Title Detecting Tiny Moving Vehicles in Satellite Videos
Authors Wei Ao, Yanwei Fu, Feng Xu
Abstract In recent years, satellite videos have been captured by moving satellite platforms. In contrast to consumer, movie, and common surveillance videos, a satellite video records a snapshot of a city-scale scene. In the broad field of view of a satellite video, each moving target is very tiny, usually comprising only a few pixels per frame. Even worse, noise signals exist in the video frames, since the background undergoes subpixel-level, uneven motion due to the motion of the satellite. We argue that this is a new type of computer vision task, since previous technologies are unable to detect such tiny vehicles efficiently. This paper proposes a novel framework that can identify small moving vehicles in satellite videos. In particular, we offer a novel detection algorithm based on local noise modeling. We differentiate potential vehicle targets from noise patterns using an exponential probability distribution. Subsequently, a multi-morphological-cue based discrimination strategy is designed to further distinguish correct vehicle targets from the few remaining noise patterns. Another significant contribution is the introduction of a series of evaluation protocols to systematically measure the performance of tiny moving vehicle detection. We manually annotate a satellite video and use it to test our algorithms under different evaluation criteria. The proposed algorithm is also compared with state-of-the-art baselines, and the comparison demonstrates the advantages of our framework over the benchmarks.
Tasks
Published 2018-07-05
URL http://arxiv.org/abs/1807.01864v1
PDF http://arxiv.org/pdf/1807.01864v1.pdf
PWC https://paperswithcode.com/paper/detecting-tiny-moving-vehicles-in-satellite
Repo
Framework
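
In the spirit of the local noise modeling step, the following sketch fits an exponential distribution to frame-difference magnitudes and thresholds at a chosen per-pixel false-alarm probability; the authors' actual procedure (and the multi-morphological-cue discrimination) is more involved, and all names here are illustrative.

```python
import numpy as np

def candidate_mask(frame, background, p_false=1e-4):
    """Flag potential vehicle pixels by modeling frame-difference
    magnitudes as exponential noise (a sketch of local noise modeling)."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    lam = 1.0 / (diff.mean() + 1e-12)   # MLE rate of the exponential fit
    # Exponential tail: P(diff > t) = exp(-lam * t); choose t so the
    # per-pixel false-alarm probability equals p_false.
    t = -np.log(p_false) / lam
    return diff > t
```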

AMR Dependency Parsing with a Typed Semantic Algebra

Title AMR Dependency Parsing with a Typed Semantic Algebra
Authors Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, Alexander Koller
Abstract We present a semantic parser for Abstract Meaning Representations which learns to parse strings into tree representations of the compositional structure of an AMR graph. This allows us to use standard neural techniques for supertagging and dependency tree parsing, constrained by a linguistically principled type system. We present two approximative decoding algorithms, which achieve state-of-the-art accuracy and outperform strong baselines.
Tasks Dependency Parsing
Published 2018-05-29
URL http://arxiv.org/abs/1805.11465v1
PDF http://arxiv.org/pdf/1805.11465v1.pdf
PWC https://paperswithcode.com/paper/amr-dependency-parsing-with-a-typed-semantic
Repo
Framework

Do Explanations make VQA Models more Predictable to a Human?

Title Do Explanations make VQA Models more Predictable to a Human?
Authors Arjun Chandrasekaran, Viraj Prabhu, Deshraj Yadav, Prithvijit Chattopadhyay, Devi Parikh
Abstract A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human. Surprisingly, we find that they do not. On the other hand, we find that human-in-the-loop approaches that treat the model as a black-box do.
Tasks Question Answering, Visual Question Answering
Published 2018-10-29
URL http://arxiv.org/abs/1810.12366v1
PDF http://arxiv.org/pdf/1810.12366v1.pdf
PWC https://paperswithcode.com/paper/do-explanations-make-vqa-models-more
Repo
Framework

High Diversity Attribute Guided Face Generation with GANs

Title High Diversity Attribute Guided Face Generation with GANs
Authors Evgeny Izutov
Abstract In this work we focus on a GAN-based solution for attribute-guided face synthesis. Previous works exploited GANs for the generation of photo-realistic face images but did not pay attention to the diversity of the resulting images. The proposed solution, which introduces a novel latent space of unit complex numbers, achieves a “birthday paradox” diversity score three times the size of the training dataset. It is important to emphasize that this result is obtained on a relatively small dataset (20k samples vs. 200k) while preserving the photo-realism of the generated faces at a significantly higher resolution (128x128, compared to 32x32 in previous works).
Tasks Face Generation
Published 2018-06-28
URL http://arxiv.org/abs/1806.10982v1
PDF http://arxiv.org/pdf/1806.10982v1.pdf
PWC https://paperswithcode.com/paper/high-diversity-attribute-guided-face
Repo
Framework
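
The proposed latent space is easy to illustrate: each latent entry is a unit complex number, i.e. a point on the unit circle. A minimal sampling sketch follows; representing each entry by its (cos θ, sin θ) coordinates is an assumption of this sketch, as are all names.

```python
import numpy as np

def sample_unit_complex_latent(batch, dim, seed=None):
    """Sample a latent code of `dim` unit complex numbers, each kept as
    its (cos θ, sin θ) pair on the unit circle."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=(batch, dim))
    return np.stack([np.cos(theta), np.sin(theta)], axis=-1)  # (batch, dim, 2)

z = sample_unit_complex_latent(16, 64)   # would be fed to the generator
```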

2018 Low-Power Image Recognition Challenge

Title 2018 Low-Power Image Recognition Challenge
Authors Sergei Alyamkin, Matthew Ardi, Achille Brighton, Alexander C. Berg, Yiran Chen, Hsin-Pai Cheng, Bo Chen, Zichen Fan, Chen Feng, Bo Fu, Kent Gauen, Jongkook Go, Alexander Goncharenko, Xuyang Guo, Hong Hanh Nguyen, Andrew Howard, Yuanjun Huang, Donghyun Kang, Jaeyoun Kim, Alexander Kondratyev, Seungjae Lee, Suwoong Lee, Junhyeok Lee, Zhiyu Liang, Xin Liu, Juzheng Liu, Zichao Li, Yang Lu, Yung-Hsiang Lu, Deeptanshu Malik, Eunbyung Park, Denis Repin, Tao Sheng, Liang Shen, Fei Sun, David Svitov, George K. Thiruvathukal, Baiwu Zhang, Jingchi Zhang, Xiaopeng Zhang, Shaojie Zhuo
Abstract The Low-Power Image Recognition Challenge (LPIRC, https://rebootingcomputing.ieee.org/lpirc) is an annual competition started in 2015. The competition identifies the best technologies that can classify and detect objects in images efficiently (short execution time and low energy consumption) and accurately (high precision). Over the four years, the winners’ scores have improved more than 24-fold. As computer vision is widely used in many battery-powered systems (such as drones and mobile phones), low-power computer vision will become increasingly important. This paper summarizes LPIRC 2018 by describing the three different tracks and the winners’ solutions.
Tasks
Published 2018-10-03
URL http://arxiv.org/abs/1810.01732v1
PDF http://arxiv.org/pdf/1810.01732v1.pdf
PWC https://paperswithcode.com/paper/2018-low-power-image-recognition-challenge
Repo
Framework

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Title Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems
Authors Hieu-Thi Luong, Junichi Yamagishi
Abstract Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches. Although both approaches have their own pros and cons, most existing works on speaker adaptation focus on improving one or the other. In this paper, after we first systematically overview the common principles of neural-network based speaker-adaptive models, we show that these approaches can be represented in a unified framework and can be generalized further. More specifically, we introduce the use of scaling and bias codes as generalized means for speaker-adaptive transformation. By utilizing these codes, we can create a more efficient factorized speaker-adaptive model and capture advantages of both approaches while reducing their disadvantages. The experiments show that the proposed method can improve the performance of speaker adaptation compared with speaker adaptation based on the conventional input code.
Tasks Speech Synthesis
Published 2018-07-31
URL http://arxiv.org/abs/1807.11632v2
PDF http://arxiv.org/pdf/1807.11632v2.pdf
PWC https://paperswithcode.com/paper/scaling-and-bias-codes-for-modeling-speaker
Repo
Framework
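
The generalized transform can be summarized as h′ = a(s) ⊙ h + b(s), where the scale a(s) and bias b(s) are produced from learned per-speaker codes. A PyTorch sketch under illustrative dimensions follows; the embedding/projection layout is an assumption, not the authors' exact model.

```python
import torch.nn as nn

class ScalingBiasAdaptation(nn.Module):
    """Speaker-adaptive transform h' = a(s) * h + b(s): each speaker s
    has learned scaling and bias codes mapped to a per-layer scale and
    offset (a sketch; dimensions are illustrative)."""
    def __init__(self, n_speakers, code_dim, hidden_dim):
        super().__init__()
        self.scale_code = nn.Embedding(n_speakers, code_dim)
        self.bias_code = nn.Embedding(n_speakers, code_dim)
        self.to_scale = nn.Linear(code_dim, hidden_dim)
        self.to_bias = nn.Linear(code_dim, hidden_dim)

    def forward(self, h, speaker_ids):    # h: (batch, hidden_dim)
        a = self.to_scale(self.scale_code(speaker_ids))
        b = self.to_bias(self.bias_code(speaker_ids))
        return a * h + b
```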

“Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention

Title “Factual” or “Emotional”: Stylized Image Captioning with Adaptive Learning and Attention
Authors Tianlang Chen, Zhongping Zhang, Quanzeng You, Chen Fang, Zhaowen Wang, Hailin Jin, Jiebo Luo
Abstract Generating stylized captions for an image is an emerging topic in image captioning. Given an image as input, the system must generate a caption that has a specific style (e.g., humorous, romantic, positive, or negative) while describing the image content semantically accurately. In this paper, we propose a novel stylized image captioning model that effectively takes both requirements into consideration. To this end, we first devise a new variant of LSTM, named style-factual LSTM, as the building block of our model. It uses two groups of matrices to capture factual and stylized knowledge, respectively, and automatically learns word-level weights between the two groups based on the previous context. In addition, to train the model to capture stylized elements, we propose an adaptive learning approach based on a reference factual model, which provides factual knowledge to the model as it learns from stylized caption labels and can adaptively compute how much information to supply at each time step. We evaluate our model on two stylized image captioning datasets, which contain humorous/romantic captions and positive/negative captions, respectively. Experiments show that our proposed model outperforms state-of-the-art approaches without using extra ground-truth supervision.
Tasks Image Captioning
Published 2018-07-10
URL http://arxiv.org/abs/1807.03871v3
PDF http://arxiv.org/pdf/1807.03871v3.pdf
PWC https://paperswithcode.com/paper/factual-or-emotional-stylized-image
Repo
Framework
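
The core weighting idea reduces to a convex blend of the two weight groups, W = g·W_factual + (1 − g)·W_style, with g predicted from the previous context. A minimal sketch follows; the gate network and the LSTM wiring are omitted, and all names are illustrative.

```python
def style_factual_projection(h, W_f, W_s, g):
    """Blend factual and stylized weight groups at the word level:
    W = g * W_f + (1 - g) * W_s, where g in [0, 1] comes from a gate
    network conditioned on the previous context (not shown here)."""
    W = g * W_f + (1.0 - g) * W_s
    return h @ W

# With h of shape (batch, d_in) and both weight groups (d_in, d_out),
# g near 1 favors factual knowledge and g near 0 the stylized group.
```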

TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection

Title TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection
Authors Yunchao Wei, Zhiqiang Shen, Bowen Cheng, Honghui Shi, Jinjun Xiong, Jiashi Feng, Thomas Huang
Abstract This work provides a simple approach to discovering tight object bounding boxes with only image-level supervision, called Tight box mining with Surrounding Segmentation Context (TS2C). We observe that object candidates mined through current multiple instance learning methods usually get trapped at discriminative object parts rather than covering the entire object. TS2C leverages surrounding segmentation context derived from weakly-supervised segmentation to suppress such low-quality distracting candidates and boost the high-quality ones. Specifically, TS2C is developed based on two key properties of desirable bounding boxes: 1) high purity, meaning most pixels in the box have high object response, and 2) high completeness, meaning the box covers the high-object-response pixels comprehensively. With these novel and computable criteria, tighter candidates can be discovered for learning a better object detector. With TS2C, we obtain 48.0% and 44.4% mAP scores on the VOC 2007 and 2012 benchmarks, setting new state-of-the-art results.
Tasks Multiple Instance Learning, Object Detection, Weakly Supervised Object Detection
Published 2018-07-13
URL http://arxiv.org/abs/1807.04897v1
PDF http://arxiv.org/pdf/1807.04897v1.pdf
PWC https://paperswithcode.com/paper/ts2c-tight-box-mining-with-surrounding
Repo
Framework
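
The two box criteria translate directly into scores over a segmentation response map. The sketch below is one plausible reading of purity and completeness, not necessarily the paper's exact formulas, and all names are illustrative.

```python
import numpy as np

def purity_completeness(seg, box):
    """Score a candidate box against a weakly-supervised segmentation
    response map following the two TS2C-style criteria (a sketch).

    seg: (H, W) object-response map in [0, 1]; box: (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = box
    inside = seg[y1:y2, x1:x2]
    purity = inside.mean()                 # most pixels in the box respond
    completeness = inside.sum() / (seg.sum() + 1e-12)  # box covers the response
    return purity, completeness
```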