October 18, 2019

Paper Group ANR 526

Autoencoder Based Sample Selection for Self-Taught Learning. Chinese Poetry Generation with Flexible Styles. IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification. Lexico-acoustic Neural-based Models for Dialog Act Classification. RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of H …

Autoencoder Based Sample Selection for Self-Taught Learning


Title	Autoencoder Based Sample Selection for Self-Taught Learning
Authors	Siwei Feng, Han Yu, Marco F. Duarte
Abstract	Self-taught learning is a technique that uses a large amount of unlabeled data as source samples to improve the task performance on target samples. Compared with other transfer learning techniques, self-taught learning can be applied to a broader set of scenarios due to the loose restrictions on the source data. However, knowledge transferred from source samples that are not sufficiently related to the target domain may negatively influence the target learner, which is referred to as negative transfer. In this paper, we propose a metric for the relevance between a source sample and the target samples. To be more specific, both source and target samples are reconstructed through a single-layer autoencoder with a linear relationship between source samples and reconstructed target samples being simultaneously enforced. An $\ell_{2,1}$-norm sparsity constraint is imposed on the transformation matrix to identify source samples relevant to the target domain. Source domain samples that are deemed relevant are assigned pseudo-labels reflecting their relevance to target domain samples, and are combined with target samples in order to provide an expanded training set for classifier training. Local data structures are also preserved during source sample selection through spectral graph analysis. Promising results in extensive experiments show the advantages of the proposed approach.
Tasks	Transfer Learning
Published	2018-08-05
URL	https://arxiv.org/abs/1808.01574v2
PDF	https://arxiv.org/pdf/1808.01574v2.pdf
PWC	https://paperswithcode.com/paper/autoencoder-based-sample-selection-for-self
Repo
Framework
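
The $\ell_{2,1}$-norm mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common definition (the sum of the Euclidean norms of a matrix's rows) and assumes, purely for illustration, that each row of the transformation matrix corresponds to one source sample; the function names and the tolerance are hypothetical and not taken from the paper.

```python
import math


def row_norm(row):
    """Euclidean (l2) norm of one matrix row."""
    return math.sqrt(sum(v * v for v in row))


def l21_norm(matrix):
    """l2,1 norm: sum of the l2 norms of the rows (assumed definition)."""
    return sum(row_norm(row) for row in matrix)


def nonzero_rows(matrix, tol=1e-8):
    """Indices of rows whose norm exceeds tol.

    Penalising the l2,1 norm tends to drive whole rows to zero, so the
    surviving rows can be read as the 'selected' samples (an assumption
    about how the rows are interpreted here).
    """
    return [i for i, row in enumerate(matrix) if row_norm(row) > tol]


# Hypothetical 3x2 transformation matrix with one all-zero row.
W = [[3.0, 4.0], [0.0, 0.0], [0.0, 2.0]]
print(l21_norm(W))       # sum of row norms
print(nonzero_rows(W))   # rows that survived the sparsity penalty
```

Because the penalty acts on whole rows rather than individual entries, it favours solutions in which some rows vanish entirely, which is what makes it usable as a selection criterion.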

Chinese Poetry Generation with Flexible Styles


Title	Chinese Poetry Generation with Flexible Styles
Authors	Jiyuan Zhang, Dong Wang
Abstract	Research has shown that sequence-to-sequence neural models, particularly those with the attention mechanism, can successfully generate classical Chinese poems. However, neural models are not capable of generating poems that match specific styles, such as the impulsive style of Li Bai, a famous poet in the Tang Dynasty. This work proposes a memory-augmented neural model to enable the generation of style-specific poetry. The key idea is a memory structure that stores how poems with a desired style were generated by humans, and uses similar fragments to adjust the generation. We demonstrate that the proposed algorithm generates poems with flexible styles, including styles of a particular era and an individual poet.
Tasks
Published	2018-07-17
URL	http://arxiv.org/abs/1807.06500v2
PDF	http://arxiv.org/pdf/1807.06500v2.pdf
PWC	https://paperswithcode.com/paper/chinese-poetry-generation-with-flexible
Repo
Framework

IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification


Title	IamNN: Iterative and Adaptive Mobile Neural Network for Efficient Image Classification
Authors	Sam Leroux, Pavlo Molchanov, Pieter Simoens, Bart Dhoedt, Thomas Breuel, Jan Kautz
Abstract	Deep residual networks (ResNets) made a recent breakthrough in deep learning. The core idea of ResNets is to have shortcut connections between layers that allow the network to be much deeper while still being easy to optimize, avoiding vanishing gradients. These shortcut connections have interesting side-effects that make ResNets behave differently from other typical network architectures. In this work we use these properties to design a network based on a ResNet but with parameter sharing and with adaptive computation time. The resulting network is much smaller than the original network and can adapt the computational cost to the complexity of the input image.
Tasks	Image Classification
Published	2018-04-26
URL	http://arxiv.org/abs/1804.10123v1
PDF	http://arxiv.org/pdf/1804.10123v1.pdf
PWC	https://paperswithcode.com/paper/iamnn-iterative-and-adaptive-mobile-neural
Repo
Framework

Lexico-acoustic Neural-based Models for Dialog Act Classification


Title	Lexico-acoustic Neural-based Models for Dialog Act Classification
Authors	Daniel Ortega, Ngoc Thang Vu
Abstract	Recent works have proposed neural models for dialog act classification in spoken dialogs. However, they have not explored the role and the usefulness of acoustic information. We propose a neural model that processes both lexical and acoustic features for classification. Our results on two benchmark datasets reveal that acoustic features are helpful in improving the overall accuracy. Finally, a deeper analysis shows that acoustic features are valuable in three cases: when a dialog act has sufficient data, when lexical information is limited and when strong lexical cues are not present.
Tasks	Dialog Act Classification
Published	2018-03-02
URL	http://arxiv.org/abs/1803.00831v1
PDF	http://arxiv.org/pdf/1803.00831v1.pdf
PWC	https://paperswithcode.com/paper/lexico-acoustic-neural-based-models-for
Repo
Framework

RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images


Title	RiFCN: Recurrent Network in Fully Convolutional Network for Semantic Segmentation of High Resolution Remote Sensing Images
Authors	Lichao Mou, Xiao Xiang Zhu
Abstract	Semantic segmentation in high resolution remote sensing images is a fundamental and challenging task. Convolutional neural networks (CNNs), such as fully convolutional network (FCN) and SegNet, have shown outstanding performance in many segmentation tasks. One key pillar of these successes is mining useful information from features in convolutional layers for producing high resolution segmentation maps. For example, FCN nonlinearly combines high-level features extracted from last convolutional layers; whereas SegNet utilizes a deconvolutional network which takes as input only coarse, high-level feature maps of the last convolutional layer. However, how to better fuse multi-level convolutional feature maps for semantic segmentation of remote sensing images is underexplored. In this work, we propose a novel bidirectional network called recurrent network in fully convolutional network (RiFCN), which is end-to-end trainable. It has a forward stream and a backward stream. The former is a classification CNN architecture for feature extraction, which takes an input image and produces multi-level convolutional feature maps from shallow to deep; while in the latter, to achieve accurate boundary inference and semantic segmentation, boundary-aware high resolution feature maps in shallower layers and high-level but low-resolution features are recursively embedded into the learning framework (from deep to shallow) to generate a fused feature representation that draws a holistic picture of not only high-level semantic information but also low-level fine-grained details. Experimental results on two widely-used high resolution remote sensing data sets for semantic segmentation tasks, ISPRS Potsdam and Inria Aerial Image Labeling Data Set, demonstrate competitive performance obtained by the proposed methodology compared to other studied approaches.
Tasks	Semantic Segmentation
Published	2018-05-05
URL	http://arxiv.org/abs/1805.02091v1
PDF	http://arxiv.org/pdf/1805.02091v1.pdf
PWC	https://paperswithcode.com/paper/rifcn-recurrent-network-in-fully
Repo
Framework

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation


Title	Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation
Authors	Pallabi Ghosh, Yi Yao, Larry S. Davis, Ajay Divakaran
Abstract	We propose novel Stacked Spatio-Temporal Graph Convolutional Networks (Stacked-STGCN) for action segmentation, i.e., predicting and localizing a sequence of actions over long videos. We extend the Spatio-Temporal Graph Convolutional Network (STGCN) originally proposed for skeleton-based action recognition to enable nodes with different characteristics (e.g., scene, actor, object, action, etc.), feature descriptors with varied lengths, and arbitrary temporal edge connections to account for large graph deformation commonly associated with complex activities. We further introduce the stacked hourglass architecture to STGCN to leverage the advantages of an encoder-decoder design for improved generalization performance and localization accuracy. We explore various descriptors such as frame-level VGG, segment-level I3D, RCNN-based object, etc. as node descriptors to enable action segmentation based on joint inference over comprehensive contextual information. We show results on CAD120 (which provides pre-computed node features and edge weights for fair performance comparison across algorithms) as well as a more complex real-world activity dataset, Charades. Our Stacked-STGCN in general achieves 4.0% performance improvement over the best reported results in F1 score on CAD120 and 1.3% in mAP on Charades using VGG features.
Tasks	Action Segmentation, Skeleton Based Action Recognition, Temporal Action Localization
Published	2018-11-26
URL	https://arxiv.org/abs/1811.10575v6
PDF	https://arxiv.org/pdf/1811.10575v6.pdf
PWC	https://paperswithcode.com/paper/stacked-spatio-temporal-graph-convolutional
Repo
Framework

Synthetic Depth-of-Field with a Single-Camera Mobile Phone


Title	Synthetic Depth-of-Field with a Single-Camera Mobile Phone
Authors	Neal Wadhwa, Rahul Garg, David E. Jacobs, Bryan E. Feldman, Nori Kanazawa, Robert Carroll, Yair Movshovitz-Attias, Jonathan T. Barron, Yael Pritch, Marc Levoy
Abstract	Shallow depth-of-field is commonly used by photographers to isolate a subject from a distracting background. However, standard cell phone cameras cannot produce such images optically, as their short focal lengths and small apertures capture nearly all-in-focus images. We present a system to computationally synthesize shallow depth-of-field images with a single mobile camera and a single button press. If the image is of a person, we use a person segmentation network to separate the person and their accessories from the background. If available, we also use dense dual-pixel auto-focus hardware, effectively a 2-sample light field with an approximately 1 millimeter baseline, to compute a dense depth map. These two signals are combined and used to render a defocused image. Our system can process a 5.4 megapixel image in 4 seconds on a mobile phone, is fully automatic, and is robust enough to be used by non-experts. The modular nature of our system allows it to degrade naturally in the absence of a dual-pixel sensor or a human subject.
Tasks
Published	2018-06-11
URL	http://arxiv.org/abs/1806.04171v1
PDF	http://arxiv.org/pdf/1806.04171v1.pdf
PWC	https://paperswithcode.com/paper/synthetic-depth-of-field-with-a-single-camera
Repo
Framework

Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks


Title	Modeling Popularity in Asynchronous Social Media Streams with Recurrent Neural Networks
Authors	Swapnil Mishra, Marian-Andrei Rizoiu, Lexing Xie
Abstract	Understanding and predicting the popularity of online items is an important open problem in social media analysis. Considerable progress has been made recently in data-driven predictions, and in linking popularity to external promotions. However, the existing methods typically focus on a single source of external influence, whereas for many types of online content such as YouTube videos or news articles, attention is driven by multiple heterogeneous sources simultaneously, e.g. microblogs or traditional media coverage. Here, we propose RNN-MAS, a recurrent neural network for modeling asynchronous streams. It is a sequence generator that connects multiple streams of different granularity via joint inference. We show RNN-MAS not only to outperform the current state-of-the-art YouTube popularity prediction system by 17%, but also to capture complex dynamics, such as seasonal trends of unseen influence. We define two new metrics: the promotion score quantifies the gain in popularity from one unit of promotion for a YouTube video; the loudness level captures the effects of a particular user tweeting about the video. We use the loudness level to compare the effects of a video being promoted by a single highly-followed user (in the top 1% most followed users) against being promoted by a group of mid-followed users. We find that results depend on the type of content being promoted: superusers are more successful in promoting Howto and Gaming videos, whereas the cohort of regular users is more influential for Activism videos. This work provides more accurate and explainable popularity predictions, as well as computational tools for content producers and marketers to allocate resources for promotion campaigns.
Tasks
Published	2018-04-06
URL	http://arxiv.org/abs/1804.02101v2
PDF	http://arxiv.org/pdf/1804.02101v2.pdf
PWC	https://paperswithcode.com/paper/modeling-popularity-in-asynchronous-social
Repo
Framework

Anomaly Detection Using GANs for Visual Inspection in Noisy Training Data


Title	Anomaly Detection Using GANs for Visual Inspection in Noisy Training Data
Authors	Masanari Kimura, Takashi Yanagihara
Abstract	The detection and the quantification of anomalies in image data are critical tasks in industrial scenes such as detecting micro scratches on products. In recent years, due to the difficulty of defining anomalies and the limit of correcting their labels, research on unsupervised anomaly detection using generative models has attracted attention. Generally, in those studies, only normal images are used for training to model the distribution of normal images. The model measures the anomalies in the target images by reproducing the most similar images and scoring image patches indicating their fit to the learned distribution. This approach is based on a strong presumption: the trained model should not be able to generate abnormal images. However, in reality, the model can generate abnormal images mainly due to noisy normal data which include small abnormal pixels, and such noise severely affects the accuracy of the model. Therefore, we propose a novel anomaly detection method to distort the distribution of the model with existing abnormal images. The proposed method detects pixel-level micro anomalies with high accuracy from 1024x1024 high resolution images which are actually used in an industrial scene. In this paper, we share experimental results on open datasets, due to the confidentiality of the data.
Tasks	Anomaly Detection, Unsupervised Anomaly Detection
Published	2018-07-03
URL	http://arxiv.org/abs/1807.01136v2
PDF	http://arxiv.org/pdf/1807.01136v2.pdf
PWC	https://paperswithcode.com/paper/anomaly-detection-using-gans-for-visual
Repo
Framework

Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices


Title	Robust low-rank multilinear tensor approximation for a joint estimation of the multilinear rank and the loading matrices
Authors	Xu Han, Laurent Albera, Amar Kachenoura, Huazhong Shu, Lotfi Senhadji
Abstract	In order to compute the best low-rank tensor approximation using the Multilinear Tensor Decomposition (MTD) model, it is essential to estimate the rank of the underlying multilinear tensor from the noisy observation tensor. In this paper, we propose a Robust MTD (R-MTD) method, which jointly estimates the multilinear rank and the loading matrices. Based on the low-rank property and an over-estimation of the core tensor, this joint estimation problem is solved by promoting (group) sparsity of the over-estimated core tensor. Group sparsity is promoted using mixed-norms. Then we establish a link between the mixed-norms and the nuclear norm, showing that mixed-norms are better candidates for a convex envelope of the rank. After several iterations of the Alternating Direction Method of Multipliers (ADMM), the Minimum Description Length (MDL) criterion computed from the eigenvalues of the unfolding matrices of the estimated core tensor is minimized in order to estimate the multilinear rank. The latter is then used to estimate the loading matrices more accurately. We further develop another R-MTD method, called R-OMTD, by imposing an orthonormality constraint on each loading matrix in order to decrease the computational complexity. A series of simulated noisy tensors and real-world data are used to show the effectiveness of the proposed methods compared with state-of-the-art methods.
Tasks
Published	2018-11-14
URL	http://arxiv.org/abs/1811.05863v2
PDF	http://arxiv.org/pdf/1811.05863v2.pdf
PWC	https://paperswithcode.com/paper/robust-low-rank-multilinear-tensor
Repo
Framework

Lightweight Classification of IoT Malware based on Image Recognition


Title	Lightweight Classification of IoT Malware based on Image Recognition
Authors	Jiawei Su, Danilo Vasconcellos Vargas, Sanjiva Prasad, Daniele Sgandurra, Yaokai Feng, Kouichi Sakurai
Abstract	The Internet of Things (IoT) is an extension of the traditional Internet, which allows a very large number of smart devices, such as home appliances, network cameras, sensors and controllers to connect to one another to share information and improve user experiences. Current IoT devices are typically micro-computers for domain-specific computations rather than traditional function-specific embedded devices. Therefore, many existing attacks, targeted at traditional computers connected to the Internet, may also be directed at IoT devices. For example, DDoS attacks have become very common in IoT environments, as these environments currently lack basic security monitoring and protection mechanisms, as shown by the recent Mirai and Brickerbot IoT botnets. In this paper, we propose a novel light-weight approach for detecting DDoS malware in IoT environments. We first extract one-channel gray-scale images converted from binaries, and then utilize a lightweight convolutional neural network for classifying IoT malware families. The experimental results show that the proposed system can achieve 94.0% accuracy for the classification of goodware and DDoS malware, and 81.8% accuracy for the classification of goodware and two main malware families.
Tasks
Published	2018-02-11
URL	http://arxiv.org/abs/1802.03714v1
PDF	http://arxiv.org/pdf/1802.03714v1.pdf
PWC	https://paperswithcode.com/paper/lightweight-classification-of-iot-malware
Repo
Framework
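
The first step described in the abstract above, converting a binary into a one-channel gray-scale image, might look roughly like the sketch below. It assumes the simplest possible mapping (one byte becomes one pixel with intensity 0 to 255, rows of a fixed width, zero padding on the last row); the function name, the default width, and the padding choice are illustrative assumptions, not details taken from the paper.

```python
def bytes_to_gray_image(data: bytes, width: int = 64):
    """Lay out raw bytes as rows of gray-scale pixels (0-255).

    The final row is zero-padded so that every row has `width` pixels.
    Both the width and the padding value are assumptions of this sketch.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    image = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        # Iterating over bytes yields integers in the range 0-255.
        row = list(chunk) + [0] * (width - len(chunk))
        image.append(row)
    return image


# Hypothetical 10-byte "binary" rendered as a 3x4 image.
sample = bytes(range(10))
for row in bytes_to_gray_image(sample, width=4):
    print(row)
```

An image produced this way would still need resizing to a fixed height before being fed to a convolutional network, since binaries differ in length.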

When Work Matters: Transforming Classical Network Structures to Graph CNN


Title	When Work Matters: Transforming Classical Network Structures to Graph CNN
Authors	Wenting Zhao, Chunyan Xu, Zhen Cui, Tong Zhang, Jiatao Jiang, Zhenyu Zhang, Jian Yang
Abstract	Numerous pattern recognition applications can be formulated as learning from graph-structured data, including social networks, protein-interaction networks, world wide web data, knowledge graphs, etc. While the convolutional neural network (CNN) has facilitated great advances in gridded image/video understanding tasks, very limited attention has been devoted to transforming these successful network structures (including Inception net, Residual net, Dense net, etc.) to establish convolutional networks on graphs, due to their irregular and complex geometric topologies (unordered vertices, unfixed number of adjacent edges/vertices). In this paper, we aim to give a comprehensive analysis of when work matters by transforming different classical network structures to graph CNN, particularly in the basic graph recognition problem. Specifically, we first review the general graph CNN methods, especially their spectral filtering operation on irregular graph data. We then introduce the basic structures of ResNet, Inception and DenseNet into graph CNN and construct these network structures on graphs, named G_ResNet, G_Inception and G_DenseNet. In particular, this work seeks to help graph CNNs by shedding light on how these classical network structures work and providing guidelines for choosing appropriate graph network frameworks. Finally, we comprehensively evaluate the performance of these different network structures on several public graph datasets (including social networks and bioinformatic datasets), and demonstrate how different network structures work on graph CNN in the graph recognition task.
Tasks	Graph Classification, Video Understanding
Published	2018-07-07
URL	http://arxiv.org/abs/1807.02653v1
PDF	http://arxiv.org/pdf/1807.02653v1.pdf
PWC	https://paperswithcode.com/paper/when-work-matters-transforming-classical
Repo
Framework

Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions


Title	Modeling Interpersonal Influence of Verbal Behavior in Couples Therapy Dyadic Interactions
Authors	Sandeep Nallan Chakravarthula, Brian Baucom, Panayiotis Georgiou
Abstract	Dyadic interactions among humans are marked by speakers continuously influencing and reacting to each other in terms of responses and behaviors, among others. Understanding how interpersonal dynamics affect behavior is important for successful treatment in psychotherapy domains. Traditional schemes that automatically identify behavior for this purpose have often looked at only the target speaker. In this work, we propose a Markov model of how a target speaker’s behavior is influenced by their own past behavior as well as their perception of their partner’s behavior, based on lexical features. Apart from incorporating additional potentially useful information, our model can also control the degree to which the partner affects the target speaker. We evaluate our proposed model on the task of classifying Negative behavior in Couples Therapy and show that it is more accurate than the single-speaker model. Furthermore, we investigate the degree to which the optimal influence relates to how well a couple does in the long term, via relating to relationship outcomes.
Tasks
Published	2018-05-23
URL	http://arxiv.org/abs/1805.09436v1
PDF	http://arxiv.org/pdf/1805.09436v1.pdf
PWC	https://paperswithcode.com/paper/modeling-interpersonal-influence-of-verbal
Repo
Framework

SFA: Small Faces Attention Face Detector


Title	SFA: Small Faces Attention Face Detector
Authors	Shi Luo, Xiongfei Li, Rui Zhu, Xiaoli Zhang
Abstract	In recent years, tremendous strides have been made in face detection thanks to deep learning. However, most published face detectors deteriorate dramatically as the faces become smaller. In this paper, we present the Small Faces Attention (SFA) face detector to better detect faces with small scale. First, we propose a new scale-invariant face detection architecture which pays more attention to small faces, including a 4-branch detection architecture and a small-faces-sensitive anchor design. Second, a feature map fusion strategy is applied in SFA by partially combining high-level features into low-level features to further improve the ability of finding hard faces. Third, we use a multi-scale training and testing strategy to enhance face detection performance in practice. Comprehensive experiments show that SFA significantly improves face detection performance, especially on small faces. Our real-time SFA face detector can run at 5 FPS on a single GPU as well as maintain high performance. Besides, our final SFA face detector achieves state-of-the-art detection performance on challenging face detection benchmarks, including WIDER FACE and FDDB datasets, with competitive runtime speed. Both our code and models will be available to the research community.
Tasks	Face Detection
Published	2018-12-20
URL	http://arxiv.org/abs/1812.08402v1
PDF	http://arxiv.org/pdf/1812.08402v1.pdf
PWC	https://paperswithcode.com/paper/sfa-small-faces-attention-face-detector
Repo
Framework

Physical Attribute Prediction Using Deep Residual Neural Networks


Title	Physical Attribute Prediction Using Deep Residual Neural Networks
Authors	Rashidedin Jahandideh, Alireza Tavakoli Targhi, Maryam Tahmasbi
Abstract	Images taken from the Internet have been used alongside Deep Learning for many different tasks such as smile detection, ethnicity, hair style, hair colour, gender and age prediction. After witnessing these usages, we were wondering what other attributes can be predicted from facial images available on the Internet. In this paper we tackle the prediction of physical attributes from face images using Convolutional Neural Networks trained on our dataset named FIRW. We crawled around 61,000 images from the web, then used face detection to crop faces from these real world images. We choose ResNet-50 as our base network architecture. This network was pretrained for the task of face recognition by using the VGG-Face dataset, and we finetune it by using our own dataset to predict physical attributes. Separate networks are trained for the prediction of body type, ethnicity, gender, height and weight; our models achieve the following accuracies for these tasks, respectively: 84.58%, 87.34%, 97.97%, 70.51%, 63.99%. To validate our choice of ResNet-50 as the base architecture, we also tackle the famous CelebA dataset. Our models achieve an average accuracy of 91.19% on CelebA, which is comparable to state-of-the-art approaches.
Tasks	Face Detection, Face Recognition, Physical Attribute Prediction
Published	2018-12-19
URL	http://arxiv.org/abs/1812.07857v1
PDF	http://arxiv.org/pdf/1812.07857v1.pdf
PWC	https://paperswithcode.com/paper/physical-attribute-prediction-using-deep
Repo
Framework