July 29, 2019

3241 words 16 mins read

Paper Group AWR 83


Seernet at EmoInt-2017: Tweet Emotion Intensity Estimator

Title Seernet at EmoInt-2017: Tweet Emotion Intensity Estimator
Authors Venkatesh Duppada, Sushant Hiray
Abstract The paper describes experiments on estimating emotion intensity in tweets using a generalized regressor system. The system combines lexical, syntactic and pre-trained word embedding features, trains general regressors on them, and finally combines the best performing models to create an ensemble. The proposed system ranked 3rd out of 22 systems on the leaderboard of the WASSA-2017 Shared Task on Emotion Intensity.
Tasks
Published 2017-08-21
URL http://arxiv.org/abs/1708.06185v1
PDF http://arxiv.org/pdf/1708.06185v1.pdf
PWC https://paperswithcode.com/paper/seernet-at-emoint-2017-tweet-emotion
Repo https://github.com/SEERNET/EmoInt
Framework none
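To make the ensemble idea above concrete, here is a minimal scikit-learn sketch that trains a few general-purpose regressors on pre-computed tweet features and averages their predictions. The feature vectors, model choices and averaging scheme are placeholders for illustration, not the authors' exact pipeline.

```python
# Sketch of an ensemble of generic regressors for emotion-intensity scores.
# Assumes feature vectors (lexical + embedding averages) are already extracted;
# the concrete features and models are illustrative, not the paper's exact setup.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.svm import SVR

X_train = np.random.rand(200, 50)   # placeholder tweet feature vectors
y_train = np.random.rand(200)       # gold emotion-intensity scores in [0, 1]
X_test = np.random.rand(20, 50)

regressors = [
    GradientBoostingRegressor(),
    RandomForestRegressor(n_estimators=100),
    SVR(kernel="rbf"),
]
for reg in regressors:
    reg.fit(X_train, y_train)

# Simple averaging ensemble of the individual predictions.
preds = np.mean([reg.predict(X_test) for reg in regressors], axis=0)
print(preds[:5])
```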

Jumping across biomedical contexts using compressive data fusion

Title Jumping across biomedical contexts using compressive data fusion
Authors Marinka Zitnik, Blaz Zupan
Abstract Motivation: The rapid growth of diverse biological data allows us to consider interactions between a variety of objects, such as genes, chemicals, molecular signatures, diseases, pathways and environmental exposures. Often, any pair of objects, such as a gene and a disease, can be related in different ways, for example, directly via gene-disease associations or indirectly via functional annotations, chemicals and pathways. Different ways of relating these objects carry different semantic meanings. However, traditional methods disregard these semantics and thus cannot fully exploit their value in data modeling. Results: We present Medusa, an approach to detect size-k modules of objects that, taken together, appear most significant to another set of objects. Medusa operates on large-scale collections of heterogeneous data sets and explicitly distinguishes between diverse data semantics. It advances research along two dimensions: it builds on collective matrix factorization to derive different semantics, and it formulates the growing of the modules as a submodular optimization program. Medusa is flexible in choosing or combining semantic meanings and provides theoretical guarantees about detection quality. In a systematic study on 310 complex diseases, we show the effectiveness of Medusa in associating genes with diseases and detecting disease modules. We demonstrate that in predicting gene-disease associations Medusa compares favorably to methods that ignore diverse semantic meanings. We find that the utility of different semantics depends on disease categories and that, overall, Medusa recovers disease modules more accurately when combining different semantics.
Tasks
Published 2017-08-10
URL http://arxiv.org/abs/1708.03392v1
PDF http://arxiv.org/pdf/1708.03392v1.pdf
PWC https://paperswithcode.com/paper/jumping-across-biomedical-contexts-using
Repo https://github.com/marinkaz/medusa
Framework none
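As a rough illustration of growing a size-k module by submodular maximization, the sketch below greedily adds the candidate with the largest marginal gain under a facility-location style objective. The objective and the random relevance scores are stand-ins; Medusa derives its scores from collective matrix factorization over heterogeneous data.

```python
# Greedy growth of a size-k module under a monotone submodular objective.
# The facility-location objective is an illustrative stand-in for Medusa's scoring.
import numpy as np

def greedy_module(relevance, k):
    """relevance[i, j]: how strongly candidate object i relates to target object j."""
    n_candidates = relevance.shape[0]
    module, covered = [], np.zeros(relevance.shape[1])
    for _ in range(k):
        # Marginal gain of adding each remaining candidate.
        gains = [np.maximum(covered, relevance[i]).sum() - covered.sum()
                 if i not in module else -np.inf
                 for i in range(n_candidates)]
        best = int(np.argmax(gains))
        module.append(best)
        covered = np.maximum(covered, relevance[best])
    return module

relevance = np.random.rand(100, 20)   # e.g. 100 candidate genes vs. 20 disease-related targets
print(greedy_module(relevance, k=5))
```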

B-CNN: Branch Convolutional Neural Network for Hierarchical Classification

Title B-CNN: Branch Convolutional Neural Network for Hierarchical Classification
Authors Xinqi Zhu, Michael Bain
Abstract Convolutional Neural Network (CNN) image classifiers are traditionally designed to have sequential convolutional layers with a single output layer. This is based on the assumption that all target classes should be treated equally and exclusively. However, some classes can be more difficult to distinguish than others, and classes may be organized in a hierarchy of categories. At the same time, a CNN is designed to learn internal representations that abstract from the input data based on its hierarchical layered structure. So it is natural to ask if an inverse of this idea can be applied to learn a model that can predict over a classification hierarchy using multiple output layers in decreasing order of class abstraction. In this paper, we introduce a variant of the traditional CNN model named the Branch Convolutional Neural Network (B-CNN). A B-CNN model outputs multiple predictions ordered from coarse to fine along the concatenated convolutional layers corresponding to the hierarchical structure of the target classes, which can be regarded as a form of prior knowledge on the output. To learn with B-CNNs a novel training strategy, named the Branch Training strategy (BT-strategy), is introduced which balances the strictness of the prior with the freedom to adjust parameters on the output layers to minimize the loss. In this way we show that CNN based models can be forced to learn successively coarse to fine concepts in the internal layers at the output stage, and that hierarchical prior knowledge can be adopted to boost CNN models’ classification performance. Our models are evaluated to show that the B-CNN extensions improve over the corresponding baseline CNN on the benchmark datasets MNIST, CIFAR-10 and CIFAR-100.
Tasks
Published 2017-09-28
URL http://arxiv.org/abs/1709.09890v2
PDF http://arxiv.org/pdf/1709.09890v2.pdf
PWC https://paperswithcode.com/paper/b-cnn-branch-convolutional-neural-network-for
Repo https://github.com/zhuxinqimac/B-CNN
Framework tf
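The sketch below shows the basic branch-output idea in tf.keras: a coarse head attached to an early layer and a fine head attached to a deeper layer, trained jointly with per-head loss weights. Layer sizes, class counts and the fixed weights are illustrative; the paper's BT-strategy shifts these weights from coarse to fine over the course of training.

```python
# Minimal tf.keras sketch of a branch CNN with a coarse and a fine output head.
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D()(x)

# Coarse branch: predicts the super-class from an early representation.
coarse = layers.Dense(2, activation="softmax", name="coarse")(layers.Flatten()(x))

x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D()(x)

# Fine branch: predicts the leaf class from a deeper representation.
fine = layers.Dense(10, activation="softmax", name="fine")(layers.Flatten()(x))

model = Model(inputs, [coarse, fine])
model.compile(optimizer="adam",
              loss={"coarse": "categorical_crossentropy", "fine": "categorical_crossentropy"},
              loss_weights={"coarse": 0.3, "fine": 0.7})  # re-weighted over epochs in the BT-strategy
```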

Cell Detection in Microscopy Images with Deep Convolutional Neural Network and Compressed Sensing

Title Cell Detection in Microscopy Images with Deep Convolutional Neural Network and Compressed Sensing
Authors Yao Xue, Nilanjan Ray
Abstract The ability to automatically detect certain types of cells or cellular subunits in microscopy images is of significant interest to a wide range of biomedical research and clinical practices. Cell detection methods have evolved from employing hand-crafted features to deep learning-based techniques. The essential idea of these methods is that their cell classifiers or detectors are trained in the pixel space, where the locations of target cells are labeled. In this paper, we seek a different route and propose a convolutional neural network (CNN)-based cell detection method that uses encoding of the output pixel space. For the cell detection problem, the output space is the sparsely labeled pixel locations indicating cell centers. We employ random projections to encode the output space to a compressed vector of fixed dimension. Then, CNN regresses this compressed vector from the input pixels. Furthermore, it is possible to stably recover sparse cell locations on the output pixel space from the predicted compressed vector using $L_1$-norm optimization. In the past, output space encoding using compressed sensing (CS) has been used in conjunction with linear and non-linear predictors. To the best of our knowledge, this is the first successful use of CNN with CS-based output space encoding. We made substantial experiments on several benchmark datasets, where the proposed CNN + CS framework (referred to as CNNCS) achieved the highest or at least top-3 performance in terms of F1-score, compared with other state-of-the-art methods.
Tasks
Published 2017-08-10
URL http://arxiv.org/abs/1708.03307v3
PDF http://arxiv.org/pdf/1708.03307v3.pdf
PWC https://paperswithcode.com/paper/cell-detection-in-microscopy-images-with-deep
Repo https://github.com/isgilman/CrossSection_DeepLearning
Framework none
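The compressed-sensing output encoding can be sketched independently of the CNN: a sparse cell-center map is projected to a short vector by a random matrix, and the sparse map is recovered from that vector by L1-regularized regression. Lasso is used here purely as a convenient L1 solver; in the full method a CNN regresses the compressed vector from the image pixels.

```python
# Sketch of compressed-sensing output encoding and L1 recovery of sparse cell centers.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
n_pixels, n_measurements = 1024, 128

# Ground-truth sparse label: a handful of cell-center locations.
y = np.zeros(n_pixels)
y[rng.choice(n_pixels, size=8, replace=False)] = 1.0

phi = rng.randn(n_measurements, n_pixels) / np.sqrt(n_measurements)  # random projection
z = phi @ y                                  # compressed target the CNN would regress

lasso = Lasso(alpha=1e-3, max_iter=10000)
lasso.fit(phi, z)                            # L1 recovery of the sparse map from z
recovered = np.flatnonzero(lasso.coef_ > 0.5)
print(sorted(recovered), sorted(np.flatnonzero(y)))
```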

A Neural Parametric Singing Synthesizer

Title A Neural Parametric Singing Synthesizer
Authors Merlijn Blaauw, Jordi Bonada
Abstract We present a new model for singing synthesis based on a modified version of the WaveNet architecture. Instead of modeling raw waveform, we model features produced by a parametric vocoder that separates the influence of pitch and timbre. This allows conveniently modifying pitch to match any target melody, facilitates training on more modest dataset sizes, and significantly reduces training and generation times. Our model makes frame-wise predictions using mixture density outputs rather than categorical outputs in order to reduce the required parameter count. As we found overfitting to be an issue with the relatively small datasets used in our experiments, we propose a method to regularize the model and make the autoregressive generation process more robust to prediction errors. Using a simple multi-stream architecture, harmonic, aperiodic and voiced/unvoiced components can all be predicted in a coherent manner. We compare our method to existing parametric statistical and state-of-the-art concatenative methods using quantitative metrics and a listening test. While naive implementations of the autoregressive generation algorithm tend to be inefficient, using a smart algorithm we can greatly speed up the process and obtain a system that’s competitive in both speed and quality.
Tasks
Published 2017-04-12
URL http://arxiv.org/abs/1704.03809v3
PDF http://arxiv.org/pdf/1704.03809v3.pdf
PWC https://paperswithcode.com/paper/a-neural-parametric-singing-synthesizer
Repo https://github.com/seaniezhao/torch_npss
Framework pytorch
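A minimal PyTorch sketch of a frame-wise mixture density output head is given below: the network predicts mixture weights, means and log-scales, and is trained with the negative log-likelihood of the target frame value. The single-feature simplification and dimensions are assumptions for illustration, not the paper's exact configuration.

```python
# Frame-wise Gaussian mixture density head, an alternative to a categorical softmax output.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    def __init__(self, hidden_dim, n_components=4):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, 3 * n_components)  # weight, mean, log-std per component

    def forward(self, h, target):
        logit_w, mu, log_std = self.proj(h).chunk(3, dim=-1)
        log_w = F.log_softmax(logit_w, dim=-1)
        # Per-component Gaussian log-likelihood of the target frame value.
        comp_ll = (-0.5 * ((target.unsqueeze(-1) - mu) / log_std.exp()) ** 2
                   - log_std - 0.5 * math.log(2 * math.pi))
        return -torch.logsumexp(log_w + comp_ll, dim=-1).mean()  # negative log-likelihood

head = MDNHead(hidden_dim=64)
h = torch.randn(8, 64)          # decoder states for 8 frames
target = torch.randn(8)         # vocoder feature value per frame
loss = head(h, target)
loss.backward()
```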

A Distributional Perspective on Reinforcement Learning

Title A Distributional Perspective on Reinforcement Learning
Authors Marc G. Bellemare, Will Dabney, Rémi Munos
Abstract In this paper we argue for the fundamental importance of the value distribution: the distribution of the random return received by a reinforcement learning agent. This is in contrast to the common approach to reinforcement learning which models the expectation of this return, or value. Although there is an established body of literature studying the value distribution, thus far it has always been used for a specific purpose such as implementing risk-aware behaviour. We begin with theoretical results in both the policy evaluation and control settings, exposing a significant distributional instability in the latter. We then use the distributional perspective to design a new algorithm which applies Bellman’s equation to the learning of approximate value distributions. We evaluate our algorithm using the suite of games from the Arcade Learning Environment. We obtain both state-of-the-art results and anecdotal evidence demonstrating the importance of the value distribution in approximate reinforcement learning. Finally, we combine theoretical and empirical evidence to highlight the ways in which the value distribution impacts learning in the approximate setting.
Tasks Atari Games
Published 2017-07-21
URL http://arxiv.org/abs/1707.06887v1
PDF http://arxiv.org/pdf/1707.06887v1.pdf
PWC https://paperswithcode.com/paper/a-distributional-perspective-on-reinforcement
Repo https://github.com/facebookresearch/Horizon
Framework pytorch
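The core of the resulting algorithm (C51) is the projection of the shifted and shrunk return distribution back onto a fixed support of atoms. The NumPy sketch below implements that projection for a single transition; the 51-atom support and value range follow the common setup and are otherwise illustrative.

```python
# Categorical (C51-style) projection of r + gamma * z onto a fixed atom support.
import numpy as np

def project_distribution(next_probs, reward, gamma, v_min=-10.0, v_max=10.0, n_atoms=51):
    atoms = np.linspace(v_min, v_max, n_atoms)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    projected = np.zeros(n_atoms)
    for p, z in zip(next_probs, atoms):
        tz = np.clip(reward + gamma * z, v_min, v_max)   # shifted and clipped atom
        b = (tz - v_min) / delta_z                       # fractional index on the support
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:                                     # lands exactly on an atom
            projected[lo] += p
        else:                                            # split mass between neighbours
            projected[lo] += p * (hi - b)
            projected[hi] += p * (b - lo)
    return projected

next_probs = np.full(51, 1.0 / 51)                       # uniform next-state distribution
target = project_distribution(next_probs, reward=1.0, gamma=0.99)
print(target.sum())                                      # still a valid probability vector
```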

graph2vec: Learning Distributed Representations of Graphs

Title graph2vec: Learning Distributed Representations of Graphs
Authors Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, Shantanu Jaiswal
Abstract Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.
Tasks Graph Classification, Graph Embedding, Graph Matching, Representation Learning
Published 2017-07-17
URL http://arxiv.org/abs/1707.05005v1
PDF http://arxiv.org/pdf/1707.05005v1.pdf
PWC https://paperswithcode.com/paper/graph2vec-learning-distributed
Repo https://github.com/benedekrozemberczki/karateclub
Framework none
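A short usage sketch with the karateclub package linked in the Repo field, assuming its Graph2Vec class exposes the fit/get_embedding interface described in its documentation and expects graphs with consecutive integer node ids starting at 0.

```python
# Whole-graph embeddings with karateclub's Graph2Vec (interface assumed per its docs).
import networkx as nx
from karateclub import Graph2Vec

graphs = [nx.gnp_random_graph(20, 0.2, seed=i) for i in range(50)]  # toy graph collection

model = Graph2Vec(dimensions=64)
model.fit(graphs)                     # unsupervised, task-agnostic training
embeddings = model.get_embedding()    # one fixed-length vector per input graph
print(embeddings.shape)               # (50, 64)
```

The embeddings can then feed any downstream classifier or clustering method, e.g. an SVM for graph classification.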

Parallel Structure from Motion from Local Increment to Global Averaging

Title Parallel Structure from Motion from Local Increment to Global Averaging
Authors Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Jinglu Wang, Tian Fang, Long Quan
Abstract In this paper, we tackle the accurate and consistent Structure from Motion (SfM) problem, in particular camera registration, far exceeding the memory of a single computer in parallel. Different from the previous methods which drastically simplify the parameters of SfM and sacrifice the accuracy of the final reconstruction, we try to preserve the connectivities among cameras by proposing a camera clustering algorithm to divide a large SfM problem into smaller sub-problems in terms of overlapping camera clusters. We then exploit a hybrid formulation that applies the relative poses from local incremental SfM into a global motion averaging framework and produces accurate and consistent global camera poses. Our scalable formulation in terms of camera clusters is highly applicable to the whole SfM pipeline including track generation, local SfM, 3D point triangulation and bundle adjustment. We are even able to reconstruct the camera poses of a city-scale data-set containing more than one million high-resolution images with superior accuracy and robustness evaluated on benchmark, Internet, and sequential data-sets.
Tasks
Published 2017-02-28
URL http://arxiv.org/abs/1702.08601v3
PDF http://arxiv.org/pdf/1702.08601v3.pdf
PWC https://paperswithcode.com/paper/parallel-structure-from-motion-from-local
Repo https://github.com/AIBluefisher/ResearchWork
Framework none

Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization

Title Improving Semantic Relevance for Sequence-to-Sequence Learning of Chinese Social Media Text Summarization
Authors Shuming Ma, Xu Sun, Jingjing Xu, Houfeng Wang, Wenjie Li, Qi Su
Abstract Current Chinese social media text summarization models are based on an encoder-decoder framework. Although its generated summaries are similar to source texts literally, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. In addition, the similarity score between the two representations is maximized during training. Our experiments show that the proposed model outperforms baseline systems on a social media corpus.
Tasks Semantic Similarity, Semantic Textual Similarity, Text Summarization
Published 2017-06-08
URL http://arxiv.org/abs/1706.02459v1
PDF http://arxiv.org/pdf/1706.02459v1.pdf
PWC https://paperswithcode.com/paper/improving-semantic-relevance-for-sequence-to
Repo https://github.com/shumingma/SRB
Framework tf
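The training objective can be sketched as the usual generation loss minus a weighted cosine similarity between the encoder's source representation and the decoder's summary representation. The pooling of representations and the weight lambda below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a semantic-relevance objective: generation loss minus weighted cosine similarity.
import torch
import torch.nn.functional as F

def semantic_relevance_loss(gen_loss, source_repr, summary_repr, lam=0.5):
    similarity = F.cosine_similarity(source_repr, summary_repr, dim=-1).mean()
    return gen_loss - lam * similarity   # maximize similarity while minimizing generation loss

gen_loss = torch.tensor(2.3, requires_grad=True)          # placeholder cross-entropy loss
source_repr = torch.randn(16, 256, requires_grad=True)    # pooled encoder representation
summary_repr = torch.randn(16, 256, requires_grad=True)   # decoder summary representation
loss = semantic_relevance_loss(gen_loss, source_repr, summary_repr)
loss.backward()
```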

FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks

Title FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
Authors Suo Qiu, Xiangmin Xu, Bolun Cai
Abstract Rectified linear unit (ReLU) is a widely used activation function for deep convolutional neural networks. However, because of the zero-hard rectification, ReLU networks miss the benefits from negative values. In this paper, we propose a novel activation function called \emph{flexible rectified linear unit (FReLU)} to further explore the effects of negative values. By redesigning the rectified point of ReLU as a learnable parameter, FReLU expands the states of the activation output. When the network is successfully trained, FReLU tends to converge to a negative value, which improves the expressiveness and thus the performance. Furthermore, FReLU is designed to be simple and effective without exponential functions to maintain low computational cost. Because it is self-adaptive, FReLU does not rely on strict assumptions and can be easily used in various network architectures. We evaluate FReLU on three standard image classification datasets, including CIFAR-10, CIFAR-100, and ImageNet. Experimental results show that the proposed method achieves fast convergence and higher performance on both plain and residual networks.
Tasks Image Classification
Published 2017-06-25
URL http://arxiv.org/abs/1706.08098v2
PDF http://arxiv.org/pdf/1706.08098v2.pdf
PWC https://paperswithcode.com/paper/frelu-flexible-rectified-linear-units-for
Repo https://github.com/ducha-aiki/caffenet-benchmark
Framework none
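Under a common reading of the formulation, FReLU is simply ReLU shifted by a learnable bias, so the activation can output negative values once the bias converges below zero. The per-channel parameterization in this PyTorch sketch is an assumption; the parameter may equally be shared per layer.

```python
# Minimal sketch of FReLU: ReLU with a learnable rectified point.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FReLU(nn.Module):
    def __init__(self, num_channels):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(1, num_channels, 1, 1))  # learnable rectified point

    def forward(self, x):
        return F.relu(x) + self.bias   # max(x, 0) shifted by a learnable offset

act = FReLU(num_channels=16)
out = act(torch.randn(2, 16, 8, 8))
print(out.shape)
```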

Adaptive Neural Networks for Efficient Inference

Title Adaptive Neural Networks for Efficient Inference
Authors Tolga Bolukbasi, Joseph Wang, Ofer Dekel, Venkatesh Saligrama
Abstract We present an approach to adaptively utilize deep neural networks in order to reduce the evaluation time on new examples without loss of accuracy. Rather than attempting to redesign or approximate existing networks, we propose two schemes that adaptively utilize networks. We first pose an adaptive network evaluation scheme, where we learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example. We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples. We pose a global objective for learning an adaptive early exit or network selection policy and solve it by reducing the policy learning problem to a layer-by-layer weighted binary classification problem. Empirically, these approaches yield dramatic reductions in computational cost, with up to a 2.8x speedup on state-of-the-art networks from the ImageNet image recognition challenge with minimal (<1%) loss of top-5 accuracy.
Tasks
Published 2017-02-25
URL http://arxiv.org/abs/1702.07811v2
PDF http://arxiv.org/pdf/1702.07811v2.pdf
PWC https://paperswithcode.com/paper/adaptive-neural-networks-for-efficient
Repo https://github.com/NervanaSystems/distiller
Framework pytorch
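The network-selection idea can be sketched as a cascade: a cheap model answers first, and a costly model is consulted only when the cheap model is not confident. The fixed confidence threshold below is a simplification; the paper instead learns the exit/selection policy by reducing it to weighted binary classification.

```python
# Sketch of adaptive network selection with a confidence-threshold exit policy.
import numpy as np

def adaptive_predict(x, cheap_model, expensive_model, threshold=0.9):
    probs = cheap_model(x)
    if probs.max() >= threshold:        # confident enough: exit early with the cheap model
        return int(np.argmax(probs)), "cheap"
    return int(np.argmax(expensive_model(x))), "expensive"

# Toy stand-ins for a small and a large 10-class classifier.
cheap_model = lambda x: np.random.dirichlet(np.ones(10) * 0.3)
expensive_model = lambda x: np.random.dirichlet(np.ones(10) * 0.3)
for _ in range(3):
    print(adaptive_predict(np.random.rand(32), cheap_model, expensive_model))
```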

DAGER: Deep Age, Gender and Emotion Recognition Using Convolutional Neural Network

Title DAGER: Deep Age, Gender and Emotion Recognition Using Convolutional Neural Network
Authors Afshin Dehghan, Enrique G. Ortiz, Guang Shu, Syed Zain Masood
Abstract This paper describes the details of Sighthound’s fully automated age, gender and emotion recognition system. The backbone of our system consists of several deep convolutional neural networks that are not only computationally inexpensive, but also provide state-of-the-art results on several competitive benchmarks. To power our novel deep networks, we collected large labeled datasets through a semi-supervised pipeline to reduce the annotation effort/time. We tested our system on several public benchmarks and report outstanding results. Our age, gender and emotion recognition models are available to developers through the Sighthound Cloud API at https://www.sighthound.com/products/cloud
Tasks Emotion Recognition
Published 2017-02-14
URL http://arxiv.org/abs/1702.04280v2
PDF http://arxiv.org/pdf/1702.04280v2.pdf
PWC https://paperswithcode.com/paper/dager-deep-age-gender-and-emotion-recognition
Repo https://github.com/CVxTz/face_age_gender
Framework tf

Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering

Title Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering
Authors Zhou Yu, Jun Yu, Jianping Fan, Dacheng Tao
Abstract Visual question answering (VQA) is challenging because it requires a simultaneous understanding of both the visual content of images and the textual content of questions. The approaches used to represent the images and questions in a fine-grained manner, and to fuse these multi-modal features, play key roles in performance. Bilinear pooling based models have been shown to outperform traditional linear models for VQA, but their high-dimensional representations and high computational complexity may seriously limit their applicability in practice. For multi-modal feature fusion, here we develop a Multi-modal Factorized Bilinear (MFB) pooling approach to efficiently and effectively combine multi-modal features, which results in superior performance for VQA compared with other bilinear pooling approaches. For fine-grained image and question representation, we develop a co-attention mechanism using an end-to-end deep network architecture to jointly learn both the image and question attentions. Combining the proposed MFB approach with co-attention learning in a new network architecture provides a unified model for VQA. Our experimental results demonstrate that the single MFB with co-attention model achieves new state-of-the-art performance on the real-world VQA dataset. Code available at https://github.com/yuzcccc/mfb.
Tasks Question Answering, Visual Question Answering
Published 2017-08-04
URL http://arxiv.org/abs/1708.01471v1
PDF http://arxiv.org/pdf/1708.01471v1.pdf
PWC https://paperswithcode.com/paper/multi-modal-factorized-bilinear-pooling-with
Repo https://github.com/yuzcccc/vqa-mfb
Framework caffe2
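A PyTorch sketch of the MFB fusion step: both modalities are projected to a k-times-larger space, fused by element-wise product, sum-pooled over the k factors, and then power- and L2-normalized. Dimensions and k are illustrative, and the co-attention mechanism is omitted.

```python
# Sketch of Multi-modal Factorized Bilinear (MFB) pooling for two feature vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFB(nn.Module):
    def __init__(self, img_dim, txt_dim, out_dim=1000, k=5):
        super().__init__()
        self.k, self.out_dim = k, out_dim
        self.proj_img = nn.Linear(img_dim, out_dim * k)
        self.proj_txt = nn.Linear(txt_dim, out_dim * k)

    def forward(self, img_feat, txt_feat):
        fused = self.proj_img(img_feat) * self.proj_txt(txt_feat)        # element-wise product
        fused = fused.view(-1, self.out_dim, self.k).sum(dim=2)          # sum-pool over k factors
        fused = torch.sign(fused) * torch.sqrt(torch.abs(fused) + 1e-8)  # power normalization
        return F.normalize(fused, dim=1)                                 # L2 normalization

mfb = MFB(img_dim=2048, txt_dim=1024)
out = mfb(torch.randn(4, 2048), torch.randn(4, 1024))
print(out.shape)   # (4, 1000)
```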

Web-based visualisation of head pose and facial expressions changes: monitoring human activity using depth data

Title Web-based visualisation of head pose and facial expressions changes: monitoring human activity using depth data
Authors Grigorios Kalliatakis, Nikolaos Vidakis, Georgios Triantafyllidis
Abstract Despite significant recent advances in the field of head pose estimation and facial expression recognition, raising the cognitive level when analysing human activity presents serious challenges to current concepts. Motivated by the need to generate comprehensible visual representations from different sets of data, we introduce a system capable of monitoring human activity through head pose and facial expression changes, utilising an affordable 3D sensing technology (Microsoft Kinect sensor). An approach built on discriminative random regression forests was selected in order to rapidly and accurately estimate head pose changes in unconstrained environments. In order to complete the secondary process of recognising four universal dominant facial expressions (happiness, anger, sadness and surprise), emotion recognition via facial expressions (ERFE) was adopted. After that, a lightweight data exchange format (JavaScript Object Notation, JSON) is employed in order to manipulate the data extracted from the two aforementioned settings. Such a mechanism can yield a platform for objective and effortless assessment of human activity within the context of serious gaming and human-computer interaction.
Tasks Emotion Recognition, Facial Expression Recognition, Head Pose Estimation, Pose Estimation
Published 2017-03-11
URL http://arxiv.org/abs/1703.03949v2
PDF http://arxiv.org/pdf/1703.03949v2.pdf
PWC https://paperswithcode.com/paper/web-based-visualisation-of-head-pose-and
Repo https://github.com/eric-erki/Visualising-Facial-Expression-Changes
Framework none
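As a rough illustration of the JSON data exchange between the sensing side and the web visualisation, the snippet below builds a per-frame payload with head-pose angles and the recognized expression. The field names and units are assumptions for illustration, not the paper's actual schema.

```python
# Illustrative per-frame JSON payload for head pose and expression monitoring.
import json

frame = {
    "timestamp": 1490000000.0,
    "head_pose": {"yaw": 12.4, "pitch": -3.1, "roll": 0.8},   # degrees
    "expression": "happiness",                                # one of the four dominant expressions
    "confidence": 0.87,
}
print(json.dumps(frame, indent=2))
```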

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives

Title VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Authors Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
Abstract We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and use of augmented data, yields significant gains in retrieval performance. We showcase our approach, VSE++, on MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (at R@1).
Tasks Cross-Modal Retrieval, Image Retrieval, Structured Prediction
Published 2017-07-18
URL http://arxiv.org/abs/1707.05612v4
PDF http://arxiv.org/pdf/1707.05612v4.pdf
PWC https://paperswithcode.com/paper/vse-improving-visual-semantic-embeddings-with
Repo https://github.com/Cadene/recipe1m.bootstrap.pytorch
Framework pytorch
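The "simple change to common loss functions" is the max-of-hinges ranking loss: instead of summing the hinge penalty over all negatives in a batch, only the hardest negative per positive pair is penalized. The PyTorch sketch below assumes pre-normalized embeddings; margin and dimensions are illustrative.

```python
# Sketch of the VSE++ max-of-hinges (hardest-negative) ranking loss.
import torch
import torch.nn.functional as F

def vsepp_loss(img_emb, cap_emb, margin=0.2):
    scores = img_emb @ cap_emb.t()                 # cosine similarities (embeddings pre-normalized)
    pos = scores.diag().view(-1, 1)
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    cost_cap = (margin + scores - pos).clamp(min=0).masked_fill(mask, 0)      # image -> caption negatives
    cost_img = (margin + scores - pos.t()).clamp(min=0).masked_fill(mask, 0)  # caption -> image negatives
    return cost_cap.max(dim=1)[0].mean() + cost_img.max(dim=0)[0].mean()      # hardest negatives only

img = F.normalize(torch.randn(8, 128), dim=1)
cap = F.normalize(torch.randn(8, 128), dim=1)
print(vsepp_loss(img, cap))
```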