January 25, 2020

3373 words 16 mins read

Paper Group ANR 1695

Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning. end-to-end training of a large vocabulary end-to-end speech recognition system. Low-rank Random Tensor for Bilinear Pooling. Analyzing Customer Feedback for Product Fit Prediction. Efficient Contextual Representation Learning Without Softmax Layer. Weakly-Sup …

Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning

Title Q-DATA: Enhanced Traffic Flow Monitoring in Software-Defined Networks applying Q-learning
Authors Trung V. Phan, Syed Tasnimul Islam, Tri Gia Nguyen, Thomas Bauschert
Abstract Software-Defined Networking (SDN) introduces centralized network control and management by separating the data plane from the control plane, which facilitates traffic flow monitoring, security analysis and policy formulation. However, it is challenging to choose a proper degree of traffic flow handling granularity while proactively protecting forwarding devices from getting overloaded. In this paper, we propose a novel traffic flow matching control framework called Q-DATA that applies reinforcement learning in order to enhance the traffic flow monitoring performance in SDN-based networks and prevent traffic forwarding performance degradation. We first describe and analyse an SDN-based traffic flow matching control system that applies a reinforcement learning approach based on the Q-learning algorithm in order to maximize the traffic flow granularity. It also considers the forwarding performance status of the SDN switches, derived from a Support Vector Machine-based algorithm. Next, we outline the Q-DATA framework, which incorporates the optimal traffic flow matching policy derived from the traffic flow matching control system to efficiently provide the most detailed traffic flow information that other mechanisms require. Our novel approach is realized as a REST SDN application and evaluated in an SDN environment. Through comprehensive experiments, the results show that, compared to the default behavior of common SDN controllers and to our previous DATA mechanism, the new Q-DATA framework yields a remarkable improvement in protecting SDN switches from traffic forwarding performance degradation while still providing the most detailed traffic flow information on demand.
Tasks Q-Learning
Published 2019-09-04
URL https://arxiv.org/abs/1909.01544v1
PDF https://arxiv.org/pdf/1909.01544v1.pdf
PWC https://paperswithcode.com/paper/q-data-enhanced-traffic-flow-monitoring-in
Repo
Framework
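
To make the Q-learning core of approaches like Q-DATA concrete, here is a minimal tabular sketch. The states, actions, and reward values below are hypothetical stand-ins; the paper's actual state space, action set, and reward design are not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical flow-matching actions; stand-ins for Q-DATA's actual
# state space, action set, and reward design.
ACTIONS = ["match_coarse", "match_medium", "match_fine"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

Q = defaultdict(float)  # Q[(state, action)] -> estimated return

def choose_action(state):
    """Epsilon-greedy policy over the flow-matching actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """Q-learning update: Q <- Q + alpha * (r + gamma * max_a' Q' - Q)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

state = "switch_ok"
action = choose_action(state)
# Invented reward: finer-grained matching pays more; an overloaded switch
# would be penalized in a fuller version.
reward = 1.0 if action == "match_fine" else 0.5
update(state, action, reward, next_state="switch_ok")
```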

end-to-end training of a large vocabulary end-to-end speech recognition system

Title end-to-end training of a large vocabulary end-to-end speech recognition system
Authors Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda
Abstract In this paper, we present an end-to-end training framework for building state-of-the-art end-to-end speech recognition systems. Our training system utilizes a cluster of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The entire pipeline of data reading, large-scale data augmentation, and neural network parameter updates is performed “on-the-fly”. We use vocal tract length perturbation [1] and an acoustic simulator [2] for data augmentation. The processed features and labels are sent to the GPU cluster. The Horovod allreduce approach is employed to train the neural network parameters. We evaluated the effectiveness of our system on the standard LibriSpeech corpus [3] and the 10,000-hr anonymized Bixby English dataset. Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on the test-clean subset of LibriSpeech after applying shallow fusion with a Transformer language model (LM). For the proprietary English Bixby open domain test set, we obtained a WER of 7.92% using a Bidirectional Full Attention (BFA) end-to-end model after applying shallow fusion with an RNN-LM. When the monotonic chunkwise attention (MoChA) based approach is employed for streaming speech recognition, we obtained a WER of 9.95% on the same Bixby open domain test set.
Tasks Data Augmentation, End-To-End Speech Recognition, Language Modelling, Speech Recognition
Published 2019-12-22
URL https://arxiv.org/abs/1912.11040v1
PDF https://arxiv.org/pdf/1912.11040v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-training-of-a-large-vocabulary-end
Repo
Framework
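
The Horovod allreduce recipe mentioned in the abstract follows a standard pattern: wrap the optimizer so gradients are averaged across workers, and broadcast initial weights from rank 0. A minimal PyTorch sketch under those assumptions; the stand-in linear model and synthetic batches are illustrative only, not the paper's attention-based architecture.

```python
import torch
import horovod.torch as hvd

hvd.init()  # one process per GPU; Horovod sets up the allreduce ring
torch.cuda.set_device(hvd.local_rank())

# Stand-in model and synthetic batches; not the paper's encoder-decoder.
model = torch.nn.Linear(80, 512).cuda()
loader = [(torch.randn(32, 80), torch.randn(32, 512)) for _ in range(10)]

# Common recipe: scale the learning rate by the number of workers.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers via allreduce.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())

# Start every worker from identical weights.
hvd.broadcast_parameters(model.state_dict(), root_rank=0)

for features, labels in loader:  # each worker would read its own data shard
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(features.cuda()), labels.cuda())
    loss.backward()  # gradient allreduce is triggered during backward
    optimizer.step()
```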

Low-rank Random Tensor for Bilinear Pooling

Title Low-rank Random Tensor for Bilinear Pooling
Authors Yan Zhang, Krikamol Muandet, Qianli Ma, Heiko Neumann, Siyu Tang
Abstract Bilinear pooling is capable of extracting high-order information from data, which makes it suitable for fine-grained visual understanding and information fusion. Despite their effectiveness in various applications, bilinear models with a massive number of parameters can easily suffer from the curse of dimensionality and intractable computation. In this paper, we propose a novel bilinear model based on low-rank random tensors. The key idea is to effectively combine low-rank tensor decomposition and random projection to reduce the number of parameters while preserving the model's representativeness. From the theoretical perspective, we prove that our bilinear model with random tensors can estimate feature maps of reproducing kernel Hilbert spaces (RKHSs) with compositional kernels, grounding high-dimensional feature fusion on a theoretical foundation. From the application perspective, our low-rank tensor operation is lightweight and can be integrated into standard neural network architectures to enable high-order information fusion. We perform extensive experiments to show that the use of our model leads to state-of-the-art performance on several challenging fine-grained action parsing benchmarks.
Tasks Action Parsing
Published 2019-06-03
URL https://arxiv.org/abs/1906.01004v1
PDF https://arxiv.org/pdf/1906.01004v1.pdf
PWC https://paperswithcode.com/paper/low-rank-random-tensor-for-bilinear-pooling
Repo
Framework
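
Classical low-rank bilinear pooling, the starting point that the paper combines with random projection, factorizes the bilinear form z = x^T W y into three thin matrices. A minimal PyTorch sketch of that factorized form; the dimensions are illustrative, and the paper's random-tensor construction is not reproduced.

```python
import torch
import torch.nn as nn

class LowRankBilinearPooling(nn.Module):
    """z = P(Ux * Vy): a rank-R factorization of bilinear pooling.

    A full bilinear map needs dx * dy * dz parameters; this factorized
    form needs only R * (dx + dy + dz).
    """
    def __init__(self, dx, dy, dz, rank):
        super().__init__()
        self.U = nn.Linear(dx, rank, bias=False)
        self.V = nn.Linear(dy, rank, bias=False)
        self.P = nn.Linear(rank, dz, bias=False)

    def forward(self, x, y):
        # The elementwise product of the two projections fuses x and y.
        return self.P(self.U(x) * self.V(y))

pool = LowRankBilinearPooling(dx=512, dy=512, dz=128, rank=32)
z = pool(torch.randn(4, 512), torch.randn(4, 512))  # shape (4, 128)
```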

Analyzing Customer Feedback for Product Fit Prediction

Title Analyzing Customer Feedback for Product Fit Prediction
Authors Stephan Baier
Abstract One of the biggest hurdles for customers when purchasing fashion online is the difficulty of finding products with the right fit. In order to provide a better online shopping experience, platforms need to find ways to recommend the right product sizes and the best-fitting products to their customers. These recommendation systems, however, require customer feedback in order to estimate the most suitable sizing options. Such feedback is rare and often only available as natural text. In this paper, we examine the extraction of product fit feedback from customer reviews using natural language processing techniques. In particular, we compare traditional methods with more recent transfer learning techniques for text classification and analyze their results. Our evaluation shows that the transfer learning approach ULMFiT is not only comparatively fast to train but also achieves the highest accuracy on this task. The integration of the extracted information with actual size recommendation systems is left for future work.
Tasks Recommendation Systems, Text Classification, Transfer Learning
Published 2019-08-28
URL https://arxiv.org/abs/1908.10896v1
PDF https://arxiv.org/pdf/1908.10896v1.pdf
PWC https://paperswithcode.com/paper/analyzing-customer-feedback-for-product-fit
Repo
Framework
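
As a rough illustration of the "traditional methods" baseline the paper compares against, a TF-IDF plus logistic regression classifier for fit feedback can be set up in a few lines with scikit-learn. The reviews and labels below are invented placeholders, not the paper's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy reviews with fit labels; a real labeled dataset is assumed.
reviews = ["runs small, had to size up", "fits perfectly, true to size",
           "way too large in the shoulders", "great fit and very comfortable"]
labels = ["small", "fit", "large", "fit"]

# TF-IDF unigrams/bigrams feed a linear classifier over fit categories.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, labels)

print(clf.predict(["a bit tight around the waist"]))
```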

Efficient Contextual Representation Learning Without Softmax Layer

Title Efficient Contextual Representation Learning Without Softmax Layer
Authors Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang
Abstract Contextual representation models have achieved great success in improving various downstream tasks. However, these language-model-based encoders are difficult to train due to their large parameter size and high computational complexity. By carefully examining the training procedure, we find that the softmax layer (the output layer) causes significant inefficiency due to the large vocabulary size. Therefore, we redesign the learning objective and propose an efficient framework for training contextual representation models. Specifically, the proposed approach bypasses the softmax layer by performing language modeling with dimension reduction, and allows the models to leverage pre-trained word embeddings. Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer, and performs language modeling without truncating the vocabulary. When applied to ELMo, our method achieves a 4x speedup and eliminates 80% of the trainable parameters while achieving competitive performance on downstream tasks.
Tasks Dimensionality Reduction, Language Modelling, Representation Learning, Word Embeddings
Published 2019-02-28
URL http://arxiv.org/abs/1902.11269v1
PDF http://arxiv.org/pdf/1902.11269v1.pdf
PWC https://paperswithcode.com/paper/efficient-contextual-representation-learning
Repo
Framework
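
The central idea, performing language modeling by regressing onto pre-trained word embeddings instead of computing a |V|-way softmax, can be sketched as follows. The embedding dimension, cosine-distance loss, and random placeholder embeddings are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

vocab_size, dim, hidden = 50000, 300, 512

# Frozen pre-trained word embeddings serve as regression targets
# (random here, purely for illustration).
targets = F.normalize(torch.randn(vocab_size, dim), dim=-1)

project = torch.nn.Linear(hidden, dim)  # small head replaces the softmax layer

def embedding_regression_loss(context_states, target_ids):
    """Pull the projected context vector toward the target word's embedding.

    Per-token cost is O(dim), instead of O(vocab_size) for a full softmax.
    """
    pred = F.normalize(project(context_states), dim=-1)
    return (1 - (pred * targets[target_ids]).sum(-1)).mean()  # cosine distance

loss = embedding_regression_loss(
    torch.randn(8, hidden), torch.randint(0, vocab_size, (8,)))
loss.backward()  # only the projection head receives gradients
```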

Weakly-Supervised Video Moment Retrieval via Semantic Completion Network

Title Weakly-Supervised Video Moment Retrieval via Semantic Completion Network
Authors Zhijie Lin, Zhou Zhao, Zhu Zhang, Qi Wang, Huasheng Liu
Abstract Video moment retrieval aims to search for the moment that is most relevant to a given natural language query. Existing methods are mostly trained in a fully-supervised setting, which requires full annotations of the temporal boundary for each query. However, manually labeling these annotations is time-consuming and expensive. In this paper, we propose a novel weakly-supervised moment retrieval framework requiring only coarse video-level annotations for training. Specifically, we devise a proposal generation module that aggregates context information to generate and score all candidate proposals in a single pass. We then devise an algorithm that considers both exploitation and exploration to select the top-K proposals. Next, we build a semantic completion module to measure the semantic similarity between the selected proposals and the query, compute a reward, and provide feedback to the proposal generation module for scoring refinement. Experiments on the ActivityCaptions and Charades-STA datasets demonstrate the effectiveness of our proposed method.
Tasks Semantic Similarity, Semantic Textual Similarity
Published 2019-11-19
URL https://arxiv.org/abs/1911.08199v3
PDF https://arxiv.org/pdf/1911.08199v3.pdf
PWC https://paperswithcode.com/paper/weakly-supervised-video-moment-retrieval-via
Repo
Framework
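
One simple reading of the exploitation/exploration step for proposal selection: take most of the top-K slots greedily by score and reserve a few for random picks from the remainder. A toy sketch under that interpretation; the paper's actual selection algorithm and scoring model are not reproduced.

```python
import random

def select_proposals(scored, k, explore_frac=0.3, seed=0):
    """Select top-K proposals, reserving some slots for exploration.

    `scored` is a list of (proposal, score) pairs. Exploitation takes the
    highest-scored proposals; exploration samples uniformly from the rest.
    """
    rng = random.Random(seed)
    ranked = sorted(scored, key=lambda p: p[1], reverse=True)
    n_explore = int(k * explore_frac)
    exploit, rest = ranked[:k - n_explore], ranked[k - n_explore:]
    return exploit + rng.sample(rest, min(n_explore, len(rest)))

# Proposals are (start, end) segments with random scores, for illustration.
rng = random.Random(1)
proposals = [((s, s + 10), rng.random()) for s in range(0, 100, 5)]
print(select_proposals(proposals, k=5))
```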

DLGNet: A Transformer-based Model for Dialogue Response Generation

Title DLGNet: A Transformer-based Model for Dialogue Response Generation
Authors Oluwatobi Olabiyi, Erik T. Mueller
Abstract Neural dialogue models, despite their successes, still suffer from a lack of relevance, diversity, and in many cases coherence in their generated responses. These issues can be attributed to reasons including (1) short-range model architectures that capture limited temporal dependencies, (2) limitations of the maximum likelihood training objective, (3) the concave entropy profile of dialogue datasets resulting in short and generic responses, and (4) the out-of-vocabulary problem leading to the generation of a large number of unknown (UNK) tokens. On the other hand, transformer-based models such as GPT-2 have demonstrated an excellent ability to capture long-range structure in language modeling tasks. In this paper, we present DLGNet, a transformer-based model for dialogue modeling. We specifically examine the use of DLGNet for multi-turn dialogue response generation. In our experiments, we evaluate DLGNet on the open-domain Movie Triples dataset and the closed-domain Ubuntu Dialogue dataset. DLGNet models, although trained with only the maximum likelihood objective, achieve significant improvements over state-of-the-art multi-turn dialogue models. They also produce the best performance to date on the two datasets based on several metrics, including BLEU, ROUGE, and distinct n-gram. Our analysis shows that the performance improvement is mostly due to the combination of (1) the long-range transformer architecture and (2) the injection of random informative paddings. Other contributing factors include the joint modeling of dialogue context and response, and the 100% tokenization coverage from byte pair encoding (BPE).
Tasks Language Modelling, Tokenization
Published 2019-07-26
URL https://arxiv.org/abs/1908.01841v2
PDF https://arxiv.org/pdf/1908.01841v2.pdf
PWC https://paperswithcode.com/paper/multi-turn-dialogue-response-generation-with
Repo
Framework
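
DLGNet builds on GPT-2-style joint autoregressive modeling of dialogue context and response over a BPE vocabulary. A minimal sketch of that generic setup with the Hugging Face transformers library, using the stock GPT-2 checkpoint rather than DLGNet's weights, and omitting its random informative paddings.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Context and response live in one token sequence, so the model learns
# p(response | context) autoregressively; BPE gives 100% token coverage.
context = "A: How was the movie? B: Pretty good, you should see it. A:"
inputs = tokenizer(context, return_tensors="pt")

output = model.generate(
    **inputs, max_new_tokens=20, do_sample=True, top_p=0.9,
    pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))
```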

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?

Title Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
Authors Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke
Abstract Learning algorithms are becoming more powerful, often at the cost of increased complexity. In response, the demand for algorithms to be transparent is growing. In NLP tasks, attention distributions learned by attention-based deep learning models are used to gain insight into a model's behavior. To what extent is this perspective valid for all NLP tasks? We investigate whether distributions calculated by different attention heads in a transformer architecture can be used to improve transparency in the task of abstractive summarization. To this end, we present both a qualitative and a quantitative analysis of the behavior of the attention heads. We show that some attention heads indeed specialize towards syntactically and semantically distinct input. We propose an approach to evaluate the extent to which the Transformer model relies on specifically learned attention distributions. We also discuss what this implies for using attention distributions as a means of transparency.
Tasks Abstractive Text Summarization
Published 2019-07-01
URL https://arxiv.org/abs/1907.00570v2
PDF https://arxiv.org/pdf/1907.00570v2.pdf
PWC https://paperswithcode.com/paper/do-transformer-attention-heads-provide
Repo
Framework
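
Per-head attention distributions like those the paper analyzes can be pulled out of any transformer that exposes attentions. A sketch using BERT via the transformers library, with attention entropy as one simple per-head summary statistic; note the paper studies a summarization Transformer and uses its own, more involved metrics.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    # Tuple of per-layer tensors, each (batch, heads, tokens, tokens).
    attentions = model(**inputs).attentions

# Entropy of each head's attention, averaged over query positions:
# low entropy = the head focuses sharply, high = it spreads attention out.
for layer, att in enumerate(attentions):
    entropy = -(att * (att + 1e-9).log()).sum(-1).mean(dim=(0, 2))
    print(f"layer {layer:2d}:", [round(h.item(), 2) for h in entropy])
```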

Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure from Motion

Title Jumping Manifolds: Geometry Aware Dense Non-Rigid Structure from Motion
Authors Suryansh Kumar
Abstract Given dense image feature correspondences of a non-rigidly moving object across multiple frames, this paper proposes an algorithm to estimate its 3D shape for each frame. To solve this problem accurately, the recent state-of-the-art algorithm reduces the task to a set of local linear subspace reconstruction and clustering problems using a Grassmann manifold representation \cite{kumar2018scalable}. Unfortunately, that method overlooks some critical issues in the modeling of surface deformations, e.g., the dependence of a local surface deformation on its neighbors. Furthermore, its way of grouping high-dimensional data points inevitably inherits the drawbacks of categorizing samples on a high-dimensional Grassmann manifold \cite{huang2015projection, harandi2014manifold}. Hence, to deal with these limitations of \cite{kumar2018scalable}, we propose an algorithm that jointly exploits the benefit of the high-dimensional Grassmann manifold to perform reconstruction, and of its equivalent lower-dimensional representation to infer suitable clusters. To accomplish this, we project each Grassmannian onto a lower-dimensional Grassmann manifold which preserves and respects the deformation of the structure w.r.t. its neighbors. These Grassmann points in the lower dimension then act as representatives for the selection of high-dimensional Grassmann samples to perform each local reconstruction. In practice, our algorithm provides a geometrically efficient way to solve dense NRSfM by switching between manifolds based on their benefit and usage. Experimental results show that the proposed algorithm is very effective in handling noise, with reconstruction accuracy as good as or better than the competing methods.
Tasks
Published 2019-02-04
URL http://arxiv.org/abs/1902.01077v3
PDF http://arxiv.org/pdf/1902.01077v3.pdf
PWC https://paperswithcode.com/paper/jumping-manifolds-geometry-aware-dense-non
Repo
Framework
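
As background for the manifold machinery above: a point on a Grassmann manifold is a subspace, commonly represented by an orthonormal basis, and the projection metric is a standard way to compare two such points. A minimal NumPy sketch of this representation; the paper's projection onto a lower-dimensional Grassmannian is considerably more involved.

```python
import numpy as np

def grassmann_point(X, p):
    """Orthonormal basis for the span of X's top-p left singular vectors,
    i.e. a point on the Grassmann manifold Gr(p, n)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :p]

def projection_distance(U1, U2):
    """Projection metric: ||U1 U1^T - U2 U2^T||_F / sqrt(2)."""
    return np.linalg.norm(U1 @ U1.T - U2 @ U2.T, "fro") / np.sqrt(2)

rng = np.random.default_rng(0)
A = grassmann_point(rng.standard_normal((20, 6)), p=3)  # local subspace 1
B = grassmann_point(rng.standard_normal((20, 6)), p=3)  # local subspace 2
print(projection_distance(A, B))  # small distance = similar subspaces
```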

Style-aware Neural Model with Application in Authorship Attribution

Title Style-aware Neural Model with Application in Authorship Attribution
Authors Fereshteh Jafariakinabad, Kien A. Hua
Abstract Writing style is a combination of consistent decisions associated with a specific author at different levels of language production, including the lexical, syntactic, and structural levels. In this paper, we introduce a style-aware neural model that encodes document information from three stylistic levels, and we evaluate it in the domain of authorship attribution. First, we propose a simple way to jointly encode syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences that contribute more to capturing the writing style. Our experimental results, based on four benchmark datasets, reveal the benefits of encoding document information from all three stylistic levels when compared to baseline methods in the literature.
Tasks
Published 2019-09-12
URL https://arxiv.org/abs/1909.06194v1
PDF https://arxiv.org/pdf/1909.06194v1.pdf
PWC https://paperswithcode.com/paper/style-aware-neural-model-with-application-in
Repo
Framework

Document Network Embedding: Coping for Missing Content and Missing Links

Title Document Network Embedding: Coping for Missing Content and Missing Links
Authors Jean Dupuy, Adrien Guille, Julien Jacques
Abstract Searching through networks of documents is an important task. A promising path to improve the performance of information retrieval systems in this context is to leverage dense node and content representations learned with embedding techniques. However, these techniques cannot learn representations for documents that are either isolated or whose content is missing. To tackle this issue, assuming that the topology of the network and the content of the documents correlate, we propose to estimate the missing node representations from the available content representations, and conversely. Inspired by recent advances in machine translation, we detail in this paper how to learn a linear transformation from a set of aligned content and node representations. The projection matrix is efficiently calculated in terms of the singular value decomposition. The usefulness of the proposed method is highlighted by the improved ability to predict the neighborhood of nodes whose links are unobserved based on the projected content representations, and to retrieve similar documents when content is missing, based on the projected node representations.
Tasks Information Retrieval, Machine Translation, Network Embedding
Published 2019-12-06
URL https://arxiv.org/abs/1912.03048v1
PDF https://arxiv.org/pdf/1912.03048v1.pdf
PWC https://paperswithcode.com/paper/document-network-embedding-coping-for-missing
Repo
Framework
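
Given the machine-translation inspiration the abstract cites, the SVD-based linear transform is plausibly the orthogonal Procrustes solution used for cross-lingual embedding alignment; that assumption underlies the following NumPy sketch with synthetic aligned representations.

```python
import numpy as np

def procrustes_map(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F over aligned rows of X and Y.

    Closed form via the SVD of X^T Y = U S V^T, giving W = U V^T.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
content = rng.standard_normal((1000, 128))            # content embeddings
true_W, _ = np.linalg.qr(rng.standard_normal((128, 128)))
nodes = content @ true_W + 0.01 * rng.standard_normal((1000, 128))

W = procrustes_map(content, nodes)
# Estimate node embeddings for documents whose links are unobserved:
relative_error = (np.linalg.norm(content[:5] @ W - nodes[:5])
                  / np.linalg.norm(nodes[:5]))
print(relative_error)  # close to 0 when the transform is recoverable
```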

BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network

Title BHIN2vec: Balancing the Type of Relation in Heterogeneous Information Network
Authors Seonghyeon Lee, Chanyoung Park, Hwanjo Yu
Abstract The goal of network embedding is to transform the nodes of a network into low-dimensional embedding vectors. Recently, heterogeneous networks have been shown to be effective in representing diverse information in data. However, heterogeneous network embedding suffers from an imbalance issue, i.e., the sizes of the relation types (the number of edges of each type in the network) are imbalanced. In this paper, we devise a new heterogeneous network embedding method, called BHIN2vec, which considers the balance among all relation types in a network. We view heterogeneous network embedding as simultaneously solving multiple tasks, where each task corresponds to one relation type in the network. After splitting the skip-gram loss into multiple losses corresponding to the different tasks, we propose a novel random-walk strategy that focuses on the tasks with high loss values by considering the relative training ratio. Unlike previous random-walk strategies, our strategy generates training samples according to the relative training ratio among the different tasks, which results in balanced training of the node embeddings. Our extensive experiments on node classification and recommendation demonstrate the superiority of BHIN2vec over state-of-the-art methods. Also, based on the relative training ratio, we analyze how much each relation type is represented in the embedding space.
Tasks Network Embedding, Node Classification
Published 2019-11-26
URL https://arxiv.org/abs/1912.08925v1
PDF https://arxiv.org/pdf/1912.08925v1.pdf
PWC https://paperswithcode.com/paper/bhin2vec-balancing-the-type-of-relation-in
Repo
Framework
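
The type-biased random walk at the heart of BHIN2vec can be sketched as choosing the next edge type in proportion to per-type weights. In the toy version below the weights are fixed by hand; the paper derives them from the relative training ratio of the per-relation skip-gram losses.

```python
import random

# Toy heterogeneous graph: adjacency lists keyed by (node, edge_type).
graph = {
    ("a1", "writes"): ["p1", "p2"], ("a1", "cites"): ["p3"],
    ("p1", "writes"): ["a1"], ("p1", "cites"): ["p3"],
    ("p2", "writes"): ["a1"], ("p3", "cites"): ["p1"],
}

def biased_walk(start, length, type_weights, rng):
    """Walk the graph, picking the next edge type in proportion to
    type_weights (e.g. the relative training loss of each relation)."""
    walk, node = [start], start
    for _ in range(length):
        types = [t for (n, t) in graph if n == node]
        if not types:
            break
        t = rng.choices(types, weights=[type_weights[t] for t in types])[0]
        node = rng.choice(graph[(node, t)])
        walk.append(node)
    return walk

# "cites" gets a higher weight here, as if it were currently under-trained.
print(biased_walk("a1", 5, {"writes": 0.3, "cites": 0.7}, random.Random(0)))
```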

Definition Frames: Using Definitions for Hybrid Concept Representations

Title Definition Frames: Using Definitions for Hybrid Concept Representations
Authors Evangelia Spiliopoulou, Eduard Hovy
Abstract Concept representation is a particularly active area in NLP. Although recent advances in distributional semantics have shown tremendous improvements in performance, they still lack semantic interpretability. In this paper, we introduce a novel hybrid representation called Definition Frames, which is extracted from definitions under the formulation of domain-transfer Relation Extraction. Definition Frames are easily reformulated into a matrix representation where each row is semantically meaningful. This results in a fluid representation where we can prune dimensions according to the type of information we want to retain for any specific task. Our results show that Definition Frames (1) maintain the significant semantic information of the original definition (human evaluation) and (2) have competitive performance with other distributional semantic approaches on word similarity tasks. Furthermore, our experiments show substantial improvements over word embeddings when fine-tuned to a task, even when using only a linear transform.
Tasks Relation Extraction, Word Embeddings
Published 2019-09-10
URL https://arxiv.org/abs/1909.04793v1
PDF https://arxiv.org/pdf/1909.04793v1.pdf
PWC https://paperswithcode.com/paper/definition-frames-using-definitions-for
Repo
Framework

Extract and Merge: Merging extracted humans from different images utilizing Mask R-CNN

Title Extract and Merge: Merging extracted humans from different images utilizing Mask R-CNN
Authors Asati Minkesh, Kraisittipong Worranitta, Miyachi Taizo
Abstract Selecting human objects from among the various types of objects in images and merging them with other scenes is manual, day-to-day work for photo editors. Although Adobe Photoshop recently released a “Select Subject” tool that automatically selects the foreground object in an image, it still requires separate fine manual tweaking. In this work, we propose an application utilizing Mask R-CNN (for object detection and mask segmentation) that can extract human instances from multiple images and merge them with a new background. The application adds no overhead to Mask R-CNN, running at 5 frames per second. It can extract human instances from any number of images or videos and merge them together. We also structured the code to accept videos of different lengths as input; the length of the output video equals that of the longest input video. We wanted to create a simple yet effective application that can serve as a base for photo editing and do the most time-consuming work automatically, so that editors can focus more on the design part. Another application could be to group people from different images, who could not physically be together, into a single picture with a new background. We show single-person and multi-person extraction and placement onto two different backgrounds, as well as a video example with single-person extraction.
Tasks Object Detection
Published 2019-08-01
URL https://arxiv.org/abs/1908.00398v1
PDF https://arxiv.org/pdf/1908.00398v1.pdf
PWC https://paperswithcode.com/paper/extract-and-merge-merging-extracted-humans
Repo
Framework
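
Once Mask R-CNN has produced per-instance masks, the extract-and-merge step is plain compositing. A sketch with torchvision's pre-trained Mask R-CNN under assumed score and mask thresholds; the compositing logic is illustrative, not the paper's exact pipeline.

```python
import numpy as np
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Newer torchvision versions take weights="DEFAULT" instead of pretrained=True.
model = maskrcnn_resnet50_fpn(pretrained=True).eval()
PERSON = 1  # COCO category id for "person"

def extract_people(image, score_thresh=0.7, mask_thresh=0.5):
    """Boolean (H, W) mask covering all detected person instances."""
    tensor = torch.from_numpy(image).permute(2, 0, 1).float() / 255
    with torch.no_grad():
        out = model([tensor])[0]
    keep = (out["labels"] == PERSON) & (out["scores"] > score_thresh)
    masks = out["masks"][keep, 0] > mask_thresh  # (N, H, W) booleans
    return masks.any(dim=0).numpy()

def merge(foreground, background):
    """Paste extracted person pixels onto a new same-sized background."""
    mask = extract_people(foreground)
    out = background.copy()
    out[mask] = foreground[mask]
    return out

# Usage with same-sized uint8 RGB arrays (e.g. loaded via PIL or imageio):
fg = np.zeros((480, 640, 3), dtype=np.uint8)
bg = np.full((480, 640, 3), 200, dtype=np.uint8)
composite = merge(fg, bg)
```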

Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras

Title Semantic Estimation of 3D Body Shape and Pose using Minimal Cameras
Authors Andrew Gilbert, Matthew Trumble, Adrian Hilton, John Collomosse
Abstract We present an approach to accurately estimate high-fidelity markerless 3D pose and a volumetric reconstruction of human performance using only a small set of camera views ($\sim 2$). Our method utilises a dual loss in a generative adversarial network that yields improved performance in both reconstruction and pose estimation error. We use a deep prior implicitly learnt by the network, trained over a dataset of view-ablated multi-view video footage of a wide range of subjects and actions. Uniquely, we use a multi-channel symmetric 3D convolutional encoder-decoder with a dual loss to enforce the learning of a latent embedding that encodes skeletal joint positions and a deep volumetric reconstruction of the performer. An extensive evaluation is performed, with state-of-the-art performance reported on three datasets: Human3.6M, TotalCapture and TotalCaptureOutdoor. The method opens up the possibility of high-end volumetric and pose performance capture in on-set and prosumer scenarios where time or cost prohibit a high witness-camera count.
Tasks 3D Human Pose Estimation
Published 2019-08-08
URL https://arxiv.org/abs/1908.03030v1
PDF https://arxiv.org/pdf/1908.03030v1.pdf
PWC https://paperswithcode.com/paper/semantic-estimation-of-3d-body-shape-and-pose
Repo
Framework