February 1, 2020

Paper Group AWR 106

Tracking without bells and whistles. Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning. IoU-uniform R-CNN: Breaking Through the Limitations of RPN. MoGA: Searching Beyond MobileNetV3. Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders. BERT has a Mouth, and It Must Speak: BERT as a Mark …

Tracking without bells and whistles


Title	Tracking without bells and whistles
Authors	Philipp Bergmann, Tim Meinhardt, Laura Leal-Taixe
Abstract	The problem of tracking multiple objects in a video sequence poses several challenging tasks. For tracking-by-detection, these include object re-identification, motion prediction and dealing with occlusions. We present a tracker (without bells and whistles) that accomplishes tracking without specifically targeting any of these tasks; in particular, we perform no training or optimization on tracking data. To this end, we exploit the bounding box regression of an object detector to predict the position of an object in the next frame, thereby converting a detector into a Tracktor. We demonstrate the potential of Tracktor and provide a new state-of-the-art on three multi-object tracking benchmarks by extending it with a straightforward re-identification and camera motion compensation. We then perform an analysis on the performance and failure cases of several state-of-the-art tracking methods in comparison to our Tracktor. Surprisingly, none of the dedicated tracking methods are considerably better in dealing with complex tracking scenarios, namely, small and occluded objects or missing detections. However, our approach tackles most of the easy tracking scenarios. Therefore, we motivate our approach as a new tracking paradigm and point out promising future research directions. Overall, Tracktor yields better tracking performance than any current tracking method and our analysis exposes remaining and unsolved tracking challenges to inspire future research directions.
Tasks	Motion Compensation, motion prediction, Multi-Object Tracking, Object Tracking
Published	2019-03-13
URL	https://arxiv.org/abs/1903.05625v3
PDF	https://arxiv.org/pdf/1903.05625v3.pdf
PWC	https://paperswithcode.com/paper/tracking-without-bells-and-whistles
Repo	https://github.com/zhanxinrui/tracking_wo_bnw_fork
Framework	pytorch
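
A minimal sketch of the idea in the abstract above: boxes from the previous frame are passed through the detector's bounding box regression to obtain their positions in the next frame. The `regress` callable, the two thresholds, and the rule for starting new tracks from uncovered detections are assumptions made for illustration, not the released implementation.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def tracktor_step(tracks, frame, regress, detections, next_id,
                  score_thresh=0.5, iou_thresh=0.5):
    """Advance all tracks by one frame.

    tracks: dict mapping track id -> box from the previous frame.
    regress: hypothetical callable (frame, box) -> (refined_box, score),
             standing in for the detector's regression and classification heads.
    detections: boxes detected in the new frame.
    next_id: first unused track id.
    """
    updated = {}
    for track_id, box in tracks.items():
        new_box, score = regress(frame, box)
        # A low score means the object is no longer found; the track ends.
        if score >= score_thresh:
            updated[track_id] = new_box
    for det in detections:
        # Start a new track only for detections no active track already covers.
        if all(iou(det, box) < iou_thresh for box in updated.values()):
            updated[next_id] = det
            next_id += 1
    return updated, next_id
```

Passing `next_id` in and out keeps identities unique across frames even after the highest-numbered track has ended.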

Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning


Title	Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning
Authors	Mengge Xue, Weiming Cai, Jinsong Su, Linfeng Song, Yubin Ge, Yubao Liu, Bin Wang
Abstract	Benefiting from the excellent ability of neural networks on learning semantic representations, existing studies for entity linking (EL) have resorted to neural networks to exploit both the local mention-to-entity compatibility and the global interdependence between different EL decisions for target entity disambiguation. However, most neural collective EL methods depend entirely upon neural networks to automatically model the semantic dependencies between different EL decisions, which lack guidance from external knowledge. In this paper, we propose a novel end-to-end neural network with recurrent random-walk layers for collective EL, which introduces external knowledge to model the semantic interdependence between different EL decisions. Specifically, we first establish a model based on local context features, and then stack random-walk layers to reinforce the evidence for related EL decisions into high-probability decisions, where the semantic interdependence between candidate entities is mainly induced from an external knowledge base. Finally, a semantic regularizer that preserves the consistency of collective EL decisions is incorporated into the conventional objective function, so that the external knowledge base can be fully exploited in collective EL decisions. Experimental results and in-depth analysis on various datasets show that our model achieves better performance than other state-of-the-art models. Our code and data are released at https://github.com/DeepLearnXMU/RRWEL.
Tasks	Entity Disambiguation, Entity Linking, Learning Semantic Representations
Published	2019-06-20
URL	https://arxiv.org/abs/1906.09320v1
PDF	https://arxiv.org/pdf/1906.09320v1.pdf
PWC	https://paperswithcode.com/paper/neural-collective-entity-linking-based-on
Repo	https://github.com/DeepLearnXMU/RRWEL
Framework	pytorch
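
The stacking of random-walk layers described in the abstract above can be illustrated with a small sketch. It assumes a row-stochastic `transition` matrix of relatedness between candidate entities (in the paper this is induced from a knowledge base) and a `restart` weight that mixes the local scores back in at every layer; both names and the mixing rule are illustrative assumptions, not the authors' exact formulation.

```python
def random_walk_layers(local_scores, transition, num_layers=3, restart=0.5):
    """Propagate candidate scores along entity-relatedness edges.

    local_scores: scores from the local context model, one per candidate.
    transition: row-stochastic matrix; transition[i][j] is the probability
                of stepping from candidate i to candidate j.
    restart: weight of the original local scores at each layer (assumed).
    """
    n = len(local_scores)
    scores = list(local_scores)
    for _ in range(num_layers):
        # One random-walk step: mass flows from each candidate to its neighbors.
        walked = [sum(scores[i] * transition[i][j] for i in range(n))
                  for j in range(n)]
        # Blend the propagated evidence with the local evidence.
        scores = [restart * local_scores[j] + (1 - restart) * walked[j]
                  for j in range(n)]
    return scores


# Candidates 0 and 1 are strongly related; candidate 2 is isolated.
transition = [[0.1, 0.9, 0.0],
              [0.9, 0.1, 0.0],
              [0.0, 0.0, 1.0]]
print(random_walk_layers([0.4, 0.3, 0.3], transition))
```

In this toy case candidate 1 ends with a higher score than the isolated candidate 2, although both start from the same local score, because it receives evidence from its related neighbor.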

IoU-uniform R-CNN: Breaking Through the Limitations of RPN


Title	IoU-uniform R-CNN: Breaking Through the Limitations of RPN
Authors	Li Zhu, Zihao Xie, Liman Liu, Bo Tao, Wenbing Tao
Abstract	Region Proposal Network (RPN) is the cornerstone of two-stage object detectors; it generates a sparse set of object proposals and alleviates the extreme foreground-background class imbalance problem during training. However, we find that the potential of the detector has not been fully exploited due to the IoU distribution imbalance and inadequate quantity of the training samples generated by RPN. With the increasing intersection over union (IoU), the exponentially smaller numbers of positive samples would lead to the distribution skewed towards lower IoUs, which hinders the optimization of detector at high IoU levels. In this paper, to break through the limitations of RPN, we propose IoU-Uniform R-CNN, a simple but effective method that directly generates training samples with uniform IoU distribution for the regression branch as well as the IoU prediction branch. Besides, we improve the performance of IoU prediction branch by eliminating the feature offsets of RoIs at inference, which helps the NMS procedure by preserving accurately localized bounding boxes. Extensive experiments on the PASCAL VOC and MS COCO datasets show the effectiveness of our method, as well as its compatibility and adaptivity to many object detection architectures. The code is made publicly available at https://github.com/zl1994/IoU-Uniform-R-CNN.
Tasks	Object Detection
Published	2019-12-11
URL	https://arxiv.org/abs/1912.05190v1
PDF	https://arxiv.org/pdf/1912.05190v1.pdf
PWC	https://paperswithcode.com/paper/iou-uniform-r-cnn-breaking-through-the
Repo	https://github.com/zl1994/IoU-Uniform-R-CNN
Framework	pytorch
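
To make "training samples with uniform IoU distribution" concrete, here is a rough sketch that jitters a ground-truth box and keeps an equal number of samples per IoU bin. Rejection sampling, the 30% jitter range, and the bin layout are assumptions chosen for brevity; the paper's own generation procedure may differ.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def sample_uniform_iou(gt, per_bin=5, bins=5, lo=0.5, hi=1.0,
                       max_tries=100000, seed=0):
    """Return `bins` lists of boxes, each holding up to `per_bin` boxes
    whose IoU with `gt` falls in that bin of [lo, hi)."""
    rng = random.Random(seed)
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    width = (hi - lo) / bins
    buckets = [[] for _ in range(bins)]
    for _ in range(max_tries):
        if all(len(b) >= per_bin for b in buckets):
            break
        # Assumed jitter: move each edge by at most 30% of the box size,
        # which always leaves a box with positive width and height.
        box = (gt[0] + rng.uniform(-0.3, 0.3) * w,
               gt[1] + rng.uniform(-0.3, 0.3) * h,
               gt[2] + rng.uniform(-0.3, 0.3) * w,
               gt[3] + rng.uniform(-0.3, 0.3) * h)
        value = iou(gt, box)
        if lo <= value < hi:
            k = min(int((value - lo) / width), bins - 1)
            # Discard samples for bins that are already full, so low-IoU
            # boxes cannot dominate the training set.
            if len(buckets[k]) < per_bin:
                buckets[k].append(box)
    return buckets


samples = sample_uniform_iou((0.0, 0.0, 100.0, 100.0))
print([len(b) for b in samples])
```

High-IoU boxes are rare under random jitter, which is the imbalance the abstract describes; capping every bin at the same size is what flattens the distribution.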

MoGA: Searching Beyond MobileNetV3


Title	MoGA: Searching Beyond MobileNetV3
Authors	Xiangxiang Chu, Bo Zhang, Ruijun Xu
Abstract	The evolution of MobileNets has laid a solid foundation for neural network applications on mobile end. With the latest MobileNetV3, neural architecture search again claimed its supremacy in network design. Unfortunately, till today all mobile methods mainly focus on CPU latencies instead of GPU, the latter, however, is much preferred in practice for it has faster speed, lower overhead and less interference. Bearing the target hardware in mind, we propose the first Mobile GPU-Aware (MoGA) neural architecture search in order to be precisely tailored for real-world applications. Further, the ultimate objective to devise a mobile network lies in achieving better performance by maximizing the utilization of bounded resources. Urging higher capability while restraining time consumption is not reconcilable. We alleviate the tension by weighted evolution techniques. Moreover, we encourage increasing the number of parameters for higher representational power. With 200x fewer GPU days than MnasNet, we obtain a series of models that outperform MobileNetV3 under similar latency constraints, i.e., MoGA-A achieves 75.9% top-1 accuracy on ImageNet, MoGA-B meets 75.5% which costs only 0.5 ms more on mobile GPU. MoGA-C best attests GPU-awareness by reaching 75.3% and being slower on CPU but faster on GPU. The models and test code are available at https://github.com/xiaomi-automl/MoGA.
Tasks	AutoML, Image Classification, Neural Architecture Search
Published	2019-08-04
URL	https://arxiv.org/abs/1908.01314v4
PDF	https://arxiv.org/pdf/1908.01314v4.pdf
PWC	https://paperswithcode.com/paper/moga-searching-beyond-mobilenetv3
Repo	https://github.com/xiaomi-automl/MoGA
Framework	pytorch

Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders


Title	Unsupervised Latent Tree Induction with Deep Inside-Outside Recursive Autoencoders
Authors	Andrew Drozdov, Pat Verga, Mohit Yadav, Mohit Iyyer, Andrew McCallum
Abstract	We introduce deep inside-outside recursive autoencoders (DIORA), a fully-unsupervised method for discovering syntax that simultaneously learns representations for constituents within the induced tree. Our approach predicts each word in an input sentence conditioned on the rest of the sentence and uses inside-outside dynamic programming to consider all possible binary trees over the sentence. At test time the CKY algorithm extracts the highest scoring parse. DIORA achieves a new state-of-the-art F1 in unsupervised binary constituency parsing (unlabeled) in two benchmark datasets, WSJ and MultiNLI.
Tasks	Constituency Parsing
Published	2019-04-03
URL	http://arxiv.org/abs/1904.02142v2
PDF	http://arxiv.org/pdf/1904.02142v2.pdf
PWC	https://paperswithcode.com/paper/unsupervised-latent-tree-induction-with-deep
Repo	https://github.com/iesl/diora
Framework	pytorch
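
The test-time step mentioned in the abstract above, where CKY extracts the highest-scoring binary parse, can be sketched as follows. The `span_score` callable is a hypothetical stand-in for scores that DIORA would compute from its learned representations.

```python
def cky_best_tree(words, span_score):
    """Return (score, tree) for the best binary tree over `words`.

    span_score: hypothetical callable (i, j) -> float, the score of
                treating words[i:j] as a constituent.
    A tree is either a word or a (left, right) pair of trees.
    """
    n = len(words)
    best = {}  # (i, j) -> (score, tree) for the span words[i:j]
    for i in range(n):
        best[(i, i + 1)] = (span_score(i, i + 1), words[i])
    # Fill the chart from short spans to long ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            candidates = []
            for k in range(i + 1, j):  # every split point of the span
                left_score, left_tree = best[(i, k)]
                right_score, right_tree = best[(k, j)]
                candidates.append((left_score + right_score,
                                   left_tree, right_tree))
            score, left_tree, right_tree = max(candidates, key=lambda c: c[0])
            best[(i, j)] = (score + span_score(i, j), (left_tree, right_tree))
    return best[(0, n)]


# Toy scores that reward grouping the first two words together.
score, tree = cky_best_tree(["the", "cat", "sat"],
                            lambda i, j: 1.0 if (i, j) == (0, 2) else 0.0)
print(score, tree)
```

The chart visits every span and split point once, so the search over all binary trees takes cubic time in the sentence length instead of enumerating trees explicitly.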

BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model


Title	BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Authors	Alex Wang, Kyunghyun Cho
Abstract	We show that BERT (Devlin et al., 2018) is a Markov random field language model. This formulation gives way to a natural procedure to sample sentences from BERT. We generate from BERT and find that it can produce high-quality, fluent generations. Compared to the generations of a traditional left-to-right language model, BERT generates sentences that are more diverse but of slightly worse quality.
Tasks	Language Modelling
Published	2019-02-11
URL	http://arxiv.org/abs/1902.04094v2
PDF	http://arxiv.org/pdf/1902.04094v2.pdf
PWC	https://paperswithcode.com/paper/bert-has-a-mouth-and-it-must-speak-bert-as-a
Repo	https://github.com/nyu-dl/bert-gen
Framework	pytorch
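
One way to read "a natural procedure to sample sentences" from a masked language model is Gibbs-style resampling: mask one position at a time and redraw it from the model's conditional distribution. The sketch below assumes a hypothetical `conditional(tokens, pos)` callable in place of a real BERT forward pass, and a fixed left-to-right sweep order; both are illustrative choices rather than the authors' exact procedure.

```python
import random


def gibbs_sample(length, conditional, sweeps=10, mask="[MASK]", seed=0):
    """Sample a token sequence by repeatedly resampling single positions.

    conditional: hypothetical callable (tokens, pos) -> dict mapping each
                 candidate token to its probability at `pos`, given the
                 other tokens. A masked language model would supply this.
    """
    rng = random.Random(seed)
    tokens = [mask] * length  # start from an all-mask sequence
    for _ in range(sweeps):
        for pos in range(length):
            tokens[pos] = mask  # hide the current token before resampling
            dist = conditional(tokens, pos)
            words = list(dist.keys())
            weights = list(dist.values())
            tokens[pos] = rng.choices(words, weights=weights, k=1)[0]
    return tokens


# Toy conditional: prefers "b" right after an "a", otherwise prefers "a".
def toy_conditional(tokens, pos):
    if pos > 0 and tokens[pos - 1] == "a":
        return {"a": 0.1, "b": 0.9}
    return {"a": 0.9, "b": 0.1}


print(gibbs_sample(6, toy_conditional))
```

Because each draw conditions on both left and right context, this differs from left-to-right generation, where every token is drawn once and never revisited.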

StructureFlow: Image Inpainting via Structure-aware Appearance Flow


Title	StructureFlow: Image Inpainting via Structure-aware Appearance Flow
Authors	Yurui Ren, Xiaoming Yu, Ruonan Zhang, Thomas H. Li, Shan Liu, Ge Li
Abstract	Image inpainting techniques have shown significant improvements by using deep neural networks recently. However, most of them may either fail to reconstruct reasonable structures or restore fine-grained textures. In order to solve this problem, in this paper, we propose a two-stage model which splits the inpainting task into two parts: structure reconstruction and texture generation. In the first stage, edge-preserved smooth images are employed to train a structure reconstructor which completes the missing structures of the inputs. In the second stage, based on the reconstructed structures, a texture generator using appearance flow is designed to yield image details. Experiments on multiple publicly available datasets show the superior performance of the proposed network.
Tasks	Image Inpainting, Texture Synthesis
Published	2019-08-11
URL	https://arxiv.org/abs/1908.03852v1
PDF	https://arxiv.org/pdf/1908.03852v1.pdf
PWC	https://paperswithcode.com/paper/structureflow-image-inpainting-via-structure
Repo	https://github.com/RenYurui/StructureFlow
Framework	pytorch

Deep Social Collaborative Filtering


Title	Deep Social Collaborative Filtering
Authors	Wenqi Fan, Yao Ma, Dawei Yin, Jianping Wang, Jiliang Tang, Qing Li
Abstract	Recommender systems are crucial to alleviate the information overload problem in online worlds. Most of the modern recommender systems capture users’ preference towards items via their interactions based on collaborative filtering techniques. In addition to the user-item interactions, social networks can also provide useful information to understand users’ preference as suggested by the social theories such as homophily and influence. Recently, deep neural networks have been utilized for social recommendations, which facilitate both the user-item interactions and the social network information. However, most of these models cannot take full advantage of the social network information. They only use information from direct neighbors, but distant neighbors can also provide helpful information. Meanwhile, most of these models treat neighbors’ information equally without considering the specific recommendations. However, for a specific recommendation case, the information relevant to the specific item would be helpful. Besides, most of these models do not explicitly capture the neighbor’s opinions to items for social recommendations, while different opinions could affect the user differently. In this paper, to address the aforementioned challenges, we propose DSCF, a Deep Social Collaborative Filtering framework, which can exploit the social relations with various aspects for recommender systems. Comprehensive experiments on two real-world datasets show the effectiveness of the proposed framework.
Tasks	Recommendation Systems
Published	2019-07-16
URL	https://arxiv.org/abs/1907.06853v1
PDF	https://arxiv.org/pdf/1907.06853v1.pdf
PWC	https://paperswithcode.com/paper/deep-social-collaborative-filtering
Repo	https://github.com/wenqifan03/GraphRec-WWW19
Framework	pytorch

NoduleNet: Decoupled False Positive Reduction for Pulmonary Nodule Detection and Segmentation


Title	NoduleNet: Decoupled False Positive Reduction for Pulmonary Nodule Detection and Segmentation
Authors	Hao Tang, Chupeng Zhang, Xiaohui Xie
Abstract	Pulmonary nodule detection, false positive reduction and segmentation represent three of the most common tasks in the computer-aided analysis of chest CT images. Methods have been proposed for each task with deep learning based methods heavily favored recently. However, training deep learning models to solve each task separately may be sub-optimal: resource intensive and without the benefit of feature sharing. Here, we propose a new end-to-end 3D deep convolutional neural net (DCNN), called NoduleNet, to solve nodule detection, false positive reduction and nodule segmentation jointly in a multi-task fashion. To avoid friction between different tasks and encourage feature diversification, we incorporate two major design tricks: 1) decoupled feature maps for nodule detection and false positive reduction, and 2) a segmentation refinement subnet for increasing the precision of nodule segmentation. Extensive experiments on the large-scale LIDC dataset demonstrate that the multi-task training is highly beneficial, improving the nodule detection accuracy by 10.27%, compared to the baseline model trained to only solve the nodule detection task. We also carry out systematic ablation studies to highlight contributions from each of the added components. Code is available at https://github.com/uci-cbcl/NoduleNet.
Tasks
Published	2019-07-25
URL	https://arxiv.org/abs/1907.11320v1
PDF	https://arxiv.org/pdf/1907.11320v1.pdf
PWC	https://paperswithcode.com/paper/nodulenet-decoupled-false-positive
Repo	https://github.com/uci-cbcl/NoduleNet
Framework	pytorch

Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution


Title	Text Matters but Speech Influences: A Computational Analysis of Syntactic Ambiguity Resolution
Authors	Won Ik Cho, Jeonghwa Cho, Woo Hyun Kang, Nam Soo Kim
Abstract	Analyzing how human beings resolve syntactic ambiguity has long been an issue of interest in the field of linguistics. It is, at the same time, one of the most challenging issues for spoken language understanding (SLU) systems as well. As syntactic ambiguity is intertwined with issues regarding prosody and semantics, the computational approach toward speech intention identification is expected to benefit from the observations of the human language processing mechanism. In this regard, we address the task with attentive recurrent neural networks that exploit acoustic and textual features simultaneously and reveal how the modalities interact with each other to derive sentence meaning. Utilizing a speech corpus recorded on Korean scripts of syntactically ambiguous utterances, we revealed that co-attention frameworks, namely multi-hop attention and cross-attention, show significantly superior performance in disambiguating speech intention. With further analysis, we demonstrate that the computational models reflect the internal relationship between auditory and linguistic processes.
Tasks	Spoken Language Understanding
Published	2019-10-21
URL	https://arxiv.org/abs/1910.09275v2
PDF	https://arxiv.org/pdf/1910.09275v2.pdf
PWC	https://paperswithcode.com/paper/disambiguating-speech-intention-via-audio
Repo	https://github.com/warnikchow/coaudiotext
Framework	tf

Torchmeta: A Meta-Learning library for PyTorch


Title	Torchmeta: A Meta-Learning library for PyTorch
Authors	Tristan Deleu, Tobias Würfl, Mandana Samiei, Joseph Paul Cohen, Yoshua Bengio
Abstract	The constant introduction of standardized benchmarks in the literature has helped accelerate the recent advances in meta-learning research. They offer a way to get a fair comparison between different algorithms, and the wide range of datasets available allows full control over the complexity of this evaluation. However, for a large majority of code available online, the data pipeline is often specific to one dataset, and testing on another dataset requires significant rework. We introduce Torchmeta, a library built on top of PyTorch that enables seamless and consistent evaluation of meta-learning algorithms on multiple datasets, by providing data-loaders for most of the standard benchmarks in few-shot classification and regression, with a new meta-dataset abstraction. It also features some extensions for PyTorch to simplify the development of models compatible with meta-learning algorithms. The code is available here: https://github.com/tristandeleu/pytorch-meta
Tasks	Meta-Learning
Published	2019-09-14
URL	https://arxiv.org/abs/1909.06576v1
PDF	https://arxiv.org/pdf/1909.06576v1.pdf
PWC	https://paperswithcode.com/paper/torchmeta-a-meta-learning-library-for-pytorch
Repo	https://github.com/tristandeleu/pytorch-meta
Framework	pytorch

Multi-task Learning for Target-dependent Sentiment Classification


Title	Multi-task Learning for Target-dependent Sentiment Classification
Authors	Divam Gupta, Kushagra Singh, Soumen Chakrabarti, Tanmoy Chakraborty
Abstract	Detecting and aggregating sentiments toward people, organizations, and events expressed in unstructured social media have become critical text mining operations. Early systems detected sentiments over whole passages, whereas more recently, target-specific sentiments have been of greater interest. In this paper, we present MTTDSC, a multi-task target-dependent sentiment classification system that is informed by feature representation learnt for the related auxiliary task of passage-level sentiment classification. The auxiliary task uses a gated recurrent unit (GRU) and pools GRU states, followed by an auxiliary fully-connected layer that outputs passage-level predictions. In the main task, these GRUs contribute auxiliary per-token representations over and above word embeddings. The main task has its own, separate GRUs. The auxiliary and main GRUs send their states to a different fully connected layer, trained for the main task. Extensive experiments using two auxiliary datasets and three benchmark datasets (of which one is new, introduced by us) for the main task demonstrate that MTTDSC outperforms state-of-the-art baselines. Using word-level sensitivity analysis, we present anecdotal evidence that prior systems can make incorrect target-specific predictions because they miss sentiments expressed by words independent of target.
Tasks	Multi-Task Learning, Sentiment Analysis, Word Embeddings
Published	2019-02-08
URL	http://arxiv.org/abs/1902.02930v1
PDF	http://arxiv.org/pdf/1902.02930v1.pdf
PWC	https://paperswithcode.com/paper/multi-task-learning-for-target-dependent
Repo	https://github.com/16631140828/Paper-list
Framework	none

Deep Learning for Symbolic Mathematics


Title	Deep Learning for Symbolic Mathematics
Authors	Guillaume Lample, François Charton
Abstract	Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborate tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.
Tasks
Published	2019-12-02
URL	https://arxiv.org/abs/1912.01412v1
PDF	https://arxiv.org/pdf/1912.01412v1.pdf
PWC	https://paperswithcode.com/paper/deep-learning-for-symbolic-mathematics-1
Repo	https://github.com/janeyoung2018/symbolic-math
Framework	none

VERIFAI: A Toolkit for the Design and Analysis of Artificial Intelligence-Based Systems


Title	VERIFAI: A Toolkit for the Design and Analysis of Artificial Intelligence-Based Systems
Authors	Tommaso Dreossi, Daniel J. Fremont, Shromona Ghosh, Edward Kim, Hadi Ravanbakhsh, Marcell Vazquez-Chanlatte, Sanjit A. Seshia
Abstract	We present VERIFAI, a software toolkit for the formal design and analysis of systems that include artificial intelligence (AI) and machine learning (ML) components. VERIFAI particularly seeks to address challenges with applying formal methods to perception and ML components, including those based on neural networks, and to model and analyze system behavior in the presence of environment uncertainty. We describe the initial version of VERIFAI which centers on simulation guided by formal models and specifications. Several use cases are illustrated with examples, including temporal-logic falsification, model-based systematic fuzz testing, parameter synthesis, counterexample analysis, and data set augmentation.
Tasks
Published	2019-02-12
URL	http://arxiv.org/abs/1902.04245v2
PDF	http://arxiv.org/pdf/1902.04245v2.pdf
PWC	https://paperswithcode.com/paper/verifai-a-toolkit-for-the-design-and-analysis
Repo	https://github.com/BerkeleyLearnVerify/VerifAI
Framework	tf

Communication-efficient distributed SGD with Sketching


Title	Communication-efficient distributed SGD with Sketching
Authors	Nikita Ivkin, Daniel Rothchild, Enayat Ullah, Vladimir Braverman, Ion Stoica, Raman Arora
Abstract	Large-scale distributed training of neural networks is often limited by network bandwidth, wherein the communication time overwhelms the local computation time. Motivated by the success of sketching methods in sub-linear/streaming algorithms, we introduce Sketched SGD, an algorithm for carrying out distributed SGD by communicating sketches instead of full gradients. We show that Sketched SGD has favorable convergence rates on several classes of functions. When considering all communication – both of gradients and of updated model weights – Sketched SGD reduces the amount of communication required compared to other gradient compression methods from $\mathcal{O}(d)$ or $\mathcal{O}(W)$ to $\mathcal{O}(\log d)$, where $d$ is the number of model parameters and $W$ is the number of workers participating in training. We run experiments on a transformer model, an LSTM, and a residual network, demonstrating up to a 40x reduction in total communication cost with no loss in final model performance. We also show experimentally that Sketched SGD scales to at least 256 workers without increasing communication cost or degrading model performance.
Tasks
Published	2019-03-12
URL	https://arxiv.org/abs/1903.04488v3
PDF	https://arxiv.org/pdf/1903.04488v3.pdf
PWC	https://paperswithcode.com/paper/communication-efficient-distributed-sgd-with
Repo	https://github.com/sunahhlee/TopHCS
Framework	pytorch
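
The abstract above relies on workers communicating sketches instead of full gradients. The following is a small Count Sketch illustration of why that works: sketches are linear, so summing per-worker sketches gives the sketch of the summed gradient, from which large coordinates can be estimated. The class name, table sizes, and seeded lookup tables are assumptions for this sketch; the published algorithm includes further steps that are not shown.

```python
import random
from statistics import median


class CountSketch:
    """Fixed-size linear summary of a length-`dim` vector."""

    def __init__(self, dim, rows=5, cols=64, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        # Explicit per-coordinate lookup tables keep the example simple;
        # a practical implementation would use hash functions instead.
        self.bucket = [[rng.randrange(cols) for _ in range(dim)]
                       for _ in range(rows)]
        self.sign = [[rng.choice((-1, 1)) for _ in range(dim)]
                     for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def add(self, vec):
        """Accumulate a vector into the sketch."""
        for r in range(self.rows):
            for i, value in enumerate(vec):
                self.table[r][self.bucket[r][i]] += self.sign[r][i] * value

    def merge(self, other):
        """Add another sketch built with the same seed (linearity)."""
        for r in range(self.rows):
            for c in range(self.cols):
                self.table[r][c] += other.table[r][c]

    def estimate(self, i):
        """Median-of-rows estimate of coordinate i."""
        return median(self.sign[r][i] * self.table[r][self.bucket[r][i]]
                      for r in range(self.rows))


# Two workers sketch their local gradients; only the tables are exchanged.
dim = 1000
grad_a = [0.0] * dim
grad_b = [0.0] * dim
grad_a[3], grad_b[3] = 10.0, 5.0
worker_a, worker_b = CountSketch(dim, seed=1), CountSketch(dim, seed=1)
worker_a.add(grad_a)
worker_b.add(grad_b)
worker_a.merge(worker_b)
print(worker_a.estimate(3))
```

Each worker sends `rows * cols` numbers regardless of `dim`, and all workers must share the same seed so that their tables line up when merged.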