July 27, 2019

Paper Group ANR 490

Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks. Pattern recognition techniques for Boson Sampling validation. Large scale digital prostate pathology image analysis combining feature extraction and deep neural network. Approximate Computational Approaches for Bayesian Sensor Placement in High Dimensions. LIUM-CVC Submis …

Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks


Title	Towards Automatic Generation of Entertaining Dialogues in Chinese Crosstalks
Authors	Shikang Du, Xiaojun Wan, Yajie Ye
Abstract	Crosstalk, also known by its Chinese name xiangsheng, is a traditional Chinese comedic performing art featuring jokes and funny dialogues, and one of China’s most popular cultural elements. It is typically in the form of a dialogue between two performers for the purpose of bringing laughter to the audience, with one person acting as the leading comedian and the other as the supporting role. Though general dialogue generation has been widely explored in previous studies, it is unknown whether such entertaining dialogues can be automatically generated or not. In this paper, we for the first time investigate the possibility of automatic generation of entertaining dialogues in Chinese crosstalks. Given the utterance of the leading comedian in each dialogue, our task aims to generate the replying utterance of the supporting role. We propose a humor-enhanced translation model to address this task and human evaluation results demonstrate the efficacy of our proposed model. The feasibility of automatic entertaining dialogue generation is also verified.
Tasks	Dialogue Generation
Published	2017-11-01
URL	http://arxiv.org/abs/1711.00294v1
PDF	http://arxiv.org/pdf/1711.00294v1.pdf
PWC	https://paperswithcode.com/paper/towards-automatic-generation-of-entertaining
Repo
Framework

Pattern recognition techniques for Boson Sampling validation


Title	Pattern recognition techniques for Boson Sampling validation
Authors	Iris Agresti, Niko Viggianiello, Fulvio Flamini, Nicolò Spagnolo, Andrea Crespi, Roberto Osellame, Nathan Wiebe, Fabio Sciarrino
Abstract	The difficulty of validating large-scale quantum devices, such as Boson Samplers, poses a major challenge for any research program that aims to show quantum advantages over classical hardware. To address this problem, we propose a novel data-driven approach wherein models are trained to identify common pathologies using unsupervised machine learning methods. We illustrate this idea by training a classifier that exploits K-means clustering to distinguish between Boson Samplers that use indistinguishable photons from those that do not. We train the model on numerical simulations of small-scale Boson Samplers and then validate the pattern recognition technique on larger numerical simulations as well as on photonic chips in both traditional Boson Sampling and scattershot experiments. The effectiveness of such a method relies on particle-type-dependent internal correlations present in the output distributions. This approach performs substantially better on the test data than previous methods and underscores the ability to further generalize its operation beyond the scope of the examples that it was trained on.
Tasks
Published	2017-12-19
URL	http://arxiv.org/abs/1712.06863v1
PDF	http://arxiv.org/pdf/1712.06863v1.pdf
PWC	https://paperswithcode.com/paper/pattern-recognition-techniques-for-boson
Repo
Framework

Large scale digital prostate pathology image analysis combining feature extraction and deep neural network


Title	Large scale digital prostate pathology image analysis combining feature extraction and deep neural network
Authors	Naiyun Zhou, Andrey Fedorov, Fiona Fennessy, Ron Kikinis, Yi Gao
Abstract	Histopathological assessments, including surgical resection and core needle biopsy, are the standard procedures in the diagnosis of prostate cancer. Current interpretation of the histopathology images includes the determination of the tumor area, Gleason grading, and identification of certain prognosis-critical features. Such a process is not only tedious, but also prone to intra-/inter-observer variability. Recently, the FDA cleared the marketing of the first whole slide imaging system for digital pathology. This opens a new era for computer-aided prostate image analysis and feature extraction based on digital histopathology images. In this work, we present an analysis pipeline that includes localization of the cancer region, grading, area ratio of different Gleason grades, and cytological/architectural feature extraction. The proposed algorithm combines human-engineered features with those learned by a deep neural network. Moreover, the entire pipeline is implemented to operate directly on the whole slide images produced by digital scanners and is therefore potentially easy to translate into clinical practice. The algorithm is tested on 368 whole slide images from the TCGA data set and achieves an overall accuracy of 75% in differentiating Gleason 3+4 from 4+3 slides.
Tasks
Published	2017-05-07
URL	http://arxiv.org/abs/1705.02678v2
PDF	http://arxiv.org/pdf/1705.02678v2.pdf
PWC	https://paperswithcode.com/paper/large-scale-digital-prostate-pathology-image
Repo
Framework

Approximate Computational Approaches for Bayesian Sensor Placement in High Dimensions


Title	Approximate Computational Approaches for Bayesian Sensor Placement in High Dimensions
Authors	Xiao Lin, Asif Chowdhury, Xiaofan Wang, Gabriel Terejanu
Abstract	Since the cost of installing and maintaining sensors is usually high, sensor locations are always strategically selected. For those aiming at inferring certain quantities of interest (QoI), it is desirable to explore the dependency between sensor measurements and QoI. One of the most popular metrics for the dependency is mutual information, which naturally measures how much information about one variable can be obtained given the other. However, computing mutual information is always challenging, and the result is unreliable in high dimensions. In this paper, we propose an approach to find an approximate lower bound of mutual information and compute it in a lower dimension. Then, sensors are placed where the highest mutual information (lower bound) is achieved, and the QoI is inferred via Bayes' rule given sensor measurements. In addition, Bayesian optimization is introduced to provide a continuous mutual information surface over the domain and thus reduce the number of evaluations. A chemical release accident is simulated where multiple sensors are placed to locate the source of the release. The result shows that the proposed approach is both effective and efficient in inferring the QoI.
Tasks
Published	2017-03-01
URL	http://arxiv.org/abs/1703.00368v2
PDF	http://arxiv.org/pdf/1703.00368v2.pdf
PWC	https://paperswithcode.com/paper/170300368
Repo
Framework
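
To make the selection criterion in the abstract above concrete, here is a minimal Python sketch of mutual-information-based sensor ranking. It is not the paper's method: it assumes a toy linear-Gaussian model (a scalar QoI with a Gaussian prior, observed through a hypothetical gain plus Gaussian noise), for which mutual information has a closed form, whereas the paper works with an approximate lower bound in high dimensions. The candidate names, gains, and noise variances are invented for illustration.

```python
import math


def gaussian_mutual_information(gain, prior_var, noise_var):
    """Mutual information (in nats) between theta ~ N(0, prior_var)
    and the measurement y = gain * theta + N(0, noise_var)."""
    return 0.5 * math.log(1.0 + gain ** 2 * prior_var / noise_var)


def best_sensor(candidates, prior_var=1.0):
    """Score every candidate location and return the highest-scoring one."""
    scores = {
        name: gaussian_mutual_information(gain, prior_var, noise_var)
        for name, (gain, noise_var) in candidates.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical candidate locations: (gain, noise variance).
candidates = {
    "near_source": (2.0, 1.0),
    "far_field": (0.5, 1.0),
    "noisy_near": (2.0, 16.0),
}
chosen, scores = best_sensor(candidates)
print(chosen, scores)
```

In this toy setting a strong but noisy sensor can carry as little information as a weak, clean one, which is why the ranking is done on mutual information rather than on signal strength alone.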

LIUM-CVC Submissions for WMT17 Multimodal Translation Task


Title	LIUM-CVC Submissions for WMT17 Multimodal Translation Task
Authors	Ozan Caglayan, Walid Aransa, Adrien Bardet, Mercedes García-Martínez, Fethi Bougares, Loïc Barrault, Marc Masana, Luis Herranz, Joost van de Weijer
Abstract	This paper describes the monomodal and multimodal Neural Machine Translation systems developed by LIUM and CVC for WMT17 Shared Task on Multimodal Translation. We mainly explored two multimodal architectures where either global visual features or convolutional feature maps are integrated in order to benefit from visual context. Our final systems ranked first for both En-De and En-Fr language pairs according to the automatic evaluation metrics METEOR and BLEU.
Tasks	Machine Translation
Published	2017-07-14
URL	http://arxiv.org/abs/1707.04481v1
PDF	http://arxiv.org/pdf/1707.04481v1.pdf
PWC	https://paperswithcode.com/paper/lium-cvc-submissions-for-wmt17-multimodal
Repo
Framework

Stochastic Low-Rank Bandits


Title	Stochastic Low-Rank Bandits
Authors	Branislav Kveton, Csaba Szepesvari, Anup Rao, Zheng Wen, Yasin Abbasi-Yadkori, S. Muthukrishnan
Abstract	Many problems in computer vision and recommender systems involve low-rank matrices. In this work, we study the problem of finding the maximum entry of a stochastic low-rank matrix from sequential observations. At each step, a learning agent chooses pairs of row and column arms, and receives the noisy product of their latent values as a reward. The main challenge is that the latent values are unobserved. We identify a class of non-negative matrices whose maximum entry can be found statistically efficiently and propose an algorithm for finding them, which we call LowRankElim. We derive a $O((K + L) \operatorname{poly}(d) \Delta^{-1} \log n)$ upper bound on its $n$-step regret, where $K$ is the number of rows, $L$ is the number of columns, $d$ is the rank of the matrix, and $\Delta$ is the minimum gap. The bound depends on other problem-specific constants that clearly do not depend on $K L$. To the best of our knowledge, this is the first such result in the literature.
Tasks	Recommendation Systems
Published	2017-12-13
URL	http://arxiv.org/abs/1712.04644v1
PDF	http://arxiv.org/pdf/1712.04644v1.pdf
PWC	https://paperswithcode.com/paper/stochastic-low-rank-bandits
Repo
Framework
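
The reward model described in the abstract above can be sketched in a few lines of Python. This is an illustration of the problem setting only, not of LowRankElim: it assumes rank $d = 1$, Gaussian noise, and hypothetical latent values, and it uses a naive baseline that samples all $K L$ pairs uniformly, which is exactly the cost the paper's $(K + L)$ bound avoids.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical non-negative latent values (rank-1 case): K = 3 rows, L = 4 columns.
u = np.array([0.2, 0.9, 0.5])
v = np.array([0.3, 0.1, 0.8, 0.6])
mean_reward = np.outer(u, v)  # the unobserved K x L mean reward matrix


def pull(i, j, noise_std=0.1):
    """Noisy product of the latent values of row arm i and column arm j."""
    return u[i] * v[j] + noise_std * rng.normal()


def estimate_by_uniform_exploration(pulls_per_pair=200):
    """Naive baseline: average repeated pulls of every (row, column) pair."""
    estimates = np.zeros_like(mean_reward)
    for i in range(len(u)):
        for j in range(len(v)):
            estimates[i, j] = np.mean([pull(i, j) for _ in range(pulls_per_pair)])
    return estimates


true_best = tuple(int(x) for x in np.unravel_index(np.argmax(mean_reward), mean_reward.shape))
estimates = estimate_by_uniform_exploration()
estimated_best = tuple(int(x) for x in np.unravel_index(np.argmax(estimates), estimates.shape))
print(true_best, estimated_best)
```

For a non-negative rank-1 matrix the maximum entry sits at the best row paired with the best column, so in principle only $K + L$ quantities need to be learned rather than $K L$.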

Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search


Title	Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
Authors	Chaitanya Mitash, Abdeslam Boularias, Kostas E. Bekris
Abstract	This work proposes a process for efficiently searching over combinations of individual object 6D pose hypotheses in cluttered scenes, especially in cases involving occlusions and objects resting on each other. The initial set of candidate object poses is generated from state-of-the-art object detection and global point cloud registration techniques. The best-scored pose per object by using these techniques may not be accurate due to overlaps and occlusions. Nevertheless, experimental indications provided in this work show that object poses with lower ranks may be closer to the real poses than ones with high ranks according to registration techniques. This motivates a global optimization process for improving these poses by taking into account scene-level physical interactions between objects. It also implies that the Cartesian product of candidate poses for interacting objects must be searched so as to identify the best scene-level hypothesis. To perform the search efficiently, the candidate poses for each object are clustered so as to reduce their number but still keep a sufficient diversity. Then, searching over the combinations of candidate object poses is performed through a Monte Carlo Tree Search (MCTS) process that uses the similarity between the observed depth image of the scene and a rendering of the scene given the hypothesized pose as a score that guides the search procedure. MCTS handles in a principled way the tradeoff between fine-tuning the most promising poses and exploring new ones, by using the Upper Confidence Bound (UCB) technique. Experimental results indicate that this process is able to quickly identify in cluttered scenes physically-consistent object poses that are significantly closer to ground truth compared to poses found by point cloud registration methods.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Object Detection, Point Cloud Registration, Pose Estimation
Published	2017-10-24
URL	http://arxiv.org/abs/1710.08577v1
PDF	http://arxiv.org/pdf/1710.08577v1.pdf
PWC	https://paperswithcode.com/paper/improving-6d-pose-estimation-of-objects-in
Repo
Framework
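
The exploration/exploitation tradeoff mentioned in the abstract above is commonly handled with the UCB1 rule, sketched below in Python. This is a generic selection step, not the authors' implementation: the rewards stand in for hypothetical rendering-similarity scores in [0, 1], and the exploration constant `c` is an assumed default.

```python
import math


def ucb_select(total_reward, visits, c=math.sqrt(2)):
    """Return the index of the child to expand next under the UCB1 rule.

    total_reward[i]: sum of scores backed up through child i.
    visits[i]: number of times child i has been visited.
    """
    parent_visits = sum(visits)
    best_index, best_score = None, -math.inf
    for i, (reward, n) in enumerate(zip(total_reward, visits)):
        if n == 0:
            return i  # always try an unvisited child first
        # Mean score (exploitation) plus a bonus that shrinks with visits (exploration).
        score = reward / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index


# Three hypothetical candidate poses with mean scores 0.8, 0.6 and 0.5.
total_reward = [8.0, 3.0, 0.5]
visits = [10, 5, 1]
greedy_choice = ucb_select(total_reward, visits, c=0.0)
ucb_choice = ucb_select(total_reward, visits)
print(greedy_choice, ucb_choice)
```

With `c=0.0` the rule always refines the best-scoring pose; with the default constant, the rarely visited third pose gets another look despite its lower mean score.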

Exponential Moving Average Model in Parallel Speech Recognition Training


Title	Exponential Moving Average Model in Parallel Speech Recognition Training
Authors	Xu Tian, Jun Zhang, Zejun Ma, Yi He, Juan Wei
Abstract	As training data grows rapidly, large-scale parallel training on multi-GPU clusters is now widely applied to neural network model learning. We present a new approach that applies the exponential moving average method in large-scale parallel training of neural network models. It is a non-interference strategy: the exponential moving average model is not broadcast to distributed workers to update their local models after model synchronization in the training process, and it is used as the final model of the training system. Fully-connected feed-forward neural networks (DNNs) and deep unidirectional long short-term memory (LSTM) recurrent neural networks (RNNs) are successfully trained with the proposed method for large vocabulary continuous speech recognition on Shenma voice search data in Mandarin. The character error rate (CER) of Mandarin speech recognition is reduced further compared with state-of-the-art parallel training approaches.
Tasks	Large Vocabulary Continuous Speech Recognition, Speech Recognition
Published	2017-03-03
URL	http://arxiv.org/abs/1703.01024v1
PDF	http://arxiv.org/pdf/1703.01024v1.pdf
PWC	https://paperswithcode.com/paper/exponential-moving-average-model-in-parallel
Repo
Framework
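
The non-interference idea in the abstract above can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the paper's training system: the parameter vectors, the decay value, and the function names are hypothetical, and the synchronization step itself is represented only by a list of already-synchronized parameters.

```python
import numpy as np


def update_ema(ema_params, synced_params, decay):
    """Blend the newly synchronized parameters into the running average."""
    return decay * ema_params + (1.0 - decay) * synced_params


def train_with_ema(synced_history, decay=0.5):
    """Track an EMA copy alongside training without feeding it back to workers.

    synced_history: parameter vectors obtained after each model synchronization.
    """
    ema_params = np.array(synced_history[0], dtype=float)
    for synced_params in synced_history[1:]:
        # Workers continue from synced_params; ema_params is never broadcast
        # and is only read once training finishes.
        ema_params = update_ema(ema_params, np.asarray(synced_params, dtype=float), decay)
    return ema_params


# Hypothetical parameters after three synchronization rounds.
history = [np.array([1.0, 0.0]), np.array([2.0, 2.0]), np.array([3.0, 4.0])]
final_model = train_with_ema(history, decay=0.5)
print(final_model)
```

A decay of 0.5 is used here only to keep the arithmetic easy to follow; practical values are usually much closer to 1.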


Recovering 6D Object Pose: A Review and Multi-modal Analysis


Title	Recovering 6D Object Pose: A Review and Multi-modal Analysis
Authors	Caner Sahin, Tae-Kyun Kim
Abstract	A large number of studies analyse object detection and pose estimation at the visual level in 2D, discussing the effects of challenges such as occlusion, clutter, texture, etc., on the performances of the methods, which work in the context of RGB modality. Interpreting the depth data, the study in this paper presents thorough multi-modal analyses. It discusses the above-mentioned challenges for full 6D object pose estimation in RGB-D images, comparing the performances of several 6D detectors in order to answer the following questions: What is the current position of the computer vision community for maintaining “automation” in robotic manipulation? What next steps should the community take for improving “autonomy” in robotics while handling objects? Our findings include: (i) Reasonably accurate results are obtained on textured objects at varying viewpoints with cluttered backgrounds. (ii) Heavy occlusion and clutter severely affect the detectors, and similar-looking distractors are the biggest challenge in recovering instances’ 6D pose. (iii) Template-based methods and random forest-based learning algorithms underlie object detection and 6D pose estimation. The recent paradigm is to learn deep discriminative feature representations and to adopt CNNs taking RGB images as input. (iv) Depending on the availability of large-scale 6D annotated depth datasets, feature representations can be learnt on these datasets, and then the learnt representations can be customized for the 6D problem.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Object Detection, Pose Estimation
Published	2017-06-10
URL	http://arxiv.org/abs/1706.03285v2
PDF	http://arxiv.org/pdf/1706.03285v2.pdf
PWC	https://paperswithcode.com/paper/recovering-6d-object-pose-a-review-and-multi
Repo
Framework

Long-Range Correlation Underlying Childhood Language and Generative Models


Title	Long-Range Correlation Underlying Childhood Language and Generative Models
Authors	Kumiko Tanaka-Ishii
Abstract	Long-range correlation, a property of time series exhibiting long-term memory, is mainly studied in the statistical physics domain and has been reported to exist in natural language. Using a state-of-the-art method for such analysis, long-range correlation is first shown to occur in long CHILDES data sets. To understand why, Bayesian generative models of language, originally proposed in the cognitive scientific domain, are investigated. Among representative models, the Simon model was found to exhibit surprisingly good long-range correlation, but not the Pitman-Yor model. Since the Simon model is known not to correctly reflect the vocabulary growth of natural language, a simple new model is devised as a conjunct of the Simon and Pitman-Yor models, such that long-range correlation holds with a correct vocabulary growth rate. The investigation overall suggests that uniform sampling is one cause of long-range correlation and could thus have a relation with actual linguistic processes.
Tasks	Time Series
Published	2017-12-11
URL	http://arxiv.org/abs/1712.03645v1
PDF	http://arxiv.org/pdf/1712.03645v1.pdf
PWC	https://paperswithcode.com/paper/long-range-correlation-underlying-childhood
Repo
Framework

T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects


Title	T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects
Authors	Tomas Hodan, Pavel Haluza, Stepan Obdrzalek, Jiri Matas, Manolis Lourakis, Xenophon Zabulis
Abstract	We introduce T-LESS, a new public dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from simple scenes with several isolated objects to very challenging ones with multiple instances of several objects and with a high amount of clutter and occlusion. The images were captured from a systematically sampled view sphere around the object/scene, and are annotated with accurate ground truth 6D poses of all modeled objects. Initial evaluation results indicate that the state of the art in 6D object pose estimation has ample room for improvement, especially in difficult cases with significant occlusion. The T-LESS dataset is available online at cmp.felk.cvut.cz/t-less.
Tasks	6D Pose Estimation, 6D Pose Estimation using RGB, Pose Estimation
Published	2017-01-19
URL	http://arxiv.org/abs/1701.05498v1
PDF	http://arxiv.org/pdf/1701.05498v1.pdf
PWC	https://paperswithcode.com/paper/t-less-an-rgb-d-dataset-for-6d-pose
Repo
Framework

Simultaneous Matrix Diagonalization for Structural Brain Networks Classification


Title	Simultaneous Matrix Diagonalization for Structural Brain Networks Classification
Authors	Nikita Mokrov, Maxim Panov, Boris A. Gutman, Joshua I. Faskowitz, Neda Jahanshad, Paul M. Thompson
Abstract	This paper considers the problem of brain disease classification based on connectome data. A connectome is a network representation of a human brain. The typical connectome classification problem is very challenging because of the small sample size and high dimensionality of the data. We propose to use simultaneous approximate diagonalization of adjacency matrices in order to compute their eigenstructures in a more stable way. The obtained approximate eigenvalues are further used as features for classification. The proposed approach is demonstrated to be efficient for detection of Alzheimer’s disease, outperforming simple baselines and competing with state-of-the-art approaches to brain disease classification.
Tasks
Published	2017-10-14
URL	http://arxiv.org/abs/1710.05213v1
PDF	http://arxiv.org/pdf/1710.05213v1.pdf
PWC	https://paperswithcode.com/paper/simultaneous-matrix-diagonalization-for
Repo
Framework

Reverse Curriculum Generation for Reinforcement Learning


Title	Reverse Curriculum Generation for Reinforcement Learning
Authors	Carlos Florensa, David Held, Markus Wulfmeier, Michael Zhang, Pieter Abbeel
Abstract	Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal. Our method automatically generates a curriculum of start states that adapts to the agent’s performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
Tasks
Published	2017-07-17
URL	http://arxiv.org/abs/1707.05300v3
PDF	http://arxiv.org/pdf/1707.05300v3.pdf
PWC	https://paperswithcode.com/paper/reverse-curriculum-generation-for
Repo
Framework

Automated Top View Registration of Broadcast Football Videos


Title	Automated Top View Registration of Broadcast Football Videos
Authors	Rahul Anand Sharma, Bharath Bhat, Vineet Gandhi, C. V. Jawahar
Abstract	In this paper, we propose a novel method to register football broadcast video frames on the static top view model of the playing surface. The proposed method is fully automatic in contrast to the current state of the art which requires manual initialization of point correspondences between the image and the static model. Automatic registration using existing approaches has been difficult due to the lack of sufficient point correspondences. We investigate an alternate approach exploiting the edge information from the line markings on the field. We formulate the registration problem as a nearest neighbour search over a synthetically generated dictionary of edge map and homography pairs. The synthetic dictionary generation allows us to exhaustively cover a wide variety of camera angles and positions and reduce this problem to a minimal per-frame edge map matching procedure. We show that the per-frame results can be improved in videos using an optimization framework for temporal camera stabilization. We demonstrate the efficacy of our approach by presenting extensive results on a dataset collected from matches of football World Cup 2014.
Tasks	Bird View Synthesis, Homography Estimation
Published	2017-03-04
URL	http://arxiv.org/abs/1703.01437v1
PDF	http://arxiv.org/pdf/1703.01437v1.pdf
PWC	https://paperswithcode.com/paper/automated-top-view-registration-of-broadcast
Repo
Framework
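
The dictionary lookup described in the abstract above can be sketched as a nearest neighbour search in Python. This is a toy illustration, not the authors' pipeline: the 8x8 synthetic edge maps, the raw-pixel Euclidean distance, and the placeholder homographies are all assumptions made to keep the example self-contained.

```python
import numpy as np


def make_edge_map(row, size=8):
    """Synthetic edge map containing a single horizontal field line."""
    edge_map = np.zeros((size, size), dtype=np.float32)
    edge_map[row, :] = 1.0
    return edge_map


# Hypothetical dictionary: flattened edge maps paired with 3x3 homographies.
dictionary_maps = np.stack([make_edge_map(r).ravel() for r in (1, 3, 5)])
dictionary_homographies = np.stack([np.diag([1.0, 1.0, s]) for s in (1.0, 2.0, 3.0)])


def register_frame(edge_map):
    """Return the index and homography of the closest dictionary edge map."""
    distances = np.linalg.norm(dictionary_maps - edge_map.ravel(), axis=1)
    nearest = int(np.argmin(distances))
    return nearest, dictionary_homographies[nearest]


# A query frame: the second dictionary entry corrupted by one spurious edge pixel.
query = make_edge_map(3)
query[0, 0] = 1.0
index, homography = register_frame(query)
print(index)
print(homography)
```

Because the dictionary is generated synthetically, every stored edge map comes with a known homography, so registration of a frame reduces to finding its closest match.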

Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network


Title	Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network
Authors	Yancong Wei, Qiangqiang Yuan, Huanfeng Shen, Liangpei Zhang
Abstract	In the field of fusing multi-spectral and panchromatic images (Pan-sharpening), the impressive effectiveness of deep neural networks has been recently employed to overcome the drawbacks of traditional linear models and boost the fusing accuracy. However, to the best of our knowledge, existing research works are mainly based on simple and flat networks with relatively shallow architecture, which severely limits their performance. In this paper, the concept of residual learning is introduced to form a very deep convolutional neural network that makes full use of the high non-linearity of deep learning models. Through both quantitative and visual assessments on a large number of high-quality multi-spectral images from various sources, our proposed model is shown to be superior to all mainstream algorithms included in the comparison and to achieve the highest spatial-spectral unified accuracy.
Tasks
Published	2017-05-22
URL	http://arxiv.org/abs/1705.07556v2
PDF	http://arxiv.org/pdf/1705.07556v2.pdf
PWC	https://paperswithcode.com/paper/boosting-the-accuracy-of-multi-spectral-image
Repo
Framework