Lijie Xu (许利杰)

Associate Research Professor
Technology Center of Software Engineering (TCSE)
Institute of Software, Chinese Academy of Sciences (ISCAS)

Email: xulijie AT otcaix DOT iscas DOT ac DOT cn
GitHub: https://github.com/JerryLead



About Me

My research interests focus on big data systems/applications. Currently, I'm working on distributed stream processing, distributed machine learning, and memory management techniques.

I received my PhD from the Institute of Software, Chinese Academy of Sciences in Jan. 2016 and my Bachelor's degree from Wuhan University in 2009.


Publications
  1. Lijie Xu, Tian Guo, Wensheng Dou, Wei Wang, and Jun Wei. An Experimental Evaluation of Garbage Collectors on Big Data Applications. The 45th International Conference on Very Large Data Bases (VLDB 2019), pages 570-583. [pdf][slides]
  2. Lijie Xu, Wensheng Dou, Feng Zhu, Chushu Gao, Jie Liu, and Jun Wei. Characterizing and Diagnosing Out of Memory Errors in MapReduce Applications. The Journal of Systems and Software (JSS), 2018. [pdf]
  3. Lijie Xu, Wensheng Dou, Feng Zhu, Chushu Gao, Jie Liu, Hua Zhong, Jun Wei. A Characteristic Study on Out of Memory Errors in Distributed Data-Parallel Applications. In the 26th IEEE International Symposium on Software Reliability Engineering (ISSRE 2015), Washington DC, USA, Nov. 2015. [pdf][OOM Cases]
  4. Lijie Xu, Jie Liu, and Jun Wei. FMEM: A Fine-grained Memory Estimator for MapReduce Jobs. In Proceedings of the 10th International Conference on Autonomic Computing (ICAC 2013), pages 65-68, San Jose, USA, June 2013. [pdf]
  5. Lijie Xu, Jie Liu, and Jun Wei. MapReduce Framework Optimization via Performance Modeling. In Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPS PhD Forum 2012), pages 2506-2509, Shanghai, China, May 2012. [pdf]
  6. Shijian Li, Robert J. Walls, Lijie Xu, and Tian Guo. Speeding up Deep Learning with Transient Servers. In Proceedings of the 16th IEEE International Conference on Autonomic Computing (ICAC 2019). [pdf]
  7. Yingying Zheng, Lijie Xu, Wei Wang, Wei Zhou, Ying Ding. A Reliability Benchmark for Big Data Systems on JointCloud. In the Eighth International Workshop on Joint Cloud Computing (JCC 2017) in conjunction with the 37th International Conference on Distributed Computing Systems (ICDCS 2017), Atlanta, USA, Jun. 2017. [pdf]
  8. Feng Zhu, Jie Liu, Sa Wang, Jiwei Xu, Lijie Xu, Jixin Ren, Dan Ye, Jun Wei, Tao Huang. Hug the Elephant: Migrating a Legacy Data Analytics Application to Hadoop Ecosystem. In Proceedings of the 32nd IEEE International Conference on Software Maintenance and Evolution (ICSME 2016), pages 178-187, Raleigh, North Carolina, USA, Oct. 2016. [pdf]
  9. Feng Zhu, Jie Liu, Lijie Xu, Dan Ye, Jun Wei, Tao Huang. A Lightweight Evaluation Framework for Table Layouts in MapReduce Based Query Systems. In Proceedings of the 17th Asia-Pacific Web Conference (APWeb 2015), pages 473-484, Guangzhou, China, Sept. 2015. [pdf]
  10. Feng Zhu, Jie Liu, Lijie Xu. A Fast and High Throughput SQL Query System for Big Data. In Proceedings of the 13th International Conference on Web Information Systems Engineering (WISE 2012), pages 783-788, Paphos, Cyprus, Nov. 2012. [pdf]

Experiences
  1. Intern, Taobao technology department, Alibaba (Nov. 2014 - Feb. 2015)
    1. Improved and optimized Spark; fixed a critical bug, SPARK-4672 (selected as an important update in Spark 1.2.0)
    2. Analyzed parameter server systems, including Petuum [My notes]
  2. Intern, System Research Group, Microsoft Research Asia (Apr. 2013 - Sept. 2013)
    1. Designed and implemented an RPC prototype to ease concurrent/asynchronous/non-deterministic programming
    2. Researched memory management problems in MapReduce applications
  3. Intern, Data Mining (NLP) Group, Tencent (June 2010 - Aug. 2010)
    1. Mined synonyms from dictionaries and Wikipedia
    2. Mined 250K+ pairs of commonly used synonyms
    3. Mined 200K+ pairs of entity name synonyms
  4. Intern, Institute of Computing Technology, CAS (June 2008 - Aug. 2008)
    1. Worked on workflow-scheduling algorithms in grid computing

Technical Reports
  1. Spark Internals (on the design and implementation of Apache Spark, with 3,500+ stars and 1,400+ forks on GitHub), 2014-2015
  2. Machine Learning Notes (in Chinese, with 300,000+ pageviews), 2012
  3. Hadoop Memory Usage Model, 2013

Projects
  1. A Distributed Computing Model for Streaming Machine Learning Applications, NSFC, PI, 2019-2021
  2. A Comprehensive Benchmark for Big Data Stream Processing Systems, PI, 2017-2018
  3. GraphLib: An Algorithm Library for Distributed Graph Mining, PI, 2017-2018

Contributions to Apache Hadoop/Spark
  1. SPARK-4672 (An important update of Spark 1.2.0, Iterative Spark jobs may suffer from StackOverflow errors)
  2. SPARK-22713 (OOM errors caused by the memory contention and memory leak in TaskMemoryManager)
  3. SPARK-22286 (OutOfMemoryError caused by memory leak and large serializer batch size in ExternalAppendOnlyMap)
  4. MAPREDUCE-4882 (Incorrect estimation of the output file size causes disk-full errors in the spill phase)
  5. MAPREDUCE-4883 (An improper framework buffer size affects job performance)

Writings

Services
  1. Reviewer of Fast Data Processing with Spark – Second Edition, PACKT Publishing, 2015.
  2. Reviewer of Mastering Apache Spark, PACKT Publishing, 2015.
  3. One of the translators of Stanford Deep Learning Tutorial, 2013

Hobbies
  1. Playing acoustic/classical guitar
  2. Swimming