Dr. Wei Zhang graduated from the Department of Computer Science at Texas Tech University (TTU). In addition to his current role on the Big Data Service Team at Oracle Cloud Infrastructure, he serves as an adjunct research scientist in the Data-Intensive Scalable Computing Laboratory (DISCL) at Texas Tech University. Before joining Texas Tech University, he had 7 years of industrial experience building large-scale distributed back-end services for Internet applications such as Weibo.com and t.cn.

His research interests broadly lie in:

  • Scientific Data Management

  • Scientific Metadata Indexing and Retrieval

  • Streaming Graph Partitioning

  • Storage Resource Management

Other research experiences/interests include:

  • Interdisciplinary Geo-spatial data mining & visualization

  • Storage systems for graph data processing

Please contact me by email: X-Spirit [dot] zhang [at] ttu.edu

Find me on GitHub: https://github.com/zhangwei217245

Find me on LinkedIn: https://www.linkedin.com/in/zhangwei217245/

EDUCATION
 
Aug/2014 - Present

PhD Program in Computer Science, Texas Tech University, United States

Sep/2003 - Jul/2007

Bachelor's Degree in Computer Science, Hebei University of Science and Technology, China

Thesis: “Feed-Based Online Socializing”

 
SELECTED PUBLICATIONS

  • W. Zhang, S. Byna, H. Sim, S.K. Lee, S. Vazhkudai and Y. Chen. Exploiting User Activeness for Data Retention in HPC Systems. Accepted to appear in Proceedings of The 34th International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21), 2021. (acceptance rate: 86/365=23.6%) 

  • N. Zhao, G. Cao, W. Zhang, E.L. Samson and Y. Chen. Remote Sensing and Social Sensing for Socioeconomic Systems: A Comparison Study between Nighttime Lights and Location-based Social Media at the 500m Spatial Resolution. International Journal of Applied Earth Observation and Geoinformation, Volume: 87, 2020.

  • W. Zhang, S. Byna, H. Tang, B. Williams, Y. Chen. MIQS: Metadata Indexing and Querying Service for Self-describing File Formats. Accepted to appear in The Proceedings of The 31st ACM/IEEE Supercomputing Conference (SC’19), Denver, CO, 2019. (first-round acceptance rate: 72/344=21%, with another 15 papers asked for major revisions per SC’19)

  • W. Zhang, S. Byna, C. Niu, Y. Chen. Exploring Metadata Search Essentials for Scientific Data Management. Accepted to appear in The Proceedings of 26th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC '19), Hyderabad, India, 2019. (acceptance rate: 23%)

  • D. Dai, Y. Chen, P. Carns, J. Jenkins, W. Zhang and R. Ross. Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model. IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume: 30, Issue: 7, Pages: 1613 - 1627, 2019. 

  • N. Zhao, W. Zhang, Y. Liu, E. Samson, Y. Chen and G. Cao. Improving Nighttime Light Imagery with Location-Based Social Media Data. IEEE Transactions on Geoscience and Remote Sensing, Volume: 57, Issue: 4, Pages: 2161 - 2172, 2019.

  • N. Zhao, G. Cao, W. Zhang and E.L. Samson. Tweets or Nighttime Lights: Comparison for Preeminence in Estimating Socioeconomic Factors. ISPRS Journal of Photogrammetry and Remote Sensing, Volume: 146, Pages: 1 - 10, 2018.

  • W. Zhang, H.J. Tang, S. Byna, Y. Chen. DART: Distributed Adaptive Radix Tree for Efficient Affix-based Keyword Search on HPC Systems. In the Proceedings of The 27th International Conference on Parallel Architectures and Compilation Techniques (PACT '18), 2018. (acceptance rate: 36/126=28.6%)

  • W. Zhang, Y. Chen and D. Dai. AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems. Accepted to appear in The Proceedings of The 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2018. (acceptance rate: 20.8%)

  • D. Dai, W. Zhang, Y. Chen. IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases. In Proceedings of The 26th ACM International Symposium on High Performance Parallel and Distributed Computing (HPDC'17), 2017. (acceptance rate: 19%)

  • D. Dai, Y. Chen, P. Carns, J. Jenkins, W. Zhang, R. Ross. GraphMeta: Managing HPC Rich Metadata in Graphs. In Proceedings of the IEEE International Conference on Cluster Computing (Cluster'16), 2016. (acceptance rate: 39/162=24.1%)

 


ACADEMIC EXPERIENCE
 
May/2017 - Present

Research Intern at the Scientific Data Management Group, Lawrence Berkeley National Laboratory

Research Assistant at the Data-Intensive Scalable Computing Laboratory (DISCL), Texas Tech University

Metadata Indexing and Querying Service

Major Contributions

  1. Investigated scientific data management solutions in HPC environments.

  2. Investigated indexing technologies for improving metadata search efficiency.

  3. Conducted baseline evaluations of existing metadata management platforms.

  4. Proposed a new system architecture for indexing metadata in object-centric storage.

  5. Proposed a new distributed indexing technique for prefix and suffix search (DART).

  6. Implemented the proposed metadata indexing approach and evaluated it.

  7. Proposed and implemented a self-contained metadata indexing and querying service for self-describing data formats (MIQS).

Aug/2016 - May/2017

Teaching Assistant at the Computer Science Department, Texas Tech University

 

Data Structures, Object-Oriented Programming

Major Contributions

  1. Data Structures Lab: Conducted lab sessions for students.

  2. Grader and Q&A session host for the following courses:

  • Data Structures

  • Object-Oriented Programming

  • Advanced Operating Systems

  • Computer Organization and Assembly Language

  • Computer Architecture

Jan/2016 - Dec/2016

Research Assistant at STARLab, Texas Tech University

 

Geo-spatial data mining and analysis on social networks

Major Contributions

  1. Data mining platform on top of Spark, HBase and Hadoop.
    Performed optimizations across the entire technology stack, especially data compression.

  2. Geo-spatial visualization of social media user distribution.
    Implemented both NodeJS and Python versions of the data processing and visualization platform, with Redis serving as the data store.

  3. Twitter sentiment analysis.
    Implemented with both a Naive Bayes model and the Stanford CoreNLP package.

  4. Geo-spatial demographic and political information analysis based on social media users.
    Implemented residential location recognition using unsupervised clustering, and conducted demographic analysis based on the recognized residential locations together with Naive Bayes analysis of name, age and gender statistics. Also conducted political preference analysis using the results of Twitter sentiment analysis on the 2012 U.S. presidential election.

Jan/2015 - May/2015

Research Assistant at DISCL, Texas Tech University


Graph-based Provenance Management

Major Contributions

  1. GraphFS: Refactored the implementation of GraphFS to make it more modular.

  2. Control experiments on Meshwork (in comparison with the GraphMeta implementation).

  3. Control experiments on Titan (in comparison with GraphMeta), including the related code implementation and performance tuning.

  4. Proposed AKIN, a similarity-based streaming graph partitioning algorithm.

  5. Participated in the conception and experimental evaluation of IOGP, an incremental graph partitioning algorithm.

 
INDUSTRIAL EXPERIENCE
Jan/2014 - Jan/2016
  •   Senior System R&D Engineer at Beijing Serious Technology Co., Ltd., China

 

Designed and developed server-side applications providing distributed data services with high performance, high availability and flexible scalability for Enjoy!

 

Major Contributions

 

1.   PCVF: A parameter constraint and validation framework for RESTful Web Service APIs (confidential project).

2.   DevOps practices involving Maven, Jenkins, unit testing and a customized documentation generator for RESTful Web Service APIs (compatible with PCVF).

3.   BrookSide: A message processing framework for AMQP (specifically RabbitMQ).

4.   Meshwork: A graph-like data access API for both MySQL and Redis.

5.   Commons: Several utilities, including a Redis access API that adds high-availability (HA) capability to spring-data-redis.

6.   Webshot-rest-amqp-service: A NodeJS project that captures a snapshot of any specified website in response to messages received from an AMQP broker such as RabbitMQ.

2010 - 2013
  • System R&D Engineer at Sina.com Technology (China) Co., Ltd.

 

Designed and developed several critical server-side applications providing distributed data services for the open platform of Weibo.com (the Chinese counterpart of Twitter), including the URL shortening service t.cn and the user and profile data service for Weibo.com.

Major Contributions

 

1.   REST API optimization to improve the usability of the Weibo REST API, including development of a BDD testing tool and specifications for both the documentation and the implementation of the Weibo Open API.

2.   T.cn: A URL shortening service and its associated URL hit-counting program.

3.   In charge of the user data service for the Weibo Open API, which lies on the critical data access path of almost every Weibo.com REST API and therefore requires high performance, high availability and the flexibility to evolve functionally.

4.   User service v2.0 for the Weibo Open API: in charge of data migration and service migration, as well as development of the critical distributed data service and message processing system.

5.   Cache service optimization for user service v2.0: reduced total Memcache resource usage based on an in-depth analysis of the system's cache usage.

6.   A visual service monitoring system showing the running status of the user service, including cache hit ratio and MySQL instance throughput, as well as the status of critical user-related services such as the Relationship service and the Feed service.

2009 - 2010
  • Senior Software Developer at Beijing JustMusic Co., Ltd., China

 

Designed and developed a business data management system for JustMusic! Co., Ltd.

 

Major Contributions

 

1.   Development of the business data management system.

2.   A simple batch processing framework.

2008 - 2009
  • Software Developer at Beijing Datuu.com Technology Co., Ltd., China

Developed data services for a business management system.

 

Major Contributions

 

1.   Development of an operations management system, including routine feature development, data maintenance and business report generation.

 
SKILLS

  • Server-side Development: Java, NodeJS, Scala, Python, C/C++, RESTful Web Services, Bash, .NET, Ruby

  • Web Development: HTML, JavaScript, CSS, XML, XSLT, AJAX

  • Big Data Tools: Spark, Hadoop

  • Cloud Computing: AWS, Docker

  • Databases: MySQL, Oracle, SQL Server; NoSQL databases: Memcache, Redis, Neo4j, HBase

  • Operating Systems: Linux, Unix, Windows

  • Software Engineering: Design Patterns, UML, Continuous Integration