- Built an NLP tool to extract terms that are contextually and syntactically similar to existing terms in a thesaurus (JBI 2017)
- Developed a text mining pipeline for structuring and visualizing free-text eligibility criteria of clinical trials (BIBM 2016)
- Parsed text corpora from clinical trials and social media to analyze the characteristics of cancer and diabetes
- Designed and built a distributed computing system on Spark for customer behavior analysis
- Developed and implemented a highly available and reliable distributed storage/cache system across three data centers, based on Memcached and MySQL
- Redesigned and built an A/B testing system based on HAProxy, Squid, and SOA techniques
- Designed the architecture and implemented the core algorithms of the QoS (Quality of Service) system for the S-series switch, including traffic control, scheduling, shaping, and user-defined ACL techniques
- Developed the network protocol security framework for the S-series switch, including ARP-SEC, IP-SEC, DHCP-SEC, etc.
A text mining toolkit for medical text parsing and analysis
A distributed clinical text analysis tool based on StanfordNLP and Spark. It performed text tokenization, syntactic and dependency parsing, and pattern recognition, and employed a clustering algorithm (k-means) to extract characteristics of cancer and diabetes. Another goal was to extract new medical terms frequently used by laypeople to enrich the vocabularies in UMLS.
Master's thesis on semantic relations in word embeddings
Explored how semantic relations are represented in word embeddings, based on the Wikipedia, WordNet, and UMLS datasets. The study revealed that pertainymy had a higher probability of occurring among the nearest neighbors of a word, while diverse semantic relations also occurred among the nearest neighbors. The study proposed an NER-based phrase composition method that outperformed the Word2phrase method. It also found that word morphology did not affect the performance of word2vec on analogy and semantic relation tasks.
A face recognition tool in Python based on the Eigenfaces technique, using the basic algebra functions of NumPy. Covered the entire workflow, including image preprocessing, face recognition, and performance evaluation. Evaluated on the ATT and Yale B face datasets.
An Android app that helps people monitor and store records of their runs, including track, speed, and elevation recorded by GPS. Supported Google Maps and Gaode Map, corresponding to two languages: English and Chinese.
Analyzed order, click, and view-duration data from YHD.COM to extract customers' profiles and preferences, then recommended appropriate products based on those profiles. Employed Spark to process large-scale data (more than 10 terabytes).
A framework that samples a specified percentage of users to evaluate a test version of the website, then collects feedback (e.g., clicks, orders) to decide which version is better. The framework was transparent to service modules and implemented with a modified HAProxy.
Distributed database and cache architecture for big data
Designed a solution to keep data consistent across three data centers that use MySQL and Memcached for storage. Used a master-slave database mode and developed a reliable cache invalidation tool to keep cached data consistent.
What I Do
From 2008 to 2013, I worked as a senior software engineer at Huawei Technologies.
From 2013 to 2015, I worked at YHD.COM as a system architect.
Since 2015, I have been at Florida State University, where I obtained a master's degree in CS. I am a research assistant and have published 5 papers in the domains of NLP and machine learning.
I am looking for a software engineering opportunity. I believe technology, especially AI, NLP, machine learning, and data analysis, can improve our lives. I am passionate about new techniques and share my ideas and code with the community.