Cleaner, clearer, safer roads for everyone
Data Engineer • $90k – $125k • 0.1% – 0.4%
To eliminate congestion, improve traffic flow and give back time to everyone.
Your main mission is to build the data infrastructure that enables us to collect, store and efficiently process traffic and mapping data for every road and every road user in the US. The data you provide enables our Data Science team to accurately model traffic patterns, which in turn allows us to optimize traffic signals. Your work will impact millions of people each day: with better traffic signal timing we will lower travel times by up to 25%, reduce emissions by up to 22% and enable everyone to spend more time on the things that are most important to them and with the people who are most important to them.
You will be responsible for creating the production software that runs our data pipelines and feeds data into our Modelling, Simulation and Optimization platform. The pipeline will perform aggregations, data cleaning, and transformations, and must be self-healing and scalable.
You will be managing multiple data types including time series data, spatial data, relational data, and blob store data.
You will create automation tools, as well as create processes for data monitoring and anomaly detection.
You will ensure that the pipelines are testable, reliable and follow good data engineering practices.
The Data Science and Machine Learning teams will be your internal customers. You will work closely with them to understand their expectations and meet or exceed them.
You will evaluate data providers and partners to help our team select the best data sources for its needs.
BS / MS in Mathematics, Computer Science or an Engineering discipline from a top university.
3+ years of experience in production software engineering using languages such as Python, Java or C++.
3+ years of experience with SQL/relational databases (e.g. PostgreSQL) and 1+ year with wide-column stores (e.g. DynamoDB)
Solid understanding of relational concepts and of the pros and cons of each type of data store
Experience building high-performance batch and real-time data processing pipelines (MapReduce, Hadoop, HBase/Cassandra, Spark, Samza, etc.)
Experience with AWS
Experience building and managing large-scale geospatial data systems (mapping, navigation, GIS, routing, fleet management).