Lead Assistant Manager - Data Engineer
Senior Data Engineer, Data feeds
About the Role:
Our mission is to empower those who strive to achieve better financial health, and the Data Feeds team plays a crucial role in achieving it. We are seeking a Senior Data Engineer for the Data Feeds team to provide batch data processing, real-time streaming, and pipeline orchestration capabilities. You'll be part of the Data Technology organization, which helps drive business decisions using data. You will have the opportunity to apply your expertise in big data problem solving, design thinking, coding, and analysis to build data pipelines and data products that leverage our petabyte-scale data. Our business is data driven: you will build solutions that support marketing, pricing, credit, funding, investing, and many other areas of a business that is transforming the banking industry. We're looking for talented Data Engineers passionate about building new data-driven solutions with the latest Big Data technology.
What You'll Do:
- Create and maintain optimal data pipeline architecture
- Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis
- Assemble large, complex data sets that meet functional / non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS Big Data technologies
- Work with stakeholders including the Executive, Product, Engineering, and program teams to assist with data-related technical issues and support their data infrastructure needs.
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using scalable, distributed data technologies
- Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it
- Write unit/integration tests, adopt Test-driven development, contribute to engineering wiki, and document work
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
About you:
- 6+ years of experience and a Bachelor's degree in Computer Science, Informatics, Information Systems, or a related field; or equivalent work experience
- In-depth working experience with distributed systems such as Hadoop/MapReduce, Spark, Hive, Kafka, and Oozie/Airflow
- At least 5 years of production-quality coding experience implementing data pipelines in Java, Scala, and Python
- Experience with AWS cloud services: EC2, EMR, RDS
- Experience with Git, JIRA, Jenkins, and shell scripting
- Familiar with Agile methodology, test-driven development, source control management and test automation
- Experience supporting and working with cross-functional teams in a dynamic environment
- You're passionate about data and building efficient data pipelines
- You have excellent listening skills and are empathetic to others
- You believe in simple and elegant solutions and give paramount importance to quality
- You have a track record of building fast, reliable, and high-quality data pipelines
Nice to have skills:
- Experience building marketing data pipelines, including Direct Mail, is a big plus
- Experience with Snowflake and Salesforce Marketing Cloud
- Working knowledge of open-source ML frameworks and end-to-end model development life cycle
- Previous working experience with running containers (Docker/LXC) in a production environment using one of the container orchestration services (Kubernetes, AWS ECS, AWS EKS)
Other details
- Pay type: Salary
- San Francisco, California, USA