Lead Python Developer

Pittsburgh, Pennsylvania, USA. Req. No. 27265
Friday, November 15, 2024

Job Title: Senior Python Data Engineer 

Job Description 

We are seeking a highly skilled Senior Python Data Engineer to join our dynamic team. The ideal candidate will possess a strong programming background in advanced Python, with a focus on data engineering frameworks and libraries. You will be responsible for designing, building, and maintaining robust data ingestion pipelines, ensuring seamless integration of data from various sources. 

Key Responsibilities 

  • Data Pipeline Development: Design, implement, and optimize data ingestion pipelines using advanced Python (NumPy, Pandas, Dask) to ensure efficient data flow and processing. 

  • Data Storage Management: Work extensively with Parquet files for efficient data storage and retrieval, including partitioned Parquet files, ensuring optimal compression and schema evolution. 

  • Collaboration: Work closely with geographically distributed teams and clients to gather requirements, provide technical solutions, and ensure data quality. 

  • Team Leadership: Lead a team of data engineers by assigning tasks, reviewing code, and mentoring junior team members. 

  • Design Participation: Engage in architectural discussions and design sessions, contributing to the overall data pipeline architecture. 

  • REST API Development: Build and maintain REST APIs, ensuring API security through key validation, authorization, and authentication mechanisms. 

  • Data Manipulation: Set up and manipulate Python data structures such as lists, strings, dictionaries, and tuples. Use strong expertise in Pandas and NumPy for data manipulation. 

  • Data Exploration & Visualization: Conduct data exploration, visualization, and comparison of metrics for large CSV and Parquet files. 

  • Debugging and Optimization: Troubleshoot complex data pipeline issues, using logging and monitoring tools (e.g., ELK Stack, Grafana) to optimize performance for scalability and efficiency. 

  • Data Storage Solutions: Design and implement data storage solutions using SQL (PostgreSQL, MySQL) and NoSQL databases (MongoDB, Cassandra). 

  • Data Transformation: Use advanced techniques such as joins, merges, pivot tables, grouping, and window functions in Python or SQL. 

  • Documentation: Maintain thorough documentation of data pipelines, architectures, and processes for future reference and onboarding. 

Required Qualifications (Must-Have) 

  • Programming Skills: Advanced proficiency in Python, particularly with libraries such as NumPy and Pandas for data manipulation and analysis. 

  • Parquet Experience: Strong experience with Parquet files, including reading, writing, and optimizing for performance and storage efficiency. 

  • Data Structure Manipulation: Ability to set up and manipulate Python data structures such as lists, strings, dictionaries, and tuples. 

  • Data Exploration: Familiarity with data exploration, visualization, and comparing metrics of large CSV and Parquet files, including partitioned Parquet files. 

  • Advanced Data Techniques: Strong skills in joins, merges, pivot tables, grouping, and window functions in Python or SQL. 

  • Version Control: Strong understanding of Git, including git push and git clone for collaborative development. 

  • Linux Proficiency: Experience with Linux commands and shell scripting for data operations. 

  • Data Pipeline Experience: Proven experience in building and managing data ingestion pipeline scripts, including batch and real-time processing. 

  • REST API Knowledge: Familiarity with building REST APIs and securing them through API key validation and authentication mechanisms. 

  • Debugging Skills: Demonstrated ability to handle complex data pipeline architecture with excellent debugging skills. 

  • Leadership Experience: Prior experience leading a technical team and mentoring junior engineers. 

Preferred Qualifications (Good-to-Have) 

  • Object-Oriented Programming: Good experience with object-oriented programming patterns, multithreading, and multiprocessing. 
 
  • Spark Applications: Experience developing Spark applications using Python, including familiarity with Apache Spark (Spark SQL, Spark Streaming, DataFrames, RDD, PySpark). 

  • Communication Skills: Excellent verbal and written communication skills, with the ability to convey technical concepts to non-technical stakeholders. 

 

Other details

  • Pay type: Salary
  • Pittsburgh, Pennsylvania, USA