Job Description
Our client is seeking an experienced and motivated Data Engineer to join a Lean/Agile team building its data science and analytics operational platform. This is an exciting and challenging opportunity to help drive transformative change at the client.
As an experienced engineer in data extraction, transformation, and persistence, you will design and implement components of the client's data science collaboration and deployment platform. Working closely with Data Science and Analytics professionals, you will develop automated, streaming data pipelines for event capture, transformation, and feature extraction to support the machine learning process. The industry changes rapidly, so we are looking for candidates who can respond to change, pick up new technologies quickly, and adapt to shifting requirements. We also want candidates who are production-oriented and committed to quality.
PRINCIPAL DUTIES AND RESPONSIBILITIES:
• Build and maintain event capture/transformation flows, feature repositories, data caches for real-time analytics, and related components.
• Develop data pipelines that can be leveraged in both model training and production.
• Collaborate with Data Architecture and other Data Engineering groups, maintaining a focus on operationalizing data flows in service of the data science and analytics teams.
• Develop code to extract value from structured, semi-structured, and unstructured data sources, creating refined data repositories for ease of analysis.
MINIMUM JOB REQUIREMENTS:
• 5+ years in a data-related field
• Strong Python skills, including Pandas, data structures, and efficient batch-processing scripts
• Strong SQL skills with relational and NoSQL databases (Snowflake, Microsoft SQL Server, Microsoft Dynamics/FetchXML, and others)
• Experience calling third-party REST APIs and working with JSON data
• Good understanding of Linux command-line operations
• Experience with AWS cloud technologies, including S3, EC2, ECR, zero-trust environments, etc.
• Experience developing data connections into and out of CRM or IVR systems
• Experience with the following technologies is a plus:
o Workflow scheduling tools such as Airflow, Windows Task Scheduler, or crontab
o PySpark, Spark, Hadoop, Hive, Spark SQL, etc.
o Additional languages such as PHP, Perl, C#, Java, or Scala
o Asyncio, threading/multithreading, and other asynchronous or concurrent programming frameworks
o ETL tooling such as Control-M
o AI optimizations and LLM integration via Amazon Bedrock or custom images
o Working with Dockerfiles and Docker images
Salary/Rate: $55-$75/hour (depending on experience level). This is a contract position; candidates are expected to work 40 hours/week. The contract duration is 12 months. This position currently does not offer benefits.