Spark

Senior Data Engineer - USA, Remote - $110,560 to $155,840


Job Description

You are a driven and motivated problem solver ready to pursue meaningful work. You strive to make an impact every day, not only at work but also in your personal life and community. If that sounds like you, then you've landed in the right place.

The Data Science AI Factory team is committed to exploring new ways to use data and analytics to solve business problems. The team uses a variety of data sources, with a strong focus on unstructured and semi-structured text, and applies NLP to improve outcomes in claims, underwriting, operations, and the customer experience.

As a Sr. Data Engineer, you will serve as an established thought leader, working in close partnership with experts to design, develop, and implement data assets for a wide range of new initiatives across multiple lines of business. The role involves heavy data exploration, proficiency with SQL and Python, knowledge of service-based deployments and APIs, and the ability to discover and learn quickly through collaboration. You will need to think analytically and outside the box, question current processes, and continue to build your business acumen.

The work combines team collaboration with independent effort. We seek candidates with a strong quantitative background and excellent analytical and problem-solving skills. This position combines business and technical skills and involves interaction with business customers, data science partners, internal and external data suppliers, and information technology partners.

Responsibilities

  • Identify and validate internal and external data sources for availability and quality. Work with SMEs to describe and understand data lineage and suitability for a use case.

  • Create data assets and build data pipelines, following modern software development principles, for further analytical consumption. Perform data analysis to ensure the quality of data assets.

  • Create summary statistics/reports from data warehouses, marts, and operational data stores.

  • Extract data from source systems and data warehouses, and deliver it in a predefined format using standard database query and parsing tools.

  • Understand ways to link or compare information already in our systems with new information.

  • Perform preliminary exploratory analysis to evaluate nulls, duplicates and other issues with data sources.

  • Work with data scientists and knowledge engineers to understand the requirements and propose and identify data sources and alternatives.

  • Produce code artifacts and documentation using GitHub for reproducible results and hand-off to other data science teams.

  • Propose ways to improve and standardize processes so that new data and capabilities can be assessed and the team can pivot to new projects.

  • Understand data classification and adhere to the information protection and privacy restrictions on data.

  • Collaborate closely with data scientists, business partners, data suppliers, and IT resources.

Experience & Skills

Candidates must have the technical skills to transform, manipulate, and store data; the analytical skills to relate the data to the business processes that generate it; and the communication skills to document and disseminate information about the availability, quality, and other characteristics of the data to a diverse audience. These varied skills may be demonstrated through the following:

  • Bachelor’s degree or equivalent experience in a related quantitative field

  • 5+ years of experience accessing and retrieving data from large, disparate data sources by creating and tuning SQL queries. Understanding of data modeling concepts, data warehousing tools, and databases (e.g., Oracle, AWS, Snowflake, Spark/PySpark, ETL, Big Data, and Hive)

  • Demonstrated ability to create and deliver high-quality Python code using software engineering best practices. Experience with object-oriented programming and software development is a plus. Proficiency with GitHub and Linux is highly desired.

  • Ability to analyze data sources and provide technical solutions. Strong exploratory and problem-solving skills to check for data quality issues.

  • Ability to determine business recommendations and translate them into actionable steps

  • Self-starter with curiosity and a willingness to become a data expert

  • Demonstrated passion for both learning new skills and leading data discovery and research

  • Results-oriented, with the ability to multitask and adjust priorities when necessary

  • Ability to work both independently and in a team environment with internal customers 

  • Ability to articulate technical concepts about data and to train both data scientists and partners on them

Azure Data Engineer - Irving, TX Full-Time, Permanent - $110,000 - $120,000


Required Skills:

  • Experience with GCP/Azure

  • Strong data modeling

  • Python

  • Experience with RDBMS

  • Experience with Big Data processing frameworks and tools (Cloudera, Sqoop, Hive, Impala, Spark)

  • Experience with DevOps tools and techniques (e.g., continuous integration, Jenkins, Puppet)

Preferred Skills:

  • Experience building or migrating data pipelines from on-prem to the cloud (GCP or any other cloud)

  • Understanding of cloud technologies

  • Unix Scripting

  • Expertise with Tableau and Excel

Job description:

  • Build data pipelines to ingest data from on-prem systems into the cloud

  • Experience with Big Data processing frameworks and tools (Cloudera, Sqoop, Hive, Impala, Spark)

  • Experience with DevOps tools and techniques (e.g., continuous integration, Jenkins, Puppet)

  • Experience developing software on a team using Agile methodology

  • Build data standardization and transformation logic using a framework that follows object-oriented programming concepts

  • Write unit test scripts

  • Implement standardized error handling and diagnostic logging

  • Schedule and maintain production workflows both on-prem and in the cloud

  • Troubleshoot and resolve QA and Production defects

  • Handle code review and code deployment