We are now looking for a data engineer who will own the build and architecture of our data platform, infrastructure, and pipelines. At the core of the platform is invoice digitization: transforming invoice data, currently locked away in PDFs, into a rich and clean digital data asset. Your role is critical to making this a reality!
Here are some of the major items you will own:
● Data engineering pipeline to receive and process invoices from clients
● Tag the PDFs with initial basic metadata (e.g., client name, date received, file ID, storage location)
● Call OCR and ML classification APIs to automatically extract data
● Capture the human-in-the-loop data extractions from the UI
● Clean and enrich the raw data asset
● Run QA scripts to validate the quality of the cleansed data
● Map the clean data to our growing taxonomy of vendors and vendor line items
● Run analytics and anomaly detection scripts to surface insights
● Push the data to our front-end dashboard via APIs
You will work closely with the data scientists to productionize the data science and machine learning code they develop, and with the engineering team to make sure the data engineering pipelines are performant in production. Eventually, the data engineering work will also require tight integrations with other fintech platforms (e.g., payment platforms).
You’ll be accountable for:
● Data engineering vision – ownership over and definition of the data engineering pipeline, working closely with the executive team, and acting as a thought leader across the company
● Technical execution – define, articulate, and execute the data engineering strategy from launch to scale, managing the day-to-day execution, and implementing best practices
● Engineering culture – champion agile and design philosophy, CI/CD, test-driven development, thoughtful architecture, and empathetic communication
● Getting your hands dirty – we’re a small team and we have a lot of work ahead of us. You should be excited to roll up your sleeves and help the team in any way you can
● Build, build, build – jumping in and writing code, from architecture to fixing bugs, delivering the MVP within the first 3 months of hire
We’ll rely on you and your team to handle day-to-day execution, including:
● Shipping highly scalable distributed systems on cloud platforms (AWS/GCP) and database technologies (SQL/NoSQL/column-oriented data stores/distributed databases)
● Developing Big Data systems (Spark/Airflow) and microservice architectures
● Collecting, storing, processing, and analyzing structured and unstructured data and implementing ETL processes
● Wrangling, standardizing, enhancing, implementing, and monitoring data repositories
● Creating workflows to ingest, enrich, and make data available across the
● Defining data retention policies and automating data delivery
You’ll be a perfect fit if...
● You want to join an early-stage startup, or you are extremely eager to be challenged at your first startup
● You like the tension between craft and shipping. You have a strong ability to quickly and effectively evaluate technical tradeoffs and translate them into short- and long-term business decisions
● You’ve built highly scalable data pipelines and have deep expertise in setting technical vision, architecting, building, and maintaining performant systems, from launch to scale
● You are passionate about building / leading data engineering teams and rapidly developing data pipelines at various stages of growth
● You pride yourself on communicating complex concepts, including the ability to distill intricate workflows and systems into clear processes and decisions with measurable company-wide impact
● You ask “why” a lot and use critical thinking and data to back up your intuitions. You hate when a customer struggles through your product experience
● You have managed a budget before and have seen first-hand the challenges in managing vendor spend
● You are a pro at Python; proficiency in any statically typed language is a plus
We’re looking to build the next billion-dollar business in enterprise SaaS. We have an amazing founding team already, with experts in product, engineering, data science, and machine learning.
Founded by a fintech veteran and seasoned CFO, our company offers an AI-powered spend intelligence solution that saves SMBs money by analyzing expense drivers and finding line-item-level insights overlooked by most accounts payable solutions, which focus on speeding up payment cycles rather than optimizing vendor spend.
Over time, our mission is to become the all-in-one spend management solution for SMBs that uses AI to: analyze spend, manage approval and payment workflows, identify anomalous spend & savings opportunities, benchmark spend performance vs. peers, find & negotiate savings with vendors, and forecast future expense trends.