Data Engineer / Machine Learning Engineer
🇺🇸 United States › California › Palo Alto (Posted Jul 1 2019)
About the company
We curate community content and tell contextual stories. Our curation services are used by 1,000+ forums reaching millions of people. We started in 2015 and have 2 locations – Palo Alto and Bellevue. Our team comes from a mix of startup and big tech backgrounds, but we all share a desire to build a better Internet. We are Stanford StartX alumni (2016).
Threadloom is looking for an experienced data engineer with strong machine learning experience.
This is a foundational role. You will be Threadloom's first engineer solely responsible for building and extending our processing pipelines. Working closely with Product, Ops, and Eng, you will design and develop the data warehouses used by all of our services and products. This includes owning the processing of the billions of documents that power Threadloom Search, Newsletter, and upcoming consumer products.
The ideal candidate is passionate about building large-scale, high-volume pipelines that manage and store mission-critical data. This person is conversant with current cloud platforms for parallel processing and storage, and can easily translate product and user requirements into requirements for storing and managing data. They should also have experience building machine learning models that classify and rank content and predict user preferences.
The ideal candidate also cares about our end users and is a careful steward of their data. They are comfortable with modern user privacy standards and have experience applying them in real-world situations.
Skills & requirements
3+ years of relevant work experience
Launching consumer products that people love, at scale
Designing and implementing data pipelines and warehouses
Optimizing servers and pipelines to manage operational costs at scale
Building systems to handle user authentication and PII (e.g., Firebase, OAuth, GDPR)
Deploying production cloud services (e.g., Google Cloud, AWS, Azure)
Languages and tools
Python required, Scala/Java desired
Fluency with the latest tools, libraries, and infrastructure for building and maintaining production-level data pipelines and storage, including:
distributed data processing frameworks (e.g., Hadoop, Spark, Flink, Apache Beam)
SQL and NoSQL databases (e.g., MySQL, Postgres, Cassandra, Redis)
stream processing frameworks (e.g., Kafka, Storm, Spark)
search engines (Elastic, Solr)
Building and launching ML models in a production environment
Scaling experimental models from proof-of-concept to live products that handle large-scale data
Comfortable building scalable backends, RESTful web services and APIs