About the Role
This position is for a Software Developer responsible for building infrastructure that makes large data sets accessible to Data Science teams for further analysis. The role also involves ETL-ing data into data warehouses for specific analyses and, in some situations, contributing to downstream business intelligence and Machine Learning development.
About the Product
Rapid River manages several mission-critical data analysis systems for one of our clients.
One of these systems is a generic clickstream ingestion pipeline, which feeds around 1,000 events per second into a Hadoop ecosystem for further analysis by our client’s Data Science team. The data collected represents user activity across a few hundred websites, all owned by our client. The project’s purpose is to use this information to drive effective marketing and, more generally, to better understand their users.
Other systems we manage collect similar user information, but do so for single business units. For these, we provide suites of business intelligence tools that generate reports which feed back into our clients’ sales tools.
About You
- You love automation. You’re always asking: how can we automate this? You’re all over Docker and orchestration tools in general.
- You build solid systems. Thousands of events per second? No problem. You know what’s happening on any system you’ve built at all times (through monitoring, alerts, etc.), and you proactively take measures to address potential future problems.
- You follow best practices and don’t waver when pushed to cut corners. One mistake, and that data is lost forever.
- You’re highly organised. You plan every change meticulously, ensuring that data collection goes on at all times and that the system never loses data.
- You’re concerned about the ethics of data collection. You appreciate the tremendous value that identifying and targeting users has for business, but you also respect privacy and user preferences. You understand the business case for targeting sales campaigns at people who are more likely to engage, but believe such targeting should be done on the user’s terms and shouldn’t “creep them out”.
Skills and Requirements
- 3+ years of experience working with data ingestion and processing technologies such as Kafka and Spark.
- 3+ years of experience working with data warehousing technologies such as Hadoop.
- Solid grasp of major databases like PostgreSQL.
- Solid grasp of functional querying via techniques like MapReduce.
- Experience with containerisation tools like Docker and Kubernetes.
- Experience with user tracking via HTTP cookies, device fingerprinting, etc.
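To give a flavour of the “functional querying” style mentioned above, here is a minimal, self-contained sketch of a map/reduce-style aggregation over clickstream events. The event records and site names are made up for illustration; in production the data would flow through Kafka/Spark or a Hadoop job rather than an in-memory list.

```python
from functools import reduce

# Hypothetical clickstream events, purely for illustration.
events = [
    {"site": "shop.example", "type": "click"},
    {"site": "shop.example", "type": "pageview"},
    {"site": "news.example", "type": "click"},
]

# Map phase: project each event onto a ((site, type), 1) pair.
mapped = [((e["site"], e["type"]), 1) for e in events]

# Reduce phase: fold the pairs into per-key counts.
def merge(acc, pair):
    key, n = pair
    acc[key] = acc.get(key, 0) + n
    return acc

counts = reduce(merge, mapped, {})
print(counts)
```

The same map-then-reduce shape scales out naturally: the map step is embarrassingly parallel, and the reduce step only needs to merge partial counts per key, which is exactly how frameworks like Hadoop MapReduce and Spark distribute this kind of query.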
If you think you have what it takes, get in touch with us by sending an email to firstname.lastname@example.org.