Job Title: HPC Security and Infrastructure Specialist
Location: [Specify Location or Remote]
Employment Type: Full-Time
About Us:
DMV IT Service LLC is a trusted IT consulting firm, established in 2020. We specialize in optimizing IT infrastructure, providing expert guidance, and supporting workforce needs with top-tier staffing services. Our expertise spans system administration, cybersecurity, networking, and IT operations. We empower our clients to achieve their technology goals with a client-focused approach that includes online training and job placements, fostering long-term IT success.
Job Purpose:
We are seeking an experienced HPC Security and Infrastructure Specialist to join our team. This role focuses on managing, securing, and maintaining high-performance computing (HPC) systems, storage solutions, and cloud infrastructure. The ideal candidate will have a strong background in systems administration, high-speed network storage systems, cloud systems, and technical support for complex IT environments.
Key Responsibilities:
- Implement and maintain data management infrastructure and HPC security measures.
- Provide ongoing support for SAN, NAS storage, backup/recovery environments, and virtualization infrastructure.
- Ensure the security, disaster recovery, and service continuity of highly available enterprise storage and backup systems.
- Provide technical support for the installation, configuration, maintenance, upgrade, troubleshooting, and retirement of IT systems.
- Use configuration management tools such as Ansible, Puppet, and Chef.
- Administer high-speed network storage systems, including Mellanox switches and NAS clusters.
- Manage, configure, and support cloud systems, including setting up, maintaining, and troubleshooting cloud compute engines and storage buckets.
- Administer and manage databases such as SQL Server, PostgreSQL, MySQL, and Oracle.
- Assist staff with accessing and utilizing computational resources effectively.
- Collaborate with Labs and DTMB staff to maintain and manage computational resources.
Skills & Experience:
- 10+ years of experience with the Linux CLI and with programming and scripting languages such as R, Python, and Bash.
- 10+ years of experience with workload management systems, particularly SLURM.
- 10+ years of experience in setting up HPC systems, including identifying suitable hardware and software.
- 10+ years of experience with databases such as PostgreSQL, including administration tasks such as installation and support.
- Strong hands-on experience with Network Appliance clustered servers and applicable software.
- Expertise in Linux configuration for storage, networking, load balancing, memory management, VMs, firewalls, and system monitoring.
- 10+ years of experience with computer security and implementing security protocols.
- Familiarity with package management and container tools such as conda, Docker, and Singularity.
- Experience with automation and workflow tools such as Ansible, Puppet, and Nextflow.
- Knowledge of cloud computing, including setting up compute engines and managing storage buckets.
- Strong knowledge of enterprise storage solutions and big data analysis frameworks.
- Ability to provide recommendations for storage optimizations and cost-saving solutions for Labs.
- Familiarity with HL7 messaging and interpreting web.config files for plugins.
- Experience reviewing logs (e.g., IIS logs, Dynatrace logs) to identify excess resource utilization and performance spikes.
- Familiarity with Cloudflare, Forcepoint, and their related rules and policies (e.g., C86 rule).
- Expertise in configuring failover environments for applications.