How to develop ansible-review standards?

One of the very important aspects of the infrastructure as code approach is automated testing of code standards. In my case the code is ansible playbooks, tasks and variables. Looking for an available solution I found a great tool announced by Will Thames on his blog [1] – ansible-review [2]. Unfortunately the example standard shown on [1] is very simple, and the other standards rely on ansible-lint, which doesn't show the full capabilities of ansible-review. In this post I'd like to share my experience with standards development, based on a few examples. I'm not going to discuss the basic usage of ansible-review – you can check that on one of the cited sites.

How to develop ansible-review standards?

In all the examples below I assume you have imported the packages from the example standards.py.

Example 1: Don't allow spaces in task names.

To make it easier to copy a task name (double-click on the text to mark the word) and use it with --start-at-task, it's convenient to forbid spaces in task names. To achieve this with an ansible-review standards.py you may use code similar to the one below:
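A minimal sketch of such a check could look like this (Result, Error and Standard come from the imports mentioned above; treat it as an illustration of the idea rather than the exact code):

def check_task_name(candidate, settings):
    result = Result(candidate.path)
    with open(candidate.path, 'r') as f:
        for lineno, line in enumerate(f, start=1):
            words = line.split()
            # "- name: short" is 3 words, more means the task name contains spaces
            if len(words) > 1 and words[1] == "name:" and len(words) > 3:
                result.errors.append(
                    Error(lineno, "task name contains spaces"))
    return result


# The standard itself: a name, a check function, candidate types and a version
task_name_should_not_have_spaces = Standard(dict(
    name="Task names should not contain spaces",
    check=check_task_name,
    types=["playbook", "task", "handler"],
    version="0.1",
))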

Reading of the code should start at line 14, where we define the standard called task_name_should_not_have_spaces. As required by the ansible-review framework, a standard is an association of:

  1. name – the string describing the standard,
  2. check – a python function that will verify if the standard is matched,
  3. types of ansible elements where the standard applies (tasks, defaults, playbook, etc.),
  4. and version in which the standard is enforced.

In the example above the evaluation is performed by a function called check_task_name, defined in the 1st line of the code snippet. Check functions receive 2 arguments:

  1. candidate – a file being reviewed
  2. and settings – a dictionary with lintdir, the config file and a few other settings. Personally I don't have any particular use case for them in check functions.

What happens in our 1st check function is straightforward. We open the file and read it line by line. If the second word is name: (the first is just -), we check whether the number of words in the line is greater than 3, since this indicates a space in the name. If this is the case we add an Error to the errors list of the Result object returned by the check function.

Example 2: Check if variables defined in vaulted defaults/main.yml file are prefixed with role name.

If you have some experience in maintaining an ansible repository with multiple roles, you probably know that having variables defined in different places sometimes makes it difficult to predict the result. Because of that it may be a good practice to prefix variables defined in defaults with the name of the role where they were defined. Even if you overwrite such a variable in group settings, it will always remind you that the initial idea was to use it within a role. It's quite common to store secrets like passwords and API tokens in defaults, so the good practice is to encrypt those files. In the example below, in addition to the standard ansible-review imports, we use the ansible-vault python package to handle data decryption.
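A sketch of such a standard could look as follows (again only an illustration: the role name is parsed from the path under the assumption of a standard roles/&lt;name&gt;/defaults/main.yml layout, and the Vault class comes from the ansible-vault package on PyPI):

import re
from ansible_vault import Vault  # the ansible-vault package from PyPI


def check_defaults_variable_names(candidate, settings):
    result = Result(candidate.path)
    # double verification that this really is roles/<name>/defaults/main.yml
    role = re.search(r'roles/([^/]+)/defaults/main.yml$', candidate.path)
    with open('/etc/ansible/vault-password') as pwfile:
        vault = Vault(pwfile.read().strip())
    if role:
        defaults = vault.load(open(candidate.path).read()) or {}
        for variable in defaults:
            if not variable.startswith(role.group(1) + '_'):
                result.errors.append(Error(None, "variable %s is not "
                                           "prefixed with the role name" % variable))
    return result


all_defaults_start_with_rolename = Standard(dict(
    name="Variables in defaults should be prefixed with the role name",
    check=check_defaults_variable_names,
    types=["defaults"],
    version="0.1",
))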

As you can see in the listing above, we define the standard called all_defaults_start_with_rolename that will be applied only to defaults. In the check function we parse the role name out of the file path (actually doing a double verification that the file is a defaults/main.yml). As you see in line 9, it's assumed that the vault password is stored in /etc/ansible/vault-password; then we use the Vault.load method to decrypt and read the file. This method internally executes the yaml parser, which stores the input into the defaults dictionary. Our next step is a simple iteration over the keys in the dictionary to check their names. If one is missing the appropriate prefix, another entry is added to the result.errors list.

Example 3: Make sure that all tasks in role have a standard “role” tag.

One of the very important differences between ansible and puppet is that with ansible it's more natural to push configuration changes to hosts than to automatically execute everything on them. When you're deploying changes you normally use playbooks containing much more than the latest modification. I find it convenient to have standard tags for all tasks in a role. This is something you can achieve with the standard defined below:
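A sketch of the standard (parse_yaml_linenumbers is one of the helpers available from the ansible-review/ansible-lint utilities and is assumed to be imported, as are re, Result, Error and Standard):

def check_tasks_tagged_with_role_name(candidate, settings):
    result = Result(candidate.path)
    # the role name is parsed out of the path roles/<name>/tasks/...
    role = re.search(r'roles/([^/]+)/tasks/', candidate.path)
    if not role:
        return result
    role_tag = "role_" + role.group(1)
    tasks = parse_yaml_linenumbers(open(candidate.path).read(), candidate.path) or []
    for task in tasks:
        if role_tag not in task.get('tags', []):
            result.errors.append(Error(task['__line__'],
                                       "task is not tagged with %s" % role_tag))
    return result


all_tasks_should_have_role_tag = Standard(dict(
    name="All tasks in a role should be tagged with role_ROLENAME",
    check=check_tasks_tagged_with_role_name,
    types=["task"],
    version="0.1",
))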

It's quite similar to the previous example. The input data is not encrypted, so instead of Vault.load the YAML is loaded by the parse_yaml_linenumbers function; besides that the logic is easy to understand – the loop in line 9 iterates over all tasks and checks if they are tagged with role_ROLENAME.

In this case it's quite important to emphasise that this may not be sufficient for tasks included dynamically from main.yml – those will not inherit the tag, so the logic should be a little bit more sophisticated. If you have this issue addressed – let me know 🙂

Example 4: Fail on services restarted in tasks.

Personally I think that services should never be restarted in tasks; this should always be done by a handler. Tasks should only use service states like started or stopped. An implementation of this may look like:
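An illustrative sketch (the same assumptions about imports as in example 3 apply; as noted below, only tasks written in the full YAML form are handled):

def check_services_not_restarted_in_tasks(candidate, settings):
    result = Result(candidate.path)
    tasks = parse_yaml_linenumbers(open(candidate.path).read(), candidate.path) or []
    for task in tasks:
        # only the full YAML form (service: followed by a dictionary) is handled;
        # inline module arguments and local_action are out of scope here
        if 'service' in task:
            if isinstance(task['service'], dict) and task['service'].get('state') == 'restarted':
                result.errors.append(Error(task['__line__'],
                                           "services should be restarted by handlers, not tasks"))
    return result


tasks_should_not_restart_services = Standard(dict(
    name="Services should not be restarted in tasks",
    check=check_services_not_restarted_in_tasks,
    types=["task"],
    version="0.1",
))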

The most crucial part is the if statement in line 7. It checks if the task has a service key, indicating the module used. In this case we check if state was specified and report an error when it's restarted. As for example 3, there are some cases where this standard will not work, such as tasks not written in full YAML format (inline module arguments) or using local_action.

I hope those examples will help you develop your own ansible-review standards!

[1] http://willthames.github.io/2016/06/28/announcing-ansible-review.html
[2] https://github.com/willthames/ansible-review


Notes from the XDMoD patch mitigating the issue with overestimated wall time for suspended jobs.

XDMoD[1] is a fantastic tool that provides various summaries of HPC cluster accounting. It supports all popular HPC resource managers including Slurm[2], which has been the queuing system of my choice for more than 5 years. I have a very good opinion of XDMoD code quality, so the day I saw utilization of the cluster being over 100% for a few days my eyes were out on stalks. (You can see the plot of the stacked per-project utilization of the cluster below.) Checking the queuing system accounting I simply used the sreport command, which showed that the nodes for a specific day were occupied all the time – but that should end up at 100%, shouldn't it?

Notes from the XDMoD patch mitigating the issue with overestimated wall time for suspended jobs.

Digging deeper into the details of the jobs executed during this period I noticed that there were several with a long "Suspended" time. My guess was that maybe this is the reason for XDMoD's overestimated cluster utilization. Asking this question on the support mailing list I received a confirmation that similar issues were observed on clusters with job suspension enabled (thanks to Trey Dockendorf for prompt replies). Finally, checking the code I understood one very fundamental difficulty – the queuing system doesn't provide the time slots when the job was actually running, so XDMoD, creating a plot as a function of time, doesn't have all the information required to work 100% correctly.

I feel like the best approach would be to add another parameter to XDMoD, like a fraction of time, or to change the number of CPUs to a floating point number and, in the case of suspended or gang-scheduled jobs, recalculate it as numberOfCoresUsed * (endTime – startTime) / wallTime. I didn't feel like this is a change I can implement within a few hours and get merged upstream, but in my case the issue was coming only from a few short jobs (a few minutes of wall time) that were suspended for more than a day.

The quick fix for me was to add another date validation to the "Shredder" code – simply, if endTime – startTime is larger than the wallTime provided by the queuing system, I falsify endTime with the value of startTime + wallTime. Such an approach won't fix all potential issues, but it mitigates them a lot. The issue will still be visible in the case of gang-scheduled jobs: XDMoD will show all of them running in the same time period (utilization over 100%), ending earlier than it really happened (and appropriately lower utilization of the cluster in that period). However, the clear benefit is that the total utilization will be correct. At the time of writing this post I'm trying to get this merged into the XDMoD github project. We'll see if it will be accepted [3], but if you're looking for this partial fix just update your code with the patch from the gist below.
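The patch itself touches the XDMoD PHP shredder, but the whole logic boils down to a tiny sanity check, roughly like this (field names here are made up, just to illustrate the idea):

def sanitize_job(job):
    # Illustration only: cap end_time so that end_time - start_time
    # never exceeds the wall time reported by the queuing system.
    if job["end_time"] - job["start_time"] > job["wall_time"]:
        job["end_time"] = job["start_time"] + job["wall_time"]
    return job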

Since this applies changes at the "Shredder" stage, it won't fix data you have already ingested. To achieve that, you'll have to remove this part of the data and ingest it one more time. This requires manual job data removal from the XDMoD backend databases, which can be done with the help of the script below.

[1] https://open.xdmod.org/8.0/
[2] https://slurm.schedmd.com/
[3] https://github.com/ubccr/xdmod/pull/729

[SOLVED] Singularity 2.6 – fails to resize writable container

Executing singularity image expand -s 1G ./mycontainer.sigm failed for me with the following error message:

e2fsck 1.41.12 (17-May-2010)
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open ./centos.simg

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 

which suggests that e2fsck is simply called against a file/device that doesn't have an EXT file system inside. However, because I knew that this is a writable image[1], the EXT file system had to be there. My guess was that the file system doesn't start at the very beginning of the file, which… happened to be the issue.

Debugging and fixing the singularity image expand failure.

I was able to write inside the container, so it was easy to find out how the file system is mounted. Simply run a sleep process inside singularity and check the mounts of the process in the /proc/PID directory. In my case the following commands were helpful:

[root@srv ]# singularity exec ./centos.sigm sleep 10m &
[3] 6342
[root@srv ]# ps aux | grep sleep
root      6342  0.0  0.0 100916   608 pts/18   S+   10:22   0:00 sleep 10m
root     15721  0.0  0.0 103252   864 pts/24   S+   06:56   0:00 grep sleep

I've added the ps output just to make it obvious that the PID returned by singularity is actually the sleep running within the singularity container.

For the purpose of the post I'll show only the | grep loop part of the process mounts, since the full output is quite long. As you'll see in the listing below, the device is mounted as an ext3 file system in read-only mode, which is the case because I didn't add the --writable option to my singularity exec.

[root@srv ]#cat /proc/6342/mounts | grep loop
udev /dev/loop1 devtmpfs rw,relatime,size=132263744k,nr_inodes=33065936,mode=755 0 0
/dev/loop2 /local/singularity/mnt/container ext3 ro,nosuid,relatime,errors=remount-ro,barrier=1,data=ordered 0 0
/dev/loop2 /local/singularity/mnt/final ext3 ro,nosuid,relatime,errors=remount-ro,barrier=1,data=ordered 0 0
udev /local/singularity/mnt/final/dev/loop1 devtmpfs rw,relatime,size=132263744k,nr_inodes=33065936,mode=755 0 0

Let's check how the loop device was created, to verify if the issue is really an offset of the ext3-formatted space inside our file:

[root@srv ]# losetup -a | grep loop2
/dev/loop2: [001d]:6712556 (), offset 31

Bingo! The offset is 31; simply creating a loop device manually with this offset and running tools like dd, e2fsck and resize2fs allowed me to resize the container file system. Checking the code I found that in version 2.6 the whole responsibility lies with the shell script called image.expand.exec. I'm not sure if the offset is always 31, but if it is in your case you can use the patch below (it's done against the 2.6 tag).

Thanks to @jmstover [2] I know that those 32 bytes are something expected in every .sigm file, since it's simply a shebang:

[root@usinkok-log01 singularity]# head -c 32 ./centos.sigm
#!/usr/bin/env run-singularity

The goal of it is to allow simplified execution of applications in the container – running for instance ./myContainer.sigm. Nevertheless, the offset is fixed, so I submitted my patch as a pull request[3].

[1] http://singularity.lbl.gov/docs-flow#writable-image
[2] https://github.com/sylabs/singularity/issues/1076
[3] https://github.com/sylabs/singularity/pull/2150

What one should know developing an API 2 API serverless translating proxy.

As feedback on the last post about the new version of snow-grafana-proxy[1] I got a question about similar functionality implemented as an AWS Lambda, which is a cloud service dedicated to serverless infrastructure. In this model you pay for memory*time computing hours and you don't have to think about the platform. You may think that you have to keep something running because it has to listen for incoming connections, but here the AWS API Gateway service comes into play, allowing you to configure an HTTP listening endpoint that executes an AWS Lambda function per request. The whole concept is depicted on the schema.

How to configure AWS API gateway with Lambda functions working as a backend?

This question has been answered a number of times, so instead of repeating it I'll just redirect you to the AWS docs and the blog post I read [2,3]. If you are interested in the configuration done from awscli, you can find the appropriate commands in the README file in the subproject directory [4].

I'd like to focus on a few hints for those who would like to create a similar service, so the topic is:

What one should know developing an API 2 API serverless translating proxy.

1) Prepare your own local test cases.
Although it's possible to test everything on the AWS Lambda free tier, I don't think it's an efficient test procedure. For some reason I'm a vim user and I really don't use any sophisticated IDE – maybe this is why the option to edit the script in the AWS web interface was not comfortable enough for me. The easiest way was to add a few last lines:
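In my case those last lines were just a tiny local test harness at the end of the Lambda script, something along these lines (the handler name and the event shape are assumptions – adjust them to your function):

if __name__ == '__main__':
    import json
    # a hand-crafted event imitating what API Gateway would send
    fake_event = {'body': json.dumps({'targets': [{'target': 'incidents'}]})}
    print(lambda_handler(fake_event, None))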
With those few lines I was able to test the proxy -> service-now.com interface without the need to repeatedly update the Lambda function's code. Updating requires zipping the files and uploading the new version; both are quick operations, but still, if you're doing it tens or hundreds of times, even an additional 5 seconds matters.

2) Before you start, check if everything can be serverless.
In my case it's really true – one can use grafana as a service, and ServiceNow is also hosted for users without the need to have your own servers, so a snow-grafana-proxy based on SaaS makes perfect sense. If you need to maintain the platform for one of those sides anyway, a lightweight translating proxy can and should be deployed there.

3) Check if the backend API replies promptly.
I was testing this on an instance from the ServiceNow developers program, but from our company reality I also know that it may take some time to get an answer from ServiceNow. Of course, you can adjust the AWS Lambda function timeout; however, you pay for execution wall time. It makes no difference whether your function was waiting or doing real work, so from a cost perspective my tests on the developer instance, which showed the need to increase the timeout to ~30s, were not very promising.

4) Serverless means no daemon.
This may simplify development of the "translator", but it will also limit your possibilities. In my case I had to remove a lot of in-memory caching, since it didn't make sense. A subsequent call will start the process from scratch – it won't "remember" the result we got from the backend API 5s ago. Of course you can use another cloud service to store this. Interested in this – create a pull request or open an issue with suggestions 🙂 You can also use the caching capabilities of AWS API Gateway, but… in the case of API 2 API it's highly probable that configuring this won't be a piece of cake – for instance, parts of the request may not be relevant, which means that you may get a lower hit rate than possible.

5) Hard-coding may not be a bad idea.
Simply hardcoding configuration into the scripts will reduce the number of dependencies. It's important because if you need additional modules (not available in AWS Lambda by default) you have to create a deployment package [5]. It's not difficult, especially if you're developing your functions in a virtualenv, but it will have an impact on price, since it will increase memory requirements (important if you're going above the minimum of 128MB and you're not waiting 20s for a backend reply – as I was :).

Finally, special thanks go to @JanGarlaj who opened the feature request and gave me a few hints on how the implementation should look.


[1] https://github.com/cinek810/snow-grafana-proxy/releases/tag/v2-rc1
[2] https://serverless.com/blog/flask-python-rest-api-serverless-lambda-dynamodb/
[3] https://docs.aws.amazon.com/lambda/latest/dg/python-programming-model-handler-types.html
[4] https://github.com/cinek810/snow-grafana-proxy/tree/master/aws-lambda
[5] https://docs.aws.amazon.com/lambda/latest/dg/lambda-python-how-to-create-deployment-package.html

Integration for ServiceNow table API and grafana.


Some time ago I wrote a blog post about my approach to ServiceNow and grafana integration; you can find it under this link[1]. The key concept used there is presented on the diagram below (grafana-snow integration scheme). Besides the technical aspects of the integration, the operational results were very good and reduced the time incidents were spending in my "queue" – simply by giving an overview of what is assigned to whom and what the status of the tickets is. However, due to the lack of flexibility in the 1st version of the snow-grafana-proxy implementation, it was difficult to reuse it in other places. The attributes returned to grafana, the lookup methods and the table were hard-coded. I decided to rewrite the service and here we are.

New version of snow-grafana-proxy available!

You can find the new release on the project's github page[2]. I can say it's in a beta phase – there are no known issues. However, if you encounter any difficulty just let me know by opening an issue on the project's github page. (I've been testing it on a Kingston developer instance.)

New configuration file

The configuration file format changed from ini to YAML. This change allowed a much more structured configuration. In its current state it's possible to configure multiple queries, against any service-now table, with arbitrary filters. Each value has a configurable "interpreter"; the ones available at the time of publication are:

  • none – Simply return the value of the attribute specified as the "name" argument.
  • map – Use the "map" dictionary defined for the attribute and send the corresponding values, assuming that the value from service-now is a key of the "map" dictionary.
  • object_attr_by_link – Assumes that for this attribute the service-now API returns a value/link pair. In this case an additional HTTP request may be needed to get the information available under the link. This interpreter requires additional parameters specified in the interpreterParams dictionary, for instance: interpreterParams: { linkAttribute: "name", default: "FailedToGetName"} will send to grafana the value of the name attribute available under the link from the previous GET request. In case of failure the interpreter will return the value "FailedToGetName". The default value is important since sometimes the value is really undefined – like the description of the assignment group for an unassigned incident. Those values are cached until a snow-grafana-proxy restart, which greatly reduces the number of REST calls to service-now; roughly as in the sketch below.
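To give you an idea of what this interpreter does (this is not the actual snow-grafana-proxy code, just a rough sketch; session is assumed to be a requests.Session already authenticated against your instance):

link_cache = {}  # kept until snow-grafana-proxy restart


def object_attr_by_link(field, interpreter_params, session):
    # field is the value/link pair returned by the service-now table API, e.g.
    # {"value": "<sys_id>", "link": "https://<instance>/api/now/table/sys_user_group/<sys_id>"}
    link = field.get('link')
    if not link:
        return interpreter_params.get('default')
    if link not in link_cache:
        try:
            record = session.get(link).json()['result']
            link_cache[link] = record[interpreter_params['linkAttribute']]
        except Exception:
            link_cache[link] = interpreter_params.get('default')
    return link_cache[link]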

An example configuration file is available in the repository, let me quote it here:

As you see there are a few additional parameters I forgot to explain:

  • cacheTime – this will cache query replies for the specified number of seconds, so if we get the information once and someone does the same query after a few seconds (another user having the grafana dashboard opened, or even the same one, but with a very short auto-refresh time), the proxy will reply immediately with the same information.
  • snowFilter – the value passed to the ServiceNow API as request parameters. You can check the available parameters in the snow documentation[3].
  • table – quite self-explanatory, the name of the table you'd like to get information from, like incident, sc_task, change.

Results of the example configuration file may look like the images below.

New command line options

In addition to the changed configuration file, this version supports two command line options, explained in the command's help:

 usage: snow-grafana-proxy.py [-h] [-c CONFIGFILE] [-d]

Simple JSON data source for grafana retreving data from service-now

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIGFILE, --configFile CONFIGFILE
                        Configuration file to use
  -d, --debug           Start in foreground and print all logs to stdout

Yes! By default snow-grafana-proxy will daemonize itself; if you'd like to run it in the foreground for debugging purposes, just use the -d option as explained above.

Feedback appreciated!

[1] https://funinit.wordpress.com/2018/02/20/simple-integration-of-servicenow-and-grafana/
[2] https://github.com/cinek810/snow-grafana-proxy/
[3] https://docs.servicenow.com/bundle/geneva-servicenow-platform/page/integrate/inbound_rest/reference/r_TableAPI-GET.html

[VLOG] How to configure sssd idmap parameters to get minimal collision probability?

This post covers the quite complicated topic of sssd configuration parameters for SID to UID/GID mapping. I promised to share this with you a few weeks ago. The key aspect here is to understand the principles of the mapping algorithm implemented in sssd, which is something I described in a previous post and vlog; however, the consequences may not be so obvious.

To help myself configure parameters like ldap_idmap_range_max, ldap_idmap_range_min or ldap_idmap_range_size, I developed a mapping simulator – you can find the code on github. At the time of this post's publication all parameters are hardcoded, but it's very easy to adjust the script to your needs.
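To give you an idea of what the simulator estimates, here is a heavily simplified sketch: sssd picks a slice for each domain by hashing the domain SID (murmurhash3 in the real implementation; any uniform hash gives a comparable collision estimate), so the question is how likely two domains land in the same slice for a given range configuration. The range values below are just examples:

import hashlib
import random

# rough equivalents of ldap_idmap_range_min/max/size (example values)
range_min, range_max, range_size = 200000, 2000200000, 200000
slices = (range_max - range_min) // range_size


def slice_for(domain_sid):
    # stand-in for sssd's murmurhash3-based slice selection
    return int(hashlib.sha1(domain_sid.encode()).hexdigest(), 16) % slices


domains = ["S-1-5-21-%d-%d-%d" % (random.getrandbits(32),
                                  random.getrandbits(32),
                                  random.getrandbits(32)) for _ in range(50)]
used = [slice_for(sid) for sid in domains]
print("slices: %d, domains: %d, collisions: %d"
      % (slices, len(domains), len(used) - len(set(used))))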

Interested in some details? Watch the video.

Ansible role used as plugin for simple REST API services.

For quite a long time in our infrastructure we've been managing services in icinga2 based on a set of lists defined in ansible variables. The advantage of this approach was a nice icinga2 configuration with one file per host gathering all services. However, the direct consequence of this was the need to manage all monitoring endpoints within one ansible task – simply a template with the definition of a host and all "apply" statements. At some point this approach started to be annoying, because all the icinga definitions were in one role – different from the software and monitoring plugins deployment. The second issue was files with variables reaching a thousand lines (reading such a long yaml is not a pleasure). One day I thought – it's time to switch to service creation over the API, there must be an ansible module that will help us.

How to dynamically create icinga2 services from ansible?

Surprisingly, when you check the ansible website[1] you'll notice that, at the time of this post's publication, there are two icinga2 modules in the main distribution – icinga2_feature and icinga2_host. I had some experience with the icinga2 API, so my first thought was to develop an icinga2_service module or join some project working on it. However, after a few minutes I realized that it's really only about a few REST calls, which can be implemented as an ansible role applied to other roles as a dependency.

The icinga2 API has basically 4 calls for service objects, each realized by a different HTTP method. Those calls are really self-explanatory (a rough python sketch of the same calls follows the list):

  • GET, to check if a service exists and read its attributes
  • PUT, to add a new service
  • POST, to update an existing one
  • DELETE, to remove it.
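The sketch below shows those calls done directly with requests (the endpoint, the API user and the object name are illustrative – the role wraps equivalent calls in ansible tasks):

import requests

api = "https://icinga-master.example.com:5665/v1/objects/services"
name = "usinkok-acc01!TestService2"          # icinga2 object name: host!service
session = requests.Session()
session.auth = ("ansible-api", "secret")     # an icinga2 ApiUser
session.headers.update({"Accept": "application/json"})
session.verify = False                       # or point it at your CA certificate

# GET - check if the service already exists
check = session.get("%s/%s" % (api, name))

if check.status_code == 404:
    # PUT - create a new service object
    session.put("%s/%s" % (api, name),
                json={"templates": ["generic-service"],
                      "attrs": {"check_command": "PassiveCheck"}})
else:
    # POST - update the existing object (icinga2 answers 200 in both cases)
    session.post("%s/%s" % (api, name),
                 json={"attrs": {"check_command": "PassiveCheck"}})

# DELETE - remove the service when its state should be "deleted"
# session.delete("%s/%s?cascade=1" % (api, name))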

The basic design of the role is to pass it information about the service to be monitored, with a parameter describing its state (deleted/present) and a dictionary with all necessary service attributes (name, check_command, etc.). The first task will just check if the service exists and register the result as a variable; then, if the service state is present, we create or update the existing service (based on the registered result), or delete the service if the registered result of GET shows that the service exists. You can check the full implementation in the github gist [2]. If you want to add a specific monitoring service to software deployed by your role, you can simply add the line below to meta/main.yml:
dependencies:
  - { role: icinga2-service, service: { state: present, command: PassiveCheck, name: TestService2 } }


Or apply it in a playbook, like here:

- hosts:
    - usinkok-acc01
  roles:
    - icinga2-service
  vars:
    service:
      name: TestService2
      state: present
      command: PassiveCheck

The only missing piece is the check whether the service was changed during the update. Unfortunately, the current icinga2 reply to POST is simply 200 in both cases, making it more difficult to correctly implement changed_when.

I'm checking the possibility of adding this feature to icinga2. If I'm successful on this front I'll let you know here :).
Nevertheless, what do you think about this approach? I thought about import_tasks as an alternative, but in this case I'm not sure where I can store sensitive variables like the API user's secret password. If you want to share your ideas, feel free to leave a comment.

[1] https://docs.ansible.com/ansible/2.5/modules/list_of_monitoring_modules.html
[2] https://gist.github.com/cinek810/e41fac186bbcd7645875833b97b1b372#file-tasks_main-yml

[VLOG] How sssd maps SID to UID/GID

A few months ago I wrote a blog post analysing idmap – an internal library of sssd. I received quite positive feedback about it, but also a few requests for something more high level, explaining the basics without direct reference to the sssd code.
Here we are – the 1st video on funinit.
PS. The date in the lower right corner is wrong and really not needed – forgive me, it's my 1st vpost 🙂


You may also be interested in the next one about sssd algorithmic mapping: How to configure sssd idmap parameters to get minimal collision probability.

How to use the job_submit_lua plugin with Slurm?

Lua is a scripting language implemented as a C library, which makes it a perfect choice for small plugins to bigger C applications. It's both efficient (since it's really a C library, the lua file can be read once while its functions are executed multiple times) and easy to modify.

In the case of a queuing system like Slurm[1] there is always a need for customization that will fulfill organization-specific requirements – like some preliminary checks done on job submission. For this, the Slurm framework offers so-called job_submit plugins, which are normally shared libraries (.so) implementing two required functions, job_submit and job_modify, and two optional ones, init and fini. While compiling a .c file into a shared library is not a big deal, there are situations where one would rather use a scripted version; for those Slurm provides the job_submit_lua plugin that simply calls a lua script for the real work. In this post I'll cover the issues I met configuring it.

How to use job_submit_lua plugin in Slurm?

Compile Slurm with job_submit_lua.so

When you run ./configure it has to discover the lua libraries to link against them. In my case (CentOS 6), when I just installed lua-devel (yum -y install lua-devel) and executed ./configure, I noticed

configure: WARNING: unable to locate lua package

and in config.log

[...]Package lua was not found in the pkg-config search path.                                                                                                                
Perhaps you should add the directory containing `lua.pc'                                                                                                                
to the PKG_CONFIG_PATH environment variable                                                                                                                             
No package 'lua' found                                                                                                                                                  
configure:24443: $? = 1                                                                                                                                                 
configure:24457: result: no                                                                                                                                             
No package 'lua' found                                                                                                                                                  
configure:24533: WARNING: unable to locate lua package
[...]


A simple check with manual execution of pkg-config confirmed that it doesn't exist:

[root@hpc-slurmtest slurm-17.11.7]# pkg-config --exists --print-errors "lua-5.1"                                                                                        
Package lua-5.1 was not found in the pkg-config search path.                                                                                                            
Perhaps you should add the directory containing `lua-5.1.pc'                                                                                                            
to the PKG_CONFIG_PATH environment variable                                                                                                                             
No package 'lua-5.1' found


OK… maybe CentOS 6 is too old and has an older version of lua, let's check:

[root@hpc-slurmtest slurm-17.11.7]# yum info lua-devel     | grep '^Version'                                                                                                             
Version     : 5.1.4

So maybe it doesn't provide a .pc (pkg-config) file:

[root@hpc-slurmtest slurm-17.11.7]# rpm -ql lua-devel | grep '.pc'
/usr/lib64/pkgconfig/lua.pc

The file is there, but its name is not lua-5.1 or lua5.1, it's just lua. Let's check if creating a symlink will help the .m4 macro find it:

ln -s  /usr/lib64/pkgconfig/lua.pc /usr/lib64/pkgconfig/lua-5.1.pc
./configure
[...]
checking for lua... yes
checking for whether we can link to liblua... yes lua
[...]

I tried to check how the macro is implemented – you can find it in auxdir/x_ac_lua.m4. I rewrote it to a more standard use of PKG_CHECK_MODULES and asked SchedMD for an opinion[2]. Additional research revealed that the lua development team doesn't provide pkg-config files[3]. They are created by distribution packagers, so they may differ between operating systems, and the dirty implementation in the Slurm autotools macro may just be fixing incompatibility issues.

Remember that if you play with macros used by autotools you have to recreate configure. Slurm follows a quite standard way of doing it with make configure, but it looks like the Makefile dependencies are not implemented correctly – I had to remove configure manually, because make didn't notice the changes in x_ac_lua.

Where should I put my .lua script?

I got it compiled, so I could switch to the implementation of my .lua script – but where to create it? Checking the slurm.conf manual in the section about JobSubmitPlugins, you'll find an explanation that was quite enigmatic, at least for me when I read it:

JobSubmitPlugin
[...]
For examples of use, see the Slurm code in "src/plugins/job_submit" and "contribs/lua/job_submit*.lua" then modify the code to satisfy your needs. Slurm can be configured to use multiple job_submit plugins if desired, however the lua plugin will only execute one lua script named "job_submit.lua" located in the default script directory (typically the subdirectory "etc" of the installation directory). No job submission plugins are used by default.

When I read this for the first time I was unable to understand where I should put my job_submit.lua, but a quick check of src/plugins/job_submit/lua/job_submit_lua.c showed that the location comes from DEFAULT_SCRIPT_DIR, which is:

[root@hpc-slurmtest slurm-17.11.7]#  grep -ri  DEFAULT_SCRIPT_DIR ./src/plugins/job_submit/lua/Makefile
./src/plugins/job_submit/lua/Makefile:AM_CPPFLAGS = -DDEFAULT_SCRIPT_DIR=\"$(sysconfdir)\" \

Simply ${prefix}/etc – in other words, next to slurm.conf. And I agree, the documentation stated it clearly 🙂

Develop the plugin

My goal was to develop a plugin that will prevent submission of jobs without an explicit account specification (we found out that users don't adjust the account specification, simply relying on the default, which caused some internal billing issues). I simply copied the example and changed the job_submit function to a very simple one:

function slurm_job_submit(job_desc, part_list, submit_uid)

        if job_desc.account == nil then
                slurm.log_user("You have to specify account. Usage of default accounts is forbidden.")
                return slurm.ESLURM_INVALID_ACCOUNT
        end
end

I added a JobSubmitPlugins=lua line to slurm.conf, restarted slurmctld and attempted to test it with srun --pty bash -l. Surprisingly, the error message, the one from slurm.log_user(), was displayed but the job started, which means that my return statement didn't really return an error. I just used ESLURM_INVALID_ACCOUNT since it's one of the error values defined in slurm/slurm_errno.h and was the most suitable for me. However, how is it passed to lua?

Digging into the Slurm code one more time, you'll see it's passed to the lua library in the job_submit plugin's init function – and not all error codes are there. Now the sentence I like: "It's open source, so just add what you need". One more time I created a patch and sent it to the main development team, to check if it's fine and to be able to use my script after a Slurm upgrade. It was accepted in a few hours[4] and the job_submit script worked as expected:

[root@hpc-slurmtest slurm]# srun --pty bash -l
srun: error: You have to specify account. Usage of default accounts is forbidden.
srun: error: Unable to allocate resources: Invalid account or account/partition combination specified

[1] https://slurm.schedmd.com/
[2] https://bugs.schedmd.com/show_bug.cgi?id=5263
[3] http://lua-users.org/lists/lua-l/2010-05/msg00641.html
[4] https://github.com/SchedMD/slurm/commit/2aa74b52106ff41a39e849839774e6e499807a1e

Recovering orphaned vmware VMs from PowerShell

As a result of a vmware cluster outage, I ran into a situation where some VMs were "orphaned". The way to recover them is to remove the VM object (if one exists) from the inventory and then register a new VM from the vmx file we have on the data store. If you have a substantial number of VMs in this state, the only reasonable way to do that is to use a PowerShell script. Searching the web you can find more than a few long and complicated scripts designed to deal with this situation. However, I'm probably not the only one sceptical about using scripts from the web on a production environment in a critical state without fully checking what the script does. Finally, this takes more time than developing your own script. So instead of sharing a long .ps1, I'll try to gather short snippets of code addressing bits of the "exercise".
The options -RunAsync and -Confirm:$false are obviously optional, but if you're trying to automate the process both are helpful.

Recovering orphaned vmware VMs from PowerShell

How to find the .vmx file for my orphaned VM?

asnp *vmware*
Connect-VIServer -Server myVcenterServerAddress
$myDS="vsanDatastore"
$DS=Get-Datastore -Name $myDS
$DSView=Get-View -Id $DS.id
$DSBrowser = get-view $DSView.browser
$SearchSpecObj = New-Object VMware.Vim.HostDatastoreBrowserSearchSpec
$searchRes = $DSBrowser.SearchDatastoreSubFolders(( "[" + $myDS + "]" ), $SearchSpecObj)

$myVM="MyVMName*"
foreach ( $dir in $searchRes )
{
    $VMXFile = ($dir.file | where { $_.Path -like "*.vmx" })
    if ( $VMXFile.Path -like $myVM)
       {
          Write-Host $VMXFile.Path
        }
}

SearchDatastoreSubFolders – may take a while to execute, but you can comment it out if you're executing the script repeatedly from PowerShell ISE.

How to register a vm from .vmx file?

New-VM -VMFilePath $VMXPath -VMHost $vmHost -RunAsync

Remember that this will not start the VM; you need to execute Start-VM.

What is the valid path for the -VMFilePath option? ("[…] is not valid path to a virtual machine")

The correct format for this option is: '[yourDataStoreName] directory/yourvm.vmx'. If you are in the foreach loop from the 1st listing, you can build it like here:

  foreach ($dir in $searchRes )
  {
    $VMFolder = (($dir.FolderPath.Split("]").trimstart())[1]).trimend('/')
    $VMXFile = ($dir.file | where { $_.Path -like "*.vmx" })
    $VMXPath = ("[" + $myDS + "] " +  $VMFolder + "/" + $VMXFile.Path)
  }

Noticed the space after "]"? It's required.

How to find VMHost (ESX) appropriate for my VM?

If you have an orphaned inventory entry you can still get this as:

$vmHost=(Get-VM -Name $myVM).VMHost.name

How to remove the orphaned entry from the inventory?

(“The specified key, name, or identifier already exists”)
Remember that you should get the VMHost from this entry before you remove it.

Remove-VM -VM VM-NAME	-Confirm:$false