developersguide:liveupdate

Differences

This shows you the differences between two versions of the page.

developersguide:liveupdate [2018/09/03 13:31]
stux revert this code tag back to what it was
developersguide:liveupdate [2022/02/12 22:42] (current)
stux renamed service(8) to minix-service(8) in various places
Line 205: Line 205:
 === Live update commands ===
  
-RS can be instructed to perform live updates through the service(8) command, specifically through its **service update** subcommand. This command is also used by the automated scripts. For a full overview of the command's functionality, please see the service(8) manual page as well as the command's output when it is run with no parameters.
+RS can be instructed to perform live updates through the minix-service(8) command, specifically through its **minix-service update** subcommand. This command is also used by the automated scripts. For a full overview of the command's functionality, please see the minix-service(8) manual page as well as the command's output when it is run with no parameters.
  
-In its most fundamental form, the //service update// command will update a running service, identified by its label, to a new version provided as an on-disk binary file. It is however also possible to tell RS to update the service into a copy of itself. In addition, various flags and options can be used for fine-grained control of the live update action. The basic syntax to perform a live update on a single system service is as follows:
+In its most fundamental form, the //minix-service update// command will update a running service, identified by its label, to a new version provided as an on-disk binary file. It is however also possible to tell RS to update the service into a copy of itself. In addition, various flags and options can be used for fine-grained control of the live update action. The basic syntax to perform a live update on a single system service is as follows:
  
-  minix# service [flags] update [self|<binary>] -label <label> [options]
+  minix# minix-service [flags] update [self|<binary>] -label <label> [options]
  
 Through various combinations of this command's parameters, MINIX3 basically supports four types of updates, representing increasingly challenging conditions for the overall live update infrastructure in general, and state transfer in particular. We will now go through all of them, and explain how they can be performed. For more details regarding what is actually going on below the surface, please consult the developers guide section of this document.
Line 215: Line 215:
 == Identity transfer ==
  
-The first update type is **identity transfer**. In this case, the service is updated to an identical copy of itself, with all functions and static data in the new instance located at the exact same addresses as the old instance. Identity transfer bluntly copies over entire memory sections at once, thus requiring no instrumentation at all. This makes it suitable for testing of the MINIX3-specific side of the live update infrastructure, hence its use in the ''testrelpol'' script. Identity transfer is the default of the service(8) command when "self" is given instead of a path to a new binary:
+The first update type is **identity transfer**. In this case, the service is updated to an identical copy of itself, with all functions and static data in the new instance located at the exact same addresses as the old instance. Identity transfer bluntly copies over entire memory sections at once, thus requiring no instrumentation at all. This makes it suitable for testing of the MINIX3-specific side of the live update infrastructure, hence its use in the ''testrelpol'' script. Identity transfer is the default of the minix-service(8) command when "self" is given instead of a path to a new binary:
  
-  minix# service update self -label pm
+  minix# minix-service update self -label pm
  
 This will perform an identity transfer of the PM service. Identity transfer should work for literally all MINIX3 system services. As mentioned, it is guaranteed to work only when the system was built with ''MKMAGIC=yes'', although it will mostly work on systems built without magic support as well. It works regardless of whether the target service was instrumented with the magic framework (or ASR).
  
-If the live update is successful, the service(8) command will be silent, but RS will print a system message that the update succeeded:
+If the live update is successful, the minix-service(8) command will be silent, but RS will print a system message that the update succeeded:
  
   RS: update succeeded
Line 227: Line 227:
 If the system was started on qemu with ''OUT=F'', this message will end up in ''serial.out''. Otherwise, the message should show up in the MINIX3 system log (''/var/log/messages'') and possibly on the first console.
  
-If the live update fails, RS should print an error to the system log, and service(8) will complain. In order to debug such failures, it may be useful to enable verbose mode in RS, by starting the system with ''rs_verbose=1'' as shown earlier.
+If the live update fails, RS should print an error to the system log, and minix-service(8) will complain. In order to debug such failures, it may be useful to enable verbose mode in RS, by starting the system with ''rs_verbose=1'' as shown earlier.
  
 == Self state transfer ==
Line 233: Line 233:
 The second update type is **self state transfer**. Self state transfer also performs an update of a service into an identical copy of itself, but instead uses the state transfer functionality of the magic framework. Thus, self state transfer requires that the service be instrumented properly. This update type can be used to test whether a service's state can be transferred without problems. Please note that many of the points covered here also apply to the remaining two update types, as all three are using the state transfer of the magic framework.
  
-Self state transfer is performed by supplying the ''-t'' flag along with "self" to the service update command:
+Self state transfer is performed by supplying the ''-t'' flag along with "self" to the minix-service update command:
  
-  minix# service -t update self -label pm
+  minix# minix-service -t update self -label pm
  
 This command will perform self state transfer of the PM service. The libmagicrt state transfer routine in the new service instance will print additional system messages while it is running. Upon success, the system output will look somewhat like this:
Line 249: Line 249:
   RS: update succeeded
  
-If the state transfer routine is not able to perform state transfer successfully, it will print messages that start with ''[ERROR]''. RS will then roll back the service to the old instance, and both RS and service(8) will report failure. Self state transfer should succeed for all MINIX3 system services that have been built with bitcode and instrumented with libmagicrt and the magic pass. At the time of writing, there are no system services for which self state transfer is known to result in ''[ERROR]'' lines and subsequent live update failure. However:
+If the state transfer routine is not able to perform state transfer successfully, it will print messages that start with ''[ERROR]''. RS will then roll back the service to the old instance, and both RS and minix-service(8) will report failure. Self state transfer should succeed for all MINIX3 system services that have been built with bitcode and instrumented with libmagicrt and the magic pass. At the time of writing, there are no system services for which self state transfer is known to result in ''[ERROR]'' lines and subsequent live update failure. However:
  
   * It is possible that new changes to system services, and even usage scenarios which we have not yet tested, do result in state transfer errors. Such errors should be resolved. The developers guide further below contains information on how to resolve some of these errors.
Line 257: Line 257:
   * Some services have no state to transfer, in which case their new instances will perform a fresh start instead of state transfer. In that case, live update with self state transfer will succeed, but not print the state transfer system messages shown above. This is the case for the IS (Information Server) and readclock.drv services, for example.
  
-  * Some services may only be updated once brought into a specific state of quiescence, because the default quiescence state is not sufficiently restrictive. In that case, the user must specify an alternative quiescence state explicitly, through the service(8) ''-state'' option. This currently applies to all services that make use of userspace threads, namely the VFS, ahci, and virtio_blk services. These services must be updated using quiescence state 2 (//request free//) rather than state 1 (//work free//):
+  * Some services may only be updated once brought into a specific state of quiescence, because the default quiescence state is not sufficiently restrictive. In that case, the user must specify an alternative quiescence state explicitly, through the minix-service(8) ''-state'' option. This currently applies to all services that make use of userspace threads, namely the VFS, ahci, and virtio_blk services. These services must be updated using quiescence state 2 (//request free//) rather than state 1 (//work free//):
  
-  minix# service -t update self -label vfs -state 2
+  minix# minix-service -t update self -label vfs -state 2
  
 Omitting the appropriate state parameter may result in a crash of the service after live update. At the moment, the update_asr(8) script has hardcoded knowledge about these necessary states. None of this is great, and we will be working towards a situation where the default state will not result in a crash - see the section on open issues further below.
  
-  * State transfer may be slow, and RS applies a rather strict default timeout for live updates. Therefore, it may sometimes be necessary to set a longer timeout in order to avoid needless failures. This can be done through the ''-maxtime'' option to service(8):
+  * State transfer may be slow, and RS applies a rather strict default timeout for live updates. Therefore, it may sometimes be necessary to set a longer timeout in order to avoid needless failures. This can be done through the ''-maxtime'' option to minix-service(8):
  
-  minix# service -t update self -label vfs -state 2 -maxtime 120HZ
+  minix# minix-service -t update self -label vfs -state 2 -maxtime 120HZ
  
 The maximum time is specified in clock ticks by default, but may be given in seconds by appending "HZ" to the timeout. The latter may sound confusing and it is, but the original idea was supposedly that the number of seconds is multiplied by the system's clock frequency, also known as its HZ setting. The above example allows the live update of VFS to take up to two minutes.
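
The tick/second rule above can be sketched as a small C helper. This is only an illustration of the arithmetic, not the parser used by minix-service(8): the function name ''maxtime_to_ticks'' is hypothetical, and the clock frequency is assumed to be passed in by the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from minix-service): convert a -maxtime
 * argument to clock ticks. A bare number is taken as ticks; a number
 * followed by "HZ" is taken as seconds and multiplied by the clock
 * frequency hz. Returns -1 on malformed input. Overflow is not checked.
 */
long maxtime_to_ticks(const char *arg, long hz)
{
    char *end;
    long value;

    if (arg == NULL || *arg == '\0')
        return -1;

    value = strtol(arg, &end, 10);
    if (end == arg || value < 0)
        return -1;              /* no digits, or a negative timeout */

    if (*end == '\0')
        return value;           /* plain number: already in ticks */
    if (strcmp(end, "HZ") == 0)
        return value * hz;      /* seconds times ticks per second */

    return -1;                  /* unknown suffix */
}
```

With an example clock frequency of 60 ticks per second, ''120HZ'' would come out as 7200 ticks, which corresponds to the two minutes mentioned above.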
Line 271: Line 271:
 == ASR rerandomization ==
  
-The third update type is **ASR rerandomization**. Like self state transfer, ASR rerandomization uses the magic framework to perform state transfer. In this case, the service performs state transfer into a rerandomized version of the same service. This involves specifying the path to a rerandomized ASR binary to the service(8) command, as well as the ''-a'' flag. The ''-a'' flag tells the new instance to enable the run-time parts of rerandomization during the live update.
+The third update type is **ASR rerandomization**. Like self state transfer, ASR rerandomization uses the magic framework to perform state transfer. In this case, the service performs state transfer into a rerandomized version of the same service. This involves specifying the path to a rerandomized ASR binary to the minix-service(8) command, as well as the ''-a'' flag. The ''-a'' flag tells the new instance to enable the run-time parts of rerandomization during the live update.
  
-  minix# service -a update /service/asr/pm-1 -progname pm -label pm
+  minix# minix-service -a update /service/asr/pm-1 -progname pm -label pm
  
 In a system that has been built with ASR rerandomization, the (randomized) base service binaries are located in ''/service'' and the (randomized) alternative service binaries are located as numbered files in ''/service/asr''. As mentioned before, the update_asr(8) command can be used to perform these updates semi-automatically.
Line 283: Line 283:
 The final update type is a **functional update**. Compared to self state transfer, ASR rerandomization relocates code and more data. However, for ASR rerandomization, there are still fundamentally no differences between the old and the new version of the service. In contrast, in the case of a functional update, the service performs state transfer into a new program. While this new program is typically highly similar, it may be different from the running service in various ways.
  
-In terms of the service(8) command, such functional updates can be performed by simply using //service update// with a new binary. For example, one could test a new version of the UDS (UNIX Domain Sockets) service, without installing it into ''/service'' yet, and without affecting its open sockets:
+In terms of the minix-service(8) command, such functional updates can be performed by simply using //minix-service update// with a new binary. For example, one could test a new version of the UDS (UNIX Domain Sockets) service, without installing it into ''/service'' yet, and without affecting its open sockets:
  
-  minix# service update /usr/src/minix/net/uds/uds -label uds
+  minix# minix-service update /usr/src/minix/net/uds/uds -label uds
  
 The possibility of actual differences between the old and new service versions adds an extra dimension for the state transfer. Additional state transfer problems can be expected in this case, and must be dealt with accordingly. The developers guide will (eventually) elaborate on this point.
  
-Similarly, depending on the nature of the update, the update action may require a specific state of quiescence. Taking UDS as an example, an update may change file descriptor transfers over sockets, in which case the update may impose that no file descriptors be in flight at the time of the update. The old instance of the service must support this as a custom quiescence state. This custom state can then be specified through the ''-state'' option of the //service update// command.
+Similarly, depending on the nature of the update, the update action may require a specific state of quiescence. Taking UDS as an example, an update may change file descriptor transfers over sockets, in which case the update may impose that no file descriptors be in flight at the time of the update. The old instance of the service must support this as a custom quiescence state. This custom state can then be specified through the ''-state'' option of the //minix-service update// command.
  
 Since the live update functionality is relatively new for MINIX3, we do not yet have much experience with the practical side of performing functional updates to services. This document will be expanded as we gain more insight into the common usage patterns of live update. Stay tuned!
Line 295: Line 295:
 == Multicomponent updates ==
  
-From the user's perspective, updating multiple services at once is not much more complex than updating a single service. First, a number of **service update** commands should be issued, just as before, but each with the ''-q'' flag added:
+From the user's perspective, updating multiple services at once is not much more complex than updating a single service. First, a number of **minix-service update** commands should be issued, just as before, but each with the ''-q'' flag added:
  
-  minix# service -q -t update /service/pm -label pm
-  minix# service -q -t update /service/vfs -label vfs -state 2
+  minix# minix-service -q -t update /service/pm -label pm
+  minix# minix-service -q -t update /service/vfs -label vfs -state 2
  
-Then, the entire update can be launched with the **service sysctl upd_run** command:
+Then, the entire update can be launched with the **minix-service sysctl upd_run** command:
  
-  minix# service sysctl upd_run
+  minix# minix-service sysctl upd_run
  
-The RS output will be much more verbose in this case. Note that timeouts are still to be specified on a per-service basis, rather than for the entire update at once. If necessary, any queued //service update// commands may be canceled with the **upd_stop** subcommand:
+The RS output will be much more verbose in this case. Note that timeouts are still to be specified on a per-service basis, rather than for the entire update at once. If necessary, any queued //minix-service update// commands may be canceled with the **upd_stop** subcommand:
  
-  minix# service sysctl upd_stop
+  minix# minix-service sysctl upd_stop
  
 This will cancel the entire multicomponent live update action.
Line 337: Line 337:
 In certain cases, a service may have to meet custom requirements before it is allowed to be updated. This depends on both the service and the update. We previously gave an example regarding the UDS service and transferring file descriptors. As another example, an update that affects message protocols may have to ensure that the service has no outstanding requests to other services using that protocol. As yet another example, certain drivers may want to avoid being updated while certain types of DMA are ongoing, etcetera.
  
-It is up to the writer of the service to implement any such custom quiescence states, assigning a number to each of them. It is then up to the system administrator to supply such a state with the //service update// command, using the ''-state <number>'' option. Some of the quiescence states are predefined; others must be defined by the service developer explicitly. The following states are defined:
+It is up to the writer of the service to implement any such custom quiescence states, assigning a number to each of them. It is then up to the system administrator to supply such a state with the //minix-service update// command, using the ''-state <number>'' option. Some of the quiescence states are predefined; others must be defined by the service developer explicitly. The following states are defined:
  
   * State **1** (''SEF_LU_STATE_WORK_FREE''): work free. This state ensures that the service is not currently performing any work. The fact that the service is being prepared at the time of verifying the quiescence state implies that it is not doing any other work, and thus, SEF is hardcoded to accept updates in this state. The service developer cannot override the check for this state.
Line 355: Line 355:
   sef_setcb_lu_state_isvalid(my_state_isvalid);
  
-This routine has the signature ''int my_state_isvalid(int state, int flags)'', and will be called when a live update is initiated through service(8). As its most important parameter, ''state'' is the requested quiescence state. The ''flags'' parameter contains update flags and is typically unused. The routine must return ''TRUE'' if the state is valid for the service, and ''FALSE'' otherwise. Most services will want to allow the standard states as well as any custom states:
+This routine has the signature ''int my_state_isvalid(int state, int flags)'', and will be called when a live update is initiated through minix-service(8). As its most important parameter, ''state'' is the requested quiescence state. The ''flags'' parameter contains update flags and is typically unused. The routine must return ''TRUE'' if the state is valid for the service, and ''FALSE'' otherwise. Most services will want to allow the standard states as well as any custom states:
  
   #define MY_CUSTOM_STATE_0 (SEF_LU_STATE_CUSTOM_BASE+0)
Line 366: Line 366:
   sef_setcb_lu_prepare(my_lu_prepare);
  
-This routine has the signature ''int my_lu_prepare(int state)'', and will be called when a live update is initiated through service(8), after ensuring the given state is valid. Again, ''state'' is the requested quiescence state. The function must return ''OK'' if the live update can proceed in this state, and ''ENOTREADY'' otherwise. It should check the standard states and/or any custom states, typically in a switch statement.
+This routine has the signature ''int my_lu_prepare(int state)'', and will be called when a live update is initiated through minix-service(8), after ensuring the given state is valid. Again, ''state'' is the requested quiescence state. The function must return ''OK'' if the live update can proceed in this state, and ''ENOTREADY'' otherwise. It should check the standard states and/or any custom states, typically in a switch statement.
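
A prepare callback of the kind described above might look roughly like the following sketch. It borrows the UDS example of a custom state that requires no file descriptors to be in flight. The counter ''fds_in_flight'' is a hypothetical piece of service state, and the fallback ''#define'' values exist only so that the sketch compiles outside MINIX3; in a real service, the constants come from the system headers and the placeholder numbers must not be relied upon.

```c
/*
 * Fallback definitions so this sketch builds outside MINIX3. Only the
 * value 1 for SEF_LU_STATE_WORK_FREE is taken from the text; the other
 * numbers are placeholders, not the real system values.
 */
#ifndef SEF_LU_STATE_WORK_FREE
#define SEF_LU_STATE_WORK_FREE   1      /* state 1: work free */
#endif
#ifndef SEF_LU_STATE_CUSTOM_BASE
#define SEF_LU_STATE_CUSTOM_BASE 100    /* placeholder value */
#endif
#ifndef OK
#define OK 0
#endif
#ifndef ENOTREADY
#define ENOTREADY (-1)                  /* placeholder value */
#endif

#define MY_CUSTOM_STATE_0 (SEF_LU_STATE_CUSTOM_BASE+0)

/* Hypothetical service state: file descriptors currently in transit. */
int fds_in_flight = 0;

/* Decide whether the live update may proceed in the requested state. */
int my_lu_prepare(int state)
{
    switch (state) {
    case SEF_LU_STATE_WORK_FREE:
        /* Handling the prepare request implies no other work is ongoing. */
        return OK;
    case MY_CUSTOM_STATE_0:
        /* Custom requirement: nothing may be in flight right now. */
        return (fds_in_flight == 0) ? OK : ENOTREADY;
    default:
        /* Unknown states are refused rather than silently accepted. */
        return ENOTREADY;
    }
}
```

The callback would then be registered with ''sef_setcb_lu_prepare(my_lu_prepare)'' as shown earlier.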
  
 Third, the service may optionally provide a quiescence state debugging function through the sef_setcb_lu_state_dump(3) SEF API call. The given callback routine has the signature ''int my_lu_state_dump(int state)'' and should use the sef_lu_dprint(3) printf-like function to print information about the given quiescence state and its current internal state as appropriate, using newline-terminated lines.
Line 402: Line 402:
 In general, properly achieving //quiescence// is one of the main challenges for a live update system. For example, if a live update changes the implementation of a particular function, the component being updated must not be executing that function at the time of the live update - if it is, the live update will most likely result in a crash of the component. In MINIX3, the quiescence issue is resolved in a way that leaves little room for problems, by exploiting MINIX3's message-based nature. In essence, all the MINIX3 services consist of a main message loop that repeatedly receives a message and processes this message. MINIX3 supports no kernel threads, and thus, the MINIX3 services have no internal CPU-level concurrency. As a result, a message can be used to enforce quiescence.
  
-MINIX3 live updates are orchestrated by the RS (Reincarnation Server) service. The administrator of the system first compiles a new version of the service into an executable on disk, and then instructs RS to update a particular running system service into the new version, through the service(8) utility. RS starts by loading the new version of the service as a new service process, without letting it run. Thus, there are temporarily two instances of the service: the old instance, which is still running, and the new instance, which contains the new code but not yet any of the necessary state.
+MINIX3 live updates are orchestrated by the RS (Reincarnation Server) service. The administrator of the system first compiles a new version of the service into an executable on disk, and then instructs RS to update a particular running system service into the new version, through the minix-service(8) utility. RS starts by loading the new version of the service as a new service process, without letting it run. Thus, there are temporarily two instances of the service: the old instance, which is still running, and the new instance, which contains the new code but not yet any of the necessary state.
  
 RS then asks the old instance of the service to prepare to be updated, by sending a __prepare__ request message to it. At the moment that the service receives and processes the preparation message, it is by definition in a known state, as it cannot also be doing something else at the same time. While this is a good start for quiescence, the service may have to meet additional requirements regarding its current activity, depending on the service and the type of live update. The administrator provides the intended //quiescence state// for the live update when starting the update, and the service itself determines whether or not it is //ready// when handling the __prepare__ message. If the service decides that it does not meet the given quiescence requirements, the live update is aborted.
Line 412: Line 412:
 This knowledge, in addition to full access to the memory of the old instance through a special memory grant, allows the libmagicrt state transfer procedure in the new instance to iterate over all data of the old process. This procedure recursively follows any pointers it encounters, and //pairs// each piece of data with the corresponding piece of data in the new process, copying over the data and adjusting it for the new layout as necessary. In certain cases, the state transfer system may not be able to pair all pieces of data, or deal with all pointers. In that case, state transfer fails. Annotations in the service source code, as well as custom data transfer methods, can be provided in order to aid the state transfer process.
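The pairing idea described above can be illustrated with a small model. This is only a sketch under stated assumptions, not libmagicrt's actual interface: state is modelled as a flat Python dictionary rather than process memory, pointers are not followed, and the names ''transfer_state'', ''StateTransferError'', ''new_layout'' and ''custom'' are hypothetical. Each old field is paired with a new field of the same name and adjusted to the new type; a field that cannot be paired makes the transfer fail unless a custom transfer method is supplied for it.

```python
from typing import Any, Callable, Dict, Optional


class StateTransferError(Exception):
    """Raised when old state cannot be paired with the new layout."""


def transfer_state(
    old_state: Dict[str, Any],
    new_layout: Dict[str, Callable[..., Any]],
    custom: Optional[Dict[str, Callable[[Any], Dict[str, Any]]]] = None,
) -> Dict[str, Any]:
    """Pair every old field with a new field and copy it over.

    new_layout maps a field name to its type in the new version.
    custom maps an old field name to a hypothetical custom transfer
    method that returns the new field(s) derived from the old value.
    """
    custom = custom or {}
    new_state: Dict[str, Any] = {}
    for name, value in old_state.items():
        if name in custom:
            # A custom transfer method handles data that cannot be paired.
            new_state.update(custom[name](value))
        elif name in new_layout:
            try:
                # Copy the value, adjusting it to the new field's type.
                new_state[name] = new_layout[name](value)
            except (TypeError, ValueError) as exc:
                raise StateTransferError(f"cannot adjust {name!r}") from exc
        else:
            # No counterpart in the new version: the transfer fails.
            raise StateTransferError(f"cannot pair {name!r}")
    # Fields that exist only in the new version start from their defaults.
    for name, make_default in new_layout.items():
        new_state.setdefault(name, make_default())
    return new_state


if __name__ == "__main__":
    old = {"counter": 42, "owner": "vfs", "timeout_ms": 1500}
    layout = {"counter": int, "owner": str, "timeout_s": float}
    # The renamed, rescaled field needs a custom transfer method.
    helpers = {"timeout_ms": lambda ms: {"timeout_s": ms / 1000}}
    print(transfer_state(old, layout, helpers))
```

Without the ''helpers'' entry, the same call raises ''StateTransferError'' for ''timeout_ms'', which mirrors the case where state transfer fails because a piece of data cannot be paired.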
  
-Regardless of whether state transfer succeeded or failed, the new instance sends the result of the state transfer to RS using an __init__ request message. If state transfer succeeded, RS allows the new instance to continue to run, and kills the process of the old instance. If the state transfer fails, RS again swaps the process slots of the old and the new instance, allows the old instance to run again, and kills the new instance. In both cases, RS communicates the result to the service(8) utility as well, ultimately letting the system administrator know about the outcome of the live update.
+Regardless of whether state transfer succeeded or failed, the new instance sends the result of the state transfer to RS using an __init__ request message. If state transfer succeeded, RS allows the new instance to continue to run, and kills the process of the old instance. If the state transfer fails, RS again swaps the process slots of the old and the new instance, allows the old instance to run again, and kills the new instance. In both cases, RS communicates the result to the minix-service(8) utility as well, ultimately letting the system administrator know about the outcome of the live update.
  
 For multicomponent live updates, all affected services are first brought into the //ready// state, after which they are all updated. Any service failing to get ready in the preparation phase will cause an abort of the entire update, and any service failing the state transfer phase causes a rollback of the entire update.
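The all-or-nothing behaviour of a multicomponent update can be sketched as a two-phase loop. The following is a simplified model under stated assumptions, not RS code: the ''Service'' class, its ''can_quiesce'' and ''transfer_ok'' flags, and the ''live_update'' function are hypothetical stand-ins for the __prepare__ reply and the state transfer result that RS would actually receive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    label: str
    can_quiesce: bool = True   # would answer the prepare request with "ready"
    transfer_ok: bool = True   # state transfer into the new instance succeeds
    version: str = "old"       # which instance is currently running


def live_update(services: List[Service]) -> str:
    """Model of a multicomponent update: prepare all, then update all."""
    # Phase 1: every service must reach the ready state, or the
    # whole update is aborted before anything has changed.
    for svc in services:
        if not svc.can_quiesce:
            return f"aborted: {svc.label} did not get ready"

    # Phase 2: update the services; one failed state transfer
    # rolls back every service that was already updated.
    updated: List[Service] = []
    for svc in services:
        if not svc.transfer_ok:
            for done in updated:
                done.version = "old"
            return f"rolled back: {svc.label} failed state transfer"
        svc.version = "new"
        updated.append(svc)
    return "updated"


if __name__ == "__main__":
    group = [Service("vfs"), Service("pm"), Service("vm", transfer_ok=False)]
    print(live_update(group))
    print([svc.version for svc in group])
```

In this model a failure in either phase leaves every service on its old version, which is the property the paragraph above describes.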
Line 793: Line 793:
 The case of userspace threads has shown that it may be not just useful, but actually //necessary// for certain services to provide their own handlers for checking, entering, and leaving a custom state of quiescence. These services may crash if the default quiescence state is used for a live update instead of the custom state. The result is the requirement that not just users, but also scripts - the update_asr(8) script in particular - be aware of specific services requiring custom quiescence state. This is inconvenient and dangerous.
  
-The default quiescence state is currently hardcoded in the service(8) utility, in the form of ''DEFAULT_LU_STATE'' in ''minix/commands/service/service.c''. Instead, we believe that the service should be able to specify its own default quiescence state, possibly using an additional SEF API call. It is not yet clear whether RS would need to be aware of the alternative quiescence state. If not, the translation from a pseudo-state to the real state could take place entirely in the service's own SEF routines. Otherwise, the SEF may have to send the default state as extra data to RS at service initialization time.
+The default quiescence state is currently hardcoded in the minix-service(8) utility, in the form of ''DEFAULT_LU_STATE'' in ''minix/commands/minix-service/minix-service.c''. Instead, we believe that the service should be able to specify its own default quiescence state, possibly using an additional SEF API call. It is not yet clear whether RS would need to be aware of the alternative quiescence state. If not, the translation from a pseudo-state to the real state could take place entirely in the service's own SEF routines. Otherwise, the SEF may have to send the default state as extra data to RS at service initialization time.
  
 === Policy redundancy ===
Line 842: Line 842:
 === Testrelpol failure ===
  
-If the ''testrelpol'' script is run a number of times in a row, it will start to fail on the crash recovery tests for unclear reasons. We know that this is a test script failure rather than an actual failure. We suspect that it is caused by RS's default exponential backoff algorithm for crash recovery causing timeouts in //testrelpol//. If that is the case, it should be possible to change //testrelpol// to disable the exponential backoff using existing service(8) flags.
+If the ''testrelpol'' script is run a number of times in a row, it will start to fail on the crash recovery tests for unclear reasons. We know that this is a test script failure rather than an actual failure. We suspect that it is caused by RS's default exponential backoff algorithm for crash recovery causing timeouts in //testrelpol//. If that is the case, it should be possible to change //testrelpol// to disable the exponential backoff using existing minix-service(8) flags.
  
 === Libmagicrt asserts ===
developersguide/liveupdate.1535974298.txt.gz · Last modified: 2018/09/03 13:31 by stux