sudo drops forwarding of signals sent by processes of the same
process group, which means that by default it drops signals from
parent and child processes. By moving it to another group, we
will later be able to kill it.
Note: sudo documentation is wrong, since it states it only drops
signals from children.
See following link for more information:
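To illustrate the process-group change in this commit, here is a minimal Python sketch. It assumes the process is launched through `subprocess.Popen`; the helper name `popen_in_own_group` is hypothetical and is not taken from osmo-gsm-tester.

```python
import os
import subprocess

def popen_in_own_group(argv):
    """Start argv as the leader of a new process group (hypothetical helper)."""
    # os.setpgrp() runs in the child between fork() and exec(), so the
    # command no longer shares a process group with the caller. Signals sent
    # by the caller then come from a different group than the one sudo is in.
    return subprocess.Popen(argv, preexec_fn=os.setpgrp)
```

In a real run, `argv` would start with `sudo`; any command can be used to observe the group change, for example by comparing `os.getpgid(proc.pid)` with `os.getpgrp()`.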
This avoids extra unneeded logging about killing with signal when
actually no signal is being sent.
NetNSProcess are run in the following process tree:
osmo-gsm-tester -> sudo -> bash (osmo-gsm-tester_netns_exec.sh) ->
Lots of osmo-gsm-tester_netns_exec.sh scripts with a tcpdump child process
were spotted in the production setup of osmo-gsm-tester. Apparently that
happens because tcpdump sometimes doesn't get killed in time with SIGTERM and
SIGINT, and as a result osmo-gsm-tester sends SIGKILL as part of its usual
termination procedure. When SIGKILL is sent, the parent sudo process is
killed instantly, without any chance to forward the signal to its
children, leaving the bash script and tcpdump alive.
In order to fix it, intercept SIGKILL requests for this process class and
send SIGUSR1 instead. Then, modify the script under sudo to handle SIGUSR1 as
if it were a SIGKILL towards its children, to make sure the child process in
the netns terminates.
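The substitution described above could look roughly like the following Python sketch. The function names are hypothetical, and the sketch covers only the tester side; the wrapper script's SIGUSR1 handling is not shown.

```python
import os
import signal

def signal_for_sudo_child(sig):
    """Return the signal to deliver to a process running under sudo."""
    # sudo dies immediately on SIGKILL and cannot forward it, so send
    # SIGUSR1 instead and let the wrapper script kill its own children.
    if sig == signal.SIGKILL:
        return signal.SIGUSR1
    return sig

def kill_netns_process(pid, sig):
    """Deliver sig to pid, translating SIGKILL as described above."""
    os.kill(pid, signal_for_sudo_child(sig))
```

All other signals pass through unchanged, so the usual SIGTERM and SIGINT steps of the termination procedure behave as before.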
This makes it easy to tell apart the different calls to kill made in order
to terminate the process when looking at the logs.
Introduce a strategy to terminate processes and begin with an
implementation for parallel (that has no degree of parallelism
Since the modem iface and the GGSN iface are on the same host/netns,
it's really difficult to conveniently test data plane without getting
routing loops. As a result, either GGSN or modem iface must be moved to
a different namespace. The decision after a few discussions was finally
to move modem interfaces to a different netns.
* ofono is patched to avoid removing a modem if it detects
through udev that its net iface was removed (due to, for instance, the net
iface being moved to another netns and thus no longer being reachable
by the systemd-udev process running in the root netns).
* After ofono is started (and has successfully configured all the modems and
detected their net ifaces through sysfs/udev), the script "modem-netns-setup.py
start" is run, which creates a netns for each modem, naming it after its USB
path ID. The net ifaces for that modem are moved into its netns.
* Modem is configured to use 802-3 data format, and as a result the net
iface is configured through DHCP (DHCP req only replied AFTER pdp ctx is
* Since osmo-gsm-tester knows the modem USB path ID (available in
resources.conf), it can run the required steps (ifup, DHCP) to configure the
interface. The interface name is provided by ofono to osmo-gsm-tester.
* As a result, any process willing to transmit data through the modem
must be in the modem netns.
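As a rough illustration of the last point, a process can be placed in a modem's netns by prefixing its command line with `ip netns exec`. The helper name and the example netns name below are made up for this sketch, and running the resulting command requires root privileges.

```python
def wrap_in_netns(netns, argv):
    """Prefix argv so that it runs inside the given network namespace."""
    # "ip netns exec NAME CMD..." is the iproute2 way to run a command in a
    # named netns. The netns name itself is an assumption of this sketch.
    return ["ip", "netns", "exec", netns] + list(argv)

# Example: a ping sent through a modem whose netns is named "usb-1-5"
# (a made-up name standing in for the modem's USB path ID).
cmd = wrap_in_netns("usb-1-5", ["ping", "-c", "1", "10.0.0.1"])
```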
First of all, it was found that tty allocation must be forced (-t -t)
during ssh session creation to make sure SIGHUP is forwarded when the
session is closed.
Second, since osmo-trx ignores SIGHUP (osmo_init_ignore_signals()), we
must add a wrapper script which converts received SIGHUP into a SIGINT
to stop osmo-trx.
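The idea behind such a wrapper can be sketched in Python as follows. This is only an illustration under the assumption that the wrapper starts the binary as a child process; the function names are hypothetical, and the actual wrapper in the repository is a script that may be written differently.

```python
import signal
import subprocess

def make_hup_handler(child):
    """Build a signal handler that turns a received SIGHUP into SIGINT."""
    def handler(signum, frame):
        # The wrapped program ignores SIGHUP, so stop it with SIGINT instead.
        child.send_signal(signal.SIGINT)
    return handler

def run_wrapped(argv):
    """Run argv as a child and convert SIGHUP into SIGINT for it."""
    child = subprocess.Popen(argv)
    signal.signal(signal.SIGHUP, make_hup_handler(child))
    # wait() resumes after the handler has run and returns the exit status.
    return child.wait()
```

With this in place, closing the ssh session delivers SIGHUP to the wrapper, which in turn stops the child with SIGINT.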
It can later on be used by other classes that need to run binaries in
After the bug described in OS#3456 and fixed in the last commit, let's
categorize variables and put them in their correct place to avoid similar
issues. We leave under the class keyword (class scoped variables) the
attributes which are to be used as static class attributes. All other
ones are initialized during __init__(). This way we avoid scenarios in
which, while using an object through an instance attribute, we end up reading
a class scoped variable which is shared among all instances.
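The pitfall is easy to reproduce in a few lines of Python. The class and attribute names here are made up for illustration and do not come from osmo-gsm-tester.

```python
class Bts:
    # Class scoped: a single list object shared by every instance.
    shared_events = []

    def __init__(self, name):
        self.name = name
        # Instance scoped: each object gets its own list.
        self.own_events = []

a = Bts("a")
b = Bts("b")
a.shared_events.append("started")  # also visible through b
a.own_events.append("started")     # visible through a only
```

After this runs, `b.shared_events` contains `"started"` even though only `a` was touched, while `b.own_events` is still empty.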
Some tests may want to reproduce some scenarios in which it is expected
that a BTS process is stopped, for instance if the BSC link is dropped.
Provide a keepalive parameter to start() for bts and pcu objects to
inform the suite that failures are expected and that it should keep them
alive if that occurs by respawning the BTS process.
Take the chance to identify and drop modules importing event_loop but
not using it.
This is useful while debugging and trying to check events across other
outputs such as pcap files, process logs, etc.
With the recent fix of the junit report related issues, another issue arose:
the 'with log.Origin' was changed to disallow __enter__ing an object twice to
fix problems, but code would still fail because it tries to do 'with' on the
same object twice. The only reason for doing so is to ensure that logging is
associated with a given object. Instead of complicating things even more,
implement it differently.
Refactor logging to simplify use: drop the 'with Origin' style completely, and
instead use the python stack to determine which objects are created by which,
and which object to associate a log statement with.
The new way: we rely on the convention that each class instance has a local
'self' referencing the object instance. If we need to find an origin as a new
object's parent, or to associate a log message with, we traverse each stack
frame, fetching the first local 'self' object that is a log.Origin class
How to use:
Simply call log.log() anywhere, and it finds an Origin object to log for, from
the stack. Alternatively call self.log() for any Origin() object to skip the
Create classes as child class of log.Origin and make sure to call
super().__init__(category, name). This constructor will magically find a parent
Origin on the stack.
When an exception happens, we first escalate the exception up through call
scopes to wherever it is handled by log.log_exn(). This then finds an Origin
object in the traceback's stack frames, with no need to nest in 'with' scopes.
Hence the 'with log.Origin' now "happens implicitly", we can write pure natural
python code, no more hassles with scope ordering.
Furthermore, any frame can place additional logging information in a frame by
calling log.ctx(). This is automatically inserted in the ancestry associated
with a log statement / exception.
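The stack lookup described in this commit can be sketched as below. This is a simplified illustration, not the actual log module: the constructor takes only a name instead of (category, name), and `find_origin` is a hypothetical helper.

```python
import inspect

def find_origin(skip=None):
    """Return the first local 'self' up the stack that is an Origin."""
    frame = inspect.currentframe().f_back
    while frame is not None:
        candidate = frame.f_locals.get("self")
        # Ignore the object under construction when looking for its parent.
        if isinstance(candidate, Origin) and candidate is not skip:
            return candidate
        frame = frame.f_back
    return None

class Origin:
    def __init__(self, name):
        self.name = name
        # The parent is whichever Origin's method is creating this object.
        self.parent = find_origin(skip=self)

class Test(Origin):
    def __init__(self):
        super().__init__("test")

    def where(self):
        # A plain call from inside a method resolves to this object.
        return find_origin()

class Suite(Origin):
    def __init__(self):
        super().__init__("suite")
        self.test = Test()  # created inside Suite, so Suite is the parent
```

Creating `Suite()` at module level yields a suite without a parent, whose `test` has the suite as its parent, with no 'with' scopes involved.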
The "Affero" nature makes sense for the Osmocom network components like
BSC, SGSN, etc. as they are typically operated to provide a network
For testing, this doesn't make so much sense as it is difficult to
imagine people creating a business out of offering to run test cases on
an end-to-end Osmocom GSM network. So let's drop the 'Affero' here.
All code is so far developed by sysmocom staff, so as Managing Director
of sysmocom I can effect such a license change unilaterally.
The prompt() is useful for supervisor (user) interaction during tests.
However it had numerous problems:
- closed stdin, so second prompt() didn't work
- no editing
- no utf-8 multichar
- inflexible poll interval (poll often to stay responsive to input)
- stdin was hijacked by subprocess.Popen
Firstly pass stdin=PIPE to all subprocesses to leave the tester's stdin
Secondly use python input() to read the user entry (instead of mucking about
with the stdin fd), and import readline for history and editing features.
The old approach was put in place to allow polling DBus and processes
regularly. Instead, allow this by running input() in a separate thread while
polling regularly and slowly in the main thread.
The prompt code is now simpler, cleaner and works better.
Will be used in the upcoming 'debug' suite.
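The threaded approach can be sketched as follows. The function signature, the `poll` callback and the injectable `read` parameter are assumptions of this sketch, not the tester's actual interface.

```python
import threading

def prompt(poll, read=input, interval=1.0):
    """Read one user entry in a thread while polling in the caller."""
    result = []

    def reader():
        # input() blocks, so it runs outside the main thread.
        result.append(read())

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    while True:
        poll()                 # e.g. check DBus and child processes
        thread.join(interval)  # wait up to one poll interval for the entry
        if not thread.is_alive():
            break
    # If read() raised (for example on EOF), there is no entry to return.
    return result[0] if result else None
```

Because the blocking read lives in its own thread, the poll interval can be slow without making the prompt feel unresponsive.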
Use it to set the root user for SysmoBTS; otherwise, if osmo-gsm-tester is
run by another user, it will fail to connect.
I know that these commit messages aren't very good, but the code is not stable
yet, so I'm not bothering with details.
code bomb implementing the bulk of the osmo-gsm-tester
The original osmo-gsm-tester was an internal development at sysmocom, mostly by
D. Laszlo Sitzer <firstname.lastname@example.org>, of which this public osmo-gsm-tester
is a refactoring / rewrite.
This imports an early state of the refactoring and is not functional yet. Bits
from the earlier osmo-gsm-tester will be added as needed. The earlier commit
history is not imported.