
Linaro Connect Budapest (BUD17) Registration has launched!


We are pleased to announce that registration for Linaro Connect Budapest (BUD17) has opened!  Linaro Connect has become the event to attend if you are interested in Linux development and related ecosystems on ARM, bringing together engineers and industry experts to discuss, learn, network and push forward new technologies. The event will begin on Monday 6 March at 8.30am with a Welcome keynote by Linaro CEO George Grey and finish on Friday 10 March at 2pm with Demo Friday.

BUD17 will take place at the Corinthia Hotel. The hotel is in the very heart of Budapest, within walking distance of many of the must-see sights – including the Danube river, Basilica of St Stephen, the Royal Palace, Parliament and Great Synagogue to name but a few. The landmark hotel is renowned for its beautiful decor and hosts three restaurants, two bars, a spa and fitness centre.

Registration

The cost for the full week pass is $2,500. A single day pass will be $900. The registration fee includes access to all keynotes and technical sessions, as well as breakfast, lunch, coffee breaks and any socials relating to the date you have purchased a ticket for. If you purchase the pass for the full week, you will receive access to the full week’s agenda.

We offer an Early Bird Discount to all those who register before Friday 6 January 2017. This brings the cost for the full week down to $2,000.

If you are a Keynote Speaker, Sponsor, Linaro Employee, Assignee or Member, please contact connect@linaro.org to obtain the appropriate link and discount code to register. For all others, please click on the button below to register:

Register here

Accommodation

Linaro has negotiated a discounted room rate at the Corinthia Hotel – €125 for a single room and €145 for a double room (both rates include breakfast and WiFi). Rooms are available on a first come, first served basis so make sure to book accommodation as soon as possible. You can access the link to book accommodation here.

The cut-off date for booking a room at the Corinthia Hotel using Linaro’s discounted rate is Friday 10 February 2017.

Travel – Visas and Business Invitation Letters

If you plan to attend and require a visa for Hungary, please request an invitation letter by emailing connect@linaro.org. We aim to send all invitation letters within two weeks of receiving the request; please contact connect@linaro.org if you do not receive your letter within four weeks of sending the request. Please aim to send requests before Friday 6 January 2017, otherwise we cannot guarantee fulfilling requests in time for your visa application process.

We will continue updating the website with topics, keynote speakers and much more so keep checking in to stay updated.

We hope to see you in Budapest!


Linaro 16.12 Release Available for Download

“I retain the right to change my mind, as always. Le Linus e mobile.”  — Linus Torvalds
The Linaro 16.12 release is now available for download. See the detailed highlights of this release for an overview of what has been accomplished by the Working Groups, Landing Teams and Platform Teams. We encourage everybody to use the 16.12 release. To sign up for the release mailing list, go here: https://lists.linaro.org/mailman/listinfo/linaro-release

This post includes links to more information and instructions for using the images. The download links for all images and components are available on our downloads page:

USING THE ANDROID-BASED IMAGES

The Android-based images come in three parts: system, userdata and boot. These need to be combined to form a complete Android install. For an explanation of how to do this please see:

If you are interested in getting the source and building these images yourself please see the following pages:

USING THE OPEN EMBEDDED-BASED IMAGES

With the Linaro provided downloads and with ARM’s Fast Models virtual platform, you may boot a virtual ARMv8 system and run 64-bit binaries.  For more information please see:

USING THE DEBIAN-BASED IMAGES

The Debian-based images consist of two parts. The first part is a hardware pack, which can be found under the hwpacks directory and contains hardware specific packages (such as the kernel and bootloader).  The second part is the rootfs, which is combined with the hardware pack to create a complete image. For more information on how to create an image please see:

GETTING INVOLVED

More information on Linaro can be found on our websites:

Also subscribe to the important Linaro mailing lists and join our IRC channels to stay on top of Linaro developments:

KNOWN ISSUES WITH THIS RELEASE

  • Bug reports for this release should be filed in Bugzilla (http://bugs.linaro.org) against the individual packages or projects that are affected.

UPCOMING LINARO CONNECT EVENTS: LINARO CONNECT BUDAPEST 2017

Linaro Connect Budapest will be held March 6-10, 2017.  More information on this event can be found at: http://www.linaro.org/blog/bud17-registration-launched/

 

Kprobes Event Tracing on ARMv8



Introduction

Kprobes is a kernel feature that allows instrumenting the kernel by setting arbitrary breakpoints that call out to developer-supplied routines before and after the breakpointed instruction is executed (or simulated). See the kprobes documentation[1] for more information. Basic kprobes functionality is selected with CONFIG_KPROBES. Kprobes support was added to mainline for arm64 in the v4.8 release.

In this article we describe the use of kprobes on arm64 using the debugfs event tracing interfaces from the command line to collect dynamic trace events. This feature has been available for some time on several architectures (including arm32), and is now available on arm64. The feature allows use of kprobes without having to write any code.

Types of Probes

The kprobes subsystem provides three different types of dynamic probes described below.

Kprobes

The basic probe is a software breakpoint that kprobes inserts in place of the instruction being probed, saving the original instruction for eventual single-stepping (or simulation) when the probe point is hit.

Kretprobes

Kretprobes is a part of kprobes that allows intercepting a returning function instead of having to set a probe (or possibly several probes) at the return points. This feature is selected whenever kprobes is selected, for supported architectures (including ARMv8).

Jprobes

Jprobes allows intercepting a call into a function by supplying an intermediary function with the same calling signature, which will be called first. Jprobes is a programming interface only and cannot be used through the debugfs event tracing subsystem. As such we will not be discussing jprobes further here. Consult the kprobes documentation if you wish to use jprobes.

Invoking Kprobes

Kprobes provides a set of APIs which can be called from kernel code to set up probe points and register functions to be called when probe points are hit. Kprobes is also accessible without adding code to the kernel, by writing to specific event tracing debugfs files to set the probe address and information to be recorded in the trace log when the probe is hit. The latter is the focus of what this document will be talking about. Lastly kprobes can be accessed through the perf command.

Kprobes API

The kernel developer can write functions in the kernel (often done in a dedicated debug module) to set probe points and take whatever action is desired right before and right after the probed instruction is executed. This is well documented in kprobes.txt.

Event Tracing

The event tracing subsystem has its own documentation[2] which might be worth a read to understand the background of event tracing in general. The event tracing subsystem serves as a foundation for both tracepoints and kprobes event tracing. The event tracing documentation focuses on tracepoints, so bear that in mind when consulting that documentation. Kprobes differs from tracepoints in that there is no predefined list of tracepoints but instead arbitrary dynamically created probe points that trigger the collection of trace event information. The event tracing subsystem is controlled and monitored through a set of debugfs files. Event tracing (CONFIG_EVENT_TRACING) will be selected automatically when needed by something like the kprobe event tracing subsystem.

Kprobes Events

With the kprobes event tracing subsystem the user can specify information to be reported at arbitrary breakpoints in the kernel, determined simply by specifying the address of any existing probeable instruction along with formatting information. When that breakpoint is encountered during execution kprobes passes the requested information to the common parts of the event tracing subsystem which formats and appends the data to the trace log, much like how tracepoints work. Kprobes uses a similar but mostly separate collection of debugfs files to control and display trace event information. This feature is selected with CONFIG_KPROBE_EVENT. The kprobetrace documentation[3] provides the essential information on how to use kprobes event tracing and should be consulted to understand details about the examples presented below.

Kprobes and Perf

The perf tools provide another command line interface to kprobes. In particular “perf probe” allows probe points to be specified by source file and line number, in addition to function name plus offset, and address. The perf interface is really a wrapper for using the debugfs interface for kprobes.
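
For illustration, a typical perf-based session might look like the sketch below (do_sys_open is just an example target; the steps follow the perf-probe documentation, but option spellings can vary between perf versions):

$ perf probe --add do_sys_open              # kprobe at function entry
$ perf probe --add 'do_sys_open%return'     # kretprobe at function return
$ perf record -e probe:do_sys_open -aR sleep 5
$ perf report
$ perf probe --del 'do_sys_open*'           # remove the dynamic events again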

Arm64 Kprobes

All of the above aspects of kprobes are now implemented for arm64; in practice, however, there are some differences from other architectures:

  • Register name arguments are, of course, architecture specific and can be found in the ARM ARM.

  • Not all instruction types can currently be probed. Currently unprobeable instructions include mrs/msr (except DAIF read), exception generation instructions, eret, and hint (except for the nop variant). In these cases it is simplest to just probe a nearby instruction instead. These instructions are blacklisted from probing because the changes they cause to processor state are unsafe to do during kprobe single-stepping or instruction simulation, because the single-stepping context kprobes constructs is inconsistent with what the instruction needs, or because the instruction can’t tolerate the additional processing time and exception handling in kprobes (ldx/stx).
  • An attempt is made to identify instructions within a ldx/stx sequence and prevent probing, however it is theoretically possible for this check to fail resulting in allowing a probed atomic sequence which can never succeed. Be careful when probing around atomic code sequences.
  • Note that because of the details of Linux ARM64 calling conventions it is not possible to reliably duplicate the stack frame for the probed function and for that reason no attempt is made to do so with jprobes, unlike the majority of other architectures supporting jprobes. The reason for this is that there is insufficient information for the callee to know for certain the amount of the stack that is needed.

  • Note that the stack pointer information recorded from a probe will reflect the particular stack pointer in use at the time the probe was hit, be it the kernel stack pointer or the interrupt stack pointer.
  • There is a list of kernel functions which cannot be probed, usually because they are called as part of kprobes processing. Part of this list is architecture-specific and also includes things like exception entry code.

Using Kprobes Event Tracing

One common use case for kprobes is instrumenting function entry and/or exit. It is particularly easy to install probes for this since one can just use the function name for the probe address. Kprobes event tracing will look up the symbol name and determine the address. The ARMv8 calling standard defines where the function arguments and return values can be found, and these can be printed out as part of the kprobe event processing.

Example: Function entry probing

Instrumenting a USB ethernet driver reset function:

$ pwd
/sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p ax88772_reset %x0
EOF
$ echo 1 > events/kprobes/enable

At this point a trace event will be recorded every time the driver’s ax88772_reset() function is called. The event will display the pointer to the usbnet structure passed in via X0 (as per the ARMv8 calling standard) as this function’s only argument. After plugging in a USB dongle requiring this ethernet driver we see the following trace information:

$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 1/1   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
      kworker/0:0-4     [000] d... 10972.102939: p_ax88772_reset_0: (ax88772_reset+0x0/0x230) arg1=0xffff800064824c80

Here we can see the value of the pointer argument passed in to our probed function. Since we did not use the optional labelling features of kprobes event tracing the information we requested is automatically labeled arg1.  Note that this refers to the first value in the list of values we requested that kprobes log for this probe, not the actual position of the argument to the function. In this case it also just happens to be the first argument to the function we’ve probed.

Example: Function entry and return probing

The kretprobe feature is used specifically to probe a function return. At function entry the kprobes subsystem will be called and will set up a hook to be called at function return, where it will record the requested event information. For the most common case the return information, typically in the X0 register, is quite useful. The return value in %x0 can also be referred to as $retval. The following example also demonstrates how to provide a human-readable label to be displayed with the information of interest.

Example of instrumenting the kernel _do_fork() function to record arguments and results using a kprobe and a kretprobe:

$ cd /sys/kernel/debug/tracing
$ cat > kprobe_events <<EOF
p _do_fork %x0 %x1 %x2 %x3 %x4 %x5
r _do_fork pid=%x0
EOF
$ echo 1 > events/kprobes/enable

At this point every call to _do_fork() will produce two kprobe events recorded into the “trace” file, one reporting the calling argument values and one reporting the return value. The return value shall be labeled “pid” in the trace file. Here are the contents of the trace file after three fork syscalls have been made:

$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 6/6   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
              bash-1671  [001] d...   204.946007: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   204.946391: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x724
              bash-1671  [001] d...   208.845749: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   208.846127: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x725
              bash-1671  [001] d...   214.401604: p__do_fork_0: (_do_fork+0x0/0x3e4) arg1=0x1200011 arg2=0x0 arg3=0x0 arg4=0x0 arg5=0xffff78b690d0 arg6=0x0
              bash-1671  [001] d..1   214.401975: r__do_fork_0: (SyS_clone+0x18/0x20 <- _do_fork) pid=0x726

Example: Dereferencing pointer arguments

For pointer values the kprobe event processing subsystem also allows dereferencing and printing of desired memory contents, for various base data types. It is necessary to manually calculate the offset into structures in order to display a desired field.
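
One way to find such offsets, assuming a vmlinux with debug information is available, is to dump the structure layout with pahole (from the dwarves package) or with a recent gdb’s “ptype /o”. The output below is illustrative and trimmed; the /* offset size */ comments give the +0 and +4 offsets used in the probe definition that follows:

$ pahole -C wait_opts vmlinux
struct wait_opts {
        enum pid_type              wo_type;             /*     0     4 */
        int                        wo_flags;            /*     4     4 */
        ...
};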

Instrumenting the do_wait() function:

$ cat > kprobe_events <<EOF
p:wait_p do_wait wo_type=+0(%x0):u32 wo_flags=+4(%x0):u32
r:wait_r do_wait $retval
EOF
$ echo 1 > events/kprobes/enable

Note that the argument labels used in the first probe are optional and can be used to more clearly identify the information recorded in the trace log. The signed offset and parentheses indicate that the register argument is a pointer to memory contents to be recorded in the trace log. The “:u32” indicates that the memory location contains an unsigned four-byte wide datum (an enum and an int in a locally defined structure in this case).

The probe labels (after the colon) are optional and will be used to identify the probe in the log. The label must be unique for each probe. If unspecified a useful label will be automatically generated from a nearby symbol name, as has been shown in earlier examples.

Also note that the “$retval” argument could just as well be specified as “%x0”.

Here are the contents of the “trace” file after two fork syscalls have been made:

$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 4/4   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
             bash-1702  [001] d...   175.342074: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xe
             bash-1702  [002] d..1   175.347236: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0x757
             bash-1702  [002] d...   175.347337: wait_p: (do_wait+0x0/0x260) wo_type=0x3 wo_flags=0xf
             bash-1702  [002] d..1   175.347349: wait_r: (SyS_wait4+0x74/0xe4 <- do_wait) arg1=0xfffffffffffffff6

Example: Probing arbitrary instruction addresses

In previous examples we have inserted probes for function entry and exit, however it is possible to probe an arbitrary instruction (with a few exceptions). If we are placing a probe inside a C function the first step is to look at the assembler version of the code to identify where we want to place the probe. One way to do this is to use gdb on the vmlinux file and display the instructions in the function where you wish to place the probe. An example of doing this for the module_alloc function in arch/arm64/kernel/module.c follows. In this case, because gdb seems to prefer using the weak symbol definition and its associated stub code for this function, we get the symbol value from System.map instead:

$ grep module_alloc System.map
ffff2000080951c4 T module_alloc
ffff200008297770 T kasan_module_alloc

In this example we’re using cross-development tools and we invoke gdb on our host system to examine the instructions comprising our function of interest:

$ ${CROSS_COMPILE}gdb vmlinux
(gdb) x/30i 0xffff2000080951c4
        0xffff2000080951c4 <module_alloc>:        sub    sp, sp, #0x30
        0xffff2000080951c8 <module_alloc+4>:      adrp   x3, 0xffff200008d70000
        0xffff2000080951cc <module_alloc+8>:      add    x3, x3, #0x0
        0xffff2000080951d0 <module_alloc+12>:     mov    x5, #0x713                 // #1811
        0xffff2000080951d4 <module_alloc+16>:     mov    w4, #0xc0                  // #192
        0xffff2000080951d8 <module_alloc+20>:     mov    x2, #0xfffffffff8000000    // #-134217728
        0xffff2000080951dc <module_alloc+24>:     stp    x29, x30, [sp,#16]
        0xffff2000080951e0 <module_alloc+28>:     add    x29, sp, #0x10
        0xffff2000080951e4 <module_alloc+32>:     movk   x5, #0xc8, lsl #48
        0xffff2000080951e8 <module_alloc+36>:     movk   w4, #0x240, lsl #16
        0xffff2000080951ec <module_alloc+40>:     str    x30, [sp]
        0xffff2000080951f0 <module_alloc+44>:     mov    w7, #0xffffffff            // #-1
        0xffff2000080951f4 <module_alloc+48>:     mov    x6, #0x0                   // #0
        0xffff2000080951f8 <module_alloc+52>:     add    x2, x3, x2
        0xffff2000080951fc <module_alloc+56>:     mov    x1, #0x8000                // #32768
        0xffff200008095200 <module_alloc+60>:     stp    x19, x20, [sp,#32]
        0xffff200008095204 <module_alloc+64>:     mov    x20, x0
        0xffff200008095208 <module_alloc+68>:     bl     0xffff2000082737a8 <__vmalloc_node_range>
        0xffff20000809520c <module_alloc+72>:     mov    x19, x0
        0xffff200008095210 <module_alloc+76>:     cbz    x0, 0xffff200008095234 <module_alloc+112>
        0xffff200008095214 <module_alloc+80>:     mov    x1, x20
        0xffff200008095218 <module_alloc+84>:     bl     0xffff200008297770 <kasan_module_alloc>
        0xffff20000809521c <module_alloc+88>:     tbnz   w0, #31, 0xffff20000809524c <module_alloc+136>
        0xffff200008095220 <module_alloc+92>:     mov    sp, x29
        0xffff200008095224 <module_alloc+96>:     mov    x0, x19
        0xffff200008095228 <module_alloc+100>:    ldp    x19, x20, [sp,#16]
        0xffff20000809522c <module_alloc+104>:    ldp    x29, x30, [sp],#32
        0xffff200008095230 <module_alloc+108>:    ret
        0xffff200008095234 <module_alloc+112>:    mov    sp, x29
        0xffff200008095238 <module_alloc+116>:    mov    x19, #0x0                  // #0

In this case we are going to display the result from the following source line in this function:

p = __vmalloc_node_range(size, MODULE_ALIGN, VMALLOC_START,
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
NUMA_NO_NODE, __builtin_return_address(0));

…and also the return value from the function call in this line:

if (p && (kasan_module_alloc(p, size) < 0)) {

We can identify these in the assembler code from the call to the external functions. To display these values we will place probes at 0xffff20000809520c and 0xffff20000809521c on our target system:

$ cat > kprobe_events <<EOF
p 0xffff20000809520c %x0
p 0xffff20000809521c %x0
EOF
$ echo 1 > events/kprobes/enable

Now after plugging an ethernet adapter dongle into the USB port we see the following written into the trace log:

$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 12/12   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
      systemd-udevd-2082  [000] d... 77.200991: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001188000
      systemd-udevd-2082  [000] d... 77.201059: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d... 77.201115: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001198000
      systemd-udevd-2082  [000] d... 77.201157: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d... 77.227456: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011a0000
      systemd-udevd-2082  [000] d... 77.227522: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      systemd-udevd-2082  [000] d... 77.227579: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b0000
      systemd-udevd-2082  [000] d... 77.227635: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      modprobe-2097  [002] d... 78.030643: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff2000011b8000
      modprobe-2097  [002] d... 78.030761: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0
      modprobe-2097  [002] d... 78.031132: p_0xffff20000809520c: (module_alloc+0x48/0x98) arg1=0xffff200001270000
      modprobe-2097  [002] d... 78.031187: p_0xffff20000809521c: (module_alloc+0x58/0x98) arg1=0x0

One more feature of the kprobes event system is recording of statistics information, which can be found in kprobe_profile.  After the above trace the contents of that file are:

$ cat kprobe_profile
 p_0xffff20000809520c                                    6            0
p_0xffff20000809521c                                    6            0

This indicates that each of the two breakpoints we set has been hit 6 times (12 hits in total), which of course is consistent with the trace log data.  More kprobe_profile features are described in the kprobetrace documentation.

There is also the ability to further filter kprobes events.  The debugfs files used to control this are listed in the kprobetrace documentation while the details of their contents are (mostly) described in the trace events documentation.
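
As a brief illustration (reusing the wait_p probe defined earlier; the fields available for filtering are exactly those declared when the probe was created), a filter is attached by writing a boolean expression into the per-event filter file:

$ echo 'wo_flags != 0' > events/kprobes/wait_p/filter
$ cat events/kprobes/wait_p/filter
wo_flags != 0
$ echo 0 > events/kprobes/wait_p/filter     # clear the filter again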

Conclusion

Linux on ARMv8 is now at parity with other architectures supporting the kprobes feature. Work is being done by others to add uprobes and systemtap support as well. These features/tools, together with other already completed features (e.g. perf, coresight), allow Linux users on ARMv8 to debug and analyse performance as they would on other, older architectures.


Bibliography

[1] Jim Keniston, Prasanna S. Panchamukhi, Masami Hiramatsu. “Kernel Probes (Kprobes).” GitHub. GitHub, Inc., 15 Aug. 2016. Web. 13 Dec. 2016.

[2] Ts’o, Theodore, Li Zefan, and Tom Zanussi. “Event Tracing.” GitHub. GitHub, Inc., 3 Mar. 2016. Web. 13 Dec. 2016.

[3] Hiramatsu, Masami. “Kprobe-based Event Tracing.” GitHub. GitHub, Inc., 18 Aug. 2016. Web. 13 Dec. 2016.

 

Ensuring Bootable ARM VM Images



A while back, during Linaro Connect 2013, Riku Voipio (Linaro) asked a simple but important question: “When you guys are done building hypervisors that work on ARM, how do we actually make sure that a user can run something on there?”. Of course, he didn’t have in mind that users could manually build a kernel with the required options, remember a long and complicated QEMU command line, bootstrap their own root file systems, pray to the KVM gods, and hope to get a system running. Instead he was thinking of the general problem of how distribution vendors could package a cloud image that was known to work across multiple versions of multiple different ARM hypervisor implementations.

To address this problem, we started writing the VM System Specification for ARM Processors v2. There are two versions of this spec, the first one was written completely in the open with input and discussions from the community, very much similar to the Linux kernel upstream contribution process, and took a pragmatic approach to ensure that people could realistically build hypervisors and images that conformed to the spec at the time of writing. Remember, this was before ACPI support was merged upstream for 64-bit ARM, for example. Recently we slightly updated the specification in collaboration with ARM to ensure a better alignment with the SBSA and SBBR standards and to require support for ACPI from the hypervisors as well as unifying requirements across Xen and KVM.

The VM spec, as it is usually referred to, defines such things as how to format a guest VM image using GPT including an EFI system partition, where to place the bootable EFI application, how to ensure there’s a working UART to give users a console, how to describe the hardware from the hypervisor to the VM, and which peripherals must be supported. For example, the spec requires the presence of a hot-pluggable bus like the Xen PV bus or an emulated PCIe instance, so that storage volumes can be hotplugged as needed in cloud installations.

But we wanted to go beyond just publishing a spec and convincing ourselves that the hypervisors we work on, KVM and Xen, are actually compliant with the spec. And we wanted to have a method to make sure we don’t happen to break the compliance in enthusiastic future attempts to improve or expand the feature set of the hypervisors. What we need is an automated verification tool that serves both distribution vendors and hypervisor developers.

We present vmspec-tools: https://github.com/Linaro/vmspec-tools

vmspec-tools is a test suite which can verify both images and hypervisors. One key idea behind the test suite is that verifying random images which themselves may not be spec compliant is not productive and may even be misleading. Therefore, the first thing the test suite does is a static analysis of the VM image to ensure it has the proper partition and file system layout. The test suite can also be run inside a VM instance, which verifies that the hypervisor provides the required firmware interface implementation, hardware description data, and a UART for console output, that the UEFI RTC runtime service is supported, and that persistent variable storage for UEFI works across reboots of the VM.

The tool also supports a simple command, vmspec-boot, which downloads a known working reference image, verifies the image, boots the image, and runs the verification from within the VM. This automated procedure is based on the cloud-init initialization system, which is supported by some distributions but does not work on other custom distros. We hope that if distro vendors find this tool useful and start using it, they will add support to the tool to automatically verify their entire image from a single command-line invocation. Until then, other images can be handled by manually downloading and booting the image and manually running the vmspec-verify tool inside the VM.
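
As a rough sketch of that workflow (the command names come from this post; exact paths and options may differ, so consult the README in the repository):

$ git clone https://github.com/Linaro/vmspec-tools
$ cd vmspec-tools
$ ./vmspec-boot        # download, verify and boot a known working reference image
$ vmspec-verify        # run manually inside an already-booted VM image instead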

We encourage anyone actively engaged in developing cloud VM images for 64-bit ARM or working on ARM hypervisors to try using this tool, and contribute to it as needed.

The test suite was written by Riku Voipio with reviews and small contributions from Alex Bennée and Christoffer Dall.

Linaro #1 contributor to the Linux kernel 4.9 release


Linaro has been a long-standing top 5 company contributor to Linux kernel development.
With the release of Linux kernel 4.9, Linaro has for the first time taken the top position, measured by number of changesets. Linaro was the most active employer with 1,876 changesets, due mainly to the integration of Greybus into Linux. Greybus was developed as part of Google ATAP’s Project Ara modular phone effort. The top three contributors for this release, John Hovold, Viresh Kumar and Alex Elder, worked at Linaro on the project with Greg Kroah-Hartman.

Greybus is a framework that allows the main processor on a portable device (i.e., a phone or tablet) to communicate with removable modules. It allows protocols to be defined that use a common remote procedure call mechanism to communicate with and control functionality on a module. Modules may be added to or removed from a running system, and Greybus defines how new modules are recognized and configured for use, and allows them to be gracefully removed from the system at any time.

Modules are envisioned to provide virtually unlimited capabilities: speakers, cameras, flash storage, displays, automobile remotes, and other functions not yet imagined. The Greybus architecture provides a way for additional features to be added to a phone long after it has been purchased (or even designed). Greybus is built as an application layer on the MIPI UniPro stack, but its basic constructs are generic enough that it could be layered on other transports as well.

Linaro CEO George Grey said “Linux is a truly collaborative project. While we are proud to have achieved the top contributor position for the first time, working in the kernel and other open source projects is a key part of our mission, and we are very pleased to be contributing at any level. We are excited that, despite the closing of Project Ara, this work has been merged upstream – we believe that it will be used as the model for future modular products based on Linux, and we look forward to seeing products utilizing this code for new solutions in the future.”

16.12 release for Linaro Enterprise Reference Platform is now available


The Linaro Enterprise Group (LEG) is pleased to announce the 16.12 release for the Linaro Enterprise Reference Platform. To find out more, visit platforms.linaro.org or click here to download the release.

The goal of the Linaro Enterprise Reference Platform is to provide a product quality, end to end, documented, open source platform for ARM based Enterprise servers. The Reference Platform includes boot firmware, kernel, a choice of userspace distributions and additional relevant open source projects. The Linaro Enterprise Reference Platform is built and tested on 96Boards RP-Certified hardware and the Linaro Developer Cloud. It is intended to be a reference example for use as a foundation for products based on open source technologies.

The Linaro Enterprise Group has worked closely with Linaro’s Core Technology & Tools teams to deliver the Linaro Enterprise Reference Platform with updates across the software stack (firmware, Linux kernel, and key server workloads) for ARM based Enterprise servers and a focus on QA testing and platform interoperability. An OpenStack reference architecture is now available with Ansible playbooks, allowing users to deploy an end-to-end OpenStack reference on ARM servers. BigTop 1.1 has also been upgraded with OpenJDK 8, Spark 2.0 (from 1.6) and Hive 2.1 (from 1.2), all tested with Hadoop 2.7.2. You can review the test plan for the Linaro Enterprise Reference Platform 16.12 here.

Below is the complete list of 16.12 features:

Reference Platform Kernel

4.9 based, including under-review topic branches to extend hardware platform support
Unified tree, used by both the CentOS and Debian Reference Platforms
ACPI and PCIe support
Single kernel config and binary (package) for all hardware platforms

UEFI
Tianocore EDK II and OpenPlatformPkg containing reference implementations for Huawei D03/D05 and AMD Overdrive

16.12 with Debian based installer and userspace
Network Installer based on Debian 8.6 “Jessie”
Unified Reference Platform Kernel based on 4.9

16.12 with CentOS based installer and userspace
Network Installer based on CentOS 7.2.1603
Unified Reference Platform Kernel based on 4.9

Enterprise Components
Docker 1.10.3
OpenStack Newton
Ceph 10.2.3
Spark 2.0
Hadoop 2.7.2
OpenJDK 8
QEMU 2.7

Supported Hardware Platforms
AMD Overdrive
HiSilicon D03
HiSilicon D05
AppliedMicro X-Gene X-C1 (Mustang)
HP Proliant m400
Qualcomm QDF2432 Software Development Platform (SDP)
Cavium ThunderX

To find out more about the Linaro Enterprise Reference Platform, go to platforms.linaro.org.

Accelerated AES for the ARM64 Linux kernel



The ARMv8 architecture extends the AArch64 and AArch32 instruction sets with dedicated instructions for AES encryption, SHA-1 and SHA-256 cryptographic hashing, and 64×64 to 128-bit polynomial multiplication, and implementations of the various algorithms that use these instructions have been added to the ARM and arm64 ports of the Linux kernel over the past couple of years. Given that my main focus is on enterprise class systems, which typically use high end SoCs, I have never felt the urge to spend too much time on accelerated implementations for systems that lack these optional instructions (although I did contribute a plain NEON version of AES in ECB/CBC/CTR/XTS modes back in 2013). Until recently, that is, when I received a Raspberry Pi 3 from my esteemed colleague Joakim Bech, the tech lead of the Linaro Security Working Group. This system is built around a Broadcom SoC containing 4 Cortex-A53 cores that lack the ARMv8 Crypto Extensions, and as it turns out, its AES performance was dreadful.

AES primer

The Advanced Encryption Standard (AES) is a variant of the Rijndael cipher with a fixed block size of 16 bytes, and supports key sizes of 16, 24 and 32 bytes, referred to as AES-128, AES-192 and AES-256, respectively. It consists of a sequence of rounds (10, 12, or 14 for the respective key sizes) that operate on a state that can be expressed in matrix notation as follows:
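
(written out here in standard AES notation, using the byte labels referred to throughout this article)

\[
\begin{pmatrix}
a_0 & b_0 & c_0 & d_0 \\
a_1 & b_1 & c_1 & d_1 \\
a_2 & b_2 & c_2 & d_2 \\
a_3 & b_3 & c_3 & d_3
\end{pmatrix}
\]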

where each element represents one byte, in column major order (i.e., the elements are assigned from the input in the order a0, a1, a2, a3, b0, b1, etc)

Each round consists of a sequence of operations performed on the state, called AddRoundKey, SubBytes, ShiftRows and MixColumns. All rounds are identical, except for the last one, which omits the MixColumns operation, and performs a final AddRoundKey operation instead.

AddRoundKey

AES defines a key schedule generation algorithm, which turns the input key into a key schedule consisting of 11, 13 or 15 round keys (depending on key size), of 16 bytes each. The AddRoundKey operation simply xor’s the round key of the current round with the AES state, i.e.,
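
(written out here, with the round key bytes laid out in the same column-major order as the state)

\[
\begin{pmatrix}
a_0 \oplus rk_0 & b_0 \oplus rk_4 & c_0 \oplus rk_8    & d_0 \oplus rk_{12} \\
a_1 \oplus rk_1 & b_1 \oplus rk_5 & c_1 \oplus rk_9    & d_1 \oplus rk_{13} \\
a_2 \oplus rk_2 & b_2 \oplus rk_6 & c_2 \oplus rk_{10} & d_2 \oplus rk_{14} \\
a_3 \oplus rk_3 & b_3 \oplus rk_7 & c_3 \oplus rk_{11} & d_3 \oplus rk_{15}
\end{pmatrix}
\]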

where rkN refers to byte N of the round key of the current round.

SubBytes

The SubBytes operation is a byte wise substitution, using one of two S-boxes defined by AES, one for encryption and one for decryption. It simply maps each possible 8-bit value onto another 8-bit value, like below
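
(in the same notation, every byte of the state is simply replaced by its S-box image)

\[
\begin{pmatrix}
\mathrm{sub}(a_0) & \mathrm{sub}(b_0) & \mathrm{sub}(c_0) & \mathrm{sub}(d_0) \\
\mathrm{sub}(a_1) & \mathrm{sub}(b_1) & \mathrm{sub}(c_1) & \mathrm{sub}(d_1) \\
\mathrm{sub}(a_2) & \mathrm{sub}(b_2) & \mathrm{sub}(c_2) & \mathrm{sub}(d_2) \\
\mathrm{sub}(a_3) & \mathrm{sub}(b_3) & \mathrm{sub}(c_3) & \mathrm{sub}(d_3)
\end{pmatrix}
\]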

ShiftRows

The ShiftRows operation is a transposition step, where all rows of the state except the first one are shifted left or right (for encryption or decryption, respectively), by 1, 2 or 3 positions (depending on the row). For encryption, it looks like this:
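
(with row N rotated left by N positions, the encryption-direction result is)

\[
\begin{pmatrix}
a_0 & b_0 & c_0 & d_0 \\
b_1 & c_1 & d_1 & a_1 \\
c_2 & d_2 & a_2 & b_2 \\
d_3 & a_3 & b_3 & c_3
\end{pmatrix}
\]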

MixColumns

The MixColumns operation is also essentially a transposition step, but in a somewhat more complicated manner. It involves the following matrix multiplication, which is carried out in GF(2^8) using the characteristic polynomial 0x11b. (An excellent treatment of Galois fields can be found here.)
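
(the multiplication in question is the standard MixColumns matrix applied to the state, column by column)

\[
\begin{pmatrix}
2 & 3 & 1 & 1 \\
1 & 2 & 3 & 1 \\
1 & 1 & 2 & 3 \\
3 & 1 & 1 & 2
\end{pmatrix}
\times
\begin{pmatrix}
a_0 & b_0 & c_0 & d_0 \\
a_1 & b_1 & c_1 & d_1 \\
a_2 & b_2 & c_2 & d_2 \\
a_3 & b_3 & c_3 & d_3
\end{pmatrix}
\]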

Table based AES

The MixColumns operation is computationally costly when executed sequentially, so it is typically implemented using lookup tables when coded in C. This turns the operation from a transposition into a substitution, which means it can be merged with the SubBytes operation. Even the ShiftRows operation can be folded in as well, resulting in the following transformation:

The generic AES implementation in the Linux kernel implements this by using 4 lookup tables of 256 32-bit words each, where each of those tables corresponds with a column in the matrix on the left, and each element N contains the product of that column with the vector { sub(N) }. (A separate set of 4 lookup tables based on the identity matrix is used in the last round, since it omits the MixColumns operation.)

The combined SubBytes/ShiftRows/MixColumns encryption operation can now be summarized as
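
(reconstructed here for the first output column; the other columns follow the same pattern with the row indices rotated)

\[
\begin{pmatrix} e_0 \\ e_1 \\ e_2 \\ e_3 \end{pmatrix}
= \mathrm{tbl0}[a_0] \oplus \mathrm{tbl1}[b_1] \oplus \mathrm{tbl2}[c_2] \oplus \mathrm{tbl3}[d_3]
\]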

where tblN refers to each of the lookup tables, (+) refers to exclusive-or, and the AES state columns are represented using 32-bit words.

Note that lookup table based AES is sensitive to cache timing attacks, due to the fact that the memory access pattern during the first round is strongly correlated with the key xor’ed with the plaintext, allowing an attacker to discover key bits if it can observe the cache latencies of the memory accesses.

Please refer to this link for more information about the AES algorithm.

Scalar AES for arm64

The first observation one can make when looking at the structure of the lookup tables is that the 4 tables are identical under rotation of each element by a constant. Since rotations are cheap on arm64, it makes sense to use only a single table, and derive the other values by rotation. Note that this does not reduce the number of lookups performed, but it does reduce the D-cache footprint by 75%.

So for the v4.11 release of the Linux kernel, a scalar implementation of AES has been queued for arm64 that uses just 4 of the 16 lookup tables from the generic driver. On the Raspberry Pi 3, this code manages 31.8 cycles per byte (down from 34.5 cycles per byte for the generic code). However, this is still a far cry from the 12.9 cycles per byte measured on Cortex-A57 (down from 18.0 cycles per byte), so perhaps we can do better using the NEON. (Note that the dedicated AES instructions manage 0.9 cycles per byte on recent Cortex-A57 versions.)

Accelerated AES using the NEON

The AArch64 version of the NEON instruction set has one huge advantage over other SIMD implementations: it has 32 registers, each 128 bits wide. (Other SIMD ISAs typically have 16 such registers). This means we can load the entire AES S-box (256 bytes) into 16 SIMD registers, and still have plenty of registers left to perform the actual computation, where the tbl/tbx NEON instructions can be used to perform the S-box substitutions on all bytes of the AES state in parallel.

This does imply that we will not be able to implement the MixColumns operation using table lookups, and instead, we will need to perform the matrix multiplication in GF(2^8) explicitly. Fortunately, this is not as complicated as it sounds: with some shifting, masking and xor’ing, and using a table lookup (using a permute vector in v14) to perform the 32-bit rotation, we can perform the entire matrix multiplication in 9 NEON instructions. The SubBytes operation takes another 8 instructions, since we need to split the 256 byte S-box lookup into 4 separate tbl/tbx instructions. This gives us the following sequence for a single inner round of encryption, where the input AES state is in register v0. (See below for a breakdown of the MixColumns transformation)

Looking at the instruction count, one would expect the performance of this algorithm to be around 15 cycles per byte when interleaved 2x or 4x (i.e., the code above, but operating on 2 or 4 AES states in parallel, to eliminate data dependencies between adjacent instructions). However, on the Raspberry Pi 3, this code manages only 22.0 cycles per byte, which is still a huge improvement over the scalar code, but not as fast as we had hoped. This is due to the micro-architectural properties of the tbl/tbx instructions, which take 4 cycles to complete on the Cortex-A53 when using the 4 register variant. And indeed, if we base the estimation on the cycle count, by taking 4 cycles for each such tbl/tbx instruction, and 1 cycle for all other instructions, we get the more realistic number of 21.25 cycles per byte.

As a bonus, this code is not vulnerable to cache timing attacks, given that the memory access patterns are not correlated with the input data or the key.

This code has been part of the arm64 Linux kernel since 2013, but some improvements to it have been queued for v4.11 as well.

Bit sliced AES using the NEON

The AES S-box is not an arbitrary bijective mapping; it has a carefully chosen structure, based again on finite field arithmetic. So rather than performing 16 lookups each round, it is possible to calculate the substitution values, and one way to do this is described in the paper Faster and Timing-Attack Resistant AES-GCM by Emilia Käsper and Peter Schwabe. It is based on bit slicing, which is a method to make hardware algorithms suitable for implementation in software. In the AES case, this involves bit slicing 8 blocks of input, i.e., collecting all bits N of each of the 128 bytes of input into NEON register qN. Subsequently, a sequence of logic operations is executed on those 8 AES states in parallel, which mimics the network of logic gates in a hardware implementation of the AES S-box. While software is usually orders of magnitude slower than hardware, the fact that the operations are performed on 128 bits at a time compensates for this.

An implementation of AES using bit slicing is queued for v4.11 as well, which manages 19.8 cycles per byte on the Raspberry Pi 3, which makes it the preferred option for parallelizable modes such as CTR or XTS. It is based on the ARM implementation, which I ported from OpenSSL to the kernel back in 2013, in collaboration with Andy Polyakov, who authored the ARM version of the code originally. However, it has been modified to reuse the key schedule generation routines of the generic AES code, and to use the same expanded key schedule both for encryption and decryption, which reduces the size of the per-key data structure by 1696 bytes.

The code can be found here.

Conclusion

For the Raspberry Pi 3 (as well as any other system using version r0p4 of the Cortex-A53), we can summarize the AES performance as follows:
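
Pulling together the cycles-per-byte figures quoted earlier in this article for this core, the picture is roughly:

Implementation                          Cycles per byte
generic table-based C code              34.5
scalar arm64 (single lookup table)      31.8
plain NEON (tbl/tbx based)              22.0
bit sliced NEON                         19.8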

Appendix: Breakdown of the MixColumns transform using NEON instructions

LHG Releases First Sample Android “AOSP TV” build on HiKey


Authors:  Khasim Syed Mohammed and Mark Gregotski

The Linaro Digital Home Group (LHG) has released an initial implementation of Android “AOSP TV” for the 96Boards HiKey platform.  This build is just the start of things to come on Android TV in LHG. 

You will be able to see a demonstration of the Android TV work and much more of the LHG activities at the upcoming Linaro Connect event taking place March 6-10 in Budapest.


Android TV Overview

The Android Open Source Project (AOSP) is used in a variety of device types and form factors.  The most commonly known form factor is the Android Handheld device for mobile phones and tablets.  However, for the TV form factor there are certain specific components such as the TV Input Framework and the Leanback APIs that are unique to TV.  This is Android targeted at the entertainment interface for consuming media, movies, live TV, games and apps for the “10-foot user experience”.

The core of the Android TV device software is the TV Input Framework (TIF) which provides the framework for the delivery of live TV content.  The framework consists of many components including the TV Input Manager, TV App, and TV Input HAL.  TIF permits viewers to watch content from a variety of input sources such as cable, satellite, terrestrial, along with IP-based media delivery.  The input source is abstracted away from the viewer who is presented with a guide containing all available services.

Android TV contains Google Mobile Services (GMS) that are licensed by Google to the vendors (SoC, OEM/ODM, operator) who are deploying solutions.  The AOSP sources for the TV form factor do not contain GMS.

An Android TV solution must be verified by Google and is subject to requirements such as the Android Compatibility Test Suite (CTS), the Compatibility Definition Document (CDD) and stringent audio/video performance criteria. See:  https://static.googleusercontent.com/media/source.android.com/en//compatibility/7.1/android-7.1-cdd.pdf


LHG and Android TV

In LHG, one of the goals is to implement AOSP TV as the open source subset that Android TV is built upon.  Among our members, there is value derived from working from a common AOSP TV starting point.  The target platforms for the Android work in LHG are the ARM-based Linaro 96Boards platforms.  The work will start on Consumer Edition boards and then migrate to the 96Boards targeted for TV and media, defined by the TV Platform specification: http://www.96boards.org/specifications/.

The initial build is on the HiKey CE platform which is an approved Android reference board  (https://source.android.com/source/devices.html).  The preferred configuration is the 2GB RAM HiKey LeMaker version.  Since HiKey affords a stable AOSP baseline target, it serves as a good starting point for initial development efforts until TV platform boards get firmly established in open source and mainline software projects.


Development Steps to Build Android TV

Setup : LeMaker HiKey connected with HDMI output and USB Keyboard

  1. HiKey sources to be built for the TV form factor:
    1. By default HiKey AOSP sources are built for Android mobile; we enable the necessary flags in the device and HiKey make files to enable TV characteristics (a build-command sketch follows this list).
  2. Integrating the Live TV App
    1. A TV application that presents live TV content to the viewer is required for Android TV devices
    2. A reference TV application (Live TV) is provided in the Android Open Source Project
    3. This free application allows users to watch favorite live content from various sources (built in tuners – Satellite, Cable, Terrestrial) and IP-based tuners and have them all shown on Android TV
    4. The Live TV app depends on Android APIs.  It is a component of TIF and cannot be used independently of the other components.
    5. This application is built separately and integrated into the HiKey Android filesystem.
  3. Integrating the sample Android TV Channel Service
    1. There should be a service running in the background that works with Live TV app to a) display the channel list to user. b) To play the content when user selects a channel. As we don’t have this service implemented yet, we used a sample channel service from open source.
    2. Android TV channel service is installed to simulate and show the list of channels on Live TV app using TV Input Framework (TIF).
    3. The sample app displays a single TV input with 4 channels consisting of MP4 videos, HLS stream and MPEG-DASH stream, organized into various genres. The video files are served from Google Cloud Storage.
  4. Support for Adaptive Bit Rate Streaming
    1. The delivery of IP video services via Adaptive Bit Rate (ABR) streaming protocols is prevalent when delivering IP-based video over variable bandwidth links
    2. There are several protocols for delivering ABR video including MPEG DASH (Dynamic Adaptive Streaming over HTTP), SmoothStreaming, HTTP Live Streaming (HLS)
    3. These protocols are used in Common Encryption solutions where content is encrypted once and can be decrypted by multiple key systems.
    4. An open source ABR media player called ExoPlayer is provided by Google (https://developer.android.com/guide/topics/media/exoplayer.html)
    5. The ExoPlayer app is downloaded separately and then built.
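
As a very rough sketch of step 1 (the lunch target follows the standard AOSP HiKey instructions; the TV-related make variables shown in the comments are assumptions for illustration only, and the exact changes are in the build instructions linked below):

$ source build/envsetup.sh
$ lunch hikey-userdebug
$ # assumed for illustration: in the device/linaro/hikey make files, switch the
$ # product to the TV form factor (e.g. PRODUCT_CHARACTERISTICS := tv) and
$ # inherit the TV-specific packages (Live TV and friends), then build:
$ make -j8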

Putting It All Together

Once the filesystem is built and flashed on the HiKey, the next step is to connect to the internet via WiFi.  The Live TV app is launched and searches for channels over available input sources.  The Sample Android TV app acts as a service that can simulate a few IP-based TV channels that are streamed over the network.  The user can make their content selection via a USB mouse or USB keyboard.  Implementations on 96Boards TV Platform boards will support IR remote controls for navigating through content choices.

Source : https://github.com/googlesamples/androidtv-sample-inputs/raw/master/screenshots/guide.png


Instructions to Build and Evaluate Android TV on HiKey

The instructions to download the source, build all the components and prepare the filesystem images are available here: https://wiki.linaro.org/LHG/Build-AndroidTV-For-Hikey

Have fun playing with this and look for future releases from LHG!

Remember to attend the Linaro Connect event in Budapest for a full week of interesting keynotes, presentations and demos from all the groups focusing on the exciting evolution of open source software for the ARM ecosystem.  Be sure to drop by the LHG hacking room and say hello. See all the technologies and demos we are working on!


Linaro Connect Budapest 2017 – Week in Review


The Linaro Connect Budapest 2017 (BUD17) marked the 21st Linaro Connect since 2010.   Linaro Connect Budapest was a five-day event with over 400 attendees and was packed full of keynotes by industry leaders, talks, training, meetings, hacking and a lot of socializing fun.  It is the conference for anyone interested in Linux development and related ecosystems on ARM and brings together the best and the brightest of the Linux on ARM community.  The first day of the event began with a keynote by Linaro’s CEO, George Grey and many announcements.  First was the announcement that Google has joined as a club member with Linaro.  Then was the announcement that Fujitsu will be collaborating with Linaro to accelerate High Performance Computing on ARM.  Next was HXT Semiconductor joining Linaro to work on accelerating Advanced Server Development on ARM.  Finally George announced that Acer joined Linaro’s IoT and Embedded group, which is focused on delivering end to end open source reference software for more secure connected products.

The week included many different session tracks, with each day focused on a particular segment within Linaro.  Along with all the sessions and hacking there was also the traditional Demo Friday, held to showcase all the hard work done by the various teams over the last several months.  Attendees were able to enjoy lunch while wandering the exhibit hall full of demos by both Linaro and its member companies.

Images from Demo Friday

Images from Thursday’s Linaro Gala Dinner

One of the highlights of the week was the Thursday night Linaro Gala where attendees had some time to socialize and get to know each other while enjoying games, drinks and dinner.  It was also an opportunity to recognize some outstanding contributions to Linaro.

Keynotes During the Week


Monday:

On Monday the Welcome keynote was given by Linaro’s CEO, George Grey, who welcomed attendees to the event and gave an overview of the many projects that Linaro is working on with its member companies.  George then went on to demo several of these projects.


Tuesday:

Christophe Arviset is Head of the Data and Engineering Division at ESA’s European Space Astronomy Centre (ESAC) in Spain.  Christophe gave a keynote titled: Big data, big challenges for ESA Space Science Missions’ Archives.  The keynote discussed how the European Space Agency’s (ESA) current mission Gaia and the upcoming Euclid mission will generate massive amounts of astronomical data that will need to be made freely available on-line through powerful data management systems. This brings big challenges for building these missions’ archives.   Watch the keynote


Wednesday:

Björn Ekelund is Head of Hardware and Device Technology at Ericsson Research.  Mr. Ekelund’s keynote was titled:  Human communication, a niche use case in 5G.  He spoke about how the fifth generation mobile communications network is optimized for connected and intelligent machines.  Watch the keynote

David Abdurachmanov is part of the Compact Muon Solenoid (CMS) experiment at CERN in Geneva, Switzerland, and Jakob Blomer is a computer scientist in the scientific software group at CERN.  They gave a keynote titled: High Energy Physics and ARMv8 64-bit? Investigating The Future of Computing at CERN.  Their talk covered the future computation needs (over a timeline of 10 years) of the LHC experiments and the more recent progress made by the ATLAS, CernVM and CMS teams on using ARMv8 64-bit/AArch64.  Watch the keynote


Thursday:

Max Wang, a software engineer working on HHVM at Facebook,  gave a keynote titled:  HHVM on AArch64.  He discussed how it is the fastest PHP runtime in the world, with support for PHP5, PHP7, and Hack—the programming language used for Facebook’s web server application logic.  He gave a quick demo, and talked about where optimization efforts can go from here.  Watch the keynote


Friday:

Jonathan Corbet, kernel documentation maintainer and co-founder of LWN.net, gave a keynote titled:  The kernel’s limits to growth.  He discussed the process and community-management hurdles that had to be overcome to make the Linux kernel one of the most successful software-development projects ever.  He went on to talk about the obstacles the community still faces, which, beyond limiting future growth, might even threaten its ability to sustain the current development pace, and how those obstacles can be addressed.  Watch the keynote


Segment Team Sessions 

Below are the sessions held by each of the Linaro Segment teams during the week of Linaro Connect BUD17.  To see all the sessions and keynotes held during the week and get access to all the available presentations and videos please visit the Linaro Connect BUD17 Resources.

96Boards 


LEG 


  • Distribution CI using QEMU and OpenQA – BUD17-113
    • Delivering a well working distribution is hard. There are a lot of different hardware platforms that need to be verified and the software stack is in a big flux during development phases. In rolling
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-113/

  • Auto vectorization support in OpenJDK9 Hotspot C2 compiler – BUD17-117

    • OpenJDK is an open source implementation of the Java SE platform, and Linaro has been involved in OpenJDK AArch64 support since 2014. In this session, we will take a look at auto vectorization support in the OpenJDK9 Hotspot C2 compiler, especially the NEON support in the AArch64 backend. Some potential optimization points for vectorization in OpenJDK9 Hotspot (especially for the AArch64 backend) are also described.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-117/
  • Modern tooling with CentOS and DTS – BUD17-121
    • This session will cover getting and using newer tools, such as gcc 6, when building distribution components or third-party software for various improvements, while maintaining compatibility with the base distribution.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-121/
  • ARM Server Standards: The Next Generation – BUD17-201
    • We have recently achieved a critical milestone of having core enablement for SBSA (Server Base System Architecture) and SBBR (Server Base Boot Requirements) in upstream kernels “out of the box”.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-201/
  • Updates on Server Base System Architecture and Boot Requirements – BUD17-205
  • Reliability, Availability, and Serviceability (RAS) on ARM64 – BUD17-209
  • libvirt integration and testing for enterprise KVM/ARM – BUD17-213
    • This technical discussion will highlight ongoing Red Hat activities for the integration of an Enterprise Virtualization stack and present a vision of an enterprise guest. The status of features such as live migration, device assignment, PCIe topology, large guest support, and more will be presented. The status will cover the integration up to the libvirt level of the stack, and show that components above libvirt can “just work”, even though development to this point has been predominantly focused on other architectures. As feature parity with x86 is a goal, the remaining gaps with x86 will also be highlighted. Finally, the status of the verification efforts, specifically those involving the Avocado and kvm-unit-tests frameworks, will be presented.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-213/
  • Cross distro BoF – BUD17-311
    • Regular session taking place at Connect for developers working on Linux distributions to share progress on ARM Linux platform support. Users are also welcome to share their experiences using Linux
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-311/
  • DynInst on arm64 – Status – BUD17-323
    • The DynInst package (https://github.com/dyninst/dyninst) provides an API for program binary analysis and instrumentation. In this technical session, after a brief general introduction of the package
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-323/
  • OPNFV: Next steps for enablement – BUD17-401
  • Tutorial Scientific Computing on ARM-based Platforms – BUD17-403
  • Changes in UEFI land – BUD17-417
    • A summary of important changes in UEFI land in general, especially in the area of open source platform code, and how Linaro intends to deal with Reference Platform firmware releases in the future.
    • Presentations & Videos:  http://connect.linaro.org/resource/bud17/bud17-417/
  • 96Boards enablement for openSUSE – BUD17-500
    • Progress, difficulties and questions surrounding the creation of bootable openSUSE images in the Open Build Service for the various 96Boards (HiKey, DragonBoard 410c, Bubblegum-96, Poplar, Mediatek X20) are presented. Focus is kernel and bootloaders as well as openSUSE specific tools.
    • Presentations & Videos:  http://connect.linaro.org/resource/bud17/bud17-500/

LHG 


  • Status of Android “AOSP” TV – BUD17-114
    • LHG has recently launched an AOSP TV lead project which is focused on specific TV use cases, using Android from AOSP for the TV form factor. This presentation will describe the current AOSP TV builds on 96Boards and the strategy for Android TV for members and the community.
    • Presentations & Videos:  http://connect.linaro.org/resource/bud17/bud17-114/
  • Secure Data Path with OPTEE – BUD17-400
    • LHG is using the ION-based secure memory allocator integrated with OPTEE as the basis for a secure data path processing pipeline. LHG is following the W3C EME protocol and supporting Content
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-400/
  • UEFI/EDK2 for RDK on HiKey – BUD17-404
    • The set-top industry is still heavily reliant upon proprietary U-Boot bootloader schemes that present significant integration challenges to OEM vendors. LHG has undertaken an initiative to implement a UEFI/EDK2 solution for the RDK. This presentation will describe the implementation challenges and the advantages of moving to a UEFI runtime environment.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-404/
  • RDK on 96Boards – BUD17-408
    • LHG has been working on integrating open source features to the RDK for almost three years. Over time the development team has run RDK on many target boards. This presentation will discuss the 96B platforms that are running RDK and the challenges related to each port.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-408/

LITE


  • mcuboot: A shared bootloader for IoT – BUD17-100
    • An important base for security is the beginning of the boot process. It is necessary to be able to verify signatures before upgrading images. This session will discuss the mcuboot project, the efforts to port this to Zephyr, and the functionality available. It will include a small demo of its functionality.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-100/
  • Scripting Languages in IoT – Challenges and Approaches – BUD17-104
    • Scripting languages are a hot emerging topic in IoT. They offer easy learnability and rapid prototyping, with further benefits (like production use) as they evolve. This session compares the approaches of the MicroPython and JerryScript/Zephyr.js projects and gives a status update on their Zephyr RTOS ports.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-104/
  • Porting the TI SimpleLink CC32xx WiFi stack to the Zephyr IoT OS – BUD17-112
    • The TI SimpleLink CC32xx family of MCUs provides an SoC and supporting SDK which completely offloads the WiFi stack onto an integrated network coprocessor. The SimpleLink SDK currently has no explicit support for the Zephyr IoT OS, but is designed to be portable. A native IP stack for Zephyr is currently under development, which includes an experimental IP offload option. This session reviews the challenges of integrating a vendor TCP/IP offload engine into an existing OS IP stack in general, and in particular, evaluates options for integrating the TI SimpleLink WiFi stack into Zephyr.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-112/
  • Zephyr on Beetle – BUD17-116
  • Linux-wpan: IEEE 802.15.4 and 6LoWPAN in the Linux Kernel – BUD17-120
    • Adding support for IEEE 802.15.4 and 6LoWPAN to an embedded Linux system opens up new possibilities to communicate with tiny devices. The mainline kernel supports the wireless protocols needed to connect such devices to the internet, acting as a border router for such networks. This talk will show the current kernel support, how to enable and configure the subsystems to use it, and how to communicate between Linux and IoT operating systems like RIOT, Contiki or Zephyr.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-120/
  • The Swarm on the Edge: Pushing “IoT” to the next step – BUD17-303
    • Sensory swarms of an IoT can be wirelessly interconnected to interact with the edge of a cloud, and offer an unprecedented ability to monitor and act on a range of evolving physical quantities. The Swarm leverages the paradigm of independent, cross-niche and heterogeneous devices that can cooperate with each other in order to execute tasks synergistically. The Swarm architecture is device-oriented, focused on machine-to-machine communications. The Heterogeneous Broker is responsible for dynamically recruiting resources from the cloud; allowing information aggregation to make or aid decisions; and then to dynamically recruit actuation resources. In this talk we will describe our experience implementing and deploying the heterogeneous broker developed by USP in partnership with UC Berkeley. An special focus will be given to the 96Boards program.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-303/
  • A functional Open GPU Upon ARM – BUD17-502
    • GPGPUs (General Purpose Graphics Processing Units) are becoming a relevant functional block on SoCs, particularly in the ARM ecosystem. Extracting the full performance of a GPU is now a combination of well integrated and optimized software and hardware. Motivated by that, there are many Open GPU initiatives around the world using FPGAs, but most (if not all) of these are on Intel platforms. This project aims to present an Open GPU based on an FPGA using the ARM instruction set. The driver platform adopted was the well-known Mesa 3D (www.mesa3d.org). We will describe the co-design approach used to design the OpenGPU. A functional demonstration of the OpenGPU working on a range of OpenGL applications ported by Linaro will be shown. On the fly we will change drivers between SW only, GPU on ASIC, and OpenGPU to see the performance impact. The engineers that implemented this system will be at the session to answer detailed technical questions.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-502/
  • Partnership in Open Design and Manufacturing: How Universities can Contribute with Developers Communities – BUD17-511
    • The University of Sao Paulo, with the support of LSITEC (an NGO design house), has all of the necessary equipment to design and manufacture 96Boards computers and mezzanine boards. Working with one of Linaro’s partners, LeMaker, LSITEC has produced the LeMaker Guitar single board computer under license, and is now looking forward to producing 96Boards and accessories the same way. The final goal is to bridge the gap between industry and universities, producing professionals with strong design skills in embedded computing for society, while providing Linaro and its partners with high quality boards in various manufacturing volumes.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-511/

LMG 


  • LMG Lightning Talks – BUD17-106
    • Short updates on various LMG initiatives, including boottime reduction, Android 4.9, ART, Serial Device Bus and PPP upstreaming. This is a series of short (~10 minutes) updates on LMG initiatives:  Boottime reduction status updates (AOSP), Efforts in making ARM the best platform for Android (ART), What’s new in android-4.9 (Amit Pundir), Intro to Serial Device Bus (Rob Herring), PPP Effort Update (Sam Protsenko), (e.g. siockilladdr, switch class etc) or they are obsolete and no longer used in AOSP (e.g. uid, n/w activity stat etc).
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-106/
  • Status of Android “AOSP” TV – BUD17-118
    • LHG has recently launched an AOSP TV lead project which is focused on specific TV use cases, using Android from AOSP for the TV form factor. This presentation will describe the current AOSP TV builds on 96Boards and the strategy for Android TV for members and the community.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-118/
  • AOSP Toolchains – BUD17-202
    • The LMG team, in collaboration with the Toolchain team, has worked on multiple areas related to clang, gcc and native AOSP builds. Bero and Renato will take us through what has been done, and what we need to continue to do. This presentation will focus on LMG efforts around the following:  Clang CI efforts, Getting AOSP to build with gcc 6/7, Building kernels with clang, Native AOSP builds and development
    • Presentations & Videos:  http://connect.linaro.org/resource/bud17/bud17-202/
  • Future of Android Automated Testing – BUD17-206
    • This session will talk about efforts around increasing Android testing – frequency, automation and coverage. The discussion will include details about: current testing Linaro does, changes we’re working on for testing AOSP/master, more kernel-focused testing and integrating community tests into Android, and “sharding” CTS test runs in the cloud for faster results
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-206/
  • Energy Awareness: The Next Step – BUD17-222
    • This will be a forward-thinking session where, in the age of ML/AI/VR/AR, we share some ideas that could help transform EAS into a hardware/kernel solution that uniquely adapts, in real time, to every single platform without the need for costly, precalculated prior knowledge.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-222/
  • AOSP BoF – BUD17-414
    • By definition, an ‘unplanned’ session for people to come and discuss anything around AOSP – including AOSP for members, AOSP for non-Android Linux devs, OP-TEE/Security and AOSP TV, to name a few. An initial list of topics for discussion could include: an intro to uses of the AOSP codebase at Linaro and members, AOSP for Linux developers (an intro to the AOSP codebase for regular Linux developers), OP-TEE/Security, AOSP TV, and perhaps a completely open graphics stack
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-414/
  • Timekeeping in the Linux Kernel – BUD17-419
    • The timekeeping code in the Linux kernel is used by nearly everything from the low power idle paths to device drivers. In this presentation, Stephen Boyd will take the audience on a tour of the timekeeping code, exploring how the kernel abstracts the hardware, how those abstractions are built upon to implement NOHZ, timers, hrtimers, cpu-idle, POSIX clocks, etc. and how we keep things working when these abstractions break down with the tick-broadcast mechanism.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-419/
  • ION BoF – BUD17-506
    • The upstream community hasn’t had active participation around ION for the past couple of years. Let us meet to discuss what we can do to change the status quo. The session will include: a brief status on where we stand with ION, why the upstream community isn’t interested, and how we can attempt to influence that
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-506/

LNG 


  • Compression support in OpenDataPlane(ODP) – BUD17-103
    • A proposal to add an ODP-based compression/decompression API to provide portable, hardware-accelerated access for data plane applications that require compression/decompression. The proposal outlines two schemes: reuse the existing Cryptography API to support compression, or introduce an independent Compression API.  The initial implementation will target the Cavium OCTEON TX SoC to accelerate IP Compression (IPComp).
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-103/
  • PCI-e EndPoint mode of operation in OpenDataPlane(ODP) – BUD17-107
    • A proposal to add ODP-based support for the PCI-e Endpoint mode of operation. In Endpoint mode the ODP implementation runs on the Endpoint device while the Root Complex (PCI-e host) appears as a peripheral device. The proposal outlines two ways in which the Root Complex can be registered with ODP: as a NIC device, or as a co-processor.  The initial implementation will target the Cavium OCTEON TX SoC.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-107/
  • A Scalable Software Scheduler – BUD17-111
  •  VOSYSwitch on OPNFV – migration to ArmBand – BUD17-217
    • Virtual Open Systems has demonstrated OPNFV Colorado integration of its Lua-based VOSYSwitch data plane, deployed within an x86 environment. With the port of LuaJIT to AArch64, the solution is being migrated to the OPNFV/ArmBand release. The presentation will show experimental results, discuss the specifics of the migration and the issues overcome during the process. At the end, the next steps and future plans will be outlined.
  • Journey of a packet – BUD17-300
    • Describes, step by step, which components a packet goes through and details the cases where components are implemented in hardware or in software. Attendees will get a definitive presentation for understanding the fundamental differences from DPDK and how ODP solves both low-end and high-end networking issues.
    • Presentations & Videos:  http://connect.linaro.org/resource/bud17/bud17-300/
  • ODP IPsec offload panel – BUD17-304
  • Linux networking and I/O – BoF – BUD17-312
    • Open discussion on Linux networking and I/O.  Expected topics to be covered: IPsec full offload status in the Linux kernel network stack and projected LNG work, IO Visor status on ARM, and automotive “foundational blocks”: LSK-RT, static partitioning with open source OpenAMP or Jailhouse.
    • Presentations & Videos: http://connect.linaro.org/resource/bud17/bud17-312/
  • High resolution data plane timers – BUD17-320

We hope you will join us for our next Linaro Connect, to be held 25-29 September 2017 in San Francisco, USA.  Details and registration are coming soon; keep checking the Linaro Connect site for more information.

Highlights are available on the event site, Facebook, Twitter, Flickr and Youtube.

So, let’s talk USB hubs.


Back when Linaro and LAVA Lab started, there was very little need for USB device support connected to our LAVA dispatcher servers. However, as time went on, more and more devices started coming to us with USB serial, and more latterly with USB OTG, which we would use to flash test images onto fastboot based devices, particularly 96Boards. Initially, since there was very little USB, we could plug directly into the server, but when the number rose to more than a handful, we had to start using hubs.

Over the years, we have tried numerous hubs – from relatively low cost to fairly expensive ones. After a period of time, they would start failing, which prevented our kernels from recognising any device plugged into them. While there did seem to be a correlation between the cost and how quickly they broke, they all would eventually stop working, and we would have to reset the hub. Sometimes we’d even have to reboot the LAVA server the hub was connected to. Add to this that the ports of these hubs weren’t delivering the full potential current, which meant devices that needed to charge off them would slowly discharge and go offline. This is not good when you are trying to maintain a 24/7 service, and crucial devices are plugged into them.

Finally there was the issue of controlling the port power. More and more over the last two years, we’ve had devices that required us, for varying reasons, to be able to virtually disconnect them by cutting power. This could be because, on 96Boards for example, the OTG port needed to be powered off when the device was booted so we could use the on-board USB for Ethernet; or it could be because sometimes a commercial device would lock up in fastboot, and that would require a disconnect/reconnect to get it to re-register properly. Initially we built a Heath Robinson type solution that involved surgery on USB cables, relays and a Raspberry Pi. It worked, but also had its issues.

So for many years I have vainly tried to find a hub that would..
(a) Be reliable
(b) Allow me to control the port power
(c) Supply the specified 2.1A per port, even when every port is plugged in and in use

Oh yes, and there’s a (d) I haven’t mentioned!

(d) Give me a USB hub that plugs into the network, that uses USB over IP and allows any server on the same network to grab a device so that it looks like it’s plugged in directly. I searched high and low for such a device, but could not find anything that fitted the bill.

So, you can imagine my initial reaction when, in December 2015, I was told about a Cambridge based company, Cambrionix, who made a “really good hub” that provides 2.1A per port, guaranteed. On 15 ports. I was skeptical. Then I heard the price. I was even more skeptical. So I contacted these guys and they shipped me a loaner for the month of January.

I got back to the office after the Western New Year and thought I’d take a look at it. I unpacked it and took a look at the power supply. It was BIG. It was rated to comfortably supply the required current. I plugged it into my laptop, and was immediately surprised to find that it registered as a serial device, with nothing else plugged into it. So, a little ser2net config later, I decided to try telnet’ing onto it. Imagine my surprise when I got a command prompt. I typed “help” and got a list of things I could do. Guess what one of them was? Switch the mode of the port between sync, power only and….(drum roll) OFF!

I swallowed. My fingers trembling with excitement (yes, I really should get out more often) I plugged the OTG port of a HiKey board into the hub, along with the USB serial. I then noticed a command that allowed me to look at the status of all the ports in real time… I typed it in. Bingo. It showed that all the ports were in “sync” mode, with a realtime display of their current draw. This was getting better by the minute.

I used the command line interface to power the OTG port off. Sure enough, it disappeared as a fastboot device.

So I started scripting. I wrote a Python script that allows me to specify which port, and which state, I want it to be in. I put the hub in our staging server, and integrated my script into the config file for a HiKey.
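
The script itself is nothing exotic. As a rough sketch of the idea only (the device path, baud rate and the exact "mode" command syntax below are illustrative assumptions rather than the hub's documented interface; the hub's own "help" output is the real reference), it boils down to opening the hub's serial command channel with pyserial and sending a single mode-change command:

import argparse
import serial  # pyserial

# Hypothetical mode letters; a real hub lists its own commands in its "help" output.
STATES = {"off": "o", "sync": "s", "charge": "c"}

def set_port_mode(tty, port, state, baud=115200):
    """Send one mode-change command to the hub and return its response."""
    with serial.Serial(tty, baud, timeout=2) as hub:
        hub.write("mode {} {}\r\n".format(STATES[state], port).encode())
        return hub.read_until(b">").decode(errors="replace")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Switch a hub port state")
    parser.add_argument("port", type=int, help="hub port number (1-15)")
    parser.add_argument("state", choices=STATES, help="desired port state")
    parser.add_argument("--tty", default="/dev/ttyUSB0", help="hub serial device (assumed path)")
    args = parser.parse_args()
    print(set_port_mode(args.tty, args.port, args.state))

A call like this is, in essence, what ended up wired into the HiKey's device configuration in staging.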

Wow. This was good, I could now cleanly control the power on ports. Further testing proved it really was stable.

I looked on the Cambrionix web site. OMG. They had EtherSync – an 8 port USB hub that connected to your network, and allowed servers to…. well, see point (d) above.

I phoned Cambrionix to tell them I was impressed, and suggested we have a meeting. Luca and I, the Lab team, went there and sat with them. They lent me an EtherSync. It was like they’d given me a brick of pure gold. I held it in my hands, not wanting to believe the holy grail was really in my hands at last.

We got the EtherSync back to the office and started testing. It’s impressive. It’s running a stripped down Linux on an ARM processor. They have a Linux daemon that Steve McIntyre suggested I should integrate as a systemd service. It seemed to work beautifully. Then we started noticing something. Over time, a server with a port connected to it would freeze. I emailed their software guy. Within a week I got an email from him. They’d found a bug in the USB/IP software stack that was causing the lock up. And, *gulp*, they’d upstreamed the fix. These guys are already in the light. Not too long after, I got an email from Andrew – the software guy – that it was now in the 4.6 kernel. In the meantime I’d created a new python script for controlling it, and was testing it in isolation.

The rest, as they say, is history. We bought a stack more of the 15 port hubs, and we are gradually replacing every hub in the lab. They really are that good.

What’s new in QEMU 2.9


QEMU is an interesting multi-faceted open source project. It is a standard component of the Linux virtualisation stack, used by both the KVM and Xen hypervisors for device emulation. Thanks to its dynamic just-in-time recompilation engine known as the Tiny Code Generator (TCG) it is also capable of emulating other architectures on a number of hosts. This takes the form of either full system emulation or the lighter-weight user-mode emulation that allows foreign user-space binaries to be run alongside the rest of the host system.

Started in 2003 by Fabrice Bellard, QEMU is now maintained by a community of mostly corporate-sponsored engineers, although unaffiliated individuals are still the second largest set of contributors. The project’s codebase has continued to grow over the years and it has now reached the point of making around three stable releases a year, typically one each in April, August and December.

Linaro engineers take an active part in the development and maintenance of the project and we thought it would be useful to talk about some of the ARM-related features in the upcoming 2.9 release.

1 AArch64 EL2 Support for TCG

Building on previous work to enable EL3 (otherwise known as TrustZone) we now fully support the hypervisor exception level EL2. As the virtualisation support of the interrupt controller is an important part of EL2, you need to define a GICv3 as part of your machine definition.

qemu-system-aarch64 ${QEMU_OPTS} \
  -machine gic-version=3 \
  -machine virtualization=true

This is especially useful if you want to debug hypervisor code while working on KVM, as it is often easier to attach to QEMU’s GDB stub than to debug on real hardware with a hardware-assisted debugger over JTAG.

While it is still slow compared to real KVM support, it is faster than running nested TCG emulations. It also means you can use QEMU instead of the fast model to test hypervisors, which is useful given the next feature…

2 Multi-threaded TCG for System Emulation

Previously, system emulation in QEMU has been single-threaded – with a single host thread emulating all the guest’s vCPUs. As the number of SMP systems has grown this has slowly become more of a bottleneck in QEMU’s performance. The multi-threaded TCG project (also known as MTTCG) is the culmination of several years of shared effort between commercial, community and academic contributors. Linaro is proud to have been heavily involved in coding, reviewing and helping get this feature accepted upstream.

While the work has focused on system emulation a number of the updates have also had benefits for the rest of TCG emulation including the efficient QHT translation-cache lookup algorithm and completely overhauling how TCG deals with emulating atomic operations. If you are interested in a more detailed write-up of the technical choices made we wrote an article for LWN last year.

While this work finally removes the single-threaded bottlenecks from system emulation, it is not a performance panacea. While you have unused CPU cores on your host machine you should see a performance improvement for each new vCPU you add to your guest, up until around 8 cores. At that point the cost of keeping the system behaviour coherent will eventually catch up with you.

The core technology on which MTTCG relies is target agnostic and designed so all the various architectures QEMU emulates can take advantage of it. However each front-end needs to make changes to their emulation to ensure they take advantage of the new TCG facilities for modelling atomic and barrier operations.

Currently MTTCG is enabled by default for both 32-bit and 64-bit ARM chips, as well as the Alpha architecture, when running on an x86_64 host. This is by far the most common use case for ARM emulation.

3 Cortex M fixes

In the last few years Linaro has been mostly concentrating on the A-profile (Application profile) ARM processors. These are the ones designed to run full-stack operating systems like Linux. With the growing interest in Internet of Things (IoT) we are starting to turn our attention to the M-profile. The Microcontroller profile processors are targeted at much more constrained low-latency, low-power deeply embedded applications. Their memory is usually measured in kilobytes (kB) rather than megabytes (MB) so they tend to run custom run-loops or highly constrained real-time operating systems (RTOS) like Zephyr.

While QEMU nominally supports the Cortex-M3 processor, support for boards using it has been sporadic, resulting in a situation where there have been long-standing unfixed bugs and important features missing. As the architecture has progressed, support for the newer M-profile CPUs has also lagged.

The 2.9 release sees a number of fixes to the Cortex-M series emulation as we ramp up our efforts to improve QEMU’s microcontroller support. The fixes have so far been aimed at architectural aspects which were known to be broken, like the NVIC emulation. However, part of the discussion at our recent BUD17 session was looking at what features we should prioritise for future QEMU releases.


This summary is not intended to be exhaustive and has concentrated on ARM specific features. For example we have not covered updates to the common sub-systems shared by all architectures. For those interested in all the details the full changelog is worth a read.

EdgeX Foundry Integration with Linaro’s Zephyr-based IoT demonstration system


“The creation of a standard, secure, open, and architecture- and vendor-neutral gateway framework is a critical component of IoT based solutions. Hosted by The Linux Foundation, EdgeX Foundry’s impressive industry support and open governance model allows open collaboration on a common gateway architecture by industry leaders,” said Matt Locke, Director of the Linaro IoT and Embedded (LITE) Group. “This much needed unifying project will allow vendors to define and build a common gateway platform; a platform upon which they can build unique and compelling solutions across a wide range of market segments. We look forward to welcoming new members into LITE to work closely on the engineering needed to accelerate adoption of EdgeX Foundry. Supporting this new project complements and builds on LITE’s engineering and technical support of The Linux Foundation’s Zephyr project, which is aimed at enabling embedded and IoT devices.”

The challenge: integrate our end-to-end IoT device management platform, running on 96Boards hardware and an all open source, and mostly upstream, software stack including key projects such as Hawkbit, Zephyr, Docker and the Linux kernel, with the new EdgeX Foundry?  It was a no-brainer.  Do it in 2 days?  Hmmm, sure?

 

When we heard about the timing of the EdgeX platform, we knew we wanted to find an integration path that could leverage our standards-based designs and get both of our platforms integrated to show their flexibility at Hannover Messe.  With limited time and resources, we knew we would need to ramp up quickly to get the systems integrated, and we knew that it would be a good measure of the readiness of EdgeX Foundry if we could integrate our solutions quickly without needing to write new code all the way across the software stack.  If we could do this on such a short timeline, we felt it would fully demonstrate the flexibility of our platforms.

 

The goal of EdgeX is to “Build a flexible, platform-independent, highly-scalable and industrial-grade open source edge software platform supported by a rich ecosystem of components that can quickly and easily deliver interoperability between things, applications and services, across a wide range of use cases”.  It is “like Cloud Foundry for the IoT Edge.”

 

In an earlier demonstration of the Linaro end-to-end IoT demo at Linaro Connect Budapest in March[1], we connected 96Board Nitrogens, running Zephyr, to a 96Board Hikey-based IoT gateway, supported by the open source Hawkbit device management platform while sending data to IBM’s Bluemix IoT platform.  In the EdgeX Demo for Hannover Messe, thermal data from the Nitrogens is being sent through the Hikey to the EdgeX Platform via MQTT.  With EdgeX Foundry and the Linaro end-to-end IoT demo systems both being standards-based and leveraging containerized microservices, it was fairly straightforward to integrate the two systems and produce the Messe demo. In the end, no engineers were harmed in the making of the demo and most of the integration was achieved by simply creating a new MQTT service to route the data.  Based on our short experience with the EdgeX project, we are excited to see it evolve as a community project.
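
To give a sense of how small such a routing service can be, here is a minimal sketch using the paho-mqtt Python client. The broker hostnames, ports and topic names are placeholders for illustration, not the actual configuration used in the demo: the script subscribes to the sensor topic fed by the HiKey gateway and republishes each reading towards EdgeX.

import paho.mqtt.client as mqtt

SENSOR_BROKER = "hikey-gateway.local"   # placeholder: broker fed by the HiKey gateway
EDGEX_BROKER = "edgex-gateway.local"    # placeholder: broker the EdgeX services consume
SENSOR_TOPIC = "sensors/+/temperature"  # placeholder topic names
EDGEX_TOPIC = "edgex/ingest/temperature"

# Publisher side: connection towards the EdgeX-facing broker.
edgex = mqtt.Client()
edgex.connect(EDGEX_BROKER, 1883)
edgex.loop_start()

def on_message(client, userdata, msg):
    # Forward the payload unchanged; a real service might reshape it to
    # match the EdgeX device-service schema first.
    edgex.publish(EDGEX_TOPIC, msg.payload)

# Subscriber side: listen to the thermal readings coming from the Nitrogens.
sensors = mqtt.Client()
sensors.on_message = on_message
sensors.connect(SENSOR_BROKER, 1883)
sensors.subscribe(SENSOR_TOPIC)
sensors.loop_forever()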

 

 

[1] BUD17 Keynote Demonstration, demo overview at ~44 minutes, http://connect.linaro.org/resource/bud17/bud17-100k1/

Linaro Technologies is a small team within Linaro whose goal is to accelerate the delivery of product-quality open source software, including Linaro output, into the ARM ecosystem. Currently, the Linaro Technologies team is responsible for the 96Boards program and has also assisted Linaro OCE with software builds and releases. Working with Zephyr, the team has built an IoT demonstration project that showcases an end-to-end IoT device-to-cloud system built on open source and running on ARM. We also participate with others in the community driving KernelCI, the open source community project that provides Linux kernel build/boot testing at http://kernelci.org.

The End-to-end IoT Demonstration system.

  • 9 96Boards Nitrogens running as Zephyr-based thermal sensors
  • 1 HiKey BLE/6LoWPAN gateway
  • 1 Dell 5100 Industrial IoT gateway running EdgeX Foundry, integrating the Zephyr MQTT datastream with EdgeX

Day 1 at the OSPM Summit Pisa, Italy


The first summit on power management and scheduling disciplines in the Linux kernel was held at Scuola Superiore S. Anna in Pisa, Italy on Monday 3 April and Tuesday 4 April 2017.  The event was organised by ARM and members of the ReTiS lab.  It attracted a wide audience spanning both industry and academia. Linaro attended the conference and offers the following summary from day 1 (to view the summary from day 2, click here). To view the presentations listed below, click on the headings.

 
Tooling/LISA
By Patrick Bellasi and Brendan Jackman (slides)

The presentation started with an introduction to LISA and the motivation behind its development. It is a set of tools and scripts built on top of existing technology/frameworks. The goal is to understand the effect of change made to the scheduler and spot regressions. Everything is available on GitHub so that people can have common test cases to work with and compare results easily. What is currently available integrates different examples of analysis scripts and plots, making it easier for newcomers to quickly get started. A lot of recipes are also available. Patrick gave plenty of examples on the type of graph plots already available to use with the quality and relevance of those graphs being impressive. The library is powerful and gives a good insight as to what is happening from different perspectives. It also has good support for latency analysis. Brendan continued the presentation with more specific tools from the library, namely TRAPpy and Jupyter. The former is a python based library that provides support for rendering kernelshark-like results, while the latter is a browser based technology offering an environment where graphs can be plotted based on queries formulated by users in real-time.

The presenters admitted the learning curve is steep but the results are well worth it. Todd Kjos (Google’s Android kernel team) reported his experience saying that if what you need is already present then things are easy. Otherwise the investment becomes considerable. He also said that the current efforts to improve the documentation are definitely helping. The presentation finished with questions from the audience. The conclusion was drawn that things are bound to change a little with scheduler tracepoint modifications but the presented tools do not have strict dependencies with respect to the current trace format.

 
About The Need to Power Instrumenting The Linux Kernel
By Patrick Titiano (slides)

Patrick started his presentation with a description of the problems he is currently facing, i.e. there is a lack of power data and instrumentation, along with no probe points for power measurement. The HW currently available is costly and vendors aren’t usually quick to share power numbers with people. In his opinion, the situation is caused by the false belief that power management is of no interest to people, something that couldn’t be further from the truth. To address the problem he suggests introducing a generic power instrumentation framework, allowing power management to be debugged on any board without buying expensive HW. That would help further the modelling of power usage on current systems and help design new generations.

What is needed to achieve something like that? First a common database to catalog how much devices are consuming (CPU, GPU, RAM, uart, i2c, …), then tools to plot and process the power traces generated by systems. The emphasis would be placed on keeping things generic. We currently have tools like Ftrace capable of exporting power related information but the tracepoints rarely get out.

The view of one of the participants was that this idea is dead from the start – there is already a DB available for ACPI and it is not being used. Another person stated that manufacturers know the numbers but don’t want to share the information. In summary, a lot of the infrastructure is already available, what is needed is some kind of central repository to publish power consumption data and user space tools for plotting/analysis.

 

What are the latest evolutions in PELT and what next
By Vincent Guittot (slides)

Vincent started his presentation by going over the different load tracking mechanisms currently used by the scheduling classes. CFS does so using PELT, RT using the RT average and deadline by tracking runqueues’ active utilisation. So far most of the focus has been on PELT. From there, he proceeded with a couple of graphs: one highlighting various problems with PELT in kernel 4.9 and another one with the tip kernel where fixes for those problems have been included. Interesting aspects cover a more stable utilisation along with the load and utilisation being carried with the task when moved. Things remaining to sort out include frequency invariance, the update of blocked idle loads and dropping of the utilisation metric upon DL/RT preemption and migration. On the frequency invariance front, the goal is to make min/max utilisation the same for every frequency and across architecture. That way load becomes invariant and more responsiveness is achieved from sudden load spikes. That sparked a conversation about what to do when approaching maximum CPU utilization, i.e should we go to max OPP directly or approach things from the bottom. The problem is to find the point at which it becomes worth it to boost the OPP. Regarding the updating of blocked idle load, Vincent said it needs to happen more frequently since it is used to set shares in task groups and determine OPPs when schedutil is used. He also has a prototype to track RT utilisation that adds a PELT like utilisation metric to the root RT runqueue. The presentation ended with an open-ended question about how to evaluate that a thread doesn’t have all the running time it wants. Knowing how much time tasks are waiting would be useful to know when (and by how much) to increase the operating frequency.
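
As background for readers unfamiliar with PELT (a summary of the mechanism, not something presented in the talk): time is divided into roughly 1 ms periods and the load/utilisation signal is a geometrically decayed sum of per-period contributions,

u = u_0 + u_1 \cdot y + u_2 \cdot y^2 + \dots = \sum_{i \ge 0} u_i \, y^{i}, \qquad y^{32} = \tfrac{1}{2}

so activity older than about 32 ms counts for less than half as much as recent activity. This is the signal the frequency-invariance and blocked-load discussions above refer to.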

 

PELT decay clamping/UTIL_EST
By Morten Rasmussen and Patrick Bellasi

The problem exposed by Morten and Patrick is that periodic tasks with very long sleep periods lose too much of their accumulated utilisation, something that leads to wrong estimates at the next wake-up time. Tasks that are not clamped see a very big ramp-up upon waking up and, as such, a less responsive system. With clamping, more of the task’s history is conserved and the ramp-up time to higher operating frequencies is shorter. At that point, a participant asked if long-sleeping tasks can be treated as new tasks when they wake up. Morten thought it was a possible avenue but the problem is to determine how long a task needs to sleep before being considered a new task. Morten has patches that implement clamping which have a few issues to sort out but are good enough for review. Overall, participants were not keen on the approach. Other ideas that came up were to use PELT as an estimator and collect what was learned about previous task activations. This opens the possibility of generating a new metric on top of PELT for tasks and CPUs. This new metric (namely util_est) can be used to drive OPPs and better support task placement in the wake-up path. Moreover, it has the advantage of keeping the original PELT signal (thus not risking breaking its mathematical properties) while creating a better abstraction for signal consolidation policies.

 

EAS where we are
By Morten Rasmussen

The presentation started with a short introduction on EAS and why it is important, i.e. the idea is to maximise CPU utilisation and power efficiency. Then followed a short overview of why invariance scaling is needed. Morten addressed the fact that the current EAS/PELT energy model is the best thing we can do, and that predicting the future based on the past is inherently wrong. It is possible to set CPU affinity for a task, but then all intelligence/choices made by the energy model are overridden. The tradeoff between performance and energy consumption is always very use-case specific. Thermal is also a problem, as system thermal throttling often gets in the way and will preempt decisions taken by EAS. There is currently no correlation or communication between the FW (thermal) and the scheduler (EAS). Regarding energy-aware scheduling, a lot of things like schedutil, capacity-aware task wake-up and PELT group utilisation are already upstream. Discussion is now happening around SCHED_DEADLINE scale-invariance, schedtune and device tree capacity. In the longer term, the issues of PELT NOHZ updates, capacity-aware load balancing and the placement of ‘misfit’ tasks are on the radar.

 

Energy model and exotic topologies
By Brendan Jackman (slides)

Brendan started his session with several figures on the EAS energy model concept and data structures. From there he proceeded to highlight the importance of cluster-level energy data for cluster packing, that is, knowing when tasks should be packed together on clusters. It is easy to show the effect of cluster packing on scheduler behaviour but harder to demonstrate energy savings on modern platforms. This was followed by a short overview of ARM’s DynamIQ Shared Unit (DSU). The concept involves packing different types of CPU in the same cluster, that is, up to 8 CPUs that share an L3 cache, with all/some/none of the CPUs having their own L2. Simply put, the architectural topology boundaries we have seen so far are no longer congruent with frequency domains, power domains and CPU capacity boundaries. That led to a discussion on the ramifications of an energy model for such heterogeneous topologies.

 

Schedtune
By Patrick Bellasi (slides)

Patrick started his session with a very good description of the problem he is trying to solve. His goal is to communicate to the kernel information about user-space task requirements so that existing policies for OPP selection and task placement can be improved. Taking the Pixel as a base platform, a set of concepts has been evaluated, more specifically the boosting of “top applications”, minimum capacity for specific tasks and the introduction of a “prefer_idle” flag for latency-sensitive tasks. Preferably those would be added as extensions to existing concepts. As an example, task boosting could be partially supported by the cpu.shares attribute of the CPU CGroup controller, while OPP biasing (minimum and maximum preferred CPU capacity) and prefer_idle could be supported with the introduction of new cpu.{min_capacity,max_capacity} flags. Patrick has published a “CPU utilisation clamping” patchset where the concepts of OPP biasing and negative boosting are implemented with the introduction of new cpu.min_capacity and cpu.max_capacity attributes. Some of the advantages of the current proposal are that it is built on top of existing policies and that the runtime overhead is negligible. From there, audience members expressed concerns about the feasibility of extending the current APIs. The concept of minimum capacity and the proper semantics to make it useful were brought forward, but doubts about it being required were raised. Participants were of the opinion that other things, such as PELT’s underestimation of task requirements, could be improved before we get there. It was also underlined that the current energy models avoid over-provisioning and efforts to address the situation are still very use-case specific. The level of abstraction used to describe a task’s requirements was also raised – too coarse and the model becomes inefficient, too many details and we risk computation overhead and exposure of internal kernel specifics. The presentation concluded with an overview of the current work in progress, such as finishing the task placement feature and completing the integration with AOSP user space, i.e. cleaning up the current sched_policy and extending task classes to their proper mapping.

 

SCHED_DEADLINE and bandwidth reclaiming
By Luca Abeni and Juri Lelli (slides)

The presentation started with Luca talking about the SCHED_DEADLINE class and the general concept behind bandwidth reclaiming. The implemented algorithm is called Greedy Reclamation of Unused Bandwidth (GRUB) as it makes it possible to reclaim the runtime unused by some deadline tasks to give other tasks more runtime than previously agreed upon, without breaking deadline guarantees nor starving non-deadline tasks. Alternatively, the reclaiming mechanism can be used for power management by lowering the CPU frequency based on the current load. Knowing how much time to reclaim can also help take better frequency scaling decisions. The current patchset determines how much to reclaim by tracking how long deadline tasks are inactive. This is currently done on a per-runqueue basis but another prototype does it globally. Another approach that tracks active utilisation was also considered but it had too many issues to be pursued. One of the main hurdles to deal with is knowing when to update the task utilisation metrics – doing so when the task blocks led to too much bandwidth being reclaimed. Instead the solution considers a blocking task to still be contributing (blocked active) until its 0-lag time. At that point the utilisation is deducted from the total utilisation and the task is considered to be in a “blocked inactive” state. Scheduler maintainer Peter Zijlstra said he would have merged the current patchset had it not been for minor issues. Implementation optimisations still remain, notably in the area of reclaiming bandwidth for non-deadline tasks and the timely process of iterating over all active runqueues in the root domain when looking for bandwidth to reclaim. The presentation concluded with a patch-by-patch walkthrough of the current patchset and a word or two on the availability of another patchset that tracks inactive utilisation globally, should people be interested.
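
For reference, the core idea of GRUB can be written as a change to how a deadline server's remaining runtime q is depleted while its task runs (a simplification for illustration; the Linux patches implement a refined version of this rule):

\frac{dq}{dt} = -U_{\mathrm{act}} \quad \text{instead of} \quad \frac{dq}{dt} = -1

so when the total active deadline utilisation U_act is low, running tasks are charged more slowly and effectively reclaim the bandwidth left unused by inactive deadline tasks.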

 

Day 2 at the OSPM Summit Pisa, Italy


The first summit on power management and scheduling disciplines in the Linux kernel was held at Scuola Superiore S. Anna in Pisa, Italy on Monday 3 April and Tuesday 4 April 2017.  The event was organised by ARM and members of the ReTiS lab.  It attracted a wide audience spanning both industry and academia. Linaro attended the conference and offers the following summary from day 2 (to read about what took place on day 1, click here). To view the presentations listed below, click on the headings.

 

Possible improvements in the schedutil governor
By Rafael Wysocki

Rafael had no slides and invited people to have an open discussion on how the schedutil governor can be improved. He gave a general overview of the current CPUfreq governors along with schedutil, which uses metrics from the scheduler to drive frequency scaling. The problem that has emerged in the year since its inception is that updating an OPP or P-state is a slow operation and scaling requests happen much faster than the HW can handle. As such we are limited in the number of events to choose from. But that is also sub-optimal because events carrying meaningful information can be discarded. Rafael is of the opinion that something needs to be done with the events received between updates, some sort of aggregation, but the specifics are still not defined. This triggered a conversation about where to do the aggregation, i.e. in the core or pushed down to the drivers. The issue of the frequency update window was also broached but no clear conclusion came out of it. At that point another participant noted that many updates do not represent what is really going on in the system. Injecting events when they don’t correspond to something important can lead to an aggregation that may not be useful. One option would be to identify specific events in the core scheduler and choose those as decision points. Someone else suggested introducing policies that would dictate how events are considered. Regardless of the solution, people agree that the governor is much slower than the scheduler, hence the need to somehow aggregate events. It was also agreed that the problem should be fixed rather than masked.

 

Schedutil for SCHED_DEADLINE
By Juri Lelli and Claudio Scordino (slides)

The idea behind this presentation is to set CPU frequencies based on runqueues’ active bandwidth, a metric that comes directly from Luca Abeni’s bandwidth reclaiming patchset (see above). In doing so, careful consideration needs to be given to the runtime reservation of tasks, i.e. tasks still need to meet their requirements when the clock is scaled. From there a graph showing the effect of frequency scaling on the execution time of a task in the context of deadline scheduling was presented. Abeni’s patch on bandwidth reclaiming introduces the per-runqueue “running_bw” variable, a CPU-specific utilization contribution metric. Using that, operating points are modified when running_bw is updated. The main design decisions were listed, including which bandwidth information to use for frequency scaling (i.e. the more conservative this_bw or the more aggressive running_bw) and how the scheduling class should be informed of frequency changes. The classic problem often raised in a scheduler context is that clock scaling is very slow compared to the execution windows allocated to tasks, leaving not enough time to react. Also raised was the issue of current HW designs where PMICs hang off an I2C/SPI bus shared with other devices, making contention a big problem. Different approaches to raising the priority of the processes responsible for OPP management were discussed without identifying an exact solution. A lot of SW would have to be reworked and even then results are not guaranteed.

 

Parameterizing CFS load balancing: nr_running/utilization/load
By Dietmar Eggemann (slides)

This was a discussion about changing the existing completely fair scheduler (CFS) load balancer to use either nr_running, utilization or load as a central metric. At this time, tipping points are used to trigger load balancing, but is there a way to change things so that system characteristics are taken into account? Is this feasible? If so, what needs to be changed? Today the load balance code contains a lot of specific code for corner cases, resulting in many condition statements and poor readability. The presentation continued with a short breakdown of the current CFS scheduler along with the input signals currently available to the load balancer code, more specifically the load for each entity, the runnable load for CFS runqueues, the number of runnable tasks, utilization (running/blocked) and CPU capacity. Then followed the main heuristics involved in the load balance process, namely finding the busiest group, calculating the imbalance and fixing small-imbalance corner cases. All that led to the question of a parameterized approach, i.e. can we use the load balance input signals to simplify the code and, if so, what order should be followed (nr_running –> utilization –> load)? That triggered some discussion on the best way to proceed, one suggestion being that all statistics be taken into account before dealing with special cases.

 

Tracepoints for PELT
By Dietmar Eggemann (slides)

Dietmar started his second presentation by saying that it would be nice to standardize mainline kernel scheduler tracepoints in order to have a better understanding of what is going on. Several problems in PELT have been fixed over the last few years and having a set of standard tracepoints would likely have helped track those down quicker. The question is, what should those tracepoints for PELT be? One requirement is that they exist in all combinations of kernel configuration switches, introduce minimal overhead on the CPU and don’t export information to the outside world. The presentation continued with examples of how metrics related to CFS runqueues, scheduling entities and task groups could be mapped to currently available tracepoints, along with the best places to add them. The advantage of such an approach would be that load tracking events are standardized, traces can be shared without losing their meaning and generic post-processing tools (for example LISA) can be used on them.

 

A unified solution for SoC idling – how far have we come?
By Ulf Hansson (slides)

Ulf started his presentation by going over the power management oriented subsystems in the Linux kernel, more specifically system PM, runtime PM, PM domains, device PM QoS and device wakeup/wakeup irqs. He then continued with a short list of things he would like to see addressed, with emphasis on better collaboration between system and runtime PM along with making it easier for people who write drivers to add PM awareness to their design. Then the concept of a runtime PM centric approach was presented, with a parallel between the low power states of devices in both system and runtime PM. The idea is to re-use the runtime PM API from system PM when the low power states are the same. This is implemented with two APIs: SET_SYSTEM_SLEEP_PM_OPS() and SET_RUNTIME_PM_OPS(). The approach has been promoted at Linaro and in the community for a while and acceptance is growing (46 instances as of 4.11). New ideas about the API respecting device links and drivers using “direct complete” being able to convert to the centric approach are being considered. Ulf followed with an update on the acceptance of genpds and highlights in the areas of reduced latencies in the power-off sequence, IRQ-safe domain support and multiple domain idle states support. He also introduced the idea of a genpd governor to look at the constraints of all devices in the domain and make decisions on the common denominator. As topologies are getting very complex, CPUidle doesn’t scale well for multi-cluster SMP and heterogeneous systems like big.LITTLE. The idea is to use runtime PM to decide when clusters can be switched off, treating CPUs the same way as any other element of the domain. So far the infrastructure needed in the generic PM domain is done, with other topics such as CPU PM domains and PSCI changes for OS-initiated take down of cluster domains still under discussion.

 

Scheduler decisions regarding idle
By Daniel Lezcano

Daniel’s work concentrates on predicting wake-up interrupts in a system, something that will help make better power-related decisions. The main interrupts he is interested in are timer, hardware and inter-processor interrupts. Other interrupt sources do exist but are either impossible to predict or rare enough not to worry about. Wake-up sources can be further classified into three groups, i.e. fully predictable, partially predictable and unpredictable. The current CPUidle implementation tries to predict the next idle time by accumulating statistics on past events, and assumes that no correlation exists between those events. Daniel is of the opinion that we can do better by keeping track of interrupts that matter and discarding irrelevant ones. For timer events, idle times can be predicted with the help of the timer framework, while hardware interrupt events (for the same source) can be anticipated using statistical analysis and standard deviation. As a rule of thumb, IPI events should be ignored, with the exception of remote wake-ups for device interrupts. The presentation continued with a list of four strategies aimed at improving how clusters are selected when tasks wake up. The first is about not waking up a CPU or cluster if it hasn’t reached its target residency. The second is concerned with choosing to pack a waking task onto other CPUs if the waking task’s load is below a specific threshold and the CPU that was selected is part of an idle cluster. The third is the idea of favouring idle clusters over performance, meaning that if a CPU is the last one running in a cluster, simply pack a load-balanced task onto other CPUs and idle the whole cluster. Last but not least, introduce hysteresis when tracking interrupt timing, much like the geometric series used in the per-entity load tracking metric. That way idle states can be made shallower on busy systems, something that would result in faster wake-up times.

Implementing the above concept brings its fair share of challenges. The algorithm to predict the next event must be highly optimized as it happens periodically in critical sections and should not consume more energy than the savings it aims to achieve. Also of prime concern is the amount of kernel subsystems that need to be modified, hence the need to be as self contained as possible.

 

I/O scheduling and power management with storage devices
By Luca Miccio and Paolo Valente

The goal of this presentation was to explore ways to bridge the IO scheduler and system power management. The IO scheduler is a block layer component deciding the order in which IO requests are served, with the chosen order needing to satisfy many constraints. However, Luca showed that the current stock schedulers are quite ineffective. To highlight his point he gave a demonstration of a video being played back while other concurrent block layer IO requests are being served. The same video is played back using stock schedulers and, as a comparison, the new BFQ scheduler, which is not yet available in Linux (it is queued for 4.12). With stock kernels the playback starts only after minutes and, after starting, freezes several times. With BFQ, there is no issue. Luca went on to say that these problems are likely to get worse because of increasing queue depths and parallelism within storage devices. He then proceeded to show the two main components of the BFQ solution, i.e. the accurate proportional share scheduling engine and the set of low latency heuristics. Remaining issues with the current version of BFQ concern high queue depths and keeping overhead to a minimum. At this time the approach is also completely power-management agnostic, so questions are pending about where to start bridging that gap. One participant suggested doing the same as we currently do for the CPU scheduler, that is collect relevant statistics from the scheduler to control parameters that affect power consumption. Possible strategies for power management also included idling during IO transfers, something that would be effective for some periodic IO tasks.

 

SCHED_DEADLINE group scheduling
By Tommaso Cucinotta and Luca Abeni (slides)

This talk focused on presenting a patchset to replace the current RT throttling mechanism with one based on SCHED_DEADLINE. The goal of such an approach is to provide deadline-based scheduling and associated guarantees to groups of RT tasks. As an added bonus it would simply replace the current RT throttling code and exploit its cgroups-based interface to user-space. More specifically, a 2-level scheduling hierarchy based on EDF + CBS/FP is proposed to support group scheduling, with an RT cgroup viewed as a special DL entity. In such a case the period and runtime parameters are assigned to the cgroup, with the deadline implicitly set to match the period. The case where DL entities are connected to single tasks doesn’t change from the current way of working. DL entities representing RT cgroups would be connected to RT runqueues instead, and as such bound to a single CPU. However, an RT cgroup is associated with a DL entity for each core in the cpuset, and RT tasks in the group are free to migrate across the per-CPU DL entities. Admission control still needs to be sorted out and the current patchset only guarantees there is no overload on the system. Preliminary work has been posted for review, with the first patch entailing a lot of cleanup in the RT code. The second patch introduces the hierarchical DL scheduling of RT groups and the third allows RT tasks to migrate between the RT runqueues of the control group. The code is currently in its early stages but works relatively well, with a few areas still under discussion. The presentation ended with a discussion on how to handle cases where there are fewer tasks than available CPUs.
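
To ground the runtime/deadline/period vocabulary used above, here is a small standalone sketch that attaches the SCHED_DEADLINE policy to a single task with the sched_setattr() system call. This is the existing per-task interface (the struct layout follows the kernel's SCHED_DEADLINE documentation), not the proposed cgroup-based group scheduling, and the 2 ms / 10 ms budget values are arbitrary.

#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

/* Userspace copy of the sched_attr layout consumed by sched_setattr(2). */
struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags)
{
        return syscall(SYS_sched_setattr, pid, attr, flags);
}

int main(void)
{
        struct sched_attr attr = {
                .size = sizeof(attr),
                .sched_policy = SCHED_DEADLINE,
                /* 2 ms of runtime every 10 ms, deadline equal to the period,
                 * mirroring the runtime/period pair a group would get. */
                .sched_runtime  =  2 * 1000 * 1000ULL,
                .sched_deadline = 10 * 1000 * 1000ULL,
                .sched_period   = 10 * 1000 * 1000ULL,
        };

        if (sched_setattr(0, &attr, 0)) {
                perror("sched_setattr");
                return 1;
        }

        /* Periodic real-time work would run here under the DL guarantee. */
        return 0;
}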

 

A Hierarchical Scheduling Model for Dynamic Soft Real-Time Systems
By Vladimir Nikolov

Vladimir Nikolov from Ulm University presented a new hierarchical approximation and scheduling approach for applications and tasks with multiple modes on a single processor. The model allows for a temporal and spatial distribution of the feasibility problem for a variable set of tasks with non-deterministic and fluctuating costs at runtime. In case of overload, an optimal degradation strategy selects one of several application modes or even temporarily deactivates applications. Hence transient and permanent bottlenecks can be overcome while system quality is decided dynamically and kept optimal. The presentation gave a comprehensive overview of several aspects, including an automated monitoring and cost approximation strategy for applications and tasks, a novel concept for confining entire applications in single Constant Bandwidth Servers and a knapsack-based algorithm for selecting optimal quality levels for the running applications. Furthermore, examples of extensions to several resource dimensions, such as energy and network, were also presented. Capacity reservations are established and updated at well-defined instants under the premise that applications and their internal tasks remain feasible with their actual measured costs. The system’s ability to handle cyclically occurring load and to suppress recurring reconfiguration of application quality levels was also discussed. A prototype implementation based on RTSJ and JamaicaVM was integrated into a middleware for Java-based real-time applications on top of Linux. Experimental evaluations of the system have been based on artificial applications with custom load profiles. The results validate the correct functionality of the scheduler and show how it adapts to varying and cyclically occurring computational loads. The overall quality benefit of the model was assessed for a video-on-demand application scenario, something that triggered an interesting discussion about the suitability of such a solution in user-space. In the future, an extension of the model to distributed parallel applications and interactive HPC scenarios is planned.

The Linaro Digital Home Group celebrates three years


By Mark Gregotski, Director of the Linaro Digital Home Group (LHG)

The Linaro Digital Home Group (LHG) is celebrating its third year anniversary!

Officially launched in May 2014 with eight founding members, LHG has delivered a succession of secure media frameworks on ARM to its members. I would like to extend a big thank you to our member companies for their continued support and encouragement over the years. I would also like to thank members of the larger community who have shown an interest in our work by attending Linaro Connect and giving presentations and keynotes on behalf of LHG.

The mission of LHG has remained consistent over the last three years. However, the end applications for secure media frameworks have extended beyond the TV and the home itself, reaching as far as automotive In-Vehicle Infotainment (IVI) systems. Video is becoming ubiquitous in many facets of our day-to-day lives.

LHG: In the beginning

The early work of LHG targeted the migration of the Comcast Reference Design Kit (RDK) to ARMv8 processors. LHG employed open source features of the Linux kernel, and used open source projects related to media, graphics, security and web browsers, to create a reference implementation, named by Comcast as the ‘Linaro RDK’.

At the heart of the Linaro RDK was the OpenSDK, which had its origins in a media framework put forward by STMicroelectronics. The OpenSDK continues to serve as the reference LHG OE/Yocto media framework, comprised of ‘best of breed’ open source components, including Chromium, GStreamer, V4L2, Wayland/Weston, W3C EME, OP-TEE, and kernel features such as dma-buf and DRM/KMS.

The OP-TEE integration with W3C EME DRMs is one of the prominent features of the OpenSDK and has consistently earned LHG very positive feedback from the open source community and industry. Starting with an EME Clear Key implementation with Chromium, OpenCDM, OP-TEE and software-based decryption TAs, we progressed to implementing the PlayReady Porting Kit for Trusted Execution Environments (TEEs) and encapsulated the PlayReady libraries into a Trusted Application (TA). The security work extended to support the PlayReady Porting Kit for Android, which reused the same PlayReady TA. The security solutions based around OP-TEE integrated with commercial DRMs continue to evolve as the component parts of the solution are updated.

LHG: What’s Happening Now

LHG has been working with Linux-based multimedia on ARM since inception and that effort is reflected in the Linux Multimedia on ARM Lead Project.  In this Lead Project, LHG continues to evolve the OpenSDK and OP-TEE/DRM integrations on Linux-based set-top solutions, and provide innovation in the RDK. The latest implementation of the LHG OpenEmbedded builds can be found here.

Demo from Linaro Connect Budapest 2017 of Linaro RDK running on the DragonBoard410C

In the last half of 2016, LHG formally started working with Android Open Source Project (AOSP) TV. This activity has led to the creation of the AOSP TV Lead Project in LHG. The AOSP TV Lead Project has the mandate to integrate, develop, distribute and maintain AOSP for the TV form factor as the basis for Android TV work by our members. LHG recently completed Widevine DRM Level 1 playback on Android N using OP-TEE v2.4.0 and secure media buffers.

Demo from Linaro Connect Budapest 2017 of Linaro Android AOSP TV

LHG & 96Boards

One of the latest exciting developments for LHG was the creation of the 96Boards TV Platform Specification in January 2016.  This specification has given our members and the larger community access to low-cost, readily available development platforms tailored to the set-top/Smart TV market segment.  Currently one board is available and we expect several to follow. To find out more about the Poplar board, click here.

LHG: What’s to come

The past three years have passed quickly.  Now moving forward with ten member companies, we set our sights on an exciting fourth year.  There are many opportunities ahead which include expanding into the Android TV ecosystem with a Linaro reference design, continuing work on Linux/RDK, and providing complete set-top reference solutions based on fully featured TV Platform boards that permit access to hardware acceleration and low level security and key provisioning.   

We will continue to innovate and develop compelling media solutions with the aim of them becoming commonplace in the ARM ecosystem. I am certain that with the dedication from the LHG engineers, steering committee and our member companies, this will indeed continue to be the case.

For more information on LHG, click here.

LHG Member Companies

                  

Recent LHG Achievements

  • LHG OE/Yocto OpenSDK media framework
    • GStreamer, Wayland/Weston, Chromium, V4L2, OP-TEE, OpenCDM, DRM/KMS, dma-buf
  • Integration of OpenCDM into the Linaro RDK
  • Integration of Wayland into RDK across all SoC platforms
  • Migration of Linaro RDK to LTS 4.9 kernel
  • Incorporate latest GStreamer v1.10 into RDK
  • Investigation of Chromium-GStreamer integrations
    • PPAPI, Mojo project, Samsung Chr/GSt backend
  • Implementation of Wayland and DRM/KMS on WebKit for Wayland browser with Westeros Compositor for RDK
  • LHG OpenSDK OE builds on HiKey and DB410C [Chromium, Wayland/Weston]
  • Port of RDK to 96Boards DB410C with V4L video acceleration
  • Implementation of RDK Bootloader in UEFI/EDK2 environment
  • Microsoft PlayReady DRM integrated with OP-TEE (updates with PR porting kit v3.24 & PRiTEE)
  • W3C EME Clear Key implementation on HiKey
    • Chromium v53 – OpenCDM – OP-TEE v2.4.0
  • PlayReady and Widevine DRM integrations on HiKey with OP-TEE on Android
  • Reference OE platform builds for 32-bit user space on 64-bit platform (multilib)
  • Published 96Boards TV Platform specification in Jan 2016
  • Release of first TV Platform Board by HiSilicon – Poplar
  • Sample AOSP TV build for HiKey 96Boards platform
  • AOSP build with OP-TEE Secure Data Path extensions on HiKey
  • Upstream OP-TEE to AOSP HiKey branch

           

 

LHG Making News!

 


LHG updates W3C EME solution for 96Boards HiKey platform


Authors: Mark Gregotski and Peter Griffin

The Linaro Digital Home Group (LHG) is pleased to announce an updated reference build of W3C EME Clear Key on the 96Boards HiKey platform. The build uses open source components to implement HTML5 browser-based playback of encrypted content using Linaro’s open source ‘Open Portable Trusted Execution Environment’ (OP-TEE) running on ARM TrustZone. The reference build uses the widely adopted OpenEmbedded build system for this Linux-based implementation.

The Chromium browser-based implementation is an end-to-end solution that retrieves encrypted video from a server and locally provides secure decryption via OP-TEE [1]. 64-bit execution mode is used for both the Secure (including Trusted Applications) and Non-secure environments, and the build uses a pre-built binary (fip.bin) for the ARM Trusted Firmware and OP-TEE build. Using a Firmware Image Package (FIP) allows bootloader images (and potentially other payloads) to be packed into a single archive that can be loaded by the ARM Trusted Firmware from non-volatile platform storage.

Support for Secure Data Path (SDP) extensions (included in the OP-TEE v2.4.0 release, but not enabled for HiKey) has also been enabled for the HiKey platform in this release. The SDP extensions aren’t currently being leveraged by the OpenCDM and Chromium components, but can be tested using the ‘xtest --sdp-basic’ test suite, making this reference build a good starting point on which to base future SDP development.

The Clear Key build is comprised of the following components:

  • Chromium v53.0.2785.143
  • Wayland (v1.11)-Weston
  • Mali 450MP4 GPU r6p0 release with graphics drivers (supporting drm/kms, dma-buf)
  • OpenCDM
  • OP-TEE v2.4.0
  • Sample Trusted Application (AES Decryption)
  • Linux kernel v4.9

Build & Flashing Instructions

The consolidated build and flashing instructions for the HiKey board, on which this reference build is based, are provided below. Pre-built images to enable quick evaluation of the EME solution are also provided at https://releases.linaro.org/openembedded/images/lhg/hikey/17.06/.

Build from source

 # Fetch the LHG OpenEmbedded manifests for this release and sync the sources
 > repo init -u https://github.com/linaro-home/lhg-oe-manifests.git -b refs/tags/LHG-2017.06
 > repo sync
 # Set up the OE build environment and build the Weston/Chromium image
 > . setup-environment
 > bitbake rpb-westonchromium-image
 # Convert the generated ext4 image into a sparse image suitable for fastboot
 > cd build-rpb-wayland/tmp-rpb_wayland-glibc/deploy/images/hikey/
 > gunzip rpb-westonchromium-image-hikey.ext4.gz
 > ext2simg rpb-westonchromium-image-hikey.ext4 rpb-westonchromium-image-hikey.ext4.img

Flashing
More information about jumpers and flashing is available at https://github.com/96boards/documentation/wiki/HiKeyUEFI. The commands below assume an 8GB eMMC LeMaker HiKey board; if using an earlier HiKey board, flash the partition table that matches its eMMC size instead of ptable-linux-8g.img.

 # Partition table for the 8GB eMMC
 > sudo fastboot flash ptable ptable-linux-8g.img
 # Firmware Image Package containing ARM Trusted Firmware and OP-TEE
 > sudo fastboot flash fastboot fip.bin
 # UEFI variable storage
 > sudo fastboot flash nvme nvme.img
 > sudo fastboot flash boot boot-hikey.uefi.img
 # Root filesystem image built above
 > sudo fastboot flash system rpb-westonchromium-image-hikey.ext4.img

Running Chromium with EME
Note: Due to a bug in the binary Mali driver [3], Chromium currently has to be run as the root user. We are working with ARM to resolve this issue.

 > su
 > cdmiservice &
 > /usr/bin/chromium/chrome --no-sandbox --use-gl=egl --ozone-platform=wayland --no-sandbox --composite-to-mailbox --in-process-gpu --enable-low-end-device-mode --start-maximized --user-data-dir=data_dir --blink-platform-log-channels=Media --register-pepper-plugins="/usr/lib/chromium/libopencdmadapter.so#ClearKey CDM#ClearKey CDM0.1.0.0#0.1.0.0;application/x-ppapi-open-cdm" 
http://people.linaro.org/~peter.griffin/chrome/eme_player.html

Select “External Clearkey” key system and hit Play.

Testing OP-TEE Secure Data Path HiKey extensions
This build also enables the SDP extensions for the HiKey platform. These can be exercised with the ‘xtest --sdp-basic’ test.

See example output below.

 hikey:/home/linaro# xtest --sdp-basic

Secure Data Path basic accesses: NS invokes SDP TA
 Allocate in ION heap 'unmapped'
 sdp_basic_test: success

Secure Data Path basic accesses: SDP TA invokes SDP TA
 Allocate in ION heap 'unmapped'
 sdp_basic_test: success

Secure Data Path basic accesses: SDP TA invokes SDP pTA
 Allocate in ION heap 'unmapped'
 sdp_basic_test: success

Secure Data Path basic accesses: NS invokes SDP pTA (shall fail)
 Allocate in ION heap 'unmapped'
 Error: invoke SDP test TA (inject) failed ffff0006 3
 test failed
 -> false negative: pTAs refuse SDP memref from NS clients.

For a more detailed description of the Linaro Clear Key solution, please see this document: https://wiki.linaro.org/LHG/LHGPublicDocuments?action=AttachFile&do=view&target=KeySystems.pdf 
The W3C EME specification [2] details the messaging flow between elements that support encrypted media recognition and the means of obtaining keys to decrypt the video. Clear Key support is required for any compliant EME implementation.

 

The content is decrypted using an AES Decryption Trusted Application that resides in Secure World running on the secure OP-TEE OS in ARM TrustZone.
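
For readers unfamiliar with how the normal world talks to such a TA, below is a minimal, hedged sketch using the GlobalPlatform TEE Client API that OP-TEE provides. The TA UUID, the TA_CMD_DECRYPT command ID and the buffer handling are illustrative placeholders, not the actual OpenCDM/TA interface, and most error checking is trimmed for brevity.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <tee_client_api.h>

/* Hypothetical UUID and command ID of an AES decryption TA. */
static const TEEC_UUID ta_uuid = {
        0x00000000, 0x0000, 0x0000,
        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 }
};
#define TA_CMD_DECRYPT 0

int main(void)
{
        TEEC_Context ctx;
        TEEC_Session sess;
        TEEC_Operation op;
        uint32_t origin;
        uint8_t enc[16] = { 0 };        /* encrypted sample (placeholder) */
        uint8_t dec[16] = { 0 };        /* clear output buffer */

        /* Connect to OP-TEE and open a session with the decryption TA. */
        TEEC_InitializeContext(NULL, &ctx);
        TEEC_OpenSession(&ctx, &sess, &ta_uuid, TEEC_LOGIN_PUBLIC,
                         NULL, NULL, &origin);

        /* Pass the encrypted buffer in and get the decrypted data back. */
        memset(&op, 0, sizeof(op));
        op.paramTypes = TEEC_PARAM_TYPES(TEEC_MEMREF_TEMP_INPUT,
                                         TEEC_MEMREF_TEMP_OUTPUT,
                                         TEEC_NONE, TEEC_NONE);
        op.params[0].tmpref.buffer = enc;
        op.params[0].tmpref.size = sizeof(enc);
        op.params[1].tmpref.buffer = dec;
        op.params[1].tmpref.size = sizeof(dec);

        if (TEEC_InvokeCommand(&sess, TA_CMD_DECRYPT, &op, &origin) != TEEC_SUCCESS)
                fprintf(stderr, "decrypt command failed (origin %u)\n",
                        (unsigned int)origin);

        TEEC_CloseSession(&sess);
        TEEC_FinalizeContext(&ctx);
        return 0;
}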

 


Linaro ClearKey Implementation

So go ahead and give this a try.

Pre-built HiKey images to enable quick evaluation of this solution are available at https://releases.linaro.org/openembedded/images/lhg/hikey/17.06/. The engineers in LHG have also created full W3C EME OP-TEE integrations with commercial DRMs such as Microsoft’s PlayReady and Google’s Widevine on both Linux- and Android-based solutions. You will be able to see and hear more about LHG’s work in this area at our upcoming Connect event in San Francisco in September.

Some additional interesting links:
[1] https://www.op-tee.org/
[2] https://www.w3.org/TR/encrypted-media/
[3] Bug 2917 – HiKey: Weston: EGL is not available for non-root user

The 17.08 release for the Linaro Enterprise Reference Platform is now available


The Linaro Enterprise Group (LEG) is pleased to announce the 17.08 release for the Linaro Enterprise Reference Platform. To find out more, visit platforms.linaro.org or click here to download the release.

The goal of the Linaro Enterprise Reference Platform is to provide a fully tested, end-to-end, documented, open source implementation for Arm-based Enterprise servers. The Reference Platform includes boot firmware, a kernel, a community supported userspace and additional relevant open source projects. The Linaro Enterprise Reference Platform is built and tested on Linaro members’ hardware and the Linaro Developer Cloud. It is intended to be a reference example that members and partners can use as a foundation for their products based on open source technologies. Members, partners and ecosystem companies, including distributions, open source projects, ISVs, hyperscalers, OEMs and ODMs, can leverage the reference for Arm in the datacenter.

The Linaro Enterprise Group has worked closely with Linaro’s Core Technology & Tools teams to deliver the Linaro Enterprise Reference Platform with updates across the software stack (firmware, Linux kernel, and key server workloads) for Arm-based Enterprise servers, and with a focus on QA testing and platform interoperability. An OpenStack reference architecture is now available with Ansible playbooks, allowing users to deploy an end-to-end OpenStack reference on Arm servers. The Bigtop 1.2 stack of big data components has been built and tested with OpenJDK 8. Bigtop 1.2 consists of Hadoop 2.7.3 (upgraded from 2.7.2), Spark 2.1 (upgraded from 2.0), Hive 1.2.1 and HBase 1.1.3 as core components. In this release all smoke tests have been verified running on Arm for Hadoop (HDFS, YARN and MapReduce), Hive and Spark. The ELK v5.4.1 stack of components (Elasticsearch, Logstash and Kibana) is also built as part of this release.

Below is the complete list of 17.08 features:

Reference Platform Kernel

  • 4.12 based, including under-review topic branches to extend hardware platform support
  • Unified tree, used by the Debian Reference Platform
  • ACPI and PCIe support
  • Single kernel config and binary (package) for all hardware platforms

UEFI

  • Tianocore EDK II and OpenPlatformPkg containing reference implementations for Huawei D03/D05

17.08 with Debian based installer and userspace

  • Network Installer based on Debian 8.9 “Jessie”
  • Unified Reference Platform (Common) Kernel based on 4.12

Enterprise Components

  • Bigtop 1.2 (Hadoop 2.7.3, Spark 2.1.0, Hive 1.2.1, HBase 1.1.3)
  • Ceph 10.2.7
  • DPDK 17.05
  • Docker 1.12.6
  • ERP 17.08 OpenStack Newton
    • New Swift package released
  • ELK – ElasticSearch, LogStash and Kibana 5.4.1
  • Libvirt 3.4.0
  • OpenJDK 8
  • QEMU 2.9

Supported Hardware Platforms

  • Cavium ThunderX
  • HiSilicon D03
  • HiSilicon D05
  • HPE Proliant m400
  • Qualcomm Centriq 2400

To find out more about the Linaro Enterprise Reference Platform, go to platforms.linaro.org.

Linaro Connect SFO17 kicks off with new 96Boards products, members and technology demonstrations


The 22nd Linaro Connect began on Monday 25 September with the keynote given by Linaro CEO George Grey. As usual, Grey packed the opening hour with an update on new Linaro members, key engineering achievements, a look at areas of future interest and a series of demonstrations pulling together various aspects of Linaro’s current activities.

New members this time included Xilinx joining the Linaro IoT and Embedded (LITE) Group, Kylin joining the Linaro Enterprise Group (LEG) and NXP extending their existing LITE membership to also participate in the Linaro Digital Home Group (LHG).


The highlighted engineering achievements were: the inclusion of Linaro’s open source trusted execution environment OP-TEE into the 4.12 Linux kernel; the release of the 17.08 Enterprise Reference Platform (ERP), including UEFI/ACPI and PCIe support plus packages and updates such as Bigtop, Ceph, DPDK, Docker, OpenStack Newton, ELK and OpenJDK, all tested and validated on member hardware platforms with all the source code freely available; the significant contributions by the new HPC special interest group to Arm support in the recently released OpenHPC 1.3.2; and LITE’s rapid development and substantial contributions to Zephyr 1.9, including the LWM2M stack delivered by the Linaro Technologies team.

Looking ahead, Grey briefly discussed where open source is going and suggested that the unimaginable number of coming connected devices could have as much of an effect on society as the incandescent light bulb. Much of this change will run on open source software, which will need to be supported across the whole lifetime of these connected products. He then moved on to Artificial Intelligence and Linaro’s expertise and interest in this area, inviting interested parties to contact Linaro with a view to establishing a collaborative, open source project or projects to develop related code and tools. Automotive was the next topic and Grey discussed current challenges and opportunities, along with some initial work that is happening with UK-based company Streetdrone and a proof of concept using the 96Boards DragonBoard 410c and Gumstix Aerocore 2. At this Linaro Connect, multiple automotive sessions will gather interested parties and discuss the possibilities of collaborative engineering in this area.

The mention of 96Boards in automotive led Grey to talk about some of the new 96Boards products that will be shown at the event by sponsors such as Arrow and Hoperun and also in sessions. He talked about the HiKey 960, the Orange Pi i96, Uranus (TI CC3220 Cortex-M4), and mezzanine boards NeonKey and Secure96.


With 96Boards providing developers with access to the latest Arm hardware in a standard form factor, Grey next talked about enabling native development on Arm with the Arm Fixed Virtual Platforms (FVP), Linaro’s Developer Cloud and, now, a new Arm developer platform from Socionext and Gigabyte. The latter was announced by Socionext a few days before Linaro Connect, and the Socionext CEO will be at Linaro Connect on Tuesday 26 September to demonstrate the product.


A series of demonstrations showing the microPlatforms developed by the Linaro Technologies group followed. These included fully open source Zephyr on Cortex-M and Linux on Cortex-A with the code and documentation available to all.

This demonstration marked the beginning of a new chapter for Linaro as Grey explained why Linaro was creating a spin-off company, Open Source Foundries, to take some of this work forward and create products based around it.

In concluding, Grey talked about Linaro’s mission and a new Associates program that is designed to enable more organizations, such as OEMs, ODMs, service providers, start-ups and universities, to bring their input into the Linaro steering committee discussions. Interested parties should contact associates@linaro.org.

Grey made these announcements and demonstrations during the opening keynote at the company’s twice-a-year engineering event, held this time at the Hyatt Regency San Francisco Airport in Burlingame, California. Running through Friday 29 September, the Linaro Connect event brings together close to 500 engineers from across the Arm ecosystem to discuss the latest technology developments and work on today’s most pressing challenges.

To find out more about Linaro Connect SFO17, please visit connect.linaro.org and you can see what is happening at the following links:

Flickr / YouTube / Facebook / ArmDevices.net

Linaro Connect SFO17 day one summary and day two begins with new ARM developer system


Linaro CEO George Grey, in looking forward to day two of Linaro Connect SFO17, said “Linaro Connect is once again the place to be if you are involved in open source on ARM – from IoT to Digital Home, automotive to networking, datacenter to HPC. End to end ARM device to cloud demos and the announcement of Open Source Foundries started the week, and we look forward to announcements from Socionext on ARM hardware, as well as a keynote on AI/machine learning and automotive focused sessions today.”

The opening keynote by Kanta Vekaria PhD discussed how high performance computing (HPC) has developed and where it is going. Vekaria also tested the knowledge of the audience throughout the presentation with on-the-fly survey questions. Following the introductory slides, Vekaria talked about the open source components in HPC and Linaro’s collaborative engineering work in the Linaro Enterprise Group (LEG) HPC special interest group (SIG). Tech lead Renato Golin was also invited on stage to talk about OpenHPC and the latest 1.3.2 release. Vekaria concluded by inviting people to get more involved and attend the multiple sessions planned for later in the Linaro week.

Socionext Chairman and CEO Yasuo Nishiguchi PhD then introduced Socionext and the SynQuacer multi-core CPU and server. Sharing information on scaling and throughput, Nishiguchi stressed the advantages of parallel computing across multiple CPUs. He concluded by introducing the new desktop ARM developer platform and was then joined on stage by Linaro CEO George Grey and LEG director Martin Stadtler, who introduced and thanked the team who had been working on bringing up this new system, which came off the production line five days earlier. LEG engineer Ard Biesheuvel then demonstrated the system “just working”, i.e. booting and working like a normal desktop, including playing a YouTube video.
These keynotes began day two of the five-day Linaro Connect. The previous day began with a keynote by Linaro CEO George Grey. This was followed by team planning sessions in which the engineering groups discussed the work they would be doing in the week. The afternoon hosted twenty sessions in four tracks and committee meetings for the Networking, Enterprise and Digital Home groups.

 

Thursday Title Speaker Track Resources page Video Slides
SFO17-400K1 Keynote: Iliyan Malchev (Google) Iliyan Malchev Keynote View Resources Yes Yes
SFO17-400K2 Keynote: Imagine The Internet in Ten Years – Aaron Welch (Packet) Aaron Welch Keynote View Resources Yes No
SFO17-401 UEFI Secure Boot and DRI for RDK on HiKey Kalyan Nagabhirava LHG View Resources Yes Yes
SFO17-402 SDIO power on/off time impacts system suspend/resume time! Ulf Hansson Power Management, Kernel View Resources Yes Yes
SFO17-403 Optimizing the Design and Implementation of KVM/ARM Christoffer Dall Virtualization View Resources Yes Yes
SFO17-404 Ion integration challenges Archit Taneja LHG,LITE,LMG,Kernel View Resources Yes Yes
SFO17-405 Multi-threaded Programming on ARM – a MP/MC Ring Buffer Case Study Ola Liljedahl LNG View Resources Yes Yes
SFO17-406 IPsec Full Offload Support in ODP Bill Fischofer LNG View Resources Yes Yes
SFO17-407 Collaboras involvement in KernelCI Guillaume Tucker View Resources Yes Yes
SFO17-408 Active State Management of Power Domains Viresh Kumar LEG View Resources Yes Yes
SFO17-409 TSC BoF: OSS Toolchain Discussion Ryan Arnold Toolchain View Resources Yes Yes
SFO17-410 KVM/ARM Nested Virtualization Support and Performance Jintack Lim Virtualization View Resources Yes Yes
SFO17-411 BoF: AOSP LMG View Resources No No
SFO17-412 BoF: ION Laura Abbott – Sumit Semwal LMG, Kernel View Resources Yes Yes
SFO17-414 96Boards Mezzanine Workshop and info session Robert Wolff 96Boards View Resources Yes Yes
SFO17-415 Update Remotely IoT Devices using Eclipse Hawkbit and SWUpdate Nicola La Gloria LITE View Resources Yes Yes
SFO17-416 Preventing Linux in your car from killing you: The L4Re Open Source Microvisor System. Michael Hohmuth LITE View Resources Yes Yes
SFO17-417 The Open-Source seL4 Kernel. Military-Grade Security Through Mathematics Gernot Heiser LITE View Resources Yes Yes
SFO17-421 The Linux Kernel Scheduler (For Beginners) Viresh Kumar Power Management, Kernel View Resources Yes Yes
SFO17-451 An Overview of Post-K Supercomputer Development in Japan Yutaka Ishikawa LEG View Resources Yes No
SFO17-452 A Validation of Ceph Block Performance for Random I/O Applications Chris Marsh LEG View Resources Yes No
SFO17-453 Hyper scaling concept server LEG View Resources Yes No
SFO17-454 ARM v8A and java 8: Shifting The Datacenter Balance Simon Ritter LEG View Resources Yes No
SFO17-455 High performance IP communication for the datacenter Bogdan Pricope LEG View Resources Yes No
SFO17-456 HPC BoF Renato Golin – Kanta Vekaria LEG View Resources Yes No
SFO17-457 Works on Arm: CI/CD Infrastructure for Open Source and Commercial Software Testing Ed Vielmetti LEG View Resources Yes No
SFO17-458 6WIND High Performance Networking Software for ARM Servers Keith LEG View Resources Yes No
SFO17-459 RDO & RHEL 7.4 Jon Masters LEG View Resources Yes No
SFO17-460 Qualcomm in the Datacenter LEG View Resources No No
SFO17-461 Faster, Leaner and More Openness: Java is Accelerating Innovation for the Cloud LEG View Resources No No
SFO17-462 View Resources No No
SFO17-463 OS and architecture agnostic Docker Images LEG View Resources Yes No
SFO17-464 Spack LEG View Resources Yes No
SFO17-465 Cavium in the Datacenter LEG View Resources Yes No
SFO17-TR07 Intro to Linaro and 96Boards for Engineers Robert Wolff 96Boards View Resources Yes Yes

 

Senior Director of Engineering for Linaro Technologies Alan Bennett summarized Connect as “a mashup of the brightest minds in ARM and open source software to address, propose and debate the future direction of ARM in open source” and “a safe forum for presenting new ideas and receiving constructive feedback on complex and highly technical topics within almost all areas of open source software.”
To find out more about Linaro Connect SFO17, please visit connect.linaro.org and you can see what is happening at the following links: Flickr / YouTube / Facebook / ArmDevices.net
