Planet GStreamer

conservative gc can be faster than precise gc

From wingolog by Andy Wingo

(Andy Wingo)

Should your garbage collector be precise or conservative? The prevailing wisdom is that precise is always better. Conservative GC can retain more objects than strictly necessary, making GC slow: GC has to more frequently, and it has to trace a larger heap on each collection. However the calculus is not as straightforward as most people think, and indeed there are some reasons to expect that conservative root-finding can result in faster systems.

(I have made / relayed some of these arguments before but I feel like a dedicated article can make a contribution here.)

problem precision

Let us assume that by conservative GC we mean conservative root-finding, in which the collector assumes that any integer on the stack that happens to be a heap address indicates a reference on the object containing that address. The address doesn’t have to be at the start of the object. Assume that objects on the heap are traced precisely; contrast to BDW-GC which generally traces both the stack and the heap conservatively. Assume a collector that will pin referents of conservative roots, but in which objects not referred to by a conservative root can be moved, as in Conservative Immix or Whippet’s stack-conservative-mmc collector.

With that out of the way, let’s look at some reasons why conservative GC might be faster than precise GC.

smaller lifetimes

A compiler that does precise root-finding will typically output a side-table indicating which slots in a stack frame hold references to heap objects. These lifetimes aren’t always precise, in the sense that although they precisely enumerate heap references, those heap references might actually not be used in the continuation of the stack frame. When GC occurs, it might mark more objects as live than are actually live, which is the imputed disadvantage of conservative collectors.

This is most obviously the case when you need to explicitly register roots with some kind of handle API: the handle will typically be kept live until the scope ends, but that might be an overapproximation of lifetime. A compiler that can assume conservative stack scanning may well exhibit more precision than it would if it needed to emit stack maps.

no run-time overhead

For generated code, stack maps are great. But if a compiler needs to call out to C++ or something, it needs to precisely track roots in a run-time data structure. This is overhead, and conservative collectors avoid it.

smaller stack frames

A compiler may partition spill space on a stack into a part that contains pointers to the heap and a part containing numbers or other unboxed data. This may lead to larger stack sizes than if you could just re-use a slot for two purposes, if the lifetimes don’t overlap. A similar concern applies for compilers that partition registers.

no stack maps

The need to emit stack maps is annoying for a compiler and makes binaries bigger. Of course it’s necessary for precise roots. But then there is additional overhead when tracing the stack: for each frame on the stack, you need to look up the stack map for the return continuation, which takes time. It may be faster to just test if words on the stack might be pointers to the heap.

unconstrained compiler

Having to make stack maps is a constraint imposed on the compiler. Maybe if you don’t need them, the compiler could do a better job, or you could use a different compiler entirely. A conservative compiler can sometimes have better codegen, for example by the use of interior pointers.

anecdotal evidence

The Conservative Immix paper shows that conservative stack scanning can beat precise scanning in some cases. I have reproduced these results with parallel-stack-conservative-mmc compared to parallel-mmc. It’s small—maybe a percent—but it was a surprising result to me and I thought others might want to know.

Also, Apple’s JavaScriptCore uses conservative stack scanning, and V8 is looking at switching to it. Funny, right?

conclusion

When it comes to designing a system with GC, don’t count out conservative stack scanning; the tradeoffs don’t obviously go one way or the other, and conservative scanning might be the right engineering choice for your system.

on taking advantage of ragged stops

From wingolog by Andy Wingo

(Andy Wingo)

Many years ago I read one of those Cliff Click “here’s what I learned” articles in which he was giving advice about garbage collector design, and one of the recommendations was that at a GC pause, running mutator threads should cooperate with the collector by identifying roots from their own stacks. You can read a similar assertion in their VEE2005 paper, The Pauseless GC Algorithm, though this wasn’t the source of the information.

One motivation for the idea was locality: a thread’s stack is already local to a thread. Then specifically in the context of a pauseless collector, you need to avoid races between the collector and the mutator for a thread’s stack, and having the thread visit its own stack neatly handles this problem.

However, I am not so interested any more in (so-called) pauseless collectors; though I have not measured myself, I am convinced enough by the arguments in the Distilling the real costs of production garbage collectors paper, which finds that state of the art pause-minimizing collectors actually increase both average and p99 latency, relative to a well-engineered collector with a pause phase. So, the racing argument is not so relevant to me, because a pause is happening anyway.

There was one more argument that I thought was interesting, which was that having threads visit their own stacks is a kind of cheap parallelism: the mutator threads are already there, they might as well do some work; it could be that it saves time, if other threads haven’t seen the safepoint yet. Mutators exhibit a ragged stop, in the sense that there is no clean cutoff time at which all mutators stop simultaneously, only a time after which no more mutators are running.

Visiting roots during a ragged stop introduces concurrency between the mutator and the collector, which is not exactly free; notably, it prevents objects marked while mutators are running from being evacuated. Still, it could be worth it in some cases.

Or so I thought! Let’s try to look at the problem analytically. Consider that you have a system with N processors, a stop-the-world GC with N tracing threads, and M mutator threads. Let’s assume that we want to minimize GC latency, as defined by the time between GC is triggered and the time that mutators resume. There will be one triggering thread that causes GC to begin, and then M–1 remote threads that need to reach a safepoint before the GC pause can begin.

The total amount of work that needs to be done during GC can be broken down into rootsi, the time needed to visit roots for mutator i, and then graph, the time to trace the transitive closure of live objects. We want to know whether it’s better to perform rootsi during the ragged stop or in the GC pause itself.

Let’s first look to the case where M is 1 (just one mutator thread). If we visit roots before the pause, we have

latencyragged,M=1 = roots0 + graphN

Which is to say, thread 0 triggers GC, visits its own roots, then enters the pause in which the whole graph is traced by all workers with maximum parallelism. It may be that graph tracing doesn’t fully parallelize, for example if the graph has a long singly-linked list, but the parallelism with be maximal within the pause as there are N workers tracing the graph.

If instead we visit roots within the pause, we have:

latencypause,M=1= roots0+graphN

This is strictly better than the ragged-visit latency.

If we have two threads, then we will need to add in some delay, corresponding to the time it takes for remote threads to reach a safepoint. Let’s assume that there is a maximum period (in terms of instructions) at which a mutator will check for safepoints. In that case the worst-case delay will be a constant, and we add it on to the latency. Let us assume also there are more than two threads available. The marking-roots-during-the-pause case it’s easiest to analyze:

latencypause,M=2= delay + roots0+roots1+graphN

In this case, a ragged visit could win: while the triggering thread is waiting for the remote thread to stop, it could perform roots0, moving the work out of the pause, reducing pause time, and reducing latency, for free.

latencyragged,M=2= delay + roots1 + graphN

However, we only have this win if the root-visiting time is smaller than the safepoint delay; otherwise we are just prolonging the pause. Thing is, you don’t know in general. If indeed the root-visiting time is short, relative to the delay, we can assume the roots elements of our equation are 0, and so the choice to mark during ragged stop doesn’t matter at all! If we assume instead that root-visiting time is long, then it is suboptimally parallelised: under-parallelised if we have more than M cores, oversubscribed if M is greater than N, and needlessly serializing before the pause while it waits for the last mutator to finish marking its roots. What’s worse, root-visiting actually slows down delay, because the oversubscribed threads compete with visitation for CPU time.

So in summary, I plan to switch away from doing GC work during the ragged stop. It is complexity that doesn’t pay. Onwards and upwards!

Linux proves Elon Musk is wrong about entrepreneurs

From Planet – Felipe Contreras by Felipe Contreras

(Felipe Contreras)

I show with a simple example that Elon Musk's claim that only entrepreneurs generate wealth is completely wrong.

GStreamer and WebRTC HTTP signalling

From Arun Raghavan by Arun Raghavan

The WebRTC nerds among us will remember the first thing we learn about WebRTC, which is that it is a specification for peer-to-peer communication of media and data, but it does not specify how signalling is done.

Or put more simply, if you want call someone on the web, WebRTC tells you how you can transfer audio, video and data, but it leaves out the bit about how you make the call itself: how do you locate the person you’re calling, let them know you’d like to call them, and a few following steps before you can see and talk to each other.

WebRTC signalling
WebRTC signalling

While this allows services to provide their own mechanisms to manage how WebRTC calls work, the lack of a standard mechanism means that general-purpose applications need to individually integrate each service that they want to support. For example, GStreamer’s webrtcsrc and webrtcsink elements support various signalling protocols, including Janus Video Rooms, LiveKit, and Amazon Kinesis Video Streams.

However, having a standard way for clients to do signalling would help developers focus on their application and worry less about interoperability with different services.

Standardising Signalling

With this motivation, the IETF WebRTC Ingest Signalling over HTTPS (WISH) workgroup has been working on two specifications:

(author’s note: the puns really do write themselves :))

As the names suggest, the specifications provide a way to perform signalling using HTTP. WHIP gives us a way to send media to a server, to ingest into a WebRTC call or live stream, for example.

Conversely, WHEP gives us a way for a client to use HTTP signalling to consume a WebRTC stream – for example to create a simple web-based consumer of a WebRTC call, or tap into a live streaming pipeline.

WHIP and WHEP
WHIP and WHEP

With this view of the world, WHIP and WHEP can be used both for calling applications, but also as an alternative way to ingest or play back live streams, with lower latency and a near-ubiquitous real-time communication API.

In fact, several services already support this including Dolby Millicast, LiveKit and Cloudflare Stream.

WHIP and WHEP with GStreamer

We know GStreamer already provides developers two ways to work with WebRTC streams:

  • webrtcbin: provides a low-level API, akin to the PeerConnection API that browser-based users of WebRTC will be familiar with

  • webrtcsrc and webrtcsink: provide high-level elements that can respectively produce/consume media from/to a WebRTC endpoint

At Asymptotic, my colleagues Tarun and Sanchayan have been using these building blocks to implement GStreamer elements for both the WHIP and WHEP specifications. You can find these in the GStreamer Rust plugins repository.

Our initial implementations were based on webrtcbin, but have since been moved over to the higher-level APIs to reuse common functionality (such as automatic encoding/decoding and congestion control). Tarun covered our work in a talk at last year’s GStreamer Conference.

Today, we have 4 elements implementing WHIP and WHEP.

Clients

  • whipclientsink: This is a webrtcsink-based implementation of a WHIP client, using which you can send media to a WHIP server. For example, streaming your camera to a WHIP server is as simple as:
gst-launch-1.0 -e \
  v4l2src ! video/x-raw ! queue ! \
  whipclientsink signaller::whip-endpoint="https://my.webrtc/whip/room1"
  • whepclientsrc: This is work in progress and allows us to build player applications to connect to a WHEP server and consume media from it. The goal is to make playing a WHEP stream as simple as:
gst-launch-1.0 -e \
  whepclientsrc signaller:whep-endpoint="https://my.webrtc/whep/room1" ! \
  decodebin ! autovideosink

The client elements fit quite neatly into how we might imagine GStreamer-based clients could work. You could stream arbitrary stored or live media to a WHIP server, and play back any media a WHEP server provides. Both pipelines implicitly benefit from GStreamer’s ability to use hardware-acceleration capabilities of the platform they are running on.

GStreamer WHIP/WHEP clients
GStreamer WHIP/WHEP clients

Servers

  • whipserversrc: Allows us to create a WHIP server to which clients can connect and provide media, each of which will be exposed as GStreamer pads that can be arbitrarily routed and combined as required. We have an example server that can play all the streams being sent to it.

  • whepserversink: Finally we have ongoing work to publish arbitrary streams over WHEP for web-based clients to consume this media.

The two server elements open up a number of interesting possibilities. We can ingest arbitrary media with WHIP, and then decode and process, or forward it, depending on what the application requires. We expect that the server API will grow over time, based on the different kinds of use-cases we wish to support.

GStreamer WHIP/WHEP server
GStreamer WHIP/WHEP server

This is all pretty exciting, as we have all the pieces to create flexible pipelines for routing media between WebRTC-based endpoints without having to worry about service-specific signalling.

If you’re looking for help realising WHIP/WHEP based endpoints, or other media streaming pipelines, don’t hesitate to reach out to us!

whippet update: faster evacuation, eager sweeping of empty blocks

From wingolog by Andy Wingo

(Andy Wingo)

Good evening. Tonight, notes on things I have learned recently while hacking on the Whippet GC library.

service update

For some time now, the name Whippet has referred to three things. Firstly, it is the project as a whole, consisting of an include-only garbage collection library containing a compile-time configurable choice of specific collector implementations. Also, it is the name of a specific Immix-derived collector. Finally, it is the name of a specific space within that collector, in which objects are mostly marked in place but can be evacuated if appropriate.

Well, naming being one of the two hard problems of computer science, I can only ask for forgiveness and understanding. I have started fixing this situation with the third component, renaming the whippet space to the nofl space. Nofl stands for no-free-list, indicating that it’s a (mostly) mark space but which does bump-pointer allocation instead of using freelists. Also, it stands for novel, in the sense that as far as I can tell, it is a design that hasn’t been tried yet.

unreliable evacuation

Following Immix, the nofl space has always had optimistic evacuation. It prefers to mark objects in place, but if fragmentation gets too high, it will try to defragment by evacuating sparse blocks into a small set of empty blocks reserved for this purpose. If the evacuation reserve fills up, nofl will dynamically switch to marking in place.

My previous implementation was a bit daft: some subset of blocks would get marked as being evacuation targets, and evacuation would allocate into those blocks in ascending address order. Parallel GC threads would share a single global atomically-updated evacuation allocation pointer. As you can imagine, this was a serialization bottleneck; I initially thought it wouldn’t be so important but for some workloads it is.

I had chosen this strategy to maximize utilization of the evacuation reserve; if you had 8 GC workers, each allocating into their own block, their blocks won’t all be full at the end of GC; that would waste space.

But reliability turns out to be unimportant. It’s more important to let parallel GC threads do their thing without synchronization, to the extent possible.

Also, this serialized allocation discipline imposed some complexity on the nofl space implementation; the evacuation allocator was not like the “normal” allocator. With worker-local allocation buffers, these two allocators are now essentially the same. (They differ in that the normal allocator interleaves lazy sweeping with allocation, and can allocate into blocks with survivors from the last GC instead of requiring empty blocks.)

eager sweeping

Another initial bad idea I had was to lean too much on lazy sweeping as a design principle. The idea was that deferring sweeping work until just before an allocator would write to a block would minimize cache overhead (you page in a block right when you will use it) and provide for workload-appropriate levels of parallelism (multiple mutator threads naturally parallelize sweeping).

Lazy sweeping was very annoying when it came to discovery of empty blocks. Empty blocks are precious: they can be returned to the OS if needed, they are useful for evacuation, and they have nice allocation properties, in that you can just do bump-pointer from beginning to end.

Nofl was discovering empty blocks just in time, from the allocator. If the allocator acquired a new block and found that it was empty, it would return it to a special list of empty blocks. Only if all sweepable pages were exhausted would an allocator use an empty block. But to prevent an allocator from pausing forever, there was a limit to the number of empty swept blocks that would be returned to the collector (10, as it happens); an 11th empty swept block would be given to a mutator for allocation. And so on and so on. Complicated, and you only know the number of empty blocks yielded by the last collection when the whole next allocation cycle has finished.

The obvious solution is some kind of separate mark on blocks, in addition to a mark on objects. I didn’t do it initially out of fear of overhead; marking is a fast path. The implementation I ended up making was a packed bitvector, with one bit per 64 kB block, at the beginning of each 4 MB slab of blocks. The beginning blocks are for metadata like this. For reasons, I don’t have the space for full bytes. When marking an object, if I see that a block’s mark is unset, I do an atomic_fetch_or_explicit on the byte with memory_order_relaxed ordering. In this way I only do the atomic op very rarely. It seems that on ARMv8.1 there is actually an instruction to do atomic bit setting; everywhere else it’s a garbage compare-and-swap thing, but on my x64 machine it’s fine.

Then after collection, during the pause, if I see a block is unmarked, I move it directly to the empty set instead of sweeping it. (I could probably do this concurrently outside the pause, but that would be for another day.)

And the overhead? Negative! Negative, in the sense that because I don’t have to sweep empty blocks, and that (for some workloads) collection can produce a significant-enough fraction of empty blocks, I actually see speedups with this strategy, relative to lazy sweeping. It also simplifies the allocator (no need for that return-the-11th-block logic).

The only wrinkle is as regards generational collection: nofl currently uses the sticky mark bit algorithm, which has to be applied also to block marks. Subtle, but not complicated.

fin

Next up is growing and shrinking the nofl-using Whippet collector (which might need another name), using the membalancer algorithm, and then I think I will be ready to move on to getting Whippet into Guile. Until then, happy hacking!

GStreamer 1.24.7 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • Fix APE and Musepack audio file and GIF playback with FFmpeg 7.0
  • playbin3: Fix potential deadlock with multiple playbin3s with glimagesink used in parallel
  • qt6: various qmlgl6src and qmlgl6sink fixes and improvements
  • rtspsrc: expose property to force usage of non-compliant setup URLs for RTSP servers where the automatic fallback doesn't work
  • urisourcebin: gapless playback and program switching fixes
  • v4l2: various fixes
  • va: Fix potential deadlock with multiple va elements used in parallel
  • meson: option to disable gst-full for static-library build configurations that do not need this
  • cerbero: libvpx updated to 1.14.1; map 2022Server to Windows11; disable rust variant on Linux if binutils is too old
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.7 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

javascript weakmaps should be iterable

From wingolog by Andy Wingo

(Andy Wingo)

Good evening. Tonight, a brief position statement: it is a mistake for JavaScript’s WeakMap to not be iterable, and we should fix it.

story time

A WeakMap associates a key with a value, as long as the key is otherwise reachable in a program. (It is an ephemeron table.)

When WeakMap was added to JavaScript, back in the ES6 times, some implementors thought that it could be reasonable to implement weak maps not as a data structure in its own right, but rather as a kind of property on each object. Under the hood, adding an key→value association to a map M would set key[M] = value. GC would be free to notice dead maps and remove their associations in live objects.

If you implement weak maps like this, or are open to the idea of such an implementation, then you can’t rely on the associations being enumerable from the map itself, as they are instead spread out among all the key objects. So, ES6 specified WeakMap as not being iterable; given a map, you can’t know what’s in it.

As with many things GC-related, non-iterability of weak maps then gained a kind of legendary status: the lore states that non-iterability preserves some key flexibility for JS implementations, and therefore it is good, and you just have to accept it and move on.

dynamics

Time passed, and two things happened.

One was that this distributed WeakMap implementation strategy did not pan out; everyone ended up implementing weak maps as their own kind of object, and people use an algorithm like the one Steve Fink described a couple years ago to compute the map×key⇒value conjunction. The main original motivation for non-iterability was no longer valid.

The second development was WeakRef and FinalizationRegistry, which expose some details of reachability as viewed by the garbage collector to user JS code. With WeakRef (and WeakMap), you can build an iterable WeakMap.

(Full disclosure: I did work on ES6 and had a hand in FinalizationRegistry but don’t do JS language work currently.)

Thing is, your iterable WeakMap is strictly inferior to what the browser can provide: its implementation is extraordinarily gnarly, shipped over the wire instead of already in the browser, uses more memory, is single-threaded and high-latency (because FinalizationRegistry), and non-standard. What if instead as language engineers we just did our jobs and standardized iterability, as we do with literally every other collection in the JS standard?

Just this morning I wrote yet another iterable WeakSet (which has all the same concerns as WeakMap), and while it’s sufficient for my needs, it’s not good (lacking prompt cleanup of dead entries), and by construction can’t be great (because it has to be redundantly implemented on top of WeakSet instead of being there already).

I am sympathetic to deferring language specification decisions to allow the implementation space to be explored, but when the exploration is done and the dust has settled, we shouldn’t hesitate to pick a winner: JS weak maps and sets should be iterable. Godspeed, brave TC39 souls; should you take up this mantle, you are doing the Lord’s work!

Thanks to Philip Chimento for notes on the timeline and Joyee Cheung for notes on the iterable WeakMap implementation in the WeakRef spec. All errors mine, of course!

Everyone is using modulo wrong

From Planet – Felipe Contreras by Felipe Contreras

(Felipe Contreras)

All programmers use the modulo operator, but virtually none have stopped to consider what it actually is, and that's why one of the most powerful features of it doesn't work in most programming languages. This article touches abstract mathematics in order to explain why.

GStreamer Conference 2024: Call for Presentations is open - submit your talk proposal now!

From GStreamer News by GStreamer

(GStreamer)

About the GStreamer Conference

The GStreamer Conference 2024 GStreamer Conference will take place on Monday-Tuesday 7-8 October 2024 in Montréal, Québec, Canada.

It is a conference for developers, community members, decision-makers, industry partners, researchers, students and anyone else interested in the GStreamer multimedia framework or Open Source and cross-platform multimedia.

There will also be a hackfest just after the conference.

You can find more details about the conference on the GStreamer Conference 2024 web site.

Call for Presentations is open

The call for presentations is now open for talk proposals and lightning talks. Please submit your talk now!

Talks can be on almost anything multimedia related, ranging from talks about applications to challenges in the lower levels in the stack or hardware.

We want to hear from you, what you are working on, what you are using GStreamer for, what difficulties you have encountered and how you solved them, what your plans are, what you like, what you dislike, how to improve things!

Please submit all proposals through the GStreamer Conference 2024 Talk Submission Portal.

GStreamer 1.24.6 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • Fix compatibility with FFmpeg 7.0
  • qmlglsink: Fix failure to display content on recent Android devices
  • adaptivedemux: Fix handling of closed caption streams
  • cuda: Fix runtime compiler loading with old CUDA tookit
  • decodebin3 stream selection handling fixes
  • d3d11compositor, d3d12compositor: Fix transparent background mode with YUV output
  • d3d12converter: Make gamma remap work as intended
  • h264decoder: Update output frame duration for interlaced video when second field frame is discarded
  • macOS audio device provider now listens to audio devices being added/removed at runtime
  • Rust plugins: audioloudnorm, s3hlssink, gtk4paintablesink, livesync and webrtcsink fixes
  • videoaggregator: preserve features in non-alpha caps for subclasses with non-system memory sink caps
  • vtenc: Fix redistribute latency spam
  • v4l2: fixes for complex video formats
  • va: Fix strides when importing DMABUFs, dmabuf handle leaks, and blocklist unmaintained Intel i965 driver for encoding
  • waylandsink: Fix surface cropping for rotated streams
  • webrtcdsp: Enable multi_channel processing to fix handling of stereo streams
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.6 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

whippet progress update: funding, features, future

From wingolog by Andy Wingo

(Andy Wingo)

Greets greets! Today, an update on recent progress in Whippet, including sponsorship, a new collector, and a new feature.

the lob, the pitch

But first, a reminder of what the haps: Whippet is a garbage collector library. The target audience is language run-time authors, particularly “small” run-times: wasm2c, Guile, OCaml, and so on; to a first approximation, the kinds of projects that currently use the Boehm-Demers-Weiser collector.

The pitch is that if you use Whippet, you get a low-fuss small dependency to vendor into your source tree that offers you access to a choice of advanced garbage collectors: not just the conservative mark-sweep collector from BDW-GC, but also copying collectors, an Immix-derived collector, generational collectors, and so on. You can choose the GC that fits your problem domain, like Java people have done for many years. The Whippet API is designed to be a no-overhead abstraction that decouples your language run-time from the specific choice of GC.

I co-maintain Guile and will be working on integrating Whippet in the next months, and have high hopes for success.

bridgeroos!

I’m delighted to share that Whippet was granted financial support from the European Union via the NGI zero core fund, administered by the Dutch non-profit, NLnet foundation. See the NLnet project page for the overview.

This funding allows me to devote time to Whippet to bring it from proof-of-concept to production. I’ll finish the missing features, spend some time adding tracing support, measuring performance, and sanding off any rough edges, then work on integrating Whippet into Guile.

This bloggery is a first update of the progress of the funded NLnet project.

a new collector!

I landed a new collector a couple weeks ago, a parallel copying collector (PCC). It’s like a semi-space collector, in that it always evacuates objects (except large objects, which are managed in their own space). However instead of having a single global bump-pointer allocation region, it breaks the heap into 64-kB blocks. In this way it supports multiple mutator threads: mutators do local bump-pointer allocation into their own block, and when their block is full, they fetch another from the global store.

The block size is 64 kB, but really it’s 128 kB, because each block has two halves: the active region and the copy reserve. It’s a copying collector, after all. Dividing each block in two allows me to easily grow and shrink the heap while ensuring there is always enough reserve space.

Blocks are allocated in 64-MB aligned slabs, so you get 512 blocks in a slab. The first block in a slab is used by the collector itself, to keep metadata for the rest of the blocks, for example a chain pointer allowing blocks to be collected in lists, a saved allocation pointer for partially-filled blocks, whether the block is paged in or out, and so on.

The PCC not only supports parallel mutators, it can also trace in parallel. This mechanism works somewhat like allocation, in which multiple trace workers compete to evacuate objects into their local allocation buffers; when an allocation buffer is full, the trace worker grabs another, just like mutators do.

However, unlike the simple semi-space collector which uses a Cheney grey worklist, the PCC uses the fine-grained work-stealing parallel tracer originally developed for Whippet’s Immix-like collector. Each trace worker maintains a local queue of objects that need tracing, which currently has 1024 entries. If the local queue becomes full, the worker will publish 3/4 of those entries to the worker’s shared worklist. When a worker runs out of local work, it will first try to remove work from its own shared worklist, then will try to steal from other workers.

Of course, because threads compete to evacuate objects, we have to use atomic compare-and-swap instead of simple forwarding pointer updates; if you only have one mutator thread and are OK with just one tracing thread, you can avoid the ~30% performance penalty that atomic operations impose. The PCC generally starts to win over a semi-space collector when it can trace with 2 threads, and gets better with each thread you add.

I sometimes wonder whether the worklist should contain grey edges or grey objects. MMTk seems to do the former, and bundles edges into work packets, which are the unit of work sharing. I don’t know yet what is best and look forward to experimenting once I have better benchmarks.

Anyway, maintaining an external worklist is cheating in a way: unlike the Cheney worklist, this memory is not accounted for as part of the heap size. If you are targetting a microcontroller or something, probably you need to choose a different kind of collector. Fortunately, Whippet enables this kind of choice, as it contains a number of collector implementations.

What about benchmarks? Well, I’ll be doing those soon in a more rigorous way. For now I will just say that it seems to behave as expected and I am satisfied; it is useful to have a performance oracle against which to compare other collectors.

finalizers!

This week I landed support for finalizers!

Finalizers work in all the collectors: semi, pcc, whippet, and the BDW collector that is a shim to use BDW-GC behind the Whippet API. They have a well-defined relationship with ephemerons and are multi-priority, allowing embedders to build guardians or phantom references on top.

In the future I should do some more work to make finalizers support generations, if the collector is generational, allowing a minor collection to avoid visiting finalizers for old objects. But this is a straightforward extension that will happen at some point.

future!

And that’s the end of this update. Next up, I am finally going to tackle heap resizing, using the membalancer approach. Then basic Windows and Mac support, then I move on to the tracing and performance measurement phase. Onwards and upwards!

finalizers, guardians, phantom references, et cetera

From wingolog by Andy Wingo

(Andy Wingo)

Consider guardians. Compared to finalizers, in which the cleanup procedures are run concurrently with the mutator, by the garbage collector, guardians allow the mutator to control concurrency. See Guile’s documentation for more notes. Java’s PhantomReference / ReferenceQueue seems to be similar in spirit, though the details differ.

questions

If we want guardians, how should we implement them in Whippet? How do they relate to ephemerons and finalizers?

It would be a shame if guardians were primitive, as they are a relatively niche language feature. Guile has them, yes, but really what Guile has is bugs: because Guile implements guardians on top of BDW-GC’s finalizers (without topological ordering), all the other things that finalizers might do in Guile (e.g. closing file ports) might run at the same time as the objects protected by guardians. For the specific object being guarded, this isn’t so much of a problem, because when you put an object in the guardian, it arranges to prepend the guardian finalizer before any existing finalizer. But when a whole clique of objects becomes unreachable, objects referred to by the guarded object may be finalized. So the object you get back from the guardian might refer to, say, already-closed file ports.

The guardians-are-primitive solution is to add a special guardian pass to the collector that will identify unreachable guarded objects. In this way, the transitive closure of all guarded objects will be already visited by the time finalizables are computed, protecting them from finalization. This would be sufficient, but is it necessary?

answers?

Thinking more abstractly, perhaps we can solve this issue and others with a set of finalizer priorities: a collector could have, say, 10 finalizer priorities, and run the finalizer fixpoint once per priority, in order. If no finalizer is registered at a given priority, there is no overhead. A given embedder (Guile, say) could use priority 0 for guardians, priority 1 for “normal” finalizers, and ignore the other priorities. I think such a facility could also support other constructs, including Java’s phantom references, weak references, and reference queues, especially when combined with ephemerons.

Anyway, all this is a note for posterity. Are there other interesting mutator-exposed GC constructs that can’t be implemented with a combination of ephemerons and multi-priority finalizers? Do let me know!

Orc 0.4.39 bug-fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a minor bug-fix release, and also includes a security fix.

Highlights:

  • Security: Fix error message printing buffer overflow leading to possible code execution in orcc with specific input files (CVE-2024-40897). This only affects developers and CI environments using orcc, not users of liborc.
  • div255w: fix off-by-one error in the implementations
  • x86: only run AVX detection if xgetbv is available
  • x86: fix AVX detection by implementing the check recommended by Intel
  • Only enable JIT compilation on Apple arm64 if running on macOS, fixes crashes on iOS
  • Fix potential crash in emulation mode if logging is enabled
  • Handle undefined TARGET_OS_OSX correctly
  • orconce: Fix typo in GCC __sync-based implementation
  • orconce: Fix usage of __STDC_NO_ATOMICS__
  • Fix build with MSVC 17.10 + C11
  • Support stack unwinding on Windows
  • Major opcode and instruction set code clean-ups and refactoring
  • Refactor allocation and chunk initialization of code regions
  • Fall back to emulation on Linux if JIT support is not available, e.g. because of SELinux sandboxing or noexec mounting)

Direct tarball download: orc-0.4.39.tar.xz.

GStreamer Rust bindings 0.23 / Rust Plugins 0.13 release

From GStreamer News by GStreamer

(GStreamer)

GStreamer Rust bindingsGStreamer Rust plugins

As usual this release follows the latest gtk-rs 0.20 release and the corresponding API changes.

This release features relatively few changes and mostly contains the addition of some convenience APIs, the addition of bindings for some minor APIs, addition of bindings for new GStreamer 1.26 APIs, and various optimizations.

The new release also brings a lot of bugfixes, most of which were already part of the bugfix releases of the previous release series.

Details can be found in the release notes for gstreamer-rs.

The code and documentation for the bindings is available on the freedesktop.org GitLab

as well as on crates.io.

The new 0.13 version of the GStreamer Rust plugins features many improvements to the existing plugins as well as various new plugins. A majority of the changes were already backported to the 0.12 release series and its bugfix releases, which is part of the GStreamer 1.24 binaries.

A full list of available plugins can be seen in the repository's README.md.

Details for this release can be found in the release notes for gst-plugins-rs.

If you find any bugs, notice any missing features or other issues please report them in GitLab for the bindings or the plugins.

copying collectors with block-structured heaps are unreliable

From wingolog by Andy Wingo

(Andy Wingo)

Good day, garbage pals! This morning, a quick note on “reliability” and garbage collectors, how a common GC construction is unreliable, and why we choose it anyway.

on reliability

For context, I’m easing back in to Whippet development. One of Whippet’s collectors is a semi-space collector. Semi-space collectors are useful as correctness oracles: they always move objects, so they require their embedder to be able to precisely enumerate all edges of the object graph, to update those edges to point to the relocated objects. A semi-space collector is so simple that if there is a bug, it is probably in the mutator rather than the collector. They also have well-understood performance, as as such are useful when comparing performance of other collectors.

But one other virtue of the simple semi-space collector is that it is reliable, in the sense that given a particular set of live objects, allocated in any order, there is a single heap size at which the allocation (and collection) will succeed, and below which the program fails (not enough memory). This is because all allocations go in the same linear region, collection itself doesn’t allocate memory, the amount of free space after an object (the fragmentation) does not depend on where it is allocated, and those object extents just add up in a commutative way.

Reliability is a virtue. Sometimes it is a requirement: for example, the Toit language and run-time targets embeded microcontrollers, and you there you have finite resources and either they workload fits or it doesn’t. You can’t really tolerate a result of “it works sometimes”. So, Toit uses a generational semi-space + mark-sweep collector that never makes things worse.

on block-structured heaps

But, threads make reliability tricky. With Whippet I am targetting embedders with multiple mutator threads, and a classic semi-space collector doesn’t scale – you could access the allocation pointer atomically, but that would be a bottleneck, serializing mutators, not to mention the cache contention.

The usual solution for this problem is to arrange the heap in such a way that different threads can allocate in different areas, so they don’t need to share an allocation pointer and so they don’t write to the same cache lines. And, the most common way to do this is to use a block-structured heap; for example you might have a 256 MB heap, but divided into 4096 blocks, each of which is 64 kB. That’s enough granularity to dynamically partition out space between many threads: you keep a list of available blocks and allocator threads compete to grab fresh blocks as needed. There’s central contention on the block list, so you want blocks big enough that you aren’t fetching blocks too often.

To break a heap into blocks requires a large-object space, to allow for allocations that are larger than a block. And actually, as I mentioned in the article about block-structured heaps, usually you choose a threshold for large object allocations that is smaller than the block size, to limit the maximum and expected amount of fragmentation at the end of each block, when an allocation doesn’t fit.

on unreliability

Which brings me to my point: a copying collector with a block-structured heap is unreliable, in the sense that there is no single heap size below which the program fails and above which it succeeds.

Consider a mutator with a single active thread, allocating a range of object sizes, all smaller than the large object threshold. There is a global list of empty blocks available for allocation, and the thread grabs blocks as needed and bump-pointer allocates into that block. The last allocation in each block will fail: that’s what forces the thread to grab a new fresh block. The space left at the end of the block is fragmentation.

Assuming that the sequence of allocations performed by the mutator is deterministic, by the time the mutator has forced the first collection, the total amount of fragmentation will also be deterministic, as will the set of live roots at the time of collection. Assume also that there is a single collector thread which evacuates the live objects; this will also produce deterministic fragmentation.

However, there is no guarantee that the post-collection fragmentation is less than the pre-collection fragmentation. Unless objects are copied in such a way that preserves allocation order—generally not the case for a semi-space collector, and it would negate any advantage of a block-structured heap—then different object order could produce different end-of-block fragmentation.

causes of unreliability

The unreliability discussed above is due to non-commutative evacuation. If your collector marks objects in place, you are not affected. If you don’t commute live objects—if you preserve their allocation order, as Toit’s collector does—then you are not affected. If your evacuation commutes, as in the case of the simple semi-space collector, you are not affected. But if you have a block-structured heap and you evacuate, your collector is probably unreliable.

There are other sources of unreliability in a collector, but to my mind they are not as fundamental as this one.

  • Multiple mutator threads generally lead to a kind of unreliability, because the size of the live object graph is not deterministic at the time of collection: even if all threads have the same allocation trace, they don’t necessarily proceed in lock-step nor stop in the same place.

  • Adding collector threads to evacuate in parallel adds its own form of unreliability: if you have 8 evacuator threads, then there are 8 blocks local to the evacuator threads which also contribute to post-collection wasted space, possibly an entire block per thread.

  • Some collectors will allocate memory during collection, for example to represent a worklist of objects that need tracing. This allocation can fail. Also, annoyingly, collection-time allocation complicates comparison: you can no longer compare two collectors at the same heap size, because one of them cheats.

  • Virtual memory and paging can make you have a bad time. For example, you go to allocate a large object, so you remove some empty blocks from the main space and return them to the OS, providing you enough budget to allocate the new large object. Then the new large object is collected, so you reclaim the pages you returned to the OS, adding them to the available list. But probably you don’t page them in already, because who wants a syscall? They get paged in lazily when the mutator needs them, but that could fail because of other processes on the system.

embracing unreliability

I think it only makes sense to insist on a reliable collector if your mutator does not have threads; otherwise, the fragmentation-related unreliability pales in comparison.

What’s more, fragmentation-related unreliability can be entirely mitigated by giving the heap more memory: the maximum amount of fragmentation is an object just shy of the large object threshold, per block, so in our case 8 kB per 64 kB. So, just increase your heap by 12.5%. You will certainly not regret increasing your heap by 12.5%.

And happily, increasing heap size also works to mitigate unreliability related to multiple mutator threads. Consider 10 threads each of which has a local object graph that is usually 10 MB but briefly 100MB when calculating: usually when GC happens, total live object size is 10×10MB=100MB, but sometimes as much as 1 GB; there is a minimum heap size for which the program sometimes works, but also a minimum heap size at which it always works. The trouble is, of course, that you generally only know the minimum always-works size by experimentation, and you are unlikely to be able to actually measure the maximum heap size.

Which brings me to my final point, which is that virtual memory and growable heaps are good. Unless you have a dedicated devops team or you are evaluating a garbage collector, you should not be using a fixed heap size. The ability to just allocate some pages to keep the heap from being too tight buys us a form of soft reliability.

And with that, end of observations. Happy fragmenting, and until next time!

enable persistent history in gdb

From wingolog by Andy Wingo

(Andy Wingo)

Friends. I have been using GDB for more than two decades and have been annoyed by the fact that, unlike the shell, it doesn’t keep a persistent history.

Of course, it has always been able to do that, but history saving is just not on by default. So do yourself a favor and turn it on by pasting this into your terminal:

mkdir -p ~/.config/gdb
mkdir -p ~/.cache/gdb
echo 'set history filename ~/.cache/gdb/history' >> ~/.config/gdb/gdbinit
echo 'set history save on' >> ~/.config/gdb/gdbinit
echo 'set history size unlimited' >> ~/.config/gdb/gdbinit

You are welcome!

scikit-survival 0.23.0 released

From Posts | Sebastian Pölsterl by Sebastian Pölsterl

(Sebastian Pölsterl)

I am pleased to announce the release of scikit-survival 0.23.0.

This release adds support for scikit-learn 1.4 and 1.5, which includes missing value support for RandomSurvivalForest. For more details on missing values support, see the section in the release announcement for 0.23.0.

Moreover, this release fixes critical bugs. When fitting SurvivalTree, the sample_weight is now correctly considered when computing the log-rank statistic for each split. This change also affects RandomSurvivalForest and ExtraSurvivalTrees which pass sample_weight to the individual trees in the ensemble. Therefore, the outputs produced by SurvivalTree, RandomSurvivalForest, and ExtraSurvivalTrees will differ from previous releases.

This release fixes a bug in ComponentwiseGradientBoostingSurvivalAnalysis and GradientBoostingSurvivalAnalysis when dropout is used. Previously, dropout was only applied starting with the third iteration, now dropout is applied in the second iteration too.

Finally, this release adds compatibility with numpy 2.0 and drops support for Python 3.8.

Install

scikit-survival is available for Linux, macOS, and Windows and can be installed either

via pip:

pip install scikit-survival

or via conda

 conda install -c conda-forge scikit-survival

GStreamer 1.24.5 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • webrtcsink: Support for AV1 via nvav1enc, av1enc or rav1enc encoders
  • AV1 RTP payloader/depayloader fixes to work correctly with Chrome and Pion WebRTC
  • av1parse, av1dec error handling/robustness improvements
  • av1enc: Handle force-keyunit events properly for WebRTC
  • decodebin3: selection and collection handling improvements
  • hlsdemux2: Various fixes for discontinuities, variant switching, playlist updates
  • qml6glsink: fix RGB format support
  • rtspsrc: more control URL handling fixes
  • v4l2src: Interpret V4L2 report of sync loss as video signal loss
  • d3d12 encoder, memory and videosink fixes
  • vtdec: more robust error handling, fix regression
  • ndi: support for NDI SDK v6
  • Various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.5 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Fedora Workstation development update – Artificial Intelligence edition

From Christian F.K. Schaller by Christian Schaller

(Christian Schaller)

There are times when you feel your making no progress and there are other times when things feel like they are landing in quick succession. Luckily this definitely is the second when a lot of our long term efforts are finally coming over the finish line. As many of you probably know our priorities tend to be driven by a combination of what our RHEL Workstation customers need, what our hardware partners are doing and what is needed for Fedora Workstation to succeed. We also try to be good upstream partners and do patch reviews and participate where we can in working on upstream standards, especially those of course of concern to our RHEL Workstation and Server users. So when all those things align we are at our most productive and that seems to be what is happening now. Everything below is features in flight that will at the latest land in Fedora Workstation 41.

Artificial Intelligence

Granite LLM

IBM Granite LLM models usher in a new era of open source AI.


One of the areas of great importance to Red Hat currently is working on enabling our customers and users to take advantage of the advances in Artificial Intelligence. We do this in a lot of interesting ways like our recently announced work with IBM to release the high quality Granite AI models under terms that make them the most open major vendor AI models according to the Stanford Foundation Model Transparency Index , but not only are we releasing the full LLM source code, we are also creating a project to make modifying and teaching the LLM a lot easier through a project we call Instructlab. Instructlab is enabling almost anyone to quickly download a Granite LLM model and start teaching it specific things relevant to you or your organization. This put you in control of the AI and what it knows and can do as opposed to being demoted to a pure consumer.

And it is not just Granite, we are ensuring other other major AI projects will work with Fedora too, like Meta’s popular Llama LLM. And a big step for that is how Tom Rix has been working on bringing in AMD accelerated support (ROCm) for PyTorch to Fedora. PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. The long term goal is that you should be able to just install PyTorch on Fedora and have it work hardware accelerated with any of the 3 major GPU vendors chipsets.

NVIDIA in Fedora
Fedora Workstation
So the clear market leader at the moment for powering AI workloads in NVIDIA so I am also happy to let you know about two updates we are working on that will make you life better on Fedora when using NVIDIA GPUs, be that for graphics or for compute or Artificial Intelligence. So for the longest time we have had easy install of the NVIDIA driver through GNOME Software in Fedora Workstation, unfortunately this setup never dealt with what is now the default usecase, which is using it with a system that has secure boot enabled. So the driver install was dropped from GNOME Software in our most recent release as the only way for people to get it working was through using mokutils on the command line, but the UI didn’t tell you that. Well we of course realize that sending people back to the command line to get this driver installed is highly unfortunate so Milan Crha has been working together with Alan Day and Jakub Steiner to come up with a streamlined user experience in GNOME Software to let you install the binary NVIDIA driver and provide you with an integrated graphical user interface help to sign the kernel module for use with secure boot. This is a bit different than what we for instance are doing in RHEL, where we are working with NVIDIA to provide pre-signed kernel modules, but that is a lot harder to do in Fedora due to the rapidly updating kernel versions and which most Fedora users appreciate as a big plus. So instead what we are for opting in Fedora is as I said to make it simple for you to self-sign the kernel module for use with secure boot. We are currently looking at when we can make this feature available, but no later than Fedora Workstation 41 for sure.

Toolbx getting top notch NVIDIA integration

Toolbx

Container Toolbx enables developers quick and easy access to their favorite development platforms


Toolbx, our incredible developer focused containers tool, is going from strength to strength these days with the rewrite from the old shell scripts to Go starting to pay dividends. The next major feature that we are closing in on is full NVIDIA driver support with Toolbx. As most of you know Toolbx is our developer container solution which makes it super simple to set up development containers with specific versions of Fedora or RHEL or many other distributions. Debarshi Ray has been working on implementing support for the official NVIDIA container device interface module which should enable us to provide full NVIDIA and CUDA support for Toolbx containers. This should provide reliable NVIDIA driver support going forward and Debarshi is currently testing various AI related container images to ensure they run smoothly on the new setup.

We are also hoping the packaging fixes to subscription manager will land soon as that will make using RHEL containers on Fedora a lot smoother. While this feature basically already works as outlined here we do hope to make it even more streamlined going forward.

Open Source NVIDIA support
Of course being Red Hat we haven’t forgotten about open source here, you probably heard about Nova our new Rust based upstream kernel driver for NVIDIA hardware which will provided optimized support for the hardware supported by NVIDIAs firmware (basically all newer ones) and accelerate Vulkan through the NVK module and provide OpenGL through Zink. That effort is still quite early days, but there is some really cool developments happening around Nova that I am not at liberty to share yet, but I hope to be able to talk about those soon.

High Dynamic Range (HDR)
Jonas Ådahl after completing the remote access work for GNOME under Wayland has moved his focus to help land the HDR support in mutter and GNOME Shell. He recently finished rebasing his HDR patches onto a wip merge request from
Georges Stavracas which ported gnome-shell to using paint nodes,

So the HDR enablement in mutter and GNOME shell is now a set of 3 patches.

With this the work is mostly done, what is left is avoiding over exposure of the cursor, and inhibiting direct scanout.

We also hope to help finalize the upstream Wayland specs soon so that everyone can implement this and know the protocols are stable and final.

DRM leasing – VR Headsets
VR Googles
The most common usecase for DRM leasing is VR headsets, but it is also a useful feature for things like video walls. José Expósito is working on finalizing a patch for it using the Wayland protocol adopted by KDE and others. We where somewhat hesitant to go down this route as we felt a portal would have been a better approach, especially as a lot of our experienced X.org developers are worried that Wayland is in the process of replicating one of the core issues with X through the unmanageable plethora of Wayland protocols that is being pushed. That said, the DRM leasing stuff was not a hill worth dying on here, getting this feature out to our users in a way they could quickly use was more critical, so DRM leasing will land soon through this merge request.

Explicit sync
Another effort that we have put a lot of effort into together with our colleagues at NVIDIA is landing support for what is called explicit sync into the Linux kernel and the graphics drivers.The linux graphics stack was up to this point using something called implicit sync, but the NVIDIA drivers did not work well with that and thus people where experiencing ‘blinking’ applications under Wayland. So we worked with NVIDIA and have landed the basic support in the kernel and in GNOME and thus once the 555 release of the NVIDIA driver is out we hope the ‘blinking’ issues are fully resolved for your display. There has been some online discussion about potential performance gains from this change too, across all graphics drivers, but the reality of this is somewhat uncertain or at least it is still unclear if there will be real world measurable gains from adding explicit sync. I heard knowledgeable people argue both sides with some saying there should be visible performance gains while others say the potential gains will be so specific that unless you write a test to benchmark it explicitly you will not be able to detect a difference. But what is beyond doubt is that this will make using the NVIDIA stack with Wayland a lot better a that is a worthwhile goal in itself. The one item we are still working on is integrating the PipeWire support for explicit sync into our stack, because without it you might have the same flickering issues with PipeWire streams on top of the NVIDIA driver that you have up to now seen on your screen. So for instance if you are using PipeWire for screen capture it might look fine on screen with the fixes already merged, but the captured video has flickering. Wim Taymans landed some initial support in PipeWire already so now Michel Dänzer is working on implementing the needed bits for PipeWire in mutter. At the same time Wim is working on ensuring we have a testing client available to verify the compositor support. Once everything has landed in mutter and we been able to verify that it works with the sample client we will need to add support to client applications interacting with PipeWire, like Firefox, Chrome, OBS Studio and GNOME-remote-desktop.

GStreamer Vulkan Operation API

From Herostratus’ legacy by Víctor Jáquez

Two weeks ago the GStreamer Spring Hackfest took place in Thessaloniki, Greece. I had a great time. I hacked a bit on VA, Vulkan and my toy, planet-rs, but mostly I ate delicious Greek food ☻. A big thanks to our hosts: Vivia, Jordan and Sebastian!

GStreamer Spring Hackfest
2024
First day of the GStreamer Spring Hackfest 2024 - https://floss.social/@gstreamer/112511912596084571

And now, writing this supposed small note, I recalled that I have in my to-do list an item to write a comment about GstVulkanOperation, an addition to GstVulkan API which helps with the synchronization of operations on frames, in order to enable Vulkan Video.

Originally, GstVulkan API didn’t provide almost any synchronization operation, beside fences, and that appeared to be enough for elements, since they do simple Vulkan operations. Nonetheless, as soon as we enabled VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT feature, which reports resource access conflicts due to missing or incorrect synchronization operations between action [*], a sea of hazard operation warnings drowned us [*].

Hazard operations are a sequence of read/write commands in a memory area, such as an image, that might be re-ordered, or racy even.

Why are those hazard operations reported by the Vulkan Validation Layer, if the programmer pushes the commands to execute in queue in order? Why is explicit synchronization required? Because, as the great blog post from Hans-Kristian Arntzen, Yet another blog explaining Vulkan synchronization, (make sure you read it!) states:

[…] all commands in a queue execute out of order. Reordering may happen across command buffers and even vkQueueSubmits

In order to explain how synchronization is done in Vulkan, allow me to yank a couple definitions stated by the specification:

Commands are instructions that are recorded in a device’s queue. There are four types of commands: action, state, synchronization and indirection. Synchronization commands impose ordering constraints on action commands, by introducing explicit execution and memory dependencies.

Operation is an arbitrary amount of commands recorded in a device’s queue.

Since the driver can reorder commands (perhaps for better performance, dunno), we need to send explicit synchronization commands to the device’s queue to enforce a specific sequence of action commands.

Nevertheless, Vulkan doesn’t offer fine-grained dependencies between individual operations. Instead, dependencies are expressed as a relation of two elements, where each element is composed by the intersection of scope and operation. A scope is a concept in the specification that, in practical terms, can be either pipeline stage (for execution dependencies), or both pipeline stage and memory access type (for memory dependencies).

First let’s review execution dependencies through pipeline stages:

Every command submitted to a device’s queue goes through a sequence of steps known as pipeline stages. This sequence of steps is one of the very few implicit ordering guarantees that Vulkan has. Draw calls, copy commands, compute dispatches, all go through certain sequential stages, which amount of stages to cover depends on the specific command and the current command buffer state.

In order to visualize an abstract execution dependency let’s imagine two compute operations and the first must happen before the second.

Operation 1
Sync command
Operation 2
  1. The programmer has to specify the Sync command in terms of two scopes (Scope 1 and Scope 2), in this execution dependency case, two pipeline stages.
  2. The driver generates an intersection between commands in Operation 1 and Scope 1 defined as Scoped operation 1. The intersection contains all the commands in Operation 1 that go through up to the pipeline stage defined in Scope 1. The same is done with Operation 2 and Scope 2 generating Scoped operation 2.
  3. Finally, we got an execution dependency that guarantees that Scoped operation 1 happens before Scoped operation 2.

Now let’s talk about memory dependencies:

First we need to understand the concepts of memory availability and visibility. Their formal definition in Vulkan are a bit hard to grasp since they come from the Vulkan memory model, which is intended to abstract all the ways of how hardware access memory. Perhaps we could say that availability is the operation that assures the existence of the required memory; while visibility is the operation that assures it’s possible to read/write the data in that memory area.

Memory dependencies are limited the Operation 1 that be done before memory availability and Operation 2 that have to be done after its visibility.

But again, there’s no fine-grained way to declare that memory dependency. Instead, there are memory access types, which are functions used by descriptor types, or functions for pipeline stage to access memory, and they are used as access scopes.

All in all, if a synchronization command defining a memory dependency between two operations, it’s composed by the intersection of between each command and a pipeline stage, intersected with the memory access type associated with the memory processed by those commands.

Now that the concepts are more or less explained we could see those concepts expressed in code. The synchronization command for execution and memory dependencies is defined by VkDependencyInfoKHR. And it contains a set of barrier arrays, for memory, buffers and images. Barriers express the relation of dependency between two operations. For example, Image barriers use VkImageMemoryBarrier2 which contain the mask for source pipeline stage (to define Scoped operation 1), and the mask for the destination pipeline stage (to define Scoped operation 2); the mask for source memory access type and the mask for the destination memory access to define access scopes; and also layout transformation declaration.

A Vulkan synchronization example from Vulkan Documentation wiki:

vkCmdDraw(...);

... // First render pass teardown etc.

VkImageMemoryBarrier2KHR imageMemoryBarrier = {
...
.srcStageMask = VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT_KHR,
.dstStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT_KHR,
.dstAccessMask = VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT_KHR,
.oldLayout = VK_IMAGE_LAYOUT_READ_ONLY_OPTIMAL,
.newLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL
/* .image and .subresourceRange should identify image subresource accessed */};

VkDependencyInfoKHR dependencyInfo = {
...
1, // imageMemoryBarrierCount
&imageMemoryBarrier, // pImageMemoryBarriers
...
}

vkCmdPipelineBarrier2KHR(commandBuffer, &dependencyInfo);

... // Second render pass setup etc.

vkCmdDraw(...);

First draw samples a texture in the fragment shader. Second draw writes to that texture as a color attachment.

This is a Write-After-Read (WAR) hazard, which you would usually only need an execution dependency for - meaning you wouldn’t need to supply any memory barriers. In this case you still need a memory barrier to do a layout transition though, but you don’t need any access types in the src access mask. The layout transition itself is considered a write operation though, so you do need the destination access mask to be correct - or there would be a Write-After-Write (WAW) hazard between the layout transition and the color attachment write.

Other explicit synchronization mechanisms, along with barriers, are semaphores and fences. Semaphores are a synchronization primitive that can be used to insert a dependency between operations without notifying the host; while fences are a synchronization primitive that can be used to insert a dependency from a queue to the host. Semaphores and fences are expressed in the VkSubmitInfo2KHR structure.

As a preliminary conclusion, synchronization in Vulkan is hard and a helper API would be very helpful. Inspired by FFmpeg work done by Lynne, I added GstVulkanOperation object helper to GStreamer Vulkan API.

GstVulkanOperation object helper aims to represent an operation in the sense of the Vulkan specification mentioned before. It owns a command buffer as public member where external commands can be pushed to the associated device’s queue.

It has a set of methods:

Internally, GstVulkanOperation contains two arrays:

  1. The array of dependency frames, which are the set of frames, each representing an operation, which will hold dependency relationships with other dependency frames.

    gst_vulkan_operation_add_dependency_frame appends frames to this array.

    When calling gst_vulkan_operation_end the frame’s barrier state for each frame in the array is updated.

    Also, each dependency frame creates a timeline semaphore, which will be signaled when a command, associated with the frame, is executed in the device’s queue.

  2. The array of barriers, which contains a list of synchronization commands. gst_vulkan_operation_add_frame_barrier fills and appends a VkImageMemoryBarrier2KHR associated with a frame, which can be in the array of dependency frames.

Here’s a generic view of video decoding example:

gst_vulkan_operation_begin (priv->exec, ...);

cmd_buf = priv->exec->cmd_buf->cmd;

gst_vulkan_operation_add_dependency_frame (exec, out,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);

/* assume a engine where out frames can be used for DPB frames, */
/* so a barrier for layout transition is required */
gst_vulkan_operation_add_frame_barrier (exec, out,
VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT,
VK_ACCESS_2_VIDEO_DECODE_WRITE_BIT_KHR,
VK_IMAGE_LAYOUT_VIDEO_DECODE_DPB_KHR, NULL);

for (i = 0; i < dpb_size; i++) {
gst_vulkan_operation_add_dependency_frame (exec, dpb_frame,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR,
VK_PIPELINE_STAGE_2_VIDEO_DECODE_BIT_KHR);
}

barriers = gst_vulkan_operation_retrieve_image_barriers (exec);
vkCmdPipelineBarrier2 (cmd_buf, &(VkDependencyInfo) {
...
.pImageMemoryBarriers = barriers->data,
.imageMemoryBarrierCount = barriers->len,
});
g_array_unref (barriers);

vkCmdBeginVideoCodingKHR (cmd_buf, &decode_start);
vkCmdDecodeVideoKHR (cmd_buf, &decode_info);
vkCmdEndVideoCodingKHR (cmd_buf, &decode_end);

gst_vulkan_operation_end (exec, ...);

Here, just one memory barrier is required for memory layout transition, but semaphores are required to signal when an output frame and its DPB frames are processed, and later, the output frame can be used as a DPB frame. Otherwise, the output frame might not be fully reconstructed with it’s used as DPB for the next output frame, generating only noise.

And that’s all. Thank you.

GStreamer 1.24.4 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • audioconvert: support more than 64 audio channels
  • avvidec: fix dropped frames when doing multi-threaded decoding of I-frame codecs such as DV Video
  • mpegtsmux: Correctly time out in live pipelines, esp. for sparse streams like KLV and DVB subtitles
  • vtdec deadlock fixes on shutdown and format/resolution changes (as might happen with e.g. HLS/DASH)
  • fmp4mux, isomp4mux: Add support for adding AV1 header OBUs into the MP4 headers, and add language from tags
  • gtk4paintablesink improvements: fullscreen mode and gst-play-1.0 support
  • webrtcsink: add support for insecure TLS and improve error handling and VP9 handling
  • vah264enc, vah265enc: timestamp handling fixes; generate IDR frames on force-keyunit-requests, not I frames
  • v4l2codecs: decoder: Reorder caps to prefer `DMA_DRM` ones, fixes issues with playbin3
  • Visualizer plugins fixes
  • Avoid using private APIs on iOS
  • various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.4 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

cps in hoot

From wingolog by Andy Wingo

(Andy Wingo)

Good morning good morning! Today I have another article on the Hoot Scheme-to-Wasm compiler, this time on Hoot’s use of the continuation-passing-style (CPS) transformation.

calls calls calls

So, just a bit of context to start out: Hoot is a Guile, Guile is a Scheme, Scheme is a Lisp, one with “proper tail calls”: function calls are either in tail position, syntactically, in which case they are tail calls, or they are not in tail position, in which they are non-tail calls. A non-tail call suspends the calling function, putting the rest of it (the continuation) on some sort of stack, and will resume when the callee returns. Because non-tail calls push their continuation on a stack, we can call them push calls.

(define (f)
  ;; A push call to g, binding its first return value.
  (define x (g))
  ;; A tail call to h.
  (h x))

Usually the problem in implementing Scheme on other language run-times comes in tail calls, but WebAssembly supports them natively (except on JSC / Safari; should be coming at some point though). Hoot’s problem is the reverse: how to implement push calls?

The issue might seem trivial but it is not. Let me illustrate briefly by describing what Guile does natively (not compiled to WebAssembly). Firstly, note that I am discussing residual push calls, by which I mean to say that the optimizer might remove a push call in the source program via inlining: we are looking at those push calls that survive the optimizer. Secondly, note that native Guile manages its own stack instead of using the stack given to it by the OS; this allows for push-call recursion without arbitrary limits. It also lets Guile capture stack slices and rewind them, which is the fundamental building block we use to implement exception handling, Fibers and other forms of lightweight concurrency.

The straightforward function call will have an artificially limited total recursion depth in most WebAssembly implementations, meaning that many idiomatic uses of Guile will throw exceptions. Unpleasant, but perhaps we could stomach this tradeoff. The greater challenge is how to slice the stack. That I am aware of, there are three possible implementation strategies.

generic slicing

One possibility is that the platform provides a generic, powerful stack-capture primitive, which is what Guile does. The good news is that one day, the WebAssembly stack-switching proposal should provide this too. And in the meantime, the so-called JS Promise Integration (JSPI) proposal gets close: if you enter Wasm from JS via a function marked as async, and you call out to JavaScript to a function marked as async (i.e. returning a promise), then on that nested Wasm-to-JS call, the engine will suspend the continuation and resume it only when the returned promise settles (i.e. completes with a value or an exception). Each entry from JS to Wasm via an async function allocates a fresh stack, so I understand you can have multiple pending promises, and thus multiple wasm coroutines in progress. It gets a little gnarly if you want to control when you wait, for example if you might want to wait on multiple promises; in that case you might not actually mark promise-returning functions as async, and instead import an async-marked async function waitFor(p) { return await p} or so, allowing you to use Promise.race and friends. The main problem though is that JSPI is only for JavaScript. Also, its stack sizes are even smaller than the the default stack size.

instrumented slicing

So much for generic solutions. There is another option, to still use push calls from the target machine (WebAssembly), but to transform each function to allow it to suspend and resume. This is what I think of as Joe Marshall’s stack trick (also see §4.2 of the associated paper). The idea is that although there is no primitive to read the whole stack, each frame can access its own state. If you insert a try/catch around each push call, the catch handler can access local state for activations of that function. You can slice a stack by throwing a SaveContinuation exception, in which each frame’s catch handler saves its state and re-throws. And if we want to avoid exceptions, we can use checked returns as Asyncify does.

I never understood, though, how you resume a frame. The Generalized Stack Inspection paper would seem to indicate that you need the transformation to introduce a function to run “the rest of the frame” at each push call, which becomes the Invoke virtual method on the reified frame object. To avoid code duplication you would have to make normal execution flow run these Invoke snippets as well, and that might undo much of the advantages. I understand the implementation that Joe Marshall was working on was an interpreter, though, which bounds the number of sites needing such a transformation.

cps transformation

The third option is a continuation-passing-style transformation. A CPS transform results in a program whose procedures “return” by tail-calling their “continuations”, which themselves are procedures. Taking our previous example, a naïve CPS transformation would reify the following program:

(define (f' k)
  (g' (lambda (x) (h' k x))))

Here f' (“f-prime”) receives its continuation as an argument. We call g', for whose continuation argument we pass a closure. That closure is the return continuation of g, binding a name to its result, and then tail-calls h with respect to f. We know their continuations are the same because it is the same binding, k.

Unfortunately we can’t really slice abitrary ranges of a stack with the naïve CPS transformation: we can only capture the entire continuation, and can’t really inspect its structure. There is also no way to compose a captured continuation with the current continuation. And, in a naïve transformation, we would be constantly creating lots of heap allocation for these continuation closures; a push call effectively pushes a frame onto the heap as a closure, as we did above for g'.

There is also the question of when to perform the CPS transform; most optimizing compilers would like a large first-order graph to work on, which is out of step with the way CPS transformation breaks functions into many parts. Still, there is a nugget of wisdom here. What if we preserve the conventional compiler IR for most of the pipeline, and only perform the CPS transformation at the end? In that way we can have nice SSA-style optimizations. And, for return continuations of push calls, what if instead of allocating a closure, we save the continuation data on an explicit stack. As Andrew Kennedy notes, closures introduced by the CPS transform follow a stack discipline, so this seems promising; we would have:

(define (f'' k)
  (push! k)
  (push! h'')
  (g'' (lambda (x)
         (define h'' (pop!))
         (define k (pop!))
         (h'' k x))))

The explicit stack allows for generic slicing, which makes it a win for implementing delimited continuations.

hoot and cps

Hoot takes the CPS transformation approach with stack-allocated return closures. In fact, Hoot goes a little farther, too far probably:

(define (f''')
  (push! k)
  (push! h''')
  (push! (lambda (x)
           (define h'' (pop!))
           (define k (pop!))
           (h'' k x)))
  (g'''))

Here instead of passing the continuation as an argument, we pass it on the stack of saved values. Returning pops off from that stack; for example, (lambda () 42) would transform as (lambda () ((pop!) 42)). But some day I should go back and fix it to pass the continuation as an argument, to avoid excess stack traffic for leaf function calls.

There are some gnarly details though, which I know you are here for!

splits

For our function f, we had to break it into two pieces: the part before the push-call to g and the part after. If we had two successive push-calls, we would instead split into three parts. In general, each push-call introduces a split; let us use the term tails for the components produced by a split. (You could also call them continuations.) How many tails will a function have? Well, one for the entry, one for each push call, and one any time control-flow merges between two tails. This is a fixpoint problem, given that the input IR is a graph. (There is also some special logic for call-with-prompt but that is too much detail for even this post.)

where to save the variables

Guile is a dynamically-typed language, having a uniform SCM representation for every value. However in the compiler and run-time we can often unbox some values, generally as u64/s64/f64 values, but also raw pointers of some specific types, some GC-managed and some not. In native Guile, we can just splat all of these data members into 64-bit stack slots and rely on the compiler to emit stack maps to determine whether a given slot is a double or a tagged heap object reference or what. In WebAssembly though there is no sum type, and no place we can put either a u64 or a (ref eq) value. So we have not one stack but three (!) stacks: one for numeric values, implemented using a Wasm memory; one for (ref eq) values, using a table; and one for return continuations, because the func type hierarchy is disjoin from eq. It’s.... it’s gross? It’s gross.

what variables to save

Before a push-call, you save any local variables that will be live after the call. This is also a flow analysis problem. You can leave off constants, and instead reify them anew in the tail continuation.

I realized, though, that we have some pessimality related to stacked continuations. Consider:

(define (q x)
  (define y (f))
  (define z (f))
  (+ x y z))

Hoot’s CPS transform produces something like:

(define (q0 x)
  (save! x)
  (save! q1)
  (f))

(define (q1 y)
  (restore! x)
  (save! x)
  (save! y)
  (save! q2)
  (f))

(define (q2 z)
  (restore! x)
  (restore! y)
  ((pop!) (+ x y z)))

So q0 saved x, fine, indeed we need it later. But q1 didn’t need to restore x uselessly, only to save it again on q2‘s behalf. Really we should be applying a stack discipline for saved data within a function. Given that the source IR is a graph, this means another flow analysis problem, one that I haven’t thought about how to solve yet. I am not even sure if there is a solution in the literature, given that the SSA-like flow graphs plus tail calls / CPS is a somewhat niche combination.

calling conventions

The continuations introduced by CPS transformation have associated calling conventions: return continuations may have the generic varargs type, or the compiler may have concluded they have a fixed arity that doesn’t need checking. In any case, for a return, you call the return continuation with the returned values, and the return point then restores any live-in variables that were previously saved. But for a merge between tails, you can arrange to take the live-in variables directly as parameters; it is a direct call to a known continuation, rather than an indirect call to an unknown call site.

cps soup?

Guile’s intermediate representation is called CPS soup, and you might wonder what relationship that CPS has to this CPS. The answer is not much. The continuations in CPS soup are first-order; a term in one function cannot continue to a continuation in another function. (Inlining and contification can merge graphs from different functions, but the principle is the same.)

It might help to explain that it is the same relationship as it would be if Guile represented programs using SSA: the Hoot CPS transform runs at the back-end of Guile’s compilation pipeline, where closures representations have already been made explicit. The IR is still direct-style, just that syntactically speaking, every call in a transformed program is a tail call. We had to introduce save and restore primitives to implement the saved variable stack, and some other tweaks, but generally speaking, the Hoot CPS transform ensures the run-time all-tail-calls property rather than altering the compile-time language; a transformed program is still CPS soup.

fin

Did we actually make the right call in going for a CPS transformation?

I don’t have good performance numbers at the moment, but from what I can see, the overhead introduced by CPS transformation can impose some penalties, even 10x penalties in some cases. But some results are quite good, improving over native Guile, so I can’t be categorical.

But really the question is, is the performance acceptable for the functionality, and there I think the answer is more clear: we have a port of Fibers that I am sure Spritely colleagues will be writing more about soon, we have good integration with JavaScript promises while not relying on JSPI or Asyncify or anything else, and we haven’t had to compromise in significant ways regarding the source language. So, for now, I am satisfied, and looking forward to experimenting with the stack slicing proposal as it becomes available.

Until next time, happy hooting!

hoot's wasm toolkit

From wingolog by Andy Wingo

(Andy Wingo)

Good morning! Today we continue our dive into the Hoot Scheme-to-WebAssembly compiler. Instead of talking about Scheme, let’s focus on WebAssembly, specifically the set of tools that we have built in Hoot to wrangle Wasm. I am peddling a thesis: if you compile to Wasm, probably you should write a low-level Wasm toolchain as well.

(Incidentally, some of this material was taken from a presentation I gave to the Wasm standardization organization back in October, which I think I haven’t shared yet in this space, so if you want some more context, have at it.)

naming things

Compilers are all about names: definitions of globals, types, local variables, and so on. An intermediate representation in a compiler is a graph of definitions and uses in which the edges are names, and the set of possible names is generally unbounded; compilers make more names when they see fit, for example when copying a subgraph via inlining, and remove names if they determine that a control or data-flow edge is not necessary. Having an unlimited set of names facilitates the graph transformation work that is the essence of a compiler.

Machines, though, generally deal with addresses, not names; one of the jobs of the compiler back-end is to tabulate the various names in a compilation unit, assigning them to addresses, for example when laying out an ELF binary. Some uses may refer to names from outside the current compilation unit, as when you use a function from the C library. The linker intervenes at the back-end to splice in definitions for dangling uses and applies the final assignment of names to addresses.

When targetting Wasm, consider what kinds of graph transformations you would like to make. You would probably like for the compiler to emit calls to functions from a low-level run-time library written in wasm. Those functions are probably going to pull in some additional definitions, such as globals, types, exception tags, and so on. Then once you have your full graph, you might want to lower it, somehow: for example, you choose to use the stringref string representation, but browsers don’t currently support it; you run a post-pass to lower to UTF-8 arrays, but then all your strings are not constant, meaning they can’t be used as global initializers; so you run another post-pass to initialize globals in order from the start function. You might want to make other global optimizations as well, for example to turn references to named locals into unnamed stack operands (not yet working :).

Anyway what I am getting at is that you need a representation for Wasm in your compiler, and that representation needs to be fairly complete. At the very minimum, you need a facility to transform that in-memory representation to the standard WebAssembly text format, which allows you to use a third-party assembler and linker such as Binaryen’s wasm-opt. But since you have to have the in-memory representation for your own back-end purposes, probably you also implement the names-to-addresses mapping that will allow you to output binary WebAssembly also. Also it could be that Binaryen doesn’t support something you want to do; for example Hoot uses block parameters, which are supported fine in browsers but not in Binaryen.

(I exaggerate a little; Binaryen is a more reasonable choice now than it was before the GC proposal was stabilised. But it has been useful to be able to control Hoot’s output, for example as the exception-handling proposal has evolved.)

one thing leads to another

Once you have a textual and binary writer, and an in-memory representation, perhaps you want to be able to read binaries as well; and perhaps you want to be able to read text. Reading the text format is a little annoying, but I had implemented it already in JavaScript a few years ago; and porting it to Scheme was a no-brainer, allowing me to easily author the run-time Wasm library as text.

And so now you have the beginnings of a full toolchain, built just out of necessity: reading, writing, in-memory construction and transformation. But how are you going to test the output? Are you going to require a browser? That’s gross. Node? Sure, we have to check against production Wasm engines, and that’s probably the easiest path to take; still, would be nice if this were optional. Wasmtime? But that doesn’t do GC.

No, of course not, you are a dirty little compilers developer, you are just going to implement a little wasm interpreter, aren’t you. Of course you are. That way you can build nice debugging tools to help you understand when things go wrong. Hoot’s interpreter doesn’t pretend to be high-performance—it is not—but it is simple and it just works. Massive kudos to Spritely hacker David Thompson for implementing this. I think implementing a Wasm VM also had the pleasant side effect that David is now a Wasm expert; implementation is the best way to learn.

Finally, one more benefit of having a Wasm toolchain as part of the compiler: %inline-wasm. In my example from last time, I had this snippet that makes a new bytevector:

(%inline-wasm
 '(func (param $len i32) (param $init i32)
    (result (ref eq))
    (struct.new
     $mutable-bytevector
     (i32.const 0)
     (array.new $raw-bytevector
                (local.get $init)
                (local.get $len))))
 len init)

%inline-wasm takes a literal as its first argument, which should parse as a Wasm function. Parsing guarantees that the wasm is syntactically valid, and allows the arity of the wasm to become apparent: we just read off the function’s type. Knowing the number of parameters and results is one thing, but we can do better, in that we also know their type, which we use for intentional types, requiring in this case that the parameters be exact integers which get wrapped to the signed i32 range. The resulting term is spliced into the CPS graph, can be analyzed for its side effects, and ultimately when written to the binary we replace each local reference in the Wasm with a reference of the appropriate local variable. All this is possible because we have the tools to work on Wasm itself.

fin

Hoot’s Wasm toolchain is about 10K lines of code, and is fairly complete. I think it pays off for Hoot. If you are building a compiler targetting Wasm, consider budgetting for a 10K SLOC Wasm toolchain; you won’t regret it.

Next time, an article on Hoot’s use of CPS. Until then, happy hacking!

GStreamer Hackfest 2024

From Herostratus’ legacy by Víctor Jáquez

Last weeks were a bit hectic. First, with a couple friends we biked the southwest of the Netherlands for almost a week. The next week, the last one, I attended the 2024 Display Next Hackfest

This week was Igalia’s Assembly meetings, and next week, along with other colleagues, I’ll be in Thessaloniki for the GStreamer Spring Hackfest

I’m happy to meet again friends from the GStreamer community and talk and move things forward related with Vulkan, VA-API, KMS, video codecs, etc.

GStreamer Conference 2024 announced to take place 7-10 October 2024 in Montréal, Canada

From GStreamer News by GStreamer

(GStreamer)

The GStreamer project is thrilled to announce that this year's GStreamer Conference will take place on Monday-Tuesday 7-8 October 2024 in Montréal, Québec, Canada, followed by a hackfest.

You can find more details about the conference on the GStreamer Conference 2024 web site.

A call for papers will be sent out in due course.

Registration will open at a later time, in late June / early July.

We will announce those and any further updates on the gstreamer-announce mailing list, the website, on Twitter. and on Mastodon.

Talk slots will be available in varying durations from 20 minutes up to 45 minutes. Whatever you're doing or planning to do with GStreamer, we'd like to hear from you!

We also plan to have sessions with short lightning talks / demos / showcase talks for those who just want to show what they've been working on or do a mini-talk instead of a full-length talk. Lightning talk slots will be allocated on a first-come-first-serve basis, so make sure to reserve your slot if you plan on giving a lightning talk.

A GStreamer hackfest will take place right after the conference, on 9-10 October 2024.

We hope to see you in Montréal!

Please spread the word!

Dissecting GstSegments

From GStreamer – Happy coding by Enrique Ocaña González

During all these years using GStreamer, I’ve been having to deal with GstSegments in many situations. I’ve always have had an intuitive understanding of the meaning of each field, but never had the time to properly write a good reference explanation for myself, ready to be checked at those times when the task at hand stops being so intuitive and nuisances start being important. I used the notes I took during an interesting conversation with Alba and Alicia about those nuisances, during the GStreamer Hackfest in A Coruña, as the seed that evolved into this post.

But what are actually GstSegments? They are the structures that track the values needed to synchronize the playback of a region of interest in a media file.

GstSegments are used to coordinate the translation between Presentation Timestamps (PTS), supplied by the media, and Runtime.

PTS is the timestamp that specifies, in buffer time, when the frame must be displayed on screen. This buffer time concept (called buffer running-time in the docs) refers to the ideal time flow where rate isn’t being had into account.

Decode Timestamp (DTS) is the timestamp that specifies, in buffer time, when the frame must be supplied to the decoder. On decoders supporting P-frames (forward-predicted) and B-frames (bi-directionally predicted), the PTS of the frames reaching the decoder may not be monotonic, but the PTS of the frames reaching the sinks are (the decoder outputs monotonic PTSs).

Runtime (called clock running time in the docs) is the amount of physical time that the pipeline has been playing back. More specifically, the Runtime of a specific frame indicates the physical time that has passed or must pass until that frame is displayed on screen. It starts from zero.

Base time is the point when the Runtime starts with respect to the input timestamp in buffer time (PTS or DTS). It’s the Runtime of the PTS=0.

Start, stop, duration: Those fields are buffer timestamps that specify when the piece of media that is going to be played starts, stops and how long that portion of the media is (the absolute difference between start and stop, and I mean absolute because a segment being played backwards may have a higher start buffer timestamp than what its stop buffer timestamp is).

Position is like the Runtime, but in buffer time. This means that in a video being played back at 2x, Runtime would flow at 1x (it’s physical time after all, and reality goes at 1x pace) and Position would flow at 2x (the video moves twice as fast than physical time).

The Stream Time is the position in the stream. Not exactly the same concept as buffer time. When handling multiple streams, some of them can be offset with respect to each other, not starting to be played from the begining, or even can have loops (eg: repeating the same sound clip from PTS=100 until PTS=200 intefinitely). In this case of repeating, the Stream time would flow from PTS=100 to PTS=200 and then go back again to the start position of the sound clip (PTS=100). There’s a nice graphic in the docs illustrating this, so I won’t repeat it here.

Time is the base of Stream Time. It’s the Stream time of the PTS of the first frame being played. In our previous example of the repeating sound clip, it would be 100.

There are also concepts such as Rate and Applied Rate, but we didn’t get into them during the discussion that motivated this post.

So, for translating between Buffer Time (PTS, DTS) and Runtime, we would apply this formula:

Runtime = BufferTime * ( Rate * AppliedRate ) + BaseTime

And for translating between Buffer Time (PTS, DTS) and Stream Time, we would apply this other formula:

StreamTime = BufferTime * AppliedRate + Time

And that’s it. I hope these notes in the shape of a post serve me as reference in the future. Again, thanks to Alicia, and especially to Alba, for the valuable clarifications during the discussion we had that day in the Igalia office. This post wouldn’t have been possible without them.

GStreamer 1.24.3 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • EXIF image tag parsing security fixes
  • Subtitle handling improvements in parsebin
  • Fix issues with HLS streams that contain VTT subtitles
  • Qt6 QML sink re-render and re-sizing fixes
  • unixfd ipc plugin timestamp and segment handling fixes
  • vah264enc, vah265enc: Do not touch the PTS of the output frame
  • vah264dec and vapostproc fixes and improvements
  • v4l2: multiple fixes and improvements, incl. for mediatek JPEG decoder and v4l2 loopback
  • v4l2: fix hang after seek with some v4l2 decoders
  • Wayland sink fixes
  • ximagesink: fix regression on RPi/aarch64
  • fmp4mux, mp4mux gained FLAC audio support
  • D3D11, D3D12: reliablity improvements and memory leak fixes
  • Media Foundation device provider fixes
  • GTK4 paintable sink improvements including support for directly importing dmabufs with GTK 4.14
  • WebRTC sink/source fixes and improvements
  • AWS s3sink, s3src, s3hlssink now support path-style addressing
  • MPEG-TS demuxer fixes
  • Python bindings fixes
  • various bug fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.3 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GStreamer 1.22.12 old-stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • EXIF image tag parsing security fixes
  • glimagesink, gl/macos: race and reference count fixes
  • GstPlay, dvbsubenc, alphadecodebin, d3dvideosink fixes
  • rtpjitterbuffer extended timestamp handling fixes
  • v4l2: fix regression with tiled formats
  • ximagesink: fix regression on RPi/aarch64
  • Thread-safety fixes
  • Python bindings fixes
  • cerbero build fixes with clang 15 on latest macOS/iOS
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.12 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

PS: GStreamer 1.22 has now been superseded by GStreamer 1.24.

Update from the GNOME board

From Robotic Tendencies by Robert McQueen

(Robert McQueen)

It’s been around 6 months since the GNOME Foundation was joined by our new Executive Director, Holly Million, and the board and I wanted to update members on the Foundation’s current status and some exciting upcoming changes.

Finances

As you may be aware, the GNOME Foundation has operated at a deficit (nonprofit speak for a loss – ie spending more than we’ve been raising each year) for over three years, essentially running the Foundation on reserves from some substantial donations received 4-5 years ago. The Foundation has a reserves policy which specifies a minimum amount of money we have to keep in our accounts. This is so that if there is a significant interruption to our usual income, we can preserve our core operations while we work on new funding sources. We’ve now “hit the buffers” of this reserves policy, meaning the Board can’t approve any more deficit budgets – to keep spending at the same level we must increase our income.

One of the board’s top priorities in hiring Holly was therefore her experience in communications and fundraising, and building broader and more diverse support for our mission and work. Her goals since joining – as well as building her familiarity with the community and project – have been to set up better financial controls and reporting, develop a strategic plan, and start fundraising. You may have noticed the Foundation being more cautious with spending this year, because Holly prepared a break-even budget for the Board to approve in October, so that we can steady the ship while we prepare and launch our new fundraising initiatives.

Strategy & Fundraising

The biggest prerequisite for fundraising is a clear strategy – we need to explain what we’re doing and why it’s important, and use that to convince people to support our plans. I’m very pleased to report that Holly has been working hard on this and meeting with many stakeholders across the community, and has prepared a detailed and insightful five year strategic plan. The plan defines the areas where the Foundation will prioritise, develop and fund initiatives to support and grow the GNOME project and community. The board has approved a draft version of this plan, and over the coming weeks Holly and the Foundation team will be sharing this plan and running a consultation process to gather feedback input from GNOME foundation and community members.

In parallel, Holly has been working on a fundraising plan to stabilise the Foundation, growing our revenue and ability to deliver on these plans. We will be launching a variety of fundraising activities over the coming months, including a development fund for people to directly support GNOME development, working with professional grant writers and managers to apply for government and private foundation funding opportunities, and building better communications to explain the importance of our work to corporate and individual donors.

Board Development

Another observation that Holly had since joining was that we had, by general nonprofit standards, a very small board of just 7 directors. While we do have some committees which have (very much appreciated!) volunteers from outside the board, our officers are usually appointed from within the board, and many board members end up serving on multiple committees and wearing several hats. It also means the number of perspectives on the board is limited and less representative of the diverse contributors and users that make up the GNOME community.

Holly has been working with the board and the governance committee to reduce how much we ask from individual board members, and improve representation from the community within the Foundation’s governance. Firstly, the board has decided to increase its size from 7 to 9 members, effective from the upcoming elections this May & June, allowing more voices to be heard within the board discussions. After that, we’re going to be working on opening up the board to more participants, creating non-voting officer seats to represent certain regions or interests from across the community, and take part in committees and board meetings. These new non-voting roles are likely to be appointed with some kind of application process, and we’ll share details about these roles and how to be considered for them as we refine our plans over the coming year.

Elections

We’re really excited to develop and share these plans and increase the ways that people can get involved in shaping the Foundation’s strategy and how we raise and spend money to support and grow the GNOME community. This brings me to my final point, which is that we’re in the run up to the annual board elections which take place in the run up to GUADEC. Because of the expansion of the board, and four directors coming to the end of their terms, we’ll be electing 6 seats this election. It’s really important to Holly and the board that we use this opportunity to bring some new voices to the table, leading by example in growing and better representing our community.

Allan wrote in the past about what the board does and what’s expected from directors. As you can see we’re working hard on reducing what we ask from each individual board member by increasing the number of directors, and bringing additional members in to committees and non-voting roles. If you’re interested in seeing more diverse backgrounds and perspectives represented on the board, I would strongly encourage you consider standing for election and reach out to a board member to discuss their experience.

Thanks for reading! Until next time.

Best Wishes,
Rob
President, GNOME Foundation

Update 2024-04-27: It was suggested in the Discourse thread that I clarify the interaction between the break-even budget and the 1M EUR committed by the STF project. This money is received in the form of a contract for services rather than a grant to the Foundation, and must be spent on the development areas agreed during the planning and application process. It’s included within this year’s budget (October 23 – September 24) and is all expected to be spent during this fiscal year, so it doesn’t have an impact on the Foundation’s reserves position. The Foundation retains a small % fee to support its costs in connection with the project, including the new requirement to have our accounts externally audited at the end of the financial year. We are putting this money towards recruitment of an administrative assistant to improve financial and other operational support for the Foundation and community, including the STF project and future development initiatives.

(also posted to GNOME Discourse, please head there if you have any questions or comments)

From WebKit/GStreamer to rust-av, a journey on our stack’s layers

From Base-Art - Philippe Normand by Phil Normand

(Phil Normand)

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on WebKit side would trigger work …

GStreamer 1.24.2 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.x.

Highlighted bugfixes:

  • H.264 parsing regression fixes
  • WavPack typefinding improvements
  • Video4linux fixes and improvements
  • Android build and runtime fixes
  • macOS OpenGL memory leak and robustness fixes
  • Qt/QML video sink fixes
  • Windows MSVC binary packages: fix libvpx avx/avx2/avx512 instruction set detection
  • Package new analytics and mse libraries in binary packages
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.2 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

xz backdoor and autotools insanity

From Planet – Felipe Contreras by Felipe Contreras

(Felipe Contreras)

I argue autotools' convoluted nature is what enabled the xz backdoor in the first place. The truth is nobody needs to use autotools, and I show why.

Fedora Workstation 40 – what are we working on

From Christian F.K. Schaller by Christian Schaller

(Christian Schaller)
So Fedora Workstation 40 Beta has just come out so I thought I share a bit about some of the things we are working on for Fedora Workstation currently and also major changes coming in from the community.

Flatpak

Flatpaks has been a key part of our strategy for desktop applications for a while now and we are working on a multitude of things to make Flatpaks an even stronger technology going forward. Christian Hergert is working on figuring out how applications that require system daemons will work with Flatpaks, using his own Sysprof project as the proof of concept application. The general idea here is to rely on the work that has happened in SystemD around sysext/confext/portablectl trying to figure out who we can get a system service installed from a Flatpak and the necessary bits wired up properly. The other part of this work, figuring out how to give applications permissions that today is handled with udev rules, that is being worked on by Hubert Figuière based on earlier work by Georges Stavracas on behalf of the GNOME Foundation thanks to the sponsorship from the Sovereign Tech Fund. So hopefully we will get both of these two important issues resolved soon. Kalev Lember is working on polishing up the Flatpak support in Foreman (and Satellite) to ensure there are good tools for managing Flatpaks when you have a fleet of systems you manage, building on the work of Stephan Bergman. Finally Jan Horak and Jan Grulich is working hard on polishing up the experience of using Firefox from a fully sandboxed Flatpak. This work is mainly about working with the upstream community to get some needed portals over the finish line and polish up some UI issues in Firefox, like this one.

Toolbx

Toolbx, our project for handling developer containers, is picking up pace with Debarshi Ray currently working on getting full NVIDIA binary driver support for the containers. One of our main goals for Toolbx atm is making it a great tool for AI development and thus getting the NVIDIA & CUDA support squared of is critical. Debarshi has also spent quite a lot of time cleaning up the Toolbx website, providing easier access to and updating the documentation there. We are also moving to use the new Ptyxis (formerly Prompt) terminal application created by Christian Hergert, in Fedora Workstation 40. This both gives us a great GTK4 terminal, but we also believe we will be able to further integrate Toolbx and Ptyxis going forward, creating an even better user experience.

Nova

So as you probably know, we have been the core maintainers of the Nouveau project for years, keeping this open source upstream NVIDIA GPU driver alive. We plan on keep doing that, but the opportunities offered by the availability of the new GSP firmware for NVIDIA hardware means we should now be able to offer a full featured and performant driver. But co-hosting both the old and the new way of doing things in the same upstream kernel driver has turned out to be counter productive, so we are now looking to split the driver in two. For older pre-GSP NVIDIA hardware we will keep the old Nouveau driver around as is. For GSP based hardware we are launching a new driver called Nova. It is important to note here that Nova is thus not a competitor to Nouveau, but a continuation of it. The idea is that the new driver will be primarily written in Rust, based on work already done in the community, we are also evaluating if some of the existing Nouveau code should be copied into the new driver since we already spent quite a bit of time trying to integrate GSP there. Worst case scenario, if we can’t reuse code, we use the lessons learned from Nouveau with GSP to implement the support in Nova more quickly. Contributing to this effort from our team at Red Hat is Danilo Krummrich, Dave Airlie, Lyude Paul, Abdiel Janulgue and Phillip Stanner.

Explicit Sync and VRR

Another exciting development that has been a priority for us is explicit sync, which is critical for especially the NVidia driver, but which might also provide performance improvements for other GPU architectures going forward. So a big thank you to Michel Dänzer , Olivier Fourdan, Carlos Garnacho; and Nvidia folks, Simon Ser and the rest of community for working on this. This work has just finshed upstream so we will look at backporting it into Fedora Workstaton 40. Another major Fedora Workstation 40 feature is experimental support for Variable Refresh Rate or VRR in GNOME Shell. The feature was mostly developed by community member Dor Askayo, but Jonas Ådahl, Michel Dänzer, Carlos Garnacho and Sebastian Wick have all contributed with code reviews and fixes. In Fedora Workstation 40 you need to enable it using the command

gsettings set org.gnome.mutter experimental-features "['variable-refresh-rate']"

PipeWire

Already covered PipeWire in my post a week ago, but to quickly summarize here too. Using PipeWire for video handling is now finally getting to the stage where it is actually happening, both Firefox and OBS Studio now comes with PipeWire support and hopefully we can also get Chromium and Chrome to start taking a serious look at merging the patches for this soon. Whats more Wim spent time fixing Firewire FFADO bugs, so hopefully for our pro-audio community users this makes their Firewire equipment fully usable and performant with PipeWire. Wim did point out when I spoke to him though that the FFADO drivers had obviously never had any other consumer than JACK, so when he tried to allow for more functionality the drivers quickly broke down, so Wim has limited the featureset of the PipeWire FFADO module to be an exact match of how these drivers where being used by JACK. If the upstream kernel maintainer is able to fix the issues found by Wim then we could look at providing a more full feature set. In Fedora Workstation 40 the de-duplication support for v4l vs libcamera devices should work as soon as we update Wireplumber to the new 0.5 release.

To hear more about PipeWire and the latest developments be sure to check out this interview with Wim Taymans by the good folks over at Destination Linux.

Remote Desktop

Another major feature landing in Fedora Workstation 40 that Jonas Ådahl and Ray Strode has spent a lot of effort on is finalizing the remote desktop support for GNOME on Wayland. So there has been support for remote connections for already logged in sessions already, but with these updates you can do the login remotely too and thus the session do not need to be started already on the remote machine. This work will also enable 3rd party solutions to do remote logins on Wayland systems, so while I am not at liberty to mention names, be on the lookout for more 3rd party Wayland remoting software becoming available this year.

This work is also important to help Anaconda with its Wayland transition as remote graphical install is an important feature there. So what you should see there is Anaconda using GNOME Kiosk mode and the GNOME remote support to handle this going forward and thus enabling Wayland native Anaconda.

HDR

Another feature we been working on for a long time is HDR, or High Dynamic Range. We wanted to do it properly and also needed to work with a wide range of partners in the industry to make this happen. So over the last year we been contributing to improve various standards around color handling and acceleration to prepare the ground, work on and contribute to key libraries needed to for instance gather the needed information from GPUs and screens. Things are coming together now and Jonas Ådahl and Sebastian Wick are now going to focus on getting Mutter HDR capable, once that work is done we are by no means finished, but it should put us close to at least be able to start running some simple usecases (like some fullscreen applications) while we work out the finer points to get great support for running SDR and HDR applications side by side for instance.

PyTorch

We want to make Fedora Workstation a great place to do AI development and testing. First step in that effort is packaging up PyTorch and making sure it can have working hardware acceleration out of the box. Tom Rix has been leading that effort on our end and you will see the first fruits of that labor in Fedora Workstation 40 where PyTorch should work with GPU acceleration on AMD hardware (ROCm) out of the box. We hope and expect to be able to provide the same for NVIDIA and Intel graphics eventually too, but this is definitely a step by step effort.

GStreamer 1.24.1 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the new stable 1.24 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.24.0.

Highlighted bugfixes:

  • Fix instant-EOS regression in audio sinks in some cases when volume is 0
  • rtspsrc: server compatibility improvements and ONVIF trick mode fixes
  • rtsp-server: fix issues if RTSP media was set to be both shared and reusable
  • (uri)decodebin3 and playbin3 fixes
  • adaptivdemux2/hlsdemux2: Fix issues with failure updating playlists
  • mpeg123 audio decoder fixes
  • v4l2codecs: DMA_DRM caps support for decoders
  • va: various AV1 / H.264 / H.265 video encoder fixes
  • vtdec: fix potential deadlock regression with ProRes playback
  • gst-libav: fixes for video decoder frame handling, interlaced mode detection
  • avenc_aac: support for 7.1 and 16 channel modes
  • glimagesink: Fix the sink not always respecting preferred size on macOS
  • gtk4paintablesink: Fix scaling of texture position
  • webrtc: Allow resolution and framerate changes, and many other improvements
  • webrtc: Add new LiveKit source element
  • Fix usability of binary packages on arm64 iOS
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.24.1 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GStreamer 1.22.11 old-stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the now old-stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • Fix instant-EOS regression in audio sinks in some cases when volume is 0
  • rtspsrc: server compatibility improvements and ONVIF trick mode fixes
  • libsoup linking improvements on non-Linux platforms
  • va: improvements for intel i965 driver
  • wasapi2: fix choppy audio and respect ringbuffer buffer/latency time
  • rtsp-server file descriptor leak fix
  • uridecodebin3 fixes
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.11 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

Asymptotic: A 2023 Review

From Arun Raghavan by Arun Raghavan

It’s been a busy few several months, but now that we have some breathing room, I wanted to take stock of what we have done over the last year or so.

This is a good thing for most people and companies to do of course, but being a scrappy, (questionably) young organisation, it’s doubly important for us to introspect. This allows us to both recognise our achievements and ensure that we are accomplishing what we have set out to do.

One thing that is clear to me is that we have been lagging in writing about some of the interesting things that we have had the opportunity to work on, so you can expect to see some more posts expanding on what you find below, as well as some of the newer work that we have begun.

(note: I write about our open source contributions below, but needless to say, none of it is possible without the collaboration, input, and reviews of members of the community)

WHIP/WHEP client and server for GStreamer

If you’re in the WebRTC world, you likely have not missed the excitement around standardisation of HTTP-based signalling protocols, culminating in the WHIP and WHEP specifications.

Tarun has been driving our client and server implementations for both these protocols, and in the process has been refactoring some of the webrtcsink and webrtcsrc code to make it easier to add more signaller implementations. You can find out more about this work in his talk at GstConf 2023 and we’ll be writing more about the ongoing effort here as well.

Low-latency embedded audio with PipeWire

Some of our work involves implementing a framework for very low-latency audio processing on an embedded device. PipeWire is a good fit for this sort of application, but we have had to implement a couple of features to make it work.

It turns out that doing timer-based scheduling can be more CPU intensive than ALSA period interrupts at low latencies, so we implemented an IRQ-based scheduling mode for PipeWire. This is now used by default when a pro-audio profile is selected for an ALSA device.

In addition to this, we also implemented rate adaptation for USB gadget devices using the USB Audio Class “feedback control” mechanism. This allows USB gadget devices to adapt their playback/capture rates to the graph’s rate without having to perform resampling on the device, saving valuable CPU and latency.

There is likely still some room to optimise things, so expect to more hear on this front soon.

Compress offload in PipeWire

Sanchayan has written about the work we did to add support in PipeWire for offloading compressed audio. This is something we explored in PulseAudio (there’s even an implementation out there), but it’s a testament to the PipeWire design that we were able to get this done without any protocol changes.

This should be useful in various embedded devices that have both the hardware and firmware to make use of this power-saving feature.

GStreamer LC3 encoder and decoder

Tarun wrote a GStreamer plugin implementing the LC3 codec using the liblc3 library. This is the primary codec for next-generation wireless audio devices implementing the Bluetooth LE Audio specification. The plugin is upstream and can be used to encode and decode LC3 data already, but will likely be more useful when the existing Bluetooth plugins to talk to Bluetooth devices get LE audio support.

QUIC plugins for GStreamer

Sanchayan implemented a QUIC source and sink plugin in Rust, allowing us to start experimenting with the next generation of network transports. For the curious, the plugins sit on top of the Quinn implementation of the QUIC protocol.

There is a merge request open that should land soon, and we’re already seeing folks using these plugins.

AWS S3 plugins

We’ve been fleshing out the AWS S3 plugins over the years, and we’ve added a new awss3putobjectsink. This provides a better way to push small or sparse data to S3 (subtitles, for example), without potentially losing data in case of a pipeline crash.

We’ll also be expecting this to look a little more like multifilesink, allowing us to arbitrary split up data and write to S3 directly as multiple objects.

Update to webrtc-audio-processing

We also updated the webrtc-audio-processing library, based on more recent upstream libwebrtc. This is one of those things that becomes surprisingly hard as you get into it — packaging an API-unstable library correctly, while supporting a plethora of operating system and architecture combinations.

Clients

We can’t always speak publicly of the work we are doing with our clients, but there have been a few interesting developments we can (and have spoken about).

Both Sanchayan and I spoke a bit about our work with WebRTC-as-a-service provider, Daily. My talk at the GStreamer Conference was a summary of the work I wrote about previously about what we learned while building Daily’s live streaming, recording, and other backend services. There were other clients we worked with during the year with similar experiences.

Sanchayan spoke about the interesting approach to building SIP support that we took for Daily. This was a pretty fun project, allowing us to build a modern server-side SIP client with GStreamer and SIP.js.

An ongoing project we are working on is building AES67 support using GStreamer for FreeSWITCH, which essentially allows bridging low-latency network audio equipment with existing SIP and related infrastructure.

As you might have noticed from previous sections, we are also working on a low-latency audio appliance using PipeWire.

Retrospective

All in all, we’ve had a reasonably productive 2023. There are things I know we can do better in our upstream efforts to help move merge requests and issues, and I hope to address this in 2024.

We have ideas for larger projects that we would like to take on. Some of these we might be able to find clients who would be willing to pay for. For the ideas that we think are useful but may not find any funding, we will continue to spend our spare time to push forward.

If you made this this far, thank you, and look out for more updates!

PipeWire camera handling is now happening!

From Christian F.K. Schaller by Christian Schaller

(Christian Schaller)

We hit a major milestones this week with the long worked on adoption of PipeWire Camera support finally starting to land!

Not long ago Firefox was released with experimental PipeWire camera support thanks to the great work by Jan Grulich.

Then this week OBS Studio shipped with PipeWire camera support thanks to the great work of Georges Stavracas, who cleaned up the patches and pushed to get them merged based on earlier work by himself, Wim Taymans and Colulmbarius. This means we now have two major applications out there that can use PipeWire for camera handling and thus two applications whose video streams that can be interacted with through patchbay applications like Helvum and qpwgraph.
These applications are important and central enough that having them use PipeWire are in itself useful, but they will now also provide two examples of how to do it for application developers looking at how to add PipeWire camera support to their own applications; there is no better documentation than working code.

The PipeWire support is also paired with camera portal support. The use of the portal also means we are getting closer to being able to fully sandbox media applications in Flatpaks which is an important goal in itself. Which reminds me, to test out the new PipeWire support be sure to grab the official OBS Studio Flatpak from Flathub.

PipeWire camera handling with OBS Studio, Firefox and Helvum.

PipeWire camera handling with OBS Studio, Firefox and Helvum.


Let me explain what is going on in the screenshot above as it is a lot. First of all you see Helvum there on the right showning all the connections made through PipeWire, both the audio and in yellow, the video. So you can see how my Logitech BRIO camera is feeding a camera video stream into both OBS Studio and Firefox. You also see my Magewell HDMI capture card feeding a video stream into OBS Studio and finally gnome-shell providing a screen capture feed that is being fed into OBS Studio. On the left you see on the top Firefox running their WebRTC test app capturing my video then just below that you see the OBS Studio image with the direct camera feed on the top left corner, the screencast of Firefox just below it and finally the ‘no signal’ image is from my HDMI capture card since I had no HDMI device connected to it as I was testing this.

For those wondering work is also underway to bring this into Chromium and Google Chrome browsers where Michael Olbrich from Pengutronix has been pushing to get patches written and merged, he did a talk about this work at FOSDEM last year as you can see from these slides with this patch being the last step to get this working there too.

The move to PipeWire also prepared us for the new generation of MIPI cameras being rolled out in new laptops and helps push work on supporting those cameras towards libcamera, the new library for dealing with the new generation of complex cameras. This of course ties well into the work that Hans de Goede and Kate Hsuan has been doing recently, along with Bryan O’Donoghue from Linaro, on providing an open source driver for MIPI cameras and of course the incredible work by Laurent Pinchart and Kieran Bingham from Ideas on board on libcamera itself.

The PipeWire support is of course fresh and I am sure we will find bugs and corner cases that needs fixing as more people test out the functionality in both Firefox and OBS Studio and there are some interface annoyances we are working to resolve. For instance since PipeWire support both V4L and libcamera as a backend you do atm get double entries in your selection dialogs for most of your cameras. Wireplumber has implemented de-deplucation code which will ensure only the libcamera listing will show for cameras supported by both v4l and libcamera, but is only part of the development version of Wireplumber and thus it will land in Fedora Workstation 40, so until that is out you will have to deal with the duplicate options.

Camera selection dialog

Camera selection dialog


We are also trying to figure out how to better deal with infraread cameras that are part of many modern webcams. Obviously you usually do not want to use an IR camera for your video calls, so we need to figure out the best way to identify them and ensure they are clearly marked and not used by default.

Another recent good PipeWire new tidbit that became available with the PipeWire 1.0.4 release PipeWire maintainer Wim Taymans also fixed up the FireWire FFADO support. The FFADO support had been in there for some time, but after seeing Venn Stone do some thorough tests and find issues we decided it was time to bite the bullet and buy some second hand Firewire hardware for Wim to be able to test and verify himself.

Focusrite firewire device

Focusrite firewire device

.
Once the Focusrite device I bought landed at Wims house he got to work and cleaned up the FFADO support and make it both work and be performant.
For those unaware FFADO is a way to use Firewire devices without going through ALSA and is popular among pro-audio folks because it gives lower latencies. Firewire is of course a relatively old technology at this point, but the audio equipment is still great and many audio engineers have a lot of these devices, so with this fixed you can plop a Firewire PCI card into your PC and suddenly all those old Firewire devices gets a new lease on life on your Linux system. And you can buy these devices on places like ebay or facebook marketplace for a fraction of their original cost. In some sense this demonstrates the same strength of PipeWire as the libcamera support, in the libcamera case it allows Linux applications a way to smoothly transtion to a new generation of hardware and in this Firewire case it allows Linux applications to keep using older hardware with new applications.

So all in all its been a great few weeks for PipeWire and for Linux Audio AND Video, and if you are an application maintainer be sure to look at how you can add PipeWire camera support to your application and of course get that application packaged up as a Flatpak for people using Fedora Workstation and other distributions to consume.

GStreamer 1.24.0 new major stable release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is excited to announce a new major feature release of your favourite cross-platform multimedia framework!

As always, this release is again packed with new features, bug fixes and many other improvements.

The 1.24 release series adds new features on top of the previous 1.22 series and is part of the API and ABI-stable 1.x release series of the GStreamer multimedia framework.

Highlights:

  • New Discourse forum and Matrix chat space
  • New Analytics and Machine Learning abstractions and elements
  • Playbin3 and decodebin3 are now stable and the default in gst-play-1.0, GstPlay/GstPlayer
  • The va plugin is now preferred over gst-vaapi and has higher ranks
  • GstMeta serialization/deserialization and other GstMeta improvements
  • New GstMeta for SMPTE ST-291M HANC/VANC Ancillary Data
  • New unixfd plugin for efficient 1:N inter-process communication on Linux
  • cudaipc source and sink for zero-copy CUDA memory sharing between processes
  • New intersink and intersrc elements for 1:N pipeline decoupling within the same process
  • Qt5 + Qt6 QML integration improvements including qml6glsrc, qml6glmixer, qml6gloverlay, and qml6d3d11sink elements
  • DRM Modifier Support for dmabufs on Linux
  • OpenGL, Vulkan and CUDA integration enhancements
  • Vulkan H.264 and H.265 video decoders
  • RTP stack improvements including new RFC7273 modes and more correct header extension handling in depayloaders
  • WebRTC improvements such as support for ICE consent freshness, and a new webrtcsrc element to complement webrtcsink
  • WebRTC signallers and webrtcsink implementations for LiveKit and AWS Kinesis Video Streams
  • WHIP server source and client sink, and a WHEP source
  • Precision Time Protocol (PTP) clock support for Windows and other additions
  • Low-Latency HLS (LL-HLS) support and many other HLS and DASH enhancements
  • New W3C Media Source Extensions library
  • Countless closed caption handling improvements including new cea608mux and cea608tocea708 elements
  • Translation support for awstranscriber
  • Bayer 10/12/14/16-bit depth support
  • MPEG-TS support for asynchronous KLV demuxing and segment seeking, plus various new muxer features
  • Capture source and sink for AJA capture and playout cards
  • SVT-AV1 and VA-API AV1 encoders, stateless AV1 video decoder
  • New uvcsink element for exporting streams as UVC camera
  • DirectWrite text rendering plugin for windows
  • Direct3D12-based video decoding, conversion, composition, and rendering
  • AMD Advanced Media Framework AV1 + H.265 video encoders with 10-bit and HDR support
  • AVX/AVX2 support and NEON support on macOS on Apple ARM64 CPUs via new liborc
  • GStreamer C# bindings have been updated
  • Rust bindings improvements and many new and improved Rust plugins
  • Lots of new plugins, features, performance improvements and bug fixes

For more details check out the GStreamer 1.24 release notes.

Binaries for Android, iOS, macOS and Windows will be provided in due course.

You can download release tarballs directly here: gstreamer, gst-plugins-base, gst-plugins-good, gst-plugins-ugly, gst-plugins-bad, gst-libav, gst-rtsp-server, gst-python, gst-editing-services, gst-devtools, gstreamer-vaapi, gstreamer-sharp, gstreamer-docs.

C skill issue; how the White House is wrong

From Planet – Felipe Contreras by Felipe Contreras

(Felipe Contreras)

My argument that C is superior to Rust in certain scenarios and in the right hands with clear examples.

Orc 0.4.38 bug-fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a minor bug-fix release which fixes an issue on x86 architectures with hardened Linux kernels.

Highlights:

  • x86: account for XSAVE when checking for AVX support, fixing usage on hardened linux kernels where AVX support has been disabled
  • neon: Use the real intrinsics for divf and sqrtf
  • orc.m4 for autotools is no longer shipped. If anyone still uses it they can copy it into their source tree

Direct tarball download: orc-0.4.38.tar.xz.

GStreamer 1.23.90 (1.24.0 rc1) pre-release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is excited to announce the first release candidate for the upcoming stable 1.24.0 feature release.

This 1.23.90 pre-release is for testing and development purposes in the lead-up to the stable 1.24 series which is now frozen for commits and scheduled for release very soon.

Depending on how things go there might be more release candidates in the next couple of days, but in any case we're aiming to get 1.24.0 out as soon as possible.

Preliminary release notes highlighting all the new features, bugfixes, performance optimizations and other important changes will be available in the next few days.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

GStreamer 1.23.2 unstable development release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another development release in the unstable 1.23 release series.

The unstable 1.23 release series is for testing and development purposes in the lead-up to the stable 1.24 series which is scheduled for release ASAP. Any newly-added API can still change until that point.

Full release notes will be provided in the near future, highlighting all the new features, bugfixes, performance optimizations and other important changes.

This development release is primarily for developers and early adopters.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

GStreamer 1.22.10 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • gst-python: fix bindings overrides for Python >= 3.12
  • glcolorconvert: fix wrong RGB to YUV matrix with bt709
  • glvideoflip: fix "method" property setting at construction time
  • gtk4paintablesink: Always draw a black background behind the video frame, and other fixes
  • pad: keep segment event seqnums the same when applying a pad offset
  • basesink: Preroll on out of segment buffers when not dropping them
  • Prefer FFmpeg musepack decoder/demuxer, fixing musepack playback in decodebin3/playbin3
  • livesync: add support for image formats such as JPEG or PNG
  • sdpdemux: Add SDP message (aka session) attributes to the caps too
  • textwrap: add support for gaps
  • macos: Fix gst_macos_main() terminating whole process, and set activation policy
  • webrtcbin: Improve SDP intersection for Opus
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.10 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GstVA library in GStreamer 1.22 and some new features in 1.24

From Herostratus’ legacy by Víctor Jáquez

I know, it’s old news, but still, I was pending to write about the GstVA library to clarify its purpose and scope.I didn’t want to have a library for GstVA, because to maintain a library is a though duty. I learnt it in the bad way with GStreamer-VAAPI. Therefore, I wanted GstVA as simple as possible, direct in its libva usage, and self-contained.

As far as I know, the main usage of GStreamer-VAAPI library was to have access to the VASurface from the GstBuffer, for example, with appsink. Since the beginning of GstVA, a mechanism to access the surface ID was provided, as `GStreamer OpenGL** offers access to the texture ID: through a flag when mapping a GstGL memory backed buffer. You can see how this mechanism is used in the one of the GstVA sample apps.

Nevertheless, later another use case appeared which demanded a library for GstVA: Other GStreamer elements needed to produce or consume VA surface backed buffers. Or to say it more concretely: they needed to use the GstVA allocator and buffer pool, and the mechanism to share the GstVA context along the pipeline too. The plugins with those VA related elements are msdk and qsv, when they operate on Linux. Both elements use Intel OneVPL, so they are basically competitors. The main difference is that the first is maintained by Intel, and has more features, specially for Linux, whilst the former in maintained by Seungha Yang, and it appears to be better tested in Windows.

These are the objects exposed by the GstVA library API:

GstVaDisplay #

GstVaDisplay represents a VADisplay, the interface between the application and the hardware accelerator. This class is abstract, and it’s supposed not to be instantiated. Instantiation has to go through it derived classes, such as

This class is shared among all the elements in the pipeline via GstContext, so all the plugged elements share the same connection the hardware accelerator. Unless the plugged element in the pipeline has a specific name, as in the case of multi GPU systems.

Let’s talk a bit about multi GPU systems. This is the gst-inspect-1.0 output of a system with an Intel and an AMD GPU, both with VA support:

$ gst-inspect-1.0 va
Plugin Details:
Name va
Description VA-API codecs plugin
Filename /home/igalia/vjaquez/gstreamer/build/subprojects/gst-plugins-bad/sys/va/libgstva.so
Version 1.23.1.1
License LGPL
Source module gst-plugins-bad
Documentation https://gstreamer.freedesktop.org/documentation/va/
Binary package GStreamer Bad Plug-ins git
Origin URL Unknown package origin

vaav1dec: VA-API AV1 Decoder in Intel(R) Gen Graphics
vacompositor: VA-API Video Compositor in Intel(R) Gen Graphics
vadeinterlace: VA-API Deinterlacer in Intel(R) Gen Graphics
vah264dec: VA-API H.264 Decoder in Intel(R) Gen Graphics
vah264lpenc: VA-API H.264 Low Power Encoder in Intel(R) Gen Graphics
vah265dec: VA-API H.265 Decoder in Intel(R) Gen Graphics
vah265lpenc: VA-API H.265 Low Power Encoder in Intel(R) Gen Graphics
vajpegdec: VA-API JPEG Decoder in Intel(R) Gen Graphics
vampeg2dec: VA-API Mpeg2 Decoder in Intel(R) Gen Graphics
vapostproc: VA-API Video Postprocessor in Intel(R) Gen Graphics
varenderD129av1dec: VA-API AV1 Decoder in AMD Radeon Graphics in renderD129
varenderD129av1enc: VA-API AV1 Encoder in AMD Radeon Graphics in renderD129
varenderD129compositor: VA-API Video Compositor in AMD Radeon Graphics in renderD129
varenderD129deinterlace: VA-API Deinterlacer in AMD Radeon Graphics in renderD129
varenderD129h264dec: VA-API H.264 Decoder in AMD Radeon Graphics in renderD129
varenderD129h264enc: VA-API H.264 Encoder in AMD Radeon Graphics in renderD129
varenderD129h265dec: VA-API H.265 Decoder in AMD Radeon Graphics in renderD129
varenderD129h265enc: VA-API H.265 Encoder in AMD Radeon Graphics in renderD129
varenderD129jpegdec: VA-API JPEG Decoder in AMD Radeon Graphics in renderD129
varenderD129postproc: VA-API Video Postprocessor in AMD Radeon Graphics in renderD129
varenderD129vp9dec: VA-API VP9 Decoder in AMD Radeon Graphics in renderD129
vavp9dec: VA-API VP9 Decoder in Intel(R) Gen Graphics

22 features:
+-- 22 elements

As you can observe, each card registers the supported element. The first GPU, the Intel one, doesn’t insert renderD129 after the va prefix, while the AMD Radeon, does. As you could imagine, renderD129 is the device name in /dev/dri:

$ ls /dev/dri
by-path card0 card1 renderD128 renderD129

The appended device name expresses the DRM device the elements use. And only after the second GPU the device name is appended. This allows to use the untagged elements either for the first GPU or to allow the usage of a wrapped display injected by the user application, such as VA X11 or Wayland connections.

Notice that sharing DMABuf-based buffers between different GPUs is theoretically possible, but not assured.

Keep in mind that, currently, nvidia-vaapi-driver is not supported by GstVA, and the driver will be ignored by the plugin register since 1.24.

VA allocators #

There are two types of memory allocators in GstVA:

  • VA surface allocator. It allocates a GstMemory that wraps a VASurfaceID, which represents a complete frame directly consumable by VA-based elements. Thus, a GstBuffer will hold one and only one GstMemory of this type. These buffers are the very same used by the system memory buffers, since the surfaces generally can map their content to CPU memory (unless they are encrypted, but GstVA haven’t been tested for that use case).
  • VA-DMABuf allocator. It descends from GstDmaBufAllocator, though it’s not an allocator in strict sense, since it only imports DMABufs to VA surfaces, and exports VA surfaces to DMABufs. Notice that a single VA surface can be exported as multiple DMABuf-backed GstMemories, and the contrary, a VA surface can be imported from multiple DMABufs.

Also, VA allocators offer a couple methods that can be useful for applications that use appsink, for example gst_va_buffer_get_surface and gst_va_buffer_peek_display.

One particularity of both allocators is that they keep an internal pool of the allocated VA surfaces, since the mapping of buffers/memories is not always 1:1, as in the case of DMABuf; to reuse surfaces even if the buffer pool ditches them, and to avoid surface leaks, keeping track of them all the time.

GstVaPool #

This class is only useful for other GStreamer elements capable of consume VA surfaces, so they could instantiate, configure and propose a VA buffer pool, but not for applications.

And that’s all for now. Thank you for bear with me up to here. But remember, the API of the library is unstable, and it can change any time. So, no promises are made :)

New and old apps on Flathub

From /bɑs ˈtjɛ̃ no ˈse ʁɑ/ (hadess) | News by Bastien Nocera

(Bastien Nocera)

3D Printing Slicers

 I recently replaced my Flashforge Adventurer 3 printer that I had been using for a few years as my first printer with a BambuLab X1 Carbon, wanting a printer that was not a “project” so I could focus on modelling and printing. It's an investment, but my partner convinced me that I was using the printer often enough to warrant it, and told me to look out for Black Friday sales, which I did.

The hardware-specific slicer, Bambu Studio, was available for Linux, but only as an AppImage, with many people reporting crashes on startup, non-working video live view, and other problems that the hardware maker tried to work-around by shipping separate AppImage variants for Ubuntu and Fedora.

After close to 150 patches to the upstream software (which, in hindsight, I could probably have avoided by compiling the C++ code with LLVM), I manage to “flatpak” the application and make it available on Flathub. It's reached 3k installs in about a month, which is quite a bit for a niche piece of software.

Note that if you click the “Donate” button on the Flathub page, it will take you a page where you can feed my transformed fossil fuel addiction buy filament for repairs and printing perfectly fitting everyday items, rather than bulk importing them from the other side of the planet.

Screenshot
 

Preparing a Game Gear consoliser shell

I will continue to maintain the FlashPrint slicer for FlashForge printers, installed by nearly 15k users, although I enabled automated updates now, and will not be updating the release notes, which required manual intervention.

FlashForge have unfortunately never answered my queries about making this distribution of their software official (and fixing the crash when using a VPN...).

 Rhythmbox

As I was updating the Rhythmbox Flatpak on Flathub, I realised that it just reached 250k installs, which puts the number of installations of those 3D printing slicers above into perspective.

rhythmbox-main-window.png 

The updated screenshot used on Flathub

Congratulations, and many thanks, to all the developers that keep on contributing to this very mature project, especially Jonathan Matthew who's been maintaining the app since 2008.

GStreamer 1.23.1 unstable development release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce the first development release in the unstable 1.23 release series.

The unstable 1.23 release series is for testing and development purposes in the lead-up to the stable 1.24 series which is scheduled for release ASAP. Any newly-added API can still change until that point.

This development release is primarily for developers and early adopters.

Binaries for Android, iOS, Mac OS X and Windows will be made available shortly at the usual location.

Release tarballs can be downloaded directly here:

As always, please give it a spin and let us know of any issues you run into by filing an issue in GitLab.

Orc 0.4.37 bug-fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a bug-fix release that fixes some recent regressions, but also properly enables liborc on Apple arm64 devices.

Highlights:

  • Apple arm64 enablement
  • macOS/iOS fixes
  • orcc regression fix, would default to "sse" backend
  • MMX backend fixes
  • orc testsuite fixes

Direct tarball download: orc-0.4.37.

Flathub: Pros and Cons of Direct Uploads

From Robotic Tendencies by Robert McQueen

(Robert McQueen)

I attended FOSDEM last weekend and had the pleasure to participate in the Flathub / Flatpak BOF on Saturday. A lot of the session was used up by an extensive discussion about the merits (or not) of allowing direct uploads versus building everything centrally on Flathub’s infrastructure, and related concerns such as automated security/dependency scanning.

My original motivation behind the idea was essentially two things. The first was to offer a simpler way forward for applications that use language-specific build tools that resolve and retrieve their own dependencies from the internet. Flathub doesn’t allow network access during builds, and so a lot of manual work and additional tooling is currently needed (see Python and Electron Flatpak guides). And the second was to offer a maybe more familiar flow to developers from other platforms who would just build something and then run another command to upload it to the store, without having to learn the syntax of a new build tool. There were many valid concerns raised in the room, and I think on reflection that this is still worth doing, but might not be as valuable a way forward for Flathub as I had initially hoped.

Of course, for a proprietary application where Flathub never sees the source or where it’s built, whether that binary is uploaded to us or downloaded by us doesn’t change much. But for an FLOSS application, a direct upload driven by the developer causes a regression on a number of fronts. We’re not getting too hung up on the “malicious developer inserts evil code in the binary” case because Flathub already works on the model of verifying the developer and the user makes a decision to trust that app – we don’t review the source after all. But we do lose other things such as our infrastructure building on multiple architectures, and visibility on whether the build environment or upload credentials have been compromised unbeknownst to the developer.

There is now a manual review process for when apps change their metadata such as name, icon, license and permissions – which would apply to any direct uploads as well. It was suggested that if only heavily sandboxed apps (eg no direct filesystem access without proper use of portals) were permitted to make direct uploads, the impact of such concerns might be somewhat mitigated by the sandboxing.

However, it was also pointed out that my go-to example of “Electron app developers can upload to Flathub with one command” was also a bit of a fiction. At present, none of them would pass that stricter sandboxing requirement. Almost all Electron apps run old versions of Chromium with less complete portal support, needing sandbox escapes to function correctly, and Electron (and Chromium’s) sandboxing still needs additional tooling/downstream patching to run inside a Flatpak. Buh-boh.

I think for established projects who already ship their own binaries from their own centralised/trusted infrastructure, and for developers who have understandable sensitivities about binary integrity such such as encryption, password or financial tools, it’s a definite improvement that we’re able to set up direct uploads with such projects with less manual work. There are already quite a few applications – including verified ones – where the build recipe simply fetches a binary built elsewhere and unpacks it, and if this already done centrally by the developer, repeating the exercise on Flathub’s server adds little value.

However for the individual developer experience, I think we need to zoom out a bit and think about how to improve this from a tools and infrastructure perspective as we grow Flathub, and as we seek to raise funds for different sources for these improvements. I took notes for everything that was mentioned as a tooling limitation during the BOF, along with a few ideas about how we could improve things, and hope to share these soon as part of an RFP/RFI (Request For Proposals/Request for Information) process. We don’t have funding yet but if we have some prospective collaborators to help refine the scope and estimate the cost/effort, we can use this to go and pursue funding opportunities.

Re: New responsibilities

From /bɑs ˈtjɛ̃ no ˈse ʁɑ/ (hadess) | News by Bastien Nocera

(Bastien Nocera)

 A few months have passed since New Responsibilities was posted, so I thought I would provide an update.

Projects Maintenance

Of all the freedesktop projects I created and maintained, only one doesn't have a new maintainer, low-memory-monitor.

This daemon is what the GMemoryMonitor GLib API is based on, so it can't be replaced trivially. Efforts seem to be under way to replace it with systemd APIs.

As for the other daemons:

(As an aside, there's posturing towards replacing power-profiles-daemon with tuned in Fedora. I would advise stakeholders to figure out whether having a large Python script in the boot hot path is a good idea, taking a look at bootcharts, and then thinking about whether hardware manufacturers would be able to help with supporting a tool with so many moving parts. Useful for tinkering, not for shipping in a product)

Updated responsibilities

Since mid-August, I've joined the Platform Enablement Team. Right now, I'm helping out with maintenance of the Bluetooth kernel stack in RHEL (and thus CentOS).

The goal is to eventually pivot to hardware enablement, which is likely to involve backporting and testing, more so than upstream enablement. This is currently dependent on attending some formal kernel development (and debugging) training sessions which should make it easier to see where my hodge-podge kernel knowledge stands.

Blog backlog

Before being moved to a different project, and apart from the usual and very time-consuming bug triage, user support and project maintenance, I also worked on a few new features. I have a few posts planned that will lay that out.

Orc 0.4.36 bug-fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

This is a bug-fix release that fixes a regression introduced by the recent 0.4.35 release, fixing a crash on machines that only support AVX but not AVX2.

Highlights:

  • Only use AVX / AVX2 instructions on CPUs that support both AVX and AVX2

Direct tarball download: orc-0.4.36.

GStreamer 1.22.9 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • More Security fixes for the AV1 codec parser
  • va: fixes for Mesa Gallium drivers in Mesa versions older than v23.2
  • v4l2src: Consider framerate during caps selection
  • v4l2codec: decoder fixes
  • rtspsrc: multicast fixes
  • camerabin viewfinder fixes
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.9 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

Orc 0.4.35 release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another release of liborc, the Optimized Inner Loop Runtime Compiler, which is used for SIMD acceleration in GStreamer plugins such as audioconvert, audiomixer, compositor, videoscale, and videoconvert, to name just a few.

Highlights:

  • Add support for AVX / AVX2
  • SSE backend improvements
  • New `orf` and `andf` opcodes for bitwise AND and OR for single precision floats
  • Add support for `convwf`, int16 to float conversion
  • Allow backend/target selection through ORC_TARGET environment variable
  • orconce: Use Win32 once implementation with MSVC for better performance
  • orcc: add --binary option to output raw machine code for functions
  • orcprofile: Implement Windows high-resolution timestamp for MSVC to allow benchmarking on MSVC builds
  • Documentation improvements

Direct tarball download: orc-0.4.35.

GStreamer 1.22.8 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • Security fixes for the AV1 codec parser
  • [Security fixes](https://gstreamer.freedesktop.org/security/) for the AV1 video codec parser
  • avdec video decoder: fix another possible deadlock with FFmpeg 6.1
  • qtdemux: reverse playback and seeking fixes for files with raw audio streams
  • v4l2: fix "newly allocated buffer ... is not free" warning log flood
  • GstPlay + GstPlayer library fixes
  • dtls: Fix build failure on Windows when compiling against OpenSSL 3.2.0
  • d3d11screencapturesrc: Fix wrong color with HDR enabled
  • Cerbero build tool: More python 3.12 string escape warning fixes; make sure to bundle build tools as well
  • various bug fixes, build fixes, memory leak fixes, and other stability and reliability improvements

See the GStreamer 1.22.8 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

Vulkan Video encoder in GStreamer

From dabrain34.github.io by Stéphane Cerveau

Vulkan Video encoder in GStreamer

During the last months of 2023, we, at Igalia, decided to focus on the latest provisional specs proposed by the Vulkan Video Khronos TSG group to support encode operations in an open reference.

As the Khronos TSG Finalizes Vulkan Video Extensions for Accelerated H.264 and H.265 Encode the 19th of December 2023, here is an update on our work to support both h264 and h265 codec in GStreamer.

This work has been possible thanks to the relentless work of the Khronos TSG during the past years, including the IHV vendors, such as AMD, NVIDIA, Intel, but also Rastergrid and its massive work on the specifications wording and the vulkan tooling, and of course Cognizant for its valuable contributions to the Vulkan CTS.

This work started with moving structures in drivers and specifications along the CTS tests definition. It has been a joint venture to validate the specifications and check that the drivers suppports properly the features necessary to encode a video with h264 and h265 codecs. Breaking news, more formats should come very soon…

The GStreamer code

Building a community

Thanks first to Lynne and David Airlie, on their RADV/FFmpeg effort, a first implementation was available last year to validate the provisional specifications of the encoders. This work helped us a lot to design the GStreamer version and offer a performant solution for it.

The GStreamer Vulkan bits have been also possible thanks to the tight collaboration with the CTS folks which helped us a lot to understand the crashes we could experience without any clues in the drivers.

To write this GStreamer code, we got inspiration from the code written by the GStreamer community along the GstVA encoders and v4l2codecs plugins. So great kudos to the GStreamer community and especially to He Junayan from Intel, Collabora folks paving the way to stateless codecs and of course Igalia’s past contributions.

And finally the active and accurate reviews from Centricular folks to achieve a valid synchronized operation in Vulkan, see for example https://gitlab.freedesktop.org/gstreamer/gstreamer/-/merge_requests/5079 in addition to their initial work to support Vulkan.

Give me the (de)coded bits

The code is available on the GStreamer MR and we plan to have it ready in the 1.24 version.

This code needs at least the Vulkan SDK version 1.3.274 with KHR Vulkan Video encoding extension support. It should be available publicly when the specifications and the tooling will be available along the next SDK release early 2024.

To build it, nothing is more simple than (after installing the LunarG SDK and the GST requirements):

$ meson setup builddir -Dgst-plugins-bad:vulkan-video=enabled
$ ninja -C builddir

And to run it, you’ll need an IHV driver supporting the KHR extension, which should happen very soon, or the community driven driver such as Igalia’s Intel ANV driver. You can also try the AMD RADV driver available here when KHR will be available.

To encode and validate the bitstream you can run this transcode GStreamer pipeline:

$ gst-launch-1.0 videotestsrc ! vulkanupload ! vulkanh265enc ! h265parse ! avdec_h265 ! autovideosink

As you can see a vulkanupload is necessary to upload the GStreamer buffers to the GPU memory, then the buffer will be encoded through the vulkanh265enc which will generate a byte-stream AU aligned stream which can be decoded with any decoder, here the FFMpeg one.

We are planning to simplify this pipeline and allow to select an encoder from a multiple device configuration.

It supports most of the Vulkan video encoder features and can encode I, P and B Frames for better streaming performance. See this blog post

Challenges

One of the most important challenge here was to put all the bits together and understand when the drivers tell you that you are doing something wrong with only an obscure segfault.

The vulkan process to encode stateless a raw bitstream is quite simple as it will be:

  • Initialize the video session with the correct video profile (h264,h265 etc…)
  • Initialize the sessions paramaters such as SPS/PPS which will be used by the driver during the encode session
  • Reset the codec state
  • Encode the buffers:
    • Change the quality/rate control if necessary
    • Retrieve the video session parameters from the driver if necessary
    • Set the bitstream standard parameters such as slice header data.
    • Set the begin video info to give the context to the encoder for the reference
    • Encode the buffer
    • Query the encoded result
    • End the video coding operation
    • repeat ..

The challenge here was not the state machine design but more the understanding of the underlying bits necessary to tell the driver what and how to encode it. Indeed to encode a video stream, a lot of parameters (more than 200..) are necessary and a bad use of these parameters can lead to obscure crashes in the driver stack.

So first it was important to validate the data provided to the Vulkan drivers through the Validation layer. This part in implementing an application using Vulkan is the first and essential step to avoid headaches, see here

You can checkout the code from here to get the latest one and configure it to validate your project. Otherwise it should come from the Vulkan SDK.

This step helped us a lot to understand the mechanics expected by the driver to provide the frames and their references to encode those properly. It also helped to use some API such as the rate control or the quality level.

But Validation Layers cant do everything and that’s why designing, reviewing a solid codebase in CTS as a reference is crucial to implement the GStreamer application or any other 3rd party application. So a lot of time in Igalia has been dedicated to ease and simplify the CTS tests to facilitate its usage.

Indeed the Validation Layers do not validate the Video Standard parameters for example such as the one provided for the format underlying specific data.

So we spent a few hours checking some missing parameters in CTS as in GStreamer which was leading to obscure crash, especially when we were playing with references. A blog post should follow explaining this part.

To debug this missing bits, we used some facilitating tools such as GfxReconstruct Vulkan layer or the VK_LAYER_LUNARG_api_dump which allows you to get an essential dump of your calls and compare the results with other implemetations.

Conclusion

It was a great journey providing a cross platform solution to the community to encode video using largely adopted codec such as h264 and h265. As said before other codecs will come very soon, so stay tuned!

I mentioned the team work in this journey as without the dedication of all, it wouldn’t have been possible to achieve this work in a short time span.

And last but not least, I’d like to special thank also Valve to sponsor this work without whom it wouldn’t have been possible to go as far and as fast!

We’ll attend the Vulkanised 2024 event where we’ll demonstrate this work. Feel free to come and talk to us there. We’ll be delighted to hear and discuss your doubts or brilliant ideas !

As usual, if you would like to learn more about Vulkan, GStreamer or any other open multimedia framework, please contact us!

Fedora Workstation 39 and beyond

From Christian F.K. Schaller by Christian Schaller

(Christian Schaller)

I have not been so active for a while with writing these Fedora Workstation updates and part of the reason was that I felt I was beginning to repeat myself a lot, which I partly felt was a side effect of writing them so often, but with some time now since my last update I felt that time was ripe again. So what are some of the things we have been working on and what are our main targets going forward? This is not a exhaustive list, but hopefully items you find interesting. Apologize for weird sentences and potential spelling mistakes, but it ended up a a long post and when you read your own words over for the Nth time you start going blind to issues :)

PipeWire

PipeWire 1.0 is available! PipeWire keeps the Linux Multimedia revolution rolling[/caption]So lets start with one of your favorite topics, PipeWire. As you probably know PipeWire 1.0 is now out and I feel it is a project we definitely succeeded with, so big kudos to Wim Taymans for leading this effort. I think the fact that we got both the creator of JACK, Paul Davis and the creator of PulseAudio Lennart Poettering to endorse it means our goal of unifying the Linux audio landscape is being met. I include their endorsement comments from the PipeWire 1.0 release announcement here :

“PipeWire represents the next evolution of audio handling for Linux, taking
the best of both pro-audio (JACK) and desktop audio servers (PulseAudio) and
linking them into a single, seamless, powerful new system.”
– Paul Davis, JACK and Ardour author

“PipeWire is a worthy successor to PulseAudio, providing a feature set
closer to how modern audio hardware works, and with a security model
with today’s application concepts in mind. Version 1.0 marks a
major milestone in completing the adoption of PipeWire in the standard
set of Linux subsystems. Congratulations to the team!”
– Lennart Poettering, Pulseaudio and systemd author

So for new readers, PipeWire is a audio and video server we created for Fedora Workstation to replace PulseAudio for consumer audio, JACK for pro-audio and add similar functionality for video to your linux operating system. So instead of having to deal with two different sound server architectures users now just have to deal with one and at the same time they get the same advantages for video handling. Since PipeWire implemented both the PulseAudio API and the JACK API it is a drop in replacement for both of them without needing any changes to the audio applications built for those two sound servers. Wim Taymans alongside the amazing community that has grown around the project has been hard at work maturing PipeWire and adding any missing feature they could find that blocked anyone from moving to it from either PulseAudio and JACK. Wims personal focus recently has been on an IRQ based ALSA driver for PipeWire to be able to provide 100% performance parity with the old JACK server. So while a lot of Pro-audio users felt that PipeWire’s latency was already good enough, this work by Wim shaves of the last few milliseconds to reach the same level of latency as JACK itself had.

In parallel with the work on PipeWire the community and especially Collabora has been hard at work on the new 0.5 release of WirePlumber, the session manager which handles all policy issues for PipeWire. I know people often get a little confused about PipeWire vs WirePlumber, but think of it like this: PipeWire provides you the ability to output audio through a connected speaker, through a bluetooth headset, through an HDMI connection and so on, but it doesn’t provide any ‘smarts’ for how that happens. The smarts are instead provided by WirePlumber which then contains policies to decide where to route your audio or video, either based on user choice or through preset policies making the right choices automatically, like if you disconnect your USB speaker it will move the audio to your internal speaker instead. Anyway, WirePlumber 0.5 will be a major step forward for WirePlumber moving from using lua scripts for configuration to instead using JSON for configuration while retaining lua for scripting. This has many advantages, but I point you to this excellent blog post by Collabora’s Ashok Sidipotu for the details. Ashok got further details about WirePlumber 0.5 that you can find here.

With PipeWire 1.0 out the door I feel we are very close to reaching one of our initial goals with PipeWire, to remove the need for custom pro-audio distributions like Fedora JAM or Ubuntu Studio, and instead just let audio folks be able to use the same great Fedora Workstation as the rest of the world. With 1.0 done Wim plans next to look a bit at things like configuration tools and similar used by pro-audio folks and also dive into the Flatpak portal needs of pro-audio applications more, to ensure that Flatpaks + PipeWire is the future of pro-audio.

On the video handling side its been a little slow going since there applications need to be ported from relying directly on v4l. Jan Grulich has been working with our friends at Mozilla and Google to get PipeWire camera handling support into Firefox and Google Chrome. At the moment it looks like the Firefox support will land first, in fact Jan has set up a COPR that lets you try it out here. For tracking the upstream work in WebRTC to add PipeWire support Jan set up this tracker bug. Getting the web browsers to use PipeWire is important both to enable the advanced video routing capabilities of PipeWire, but it will also provide applications the ability to use libcamera which is a needed for new modern MIPI cameras to work properly under Linux.

Another important application to get PipeWire camera support into is OBS Studio and the great thing is that community member Georges Stavracas is working on getting the PipeWire patches merged into OBS Studio, hopefully in time for their planned release early next year. You can track Georges work in this pull request.

For more information about PipeWire 1.0 I recommend our interview with Wim Taymans in Fedora Magazine and also the interview with Wim on Linux Unplugged podcast.

HDR
HDRHDR, or High Dynamic Range, is another major effort for us. HDR is a technology I think many of you have become familiar with due to it becoming quite common in TVs these days. It basically provides for greatly increased color depth and luminescence on your screen. This is a change that entails a lot of changes through the stack, because when you introduce into an existing ecosystem like the Linux desktop you have to figure out how to combine both new HDR capable applications and content and old non-HDR applications and content. Sebastian Wick, Jonas Ådahl, Oliver Fourdan, Michel Daenzer and more on the team has been working with other members of the ecosystem from Intel, AMD, NVIDIA, Collabora and more to pick and define the standards and protocols needed in this space. A lot of design work was done early in the year so we been quite focused on implementation work across the drivers, Wayland, Mesa, GStreamer, Mutter, GTK+ and more. Some of the more basic scenarios, like running a fullscreen HDR application is close to be ready, while we are still working hard on getting all the needed pieces together for the more complex scenarios like running SDR and HDR windows composited together on your desktop. So getting for instance full screen games to run in HDR mode with Steam should happen shortly, but the windowed support will probably land closer to summer next year.

Wayland remoting
One feature we been also spending a lot of time on is enabling remote logins to a Wayland desktop. You have been able to share your screen under Wayland more or less from day one, but it required your desktop session to be already active. But lets say you wanted to access your Wayland desktop running on a headless system you been out of luck so far and had to rely on the old X session instead. So putting in place all the pieces for this has been quite an undertaking with work having been done on PipeWire, on Wayland portals, gnome remote desktop daemon, libei; the new input emulation library, gdm and more. The pieces needed are finally falling into place and we expect to have everything needed landed in time for GNOME 46. This support is currently done using a private GNOME API, but a vendor less API is being worked on to replace it.

As a sidenote here not directly related to desktop remoting, but libei has also enabled us to bring xtest support to XWayland which was important for various applications including Valves gamescope.

NVIDIA drivers
One area we keep investing in is improving the state of NVIDIA support on Linux. This comes both in the form of being the main company backing the continued development of the Nouveau graphics driver. So the challenge with Nouveau is that for the longest while it offered next to no hardware acceleration for 3D graphics. The reason for this was that the firmware that NVIDIA provided for Nouveau to use didn’t expose that functionality and since recent generations of NVIDIA cards only works with firmware signed by NVIDIA this left us stuck. So Nouveau was a good tool for doing an initial install of a system, but if you where doing any kind of serious 3D acceleration, including playing games, then you would need to install the NVIDIA binary driver. So in the last year that landscape around that has changed drastically, with the release of the new out-of-tree open source driver from NVIDIA. Alongside that driver a new firmware has also been made available , one that do provide full support for hardware acceleration.
Let me quickly inject a quick explanation of out-of-tree versus in-tree drivers here. An in-tree driver is basically a kernel driver for a piece of hardware that has been merged into the official Linux kernel from Linus Torvalds and is thus being maintained as part of the official Linux kernel releases. This ensures that the driver integrates well with the rest of the Linux kernel and that it gets updated in sync with the rest of the Linux kernel. So Nouveau is an in-tree kernel driver which also integrates with the rest of the open source graphics stack, like Mesa. The new NVIDIA open source driver is an out-of-tree driver which ships as a separate source code release on its own schedule, but of course NVIDIA works to keeps it working with the upstream kernel releases (which is a lot of work of course and thus considered a major downside to being an out of tree driver).

As of the time of writing this blog post NVIDIAs out-of-tree kernel driver and firmware is still a work in progress for display usercases, but that is changing with NVIDIA exposing more and more display features in the driver (and the firmware) with each new release they do. But if you saw the original announcement of the new open source driver from NVIDIA and have been wondering why no distribution relies on it yet, this is why. So what does this mean for Nouveau? Well our plan is to keep supporting Nouveau for the foreseeable future because it is an in-tree driver, which is a lot easier to ensure keeps working with each new upstream kernel release.

At the same time the new firmware updates allows Nouveau to eventually offer performance levels competitive with the official out-of-tree driver, kind of how the open source AMD driver with MESA offers comparable performance to AMD binary GPU driver userspace. So Nouvea maintainer Ben Skeggs spent the last year working hard on refactoring Nouveau to work with the new firmware and we now have a new release of Nouveau out showing the fruits of that labor, enabling support for NVIDIAs latest chipset. Over time we will have it cover more chipset and expand Vulkan and OpenGL (using Zink) support to be a full fledged accelerated graphics driver.
So some news here, Ben after having worked tirelessly on keeping Nouveau afloat for so many years decided he needed a change of pace and thus decided to leave software development behind for the time being. A big thank you to Ben from all us at Red Hat and Fedora ! The good news is that Danilo Krummrich will take over as the development lead, with Lyude Paul taking on working on the Display side specifically of the driver. We also expect to have other members of the team chipping in too. They will pick up Bens work and continue working with NVIDIA and the community on a bright future for Nouveau.

So as I mentioned though the new open source driver from NVIDIA is still being matured for the display usercase and until it works fully as a display driver neither will Nouveau be able to be a full alternative since they share the same firmware. So people will need to rely on the binary NVIDIA Driver for some time still. One thing we are looking at there and discussing is if there are ways for us to improve the experience of using that binary driver with Secure Boot enabled. Atm that requires quite a bit of manual fiddling with tools like mokutils, but we have some ideas on how to streamline that a bit, but it is a hard nut to solve due to a combination of policy issues, legal issues, security issues and hardware/UEFI bugs so I am making no promises at this point, just a promise that it is something we are looking at.

Accessibility
Accessibility is an important feature for us in Fedora Workstation and thus we hired Lukáš Tyrychtr to focus on the issue. Lukáš has been working through across the stack fixing issues blocking proper accessibility support in Fedora Workstation and also participated in various accessibility related events. There is still a lot to do there so I was very happy to hear recently that the GNOME Foundation got a million Euro sponsorship from the Sovereign Tech Fund to improve various things across the stack, especially improving accessibility. So the combination of Lukáš continued efforts and that new investment should make for a much improved accessibility experience in GNOME and in Fedora Workstation going forward.

GNOME Software
Another area that we keep investing in is improving GNOME Software, with Milan Crha working continuously on bugfixing and performance improvements. GNOME Software is actually a fairly complex piece of software as it has to be able to handle the installation and updating of RPMS, OSTree system images, Flatpaks, fonts and firmware for us in addition to the formats it handles for other distributions. For some time it felt was GNOME Software was struggling with the load of all those different formats and usercases and was becoming both slow and with a lot of error messages. Milan has been spending a lot of time dealing with those issues one by one and also recently landed some major performance improvements making the GNOME Software experience a lot better. One major change that Milan is working on that I think we will be able to land in Fedora Workstation 40/41 is porting GNOME Software to use DNF5. The main improvement end users will probably notice is that it unifies the caches used for GNOME Software and using dnf on the command line, saving you storage space and also ensuring the two are fully in sync on what RPMS is installed/updated at any given time.

Fedora and Flatpaks

Flatpaks is another key element of our strategy for moving the Linux desktop forward and as part of that we have now enabled all of Flathub to be available if you choose to enable 3rd party repositories when you install Fedora Workstation. This means that the huge universe of applications available on Flathub will be easy to install through GNOME Software alongside the content available in Fedora’s own repositories. That said we have also spent time improving the ease of making Fedora Flatpaks. Owen Taylor jumped in and removed the dependency on a technology called ‘modularity‘ which was initially introduced to Fedora to bring new features around having different types of content and ease keeping containers up to date. Unfortunately it did not work out as intended and instead it became something that everyone just felt made things a lot more complicated, including building Flatpaks from Fedora content. With Owens updates building Flatpaks in Fedora has become a lot simpler and should help energize the effort building Flatpaks in Fedora.

Toolbx
As we continue marching towards a vision for Fedora Workstation to be a highly robust operating we keep evolving Toolbx. Our tool for making running your development environment(s) inside a container and thus allows you to both keep your host OS pristine and up to date, while at the same time using specific toolchains and tools inside the development container. This is a hard requirement for immutable operating systems such as Fedora Silverblue or Universal blue, but it is also useful on operating systems like Fedora Workstation as a way to do development for other platforms, like for instance Red Hat Enterprise Linux.

A major focus for Toolbx since the inception is to get it a stage where it is robust and reliable. So for instance while we prototyped it as a shell script, today it is written in Go to be more maintainable and also to confirm with the rest of the container ecosystem. A recent major step forward for getting that stability there is that starting with Fedora 39, the toolbox image is now a release blocking deliverable. This means it is now built as part of the nightly compose and the whole Toolbx stack (ie. the fedora-toolbox image and the toolbox RPM) is part of the release-blocking test criteria. This shows the level of importance we put on Toolbx as the future of Linux software development and its criticality to Fedora Workstation. Earlier, we built the fedora-toobox image as a somewhat separate and standalone thing, and people interested in Toolbx would try to test and keep the whole thing working, as much as possible, on their own. This was becoming unmanageable because Toolbx integrates with many parts of the distribution from Mutter (ie, the Wayland and X sockets) to Kerberos to RPM (ie., %_netsharedpath in /usr/lib/rpm/macros.d/macros.toolbox) to glibc locale definitions and translations. The list of things that could change elsewhere in Fedora, and end up breaking Toolbx, was growing too large for a small group of Toolbx contributors to keep track of.

We the next release we now also have built-in support for Arch Linux and Ubuntu through the –distro flag in toolbox.git main, thanks again to the community contributors who worked with us on this allowing us to widen the amount of distros supported while keeping with our policy of reliability and dependability. And along the same theme of ensuring Toolbx is a tool developers can rely on we have added lots and lots of new tests. We now have more than 280 tests that run on CentOS Stream 9, all supported Fedoras and Rawhide, and Ubuntu 22.04.

Another feature that Toolbx maintainer Debarshi Ray put a lot of effort into is setting up full RHEL containers in Toolbx on top of Fedora. Today, thanks to Debarshi work you do subscription-manager register --username user@domain.name on the Fedora or RHEL host, and the container is automatically entitled to RHEL content. We are still looking at how we can provide a graphical interface for that process or at least how to polish up the CLI for doing subscription-manager register. If you are interested in this feature, Debarshi provides a full breakdown here.

Other nice to haves added is support for enterprise FreeIPA set-ups, where the user logs into their machine through Kerberos and support for automatically generated shell completions for Bash, fish and Z shell.

Flatpak and Foreman & Katello
For those out there using Foreman to manage your fleet of Linux installs we have some good news. We are in the process of implementing support for Flatpaks in these tools so that you can manage and deploy applications in the Flatpak format using them. This is still a work in progress, but relevant Pulp and Katello commits are Pulp commit Support for Flatpak index endpoints and Katello commits Reporting results of docker v2 repo discovery” and Support Link header in docker v2 repo discovery“.

LVFS
Another effort that Fedora Workstation has brought to the world of Linux and that is very popular arethe LVFS and fwdup formware update repository and tools. Thanks to that effort we are soon going to be passing one hundred million firmware updates on Linux devices soon! These firmware updates has helped resolve countless bugs and much improved security for Linux users.

But we are not slowing down. Richard Hughes worked with industry partners this year to define a Bill of Materials defintion to firmware updates allowing usings to be better informed on what is included in their firmware updates.

We now support over 1400 different devices on the LVFS (covering 78 different protocols!), with over 8000 public firmware versions (image below) from over 150 OEMs and ODMs. We’ve now done over 100,000 static analysis tests on over 2,000,000 EFI binaries in the firmware capsules!

Some examples of recently added hardware:
* AMD dGPUs, Navi3x and above, AVer FONE540, Belkin Thunderbolt 4 Core Hub dock, CE-LINK TB4 Docks,CH347 SPI programmer, EPOS ADAPT 1×5, Fibocom FM101, Foxconn T99W373, SDX12, SDX55 and SDX6X devices, Genesys GL32XX SD readers, GL352350, GL3590, GL3525S and GL3525 USB hubs, Goodix Touch controllers, HP Rata/Remi BLE Mice, Intel USB-4 retimers, Jabra Evolve 65e/t and SE, Evolve2, Speak2 and Link devices, Logitech Huddle, Rally System and Tap devices, Luxshare Quad USB4 Dock, MediaTek DP AUX Scalers, Microsoft USB-C Travel Hub, More Logitech Unifying receivers, More PixartRF HPAC devices, More Synaptics Prometheus fingerprint readers, Nordic HID devices, nRF52 Desktop Keyboard, PixArt BLE HPAC OTA, Quectel EM160 and RM520, Some Western Digital eMMC devices, Star Labs StarBook Mk VIr2, Synaptics Triton devices, System76 Launch 3, Launch Heavy 3 and Thelio IO 2, TUXEDO InfinityBook Pro 13 v3, VIA VL122, VL817S, VL822T, VL830 and VL832, Wacom Cintiq Pro 27, DTH134 and DTC121, One 13 and One 12 Tablets

InputLeap on Wayland
One really interesting feature that landed for Fedora Workstation 39 was the support for InputLeap. It’s probably not on most peoples radar, but it’s an important feature for system administrators, developers and generally anyone with more than a single computer on their desk.

Historically, InputLeap is a fork of Barrier which itself was a fork of Synergy, it allows to share the same input devices (mouse, keyboard) across different computers (Linux, Windows, MacOS) and to move the pointer between the screens of these computers seamlessly as if they were one.

InputLeap has a client/server architecture with the server running on the main host (the one with the keyboard and mouse connected) and multiple clients, the other machines sitting next to the server machine. That implies two things, the InputLeap daemon on the server must be able to “capture” all the input events to forward them to the remote clients when the pointer reaches the edge of the screen, and the InputLeap client must be able to “replay” those input events on the client host to make it as if the keyboard and mouse were connected directly to the (other) computer. Historically, that relied on X11 mechanisms and neither InputLeap (nor Barrier or even Synergy as a matter of fact) would work on Wayland.

This is one of the use cases that Peter Hutterer had in mind when he started libEI, a low-level library aimed at providing a separate communication channel for input emulation in Wayland compositors and clients (even though libEI is not strictly tied to Wayland). But libEI alone is far from being sufficient to implement InputLeap features, with Wayland we had the opportunity to make things more secure than X11 and take benefit from the XDG portal mechanisms.

On the client side, for replaying input events, it’s similar to remote desktop but we needed to update the existing RemoteDesktop portal to pass the libEI socket. On the server side, it required a brand new portal for input capture . These also required their counterparts in the GNOME portal, for both RemoteDesktop and InputCapture [8], and of course, all that needs to be supported by the Wayland compositor, in the case of GNOME that’s mutter. That alone was a lot of work.

Yet, even with all that in place, that’s just the basic requirements to support a Synergy/Barrier/InputLeap-like feature, the tools in question need to have support for the portal and libEI implemented to benefit from the mechanisms we’ve put in place and for the all feature to work and be usable. So libportal was also updated to support the new portal features and a new “Wayland” backend alongside the X11, Windows and Mac OS backends was contributed to InputLeap.

The merge request in InputLeap was accepted very early, even before the libEI API was completely stabilized and before the rest of the stack was merged, which I believe was a courageous choice from Povilas (who maintains InputLeap) which helped reduce the time to have the feature actually working, considering the number of components and inter-dependencies involved. Of course, there are still features missing in the Wayland backend, like copy/pasting between hosts, but a clipboard interface was fairly recently added to the remote desktop portal and therefore could be used by InputLeap to implement that feature.

Fun fact, Xwayland also grew support for libEI also using the remote desktop portal and wires that to the XTEST extension on X11 that InputLeap’s X11 backend uses, so it might even be possible to use the X11 backend of InputLeap in the client side through Xwayland, but of course it’s better to use the Wayland backend on both the client and server sides.

InputLeap is a great example of collaboration between multiple parties upstream including key contributions from us at Red Hat to implement and contribute a feature that has been requested for years upstream..

Thank you to Olivier Fourdan, Debarshi Ray, Richard Hughes, Sebastian Wick and Jonas Ådahl for their contributions to this blog post.

Pied Beta

From Michael Sheldon's Stuff by Michael Sheldon

(Michael Sheldon)

For the past couple of months I’ve been working on Pied, an application that makes it easy to use modern, natural sounding, text-to-speech voices on Linux. It does this by integrating the Piper neural text-to-speech engine with speech-dispatcher, so most existing software will work with it out of the box.

The first beta version is now available in the snap store:

Get it from the Snap Store

(Other package formats will follow)

I’d appreciate any feedback if you’re able to test it, thanks!

GStreamer 1.22.7 stable bug fix release

From GStreamer News by GStreamer

(GStreamer)

The GStreamer team is pleased to announce another bug fix release in the stable 1.22 release series of your favourite cross-platform multimedia framework!

This release only contains bugfixes and security fixes and it should be safe to update from 1.22.x.

Highlighted bugfixes:

  • Security fixes for the MXF demuxer and AV1 codec parser
  • glfilter: Memory leak fix for OpenGL filter elements
  • d3d11videosink: Fix toggling between fullscreen and maximized, and window switching in fullscreen mode
  • DASH / HLS adaptive streaming fixes
  • Decklink card device provider device name string handling fixes
  • interaudiosrc: handle non-interleaved audio properly
  • openh264: Fail gracefully if openh264 encoder/decoder creation fails
  • rtspsrc: improved whitespace handling in response headers by certain cameras
  • v4l2codecs: avoid wrap-around after 1000000 frames; tiled formats handling fixes
  • video-scaler, audio-resampler: downgraded "Can't find exact taps" debug log messages
  • wasapi2: Don't use global volume control object
  • Rust plugins: various improvements in aws, fmp4mux, hlssink3, livesync, ndisrc, rtpav1depay, rsfilesink, s3sink, sccparse
  • WebRTC: various webrtchttp, webrtcsrc, and webrtcsink improvements and fixes
  • Cerbero build tools: recognise Windows 11; restrict parallelism of gst-plugins-rs build on small systems
  • Packages: ca-certificates update; fix gio module loading and TLS support on macOS

See the GStreamer 1.22.7 release notes for more details.

Binaries for Android, iOS, Mac OS X and Windows will be available shortly.

Release tarballs can be downloaded directly here:

GStreamer Conference 2023

From Herostratus’ legacy by Víctor Jáquez

This year the GStreamer Conference happened in A Coruña, basically at home, along with the hackfest.The conference was the first after a long hiatus of four years of pandemics. The community craved it and long expected it. Some igalians helped to the GStreamer Foundation and our warm community with the organization and logistics. I’m very thankful with my peers and the sponsors of the event. Personally, I’m happy with the outcome. Though, I ought to say, organizing a conference like this is quite a challenge and very demanding.

Palexco Marina
Marina from Palexco

The conference were recorded and streamed by Ubicast. And you can watch any presentation of the conference in their GStreamer Archive.

Tim sharing the State of the Union
Tim sharing the State of the Union
Lunch time in Palexco
Lunch time in Palexco

This is the list of talks where fellow Igalians participated:

There were two days of conference. The following two were for the hackfest, at Igalia’s Head Quarters.

Day one of the Hackfest
Day one of the Hackfest

How to make Flutter’s AlertDialog screen reader friendly

From Michael Sheldon's Stuff by Michael Sheldon

(Michael Sheldon)

While developing Pied, I discovered that Flutter’s AlertDialog widget isn’t accessible to screen reader users (there’s been a bug report filed against this for over a year). Text within the AlertDialog doesn’t get sent to the screen reader when the dialog is focused. The user has no indication that the dialog is present, let alone what its contents are.

Example

The following video demonstrates a typical Flutter AlertDialog with the Orca screen reader enabled. When the ‘Launch inaccessible alert’ button is pressed an alert dialog appears, but the screen reader is unable to read its contents. The ‘Launch accessible alert’ button is then pressed and an accessible alert dialog is demonstrated, with the screen reader successfully reading the contents of the dialog.

The example application used in the video can be found in the accessible_alert_example GitHub repository.

Creating a standard alert dialog

The following code will create a normal Flutter AlertDialog as recommended in the official Flutter documentation. Unfortunately this doesn’t work correctly for screen reader users:

showDialog(
  context: context,
  builder: (context) => AlertDialog(
    actions: [
      TextButton(
        onPressed: () {
          Navigator.pop(context);
        },
      child: const Text('Close Alert'),
      )
    ],
    content: Text(
      'This text is invisible to screen readers. There is no indication that this dialog has even appeared.'
    ),
  )
);

Creating an accessible alert dialog

By making a few small changes to the above code we can make it work correctly with screen readers. First we wrap the AlertDialog in a Semantics widget. This allows us to attach a screen reader friendly label to the AlertDialog. The Semantics label text should be the same as the text displayed in the AlertDialog. Finally, to have this text read as soon as the alert is triggered, we enable autofocus on the TextButton:

showDialog(
  context: context,
  builder: (context) => Semantics(
    label: 'This text will be read by a screen reader. It is clear to the user that something has happened.',
    enabled: true,
    container: true,
    child: AlertDialog(
      actions: [
        TextButton(
          autofocus: true,
          onPressed: () {
            Navigator.pop(context);
          },
          child: const Text('Close Alert'),
        )
      ],
      content: Text(
        'This text will be read by a screen reader. It is clear to the user that something has happened.'
      ),
    )
  )
);

Introducing GstPipelineStudio 0.3.4

From dabrain34.github.io by Stéphane Cerveau

GstPipelineStudio

As it’s not always convenient to use the powerful command line based, gst-launch tool and also manage all the debug possibilities on all the platforms supported by GStreamer, I started this personal project in 2021 to facilitate the adoption to the GStreamer framework and help newbies as confirmed engineers enjoy the power of it.

Indeed a few other projects, such as Pipeviz (greatly inspired from…) or gst-debugger, already tried to offer this GUI capability, my idea with GPS was to provide a cross-platform tool written in Rust with the powerful framework, gtk-rs.

The aim of this project is to provide the GUI for GStreamer but also being able to remote debug existing pipeline while offering a very simple and accessible interface as back in the days I discovered DirectShow with the help of graphedit or GraphStudioNext

Project details

GPS

The interface includes 5 important zones:

  • (1) The registry area gives you an access to the GStreamer registry including all the plugins/elements available on your system. It provides you details on each elements. You can also access to a favorite list.
  • (2) the main drawing area is where you can add elements from the registry area and connect them together. This area allows you to have multiple independent pipelines with its own player for each drawing area.
  • (3) The control playback area, each pipeline can be controlled from this area including basic play/stop/pause but also a seekbar.
  • (4) the debug zone where you’ll receive the messages from the application.
  • (5) The render zone where you can have a video preview if a video sink has been added to the pipeline. Future work includes to have tracers or audio analysis in this zone.

The project has been written in Rust to offer more stability and thanks to the wonderful work to use the GTK framework, it was perfectly fitting to this project as it gives an easy way to use it over the 3 platforms targeted such as GNU/Linux, MacOS and Windows. On this last platform which is quite well “implanted” in the desktop eco-system, the use of GStreamer can lead to difficulties, that’s why GstPipelineStudio Windows MSI will be a perfect match to test the power of the GStreamer framework.

This project has been written under the GPL v3 License.

How it works under the hood

The trick is quite simple as it uses the power of gst-parse-launch API to build a pipeline as a transformation of the visual pipeline to a command line.

So its a clearly a sibling of gst-launch.

Right now its directly linked to the GStreamer installed on your system but future work could be to connect it over daemons such as GstPrinceOfParser or gstd

What’s new in 0.3.4

The main feature of this release is the cross platform ready state. These are beta versions but the CI is now ready to build and deploy Flathub (Linux), Mac OS and Windows version of GstPipelineStudio.

You can download the installers from the project page or with:

  • Linux: flatpak install org.freedesktop.dabrain34.GstPipelineStudio
  • MacOS: DMG file
  • Windows: MSI file

Here is a list of main features added to the app:

  • Open a pipeline from the command line and it will be drawn automatically on the screen. This feature allows you to take any command line pipeline and draw it on the screen to allow any new play tricks with the pipeline, such as change of elements, properties etc.

  • Multiple graphview allows you to draw multiple independent pipeline in the same instance of GstPipelineStudio. The playback state is totally independent for each of the views.

  • Capsfilter has been added to the links allowing to add this crucial feature of GStreamer pipelines.

  • gstreamer-1.0 wrap support to the build system. So you can build your own version of GPS using a dedicated GStreamer version.

What’s next in the pipeline

Among multiple use case, key and debug features, the most upcoming features are:

  • Support the zoom on the graphview. As a pipeline can be quite big, the zoom is a key/must features for GPS. See MR
  • Debug sections such as receiving events/tags/messages or tracers and terminal support, see MR
  • Elements compatibility to check if an element can connect to the previous/next one.
  • Remote debugging: A tracer wsserver is currently under development allowing to send over websocket the pipeline events such as connections, properties or element addition. A MR is under development to connect this tracer and render the corresponding pipeline.
  • Auto plugging according to the rank of each compatible elements for a given pad caps.
  • Display the audio signal in a dedicated render tab.
  • Translations
  • Documentation
  • Unit tests

Here is a lighning talk, I gave about this release (0.3.3), during the 2023 GStreamer conference.

Hope you’ll enjoy this tool and please feel free to provide new features with an RFC here or merge requests here.

As usual, if you would like to learn more about GstPipelineStudio, GStreamer or any other open multimedia framework, please contact us!