Replies: 10 comments 35 replies
-
Right now the snapshots stored are zoomed in on the object, but I could probably make that configurable. Just so that I understand your use case properly: would you like to have both snapshots saved (one zoomed in with the bounding box drawn, and one "unedited" snapshot), or would it suffice to choose one or the other?
-
I want to put some observations here that may be relevant. They may not make implementation easier, so they can be ignored. When browsing EVENTS:

For the change proposed in this ticket: my initial thought is that when hovering over recordings triggered by an object, there would be a configuration option to disable the bounding box on the screenshot. These images are already full size and could be copy/pasted for processing outside of Viseron if they did not have the bounding box. This would leave the top-level EVENT listing unchanged: it would still show events zoomed in to the bounding box. However, I see that I have some events indicated but no recording for them.

Here is an additional thing I noticed, unrelated to this ticket: I only now realized that if I scroll down in the pop-up there is metadata listed as well as a download icon (for either the snapshot or the recording). No matter how small I make my Firefox display, it always requires scrolling down to view this. I also just noticed that if there are more than 3 recordings, scrolling is needed to see the 3rd recording (and beyond). I mention this because I did not notice it in the documentation; alternately, I am not sure what software change would make this clearer.
-
There is another approach I want to explore (before modifying user-facing code). I want to see what is involved in writing a script that queries the events and, based on that, exports the images that correspond to those events. The export could include metadata in the file name (e.g. the label of the object detected) or in a separate file (the bounding box).
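A minimal sketch of what such a script could look like. Note this is entirely hypothetical: the `API` base URL, the `/events/{camera}` path, and the `snapshot_url`/`bounding_box` fields are placeholder assumptions, not Viseron's actual HTTP API.

```python
# Hypothetical export script -- endpoint paths and the event JSON shape
# are assumptions; check Viseron's actual API before relying on them.
import json
import urllib.request
from pathlib import Path

API = "http://localhost:8888/api/v1"  # placeholder base URL


def export_filename(event: dict) -> str:
    """Embed metadata (timestamp, label, confidence) in the exported name."""
    ts = str(event["timestamp"]).replace(":", "-")
    return f"{ts}_{event['label']}_{event['confidence']:.2f}.jpg"


def export_events(camera: str, out_dir: str) -> list:
    """Fetch events for a camera and save the raw snapshots, plus a
    .json sidecar carrying the bounding box so the image stays raw."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(f"{API}/events/{camera}") as resp:
        events = json.load(resp)["events"]
    written = []
    for ev in events:
        name = export_filename(ev)
        with urllib.request.urlopen(ev["snapshot_url"]) as img:
            (out / name).write_bytes(img.read())
        (out / (name + ".json")).write_text(json.dumps(ev["bounding_box"]))
        written.append(name)
    return written


if __name__ == "__main__":
    print(export_events("front_door", "training_images"))
```

Keeping the bounding box in a sidecar file rather than drawn on the image means the same export serves both review and training.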
-
So my original answer was that I do not need both. However, I hacked a version of Viseron that does not draw the bounding box or zoom/crop; it only saves the raw image. It turns out that it is nice to see the zoomed/boxed version to immediately see what was detected; if the detection is wrong, I would then save the raw image for future training. In addition to the approach of using the metadata to render the zoomed/cropped version for the UI while making the raw image available for download, I was wondering whether it would be feasible to save the zoomed/cropped version as a thumbnail inside the full image. I haven't found anything that does this, but thought I would mention it as an alternate solution. In any case it might not be any more straightforward than rendering on demand or saving two versions.
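The render-on-demand option only needs the raw image plus the stored box, since the crop rectangle is simple geometry that can be computed at display time. A minimal sketch of that math (plain Python; no Viseron internals assumed):

```python
def padded_crop(bbox, img_w, img_h, pad=0.25):
    """Given a bounding box (x1, y1, x2, y2) in pixels, return a crop
    rectangle expanded by `pad` (fraction of the box size) on each side,
    clamped to the image bounds. The raw image on disk is never touched;
    the UI applies this crop when rendering the zoomed view."""
    x1, y1, x2, y2 = bbox
    dx = (x2 - x1) * pad
    dy = (y2 - y1) * pad
    return (
        max(0, int(x1 - dx)),
        max(0, int(y1 - dy)),
        min(img_w, int(x2 + dx)),
        min(img_h, int(y2 + dy)),
    )
```

For example, `padded_crop((100, 100, 200, 200), 640, 480)` yields `(75, 75, 225, 225)`, and boxes near an edge are clamped rather than spilling outside the frame.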
-
I have been working with the changed UI as implemented by @kaburagisec and propose this as a way to streamline gathering training images from Viseron:

Note: the other enlarge button (

There may be other ways of accomplishing this, but the goal is:
-
As an additional data point, we can look at how Frigate does this. I used Frigate previously but will need to take another look at the workflow, as I don't fully recall; I will update the thread when I confirm. Frigate doesn't let users use the images to train their own models. The Frigate developer does the training with images labelled by users and uploaded to the Frigate server, and to access the models you pay the Frigate developer (as I recall) for models they train with your data.
-
I think long term it would be best for Viseron to have a built-in annotation tool.

I do not have much experience with annotating images for models; is there a standard format or something we could use? Components that supply object detector domains could also implement this learning process to make it easy for people to use, as I guess there is a certain barrier to entry. If we can land that, we basically have Frigate+ but local for each user, which would be really cool imo.

Edit: Found this, could be useful? https://labelstud.io/
-
So this got me thinking about how to retain privacy yet have a community-driven approach to training a machine learning object detection model. Yes, this is beyond what my brain can comprehend, so I figured I would risk getting slop and asked an LLM.
I probably got slop, but I might never know. I am back to working on Viseron.
-
Regarding your feedback on PiP mode behavior in Firefox @john- , I've tried it, and everything works as expected. If PiP control is disabled per https://support.mozilla.org/en-US/kb/turn-picture-picture-mode, the PiP mode toggle will completely disappear from all players. Yes, all these PiP mode toggles are located in each individual player; that is the intended design. The unique thing is that in Chrome the PiP toggles are mutually exclusive across players, meaning only one PiP window can be open at a time, unlike Firefox, which allows more than one PiP window at a time.

Regarding right-clicking on the video player, there's no option to enter PiP mode; there are only the Change Slot, Flip View, and Reconnect options. I've tried this in Firefox. Hope this information helps.
-
I am going to spend some time summarizing this thread and will post back here when I am done. My current process to extract images for review/annotation is, as called out, very much human-in-the-loop (HITL). This is obviously labor intensive, and after doing a bit of research it looks like it is problematic for creating a robust training dataset. A target process may be something along the lines of what was indicated in this thread: a process tied into object/motion detection that can pull out images and associated predictions (if any). High- and low-confidence images would be straightforward. Some cases, like examples of false negatives, might need motion events to extract them: for example, when there is motion but no prediction, extract that frame for review. These images/predictions could then be culled by components that feed labeling tools (for example, Label Studio or Roboflow). Anyway, I am learning as I go along, so thanks to both of you for your patience.
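To sketch what the "feed labeling tools" step could look like: Label Studio can import tasks with pre-annotations, where rectangles are expressed as percentages of the image size. A hypothetical converter for a single detection follows; the `from_name`/`to_name` values must match your labeling config, and the field names should be verified against Label Studio's import documentation.

```python
def to_label_studio_task(image_url, label, bbox, img_w, img_h, score=None):
    """Convert one detection (pixel bbox x1, y1, x2, y2) into a Label
    Studio pre-annotated task. Rectangle coordinates are percentages
    of the image dimensions, per Label Studio's import format."""
    x1, y1, x2, y2 = bbox
    value = {
        "x": 100 * x1 / img_w,
        "y": 100 * y1 / img_h,
        "width": 100 * (x2 - x1) / img_w,
        "height": 100 * (y2 - y1) / img_h,
        "rectanglelabels": [label],
    }
    result = {
        "from_name": "label",     # must match the labeling config
        "to_name": "image",
        "type": "rectanglelabels",
        "value": value,
    }
    prediction = {"result": [result]}
    if score is not None:
        prediction["score"] = score  # lets reviewers sort by confidence
    return {"data": {"image": image_url}, "predictions": [prediction]}
```

Carrying the detector confidence through as `score` would support exactly the culling workflow above: reviewers can sort by confidence and prioritize the low-confidence and no-prediction cases.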

-
For my use case, I spend most of my time in Viseron gathering images to train object detection models. Getting these images out of Viseron seems like it should be easy, but for a few reasons it is not for me. I can certainly list those reasons.
However, it would be great to be able to click on an event image and have Viseron provide the full size screen image without a bounding box (if present).
Anyone else have this need? Do you have a straightforward way of collecting full size images for model training purposes?