Monday, April 23, 2018

Improving the image handling in LibreOffice - Part 3

GraphicObject refactoring


GraphicObject and the implementations of XGraphicObject (UnoGraphicObject) and XGraphic (UnoGraphic) were located in module svtools, which is hierarchically above vcl. This is problematic when creating new instances, as in the Graphic.GetXGraphic method, which needs to bend over backwards to make it work at all (an ugly hack that sends the pointer value as a URL string to GraphicProvider). The solution is to move all GraphicObject related things to vcl, which surprisingly didn't cause a lot of problems, and once done, it looks like a much more "natural" place.

Regarding the UNO API of XGraphicObject - what is left to do here is to properly clean up the uniqueID, as it is not possible to use it anymore for anything other than as a unique ID (it is used only in filters for the image names, if the name is not yet known).

Managing memory used by images


Figure 1: Hierarchy before refactoring


Previously the memory management was done on the level of GraphicObjects, where a GraphicManager and GraphicCache (see Figure 1) were responsible for creating new instances from a uniqueID and managing the memory that GraphicObjects use. This is not possible anymore, as we don't operate with uniqueIDs anymore but always use Graphic and XGraphic objects (in UNO), so we need to manage the creation of Graphic objects or, more precisely, ImpGraphic (Graphic objects are just reference-counted handles to an ImpGraphic).
Figure 2: Hierarchy after refactoring
So to make this possible GraphicManager and GraphicCache need to be decoupled and removed from GraphicObject and a new manager needs to be introduced between Graphic and ImpGraphic, where the manager controls the creation and accounts for the memory usage (see Figure 2).
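The arrangement described above can be illustrated with a minimal Python sketch. It is not the LibreOffice implementation (which is C++): the class name Manager and the methods create, used_bytes, copy and shares_data_with are assumptions made for illustration, and only the roles of Graphic and ImpGraphic come from the text.

```python
import weakref


class ImpGraphic:
    """Stand-in for the object that owns the actual image data."""

    def __init__(self, data: bytes):
        self.data = data

    def size_bytes(self) -> int:
        return len(self.data)


class Manager:
    """Hypothetical manager: creates ImpGraphic objects and tracks their memory."""

    def __init__(self):
        # Weak references, so the manager does not keep images alive itself.
        self._live = weakref.WeakSet()

    def create(self, data: bytes) -> ImpGraphic:
        imp = ImpGraphic(data)
        self._live.add(imp)
        return imp

    def used_bytes(self) -> int:
        return sum(imp.size_bytes() for imp in self._live)


class Graphic:
    """Light handle; copies share a single ImpGraphic."""

    def __init__(self, manager: Manager, data: bytes = b"", _shared=None):
        self._manager = manager
        # All new ImpGraphic instances are created through the manager.
        self._imp = _shared if _shared is not None else manager.create(data)

    def copy(self) -> "Graphic":
        return Graphic(self._manager, _shared=self._imp)

    def shares_data_with(self, other: "Graphic") -> bool:
        return self._imp is other._imp
```

Because every ImpGraphic is created through the manager, copying a Graphic does not change the accounted memory; only creating a new image does.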

Graphic swapping and swapping strategy


To release the memory of graphic objects, we swap them out to a temp file and read them back (swap-in) when we need them again. In the previous implementation this was partially directed by SdrGrafObj (the common image implementation) and SwGrfNode (the Writer image implementation). Each graphic object had a timer that triggered an automatic swap-out, in addition to the swap-out that can happen when a memory limit is exceeded.

In the new code, external swapping directed from SdrGrafObj and SwGrfNode was removed, so they can't influence when swapping happens (maybe in the future they can provide hints about when it is a good time to swap). There is now a global timer which triggers a check of all Graphic objects to see whether any of them can be swapped out in case we exceed the memory limit. The same code is also triggered when a new object is created. An object will be swapped out if it has not been used for a certain amount of time. Each object tracks the timestamp of when it was last used.

A swap-in happens if the object is swapped out (obviously) and certain data is needed (the underlying bitmap, animation or metafile). This is checked in the same code path where the timestamp is updated.
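The swap-out and swap-in behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the actual code: the names SwappableGraphic and reduce_memory, the parameters limit_bytes and max_idle_seconds, and the choice to swap out every idle object once the limit is exceeded are all assumptions. Time is passed in explicitly to keep the sketch deterministic.

```python
import os
import tempfile


class SwappableGraphic:
    """Hypothetical graphic that can move its data to a temp file and back."""

    def __init__(self, data: bytes, now: float):
        self._data = data
        self._swap_path = None
        self.size = len(data)
        self.last_used = now

    def is_swapped_out(self) -> bool:
        return self._data is None

    def data(self, now: float) -> bytes:
        # Same code path: update the timestamp and swap in if needed.
        self.last_used = now
        if self._data is None:
            with open(self._swap_path, "rb") as f:
                self._data = f.read()
            os.remove(self._swap_path)
            self._swap_path = None
        return self._data

    def swap_out(self) -> None:
        if self._data is None:
            return
        fd, path = tempfile.mkstemp(suffix=".swap")
        with os.fdopen(fd, "wb") as f:
            f.write(self._data)
        self._swap_path = path
        self._data = None


def reduce_memory(graphics, limit_bytes, max_idle_seconds, now):
    """Called by a global timer and on object creation (assumed signature)."""
    used = sum(g.size for g in graphics if not g.is_swapped_out())
    if used <= limit_bytes:
        return  # under the limit: leave everything in memory
    for g in graphics:
        if not g.is_swapped_out() and now - g.last_used >= max_idle_seconds:
            g.swap_out()
```

Objects that were used recently stay in memory even when the limit is exceeded, which is what avoids the swap-out and swap-in cycle.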

The new swapping strategy is relatively simple - if a lot of memory is needed by graphic objects at a certain time, we let them use it and don't try to free it over-aggressively. In the past this caused a swap-out and swap-in cycle that made the application completely unusable. In the future, external hints about when a certain Graphic object can be swapped out may be added, so we can perform swapping more effectively. There are also several other ideas that would increase performance and reduce memory usage, and that can be implemented now with the new hierarchy where almost all of the swapping is contained inside the Graphic itself, but all of this is currently out of the scope of this work.

Other changes to Graphic


Other changes to Graphic were related to lazy loading. When a document is loaded, we don't want to load a Graphic into memory if it is not needed yet (for example, we display the first page but the graphic is on page 10). In document filters (ODF for example) we previously transported the URL of an external or internal graphic to the document model, where it was lazily loaded when it was actually needed. This is not possible anymore, as we need to create an XGraphic object already in the document filter. To overcome this we need to have an unloaded Graphic, which is created already in a swapped-out state and swapped in when needed.

The GraphicFilter didn't allow something like this, so I needed to add a new method, which doesn't actually load the image, but just determines what kind of image it is, gathers its metadata (image size) and creates a GfxLink object that includes the (compressed) image data. The metadata is needed as we don't want to force a load when this basic information is requested. We want to load the image as late as possible.
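The idea of an unloaded graphic that already knows its metadata can be sketched as below. This is a Python illustration under stated assumptions: Link is a hypothetical stand-in for GfxLink, LazyGraphic is a hypothetical name, and zlib decompression stands in for real image decoding.

```python
import zlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Link:
    """Stand-in for GfxLink: the still-compressed data plus the image kind."""
    compressed: bytes
    kind: str


class LazyGraphic:
    """Created in an unloaded state; decodes only on first real use."""

    def __init__(self, link: Link, size_px: Tuple[int, int]):
        self._link = link
        self._size_px = size_px  # gathered up front, without decoding
        self._pixels: Optional[bytes] = None
        self.decode_count = 0

    def size_px(self) -> Tuple[int, int]:
        return self._size_px  # answering this must not force a load

    def kind(self) -> str:
        return self._link.kind

    def is_loaded(self) -> bool:
        return self._pixels is not None

    def pixels(self) -> bytes:
        if self._pixels is None:
            # zlib is only a placeholder for the actual image decoder.
            self._pixels = zlib.decompress(self._link.compressed)
            self.decode_count += 1
        return self._pixels
```

Layout code can ask for the size of every image in the document without decoding any of them; decoding happens only when the pixel data is first requested.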

Another issue is that we can have an external image (loaded from a file or even a URL on the internet). The issue is similar to the lazy loading scenario, but it differs in that a Graphic now must know the URL with which it was created and can be created completely empty (no loading of any kind). The reason for this is that loading is directed by the LinkManager, which is part of the document model. For security reasons the LinkManager may not allow a Graphic to be loaded, so loading is directed by the LinkManager on demand (at first usage). LinkManager also takes care of all URLs of various external resources. The user can look at those resources and change their URL or trigger an update. Changing the URL and updating an object was previously done in SdrGrafObj and SwGrfNode, but now this is moved to the common code in the Graphic object, where SdrGrafObj and SwGrfNode only direct what to do. There is still room to improve things here, but that is outside the scope of this work.

Next steps


Finishing up this work by revising the UNO API and fixing known bugs.

Credits


Many thanks to Collabora Productivity, TDF and users that support the foundation by providing donations, to make this work possible.

To be continued...

Tuesday, March 13, 2018

Improving the image handling in LibreOffice - Part 2

It's been some time since my last blog post, and in that time I continued with refactoring the code to get rid of the GraphicObject uniqueID being passed around and stored in the model. The state of the code is now looking fine, as we almost don't use the uniqueID anymore, which means that I can start with the next step of Graphic and GraphicObject improvements.

There is a thing I forgot to clarify in the last blog post: the GraphicObject uniqueID is usually passed around in the form of a URL string. The string has the prefix "vnd.sun.star.GraphicObject", followed by the GraphicObject uniqueID. Using that URL it is possible to re-create a GraphicObject by passing the unique ID as the construction parameter (see the constructor with an OUString parameter on GraphicObject, or the UNO service GraphicObject::createWithId).
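To make the URL form concrete, here is a small Python sketch that extracts the unique ID from such a URL. The function name unique_id_from_url is hypothetical, and the ":" separator between the prefix and the ID is an assumption, since the text above only gives the prefix.

```python
from typing import Optional

# Prefix as given in the text; the trailing ":" is an assumed separator.
GRAPHIC_OBJECT_PREFIX = "vnd.sun.star.GraphicObject:"


def unique_id_from_url(url: str) -> Optional[str]:
    """Return the unique ID part of a GraphicObject URL, or None."""
    if not url.startswith(GRAPHIC_OBJECT_PREFIX):
        return None  # a package URL, file URL, or something else
    unique_id = url[len(GRAPHIC_OBJECT_PREFIX):]
    return unique_id or None  # prefix without an ID is not usable
```

Note that the URL carries only the ID; nothing in it keeps the image itself alive, which is the weakness the refactoring addresses.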

Usage in filters

The "heaviest" users of the uniqueID were the document format filters (xmloff, oox & writerfilter), which generally use it to read the images from the storage (usually ZIP), convert them to GraphicObjects and pass the GraphicObject uniqueID around. When writing, they do the reverse: get the GraphicObject URL and "resolve" the URL to the package URL. At conversion the GraphicObject is created and the image is stored into the storage. For this there is the published UNO interface XGraphicObjectResolver, which has only one method, resolveGraphicObjectURL, which converts a GraphicObject URL to the package URL and back.

Resolving the URL is not the correct approach anymore, so I had to do it in a different way. The result of that is XGraphicStorageHandler, which has explicit methods to load and save an XGraphic from and to the package URL, and which does everything without the need to use the GraphicObject unique ID.

In addition, a graphic can also be external - somewhere on the disk or the internet, identified by an external URL. For this case I implemented a GraphicLoader, which generally just uses XGraphicProvider to load the graphic (in one of the next steps this will be reversed, so that XGraphicProvider is just a UNO interface that uses GraphicLoader).

The special case with external URLs is that we need to remember the URL which was used to load the graphic, so that we can later save just the URL and not the Graphic into the storage. Previously the URL was always passed along as a string, so this wasn't a problem, but now we pass XGraphic. So for this use case I had to extend the Graphic in VCL with an origin URL attribute. In one of the next steps the URL loading will be extended even more, so the Graphic itself will handle the URL completely transparently to the outside.

UNO properties

Usually the filters used the UNO API to set the GraphicObject unique ID into the document model. This was mostly implemented as properties on various interfaces in UNO. The most commonly used property name was GraphicURL (used in different places), but there were also other properties:
  • BackGraphicURL (for backgrounds)
  • HeaderBackGraphicURL (for backgrounds in header)
  • FooterBackGraphicURL (for backgrounds in footer)
  • ParaBackGraphicURL (for backgrounds in paragraphs)
  • ThumbnailURL (for thumbnail of a graphic - not in IDL)
  • ReplacementGraphicURL (for replacement graphic - not in IDL)
  • FillBitmapURL (BitmapTable)
All these properties are now deprecated and removed, and an alternative was added (where needed) that uses the XGraphic or XBitmap types (they use the same implementation, so either can be used). This was done as follows:
  • GraphicURL -> Graphic (type XGraphic) and GraphicBitmap (type XBitmap) for bullets
  • BackGraphicURL -> BackGraphic (type XGraphic) 
  • HeaderBackGraphicURL -> HeaderBackGraphic (type XGraphic)
  • FooterBackGraphicURL -> FooterBackGraphic (type XGraphic)
  • ParaBackGraphicURL -> ParaBackGraphic (type XGraphic)
  • ThumbnailURL  -> Thumbnail (type XGraphic)
  • ReplacementGraphicURL -> ReplacementGraphic (type XGraphic)
  • FillBitmapURL -> FillBitmap (type XBitmap)
There is also ImageURL which is used in form controls, but this was still left inside as there is already a Graphic property which is an alternative.
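For code that has to migrate from the old properties, the replacements listed above can be written down as a lookup table. This is a Python sketch for illustration only; the names PROPERTY_REPLACEMENTS and replacements_for are hypothetical, while the property names and types are taken from the list above.

```python
# Old URL-based property -> list of (new property, UNO type) replacements.
PROPERTY_REPLACEMENTS = {
    "GraphicURL": [("Graphic", "XGraphic"), ("GraphicBitmap", "XBitmap")],
    "BackGraphicURL": [("BackGraphic", "XGraphic")],
    "HeaderBackGraphicURL": [("HeaderBackGraphic", "XGraphic")],
    "FooterBackGraphicURL": [("FooterBackGraphic", "XGraphic")],
    "ParaBackGraphicURL": [("ParaBackGraphic", "XGraphic")],
    "ThumbnailURL": [("Thumbnail", "XGraphic")],
    "ReplacementGraphicURL": [("ReplacementGraphic", "XGraphic")],
    "FillBitmapURL": [("FillBitmap", "XBitmap")],
}


def replacements_for(old_name: str):
    """Return the replacement properties, or an empty list if none is listed."""
    return PROPERTY_REPLACEMENTS.get(old_name, [])
```

GraphicURL has two entries because GraphicBitmap is the variant used for bullets; ImageURL is deliberately absent since it was left in place.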

As GraphicObject URLs are going away (they won't be created anymore and it won't be possible to get the GraphicObject back using the URL), so will the properties, as their content won't make much sense anymore.

Next steps

The next step is now to finally work on Graphic itself, which I'm much more excited about. The managing of memory (GraphicManager) will move from the GraphicObject to the Graphic itself.  When a new Graphic is created, the original bit-stream needs to be saved immediately to the temp folder, where the Graphic can always load the image again and is always free to release the memory if needed (this also means it won't need to load the image to the memory until it is actually needed). With this I think that handling of images will finally be a lot more predictable and homogeneous (no different implementations of things through different modules) and we can actually introduce new features in the future much more easily. 

Back to work...

Credits

Many thanks to Collabora Productivity, TDF and users that support the foundation by providing donations, to make this work possible.

Wednesday, January 31, 2018

Improving the image handling in LibreOffice - Part 1

Prologue

It has been known for some time that the image life-cycle in LibreOffice is problematic and can potentially lead to image loss, but to make the life-cycle more robust against loss, a lot of refactoring would need to be done. Another issue is the mechanism for swapping images in and out of memory. Keeping images in memory takes a lot of space, so when a certain amount is hit, the images get swapped to disk and memory is freed. The problem is that the cache handler can start to constantly swap images in and out (especially with the multi-megapixel images that are the norm today), and LibreOffice grinds to a halt.

Because of these issues, TDF put up a tender to improve the situation with image handling and Collabora Productivity was selected to implement it, and I will do the development work.


Problems with the image life-cycle - detailed

Currently, when an image is read from a document, a GraphicObject is created for the image and handed over to the GraphicManager, which manages the life-cycle. When this happens we usually get back the string-based unique ID of the GraphicObject, with which we can always get access to the image by creating a new GraphicObject with the unique ID (GraphicManager will look for the image with that unique ID). Usually the unique ID is what is passed on between layers in LibreOffice (for example from the ODF filter when loading, to the model, where it is manipulated, and then to the OOXML filter when saving), but the unique ID itself is just a "reference" to the image and by itself it doesn't have any control over when the image can safely be removed and when not. It could happen that in a certain situation we would still have the unique ID referenced somewhere in the model, but the image would already be removed. This is dangerous and needs to be changed.
Usually for this kind of object we use the reference counting technique, where we pass around objects that hold a reference to the resource object. When such an object is created, the reference count is increased; when it is destroyed, the reference count is decreased; and when the reference count reaches zero, the resource object is destroyed.
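The reference counting technique can be shown with a minimal Python sketch. The classes Resource and Handle are hypothetical and only illustrate the counting rules; they are not the Graphic or ImpGraphic classes, and release is called explicitly here where C++ would use a destructor.

```python
class Resource:
    """The shared resource whose lifetime is controlled by the count."""

    def __init__(self, name: str):
        self.name = name
        self.ref_count = 0
        self.destroyed = False


class Handle:
    """Object that is passed around and holds one reference to the resource."""

    def __init__(self, resource: Resource):
        self._resource = resource
        resource.ref_count += 1  # creation increases the count

    def copy(self) -> "Handle":
        return Handle(self._resource)

    def release(self) -> None:
        if self._resource is None:
            return  # already released; releasing twice is harmless
        resource, self._resource = self._resource, None
        resource.ref_count -= 1  # destruction decreases the count
        if resource.ref_count == 0:
            resource.destroyed = True  # last holder gone: free the resource
```

As long as any handle exists, the resource cannot disappear, which is exactly the guarantee a plain unique ID string cannot give.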


The solution for the life-cycle

So instead of passing around the unique ID, the idea is to use the usual reference counting technique that is normally used in this situation. The GraphicObject is mainly a wrapper around Graphic (which then holds a pixel-based image, an animated image, or possibly a vector image), and in addition it keeps additional attributes (gamma, crop, transparency, ...). It also has the implementation of swapping in and out (but I'll explain this another time). On the other hand, Graphic is properly reference-counted already (Graphic objects reference-count the private ImpGraphic), so the solution to the life-cycle problem is that instead of the GraphicObject unique ID we just pass along the Graphic object, or XGraphic or XBitmap, which are just UNO wrappers around Graphic. Potentially we could also pass along the GraphicObject or XGraphicObject (the UNO wrapper for GraphicObject) when we need to take the graphic attributes into account too. This should make the life-cycle much more manageable, but the problem is that there are many, many places where this needs to be changed.
I will do the work as incrementally as possible, ensuring that the tests cover the code and, if needed, adding new tests or extending the existing ones.

Almost finished at the moment is the refactoring of the bitmap table (a list of named bitmaps, mostly used for shape fills or backgrounds) to use XBitmap instead of the string-based unique ID in the table. For this I needed to change the OOXML (oox) and especially the ODF (xmloff) filter, and the document model.


Credits

Many thanks to TDF and users that support the foundation by providing donations, to make this work possible. 


To be continued...

Saturday, May 27, 2017

Pivot charts in LibreOffice: Final part 3

It has been a while since I posted an update on pivot charts. In the meantime I finished what was planned and iterated through cycles of needed fixes and polish. We also branched off the code for LibreOffice 5.4, and the pivot chart implementation is part of that too. If you want to try it out, you can get the LibreOffice 5.4 pre-release on the download page.

Pivot chart field button actions

Last time I explained the buttons, but I didn't explain what action is performed when we click on them. The buttons generally have a similar function as in the pivot table - to show the pivot table layout and to apply filtering of data. The filtering in the pivot table opens a non-modal window where you can choose the filtering. For pivot charts I wanted to reuse that, so when clicking on the field button, the request is sent from the Chart component back to Calc, where the same window is shown (see Figure 1).

Figure 1: Pivot chart field filter

Improvements to pivot chart buttons

In the previous post, the pivot chart field buttons were still very basic. Now I have improved them so they show a down arrow, which makes them look more like they have a pop-up action attached to them. If there is some filtering applied, the arrow turns blue (similar to the pivot table), so it is easier to see when a field has any filter applied.
For page fields we also show what is filtered: when nothing is filtered, "- all -" is shown; when some values are filtered, "- multiple -" is shown; and when only one value is not filtered, we show that value.
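The labelling rule for page fields can be written as a small function. This is a Python sketch of the rule as described, not the Calc code; the function name and parameters are hypothetical, and the case where every value is filtered out is not covered by the description, so the sketch simply falls through to "- multiple -" there.

```python
def page_field_label(all_values, filtered_out):
    """Return the text shown on a page field button (illustrative only)."""
    visible = [value for value in all_values if value not in filtered_out]
    if len(visible) == len(all_values):
        return "- all -"  # nothing is filtered
    if len(visible) == 1:
        return visible[0]  # exactly one value remains: show it
    return "- multiple -"  # some values are filtered
```

For example, with the values "A", "B" and "C", filtering out "A" gives "- multiple -", while filtering out both "A" and "B" gives "C".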

ODF support and compatibility

A pivot chart is useless if we can't save it to a file and reopen it later. For this the ODF format needed to be extended. Luckily, this was relatively easy to do, as the only thing needed is the name of the pivot table that a chart links to (I added a "data-pilot-source" attribute to the "chart:chart" element). Everything else is already present in the existing import/export code, so no additional elements were needed to recreate the exact state that was present when the document was saved.

Somewhat related is copy and paste, which uses ODF as an intermediate format (copy saves parts to the ODF format and paste loads the format), so things like copy/paste between documents work. A difference here is that we can copy the pivot chart and paste it into a different (empty) document, which doesn't have the pivot table. In this case I had to make sure that a normal chart is pasted, which uses the table's internal data and not the pivot table. The internal data is always written with the chart object even if it is not used, just for situations like this (another one is when we copy from a Calc document and paste into a Writer document).

Tests

It would have been really hard for me to implement this properly without tests, as they cemented the behaviour and, if they failed, I knew that I had probably made a mistake or taken a wrong approach to solve the problem. First I added import / export tests, which just used an existing document to get the data, the pivot table and an already existing pivot chart. The purpose of these is to test the ODF import and export code.
Later, I added tests which programmatically add data into a sheet and create a pivot table from scratch as a set-up, then create the pivot chart, test various pivot table layouts, and assert what we expect to see in a pivot chart. This approach is better as a document is not needed, and it demonstrates that a pivot chart can be made from scratch with the available API.

Final demo

Finally, I want to show the complete demo of the pivot chart feature:



You can find the video on YouTube at the following URL: https://youtu.be/txvL1UrsQCw

Credits

Many thanks to Nantes Métropole and Ville de Nantes for making this work possible.

Read more about Nantes deployment here.

Monday, March 27, 2017

Pivot charts in LibreOffice: Part 2

This time I'll present some necessary changes to make pivot charts actually useful and one unique feature that pivot charts have and normal charts don't - field buttons.

Pivot chart creation

If you watched the first video, you may have noticed that the pivot chart I showed was already created from the start. The reason for this was that the functionality to create a new pivot chart from the pivot table wasn't implemented yet. I have fixed this, so it is now possible to create a new pivot chart if you position the cursor on the pivot table and select from the menu to create a new chart. The chart creation code will detect the pivot table and create a pivot chart instead of a normal chart.

Pivot chart wizard

When we want to create a new chart, we first get the chart creation wizard, where we can select the chart type, define the ranges for labels and data, define data series ranges, and add some additional chart elements like title, subtitle,...

For the pivot chart we now get a similar wizard, where we can select the chart type and additional data. The wizard steps to add data ranges and to define the data series are, however, disabled, as these steps are not needed when we get the data from the pivot table.

Pivot chart creation wizard

Pivot chart buttons

This time the biggest change is the pivot chart buttons, which are unique to pivot charts (normal charts don't have them). The purpose of the buttons is to show the layout of the pivot table, so they show the pivot table fields. At the top are the buttons that represent the page fields (if present) and the data fields of the pivot table. At the bottom are the buttons for row fields, next to each other, and in the legend are the buttons for column fields, stacked.

Field buttons in a pivot chart

As they are buttons, there is an action performed when clicking on them, but this is not implemented yet and I'll describe it in more detail next time.

From the implementation point of view, the most challenging thing with the buttons was to position them correctly inside the chart as they are part of the chart structure, and to position everything else accordingly (and not breaking the normal charts in the process).


Demo

This is an updated video of the current state of pivot charts:


You can find the video on YouTube at the following URL: https://www.youtube.com/watch?v=hzl8N9-wpc4

Credits

Again, many thanks to Nantes Métropole and Ville de Nantes for making this work possible.

Read more about Nantes deployment here.

To be continued...

Wednesday, March 15, 2017

Pivot charts in LibreOffice: Part 1

About

Pivot tables are a powerful tool to reorganise, manipulate and summarise the data set in spreadsheets to get the valuable information from it. To get a quick visual representation of the information, pivot charts can be used. A pivot chart can be created from the output of the pivot tables, and if the pivot table gets changed, so does the pivot chart.

Support for pivot tables has been available in LibreOffice for a long time, but there was no support for pivot charts until now. For the past week I have been working on pivot charts in a feature branch (feature/pivotcharts) and I got to a first milestone. Pivot charts will be released in LibreOffice 5.4.

Pivot chart data provider

From a development point of view, pivot charts are just like normal charts but with a different data provider (source of data), so this was the task with which I started. Normal charts use a data provider which is based around reading from cell ranges, but for pivot charts I created a new data provider, which reads the output data from the pivot table and prepares it for the chart. The data columns are mapped to data series and the data rows become the number of data series in the chart (see Figure 1).

Figure 1: Pivot table to pivot chart data mapping
Now what is left is the naming of each axis and data series in the chart. The y-axis categories are mapped to row field names in the pivot table, and the data series names shown in the chart are the combined names of all column field names of the pivot table.
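One way to picture the mapping is the following Python sketch. It is only an illustration of the idea, not the data provider's code: the function pivot_to_series, its input layout (row labels, one tuple of column field names per data column, and a grid of values) and the space used to combine names are all assumptions.

```python
def pivot_to_series(row_labels, column_labels, values):
    """Map pivot table output to chart categories and data series.

    row_labels    -- one label per data row (from the row fields)
    column_labels -- one tuple of column field names per data column
    values        -- grid of numbers, values[row][column]
    """
    categories = list(row_labels)
    series = []
    for col_index, names in enumerate(column_labels):
        series.append({
            # Series name: the combined column field names (separator assumed).
            "name": " ".join(names),
            # One data column of the pivot table becomes one data series.
            "values": [row[col_index] for row in values],
        })
    return categories, series
```

With row labels "Q1" and "Q2" and the column labels ("2016", "North") and ("2016", "South"), this produces two series named "2016 North" and "2016 South", each with one value per row.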

Each data point and row or column field name also has an associated number format, which needs to be assigned to the chart data; otherwise the values would not be formatted the same way as in the pivot table (this is especially important with dates and times).

Updating a pivot chart

Once I managed to do the mapping correctly, the pivot chart showed up as expected, but the pivot chart wasn't updated when I updated the pivot table. To solve this, I had to implement a listener for pivot table updates in the pivot chart data provider, and for every update send a signal to the chart to update the data again (which it gets from the pivot chart data provider). The whole update procedure sounds like a game of ping-pong between components, but it works quite well.
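The update "ping-pong" can be sketched with plain callbacks. This is a simplified Python illustration; the class and method names (PivotTable, PivotChartDataProvider, Chart, add_listener, add_chart_listener, refresh) are hypothetical and do not correspond to the actual UNO listener interfaces.

```python
class PivotTable:
    def __init__(self, output):
        self._output = output
        self._listeners = []

    def add_listener(self, callback):
        self._listeners.append(callback)

    def output(self):
        return self._output

    def update(self, output):
        self._output = output
        for callback in self._listeners:
            callback()  # ping: tell the data provider the table changed


class PivotChartDataProvider:
    def __init__(self, table):
        self._table = table
        self._chart_listeners = []
        table.add_listener(self._on_table_updated)

    def add_chart_listener(self, callback):
        self._chart_listeners.append(callback)

    def _on_table_updated(self):
        for callback in self._chart_listeners:
            callback()  # pong: tell the chart to fetch its data again

    def data(self):
        return list(self._table.output())


class Chart:
    def __init__(self, provider):
        self._provider = provider
        self.rendered = provider.data()
        provider.add_chart_listener(self.refresh)

    def refresh(self):
        # The chart pulls the new data back from the provider.
        self.rendered = self._provider.data()
```

The chart never talks to the pivot table directly; it only reacts to the provider's signal and then asks the provider for fresh data.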

Demo

In the following video you can see the current status of development:



Credits

One of the real privileges here is working on LibreOffice for a Collabora Productivity customer who funds significant feature work. Many thanks to Nantes Métropole and Ville de Nantes for their investment here, and making this feature available to all LibreOffice users. You can read more about Nantes deployment here.

To be continued...

Monday, April 13, 2015

FOSSASIA 2015

On March 12 (wow - that was 1 month ago) I went to the FOSSASIA 2015 conference in Singapore. This was my first time visiting Singapore and a conference in Asia. Singapore is located very near the equator, so it has quite constant weather all year round - which means hot, sometimes humid, and almost daily rainfall. I arrived a day earlier, so I could enjoy one day walking around and exploring Singapore, the very diverse people and food.

I stayed in Chinatown, which is quite attractive for tourists as it has many bars and shops with Chinese merchandise. Really convenient after a long day to get out for a beer and relax.
Pagoda street, China town in Singapore, early in the morning


The following day the conference started. On the first day there was only one track, with varied and very interesting presentations. For me the most interesting were the talks about systemd, mariadb, Firefox OS and others. I also learned that knitting machines are the next big thing after 3D printing. I'm not a hardware guy, but after seeing what some people make I wish I had learned more about hardware in my youth.
I was quite impressed by the talk of Dr. Vivian Balakrishnan (Singapore’s Environment Minister) about open data and why it is important for a government (transparency). After the conference there was an organised event at Labrador park, where we enjoyed the barbecue while socializing.

On the second day of the conference, there were 3 or 4 specialized tracks. I mostly hung around the "OpenTech" track, which still had very diverse talks on topics like web development, development methodologies, community, computer vision, etc. Interesting.

On the last day of the conference I presented on LibreOffice on Android (LibreOffice on Android, a development update). I gave a quick introduction to the LibreOffice Viewer, which is available on Google Play, and after that went into more detail about the editing functionality we are currently working on, which is sponsored by TDF.
My presentation, picture by Michael Cannon (CC BY 4.0) 
Slides can be downloaded here.

After I finished my talk I visited other tracks I did not visit before but sadly the conference concluded quite soon. I'm looking forward to next year.

Thanks to the FOSSASIA organizers for organizing such a wonderful conference, and thanks to TDF and Collabora Productivity for making it possible for me to visit the conference.