Tuesday, April 16, 2013

Bulk file import in Alfresco

I used the bulk file import tool and found a bug in the in-place variant, so I had to use the streaming version, which copies the files into the repository. But it was not that bad: close to 1 terabyte in just under three hours. Well, Alfresco likes a great disk system :)

My recommendations:
  • Don't use in-place import until this is fixed: http://code.google.com/p/alfresco-bulk-filesystem-import/issues/detail?id=125
  • Follow the tuning advice here: http://wiki.alfresco.com/wiki/Bulk_Importer
  • Do not keep the status page open during the import (it made the import a factor of 2-4 slower)!
  • Deleting from Alfresco can be slow, so do a small test run first (maybe a bulk delete is needed?)


General Statistics
  Status: Successful
  Source Directory: /opt/alfresco/alf_data/bulk_fileimport
  Target Space: /Company Home/Data_Import
  Import Type: Streaming
  Batch Weight: 100
  Active Threads: 0 (of 0 total)
  Start Date: 2013-04-15T20:08:54Z
  End Date: 2013-04-15T23:08:32Z
  Duration: 0d 2h 59m 37s 99.317ms
  Number of Completed Batches: 21920

Source (read) Statistics
  Last file or folder processed: ....
  Scanned:
    Folders: 21001
    Files: 273420
    Unreadable: 0
  Read:
    Content: 273344 (915.49GB)
    Metadata: 0 (0B)
    Content Versions: 0 (0B)
    Metadata Versions: 0 (0B)
  Throughput:
    27.32 entries scanned / sec
    25.36 files read / sec
    86.99MB / sec

Target (write) Statistics
  Space Nodes:
    # Created: 21041
    # Replaced: 0
    # Skipped: 0
    # Properties: 168328
  Content Nodes:
    # Created: 273329
    # Replaced: 6
    # Skipped: 0
    Data Written: 915.57GB
    # Properties: 2186680
  Content Versions:
    # Created: 0
    Data Written: 0B
    # Properties: 0
  Throughput:
    27.31 nodes / sec
    86.99MB / sec
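
As a sanity check, the source throughput figures can be recomputed from the duration and the counts on the status page. This is only a small sketch in Python: the constant names are mine, and it assumes the importer reports binary units (1GB = 1024MB), which the status page does not state.

```python
# Figures copied from the status page above.
# Duration "0d 2h 59m 37s 99.317ms" expressed in seconds.
DURATION_S = 2 * 3600 + 59 * 60 + 37 + 0.099317
FOLDERS_SCANNED = 21001
FILES_SCANNED = 273420
FILES_READ = 273344
GB_READ = 915.49

# Assumption: binary units, i.e. 1GB = 1024MB.
MB_PER_GB = 1024


def per_second(amount, seconds=DURATION_S):
    """Average rate over the whole import run."""
    return amount / seconds


entries_per_sec = per_second(FOLDERS_SCANNED + FILES_SCANNED)
files_per_sec = per_second(FILES_READ)
mb_per_sec = per_second(GB_READ * MB_PER_GB)

print(f"{entries_per_sec:.2f} entries scanned / sec")
print(f"{files_per_sec:.2f} files read / sec")
print(f"{mb_per_sec:.2f}MB / sec")
```

Under that unit assumption the recomputed rates agree with the reported source throughput to within rounding, so the figures are plain averages over the full run.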

FreeMind module now has indexing too

The previous post presented the FreeMind file preview without indexing support. That is now fixed: the module transforms the .mm files to a text rendition, which can then be indexed.
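
To illustrate the idea of a text rendition, here is a rough Python sketch that pulls the node texts out of a mind map. It is not the module's code (the module runs inside Alfresco); it only assumes the usual FreeMind layout of a `map` root with nested `node` elements carrying a `TEXT` attribute. The function name `mm_to_text` and the sample map are made up for the example.

```python
import xml.etree.ElementTree as ET


def mm_to_text(mm_xml: str) -> str:
    """Return the node texts of a FreeMind map, one per line, indented by depth."""
    root = ET.fromstring(mm_xml)
    lines = []

    def walk(node, depth):
        text = node.get("TEXT")
        if text:
            lines.append("  " * depth + text)
        # Only direct <node> children, so the outline keeps its nesting.
        for child in node.findall("node"):
            walk(child, depth + 1)

    for top in root.findall("node"):
        walk(top, 0)
    return "\n".join(lines)


# Hypothetical sample map, just for demonstration.
SAMPLE = (
    '<map version="0.9.0">'
    '<node TEXT="Project">'
    '<node TEXT="Import"/>'
    '<node TEXT="Index"/>'
    "</node>"
    "</map>"
)

print(mm_to_text(SAMPLE))
```

Plain text like this is what a full-text indexer can work with, whereas the raw XML would mostly contribute markup noise.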