TACTIC Open Source
Speeding up naming Convertions - Printable Version



Speeding up naming Convertions - listy - 11-22-2019

Hi.
When I check in a snapshot, I first need to get a virtual snapshot, so the time needed to compute the naming can be fairly large.

A simple @GET() or {sobject.parent} can each take 0.1 seconds to compute. If there are 10 files in a single snapshot, it takes 1-3 seconds to return a result. I think it is not just DB queries; TACTIC also has to do some calculations before it can give the proper name and path.

Is there a way to speed up naming?


RE: Speeding up naming Convertions - listy - 11-24-2019

Maybe some kind of smart cache, or generating a predefined naming table with all the possible variants.
Create a naming script > fire a generator which creates a table of expanded templates, and then query that table.

Code:
dir = {project.code}/episodes/{sobject.name}/{snapshot.process}{@CASE( @GET(file.type) == 'playblast', '/'+'__preview', @GET(file.type) == 'web',  '/'+'__preview/web', @GET(file.type) == 'icon', '/'+'__preview/icon')}
file = {parent.name}_{sobject.name}/{snapshot.context[1]}/v{version}{@IF(@GET(file.type) == 'playblast', '_playblast')}.{ext}
condition = @GET(snapshot.process) == publish


The only thing I can already cut from the query is {project.code}. That's all.

Running this takes 0.1 seconds on a 2.2 GHz CPU. That is really fast, until I get more and more files in a single snapshot or in a batch of snapshots.
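
The pre-generated table suggested at the top of this post could be sketched roughly as follows. This is not TACTIC code: `FILE_TYPE_SUFFIX`, `build_dir_table` and `lookup_dir` are hypothetical names, the list of processes is an illustrative input, and the empty suffix for the 'main' type is an assumption (the @CASE in the dir naming has no branch for it). The point is only that the conditional part of the template is expanded once, and each later lookup is a dict access plus one string format.

```python
from itertools import product

# Suffix per file type, mirroring the @CASE branches in the dir naming above.
# The empty suffix for 'main' is an assumption.
FILE_TYPE_SUFFIX = {
    "main": "",
    "playblast": "/__preview",
    "web": "/__preview/web",
    "icon": "/__preview/icon",
}


def build_dir_table(project_code, processes):
    """Expand the dir template once for every (process, file_type) pair.

    The sobject name stays a '{name}' placeholder so the table remains
    small; it is filled in with str.format at lookup time.
    """
    table = {}
    for process, (file_type, suffix) in product(processes, FILE_TYPE_SUFFIX.items()):
        table[(process, file_type)] = (
            project_code + "/episodes/{name}/" + process + suffix
        )
    return table


def lookup_dir(table, process, file_type, sobject_name):
    """Resolve a directory with one dict lookup and one string format."""
    return table[(process, file_type)].format(name=sobject_name)


table = build_dir_table("dolly3d", ["animation", "publish"])
print(lookup_dir(table, "publish", "icon", "Sh01"))
```

The table would have to be rebuilt whenever the naming entry changes, which is the cost of this approach.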


RE: Speeding up naming Convertions - listy - 12-11-2019

Any thoughts on this?
Maybe some tips?


RE: Speeding up naming Convertions - Celton McGrath - 12-12-2019

Hi Listy,

I'm looking into this and will try to get back to you soon.

Best,

Celton


RE: Speeding up naming Convertions - Celton McGrath - 12-13-2019

Hi Listy,

How did you time your naming? I timed the functions NamingUtil.naming_to_dir and naming_to_file in src/pyasm/biz/naming.py using time.time() at the beginning and end of the functions. I also used your naming conventions.

For naming_to_dir on an expression with multiple references to parent, times during checkin are:

0.0396330356598
0.0060019493103
0.00517797470093
0.00679898262024
0.00790619850159
0.00686097145081

TACTIC is calling this function for the main, web and icon types, presumably for both the checkin and the versionless checkin. It appears that after the first call, information is cached.
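
A minimal sketch of this kind of measurement, taking time.time() at the start and end of a function, is shown below. `timed` and `expand_template` are hypothetical names used only for illustration; in TACTIC the same two timing lines would go into naming_to_dir and naming_to_file themselves.

```python
import functools
import time


def timed(func):
    """Print the wall-clock duration of every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            print("%s: %.6f s" % (func.__name__, time.time() - start))
    return wrapper


@timed
def expand_template(template, values):
    """Hypothetical stand-in for the function being measured."""
    return template.format(**values)


result = expand_template(
    "{project}/episodes/{name}", {"project": "dolly3d", "name": "Sh01"}
)
print(result)
```

Timing each call separately, rather than a whole checkin, is what makes the difference between the first (cold) call and the later (cached) calls visible.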

It is possible that something in the naming engine is causing the bottleneck, and I would be happy to work with you to find it.

Although it would be ideal to use the naming table as is, there are other naming methods available:

- @PYTHON script - can reference a Python script that does custom caching using the Container.
- Custom naming classes - can reference a class that performs custom caching using the Container
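
As a rough illustration of what "custom caching" could mean in either option, the sketch below memoizes one slow lookup in a key/value store. `RequestCache` is a hypothetical stand-in written for this example, not TACTIC's Container; whether the Container exposes the same get/put methods is an assumption to check against the source. `slow_parent_name` is likewise a placeholder for whatever query is expensive.

```python
class RequestCache(object):
    """Hypothetical stand-in for a shared key/value store."""
    _data = {}

    @classmethod
    def get(cls, key):
        return cls._data.get(key)

    @classmethod
    def put(cls, key, value):
        cls._data[key] = value


calls = []


def slow_parent_name(search_key):
    """Pretend lookup; records each call so cache hits are visible."""
    calls.append(search_key)
    return "EpTEST"


def cached_parent_name(search_key):
    """Return the parent name, running the slow lookup at most once per key."""
    key = "naming:parent_name:%s" % search_key
    name = RequestCache.get(key)
    if name is None:
        name = slow_parent_name(search_key)
        RequestCache.put(key, name)
    return name


for _ in range(10):
    cached_parent_name("complex/shot?project=dolly3d&code=SHOT00646")
print(len(calls))
```

With ten files in one snapshot, the slow lookup in this sketch runs once instead of ten times.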

Hope this helps!
Celton


RE: Speeding up naming Convertions - listy - 12-13-2019

Hi, Celton.

I am mostly using the standard TACTIC client API:
server.get_virtual_snapshot_path()
server.get_preallocated_path()

I just tested the speed again. These are my tests on a local machine using the remote API:


Code:
import thlib.tactic_classes as tc
import thlib.global_functions as gf

t = gf.time_it()
server = tc.server_start(project='dolly3d')

# Path that a new 'animation' snapshot of this shot would get
virtual_snapshot_path = server.get_virtual_snapshot_path('complex/shot?project=dolly3d&code=SHOT00646', 'animation')
print(virtual_snapshot_path)

gf.time_it(t)

# Path preallocated for an existing snapshot, without creating directories
preallocated_path = server.get_preallocated_path('SNAPSHOT00009768', mkdir=False)
print(preallocated_path)

gf.time_it(t)

Output:
Code:
/home/apache/assets/dolly3d/episodes/EpTEST/animation/versions/EpTEST_Sh01_animation_v001
Code flow running time: 0.378999948502
/home/apache/assets/dolly3d/episodes/EpTEST/EpTEST_Sh01
Code flow running time: 0.662999868393

The same code on the server-side:
Code:
import time
import json

# Single start time: both measurements below are cumulative from this point
start_time = time.time()

server.set_project('dolly3d')

results_dict = {}
results_list = []

# Note: the variable and key names are swapped relative to the API calls
preallocated_path = server.get_virtual_snapshot_path('complex/shot?project=dolly3d&code=SHOT00646', 'animation')
results_dict['preallocated_path'] = time.time() - start_time
results_list.append(preallocated_path)

virtual_snapshot = server.get_preallocated_path('SNAPSHOT00009768', mkdir=False)
# This value also includes the time spent in the first call
results_dict['virtual_snapshot'] = time.time() - start_time
results_list.append(virtual_snapshot)

results_dict['result'] = results_list

return json.dumps(results_dict)

Output:
Code:
{u'preallocated_path': 0.33849501609802246,
u'result': [u'/home/apache/assets/dolly3d/episodes/EpTEST/animation/versions/EpTEST_Sh01_animation_v001',
             u'/home/apache/assets/dolly3d/episodes/EpTEST/EpTEST_Sh01'],
u'virtual_snapshot': 0.43698716163635254}


Obviously there is a small gain in speed when running on the server side, but it is far from 0.03 seconds per iteration.

I am going to add some print statements and timing to the TACTIC source code to see where the bottleneck is, and maybe point you to that place.

(12-13-2019, 01:22 PM)Celton McGrath Wrote:
- @PYTHON script - can reference a Python script that does custom caching using the Container.
- Custom naming classes - can reference a class that performs custom caching using the Container

In TACTIC 4.5 (which I am on now), @PYTHON is not really usable, as it does not provide the needed "context" (file, snapshot, etc.), just the main SObject. I know it was expanded in 4.7, but I need the 4.5 API for now.

Custom naming classes should do the trick, but I really don't know how to do smart caching in my particular situation. I will try it sometime.

Can you provide the source code you wrote for the test? I could run it on my server.

I did some tests with NamingUtil:

Code:
from pyasm.biz.naming import NamingUtil
from pyasm.search import Search

# Look up the shot and one of its snapshots by search key
sobject = Search.get_by_search_key('complex/shot?project=dolly3d&code=SHOT00646')
snapshot = Search.get_by_search_key('sthpw/snapshot?code=SNAPSHOT00009768')

server.set_project('dolly3d')
naming = NamingUtil()

# Same dir template as in the naming entry
template = "{project.code}/episodes/{sobject.name}/{snapshot.process}{@CASE( @GET(file.type) == 'playblast', '/'+'__preview', @GET(file.type) == 'web',  '/'+'__preview/web', @GET(file.type) == 'icon', '/'+'__preview/icon')}"
name = naming.naming_to_dir(template, sobject, snapshot)

Time: 0.07758712768554688
Resulting name: 'dolly3d/episodes/Sh01/publish'

The returned path lacks the snapshot info and the parent.


RE: Speeding up naming Convertions - listy - 01-19-2020

I did some tests with timing. Now I know where the bottleneck is.
Results:


(0.009768962860107422, '**************************************************', 'STARTING')
(0.016762971878051758, '**************************************************', 'CREATED FILE OBJECT')
(0.02452397346496582, '**************************************************', 'GOT FILE NAMING')
(0.12882614135742188, '**************************************************', 'GOT FILE NAME', u'EpTEST_Sh01_animation_v001')
(0.12989497184753418, '**************************************************', 'GOT DIRS')
(0.18693208694458008, '**************************************************', 'GOR LIB DIR', u'/home/apache/assets/dolly3d/episodes/EpTEST/animation/versions')
client_repo
(0.25337696075439453, '**************************************************', 'GOT REPOS', u'/home/apache/assets/dolly3d/episodes/EpTEST/animation/versions')
/home/apache/assets/dolly3d/episodes/EpTEST/animation/versions/EpTEST_Sh01_animation_v001

(0.0014290809631347656, '**************************************************', 'STARTING')
(0.0025000572204589844, '**************************************************', 'CREATED FILE OBJECT')
(0.003049135208129883, '**************************************************', 'GOT FILE NAMING')
(0.03217816352844238, '**************************************************', 'GOT FILE NAME', u'EpTEST_Sh01')
(0.03326010704040527, '**************************************************', 'GOT DIRS')
(0.08859395980834961, '**************************************************', 'GOR LIB DIR', u'/home/apache/assets/dolly3d/episodes/EpTEST')
client_repo
(0.1293330192565918, '**************************************************', 'GOT REPOS', u'/home/apache/assets/dolly3d/episodes/EpTEST')
/home/apache/assets/dolly3d/episodes/EpTEST/EpTEST_Sh01

It looks like the elapsed time doubles again after the file name has been obtained. I should dig deeper and find out how to optimize all this.
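
Each log line above reports the time elapsed since the start of the call, so the cost of a single step is the difference between neighbouring lines. The short sketch below does that subtraction for the first run; the numbers and labels are copied from the log, while `stage_durations` is a hypothetical helper name.

```python
# (elapsed seconds since start, label), copied from the first run above
checkpoints = [
    (0.009768962860107422, "STARTING"),
    (0.016762971878051758, "CREATED FILE OBJECT"),
    (0.02452397346496582, "GOT FILE NAMING"),
    (0.12882614135742188, "GOT FILE NAME"),
    (0.12989497184753418, "GOT DIRS"),
    (0.18693208694458008, "GOR LIB DIR"),
    (0.25337696075439453, "GOT REPOS"),
]


def stage_durations(points):
    """Return (label, seconds spent since the previous checkpoint)."""
    durations = []
    previous = 0.0
    for elapsed, label in points:
        durations.append((label, elapsed - previous))
        previous = elapsed
    return durations


durations = stage_durations(checkpoints)
# Slowest steps first
for label, seconds in sorted(durations, key=lambda item: item[1], reverse=True):
    print("%-20s %.4f" % (label, seconds))
```

On these numbers the largest steps are resolving the file name (about 0.104 s) and the two directory lookups (about 0.057 s and 0.066 s); everything else is under 0.01 s each.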

The instrumented get_preallocated_path from file_checkin.py:
Code:
def get_preallocated_path(cls, snapshot, file_type='main', file_name='', file_range='', mkdir=True, protocol=None, ext='', parent=None, checkin_type=''):

    import time
    start_time = time.time()

    # we need a dummy file_code and range
    #file_code = '123UNI'
    #if not file_range:
    #    file_range = "1-30"

    if not parent:
        parent = snapshot.get_parent()
    assert parent
    if not file_name:
        file_name = parent.get_code()
        if not file_name:
            file_name = parent.get_name()
        if not file_name:
            file_name = "unknown"

    print(time.time() - start_time, '*'*50, 'STARTING')

    file_object = SearchType.create("sthpw/file")
    file_object.set_value("file_name", file_name)
    file_object.set_value("type", file_type)
    #file_object.set_value("code", file_code)
    #file_object.set_value("range", file_range)
    print(time.time() - start_time, '*' * 50, 'CREATED FILE OBJECT')
    # build the file name
    file_naming = Project.get_file_naming()
    file_naming.set_sobject(parent)
    file_naming.set_snapshot(snapshot)
    file_naming.set_file_object(file_object)
    file_naming.set_ext(ext)
    print(time.time() - start_time, '*' * 50, 'GOT FILE NAMING')
    file_name = file_naming.get_file_name()
    print(time.time() - start_time, '*' * 50, 'GOT FILE NAME', file_name)
    # update the file_name of the file_object from file_naming
    file_object.set_value("file_name", file_name)
 
    context = snapshot.get_context()
    process = snapshot.get_process()
    if not process:
        process = context
    # assume is_revision = False
    return_data = cls.process_checkin_type(checkin_type, parent, process,\
            context , file_name, snapshot.get_value('snapshot_type'))
    dir_naming = return_data.get('dir_naming')
    file_naming = return_data.get('file_naming')

    print(time.time() - start_time, '*' * 50, 'GOT DIRS')
    lib_dir = snapshot.get_lib_dir(file_type=file_type, create=True, file_object=file_object, dir_naming=dir_naming)
    if mkdir and not os.path.exists(lib_dir):
        System().makedirs(lib_dir)
    print(time.time() - start_time, '*' * 50, 'GOR LIB DIR', lib_dir)
    print protocol
    # get the client lib dir
    if protocol == "client_repo":
        client_lib_dir = snapshot.get_client_lib_dir(file_type=file_type, create=True, file_object=file_object, dir_naming=dir_naming)
    elif protocol =='sandbox':
        client_lib_dir = snapshot.get_sandbox_dir(file_type=file_type)
    else:
        client_lib_dir = snapshot.get_lib_dir(file_type=file_type, create=True, file_object=file_object, dir_naming=dir_naming)
    print(time.time() - start_time, '*' * 50, 'GOT REPOS', client_lib_dir)
    # put some protection in for ending slash
    client_lib_dir = client_lib_dir.rstrip("/")
    path = "%s/%s" % (client_lib_dir, file_name)
    print path
    return path

get_preallocated_path = classmethod(get_preallocated_path)


Code I ran to test:
Code:
import time
import json
start_time = time.time()

server.set_project('dolly3d')

results_dict = {}
results_list = []


preallocated_path = server.get_virtual_snapshot_path('complex/shot?project=dolly3d&code=SHOT00646', 'animation')
results_dict['preallocated_path'] = time.time() - start_time
results_list.append(preallocated_path)

virtual_snapshot = server.get_preallocated_path('SNAPSHOT00009768', mkdir=False)
results_dict['virtual_snapshot'] = time.time() - start_time
results_list.append(virtual_snapshot)

results_dict['result'] = results_list

return json.dumps(results_dict)



RE: Speeding up naming Convertions - listy - 01-26-2020

The most time-consuming parts were:
1. Parsing the {parent.name} expression.
2. Getting the lib dir twice:
snapshot.get_lib_dir(file_type=file_type, create=True, file_object=file_object, dir_naming=dir_naming) - only to create the dirs
snapshot.get_client_lib_dir(file_type=file_type, create=True, file_object=file_object, dir_naming=dir_naming) - to get the client lib dir

Every subsequent naming iteration takes less time. The cold start is the most time-consuming.

In my case, I can write {parent.name} into a column of my sobject, and I will remove the first get_lib_dir call.


RE: Speeding up naming Convertions - listy - 02-16-2020

Soooooo,
I found something interesting.
When I mix TEL with simplified expressions, the naming time increases threefold.
For example:
"{parent.code}/versions/{@GET(sobject.code)}"
is 3x slower than:
"{parent.code}/versions/{sobject.code}"

The same happens with {$LOGIN}, etc.

What could be happening?

Something like this would solve my case:
{case file.type == 'main', 'playblast'; file.type == 'web', 'playblast/web'; file.type == 'icon', 'playblast/icon';}


This slow naming is like a nail in my foot.


RE: Speeding up naming Convertions - remkonoteboom - 02-21-2020

This naming "expression":

"{parent.code}/versions/{sobject.code}"

doesn't actually use the expression parser.

The function that handles this is "naming_to_dir()" in the file "src/pyasm/biz/naming.py". If you look at the function, it checks whether the value inside the curly braces starts with a $ or @ and, if so, assumes it is an expression. It then parses the expression, which has some overhead. If not, it goes through a series of "if" statements to evaluate things like "sobject.code" directly (which is much faster).
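
The two paths can be pictured with the simplified sketch below. It is an illustration of the dispatch described above, not the actual code in naming.py: `expand` and `evaluate_expression` are hypothetical names, the "parser" only understands @GET(object.attr) and $NAME, and the context values (such as the parent code "EP001") are made up.

```python
import re

# Matches one {...} placeholder and captures its contents
PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def evaluate_expression(expr, context):
    """Hypothetical stand-in for the full expression parser (slow path)."""
    match = re.match(r"@GET\((\w+)\.(\w+)\)$", expr)
    if match:
        obj, attr = match.groups()
        return str(context[obj][attr])
    if expr.startswith("$"):
        return str(context["env"][expr[1:]])
    raise ValueError("unsupported expression: %s" % expr)


def expand(template, context):
    """Expand placeholders, choosing the fast or slow path for each one."""
    def replace(match):
        token = match.group(1)
        if token.startswith("$") or token.startswith("@"):
            # Slow path: hand the token to the expression parser
            return evaluate_expression(token, context)
        # Fast path: plain "object.attribute" lookup
        obj, attr = token.split(".", 1)
        return str(context[obj][attr])
    return PLACEHOLDER.sub(replace, template)


context = {
    "parent": {"code": "EP001"},
    "sobject": {"code": "SHOT00646"},
    "env": {"LOGIN": "listy"},
}
print(expand("{parent.code}/versions/{sobject.code}", context))
print(expand("{parent.code}/versions/{@GET(sobject.code)}", context))
```

Both templates produce the same string here; the difference is only which branch does the work, which is why replacing {@GET(sobject.code)} with {sobject.code} changes the timing but not the result.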

The reason there are two types of "expressions" in the naming is that naming conventions existed long before TEL (file naming conventions were actually a precursor to the design of TEL). Later on, we updated the naming conventions to handle full expressions while trying to maintain full backwards compatibility.