Speeding up a Python timestamp field calculation in ArcGIS Desktop?

9

I'm new to Python and have started writing scripts for ArcGIS workflows. I'm wondering how I can speed up my code that generates a double numeric "Hours" field from a timestamp field. I start with a track point log (breadcrumb trail) shapefile generated by DNR Garmin, with an LTIME timestamp field (a text field, length 20) recording when each trackpoint was taken. The script calculates the difference in hours between each successive timestamp ("LTIME") and puts it into a new field ("Hours").

That way I can go back and summarize how much time I spent in a particular area/polygon. The main part comes after the print "Executing getnextLTIME.py script..." statement. Here is the code:

# ---------------------------------------------------------------------------
# 
# Created on: Sept 9, 2010
# Created by: The Nature Conservancy
# Calculates delta time (hours) between successive rows based on timestamp field
#
# Credit should go to Richard Crissup, ESRI DTC, Washington DC for his
# 6-27-2008 date_diff.py posted as an ArcScript
'''
    This script assumes the format "month/day/year hours:minutes:seconds".
    The hour needs to be in military time. 
    If you are using another format please alter the script accordingly. 
    I do a little checking to see if the input string is in the format
    "month/day/year hours:minutes:seconds" as this is a common date time
    format. Also the hours:minute:seconds is included, otherwise we could 
    be off by almost a day.

    I am not sure if the time functions do any conversion to GMT, 
    so if the times passed in are in another time zone than the computer
    running the script, you will need to pad the time given back in 
    seconds by the difference in time from where the computer is in relation
    to where they were collected.

'''
# ---------------------------------------------------------------------------
#       FUNCTIONS
#----------------------------------------------------------------------------        
import arcgisscripting, sys, os, re
import time, calendar, string, decimal
def func_check_format(time_string):
    # Warn when the string does not look like "month/day/year hour:minutes:seconds"
    expected = "expected format is 'month/day/year hour:minutes:seconds'"
    if time_string.find("/") == -1:
        print "Error: time string doesn't contain any '/'; " + expected
    elif time_string.find(":") == -1:
        print "Error: time string doesn't contain any ':'; " + expected

    # The date and the time must be separated by a single space
    parts = time_string.split()
    if len(parts) != 2:
        print "Error: time string doesn't contain a date and a time separated by a space; " + expected


def func_parse_time(time_string):
    '''
    Take the time value and make it into a tuple with 9 values.
    Example: "2004/03/01 23:50:00". If the date values don't look like this
    then the script will fail.
    '''
    year=0;month=0;day=0;hour=0;minute=0;sec=0;
    time_string = str(time_string)
    l=time_string.split()
    if not len(l) == 2:
        gp.AddError("Error: func_parse_time, expected 2 items in list l got" + \
            str(len(l)) + "time field value = " + time_string)
        raise Exception 
    cal=l[0];cal=cal.split("/")
    if not len(cal) == 3:
        gp.AddError("Error: func_parse_time, expected 3 items in list cal got " + \
            str(len(cal)) + "time field value = " + time_string)
        raise Exception
    ti=l[1];ti=ti.split(":")
    if not len(ti) == 3:
        gp.AddError("Error: func_parse_time, expected 3 items in list ti got " + \
            str(len(ti)) + "time field value = " + time_string)
        raise Exception
    if int(len(cal[0]))== 4:
        year=int(cal[0])
        month=int(cal[1])
        day=int(cal[2])
    else:
        year=int(cal[2])
        month=int(cal[0])
        day=int(cal[1])       
    hour=int(ti[0])
    minute=int(ti[1])
    sec=int(ti[2])
    # formated tuple to match input for time functions
    result=(year,month,day,hour,minute,sec,0,0,0)
    return result


#----------------------------------------------------------------------------

def func_time_diff(start_t,end_t):
    '''
    Take the two numbers that represent seconds
    since Jan 1 1970 and return the difference of
    those two numbers in hours. There are 3600 seconds
    in an hour. 60 secs * 60 min   '''

    start_secs = calendar.timegm(start_t)
    end_secs = calendar.timegm(end_t)

    x=abs(end_secs - start_secs)
    #diff = number hours difference
    #as ((x/60)/60)
    diff = float(x)/float(3600)   
    return diff

#----------------------------------------------------------------------------

print "Executing getnextLTIME.py script..."

try:
    gp = arcgisscripting.create(9.3)

    # set parameter to what user drags in
    fcdrag = gp.GetParameterAsText(0)
    psplit = os.path.split(fcdrag)

    folder = str(psplit[0]) #containing folder
    fc = str(psplit[1]) #feature class
    fullpath = str(fcdrag)

    gp.Workspace = folder

    fldA = gp.GetParameterAsText(1) # Timestamp field
    fldDiff = gp.GetParameterAsText(2) # Hours field

    # set the toolbox for adding the field to data managment
    gp.Toolbox = "management"
    # add the user named hours field to the feature class
    gp.addfield (fc,fldDiff,"double")
    #gp.addindex(fc,fldA,"indA","NON_UNIQUE", "ASCENDING")

    desc = gp.describe(fullpath)
    updateCursor = gp.UpdateCursor(fullpath, "", desc.SpatialReference, \
        fldA+"; "+ fldDiff, fldA)
    row = updateCursor.Next()
    count = 0
    oldtime = str(row.GetValue(fldA))
    #check datetime to see if parseable
    func_check_format(oldtime)
    gp.addmessage("Calculating " + fldDiff + " field...")

    while row <> None:
        if count == 0:
            row.SetValue(fldDiff, 0)
        else:
            start_t = func_parse_time(oldtime)
            b = str(row.GetValue(fldA))
            end_t = func_parse_time(b)
            diff_hrs = func_time_diff(start_t, end_t)
            row.SetValue(fldDiff, diff_hrs)
            oldtime = b

        count += 1
        updateCursor.UpdateRow(row)
        row = updateCursor.Next()

    gp.addmessage("Updated " + str(count) + " rows.")
    #gp.removeindex(fc,"indA")
    del updateCursor
    del row

except Exception, ErrDesc:
    import traceback;traceback.print_exc()

print "Script complete."

arcpy time

— Russell

1

Nice program! I haven't seen anything that speeds up the calculation. The field calculator takes forever!!

— Brad Nesom

12

Cursors are always really slow in the geoprocessing environment. The easiest way around this is to pass a Python code block to the CalculateField geoprocessing tool.

Something like this should work:

import arcgisscripting
gp = arcgisscripting.create(9.3)

# Create a code block to be executed for each row in the table
# The code block is necessary for anything over a one-liner.
codeblock = """
import datetime
class CalcDiff(object):
    # Class attributes are static, that is, only one exists for all 
    # instances, kind of like a global variable for classes.
    Last = None
    def calcDiff(self,timestring):
        # parse the time string according to our format.
        t = datetime.datetime.strptime(timestring, '%m/%d/%Y %H:%M:%S')
        # return the difference from the last date/time
        if CalcDiff.Last:
            diff =  t - CalcDiff.Last
        else:
            diff = datetime.timedelta()
        CalcDiff.Last = t
        # diff.seconds alone drops whole days, so count them as well
        return float(diff.days * 86400 + diff.seconds) / 3600.0
"""

expression = """CalcDiff().calcDiff(!timelabel!)"""

gp.CalculateField_management(r'c:\workspace\test.gdb\test','timediff',expression,   "PYTHON", codeblock)

Obviously, you'd have to modify it to take the fields and so on as parameters, but it should be pretty fast.

Note that although your date/time parsing functions are actually faster than the strptime() function, the standard library is almost always more bug-free.
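To try the row-to-row logic outside ArcGIS, here is a minimal plain-Python sketch of the same idea. The class name HoursSinceLast, the TIME_FORMAT constant, and the sample timestamps are made up for illustration, and the format string assumes month/day/year values as in the question. It counts whole days as well as seconds, because timedelta.seconds on its own only holds the part of the difference that is shorter than a day.

```python
from datetime import datetime, timedelta

# Assumed layout of the timestamp text (month/day/year, 24-hour clock).
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


class HoursSinceLast(object):
    """Return the hours elapsed since the previous timestamp passed in."""

    def __init__(self):
        self.last = None  # no previous timestamp yet

    def __call__(self, timestring):
        current = datetime.strptime(timestring, TIME_FORMAT)
        if self.last is None:
            delta = timedelta()  # first row: nothing to compare against
        else:
            delta = current - self.last
        self.last = current
        # Combine days and seconds so gaps longer than 24 hours are kept.
        return (delta.days * 86400 + delta.seconds) / 3600.0


# Hypothetical sample values, including one gap longer than a day.
calc = HoursSinceLast()
hours = [calc(t) for t in ["09/09/2010 23:30:00",
                           "09/10/2010 00:15:00",
                           "09/11/2010 06:15:00"]]
```

Here the state lives on an instance rather than on a class attribute, which is fine in a standalone script; inside a CalculateField expression that creates a new object for every row, the state has to live somewhere shared, as in the code block above.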

— David

Thanks, David. I didn't realize CalculateField was faster; I'll try to test this. The only problem I think I may have is that the dataset may be out of order. That happens on occasion. Is there a way to sort ascending on the LTIME field first and then apply CalculateField, or to tell CalculateField to execute in a certain order?

— Russell

Just a note: calling the pre-canned gp functions will be faster most of the time. I explained why in a previous post gis.stackexchange.com/questions/8186/…

— Ragi Yaser Burhum

+1 for using the built-in datetime package, as it offers great functionality and almost replaces the time/calendar packages

— Mike T

1

This was incredible! I tried your code and integrated it with @OptimizePrime's "in memory" suggestion, and that took the script's average run time from 55 seconds to 2 seconds (810 records). This is exactly the sort of thing I was looking for. Thanks so much. I learned a lot.

— Russell

3

@David gave you a very clean solution. +1 for playing to the strengths of the scripting code base.

Another option is to copy the dataset into memory using:

gp.CopyFeatureclass("path to your source", "in_memory\copied feature name") - for a geodatabase feature class or shapefile, or
gp.CopyRows("path to your source", ) - for a geodatabase table, dbf, etc.

This removes the overhead incurred when you request a cursor from the ESRI COM code base.

The overhead comes from converting Python data types to C data types and from accessing the ESRI COM code base.

When you have your data in memory, you reduce the need to access the disk (a high-cost process). You also reduce the need for the Python and C/C++ libraries to pass data back and forth when you use arcgisscripting.

Hope this helps.
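As a rough sketch of how the in-memory copy could be combined with CalculateField, the helper below copies the source into the in_memory workspace, calculates the field there, and writes the result back out. The function name calculate_in_memory and the scratch name tracks_tmp are hypothetical, it uses the CopyFeatures_management tool for the copy, and it assumes the target field already exists on the source. The geoprocessor is passed in as an argument so the flow can be read (and dry-run with a stand-in object) without ArcGIS installed.

```python
def calculate_in_memory(gp, source, output, field, expression, codeblock):
    """Copy source into in_memory, run CalculateField on the copy,
    then save the calculated copy to output.

    gp is a geoprocessor object, e.g. arcgisscripting.create(9.3).
    """
    scratch = "in_memory\\tracks_tmp"  # hypothetical scratch name
    gp.CopyFeatures_management(source, scratch)
    try:
        gp.CalculateField_management(scratch, field, expression,
                                     "PYTHON", codeblock)
        gp.CopyFeatures_management(scratch, output)
    finally:
        # Release the in-memory copy even if a tool call fails.
        gp.Delete_management(scratch)
    return output
```

The trade-off is one extra copy in and one out, so this is most likely to pay off when the per-row work is the slow part rather than the copying.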

— OptimizePrime

1

A great alternative to the old-style UpdateCursor from arcgisscripting is arcpy.da.UpdateCursor, which has been available since ArcGIS 10.1 for Desktop.

I've found it to be typically about 10 times faster.

It may not have been an option when this question was written, but nobody reading this now should overlook it.
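A minimal sketch of how that might look for this question, assuming ArcGIS 10.1 or later and timestamps in month/day/year format. The function names hours_by_oid and update_hours are made up for illustration. It reads the timestamps with arcpy.da.SearchCursor, sorts them chronologically in Python (which also covers the out-of-order concern raised in the comments above), and writes the results back by object ID with arcpy.da.UpdateCursor.

```python
from datetime import datetime

# Assumed layout of the timestamp text (month/day/year, 24-hour clock).
TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def hours_by_oid(records, fmt=TIME_FORMAT):
    """Map each object ID to the hours elapsed since the previous record
    in chronological order (0.0 for the earliest record).

    records is an iterable of (oid, timestamp_text) pairs.
    """
    parsed = sorted((datetime.strptime(text, fmt), oid)
                    for oid, text in records)
    result = {}
    previous = None
    for stamp, oid in parsed:
        if previous is None:
            result[oid] = 0.0
        else:
            delta = stamp - previous
            result[oid] = (delta.days * 86400 + delta.seconds) / 3600.0
        previous = stamp
    return result


def update_hours(feature_class, time_field, hours_field):
    """Write the elapsed hours into hours_field, which must already exist."""
    import arcpy  # imported here so hours_by_oid can be tried without ArcGIS

    with arcpy.da.SearchCursor(feature_class, ["OID@", time_field]) as rows:
        hours = hours_by_oid(rows)
    with arcpy.da.UpdateCursor(feature_class, ["OID@", hours_field]) as cursor:
        for row in cursor:
            row[1] = hours[row[0]]
            cursor.updateRow(row)
```

Sorting on the parsed datetime rather than on the text matters here, because month/day/year strings do not sort chronologically as plain text. Rows with empty or malformed timestamps are not handled in this sketch.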

— PolyGeo