Найшвидший код пітона, щоб знайти набір виграшних слів у цій грі

Це гра в слова з набору карт активності для дітей. Нижче правил - код, щоб знайти найкращий тріплет, використовуючи / usr / share / dict / words. Я подумав, що це цікава проблема оптимізації, і мені цікаво, чи люди можуть знайти покращення.

Правила

Виберіть по одній букві з кожного набору нижче.
Виберіть слово, використовуючи вибрані літери (та будь-які інші).
Оцініть слово.
- Кожна літера з обраного набору отримує число, показане разом із набором (включені повтори).
- AEIOU рахувати 0
- Усі інші букви - -2
Повторіть кроки 1-3 вище (жодного повторного використання літер на кроці 1) ще двічі.
Підсумкова оцінка - це сума трьох балів слів.

Набори

(встановити 1 бал 1 бал, 2 бали 2 бали тощо)

LTN
RDS
ГБМ
ТЕЦ
FWV
YKJ
QXZ

Код:

from itertools import permutations
import numpy as np

points = {'LTN' : 1,
          'RDS' : 2,
          'GBM' : 3,
          'CHP' : 4,
          'FWV' : 5,
          'YKJ' : 6,
          'QXZ' : 7}

def tonum(word):
    word_array = np.zeros(26, dtype=np.int)
    for l in word:
        word_array[ord(l) - ord('A')] += 1
    return word_array.reshape((26, 1))

def to_score_array(letters):
    score_array = np.zeros(26, dtype=np.int) - 2
    for v in 'AEIOU':
        score_array[ord(v) - ord('A')] = 0
    for idx, l in enumerate(letters):
        score_array[ord(l) - ord('A')] = idx + 1
    return np.matrix(score_array.reshape(1, 26))

def find_best_words():
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    wlist = [l for l in wlist if len(l) > 4]
    orig = [l for l in wlist]
    for rep in 'AEIOU':
        wlist = [l.replace(rep, '') for l in wlist]
    wlist = np.hstack([tonum(w) for w in wlist])

    best = 0
    ct = 0
    bestwords = ()
    for c1 in ['LTN']:
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                vals = [to_score_array(''.join(s)) for s in zip(c1, c2, c3, c4, c5, c6, c7)]
                                ct += 1
                                print ct, 6**6
                                scores1 = (vals[0] * wlist).A.flatten()
                                scores2 = (vals[1] * wlist).A.flatten()
                                scores3 = (vals[2] * wlist).A.flatten()
                                m1 = max(scores1)
                                m2 = max(scores2)
                                m3 = max(scores3)
                                if m1 + m2 + m3 > best:
                                    print orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()], m1 + m2 + m3
                                    best = m1 + m2 + m3
                                    bestwords = (orig[scores1.argmax()], orig[scores2.argmax()], orig[scores3.argmax()])
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

Версія матриці - це те, що я придумав, написавши один на чистому пітоні (використовуючи словники та оцінюючи кожне слово незалежно), а інший - в нуме, але використовуючи індексацію, а не множення матриці.

Наступною оптимізацією було б повністю видалити голосні звуки (і використовувати модифіковану ord()функцію), але мені цікаво, чи є ще більш швидкі підходи.

EDIT : додано timeit.timeit код

EDIT : Я додаю щедрості, яку я дам тому, яке вдосконалення мені найбільше подобається (або, можливо, кілька відповідей, але мені доведеться набрати ще трохи репутації, якщо це так).

fastest-code python optimization

— ти
джерело

До речі, я написав код, щоб дати своїм восьмирічним дітям три слова запам'ятати, коли він грав у гру проти своєї матері. Тепер я знаю, що означає ксилопірографія.

Це цікаве запитання. Я думаю, що ви можете більше шансів отримати відповіді, якщо ви надасте наступне: (1) Посилання на онлайн-список слів, щоб усі працювали з однаковим набором даних. (2) Поставте своє рішення в одну функцію. (3) Запустіть цю функцію, використовуючи модуль time-it, щоб показати таймінги. (4) Переконайтеся, що ви завантажуєте дані словника за межі функції, щоб ми не перевіряли швидкість диска. Тоді люди можуть використовувати існуючий код як основу для порівняння своїх рішень.

Я перепишу, щоб використати timeit, але для справедливого порівняння мені доведеться використовувати власну машину (що я радий зробити для людей, що розміщують рішення). Список слів має бути доступний у більшості систем, але якщо ні, то тут їх декілька: wordlist.sourceforge.net

Справедливі порівняння можна здійснити, якщо кожен користувач на своєму комп’ютері розглядає рішення та будь-які інші розміщені рішення проти власних. Будуть деякі відмінності між платформою, але в цілому цей метод працює.

Гм, в такому випадку мені цікаво, чи це правильний сайт. Я думаю, що так було б найкраще.

— Джої

Відповіді:

Використовуючи ідею Кіта про попередньо обчислити найкращий можливий бал за кожне слово, мені вдалося скоротити час виконання до 0,7 секунди на моєму комп’ютері (використовуючи список із 75 288 слів).

Хитрість полягає в тому, щоб пройти комбінації слів, які слід грати, а не всі комбінації букв, що вибираються. Ми можемо ігнорувати всі, крім кількох комбінацій слів (203 за допомогою мого списку слів), оскільки вони не можуть отримати більш високий бал, ніж ми вже знайшли. Майже весь час виконання витрачається на попереднє обчислення кількості балів.

Python 2.7:

import collections
import itertools


WORDS_SOURCE = '../word lists/wordsinf.txt'

WORDS_PER_ROUND = 3
LETTER_GROUP_STRS = ['LTN', 'RDS', 'GBM', 'CHP', 'FWV', 'YKJ', 'QXZ']
LETTER_GROUPS = [list(group) for group in LETTER_GROUP_STRS]
GROUP_POINTS = [(group, i+1) for i, group in enumerate(LETTER_GROUPS)]
POINTS_IF_NOT_CHOSEN = -2


def best_word_score(word):
    """Return the best possible score for a given word."""

    word_score = 0

    # Score the letters that are in groups, chosing the best letter for each
    # group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts_sum = 0
        max_letter_count = 0
        for letter in group:
            if letter in word:
                count = word.count(letter)
                letter_counts_sum += count
                if count > max_letter_count:
                    max_letter_count = count
        if letter_counts_sum:
            word_score += points_if_chosen * max_letter_count
            total_not_chosen += letter_counts_sum - max_letter_count
    word_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return word_score

def best_total_score(words):
    """Return the best score possible for a given list of words.

    It is fine if the number of words provided is not WORDS_PER_ROUND. Only the
    words provided are scored."""

    num_words = len(words)
    total_score = 0

    # Score the letters that are in groups, chosing the best permutation of
    # letters for each group of letters.
    total_not_chosen = 0
    for group, points_if_chosen in GROUP_POINTS:
        letter_counts = []
        # Structure:  letter_counts[word_index][letter] = count
        letter_counts_sum = 0
        for word in words:
            this_word_letter_counts = {}
            for letter in group:
                count = word.count(letter)
                this_word_letter_counts[letter] = count
                letter_counts_sum += count
            letter_counts.append(this_word_letter_counts)

        max_chosen = None
        for letters in itertools.permutations(group, num_words):
            num_chosen = 0
            for word_index, letter in enumerate(letters):
                num_chosen += letter_counts[word_index][letter]
            if num_chosen > max_chosen:
                max_chosen = num_chosen

        total_score += points_if_chosen * max_chosen
        total_not_chosen += letter_counts_sum - max_chosen
    total_score += POINTS_IF_NOT_CHOSEN * total_not_chosen

    return total_score


def get_words():
    """Return the list of valid words."""
    with open(WORDS_SOURCE, 'r') as source:
        return [line.rstrip().upper() for line in source]

def get_words_by_score():
    """Return a dictionary mapping each score to a list of words.

    The key is the best possible score for each word in the corresponding
    list."""

    words = get_words()
    words_by_score = collections.defaultdict(list)
    for word in words:
        words_by_score[best_word_score(word)].append(word)
    return words_by_score


def get_winning_words():
    """Return a list of words for an optimal play."""

    # A word's position is a tuple of its score's index and the index of the
    # word within the list of words with this score.
    # 
    # word played: A word in the context of a combination of words to be played
    # word chosen: A word in the context of the list it was picked from

    words_by_score = get_words_by_score()
    num_word_scores = len(words_by_score)
    word_scores = sorted(words_by_score, reverse=True)
    words_by_position = []
    # Structure:  words_by_position[score_index][word_index] = word
    num_words_for_scores = []
    for score in word_scores:
        words = words_by_score[score]
        words_by_position.append(words)
        num_words_for_scores.append(len(words))

    # Go through the combinations of words in lexicographic order by word
    # position to find the best combination.
    best_score = None
    positions = [(0, 0)] * WORDS_PER_ROUND
    words = [words_by_position[0][0]] * WORDS_PER_ROUND
    scores_before_words = []
    for i in xrange(WORDS_PER_ROUND):
        scores_before_words.append(best_total_score(words[:i]))
    while True:
        # Keep track of the best possible combination of words so far.
        score = best_total_score(words)
        if score > best_score:
            best_score = score
            best_words = words[:]

        # Go to the next combination of words that could get a new best score.
        for word_played_index in reversed(xrange(WORDS_PER_ROUND)):
            # Go to the next valid word position.
            score_index, word_chosen_index = positions[word_played_index]
            word_chosen_index += 1
            if word_chosen_index == num_words_for_scores[score_index]:
                score_index += 1
                if score_index == num_word_scores:
                    continue
                word_chosen_index = 0

            # Check whether the new combination of words could possibly get a
            # new best score.
            num_words_changed = WORDS_PER_ROUND - word_played_index
            score_before_this_word = scores_before_words[word_played_index]
            further_points_limit = word_scores[score_index] * num_words_changed
            score_limit = score_before_this_word + further_points_limit
            if score_limit <= best_score:
                continue

            # Update to the new combination of words.
            position = score_index, word_chosen_index
            positions[word_played_index:] = [position] * num_words_changed
            word = words_by_position[score_index][word_chosen_index]
            words[word_played_index:] = [word] * num_words_changed
            for i in xrange(word_played_index+1, WORDS_PER_ROUND):
                scores_before_words[i] = best_total_score(words[:i])
            break
        else:
            # None of the remaining combinations of words can get a new best
            # score.
            break

    return best_words


def main():
    winning_words = get_winning_words()
    print winning_words
    print best_total_score(winning_words)

if __name__ == '__main__':
    main()

Це повертає рішення ['KNICKKNACK', 'RAZZMATAZZ', 'POLYSYLLABLES']з оцінкою 95. Слова з розчину Кіта, додані до списку слів, я отримую такий же результат, як і він. З доданою вами "ксилопірографією" я отримую ['XYLOPYROGRAPHY', 'KNICKKNACKS', 'RAZZMATAZZ']оцінку 105.

— flornquake
джерело

Ось ідея - ви можете уникнути перевірки багатьох слів, помітивши, що більшість слів мають жахливі бали. Скажіть, ви знайшли досить хорошу гру, яка набирає 50 очок. Тоді будь-яка гра з більш ніж 50 балами повинна мати слово принаймні ceil (51/3) = 17 балів. Таким чином, будь-яке слово, яке не може створити 17 балів, може бути проігноровано.

Ось деякий код, який робить вище. Ми обчислюємо найкращий можливий бал за кожне слово у словнику та зберігаємо його у масиві, індексованому балом. Тоді ми використовуємо цей масив для перевірки лише тих слів, які мають необхідний мінімальний бал.

from itertools import permutations
import time

S={'A':0,'E':0,'I':0,'O':0,'U':0,
   'L':1,'T':1,'N':1,
   'R':2,'D':2,'S':2,
   'G':3,'B':3,'M':3,
   'C':4,'H':4,'P':4,
   'F':5,'W':5,'V':5,
   'Y':6,'K':6,'J':6,
   'Q':7,'X':7,'Z':7,
   }

def best_word(min, s):
    global score_to_words
    best_score = 0
    best_word = ''
    for i in xrange(min, 100):
        for w in score_to_words[i]:
            score = (-2*len(w)+2*(w.count('A')+w.count('E')+w.count('I')+w.count('O')+w.count('U')) +
                      3*w.count(s[0])+4*w.count(s[1])+5*w.count(s[2])+6*w.count(s[3])+7*w.count(s[4])+
                      8*w.count(s[5])+9*w.count(s[6]))
            if score > best_score:
                best_score = score
                best_word = w
    return (best_score, best_word)

def load_words():
    global score_to_words
    wlist = [l.strip().upper() for l in open('/usr/share/dict/words') if l[0].lower() == l[0]]
    score_to_words = [[] for i in xrange(100)]
    for w in wlist: score_to_words[sum(S[c] for c in w)].append(w)
    for i in xrange(100):
        if score_to_words[i]: print i, len(score_to_words[i])

def find_best_words():
    load_words()
    best = 0
    bestwords = ()
    for c1 in permutations('LTN'):
        for c2 in permutations('RDS'):
            for c3 in permutations('GBM'):
            print time.ctime(),c1,c2,c3
                for c4 in permutations('CHP'):
                    for c5 in permutations('FWV'):
                        for c6 in permutations('YJK'):
                            for c7 in permutations('QZX'):
                                sets = zip(c1, c2, c3, c4, c5, c6, c7)
                                (s1, w1) = best_word((best + 3) / 3, sets[0])
                                (s2, w2) = best_word((best - s1 + 2) / 2, sets[1])
                                (s3, w3) = best_word(best - s1 - s2 + 1, sets[2])
                                score = s1 + s2 + s3
                                if score > best:
                                    best = score
                                    bestwords = (w1, w2, w3)
                                    print score, w1, w2, w3
    return bestwords, best


if __name__ == '__main__':
    import timeit
    print timeit.timeit('print find_best_words()', 'from __main__ import find_best_words', number=1)

Мінімальна оцінка швидко досягає 100, це означає, що нам потрібно враховувати лише 33 + бальних слів, що є дуже невеликою часткою від загальної суми ( /usr/share/dict/words 33 + бальних 208662 дійсних слів, лише 1723 з яких - 33+ балів = 0,8%). Запускається приблизно на півгодини на моїй машині і генерує:

(('MAXILLOPREMAXILLARY', 'KNICKKNACKED', 'ZIGZAGWISE'), 101)

— Кіт Рендалл
джерело

Приємно. Я додам це до матричного рішення (видалення слів, оскільки їх оцінка падає занадто низько), але це значно краще, ніж будь-яке з чистих рішень python, які я придумав.

— Ти

Я не впевнений, що раніше я бачив, як багато вкладених циклів.

— Пітер Олсон

Поєднуючи свою ідею з матричним підрахунком (і більш чіткою верхньою межею найкращого результату), скорочується час на моїй машині приблизно до 80 секунд (приблизно з години). код тут

— ти

Значна частина цього часу полягає у попередній обчисленні найкращих результатів, які можна зробити набагато швидше.

— ти