Count the frequency of words in a list and sort by frequency

Question 1

I am using Python 3.3.

I need to create two lists: one for the unique words and the other for the frequencies of those words.

I have to sort the unique word list based on the frequency list so that the word with the highest frequency comes first.

I have the design in text but am not sure how to implement it in Python.

The methods I have found so far use either Counter or dictionaries, which we have not learned. I have already created the list from the file containing all the words, but I do not know how to find the frequency of each word in the list. I know I will need a loop to do this, but I cannot figure it out.

Here is the basic design:

 original list = ["the", "car",....]
 newlst = []
 frequency = []
 for word in the original list
       if word not in newlst:
           newlst.append(word)
           set frequency = 1
       else
           increase the frequency
 sort newlst based on frequency list

Question 2

Use this:

from collections import Counter
list1=['apple','egg','apple','banana','egg','apple']
counts = Counter(list1)
print(counts)
# Counter({'apple': 3, 'egg': 2, 'banana': 1})
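
To get from the Counter to the two lists the question asks for, one possible sketch is to unpack most_common(), which returns (word, count) pairs ordered from most to least frequent. The names unique_words and frequencies are illustrative and do not come from the original answer.

```python
from collections import Counter

list1 = ['apple', 'egg', 'apple', 'banana', 'egg', 'apple']
counts = Counter(list1)

# most_common() yields (word, count) pairs, highest count first
pairs = counts.most_common()

# Split the pairs into two parallel lists (names are illustrative)
unique_words = [word for word, _ in pairs]
frequencies = [count for _, count in pairs]

print(unique_words)   # ['apple', 'egg', 'banana']
print(frequencies)    # [3, 2, 1]
```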

Question 3

You can use

from collections import Counter

It is available in Python 2.7 and later; see the collections.Counter documentation for more information.

1.

>>> c = Counter('abracadabra')
>>> c.most_common(3)
[('a', 5), ('r', 2), ('b', 2)]

Using a dict:

>>> d = {1: 'one', 2: 'one', 3: 'two'}
>>> c = Counter(d.values())
>>> c.most_common()
[('one', 2), ('two', 1)]

But first you have to read the file and convert it to a dict.

2. This is the example from the Python docs, using re and Counter:

# Find the ten most common words in Hamlet
>>> import re
>>> words = re.findall(r'\w+', open('hamlet.txt').read().lower())
>>> Counter(words).most_common(10)
[('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631),
 ('you', 554),  ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)]

Question 4

# Read the words into a list
with open("test.txt", "r") as f:
    words = f.read().split()
uniqWords = sorted(set(words))  # remove duplicate words and sort
for word in uniqWords:
    print(words.count(word), word)

Question 5

A pandas answer:

import pandas as pd
original_list = ["the", "car", "is", "red", "red", "red", "yes", "it", "is", "is", "is"]
pd.Series(original_list).value_counts()

If you want it in ascending order, it is as simple as:

pd.Series(original_list).value_counts().sort_values(ascending=True)

Question 6

Another solution, with a different algorithm and without using collections:

def countWords(A):
    dic = {}
    for x in A:
        if x not in dic:        # Python 2.7: if not dic.has_key(x):
            dic[x] = A.count(x)
    return dic

dic = countWords(['apple', 'egg', 'apple', 'banana', 'egg', 'apple'])
sorted_items = sorted(dic.items())   # if you want it sorted (alphabetically by word)
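
The question asks for the most frequent word first, whereas sorted(dic.items()) orders the pairs alphabetically by word. Here is a minimal sketch of sorting by count instead, assuming a dictionary shaped like the one countWords returns. The literal below is written out by hand, in a deliberately shuffled order, so the snippet runs on its own.

```python
# Word counts for the sample list, written out by hand (an assumption)
dic = {'banana': 1, 'apple': 3, 'egg': 2}

# Sort (word, count) pairs by count, largest first
by_frequency = sorted(dic.items(), key=lambda item: item[1], reverse=True)

print(by_frequency)   # [('apple', 3), ('egg', 2), ('banana', 1)]
```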

Question 7

One way would be to make a list of lists, with each sub-list in the new list containing a word and a count:

list1 = []    # this is your original list of words
list2 = []    # this is a new list of [word, count] pairs

for word in list1:
    for pair in list2:
        if pair[0] == word:
            pair[1] += 1    # word already seen: increase its count
            break
    else:
        # the inner loop found no match, so this is a new word
        list2.append([word, 1])

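Or, with try/except instead of an explicit search loop:

for word in list1:
    try:
        # index() raises ValueError if the word has not been stored yet
        position = [pair[0] for pair in list2].index(word)
        list2[position][1] += 1
    except ValueError:
        list2.append([word, 1])

This would be less efficient than using a dictionary, but it uses more basic concepts.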

Question 8

You can use reduce(), the functional way. In Python 3, reduce has to be imported from functools.

from functools import reduce

words = "apple banana apple strawberry banana lemon"
reduce(lambda d, c: d.update([(c, d.get(c, 0) + 1)]) or d, words.split(), {})

returns:

{'strawberry': 1, 'lemon': 1, 'apple': 2, 'banana': 2}

Question 9

Using Counter would be the best way, but if you don't want to do that, you can implement it yourself this way.

# The list you already have
word_list = ['words', ..., 'other', 'words']
# Get a set of unique words from the list
word_set = set(word_list)
# create your frequency dictionary
freq = {}
# iterate through them, once per unique word.
for word in word_set:
    freq[word] = word_list.count(word) / float(len(word_list))

freq will end up holding the frequency of each word in the list you already have.

You need float in there to convert one of the integers to a float, so that the resulting value is a float. (This matters in Python 2; in Python 3 the / operator already performs true division.)

Edit:

If you can't use a dict or a set, here is another, less efficient way:

# The list you already have
word_list = ['words', ..., 'other', 'words']
unique_words = []
for word in word_list:
    if word not in unique_words:
        unique_words += [word]
word_frequencies = []
for word in unique_words:
    word_frequencies += [float(word_list.count(word)) / len(word_list)]
for i in range(len(unique_words)):
    print(unique_words[i] + ": " + str(word_frequencies[i]))

The indices of unique_words and word_frequencies will match.

Question 10

The ideal way is to use a dictionary that maps each word to its count. But if you can't use one, you might want to use 2 lists: one storing the words and the other storing the counts of the words. Note that the order of the words and the counts matters here. Implementing this would be hard and not very efficient.
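
A minimal sketch of the two-list approach described above, using only lists and loops. The function name count_with_two_lists and the sample data are assumptions made for illustration. Position i in one list always corresponds to position i in the other.

```python
def count_with_two_lists(original_list):
    """Return (words, counts) as two parallel lists."""
    words = []    # each unique word, in order of first appearance
    counts = []   # counts[i] is how often words[i] occurs
    for word in original_list:
        if word in words:
            # The same position in both lists refers to the same word
            counts[words.index(word)] += 1
        else:
            words.append(word)
            counts.append(1)
    return words, counts


words, counts = count_with_two_lists(["the", "car", "the", "red", "car", "the"])
print(words)    # ['the', 'car', 'red']
print(counts)   # [3, 2, 1]
```

Each `in` test and `index` call scans the list from the start, which is why this tends to be slower than a dictionary on large inputs.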

Question 11

Try this:

words = []
freqs = []

for line in sorted(original_list): # takes all the words in the list in sorted order
    line = line.rstrip() # strips trailing whitespace
    if line not in words: # checks to see if line is in words
        words.append(line) # if not, adds it to the end of words
        freqs.append(1) # and adds 1 to the end of freqs
    else:
        index = words.index(line) # if it is, finds where it is in words
        freqs[index] += 1 # and adds 1 to the matching index in freqs
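
Once words and freqs are filled, the remaining step from the question is ordering the words by frequency. One way to sketch it without dictionaries is to sort the index positions by their frequency and rebuild both lists. The sample values and the names order, sorted_words and sorted_freqs below are made up for illustration.

```python
# Example parallel lists, shaped like the ones the loop above builds
words = ["car", "is", "red", "the"]
freqs = [1, 4, 3, 2]

# Sort the positions 0..n-1 by frequency, highest first
order = sorted(range(len(words)), key=lambda i: freqs[i], reverse=True)

# Rebuild both lists in that order so they stay aligned
sorted_words = [words[i] for i in order]
sorted_freqs = [freqs[i] for i in order]

print(sorted_words)   # ['is', 'red', 'the', 'car']
print(sorted_freqs)   # [4, 3, 2, 1]
```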

Question 12

Here is code that supports your question. is_word() checks whether a string consists only of allowed characters, and only those strings are counted. A hashmap is a dictionary in Python.

def is_word(word):
    # Count the characters that are letters, digits or '$'
    cnt = 0
    for c in word:
        if 'a' <= c <= 'z' or 'A' <= c <= 'Z' or '0' <= c <= '9' or c == '$':
            cnt += 1
    # The string is a word only if every character was allowed
    return cnt == len(word)

def words_freq(s):
    d = {}
    for i in s.split():
        if is_word(i):
            if i in d:
                d[i] += 1
            else:
                d[i] = 1
    return d

print(words_freq('the the sky$ is blue not green'))

Question 13

The best way to do it:

def wordListToFreqDict(wordlist):
    wordfreq = [wordlist.count(p) for p in wordlist]
    return dict(zip(wordlist, wordfreq))

Then try: wordListToFreqDict(originallist)