[STL] Unordered_set, map (Hash)

baba-sun 2025. 6. 18. 17:56

2025. 6. 18. 17:56

2025.3.15

1. Search

원하는 데이터의 실제 저장 정보를 찾아나가는 방법
효율적인 검색을 위해 요구사항에 따라 적절한 자료구조로 데이터를 관리함.

Search algorithm 종류
Linear Search	Direct Addressing	Binary Search (ordered sequence) = 정렬되어있을시 유용 Hash (hash table) = 범위 벗어나거나 클 때 유용 trie (tree)
(sequence) , O(n)	(direct table), O(1)
가장 기본적인 순차탐색. 데이터의 수, 검색횟수가 적은 경우 사용	Key 값을 배열의 index로 사용 key값의 범위가 배열을 잡을 수 있는 수준일때
	너무 커지면 사용X

Linear Search의 경우 하나를 통채로 관리할 경우 n번 돌아야하지만 특정 기준으로 분류를 할 경우 효율적이 됨.

ex) 알파벳 첫 글자를 기준으로 해서 s칸에 son, l칸에 lee 이런 식으로 구분하면 N번 탐색할걸 N/26 번 돌면 됨

+ Direct Addressing의 경우 배열의 크기가 너무 커지면 사용이 어려운데 이를 보완하는 것이 필요함.

=> Hash!

Hash의 분류가 1:1로 매치되는 것이 Direct Addressing 임!

2. Hash

검색의 한가지 방법
현업에서도 많이 쓰이는 자료구조 알고리즘
key 값과 일치하는 데이터를 효율적으로 검색하기 위해 사용
key 값은 고유값 (ex) id, 주민번호)
인덱싱(indexing) 기법 활용
파이썬의 경우 dictinary, set을 기반으로 메모리를 늘려서 opener 방식을 사용함

		probing - 구한 bucket에 저장된 모든 원소들을 검색하며 원하는 data를 찾기 - 동일한 key값은 항상 같은 bucket - 다른 Key값도 같은 bucket가능 (하지만 충돌 발생) - 충돌을 최소화해야 성능이 잘나옴 * hash function 최대한 고유하게 * size를 크게
1. Hash Key	data의 고유값을 나타낼 수 있는 정보로 설정
2. Hash Function	hash value 생성 : unsigned int / unsigned long long 일반적으로 key 값을 전부 활용하여 진법 변환 (직접 설정도 가능, key값의 일부만 사용하기도 함)
3. Bucket	나눗셈 법 : hash value % size
4. Collision 처리 방식	chaining - bucket 별로 리스트화 하여 관리
5. Hash Table Size	chaining시 - hash table에 등록되는 data수 이상의 2의 제곱수 or 소수 Unordered_set, map에서는 2의 제곱수로 알아서 설정됨

3. Hash - EXAMPLE

EXAMPLE 1

EXAMPLE 2

EXAMPLE 3

****** 충돌이 많을 수록 성능이 떨어지므로 충돌을 최소화해야함.

어떻게?
1. hash function을 최대한 고유하게 만들기
2. bucket을 최대한 많이 만들기

EXAMPLE 4

이런식으로!

4. Hash Function

Hash value를 고유하게 생성하는 방법 (반드시 고유할 필요는 없으나 고유하면 좋음)
- 일반적으로는 hash value를 최대한 고유하게 생성하여 충돌 확률을 낮춘다
- 만약, hash value의 범위가 unsigned long long범위를 벗어나면 %연산이 들어가므로 고유하지 않음.

개수가 고정되어 있을 때 수의 범위를 0부터로 맞춰주고 (max값 +1) 진법변환
- -100 ~ 100 범위의 1개 정수
- 0 ~ 100 범위의 4개 정수
- -100 ~ 100 범위의 3 정수
- 소문자 10자리
개수가 고정되지 않을 때, 수의 범위를 1부터로 맞춰주고 (max값 +1) 진법변환
- 소문자 4~10자리
- 대문자 4~10자리

Hash Function 구현
- Hornor's method 활용 (K진법 -> 10진법)

5k^3 + 3k^2 + 2k^1 + 4k^0
= [{(0*k+5)*k+3}*k+2]*k+4

① 0*k+5 ② ①*k+3
③ ②*k +2 ④ ③*k +4

초기값 hash =0

① hash = hash*k +5
② hash = hash*k +3
③ hash = hash*k +2
④ hash = hash*k +4

key값 : 0~k-1 범위의 정수 n개

ull hash Func(int key[]) {
ull hash = 0;
for (int i = 0; i < n; i++)
hash = hash * k + key[i];
return hash;
}

ex) 0 ~ 50 범위의 x, y => x*51 +y // 0 ~ 20 범위의 x, y , z => x*21^2 + y*21 + z

Hash Tavle size
- hash value를 최대한 고유하게 설정했을 때 hash table size가 성능에 미치는 영향

5. Hash 구현방식

inked list 직접 구현 chaining
일반 배열로 open addressing	T htab[SIZE];
vector로 chaining	vector<T>htab[SIZE];	n/size = 조절
unordered_map, unordered_set	unordered_map<T,U>um; unordered_set<T>us	n/size = 1 * 특징은 set, map과 동일하나 순서 무관

6. Unordered Associative

Hash table
chaining (linked list)
Forward Iterator (next만 사용 가능)

	Space	Search	Insert	Delete
average	O(1)	O(1)	O(1)	O(1)
worst	O(n)	O(n)	O(n)	O(n)

key에 따라 bucket에 잘 분배되도록 hash function을 만들어야함
순서 중요X 넣을수록 뒤죽박죽. 순서가 필요하면 unordered_set,map말고 다른거 사용하는게 나음.

7. Predefined Functor

Comparision operator
1. less <T> : class
2. less <T>() : function
3. less <T>{} : function

// equal_to
template<class T>
struct equal_to {
	bool operator()(const T& l, const T& r) const {
    	return l == r;
     }
}

// less
template<class T>
struct less {
	bool operator()(const T& l, const T& r) const {
    	return l < r;
     }
}

// greater
template<class T>
struct greater {
	bool operator()(const T& l, const T& r) const {
    	return l > r;
     }
}

Hash

Requirement	Specialization 되어있는 type
Key값에 대해 size_t(== unsigned long long) type의 hash 값을 반환한다 k1, k2가 같다면 hash<Key>(k1)==(k2)이어야 하고, 다르다면 hash<Key>(k1)==(k2)인 확률이 매우 적어야함	특정 data type에 대해 별도의 동작을 정의해줌 (= 각 type의 특성에 맞게 hash function이 적절히 구현되어있음. = 별도로 설정 필요X) primitive type : int, long long, char, bool, … / string / ~~pair~~

// ***key 값을 받아 hash value를 반환함.

struct hash {
	size_t operator()(const Key& key) const{
    	return (hash value);
    }
};

Custom : Hash, KeyEqual -1
- hash : functor
- KeyEqual : equal_to operator (==) overloading

struct Key {
	int x,y;
	bool operator==(const Key& r) const {
    	return x == r.x && y == r.y;
     }
};

struct CustomHash{
	size_t operator()(const Key& key) const{
    	return key.x * 10000 + key.y;
    }
};

unordered_set <Key, CustomHash> us;
unordered_map <Key, int, CustomHash> um;

Custom : Hash, KeyEqual -2
- hash : functor
- KeyEqual : functor

struct Key { int x,y;};

struct MyHash{
	size_t operator()(const Key& key) const{
    	return key.x * 10000 + key.y;
     }
};

struct MyEqual{
	bool operator()(const Key& r) const {
    	return x == r.x && y == r.y;
    }
};

unordered_set <Key, MyHash, MyEqual> us;
unordered_map <Key, int, MyHash, MyEqual> um;

8. Policy

rehash	reserve	load_factor	mas_load_factor
bucket_count를 늘려서 hash table를 제구성 O(n)	bucket_count설정 설정한 값 이상의 2k크기로 설정됨	average number of elements per bucket n / bucket_count	regash가 일어나는 threshold defalt 1.0

htab.reserve(n)을 통하여 미리 hash table크기를 결정하면 불필요한 rehasing이 일어나지 않아 효율적일 것 같지만
실제 test해보면 별 차이 없고 오히려 느린 경우도 존재 = 굳이 reserve 안하고 써도 됨.

9. Unordered_set

template<
       class Key,
       class Hash = std::hash<key>,
       class KeyEqual = std::equal_to<Key>
       class Allocator = std::allocator<Key>
> class unordered_set;

// * size_t: unsigned long long

unordered_set<T> htab
unordered_set<T, Hash> htab
unordered_set<T, Hash, KeyEqual> htab

iteratorhtab.begin(), htab.end() // iteraor 반환 O(1)

bool htab.empty() // 비었으면 true 아님 false  O(1)
size_t htab.size() // size 반환 O(1)
void htab.clear()  // 초기화  O(n)
size_t htab.count(T key)  // x개수 반환 0 or 1  O(1)
iterator htab.find(T key) // x의 iterator반환 없으면 htab.end()반환 O(1)
/* 검색 과정이 상수 시간대라 O(1)로 표현하지만 
배열처럼 direct 접근이 아니라 경우에 따라 상수 값이 상당히 커질 수 있음. */

void htab.reserve(int n) // n이상의2^k로bucket_count설정

size_t htab.bucket(T key) // key 값의 bucket 번호 반환
size_t htab.bucket_count() // hash table bucket 개수
size_t htab.bucket_size(intbucket) // bucket의 element 개수

// insert O(1) - 구간은 구간만큼 - 등록된 값 있으면 무시
pair <iterator, bool> htab.insert(T key) // { 등록된iterator, 성공여부}
void htab.insert(iterator first, iterator last)
// erase O(1) - 구간은 구간만큼
iterator htab.erase(iterator pos)
iterator htab.erase(iterator first, iterator last) // 마지막으로지운element의다음iterator
 
size_t htab.erase(T key) // 성공여부 0,1

EXAMPLE

unordered_set<int> s;
for (int i = 5; i > 0; i--) s.insert(i);

for (auto x : s) cout << x << ' ';  // 1 2 3 4 5  (순서 다를 수 있음)
for (auto it = s.begin(); it != s.end(); ++it) cout << *it << ' ';  // 1 2 3 4 5

s.empty();  // 0
s.size();   // 5
s.count(5); // 1
s.count(6); // 0


auto ret = s.find(1); // 1 검색, ret : 값 1 의 iterator

auto ret = s.insert(6);  // 6 등록, ret.first : 등록된 iterator,   ret.second : 1 (등록 여부)
auto ret = s.insert(1).first;  // 1 등록, ret : 등록된 iterator
auto ret = s.insert(1).second;  // 1 등록, ret : 0 (등록 여부)

auto ret = s.erase(2);  // 2 삭제, ret : 1 (삭제 여부)
auto ret = s.erase(2);  // 2 삭제, ret : 0 (삭제 여부)

auto ret = s.erase(s.begin());  // 맨 앞 element 삭제, ret : 삭제된 element의 다음 iterator

10. Unordered_map

template<
       class Key,
       class T
       class Hash = std::hash<key>,
       class KeyEqual = std::equal_to<Key>
       class Allocator = std::allocator<std::pair<const Key, T>>
> class unordered_map;

// * size_t: unsigned long long

unordered_map<T, U> htab
unordered_map<T, U, Hash> htab
unordered_map<T, U, Hash, KeyEqual> htab

iteratorhtab.begin(), htab.end() // iteraor 반환 O(1)

U htab[T key] // key 값이 등록X => vaiue = default값(0)으로 자동 등록

bool htab.empty() // 비었으면 true 아님 false  O(1)
size_t htab.size() // size 반환 O(1)
void htab.clear()  // 초기화  O(n)
size_t htab.count(T key)  // x개수 반환 0 or 1  O(1)
iterator htab.find(T key) // x의 iterator반환 없으면 htab.end()반환 O(1)
/* 검색 과정이 상수 시간대라 O(1)로 표현하지만 
배열처럼 direct 접근이 아니라 경우에 따라 상수 값이 상당히 커질 수 있음. */

void htab.reserve(int n) // n이상의2^k로bucket_count설정

size_t htab.bucket(T key) // key 값의 bucket 번호 반환
size_t htab.bucket_count() // hash table bucket 개수
size_t htab.bucket_size(intbucket) // bucket의 element 개수

// insert O(1) - 구간은 구간만큼 - 등록된 값 있으면 무시
pair <iterator, bool> htab.insert({T key, U value}) // { 등록된iterator, 성공여부}
void htab.insert(iterator first, iterator last)

// erase O(1) - 구간은 구간만큼
iterator htab.erase(iterator pos)
iterator htab.erase(iterator first, iterator last) // 마지막으로지운element의다음iterator
 
size_t htab.erase(T key) // 성공여부 0,1

EXAMPLE

// key 값에 대한 value값 추가, pair <key, value> 형태로 존재

unordered_map<int, int> m;
for (int i = 5; i > 0; i--)  m.insert({ i, 0 });
// {1,0} {2,0} {3,0} {4,0} {5,0}: 순서 다를 수 있음

for (auto x : m) cout << x.first << ',' << x.second;
for (auto it = m.begin(); it != m.end(); ++it) cout << it->first << ',' << it->second;

m[1] = 3; // {1,3} {2,0} {3,0} {4,0} {5,0}
m[6];     // 등록되어 있지 않은 key값이면 value=0으로 자동 insert  : {6, 0}
m[7]++;   // key값 7이 등록되어 있지 않으므로 value=0으로 insert 후, value 1 증가 : {7, 1}
 
m.insert({ 1, 4 }); //// key값 1은 이미 등록되어 있으므로 무시

 // m  : {1,3} {2,0} {3,0} {4,0} {5,0} {6,0} {7,1}

'C++ > STL' 카테고리의 다른 글

[STL] Set, Map (0)	2025.06.17
[STL] Priority Queue(2) - Lazy Update (1)	2025.06.17
[STL] Priority Queue(1) - heap, sort, compair 설정 (0)	2025.06.16
[STL] Stack, Queue (0)	2025.06.13
[STL] String (0)	2025.06.13

바바썬

[STL] Unordered_set, map (Hash)

'C++ > STL' 카테고리의 다른 글

+ Recent posts

티스토리툴바