[Strings] [Hash] [Doubling] Luogu P3502 [POI2010] CHO-Hamsters Solution
This problem combines string modeling with graph theory.
Problem Description
Byteasar breeds hamsters.
Each hamster has a unique name, consisting of lower case letters of the English alphabet.
The hamsters have a vast and comfortable cage. Byteasar intends to place a display under the cage to visualize the names of his hamsters. This display is simply a sequence of letters, each of which can be either lit or not independently.
Only one name will be displayed simultaneously.
The lit letters forming the name have to stand next to each other, i.e., form a contiguous subsequence.
Byteasar wants to be able to display the names of the hamsters on at least $latex m$ different positions.
However, he allows displaying the same name on multiple different positions, and does not require to be able to display each and every hamster’s name.
Note that the occurrences of the names on the display can overlap.
You can assume that no hamster’s name occurs (as a contiguous fragment) in any other hamster’s name.
Byteasar asks for your help in determining the minimum number of letters the display has to have.
In other words, you are to determine the minimum length of a string (consisting of non-capital letters of the English alphabet) that has at least $latex m$ total occurrences of the hamsters’ names (counting multiplicities).
(We say that a string $latex s$ occurs in the string $latex t$ if $latex s$ forms a contiguous fragment of $latex t$.)
Input/Output Format
Input format:
The first line of the standard input holds two integers $latex n$ and $latex m(1\le n\le 200,1\le m\le 10^9)$, separated by a single space, that denote the number of Byteasar’s hamsters and the minimum number of occurrences of the hamsters’ names on the display.
Each of the following $latex n$ lines contains a non-empty string of non-capital letters of the English alphabet that is the hamster’s name.
The total length of all names does not exceed $latex 100000$ letters.
Output format:
The first and only line of the standard output should hold a single integer – the minimum number of letters the display has to have.
Sample Input/Output
Sample input #1:
4 5
monika
tomek
szymon
bernard
Sample output #1:
23
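Problem summary:
Given $n$ strings $s_i$, none of which contains another, find a shortest string $S$ in which the strings $s_i$ occur at least $m$ times in total (occurrences may overlap, and the same string may be counted more than once). Output the length of $S$.
Solution: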
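Building a graph is a fairly natural idea. But how should distances be defined, and how do we cope with a path of up to $10^9$ vertices? There are at most 200 strings, so a Floyd-style all-pairs approach is worth considering. Edges carry weights and every vertex has weight 1 (one occurrence), so a string containing $m$ occurrences corresponds to a path that visits $m$ vertices. The weight of the edge $(i,j)$ is the minimum number of characters that must be appended after $s_i$ to complete $s_j$.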
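To make the edge weight concrete, here is a minimal brute-force sketch. The helper name `appendCost` is hypothetical and not part of the solution's code; it assumes, as the problem guarantees, that no name contains another, so the overlap is always shorter than both strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: number of characters to append after a so that b
// appears as a new occurrence ending at the end of the result.
// Assumption: no name contains another, so a usable overlap is strictly
// shorter than both strings (this also covers the case a == b).
long long appendCost(const std::string& a, const std::string& b) {
    std::size_t maxOverlap = std::min(a.size(), b.size());
    if (maxOverlap > 0) --maxOverlap;  // a full overlap adds no new occurrence
    for (std::size_t k = maxOverlap; k > 0; --k) {
        // Does the length-k suffix of a equal the length-k prefix of b?
        if (a.compare(a.size() - k, k, b, 0, k) == 0)
            return static_cast<long long>(b.size() - k);
    }
    return static_cast<long long>(b.size());  // no overlap at all
}
```

In the sample, `szymon` followed by `monika` shares `mon`, so that edge costs 3, while every other pair costs the full length of the second name.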
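So we can use a doubling version of Floyd; the all-pairs state is comprehensive and can express a lot. Let $f[k][i][j]$ be the shortest path from $i$ to $j$ that uses exactly $2^k$ edges, so $f[0][i][j]$ is the edge weight itself. Each level is built only from the previous one: $f[k][i][j]=\min_t\left(f[k-1][i][t]+f[k-1][t][j]\right)$.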
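The doubling step can be sketched as a min-plus matrix product combined with the binary decomposition of $m-1$. This is an illustrative sketch rather than the solution's exact code: the names `Matrix`, `minPlus`, and `minLength` are assumptions, and the edge costs are passed in already computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Acts as "unreachable" while keeping INF + INF inside long long.
const long long INF = std::numeric_limits<long long>::max() / 4;

// Min-plus product: c[i][j] = min over k of a[i][k] + b[k][j].
Matrix minPlus(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<long long>(n, INF));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] = std::min(c[i][j], a[i][k] + b[k][j]);
    return c;
}

// Hypothetical driver: len[i] is the length of name i, w[i][j] the cost of
// appending name j after name i. Returns the minimum length holding m
// occurrences, i.e. the cheapest path with m vertices (m - 1 edges).
long long minLength(const std::vector<long long>& len, Matrix w, long long m) {
    std::size_t n = len.size();
    std::vector<long long> dist = len;  // one occurrence: the name itself
    long long edges = m - 1;
    while (edges > 0) {
        if (edges & 1) {
            // Extend every path by the block of edges that w currently describes.
            std::vector<long long> next(n, INF);
            for (std::size_t k = 0; k < n; ++k)
                for (std::size_t j = 0; j < n; ++j)
                    next[j] = std::min(next[j], dist[k] + w[k][j]);
            dist = next;
        }
        edges >>= 1;
        if (edges > 0) w = minPlus(w, w);  // w now covers twice as many edges
    }
    return *std::min_element(dist.begin(), dist.end());
}
```

With the sample's names in input order (lengths 6, 5, 6, 7) and the single overlap `szymon` → `monika` costing 3, calling `minLength` with $m=5$ gives 23, matching the sample output.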
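Each inner step is an ordinary Floyd-style triple loop, and the outer loop is the doubling, for a total of $O(n^3\log m)$. Matching the strings takes some technique. I used string hashing here; its complexity is not quite right, but with -O2 enabled it still passed. The intended solution uses an AC automaton and KMP to guarantee the complexity, though working with string hashing taught me something too.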
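For comparison, the overlaps can also be computed with the KMP prefix function. The sketch below is not the author's code and not necessarily how the intended solution is organized; `prefixFunction` and `overlapKmp` are illustrative names, and the `'#'` separator relies on names containing only lowercase letters.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Standard prefix function: pi[i] is the length of the longest proper
// prefix of s[0..i] that is also a suffix of s[0..i].
std::vector<int> prefixFunction(const std::string& s) {
    std::vector<int> pi(s.size(), 0);
    for (std::size_t i = 1; i < s.size(); ++i) {
        int k = pi[i - 1];
        while (k > 0 && s[i] != s[k]) k = pi[k - 1];
        if (s[i] == s[k]) ++k;
        pi[i] = k;
    }
    return pi;
}

// Hypothetical helper: longest usable overlap between a suffix of a and a
// prefix of b. Assumes lowercase names, so '#' never occurs inside them.
int overlapKmp(const std::string& a, const std::string& b) {
    std::string t = b + '#' + a;
    std::vector<int> pi = prefixFunction(t);
    int k = pi.back();
    // A full overlap (possible only when b is a suffix of a, e.g. a == b)
    // adds no new occurrence, so fall back to the next shorter border of b.
    if (k > 0 && k == static_cast<int>(b.size())) k = pi[k - 1];
    return k;
}
```

The edge weight is then the length of $s_j$ minus `overlapKmp` of $s_i$ and $s_j$. Each pair costs time linear in the two lengths, so all pairs together stay around $n$ times the total input length.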
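String hashing treats a string as a number in base $26$ or $27$: the character at position $i$ is multiplied by $26^i$ or by $26^{|s|-i-1}$. To compare two substrings, one of the two hash values must first be multiplied up to the same scale as the other. Take `abc` and `bcd`, which expand to $1+2\times 26+3\times 26^2$ and $latex 2+3\times 26+4 \times 26^2$. To check whether the `bc` in the first string equals the `bc` in the second, extract the two segments (prefix sums make this easy). This gives $2\times 26+3\times 26^2$ and $latex 2+3\times 26$. The ratio between them is a power of $26$ fixed by where each segment starts in its own string (here $26^1$), so multiplying the smaller value by that factor brings both to the same scale. In the code, all of this is done modulo 19260817.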
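The comparison described above can be sketched as follows. The base and modulus are taken from the solution's code, but the function names are illustrative assumptions, and the hashes are recomputed on every call for brevity; a real solution would precompute them once per name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

const long long MOD = 19260817;  // modulus used in the solution's code
const long long BASE = 26;

// h[i] = sum over j < i of (s[j] - 'a' + 1) * BASE^j, modulo MOD.
std::vector<long long> prefixHash(const std::string& s) {
    std::vector<long long> h(s.size() + 1, 0);
    long long p = 1;
    for (std::size_t j = 0; j < s.size(); ++j) {
        h[j + 1] = (h[j] + (s[j] - 'a' + 1) * p) % MOD;
        p = p * BASE % MOD;
    }
    return h;
}

// Fast modular exponentiation: b^e modulo MOD.
long long powMod(long long b, long long e) {
    long long r = 1;
    b %= MOD;
    while (e > 0) {
        if (e & 1) r = r * b % MOD;
        b = b * b % MOD;
        e >>= 1;
    }
    return r;
}

// True when the hash of the last len characters of x matches the hash of
// the first len characters of y. Equal hashes suggest, but do not prove,
// that the substrings are equal.
bool suffixEqualsPrefix(const std::string& x, const std::string& y, std::size_t len) {
    if (len > x.size() || len > y.size()) return false;
    std::vector<long long> hx = prefixHash(x), hy = prefixHash(y);
    // The suffix hash is already scaled by BASE^(|x| - len) ...
    long long suffix = (hx[x.size()] - hx[x.size() - len] + MOD) % MOD;
    // ... so the prefix hash is multiplied up to the same scale.
    long long prefix = hy[len] * powMod(BASE, static_cast<long long>(x.size() - len)) % MOD;
    return suffix == prefix;
}
```

For `abc` and `bcd` with `len` equal to 2, the suffix value is $2\times 26+3\times 26^2$ and the scaled prefix value is $(2+3\times 26)\times 26$, which are equal.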
Code:
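```cpp
#include <cstdio>
#include <cstring>

const int MOD = 19260817;

long long Min(long long x, long long y) { return x < y ? x : y; }

long long f[35][205][205];  // f[t][i][j]: min cost from i to j using 2^t edges
char s[205][100010];
int L[205];
long long dis[205], tmp[205];
int Hash[205][100010];      // Hash[i][j]: hash of s[i][0..j]
int pow26[100100];

// Does the length-l suffix of s[x] match the length-l prefix of s[y]?
bool Equal(int x, int y, int l) {
    // Guard the index: when l == L[x] nothing precedes the suffix.
    int before = (l < L[x]) ? Hash[x][L[x] - l - 1] : 0;
    return (long long)(Hash[x][L[x] - 1] - before + MOD) % MOD
        == (long long)Hash[y][l - 1] * pow26[L[x] - l] % MOD;
}

int main() {
    pow26[0] = 1;
    for (int i = 1; i <= 100000; ++i) pow26[i] = pow26[i - 1] * 26 % MOD;
    memset(f, 0x3f, sizeof(f));
    int n, m;
    scanf("%d%d", &n, &m);
    --m;  // m occurrences correspond to m - 1 edges
    for (int i = 1; i <= n; ++i) {
        scanf("%s", s[i]);
        L[i] = strlen(s[i]);
        dis[i] = L[i];  // a path of one vertex costs the name's length
        for (int j = 0; j < L[i]; ++j)
            if (j) Hash[i][j] = (Hash[i][j - 1] + pow26[j] * (s[i][j] - 'a' + 1)) % MOD;
            else Hash[i][j] = s[i][j] - 'a' + 1;
    }
    // Edge weights: try the longest overlap first.
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= n; ++j) {
            int l = Min(L[i], L[j]);
            for (int k = (i == j ? l - 1 : l); k; --k)
                if (Equal(i, j, k)) { f[0][i][j] = L[j] - k; break; }
            if (f[0][i][j] > 10000000) f[0][i][j] = L[j];  // no overlap
        }
    // Doubling: level t is the min-plus square of level t - 1.
    for (int t = 1; t <= 30; ++t)
        for (int k = 1; k <= n; ++k)
            for (int j = 1; j <= n; ++j)
                for (int i = 1; i <= n; ++i)
                    f[t][i][j] = Min(f[t - 1][i][k] + f[t - 1][k][j], f[t][i][j]);
    // Combine the levels selected by the bits of m - 1.
    for (int i = 0; i <= 30; ++i)
        if (m & (1 << i)) {
            for (int j = 1; j <= n; ++j) {
                tmp[j] = 0x7ffffffffffffffll;
                for (int k = 1; k <= n; ++k)
                    tmp[j] = Min(tmp[j], dis[k] + f[i][k][j]);
            }
            for (int j = 1; j <= n; ++j) dis[j] = tmp[j];
        }
    long long ans = 0x7ffffffffffffffll;
    for (int i = 1; i <= n; ++i) ans = Min(ans, dis[i]);
    printf("%lld\n", ans);
    return 0;
}
```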